[{"data":1,"prerenderedAt":93466},["ShallowReactive",2],{"active-banner":3,"navbar-featured-partner-blog":24,"blogs-listing":306,"navbar-pricing-featured":92940},{"id":4,"title":5,"date":6,"dismissible":7,"extension":8,"link":9,"link2":10,"linkText":11,"linkText2":12,"meta":13,"stem":21,"variant":22,"__hash__":23},"banners\u002Fbanners\u002Flakestream-ufk-launch.md","StreamNative Introduces Lakestream Architecture and Launches Native Kafka Service","2026-04-07",true,"md","\u002Fblog\u002Ffrom-streams-to-lakestreams","https:\u002F\u002Fconsole.streamnative.cloud\u002Fsignup?from=banner_lakestream-launch","Read Announcement","Sign Up Now",{"body":14},{"type":15,"value":16,"toc":17},"minimark",[],{"title":18,"searchDepth":19,"depth":19,"links":20},"",2,[],"banners\u002Flakestream-ufk-launch","default","zRueBGutATZB0ZnFFHwaEV7F0Di4tnZUHhgOiI4cu6k",{"id":25,"title":26,"authors":27,"body":29,"category":289,"createdAt":290,"date":291,"description":292,"extension":8,"featured":7,"image":293,"isDraft":294,"link":290,"meta":295,"navigation":7,"order":296,"path":297,"readingTime":298,"relatedResources":290,"seo":299,"stem":300,"tags":301,"__hash__":305},"blogs\u002Fblog\u002Fstreamnative-recognized-in-the-forrester-wave-streaming-data-platforms-2025.md","StreamNative Recognized as a Contender in The Forrester Wave™: Streaming Data Platforms, Q4 2025",[28],"David Kjerrumgaard",{"type":15,"value":30,"toc":276},[31,39,47,51,67,73,78,81,87,102,109,115,118,124,127,134,140,143,146,157,163,169,172,175,178,184,191,194,197,204,207,210,224,229,233,237,241,245,249,251,268,270],[32,33,35],"h3",{"id":34},"receives-highest-possible-scores-in-both-the-messaging-and-resource-optimization-criteria",[36,37,38],"em",{},"Receives Highest Possible Scores in BOTH the Messaging and Resource Optimization Criteria",[40,41,43],"h2",{"id":42},"introduction",[44,45,46],"strong",{},"Introduction",[48,49,50],"p",{},"Real-time data has become the backbone of modern innovation. As artificial intelligence (AI) and digital services demand instantaneous insights, organizations are realizing that streaming data is no longer optional – it's essential for delivering timely, context-rich experiences. StreamNative's data streaming platform is built precisely for this reality, ensuring data is immediate, reliable, and ready to power critical applications.",[48,52,53,54,63,64],{},"Today, we're excited to announce that Forrester Research has named StreamNative as a Contender in its evaluation, ",[55,56,58],"a",{"href":57},"\u002Freports\u002Frecognized-in-the-forrester-wave-tm-streaming-data-platforms-q4-2025",[36,59,60],{},[44,61,62],{},"The Forrester Wave™: Streaming Data Platforms, Q4 2025",". This report evaluated 15 top streaming data platform providers, and we're proud to share that ",[44,65,66],{},"StreamNative received the highest scores possible—5 out of 5—in both the Messaging and Resource Optimization criteria.",[48,68,69,70],{},"***Forrester's Take: ***",[36,71,72],{},"\"StreamNative is a good fit for enterprises that want an Apache Pulsar implementation that is also compatible with Kafka APIs.\"",[48,74,75],{},[36,76,77],{},"— The Forrester Wave™: Streaming Data Platforms, Q4 2025",[48,79,80],{},"Being recognized in the Forrester Wave is a proud milestone, and for us, it highlights how far StreamNative has come in enabling enterprises to unlock the power of real-time data. 
In the sections below, we'll dive into what we believe sets StreamNative apart—from our modern architecture and cloud-native design to our open-source foundation and real-time use cases—and how we see these strengths aligning with Forrester's findings.",[40,82,84],{"id":83},"trusted-by-industry-leaders",[44,85,86],{},"Trusted by Industry Leaders",[48,88,89,90,93,94,97,98,101],{},"Companies across industries are already leveraging StreamNative to drive real-time outcomes. Global enterprises like ",[44,91,92],{},"Cisco"," rely on StreamNative to handle massive IoT telemetry, supporting 245 million+ connected devices. Martech leaders such as ",[44,95,96],{},"Iterable"," process billions of events per day with StreamNative for hyper-personalized customer engagement. And in financial services, ",[44,99,100],{},"FICO"," trusts StreamNative to power its real-time fraud detection and analytics pipelines with a secure, scalable streaming backbone.",[48,103,104,105,108],{},"The Forrester report notes that, “",[36,106,107],{},"Customers appreciate the lower infrastructure costs that result from StreamNative’s cost-efficient, Kafka-compatible architecture. Customers note excellent support responsiveness…","”",[40,110,112],{"id":111},"modern-cloud-native-architecture-built-for-scale",[44,113,114],{},"Modern, Cloud-Native Architecture Built for Scale",[48,116,117],{},"From day one, StreamNative was designed with a modern architecture to meet the demanding scale and flexibility requirements of real-time data. Unlike legacy streaming systems that often rely on tightly coupled storage and compute, StreamNative's platform takes a cloud-native approach: it decouples these layers to enable elastic scalability and efficient resource utilization across any environment. The core is powered by Apache Pulsar—a distributed messaging and streaming engine—enhanced with multi-protocol support (including native Apache Kafka API compatibility) to unify diverse data streams under one roof. This means organizations can consolidate siloed messaging systems and handle both high-volume event streams and traditional message queues on a single platform, without sacrificing performance or reliability.",[48,119,120,121,108],{},"Forrester's evaluation described that “",[36,122,123],{},"StreamNative aims to provide a high-performance, multi-protocol streaming data platform: It uses Apache Pulsar with Kafka API compatibility to deliver cost-efficient, real-time applications for enterprises. It appeals to organizations that want a flexible, low-cost streaming solution, due to its focus on scalability and resource optimization, while its investments in Pulsar’s open-source ecosystem and performance optimization make it the primary platform for enterprises wishing to implement Pulsar.",[48,125,126],{},"Our cloud-first, leaderless architecture (with no single broker bottlenecks) and tiered storage model were built to maximize throughput and cost-efficiency for real-time workloads. By separating compute from storage and leveraging distributed object storage, StreamNative can retain huge volumes of event data indefinitely while keeping compute costs in check—effectively providing a flexible, low-cost streaming solution.",[48,128,129,130,133],{},"This modern design not only delivers high performance, but also ensures fault tolerance and geo-distribution out of the box, so enterprises can trust their streaming data is always available and durable. 
As Forrester’s evaluation noted, StreamNative ",[36,131,132],{},"\"excels at messaging and resource optimization\" and “Its platform supports use cases like real-time analytics and event-driven architectures with robust scalability.","” Our architecture provides the strong foundation that today's real-time applications demand, from ultra-fast data ingestion to seamless scale-out across hybrid and multi-cloud environments.",[40,135,137],{"id":136},"open-source-foundation-and-pulsar-expertise",[44,138,139],{},"Open Source Foundation and Pulsar Expertise",[48,141,142],{},"StreamNative's DNA is rooted in open source innovation. Our founders are the original creators of Apache Pulsar, and we've built our platform with the same open principles: freedom, flexibility, and community-driven innovation. For developers and data teams, this means adopting StreamNative comes with no proprietary lock-in—instead, you get a platform built on open standards and a thriving ecosystem. We offer broad API compatibility (Pulsar, Kafka, JMS, MQTT, and more) so that teams can work with familiar interfaces and integrate StreamNative into existing systems with ease.",[48,144,145],{},"StreamNative is the primary commercial contributor to the Apache Pulsar project and its surrounding ecosystem. We invest heavily in Pulsar's ongoing improvements our investments in Pulsar's open-source ecosystem and performance optimization bolster StreamNative's value. We also foster a vibrant community through initiatives like the Data Streaming Summit and free training resources.",[48,147,148,149,152,153,156],{},"Forrester's assessment noted that StreamNative’s “",[36,150,151],{},"events-driven agents, extensibility, and performance architecture are solid,","” and we're continuing to build on that foundation. ",[44,154,155],{},"We're actively investing in expanding our tooling for observability, governance, schema management, and developer productivity","—areas we recognize as critical for enterprise adoption and where we're committed to accelerating our roadmap.",[48,158,159,160],{},"Being open also means embracing an open ecosystem of technologies. StreamNative actively integrates with the tools and platforms that matter most to our users. We partner with industry leaders like Snowflake, Databricks, Google, and Ververica to ensure our streaming platform works seamlessly with data warehouses, lakehouse storage, and stream processing frameworks. Forrester’s evaluation observed that StreamNative’s ",[36,161,162],{},"\"investments in Pulsar’s open-source ecosystem and performance optimization make it the primary platform for enterprises wishing to implement Pulsar.\"",[40,164,166],{"id":165},"powering-real-time-use-cases-across-industries",[44,167,168],{},"Powering Real-Time Use Cases Across Industries",[48,170,171],{},"One of the greatest validations of StreamNative's approach is the success our customers are achieving with real-time data. StreamNative's platform is versatile and use-case agnostic—if an application demands high-volume, low-latency data movement, we can power it. This flexibility is why our customer base spans industries from finance and IoT to major automobile manufacturers and online gaming. 
The common thread is that these organizations need to process and react to data in milliseconds, and StreamNative is delivering the capabilities to make that possible.",[48,173,174],{},"Cisco uses StreamNative to underpin an IoT telemetry system of colossal scale, connecting hundreds of millions of devices and thousands of enterprise clients with real-time data streams. The platform's multi-tenant design and proven reliability allow Cisco to offer its customers a live feed of device data with unwavering confidence. In the financial sector, FICO has built streaming pipelines on StreamNative to detect fraud as transactions happen and to monitor systems in real time. With StreamNative's strong guarantees around message durability and ordering, FICO can catch anomalies or suspicious patterns within seconds. And in digital customer engagement, Iterable relies on StreamNative to process billions of events every day—clicks, views, purchases—so that marketers can trigger personalized campaigns instantly based on user behavior.",[48,176,177],{},"Our customers uniformly deal with mission-critical data streams, where downtime or delays are unacceptable. StreamNative's fault-tolerant, scalable infrastructure has proven equal to the task, handling scenarios like bursting to millions of events per second or seamlessly spanning multiple cloud regions. Forrester's report recognized StreamNative for supporting event-driven architectures with robust scalability—which for us is a reflection of our platform's ability to meet the most demanding enterprise requirements.",[40,179,181],{"id":180},"continuing-to-innovate-ursa-orca-and-the-road-ahead",[44,182,183],{},"Continuing to Innovate: Ursa, Orca, and the Road Ahead",[48,185,186,187,190],{},"While we are thrilled to be recognized in Forrester's Streaming Data Platforms Wave, we view this as just the beginning. StreamNative's vision has always been bold: to ",[44,188,189],{},"provide a unified platform that not only handles today's streaming needs but also anticipates the emerging requirements of tomorrow",".",[48,192,193],{},"One key area of focus is the convergence of streaming data with advanced analytics and AI. As Forrester points out in the report, technology leaders should look for platforms that natively integrate messaging, stream processing, and analytics to provide AI agents with real-time, contextualized information. We couldn't agree more. Our award-winning Ursa Engine and Orca Agent Engine are aimed at extending our platform up the stack—bridging the gap between data streams and data lakes, and between event streams and intelligent processing.",[48,195,196],{},"Our new Ursa Engine introduces a lakehouse-native approach to streaming: it can write events directly to table formats like Iceberg on cloud storage, eliminating entire classes of ETL jobs and making fresh data instantly available for analytics queries. By integrating streaming and lakehouse technologies, we help customers collapse data silos and accelerate their AI\u002FML pipelines.",[48,198,199,200,203],{},"Beyond analytics integration, we are also enhancing StreamNative with more out-of-the-box processing and governance capabilities. In the coming months, we plan to introduce new features for lightweight stream processing and transformation, making it easier to build reactive applications directly on the platform. 
We're also expanding our ecosystem of connectors and integrations, so that whether your data lands in Snowflake, Databricks, or an AI model, StreamNative will seamlessly feed it. ",[44,201,202],{},"We're investing significantly in enterprise features including security, schema registry, governance, and monitoring tooling","—capabilities that are essential for mission-critical deployments and where we're committed to continued improvement.",[48,205,206],{},"This recognition from Forrester energizes us to keep innovating at full speed. We're sharing this honor with our amazing customers, community, and partners who drive us forward every day. Your feedback and real-world challenges have helped shape StreamNative into what it is today, and together, we will shape the future of streaming data. Thank you for joining us on this journey—we're just getting started, and we can't wait to deliver even more value as we continue to evolve our platform. Onward to real-time everything!",[208,209],"hr",{},[32,211,213],{"id":212},"streamnative-in-the-forrester-wave-evaluation-findings",[44,214,215,216,223],{},"StreamNative in ",[44,217,218],{},[55,219,220],{"href":57},[44,221,222],{},"The Forrester Wave™",": Evaluation Findings",[225,226,228],"h5",{"id":227},"recognized-as-a-contender-among-15-streaming-data-platform-providers","• Recognized as a Contender among 15 streaming data platform providers",[225,230,232],{"id":231},"received-the-highest-scores-possible-50-in-both-the-messaging-and-resource-optimization-criteria","* Received the highest scores possible (5.0) in both the Messaging and Resource Optimization criteria",[225,234,236],{"id":235},"cited-as-the-primary-platform-for-enterprises-wishing-to-implement-pulsar","• Cited as the primary platform for enterprises wishing to implement Pulsar",[225,238,240],{"id":239},"noted-for-excelling-at-messaging-and-resource-optimization","• Noted for excelling at messaging and resource optimization",[225,242,244],{"id":243},"customers-cited-lower-infrastructure-costs-and-excellent-support-responsiveness","• Customers cited lower infrastructure costs and excellent support responsiveness",[225,246,248],{"id":247},"recognized-for-supporting-event-driven-architectures-with-robust-scalability","• Recognized for supporting event-driven architectures with robust scalability",[208,250],{},[252,253,255,256,259,260,190],"h6",{"id":254},"forrester-disclaimer-forrester-does-not-endorse-any-company-product-brand-or-service-included-in-its-research-publications-and-does-not-advise-any-person-to-select-the-products-or-services-of-any-company-or-brand-based-on-the-ratings-included-in-such-publications-information-is-based-on-the-best-available-resources-opinions-reflect-judgment-at-the-time-and-are-subject-to-change-for-more-information-read-about-forresters-objectivity-here","**Forrester Disclaimer: **",[36,257,258],{},"Forrester does not endorse any company, product, brand, or service included in its research publications and does not advise any person to select the products or services of any company or brand based on the ratings included in such publications. Information is based on the best available resources. Opinions reflect judgment at the time and are subject to change",". 
*For more information, read about Forrester’s objectivity *",[55,261,265],{"href":262,"rel":263},"https:\u002F\u002Fwww.forrester.com\u002Fabout-us\u002Fobjectivity\u002F",[264],"nofollow",[36,266,267],{},"here",[208,269],{},[252,271,273],{"id":272},"apache-apache-pulsar-apache-kafka-apache-flink-and-other-names-are-trademarks-of-the-apache-software-foundation-no-endorsement-by-apache-or-other-third-parties-is-implied",[36,274,275],{},"Apache®, Apache Pulsar®, Apache Kafka®, Apache Flink® and other names are trademarks of The Apache Software Foundation. No endorsement by Apache or other third parties is implied.",{"title":18,"searchDepth":19,"depth":19,"links":277},[278,280,281,282,283,284,285],{"id":34,"depth":279,"text":38},3,{"id":42,"depth":19,"text":46},{"id":83,"depth":19,"text":86},{"id":111,"depth":19,"text":114},{"id":136,"depth":19,"text":139},{"id":165,"depth":19,"text":168},{"id":180,"depth":19,"text":183,"children":286},[287],{"id":212,"depth":279,"text":288},"StreamNative in The Forrester Wave™: Evaluation Findings","Company",null,"2025-12-16","StreamNative is recognized in The Forrester Wave™: Streaming Data Platforms, Q4 2025. Discover why Forrester highlights StreamNative's high-performance messaging, efficient resource use, and cost-effective Kafka API compatibility for real-time innovation.","\u002Fimgs\u002Fblogs\u002F693bd36cf01b217dcb67278f_Streamnative_blog_thumbnail.png",false,{},0,"\u002Fblog\u002Fstreamnative-recognized-in-the-forrester-wave-streaming-data-platforms-2025","10 mins read",{"title":26,"description":292},"blog\u002Fstreamnative-recognized-in-the-forrester-wave-streaming-data-platforms-2025",[302,303,304],"Announcements","Real-Time","Forrester","sOeeJtEO3O-IIfTPJjY1AFOMawZ_rf8FOH8A98NEKgU",[307,802,1334,1756,2202,2601,3004,3308,3562,3991,4480,4714,5511,5956,6123,6288,6424,6496,6607,6781,6965,7169,7349,7700,7889,7991,8060,8498,8640,8994,9057,9146,9517,9638,9937,10056,10324,10505,10729,11045,11185,11353,11514,11680,11901,12108,12256,12497,12760,12987,13184,13491,13611,13713,13812,13992,14115,14220,14391,14525,14677,14838,15033,15152,15397,15600,15777,16201,16567,16859,16987,17166,17265,17444,17611,17939,18655,18884,19147,19709,20012,20149,20345,21143,21387,21793,22029,22785,22994,23097,23280,23487,23872,24020,24251,24400,24666,24772,24871,24988,25363,25672,25992,26399,26749,26905,27326,27484,27662,27849,28162,28347,28574,28661,28866,29027,29210,29316,29571,29815,30127,30507,31044,31283,31502,31714,31927,32287,32481,32624,32703,32814,33209,33371,33696,34096,34201,34426,34714,34955,35070,35171,35237,35346,35561,35766,35910,36052,36278,36391,36522,37157,37412,37929,38132,38444,38945,39252,39468,39716,39875,40481,40984,41181,41473,41691,41867,42199,42798,43293,43620,43771,43950,44073,44442,44540,44838,45248,45523,46119,46353,46759,47358,47527,47809,48137,48302,48442,48571,48925,49488,49970,50346,50620,50854,51034,51210,51322,51658,51873,52255,52549,52736,53261,53431,53624,53889,54013,54195,54452,54659,55068,55330,55573,55831,56005,56296,56418,56870,57069,57306,57540,57808,57989,58127,58386,58517,58662,58852,59790,60120,60437,60488,60960,61296,61607,62322,62550,62825,63055,63270,63536,63799,64292,64912,65168,65380,65754,65949,66148,66805,67186,67368,67804,68011,68399,68451,68527,68645,68980,69349,69511,69617,69772,69982,70191,70593,70884,71240,71500,71659,71907,72004,72129,72209,72498,73993,74189,74775,75263,75475,75786,75910,76065,76312,76405,76807,77156,77354,77501,77588,78071,78383,78655,78973,79096,79619,79728,80471,81432,81880,82008,82215,82370,82575,8277
---

# Announcing StreamNative Kafka Service Launch Partners

*Kundan Vyas · 2026-04-07*

## Summary

- StreamNative Kafka service is now in Public Preview, introducing a cloud-native Kafka service powered by Ursa's diskless, leaderless architecture to unify real-time streaming and lakehouse workloads on StreamNative Cloud. Currently available for Dedicated clusters, with Serverless and BYOC support planned, it provides flexible deployment options for modern data platforms.
- StreamNative Kafka service launch partners are helping customers accelerate adoption by validating integrations across real-time analytics, lakehouse platforms, AI workloads, and operational applications.

These partners extend the value of StreamNative Kafka service by supporting both real-time streaming integrations and lakehouse analytics use cases. Together, they enable organizations to ingest streaming data into downstream platforms, build operational applications, validate data pipelines, and query Delta Lake and Iceberg tables created from data written through [Ursa's diskless storage architecture](https://streamnative.io/ursa).

**Core partner categories:**

- **Data platforms and lakehouse ecosystem partners:** Enable seamless integration with modern data platforms and support querying Delta Lake and Iceberg tables created by StreamNative Kafka service for AI, BI, and lakehouse analytics.
- **Real-time analytics and stream processing partners:** Enable ingestion of streaming data from StreamNative Kafka and support querying Delta Lake and Iceberg tables for real-time analytics and lakehouse insights.
- **Operational applications and developer ecosystem partners:** Provide tools and platforms for building applications, improving developer productivity, and operationalizing streaming data.
- **Testing and data pipeline validation partners:** Help organizations simulate workloads, test integrations, and validate streaming data pipelines before production deployment.

Together, these launch partners demonstrate how StreamNative Kafka integrates seamlessly into modern data architectures.

### Featured launch partners

Our launch partners include leading independent software vendors (ISVs) across the data, analytics, and AI ecosystem. These partners have validated integrations with StreamNative Kafka to help customers accelerate real-time data adoption, simplify streaming operations, and build modern analytics and AI-driven applications. Together, they extend the StreamNative Kafka ecosystem by enabling seamless interoperability across lakehouse platforms, stream processing engines, real-time analytics systems, and developer tooling.

![](/imgs/blogs/announcing-streamnative-kafka-service-launch-partners-image1.jpg)

## Data platforms and lakehouse ecosystem partners

Modern data architectures require seamless integration between streaming platforms and lakehouse technologies. StreamNative Kafka service enables direct ingestion of streaming data into open table formats and governed data platforms through partnerships with leading data ecosystem providers.

**Streaming Data to the Databricks Enterprise AI Platform**

StreamNative's integration with Databricks enables organizations to stream data from the StreamNative Kafka service into Unity Catalog in open table formats such as Delta Lake and Apache Iceberg. In addition to [existing support for managed Iceberg tables](https://streamnative.io/blog/streamnative-expands-unitycatalog-integration-with-iceberg-tables) and [external Delta tables](https://streamnative.io/blog/seamless-streaming-to-lakehouse-unveiling-streamnative-clouds-integration-with-databricks-unity-catalog), StreamNative Cloud now also supports Managed Tables using Unity Catalog's Catalog-based commits, enabling governed real-time data for AI, BI, and analytics.

**Streaming Data into Snowflake's AI Data Cloud**

StreamNative's integration with Snowflake enables organizations to stream data from the StreamNative Kafka service into Iceberg tables managed in Snowflake catalogs. While [StreamNative Cloud already supported writing topic data to Snowflake Open Catalog](https://streamnative.io/blog/streamnative-enables-seamless-streaming-into-apache-iceberg-tm-snowflake-open-catalog), it now also supports Snowflake Horizon Catalog---part of the Snowflake AI Data Cloud---adding enterprise capabilities such as governance, RBAC, and centralized data management for real-time analytics and AI workloads.

**Streaming Data into Google's BigLake - Apache Iceberg Lakehouse**

[StreamNative's integration with Google BigLake](https://docs.streamnative.io/cloud/lakehouse/external-tables/integrations/integrate-iceberg-with-google-biglake) enables organizations to stream data from the StreamNative Kafka service directly into Iceberg tables managed by the BigLake metastore using the Iceberg REST Catalog standard. This integration allows enterprises to combine real-time streaming with open lakehouse architectures, making Kafka data immediately available for analytics, AI, and governance workflows within Google Cloud's data ecosystem while maintaining openness through Apache Iceberg.
**Starburst**

*"StreamNative and [Starburst](https://www.starburst.io/) bring together real-time streaming and seamless Iceberg ingestion to simplify how data moves and becomes usable. Together, we create an open, AI-ready foundation. With Starburst, organizations can query that data directly at scale for real-time analytics and AI."*

*Jitender Aswani - Senior Vice President, Engineering @ Starburst*

## Real-time analytics and stream processing partners

StreamNative Kafka integrates with leading real-time processing and analytics platforms to help organizations transform raw events into actionable insights.

**StarTree**

*"Real-time applications depend on fast, reliable streaming data. With StarTree's native integration with StreamNative Kafka powered by the Ursa engine, organizations can transform event streams into real-time analytics. StarTree also enables customers to query Kafka data written as Iceberg tables by StreamNative, bringing real-time insights to open lakehouse environments."*

*--- Chinmay Soman, VP Product, StarTree*

**Ververica**

*"Streaming and real-time processing are foundational to modern data platforms. Together, Ververica and StreamNative enable customers to combine StreamNative's native Kafka with Ververica's [VERA engine](https://www.ververica.com/vera) to build scalable, enterprise-grade streaming architectures that power real-time analytics and AI workloads."*

*--- Vladimir Jandreski, Chief Product Officer, Ververica*

**RisingWave**

*"StreamNative's native Kafka service, powered by Ursa, provides a strong foundation for modern stream processing by simplifying real-time ingestion into lakehouse architectures. Together, [RisingWave](https://risingwave.com/) and StreamNative enable developers to continuously enrich streaming data with context, making it easier to build real-time pipelines that transform event streams into analytics-ready data for lakehouse and AI workloads."*

*--- Rayees Pasha, Chief Product Officer, RisingWave*

**VeloDB**

*"StreamNative and VeloDB combine real-time data streaming with hybrid analytics to power the next generation of AI-driven applications. StreamNative enables continuous, reliable data ingestion into VeloDB or open table formats like Iceberg, while VeloDB delivers real-time hybrid search across structured, semi-structured, and vector data. Together, we provide an open, scalable, and AI-ready data platform for real-time insights and intelligent decision-making."*

*--- Mingyu Chen, VP Engineering, VeloDB*

**CelerData**

*"Together, StreamNative and CelerData Cloud enable organizations to stream live event data through StreamNative's Ursa-powered Kafka platform and query it in real time through CelerData Cloud, so AI agents and applications always have fresh, governed context without the overhead of extra pipelines."*

*--- Sida Shen, Product Manager, CelerData*

**Timeplus**

*"Real-time analytics starts with reliable streaming infrastructure. By integrating [Timeplus](https://www.timeplus.com/) with StreamNative's native Kafka service powered by the Ursa engine, organizations can accelerate how quickly they analyze streaming data and operationalize insights without complex data movement."*

*--- Ting Wang, CEO, Timeplus*

## Operational applications and developer ecosystem partners

StreamNative Kafka also integrates with partners focused on operations, developer productivity, and operational intelligence to help teams build and manage streaming applications more effectively.

**Lenses**

*"Businesses are increasingly running multi-Kafka environments as they modernize their critical systems to real-time. Ursa for Kafka adds a modern architecture and unique infrastructure capabilities to this landscape. Integrated with [Lenses](https://lenses.io/), engineers gain the productivity and unified governance to build and operate on Ursa alongside their broader streaming estate."*

*--- Guillaume Aymé, CEO - Lenses*

**Factor House**

*"StreamNative Kafka powered by Ursa delivers the infrastructure, and [Factor House's Kpow](https://factorhouse.io/products/kpow) gives teams the visibility and control to run it with confidence. The result: faster incident response, stronger governance, and engineering teams that spend less time firefighting."*

*--- Derek Troy-West, Co-founder & CEO, Factor House*

**Advantco**

*"Enterprises increasingly rely on Kafka to connect mission-critical business systems like SAP with modern data platforms. Through [Advantco's Kafka Workbench](https://www.advantco.com/sap-integration-adapters/sap-kafka-integration) and its native integration with StreamNative's Kafka service powered by the Ursa engine, we enable organizations to seamlessly stream SAP data into real-time architectures and accelerate their digital transformation initiatives."*

*--- Nick Persavich, President, Advantco International*

**Volt Active Data**

*"Combining StreamNative's native Kafka service powered by the Ursa engine with [Volt Active Data](https://www.voltactivedata.com/) enables enterprises to power next-generation operational applications driven by contextual, real-time decisions. Together we help organizations transform streaming data into stateful, actionable intelligence that drives immediate business outcomes."*

*--- Anna Criscione, Global Head of Partners and Alliances - Volt Active Data*

**PuppyGraph**

*"By combining StreamNative's native Kafka ingestion into Iceberg tables with [PuppyGraph's](https://www.puppygraph.com/) graph query engine, organizations can analyze relationships in their data instantly without building or managing separate graph pipelines, and enable their data agent in production with the harness."*

*--- Weimo Liu, CEO, PuppyGraph*

**CocoIndex**

*"StreamNative's native Kafka service powered by the Ursa engine provides a reliable real-time data foundation for modern AI and data platforms. Together, [CocoIndex](https://cocoindex.io/) and StreamNative enable organizations to continuously transform and index streaming data, helping teams accelerate the delivery of AI-ready and search-ready data products."*

*--- Linghua Jin, Cofounder & CEO, CocoIndex*

## Testing and data pipeline validation partners

Ensuring reliability in production streaming environments requires strong testing and simulation capabilities. StreamNative Kafka partners help teams validate architectures before deployment.

**ShadowTraffic**

*"Had a chance to take StreamNative's new native Kafka service for a test drive---phenomenal stuff! Diskless, modern architecture with a great UI and clean data lake integration. Well done!"*

*--- Michael Drogalis, Founder, ShadowTraffic*

## Building the future of real-time data together

The launch of StreamNative Kafka service marks an important step in our mission to unify streaming and lakehouse architectures. With Ursa as the foundation, StreamNative Kafka service enables organizations to modernize their Kafka deployments while seamlessly integrating with the broader data ecosystem.

We are proud to partner with these innovative companies to help customers build the next generation of real-time data platforms for analytics, AI, and operational applications.
",[55,766,769],{"href":767,"rel":768},"https:\u002F\u002Fdocs.streamnative.io\u002Fcloud\u002Fbilling\u002Fdiscounts",[264],"Use promo\ncode",[44,771,772],{},"UFK1000"," between ",[44,775,776],{},"April 7 and April 14"," to receive ",[44,779,780],{},"$1,000 in\ncredits",{"title":18,"searchDepth":19,"depth":19,"links":782},[783,786,787,788,789,790,791],{"id":316,"depth":19,"text":319,"children":784},[785],{"id":374,"depth":279,"text":377},{"id":389,"depth":19,"text":392},{"id":465,"depth":19,"text":468},{"id":580,"depth":19,"text":583},{"id":713,"depth":19,"text":716},{"id":737,"depth":19,"text":740},{"id":749,"depth":19,"text":752},"StreamNative Kafka service enters Public Preview with launch partners across real-time analytics, lakehouse platforms, AI workloads, and operational applications.","\u002Fimgs\u002Fblogs\u002Fblog-thumbnail-announcing-streamnative-kafka-service-launch-partners.png",{},"\u002Fblog\u002Fannouncing-streamnative-kafka-service-launch-partners",{"title":309,"description":792},"blog\u002Fannouncing-streamnative-kafka-service-launch-partners",[799,302,800,303],"Apache Kafka","Lakehouse","_tv0H6Il5i6X0m6_HdqXh5IQe0mf8ph8wQINGrKA3So",{"id":803,"title":804,"authors":805,"body":811,"category":289,"createdAt":290,"date":6,"description":1324,"extension":8,"featured":7,"image":1325,"isDraft":294,"link":290,"meta":1326,"navigation":7,"order":296,"path":9,"readingTime":290,"relatedResources":290,"seo":1327,"stem":1328,"tags":1329,"__hash__":1333},"blogs\u002Fblog\u002Ffrom-streams-to-lakestreams.md","From Streams to Lakestreams: The Next Paradigm in Data Infrastructure",[806,807,28,311,808,809,810],"Sijie Guo","Matteo Merli","Penghui Li","Hang Chen","Neng Lu",{"type":15,"value":812,"toc":1311},[813,816,823,830,833,843,847,850,853,864,869,874,881,884,888,895,898,901,912,915,924,927,930,934,937,947,950,953,958,961,972,975,980,985,989,992,1003,1006,1009,1012,1023,1026,1029,1037,1041,1044,1047,1050,1053,1056,1059,1064,1067,1070,1073,1076,1080,1083,1086,1092,1095,1103,1107,1110,1113,1120,1123,1127,1130,1135,1140,1145,1148,1159,1166,1169,1174,1177,1180,1191,1196,1199,1202,1205,1211,1214,1217,1220,1224,1227,1233,1239,1245,1251,1257,1260,1265,1270,1274,1277,1288,1291,1294,1297,1300,1305,1308],[48,814,815],{},"When we founded StreamNative, we set out to build the world's best data\nstreaming platform. We succeeded --- but the more interesting discovery\ncame from what we found along the way: streaming, on its own, is only\npart of the story.",[48,817,818,819,822],{},"StreamNative was founded by the original creators of ",[44,820,821],{},"Apache Pulsar",",\na system born inside Yahoo to handle unified messaging and real-time\ndata movement at a scale few organizations ever face. When we\nopen-sourced that work and brought it to the broader market, we believed\nthe core problem was speed and scale. We were right --- but we were\nasking a narrower question than the industry actually needed us to\nanswer.",[48,824,825,826,829],{},"What enterprises kept running into wasn't a streaming problem in\nisolation. It was an ",[36,827,828],{},"integration"," problem: how does real-time data\ncoexist with the analytical systems, storage layers, and AI workloads\nthat define the modern data stack? 
The answer, we came to realize,\npoints toward something bigger than streaming --- a unified lakehouse\narchitecture where real-time and historical data aren't siloed, but\ngenuinely converge.",[48,831,832],{},"This post is about that journey: from building a streaming company, to\nrecognizing a fundamental architectural shift --- and why we believe it\nmatters for every organization serious about making data a competitive\nadvantage.",[48,834,835,836,190],{},"We call it\n",[55,837,840],{"href":838,"rel":839},"https:\u002F\u002Fstreamnative.io\u002Flakestream",[264],[44,841,842],{},"Lakestream",[40,844,846],{"id":845},"starting-with-pulsar-and-learning-the-limits-of-protocols","Starting with Pulsar -- and Learning the Limits of Protocols",[48,848,849],{},"When we built Apache Pulsar at Yahoo, we were solving a problem that\ndidn't have a good answer yet: how do you build a multi-tenant, unified\nmessaging and streaming platform capable of handling millions of topics,\nbillions of messages, and the operational complexity of a global\ninternet company --- all on shared infrastructure?",[48,851,852],{},"Some of the architectural decisions we made turned out to be more\nconsequential than we realized at the time.",[48,854,855,856,859,860,863],{},"The first was ",[44,857,858],{},"compute-storage separation",". While other streaming\nsystems tightly coupled brokers to local disks, we built Pulsar around\n",[44,861,862],{},"Apache BookKeeper"," as an independent, distributed storage layer.\nBrokers were stateless. Storage was durable and decoupled. In 2012, this\nwas an unconventional bet. Today, it's a foundational design pattern\nacross virtually every large-scale data system --- from cloud data\nwarehouses to modern streaming platforms. We didn't predict the future;\nwe just kept following the engineering logic until it led somewhere\ninteresting.",[48,865,866],{},[384,867],{"alt":18,"src":868},"\u002Fimgs\u002Fblogs\u002Ffrom-streams-to-lakestreams-image3.png",[48,870,871],{},[36,872,873],{},"Figure 1. From Monolith to Compute\u002FStorage Separation",[48,875,876,877,880],{},"The second was ",[44,878,879],{},"messaging semantics",". We embedded rich subscription\nmodels --- including shared subscriptions --- directly into the client\nprotocol from the beginning, because real enterprise workloads demanded\nthem. Kafka added shared subscriptions more than a decade later. We're\nnot pointing this out to score points; we're pointing it out because it\nreflects something important about what happens when you design for the\nfull complexity of enterprise use cases from day one, rather than\noptimizing narrowly and retrofitting later.",[48,882,883],{},"Getting these things right taught us something, too: good architecture\ncreates options. And the options Pulsar's design left open would matter\nmore than we initially expected.",[40,885,887],{"id":886},"from-managed-service-to-market-reality","From Managed Service to Market Reality",[48,889,890,891,894],{},"When we founded StreamNative, we brought Pulsar to the cloud as a\n",[44,892,893],{},"fully managed service",". Enterprises adopted it quickly for their most\ndemanding workloads --- financial transaction processing, IoT telemetry\nat scale, real-time fraud detection. 
Pulsar's multi-tenancy,\ngeo-replication, and unified messaging model made it a natural fit for\nuse cases where reliability and operational isolation aren't optional.",[48,896,897],{},"But as our customer base grew, a pattern emerged --- and it was\nremarkably consistent.",[48,899,900],{},"Most enterprises were running a split world. They used StreamNative for\nmission-critical messaging and queuing --- the workloads where\ntransactional guarantees and tenant isolation matter most. But their\ndata streaming and ingestion pipelines ran on Kafka. Not because Kafka\nwas architecturally superior for those use cases, but because the Kafka\nprotocol had become the industry's lingua franca. Every connector,\nevery SaaS tool, every cloud service spoke Kafka. Switching meant\nrewriting integrations, not just swapping infrastructure.",[48,902,903,904,907,908,911],{},"So we made a pragmatic call: we made StreamNative Kafka-compatible.\nFirst came ",[44,905,906],{},"KoP (Kafka-on-Pulsar)",", an open-source protocol handler\nletting Pulsar brokers speak the Kafka wire protocol natively. Then came\n",[44,909,910],{},"KSN (Kafka-on-StreamNative)"," --- a more deeply integrated,\nproduction-hardened compatibility layer built for enterprise scale.",[48,913,914],{},"This worked. Customers could consolidate their Kafka workloads onto our\nplatform without changing application code. But it taught us our first\ncrucial lesson:",[916,917,918],"blockquote",{},[48,919,920,923],{},[44,921,922],{},"Lesson 1:"," Protocol compatibility is table stakes, not a\ndifferentiator.",[48,925,926],{},"By the time we shipped KSN, every major streaming vendor was racing\ntoward Kafka compatibility --- including Confluent itself, which\nacquired WarpStream, a Kafka-compatible alternative. The protocol\nwasn't a moat anymore; it was becoming commodity infrastructure. If we\nwere going to build something with lasting value, it had to live\nsomewhere deeper than the wire protocol.",[48,928,929],{},"That realization forced a harder question: if the protocol layer was\nbeing commoditized, what actually mattered?",[40,931,933],{"id":932},"rethinking-storage-and-the-moment-everything-clicked","Rethinking Storage -- and the Moment Everything Clicked",[48,935,936],{},"By 2023, we were operating large-scale streaming clusters across three\nmajor cloud providers, and we were seeing the same problem everywhere\n--- regardless of industry, workload, or team size.",[48,938,939,946],{},[44,940,941,190],{},[55,942,945],{"href":943,"rel":944},"https:\u002F\u002Fstreamnative.io\u002Fblog\u002Fa-guide-to-evaluating-the-infrastructure-costs-of-apache-pulsar-and-apache-kafka",[264],"Cross-AZ replication costs were eating 60-90% of infrastructure\nbudgets","\nThe root cause was structural. Kafka's leader-per-partition\narchitecture --- and most streaming systems like it --- requires every\nwrite to be replicated from a leader broker to follower brokers,\ntypically across availability zones. In on-premises environments,\nthat's an architectural inconvenience. In the cloud, where cross-AZ\ndata transfer is metered, it becomes the single largest line item on the\nbill. We watched operational teams spending more cycles managing\ninfrastructure costs than shipping products. Something was fundamentally\nbroken about the economic model.",[48,948,949],{},"The rest of the industry was attacking this at the broker layer. Some\nwere rewriting streaming engines in C++ for raw throughput gains. 
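To see why this line item dominates, a back-of-the-envelope calculation helps. The throughput, replication factor, and per-GB transfer rate below are illustrative assumptions, not quoted cloud prices or StreamNative measurements.

```python
# Back-of-the-envelope illustration of the cross-AZ replication cost described
# above. All inputs are assumptions for illustration only.
ingest_gb_per_sec = 1.0        # sustained producer traffic into the cluster
replication_factor = 3         # leader plus two followers in other zones
cross_az_copies = replication_factor - 1
usd_per_gb_cross_az = 0.02     # assumed combined per-GB inter-AZ transfer rate

seconds_per_month = 30 * 24 * 3600
gb_replicated_per_month = ingest_gb_per_sec * cross_az_copies * seconds_per_month
monthly_transfer_cost = gb_replicated_per_month * usd_per_gb_cross_az

print(f"{gb_replicated_per_month:,.0f} GB/month copied across availability zones")
print(f"~${monthly_transfer_cost:,.0f}/month in inter-AZ transfer alone")
# Roughly 5.2 million GB and about $104,000 per month under these assumptions,
# before counting consumer traffic that also crosses zone boundaries.
```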
The rest of the industry was attacking this at the broker layer. Some were rewriting streaming engines in C++ for raw throughput gains. Others were offloading cold log segments to S3 as tiered storage, or adding object storage as a secondary backend. These were legitimate engineering efforts --- but they were addressing symptoms rather than the underlying cause. They were making an expensive architecture incrementally more efficient, without stepping back to ask whether the architecture itself needed rethinking.

We asked a different question.

**"What if streaming data didn't need its own storage format at all? What if it could live natively in the lakehouse?"**

This question changed everything.

We built a new storage foundation from the ground up -- **leaderless, diskless, writing directly to object storage in open lakehouse formats**. Instead of Kafka's local log segments or Pulsar's BookKeeper ledgers, data went straight to S3, GCS, or Azure Blob Storage as Parquet files in Apache Iceberg or Delta Lake format. A distributed write-ahead log (WAL) handled the low-latency append path, ensuring producers got sub-second acknowledgments. But the durable, queryable data wasn't waiting to be exported or transformed into a lakehouse format downstream --- it *was* a lakehouse table from the moment it was committed.
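As a rough mental model of that commit path (not the actual Ursa implementation, whose WAL and commit protocol are internal), appending a batch of already-acknowledged events to an Iceberg table through a catalog looks roughly like this with pyiceberg and pyarrow. The catalog endpoint, namespace, table name, and schema are placeholders.

```python
# Conceptual sketch only: acknowledged events flushed as an Iceberg table append.
# Endpoint, namespace, table, and schema are placeholders.
import pyarrow as pa
from pyiceberg.catalog import load_catalog

catalog = load_catalog(
    "lake",
    **{"type": "rest", "uri": "https://<iceberg-rest-endpoint>"},  # placeholder
)
table = catalog.load_table("clickstream.page_views")  # placeholder identifier

# A batch of events, already durable in the write-ahead log, is written out as
# Parquet data files and committed as a new table snapshot in one operation.
batch = pa.table({
    "event_time": pa.array([1700000000000, 1700000000123], type=pa.int64()),
    "event_type": pa.array(["view", "click"]),
    "user_id": pa.array(["u-1", "u-2"]),
})
table.append(batch)  # one atomic Iceberg commit; readers see it immediately
```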
",[36,1021,1022],{},"\"My Kafka topic IS a table?\"",[48,1024,1025],{},"Yes. That's exactly what it is.",[48,1027,1028],{},"That moment --- repeated across customer after customer --- is when we\nunderstood what we'd actually built. Not a cheaper streaming engine.\nNot a better Kafka. Something that dissolved the boundary between\nstreaming infrastructure and the lakehouse entirely. The storage layer\nwasn't a cost optimization with a happy side effect. It was a\nunification --- one that made a decades-old architectural divide simply\ndisappear.",[916,1030,1031],{},[48,1032,1033,1036],{},[44,1034,1035],{},"Lesson 2:"," The real breakthrough wasn't cost savings. It was\ndiscovering that streaming data and lakehouse data can be the same\nthing -- and that the bridge between them isn't a connector. It's\nthe storage layer itself.",[40,1038,1040],{"id":1039},"the-convergence-nobody-planned-and-what-the-industry-got-half-right","The Convergence Nobody Planned -- and What the Industry Got Half-Right",[48,1042,1043],{},"We weren't alone in recognizing the gap between streaming and the\nlakehouse. By the time we were deep in this problem, the entire industry\nwas trying to close it --- just from different directions, with\ndifferent foundational assumptions.",[48,1045,1046],{},"One camp built bridges. Confluent's Tableflow, Kafka Connect with\nIceberg sinks, and similar approaches treat streaming and the lakehouse\nas separate systems and materialize data between them. These solutions\nwork --- but they work by adding complexity: another pipeline to manage,\nanother failure mode to monitor, and an irreducible latency between when\ndata is produced and when it's queryable downstream.",[48,1048,1049],{},"Another camp added streaming capabilities to lakehouse platforms.\nDatabricks has Spark Structured Streaming and Delta Live Tables.\nSnowflake has Snowpipe Streaming and Dynamic Tables. These are genuinely\npowerful tools for analytics teams --- but streaming as an analytics\nfeature is categorically different from streaming as infrastructure. You\ncannot build a mission-critical messaging system, a financial\ntransaction backbone, or a real-time fraud detection pipeline on Spark\nStructured Streaming. The operational guarantees simply aren't there.",[48,1051,1052],{},"A third camp attempted something more ambitious: entirely new unified\narchitectures. Ververica's Streamhouse concept --- combining Apache\nFlink with Apache Paimon --- is a serious and thoughtful approach. But\nit asks organizations to adopt a new ecosystem wholesale, which means\nleaving Kafka compatibility, existing tooling, and years of operational\ninvestment behind.",[48,1054,1055],{},"Each approach solves part of the problem. None of them questions the\nassumption underneath it.",[48,1057,1058],{},"They all treat streaming and the lakehouse as fundamentally separate\nsystems that need to be connected --- and compete on how elegantly they\nbuild that connection.",[48,1060,1061],{},[44,1062,1063],{},"What if that assumption is wrong?",[48,1065,1066],{},"The historical parallel is hard to ignore. In 2020, Databricks\nintroduced the lakehouse concept with a deceptively simple insight: data\nwarehouses and data lakes didn't need to be separate systems connected\nby ETL pipelines. You could implement warehouse-grade capabilities ---\nACID transactions, schema enforcement, fine-grained governance ---\ndirectly on top of cheap, open-format lake storage. The lakehouse\ndidn't build a better bridge. 
That moment --- repeated across customer after customer --- is when we understood what we'd actually built. Not a cheaper streaming engine. Not a better Kafka. Something that dissolved the boundary between streaming infrastructure and the lakehouse entirely. The storage layer wasn't a cost optimization with a happy side effect. It was a unification --- one that made a decades-old architectural divide simply disappear.

> **Lesson 2:** The real breakthrough wasn't cost savings. It was discovering that streaming data and lakehouse data can be the same thing -- and that the bridge between them isn't a connector. It's the storage layer itself.

## The Convergence Nobody Planned -- and What the Industry Got Half-Right

We weren't alone in recognizing the gap between streaming and the lakehouse. By the time we were deep in this problem, the entire industry was trying to close it --- just from different directions, with different foundational assumptions.

One camp built bridges. Confluent's Tableflow, Kafka Connect with Iceberg sinks, and similar approaches treat streaming and the lakehouse as separate systems and materialize data between them. These solutions work --- but they work by adding complexity: another pipeline to manage, another failure mode to monitor, and an irreducible latency between when data is produced and when it's queryable downstream.

Another camp added streaming capabilities to lakehouse platforms. Databricks has Spark Structured Streaming and Delta Live Tables. Snowflake has Snowpipe Streaming and Dynamic Tables. These are genuinely powerful tools for analytics teams --- but streaming as an analytics feature is categorically different from streaming as infrastructure. You cannot build a mission-critical messaging system, a financial transaction backbone, or a real-time fraud detection pipeline on Spark Structured Streaming. The operational guarantees simply aren't there.

A third camp attempted something more ambitious: entirely new unified architectures. Ververica's Streamhouse concept --- combining Apache Flink with Apache Paimon --- is a serious and thoughtful approach. But it asks organizations to adopt a new ecosystem wholesale, which means leaving Kafka compatibility, existing tooling, and years of operational investment behind.

Each approach solves part of the problem. None of them questions the assumption underneath it.

They all treat streaming and the lakehouse as fundamentally separate systems that need to be connected --- and compete on how elegantly they build that connection.

**What if that assumption is wrong?**

The historical parallel is hard to ignore. In 2020, Databricks introduced the lakehouse concept with a deceptively simple insight: data warehouses and data lakes didn't need to be separate systems connected by ETL pipelines. You could implement warehouse-grade capabilities --- ACID transactions, schema enforcement, fine-grained governance --- directly on top of cheap, open-format lake storage. The lakehouse didn't build a better bridge. It made the bridge unnecessary.

The streaming industry is standing at the same inflection point.

For years, we've accepted that real-time event streaming and analytical data infrastructure are different systems with different storage formats, different operational models, and different teams responsible for keeping them in sync. The connector ecosystem exists to paper over that divide. But connectors are a symptom, not a solution --- evidence of an architectural boundary that perhaps shouldn't exist in the first place.

We weren't setting out to write a manifesto. We were trying to fix infrastructure costs. But the storage architecture we built kept pointing toward the same conclusion: streaming and the lakehouse don't need to be separate either.

## Naming What We Built: Lakestream

Looking back, the through-line was always there. Pulsar's compute-storage separation. Kafka protocol compatibility. Lakehouse-native storage. Each felt like a distinct product decision at the time. In retrospect, they were pieces of the same architectural argument --- we just didn't have a name for it yet.

Now we do.

We call it **Lakestream** -- a lakehouse-native streaming architecture that treats streams as first-class lakehouse primitives alongside tables. Not a streaming system that exports to the lakehouse. Not a lakehouse that ingests from streams. A unified foundation where the distinction stops being meaningful.

Just as the lakehouse dissolved the boundary between data warehouses and data lakes, Lakestream dissolves the boundary between data streaming and the lakehouse. Not by building better bridges. By making the bridge unnecessary.

> **Key Principle:** Lakestream is NOT a replacement for the lakehouse. It is an extension that augments the lakehouse with real-time streaming capabilities -- adding streams as first-class primitives alongside tables.

### The Core Insight: Push Interoperability Down the Stack

Most streaming systems today solve interoperability at the protocol layer --- translating between Kafka, Pulsar, MQTT, and other protocols at the top of the stack. It's a reasonable approach, and it's where most of the industry's engineering energy has gone. But it has a structural consequence: every protocol becomes its own data silo, and connecting them requires maintaining point-to-point translation at the application layer indefinitely.

Lakestream takes a different approach. Rather than pushing interoperability up to the protocol, we push it down --- to the storage and catalog layers, where it can be solved once and inherited by everything above it.

The result is an architectural property that's easy to state but hard to overstate: **the protocol becomes a choice of interface, not a choice of data silo.** Write via Kafka. Consume via Pulsar. Query via SQL. Subscribe via MQTT. The data underneath is identical --- the same Iceberg tables, the same catalog entries, the same durable objects in your object store.

This is what makes Lakestream structurally different from compatibility layers and connector ecosystems. Those approaches translate between silos. Lakestream eliminates the silo.

### The Architecture

Lakestream is built on three layers -- each one a direct consequence of something we learned along the way:

![](/imgs/blogs/from-streams-to-lakestreams-image2.png)

*Figure 3. Lakestream Architecture*

**1. The Data Layer: Cloud-Native Stream Storage**

The foundation is lakehouse-native stream storage, and it's where the economics of Lakestream begin.

A distributed write-ahead log handles real-time ingestion with sub-second producer acknowledgments. But rather than writing to proprietary broker-local storage, data is durably committed to object storage --- S3, GCS, or Azure Blob --- as Parquet files organized as **Apache Iceberg** or **Delta Lake** tables. The architecture is leaderless and diskless: any broker can serve any partition, and no local disks are required for durability.

The consequences are significant. Cross-AZ replication costs disappear --- durability is delegated to the object store, which provides eleven-nines reliability at commodity pricing. The result is **up to 95% lower infrastructure cost** compared to traditional streaming deployments at equivalent throughput.

But the deeper consequence isn't the cost. It's that streaming data is lakehouse data from the moment it's written --- no transformation, no export, no pipeline in between.

**2. Metadata Layer: The Lakestream Catalog**

If the data layer unifies how streams are stored, the catalog layer unifies how they're understood.

The Lakestream Catalog provides a single metadata plane for both streams and tables, organized around a three-level namespace --- catalog.namespace.stream --- that will feel immediately familiar to anyone working in modern data platforms. Every stream has a corresponding lakehouse table, and the catalog maintains that linkage automatically. Producers don't need to think about it. Consumers don't need to configure it. It's just there.

Critically, the Lakestream Catalog federates with the catalogs organizations already use --- **Databricks Unity Catalog**, **Snowflake Horizon Catalog**, and others. This means streams become discoverable in the same metadata layer as batch tables, governed by the same policies, and visible to the same tools. Streaming data stops being invisible infrastructure and starts being a first-class asset in your data platform.

**3. Protocol Layer: Stateless Protocol Servers**

The protocol layer is where Lakestream meets the world as it actually exists --- and where one of our hardest-learned lessons shaped the design most directly.

We've lived through what happens when an industry consolidates around a single protocol. Kafka's dominance brought enormous ecosystem benefits --- ubiquitous tooling, broad cloud integration, a generation of engineers who know it deeply. But it also meant the industry inherited Kafka's limitations as fixed constraints. Messaging semantics that enterprises needed --- shared subscriptions, exclusive consumers, failure queues --- simply weren't there. We built those capabilities into Pulsar over a decade ago because real-world workloads demanded them. Kafka added shared subscriptions years later, after the absence had already forced countless teams into workarounds.

The lesson isn't that Kafka is wrong. It's that betting your entire data architecture on a single protocol's roadmap is a structural risk --- one that compounds over time as your use cases grow beyond what that protocol was originally designed to handle.

Lakestream is built on the opposite principle: **no single protocol owns the architecture.**

Kafka, Pulsar, REST, gRPC --- all implemented as stateless protocol servers writing to the same underlying storage layer. Your existing producers and consumers work without code changes. Your existing connectors and tooling work without reconfiguration. Adding support for a new protocol means deploying a new stateless server --- not migrating data, not redesigning pipelines, not waiting years for a standards committee to catch up to your use case.

This modularity is only possible because interoperability lives at the storage layer, not the protocol layer. When the data underneath is protocol-agnostic, the protocol above it becomes a genuine choice --- not a lock-in decision made once and lived with indefinitely.

The protocol is an interface. The data belongs to everyone who needs it. And when the next protocol matters --- because it will --- you add it without touching anything underneath.

## What This Changes

Lakestream isn't a product feature. It's an architectural shift --- and like most genuine architectural shifts, its implications extend well beyond the layer where the change actually happens.

**Stream-Table Duality** is the most immediate consequence, and the one that consistently surprises people when they see it for the first time. Every stream is simultaneously a table. Produce to a Kafka topic and query it from Spark, Snowflake, or Databricks --- not after a connector runs, not after a batch job completes, but immediately, because the data was never anywhere else. The pipeline between streaming and analytics doesn't get faster. It ceases to exist.

**Governed Self-Service Streaming** follows naturally from the catalog layer. When streams live in the same metadata plane as batch tables, they inherit the same access controls, the same audit trails, and the same schema governance --- automatically.
Data teams stop managing\nstreaming infrastructure as a separate operational concern and start\ntreating streams as first-class assets in the same platform they already\ngovern. This is what makes streaming accessible to the broader\norganization, not just the engineers who built the pipelines.",[48,1240,1241,1244],{},[44,1242,1243],{},"Multi-Protocol, Single Data"," means the protocol fragmentation that\nhas quietly balkanized data organizations for years simply stops. Write\nvia Kafka. Consume via Pulsar. Query via SQL. Subscribe via MQTT. The\ndata underneath is identical. Teams can use the interface that fits\ntheir workload rather than the one that fits the infrastructure they\ninherited.",[48,1246,1247,1250],{},[44,1248,1249],{},"Universal Linking"," replaces the brittle point-to-point connector\ntopology that most organizations have quietly accumulated over years of\ngrowth. Replicating data across clusters, regions, or systems happens\nthrough the shared storage and catalog layer --- not through a web of\nconnectors, each one a potential failure mode and a maintenance burden.\nThe architecture gets simpler as it scales, rather than more fragile.",[48,1252,1253,1256],{},[44,1254,1255],{},"Freedom to Evolve"," may be the most important long-term consequence,\nand the hardest to appreciate until you've been burned by protocol\nlock-in. By decoupling the protocol from the storage layer, Lakestream\nmakes the storage layer independently improvable. New compression\nschemes, new indexing strategies, new query optimizations --- none of\nthese require protocol changes, client updates, or application\nmigrations. The architecture can absorb innovation without disruption,\nwhich is a property that compounds in value over time.",[48,1258,1259],{},"Taken together, these aren't five separate benefits. They're five\nexpressions of the same underlying idea: when streaming and the\nlakehouse share a foundation, the constraints that have defined the\nstreaming category for a decade stop being constraints.",[48,1261,1262],{},[384,1263],{"alt":18,"src":1264},"\u002Fimgs\u002Fblogs\u002Ffrom-streams-to-lakestreams-image1.png",[48,1266,1267],{},[36,1268,1269],{},"Figure 4: Streaming Architecture Evolution: From Monolith to\nLakestream",[40,1271,1273],{"id":1272},"the-road-ahead","The Road Ahead",[48,1275,1276],{},"This week, we're moving from architecture to practice.",[48,1278,1279,1280,1283,1284,190],{},"We're launching ",[44,1281,1282],{},"Ursa for Kafka (UFK)"," --- a native Kafka service\nbuilt on the Lakestream foundation. And when we say native, we mean it\nprecisely: not Kafka-compatible, not a translation layer, but Apache\nKafka itself running on Lakestream's lakehouse-native stream storage.\nAny Kafka workload becomes lakehouse-native with zero code changes. No\nmigration. No reconfiguration. No compromise on the Kafka semantics your\napplications already depend on. We'll cover the full details in our ",[55,1285,1287],{"href":1286},"\u002Fblog\u002Fursa-for-kafka-native-apache-kafka-service-on-lakestream","companion post",[48,1289,1290],{},"We're also committed to open-sourcing Ursa and the core Lakestream\ncomponents in the coming months. We've thought carefully about this,\nand our conviction is straightforward: an architectural shift of this\nmagnitude belongs to the community, not to any single vendor. The\nlakehouse succeeded in part because its foundations --- Parquet,\nIceberg, Delta Lake --- were open and composable. 
We intend to build\nLakestream the same way.",[48,1292,1293],{},"Seven years ago, we thought we were building a better streaming\nplatform. And in the narrow sense, we were. But looking back at the full\narc --- Pulsar's compute-storage separation, the Kafka compatibility\nwork, the lakehouse-native storage breakthrough --- it's clear that\neach step wasn't a detour. It was the path.",[48,1295,1296],{},"We weren't building a better version of what already existed. We were\nworking, iteratively and sometimes without knowing it, toward something\nthe industry didn't have a name for yet.",[48,1298,1299],{},"Now it does.",[48,1301,1302],{},[44,1303,1304],{},"Lakestream.",[48,1306,1307],{},"If you're rethinking your streaming architecture --- or questioning\nassumptions you've held for years about where streaming ends and the\nlakehouse begins --- we'd like to think through it with you. The shift\nwe're describing isn't something one company builds alone. It's\nsomething the industry figures out together.",[48,1309,1310],{},"Let's get started!",{"title":18,"searchDepth":19,"depth":19,"links":1312},[1313,1314,1315,1316,1317,1318,1322,1323],{"id":845,"depth":19,"text":846},{"id":886,"depth":19,"text":887},{"id":932,"depth":19,"text":933},{"id":987,"depth":19,"text":988},{"id":1039,"depth":19,"text":1040},{"id":1078,"depth":19,"text":1079,"children":1319},[1320,1321],{"id":1105,"depth":279,"text":1106},{"id":1125,"depth":279,"text":1126},{"id":1222,"depth":19,"text":1223},{"id":1272,"depth":19,"text":1273},"How StreamNative's journey from Pulsar to lakehouse-native streaming crystallized into Lakestream — a new architecture where streams become first-class lakehouse primitives.","\u002Fimgs\u002Fblogs\u002Fblog-thumbnail-from-streams-to-lakestreams.png",{},{"title":804,"description":1324},"blog\u002Ffrom-streams-to-lakestreams",[799,800,1330,1331,1332],"Iceberg","Thought Leadership","Ursa","5vYmsiY_BPA57BpEaj607B3Q5vmMkygBgGa9XOsyDC4",{"id":1335,"title":1336,"authors":1337,"body":1339,"category":289,"createdAt":290,"date":6,"description":1748,"extension":8,"featured":294,"image":1749,"isDraft":294,"link":290,"meta":1750,"navigation":7,"order":296,"path":1751,"readingTime":290,"relatedResources":290,"seo":1752,"stem":1753,"tags":1754,"__hash__":1755},"blogs\u002Fblog\u002Fstreamnative-and-startree-partner-to-deliver-real-time-analytics-on-native-kafka.md","StreamNative and StarTree Partner to Deliver Real-Time Analytics on Native Kafka",[311,1338],"Vivek Sinha",{"type":15,"value":1340,"toc":1737},[1341,1355,1362,1368,1371,1374,1396,1399,1405,1412,1417,1420,1438,1443,1446,1452,1459,1464,1467,1499,1504,1507,1513,1516,1521,1524,1529,1535,1538,1541,1568,1571,1577,1584,1587,1592,1595,1600,1603,1617,1626,1632,1635,1638,1655,1658,1662,1665,1725,1731,1734],[48,1342,1343,1344,758,1347,1354],{},"As organizations continue to adopt real-time data architectures, the\nability to seamlessly connect streaming platforms with real-time\nanalytics engines has become critical. 
Today, we are excited to announce\na ",[44,1345,1346],{},"native integration between StarTree Cloud and",[55,1348,1351],{"href":1349,"rel":1350},"https:\u002F\u002Fstreamnative.io\u002Fblog\u002Fursa-for-kafka-native-apache-kafka-service-on-lakestream",[264],[44,1352,1353],{},"StreamNative's\nNative Kafka\nservice",",\nenabling customers to easily build real-time analytics pipelines\ndirectly from Kafka topics in StreamNative Cloud.",[48,1356,1357,1358,1361],{},"This integration allows StarTree customers to connect directly to\nStreamNative Kafka and create ",[44,1359,1360],{},"live tables powered by streaming data",",\nmaking it easier than ever to transform event streams into operational\ninsights. And for organizations that also store historical or offline\nevent data in Apache Iceberg via StreamNative, StarTree can query that\noffline data directly --- delivering a unified analytics experience\nacross both real-time streams and the data lakehouse.",[40,1363,1365],{"id":1364},"why-this-integration-matters",[44,1366,1367],{},"Why this integration matters",[48,1369,1370],{},"Modern enterprises are increasingly building architectures where Kafka\nserves as the central nervous system for operational data. However,\nturning that streaming data into fast, queryable analytics often\nrequires complex ingestion pipelines and custom integrations.",[48,1372,1373],{},"The StreamNative and StarTree partnership simplifies this journey by\nproviding:",[321,1375,1376,1381,1386,1391],{},[324,1377,1378],{},[44,1379,1380],{},"Native connectivity between streaming and analytics platforms",[324,1382,1383],{},[44,1384,1385],{},"Simplified ingestion of Kafka topics into real-time analytics tables",[324,1387,1388],{},[44,1389,1390],{},"Reduced operational complexity for real-time data pipelines",[324,1392,1393],{},[44,1394,1395],{},"Faster time-to-insight for operational and AI use cases",[48,1397,1398],{},"By combining StreamNative's cloud-native Kafka service with StarTree's\nreal-time analytics platform, organizations can accelerate their\ntransition to truly real-time data platforms.",[40,1400,1402],{"id":1401},"native-streamnative-kafka-integration-in-startree-cloud",[44,1403,1404],{},"Native StreamNative Kafka integration in StarTree Cloud",[48,1406,1407,1408,1411],{},"StarTree has introduced StreamNative Kafka as a ",[44,1409,1410],{},"native data source\noption inside StarTree Cloud",", allowing customers to easily configure\nconnections and begin ingesting streaming data.",[48,1413,1414],{},[384,1415],{"alt":18,"src":1416},"\u002Fimgs\u002Fblogs\u002Fstreamnative-and-startree-partner-to-deliver-real-time-analytics-on-native-kafka-image5.png",[48,1418,1419],{},"With just a few configuration steps, users can:",[321,1421,1422,1429,1432,1435],{},[324,1423,1424,1425,1428],{},"Select ",[44,1426,1427],{},"StreamNative Kafka"," as a data source directly from StarTree Cloud",[324,1430,1431],{},"Configure broker endpoints and security settings",[324,1433,1434],{},"Test connectivity within the StarTree interface",[324,1436,1437],{},"Create live tables directly from StreamNative Kafka topics",[48,1439,1440],{},[384,1441],{"alt":18,"src":1442},"\u002Fimgs\u002Fblogs\u002Fstreamnative-and-startree-partner-to-deliver-real-time-analytics-on-native-kafka-image4.png",[48,1444,1445],{},"This streamlined experience allows data teams to operationalize\nstreaming data without building custom ingestion layers.",[40,1447,1449],{"id":1448},"build-online-tables-directly-from-streamnative-kafka-topics",[44,1450,1451],{},"Build 
online tables directly from StreamNative Kafka topics",[48,1453,1454,1455,1458],{},"Once connected, StarTree enables customers to create ",[44,1456,1457],{},"live tables","\nthat continuously ingest and index streaming data from StreamNative\nKafka topics.",[48,1460,1461],{},[384,1462],{"alt":18,"src":1463},"\u002Fimgs\u002Fblogs\u002Fstreamnative-and-startree-partner-to-deliver-real-time-analytics-on-native-kafka-image3.png",[48,1465,1466],{},"This enables powerful real-time use cases such as:",[321,1468,1469,1475,1481,1487,1493],{},[324,1470,1471,1474],{},[44,1472,1473],{},"Real-time business dashboards"," -- Monitor KPIs as events happen",[324,1476,1477,1480],{},[44,1478,1479],{},"Customer behavior analytics"," -- Analyze engagement data instantly",[324,1482,1483,1486],{},[44,1484,1485],{},"Fraud and anomaly detection"," -- Identify issues in real time",[324,1488,1489,1492],{},[44,1490,1491],{},"AI and feature engineering pipelines"," -- Prepare streaming features for AI systems",[324,1494,1495,1498],{},[44,1496,1497],{},"Operational monitoring"," -- Track infrastructure and application events continuously",[48,1500,1501],{},[384,1502],{"alt":18,"src":1503},"\u002Fimgs\u002Fblogs\u002Fstreamnative-and-startree-partner-to-deliver-real-time-analytics-on-native-kafka-image6.png",[48,1505,1506],{},"By enabling live ingestion, organizations can eliminate delays between\ndata generation and business insight.",[40,1508,1510],{"id":1509},"turning-real-time-streams-into-actionable-dashboards",[44,1511,1512],{},"Turning Real-Time Streams into Actionable Dashboards",[48,1514,1515],{},"A live table in StarTree Cloud that continuously ingests data from a\nStreamNative Kafka topic enables real-time analytics by making fresh\nstreaming data immediately queryable. 
This live table can be connected\nto Apache Superset as a datasource, allowing teams to build interactive\ndashboards that visualize up-to-the-second metrics, trends, and\noperational insights.",[48,1517,1518],{},[384,1519],{"alt":18,"src":1520},"\u002Fimgs\u002Fblogs\u002Fstreamnative-and-startree-partner-to-deliver-real-time-analytics-on-native-kafka-image1.png",[48,1522,1523],{},"By combining StreamNative's real-time data streaming, StarTree's\nlow-latency OLAP capabilities, and Superset's rich visualization layer,\norganizations can create dashboards for use cases such as real-time\nmonitoring, anomaly detection, and business performance tracking without\nneeding batch data pipelines.",[48,1525,1526],{},[384,1527],{"alt":18,"src":1528},"\u002Fimgs\u002Fblogs\u002Fstreamnative-and-startree-partner-to-deliver-real-time-analytics-on-native-kafka-image2.png",[40,1530,1532],{"id":1531},"built-for-streamnative-kafka",[44,1533,1534],{},"Built for StreamNative Kafka",[48,1536,1537],{},"StreamNative's Native Kafka service provides a modern Kafka experience\ndesigned for cloud environments, including elastic scalability,\nsimplified operations, and cost-efficient infrastructure.",[48,1539,1540],{},"Through this integration, customers benefit from:",[321,1542,1543,1548,1553,1558,1563],{},[324,1544,1545],{},[44,1546,1547],{},"Fully managed Kafka infrastructure",[324,1549,1550],{},[44,1551,1552],{},"Seamless integration with real-time analytics",[324,1554,1555],{},[44,1556,1557],{},"Reduced pipeline complexity",[324,1559,1560],{},[44,1561,1562],{},"Enterprise-grade security and governance",[324,1564,1565],{},[44,1566,1567],{},"Cloud-native scalability",[48,1569,1570],{},"Together, StreamNative and StarTree provide a production-ready\nfoundation for modern real-time data architectures.",[40,1572,1574],{"id":1573},"flexibility-with-kafka-on-streamnative-ksn",[44,1575,1576],{},"Flexibility with Kafka on StreamNative (KSN)",[48,1578,1579,1580,1583],{},"In addition to StreamNative Native Kafka, this integration can also\nsupport environments using ",[44,1581,1582],{},"Kafka on StreamNative (KSN)",", which\nprovides Kafka protocol compatibility on top of Apache Pulsar.",[48,1585,1586],{},"This gives customers the flexibility to choose their preferred\ndeployment model while maintaining a consistent analytics experience\nwith StarTree.",[48,1588,1589],{},[44,1590,1591],{},"Unified Real-Time and Historical Analytics with Apache Iceberg",[48,1593,1594],{},"Beyond streaming ingestion, StreamNative can also sink data to open\nlakehouse formats such as Apache Iceberg -- enabling durable,\ncost-efficient storage for historical event data. 
StarTree natively\nsupports querying this offline data through its External Table\ncapability, allowing organizations to run high-concurrency, low-latency\nanalytics across both streaming and historical datasets without\nduplicating data or building complex ingestion pipelines.",[48,1596,1597],{},[384,1598],{"alt":18,"src":1599},"\u002Fimgs\u002Fblogs\u002Fstreamnative-and-startree-partner-to-deliver-real-time-analytics-on-native-kafka-image7.png",[48,1601,1602],{},"This means StarTree can simultaneously power:",[321,1604,1605,1611],{},[324,1606,1607,1610],{},[44,1608,1609],{},"Live tables"," --- continuously ingesting fresh events from StreamNative Kafka topics",[324,1612,1613,1616],{},[44,1614,1615],{},"External tables"," --- directly querying offline data stored in Iceberg tables written by StreamNative",[48,1618,1619,1620,1625],{},"By federating queries across both layers, organizations gain a complete\npicture from millisecond-fresh events to months of historical trends ---\nall through a single analytics platform. ",[55,1621,1624],{"href":1622,"rel":1623},"https:\u002F\u002Fstartree.ai\u002Fresources\u002Fsla-driven-analytics-on-iceberg\u002F",[264],"StarTree's ability to query\nIceberg data\ndirectly","\nis particularly valuable for use cases such as trend analysis,\ntime-series comparisons, machine learning feature backfills, and\ncompliance reporting, where combining real-time signals with historical\ncontext is essential.",[40,1627,1629],{"id":1628},"joint-value-for-customers",[44,1630,1631],{},"Joint value for customers",[48,1633,1634],{},"This partnership reflects a shared vision between StreamNative and\nStarTree to simplify real-time data architectures.",[48,1636,1637],{},"Together we enable:",[321,1639,1640,1645,1650],{},[324,1641,1642],{},[44,1643,1644],{},"StreamNative → Reliable real-time data streaming",[324,1646,1647],{},[44,1648,1649],{},"StarTree → High-performance real-time analytics",[324,1651,1652],{},[44,1653,1654],{},"Native integration → Faster time to production",[48,1656,1657],{},"Customers can now move from streaming ingestion to real-time analytics\nwith fewer moving parts and lower operational overhead.",[40,1659,1660],{"id":749},[44,1661,752],{},[48,1663,1664],{},"Customers can start using this integration today by:",[1666,1667,1668,1684,1687,1694,1697,1700,1703,1711,1718],"ol",{},[324,1669,1670,758,1672,1676,1677,773,1679,777,1681,190],{},[44,1671,757],{},[55,1673,1675],{"href":761,"rel":1674},[264],"Sign up for a free trial"," to experience how Kafka data from StreamNative can be seamlessly ingested into the Snowflake Horizon Catalog. 
Use promo code ",[44,1678,772],{},[44,1680,776],{},[44,1682,1683],{},"$1,000 in credits",[324,1685,1686],{},"Creating a StreamNative Kafka cluster",[324,1688,1689],{},[55,1690,1693],{"href":1691,"rel":1692},"https:\u002F\u002Fdocs.startree.ai\u002Fcorecapabilities\u002Fingestdata\u002Fdataportal\u002Fstreaming\u002Fstreamnative",[264],"Selecting StreamNative Kafka as a data source in StarTree Cloud",[324,1695,1696],{},"Configuring connection details",[324,1698,1699],{},"Creating live tables from Kafka topics",[324,1701,1702],{},"Building real-time dashboards and analytics applications",[324,1704,1705,1706],{},"Watch a ",[55,1707,1710],{"href":1708,"rel":1709},"https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=23vQhKMcIrU",[264],"demo of StreamNative Kafka service",[324,1712,1705,1713],{},[55,1714,1717],{"href":1715,"rel":1716},"https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=0PmURurJv-8",[264],"demo of StreamNative's integration with StarTree",[324,1719,1720],{},[55,1721,1724],{"href":1722,"rel":1723},"https:\u002F\u002Fstreamnative.io\u002Fbook-a-demo",[264],"Contact StreamNative to schedule a demo",[40,1726,1728],{"id":1727},"whats-next",[44,1729,1730],{},"What's next",[48,1732,1733],{},"StreamNative continues to expand its ecosystem of technology partners\naround its Native Kafka service to help customers build complete\nreal-time data platforms spanning streaming, processing, storage, and\nanalytics.",[48,1735,1736],{},"The StarTree integration represents another step toward StreamNative's\nvision of providing an open, ecosystem-driven real-time data platform\nthat supports both Kafka and Pulsar workloads.",{"title":18,"searchDepth":19,"depth":19,"links":1738},[1739,1740,1741,1742,1743,1744,1745,1746,1747],{"id":1364,"depth":19,"text":1367},{"id":1401,"depth":19,"text":1404},{"id":1448,"depth":19,"text":1451},{"id":1509,"depth":19,"text":1512},{"id":1531,"depth":19,"text":1534},{"id":1573,"depth":19,"text":1576},{"id":1628,"depth":19,"text":1631},{"id":749,"depth":19,"text":752},{"id":1727,"depth":19,"text":1730},"StarTree Cloud now natively integrates with StreamNative's Kafka service, enabling real-time analytics pipelines directly from Kafka topics with no custom ingestion layers.","\u002Fimgs\u002Fblogs\u002Fblog-thumbnail-streamnative-and-startree-partner-to-deliver-real-time-analytics-on-native-kafka.png",{},"\u002Fblog\u002Fstreamnative-and-startree-partner-to-deliver-real-time-analytics-on-native-kafka",{"title":1336,"description":1748},"blog\u002Fstreamnative-and-startree-partner-to-deliver-real-time-analytics-on-native-kafka",[799,303,800,302],"RWxnzckaGiBpcO0LCkzpoKLBjjswAha68TOcjy6iMJM",{"id":1757,"title":1758,"authors":1759,"body":1761,"category":289,"createdAt":290,"date":6,"description":2194,"extension":8,"featured":294,"image":2195,"isDraft":294,"link":290,"meta":2196,"navigation":7,"order":296,"path":2197,"readingTime":290,"relatedResources":290,"seo":2198,"stem":2199,"tags":2200,"__hash__":2201},"blogs\u002Fblog\u002Fstreamnative-biglake-integration.md","StreamNative Collaborates with Google Cloud to Integrate Kafka Service with BigLake metastore",[311,1760],"Jobin George",{"type":15,"value":1762,"toc":2178},[1763,1766,1780,1783,1792,1796,1799,1813,1816,1819,1823,1826,1831,1837,1854,1859,1862,1865,1879,1883,1886,1889,1921,1924,1930,1939,1942,1956,1959,1965,1968,1974,1977,1982,1996,2001,2004,2010,2026,2031,2036,2041,2046,2051,2054,2059,2062,2067,2070,2076,2079,2084,2087,2092,2095,2100,2103,2109,2112,2117,2120,2126,2144,2150,2153,2156],[48,1764,1765],{},"As organizations 
modernize their data platforms, the convergence of\nreal-time streaming and lakehouse architectures is becoming essential.\nEnterprises increasingly want the ability to move streaming data\ndirectly into governed lakehouse tables without complex pipelines,\nduplicate storage, or operational overhead.",[48,1767,1768,1769,1773,1774,1779],{},"Today, StreamNative is announcing that we've collaborated with Google\nCloud to integrate ",[55,1770,1772],{"href":1349,"rel":1771},[264],"StreamNative's Kafka service"," with ",[55,1775,1778],{"href":1776,"rel":1777},"https:\u002F\u002Fcloud.google.com\u002Fbiglake",[264],"BigLake\nmetastore",". The integration\nis now available in Private Preview.",[48,1781,1782],{},"With StreamNative's Kafka service powered by Ursa, organizations can\nmore seamlessly stream Kafka topics into Apache Iceberg tables managed\nby the BigLake, enabling a unified real-time data foundation for\nanalytics and AI workloads.",[48,1784,1785,1786,1791],{},"This integration enables StreamNative to bridge operational streaming\nsystems and analytical lakehouse platforms through open standards such\nas Apache Iceberg and the ",[55,1787,1790],{"href":1788,"rel":1789},"https:\u002F\u002Ficeberg.apache.org\u002Frest-catalog-spec\u002F",[264],"Iceberg REST\ncatalog",",\nsimplifying how enterprises build real-time lakehouse architectures.",[40,1793,1795],{"id":1794},"bringing-streaming-and-the-lakehouse-together","Bringing Streaming and the Lakehouse Together",[48,1797,1798],{},"Traditionally, Kafka data needed multiple connectors, ETL jobs, and\nbatch pipelines before it could be analyzed in a lakehouse. This\nintroduced:",[321,1800,1801,1804,1807,1810],{},[324,1802,1803],{},"Data duplication",[324,1805,1806],{},"Operational complexity",[324,1808,1809],{},"Pipeline latency",[324,1811,1812],{},"Governance fragmentation",[48,1814,1815],{},"StreamNative's lakehouse architecture eliminates these challenges by\nenabling direct streaming from Kafka topics into Iceberg tables\nregistered in BigLake metastore, simplifying both ingestion and\ngovernance.",[48,1817,1818],{},"As outlined in the architecture, StreamNative enables streaming topics\nto be materialized directly as Iceberg tables governed through BigLake\nmetastore services.",[40,1820,1822],{"id":1821},"how-the-integration-works","How the Integration Works",[48,1824,1825],{},"StreamNative Kafka service uses its lakehouse integration capabilities\nto stream topic data into Iceberg tables while leveraging BigLake\nmetastore for metadata management and governance.",[48,1827,1828],{},[384,1829],{"alt":18,"src":1830},"\u002Fimgs\u002Fblogs\u002Fstreamnative-collaborates-with-google-cloud-to-integrate-kafka-service-with-biglake-metastore-image2.png",[32,1832,1834],{"id":1833},"integration-flow",[44,1835,1836],{},"Integration Flow",[1666,1838,1839,1842,1845,1848,1851],{},[324,1840,1841],{},"Applications produce data into StreamNative Kafka topics",[324,1843,1844],{},"StreamNative Ursa writes data into Iceberg table format",[324,1846,1847],{},"Iceberg metadata is managed through BigLake metastore APIs",[324,1849,1850],{},"Data is stored in Google Cloud Storage",[324,1852,1853],{},"BigQuery and other analytics engines can query the data",[48,1855,1856],{},[384,1857],{"alt":18,"src":1858},"\u002Fimgs\u002Fblogs\u002Fstreamnative-collaborates-with-google-cloud-to-integrate-kafka-service-with-biglake-metastore-image12.png",[48,1860,1861],{},"This approach allows organizations to treat streaming data as\nimmediately queryable analytical 
datasets.",[48,1863,1864],{},"The integration supports:",[321,1866,1867,1870,1873,1876],{},[324,1868,1869],{},"Streaming topics as Iceberg tables",[324,1871,1872],{},"Native integration with BigLake metastore",[324,1874,1875],{},"Apache Iceberg REST catalog support",[324,1877,1878],{},"Development preview availability",[40,1880,1882],{"id":1881},"architecture-overview","Architecture Overview",[48,1884,1885],{},"StreamNative Kafka service uses Ursa's decoupled compute and storage\narchitecture to enable efficient lakehouse integration.",[48,1887,1888],{},"Key architectural capabilities include:",[321,1890,1891,1897,1903,1909,1915],{},[324,1892,1893,1896],{},[44,1894,1895],{},"Leaderless architecture -"," Reduces inter-zone data transfer\ncosts.",[324,1898,1899,1902],{},[44,1900,1901],{},"Object storage based persistence -"," Uses Google Cloud Storage for\ncost-efficient storage.",[324,1904,1905,1908],{},[44,1906,1907],{},"Compute-storage separation -"," Enables independent scaling of\nstreaming and storage workloads.",[324,1910,1911,1914],{},[44,1912,1913],{},"Native lakehouse streaming -"," Eliminates connector overhead.",[324,1916,1917,1920],{},[44,1918,1919],{},"Catalog integration -"," Reduces duplicate data copies through\nunified metadata.",[48,1922,1923],{},"These design principles allow organizations to significantly reduce\nKafka infrastructure costs while improving analytical readiness.",[40,1925,1927],{"id":1926},"built-on-streamnative-ursa-lakehouse-architecture",[44,1928,1929],{},"Built on StreamNative Ursa Lakehouse Architecture",[48,1931,1932,1933,1938],{},"This integration is powered by ",[55,1934,1936],{"href":335,"rel":1935},[264],[44,1937,1332],{},", StreamNative's cloud-native\nstorage engine designed to unify streaming and lakehouse architectures.",[48,1940,1941],{},"Ursa enables:",[321,1943,1944,1947,1950,1953],{},[324,1945,1946],{},"Direct streaming into Iceberg tables",[324,1948,1949],{},"Separation of compute and storage",[324,1951,1952],{},"Multi-protocol access (Kafka and Pulsar)",[324,1954,1955],{},"Multi-modal workloads (stream + table)",[48,1957,1958],{},"The architecture allows Kafka topics to become governed lakehouse\ndatasets with minimal operational overhead.",[40,1960,1962],{"id":1961},"streaming-topics-data-to-managed-iceberg-tables",[44,1963,1964],{},"Streaming Topics data to Managed Iceberg Tables",[48,1966,1967],{},"This section outlines the steps required to configure StreamNative's Kafka service to stream data into BigLake metastore's catalog as Iceberg tables.",[32,1969,1971],{"id":1970},"step-1-register-the-biglake-metastore-catalog-in-streamnative-cloud",[44,1972,1973],{},"Step 1: Register the BigLake metastore catalog in StreamNative Cloud",[48,1975,1976],{},"Register your BigLake metastore catalog in StreamNative Cloud to enable\ntopic-to-table streaming.",[48,1978,1979],{},[384,1980],{"alt":18,"src":1981},"\u002Fimgs\u002Fblogs\u002Fstreamnative-collaborates-with-google-cloud-to-integrate-kafka-service-with-biglake-metastore-image4.png",[321,1983,1984,1987,1990,1993],{},[324,1985,1986],{},"Navigate to Lakehouse Catalogs in StreamNative Cloud",[324,1988,1989],{},"Select Register Catalog and choose the BigLake metastore catalog",[324,1991,1992],{},"Provide Google Cloud project id, Warehouse name (Catalog name)",[324,1994,1995],{},"Validate connectivity to enable metadata 
discovery",[48,1997,1998],{},[384,1999],{"alt":18,"src":2000},"\u002Fimgs\u002Fblogs\u002Fstreamnative-collaborates-with-google-cloud-to-integrate-kafka-service-with-biglake-metastore-image7.png",[48,2002,2003],{},"Once registered, StreamNative can map Kafka topics to Iceberg tables\nwithin the selected catalog.",[32,2005,2007],{"id":2006},"step-2-create-a-kafka-cluster-in-streamnative-cloud",[44,2008,2009],{},"Step 2: Create a Kafka Cluster In StreamNative Cloud",[321,2011,2012,2015,2020],{},[324,2013,2014],{},"Create a Kafka cluster in StreamNative Cloud and associate it with\nthe registered catalog.",[324,2016,2017],{},[44,2018,2019],{},"Create a Kafka cluster",[324,2021,1424,2022,2025],{},[44,2023,2024],{},"Kafka Cluster"," to create a new cluster in StreamNative\nCloud",[48,2027,2028],{},[384,2029],{"alt":18,"src":2030},"\u002Fimgs\u002Fblogs\u002Fstreamnative-collaborates-with-google-cloud-to-integrate-kafka-service-with-biglake-metastore-image3.png",[321,2032,2033],{},[324,2034,2035],{},"Configure cluster by entering the details as shown below.",[48,2037,2038],{},[384,2039],{"alt":18,"src":2040},"\u002Fimgs\u002Fblogs\u002Fstreamnative-collaborates-with-google-cloud-to-integrate-kafka-service-with-biglake-metastore-image6.png",[321,2042,2043],{},[324,2044,2045],{},"Select the registered BigLake catalog during setup.",[48,2047,2048],{},[384,2049],{"alt":18,"src":2050},"\u002Fimgs\u002Fblogs\u002Fstreamnative-collaborates-with-google-cloud-to-integrate-kafka-service-with-biglake-metastore-image5.png",[48,2052,2053],{},"Note: While this example uses a StreamNative Kafka service Dedicated\ncluster, this lakehouse integration capability is also supported with\nStreamNative Pulsar clusters. Currently, this integration is supported\non Dedicated clusters in Public Preview, with support for Serverless and\nBYOC deployments planned for future releases.",[48,2055,2056],{},[44,2057,2058],{},"Credentials Vending Mode",[48,2060,2061],{},"Credential vending mode in BigLake metastore allows the metastore to\nsecurely provide temporary, scoped cloud credentials to authorized\ncompute engines instead of requiring long-lived storage keys.\nStreamNative's BigLake integration supports credential vending mode,\nenabling StreamNative Kafka service and Pulsar workloads to securely\naccess Iceberg tables managed by BigLake without directly managing\nGoogle Cloud Storage credentials.",[48,2063,2064],{},[384,2065],{"alt":18,"src":2066},"\u002Fimgs\u002Fblogs\u002Fstreamnative-collaborates-with-google-cloud-to-integrate-kafka-service-with-biglake-metastore-image1.png",[48,2068,2069],{},"By leveraging BigLake's IAM-based access controls and short-lived\ncredentials, StreamNative helps ensure secure, governed, and simplified\nauthentication while maintaining centralized data access policies.",[32,2071,2073],{"id":2072},"step-3-review-ingested-data-in-biglake-metastore-catalog",[44,2074,2075],{},"Step 3: Review ingested data in BigLake metastore catalog",[48,2077,2078],{},"Once configured, StreamNative automatically streams Kafka topic data as\nmanaged Iceberg tables in BigLake metastore catalog.",[48,2080,2081],{},[384,2082],{"alt":18,"src":2083},"\u002Fimgs\u002Fblogs\u002Fstreamnative-collaborates-with-google-cloud-to-integrate-kafka-service-with-biglake-metastore-image8.png",[48,2085,2086],{},"In BigLake metastore, catalog details define how StreamNative connects\nto the metastore, including the catalog name, catalog URI (Iceberg REST\nendpoint), and warehouse storage location. 
These settings allow\nStreamNative Cloud to register Kafka or Pulsar topic data as Iceberg\ntables with centralized governance and metadata management.",[48,2088,2089],{},[384,2090],{"alt":18,"src":2091},"\u002Fimgs\u002Fblogs\u002Fstreamnative-collaborates-with-google-cloud-to-integrate-kafka-service-with-biglake-metastore-image9.png",[48,2093,2094],{},"In BigLake metastore, a target catalog acts as the logical container\nwhere table metadata is registered and managed. When streaming data from\nStreamNative Cloud, topics from StreamNative Kafka service or Pulsar are\ncontinuously written to lakehouse storage in Apache Iceberg format and\nregistered within a specified BigLake catalog.",[48,2096,2097],{},[384,2098],{"alt":18,"src":2099},"\u002Fimgs\u002Fblogs\u002Fstreamnative-collaborates-with-google-cloud-to-integrate-kafka-service-with-biglake-metastore-image11.png",[48,2101,2102],{},"This targeted catalog organizes the ingested datasets into schemas and\ntables, enabling downstream analytics engines---such as BigQuery or\nSpark---to discover, query, and govern the streaming data using\nBigLake's centralized metadata, access controls, and governance\ncapabilities.",[32,2104,2106],{"id":2105},"step-4-query-iceberg-tables-in-biglake-metastore-catalog",[44,2107,2108],{},"Step 4: Query Iceberg tables in BigLake metastore catalog",[48,2110,2111],{},"Once streaming data from StreamNative Cloud is registered as Apache\nIceberg tables in the BigLake metastore catalog, it can be queried using\nApache Spark.",[48,2113,2114],{},[384,2115],{"alt":18,"src":2116},"\u002Fimgs\u002Fblogs\u002Fstreamnative-collaborates-with-google-cloud-to-integrate-kafka-service-with-biglake-metastore-image10.png",[48,2118,2119],{},"By connecting Spark to the BigLake Iceberg REST catalog, users can run\nstandard SQL queries to analyze streaming datasets, enabling batch and\nreal-time analytics on continuously ingested data with consistent\ngovernance and metadata management.",[40,2121,2123],{"id":2122},"conclusion",[44,2124,2125],{},"Conclusion",[48,2127,2128,2129,2132,2133,2136,2137,2140,2141,190],{},"StreamNative's integration with ",[44,2130,2131],{},"BigLake metastore"," enables\norganizations to stream Kafka and Pulsar data into governed ",[44,2134,2135],{},"Apache\nIceberg"," tables for real-time and batch analytics. By combining\nStreamNative's real-time data platform with BigLake's centralized\ngovernance, teams can simplify data access, improve security, and\naccelerate lakehouse analytics using tools like ",[44,2138,2139],{},"Apache Spark"," and\n",[44,2142,2143],{},"BigQuery",[40,2145,2147],{"id":2146},"get-started",[44,2148,2149],{},"Get Started",[48,2151,2152],{},"Support for streaming topic data into BigLake metastore as Apache\nIceberg tables is now available in StreamNative Cloud through a Private\nPreview.",[48,2154,2155],{},"To learn more:",[321,2157,2158,2173],{},[324,2159,2160,758,2162,2165,2166,773,2168,777,2170,190],{},[44,2161,757],{},[55,2163,1675],{"href":761,"rel":2164},[264]," to\nexperience how Kafka data from StreamNative can be seamlessly\ningested into the Snowflake Horizon Catalog. 
Use promo code\n",[44,2167,772],{},[44,2169,776],{},[44,2171,2172],{},"$1,000\nin credits",[324,2174,1705,2175],{},[55,2176,1710],{"href":1708,"rel":2177},[264],{"title":18,"searchDepth":19,"depth":19,"links":2179},[2180,2181,2184,2185,2186,2192,2193],{"id":1794,"depth":19,"text":1795},{"id":1821,"depth":19,"text":1822,"children":2182},[2183],{"id":1833,"depth":279,"text":1836},{"id":1881,"depth":19,"text":1882},{"id":1926,"depth":19,"text":1929},{"id":1961,"depth":19,"text":1964,"children":2187},[2188,2189,2190,2191],{"id":1970,"depth":279,"text":1973},{"id":2006,"depth":279,"text":2009},{"id":2072,"depth":279,"text":2075},{"id":2105,"depth":279,"text":2108},{"id":2122,"depth":19,"text":2125},{"id":2146,"depth":19,"text":2149},"StreamNative announces a collaboration with Google Cloud to integrate its Kafka service with BigLake metastore, enabling organizations to stream Kafka topics directly into governed Apache Iceberg tables for real-time lakehouse analytics.","\u002Fimgs\u002Fblogs\u002Fstreamnative-collaborates-with-google-cloud-to-integrate-kafka-service-with-biglake-metastore-cover.png",{},"\u002Fblog\u002Fstreamnative-biglake-integration",{"title":1758,"description":2194},"blog\u002Fstreamnative-biglake-integration",[799,1330,800,303,1332],"uk7MuokIaJRlRZD6nqTq30dkjVU-hJXytA9kZ-G0Z5o",{"id":2203,"title":2204,"authors":2205,"body":2207,"category":289,"createdAt":290,"date":6,"description":2592,"extension":8,"featured":294,"image":2593,"isDraft":294,"link":290,"meta":2594,"navigation":7,"order":296,"path":2595,"readingTime":290,"relatedResources":290,"seo":2596,"stem":2597,"tags":2598,"__hash__":2600},"blogs\u002Fblog\u002Fstreamnative-enables-real-time-streaming-to-unity-catalog-managed-tables.md","StreamNative Enables Real-Time Streaming to Unity Catalog Managed Tables",[311,2206],"Michelle Leon",{"type":15,"value":2208,"toc":2580},[2209,2212,2215,2229,2232,2235,2243,2252,2255,2261,2264,2267,2272,2275,2278,2304,2307,2313,2316,2319,2322,2333,2336,2340,2343,2348,2390,2393,2399,2402,2408,2411,2416,2440,2445,2448,2454,2457,2461,2466,2471,2474,2479,2482,2487,2490,2496,2499,2504,2515,2520,2523,2529,2532,2543,2546,2550,2557,2559],[48,2210,2211],{},"As enterprises accelerate their adoption of the lakehouse architecture,\nthe need to combine real-time streaming data with governed, AI-ready\ndata platforms has never been greater. Organizations increasingly rely\non platforms like Databricks to power analytics, machine learning, and\nAI workloads on top of open table formats such as Delta Lake.",[48,2213,2214],{},"Organizations often face challenges such as:",[321,2216,2217,2220,2223,2226],{},[324,2218,2219],{},"Manual lifecycle management of storage locations and table metadata",[324,2221,2222],{},"Lack of automatic optimization such as compaction and clustering",[324,2224,2225],{},"Governance gaps due to storage being managed outside Unity Catalog",[324,2227,2228],{},"Additional operational complexity to maintain performance and reliability",[48,2230,2231],{},"While external tables provide flexibility, they place the burden of\nstorage management, optimization, and governance coordination on\nplatform teams.",[48,2233,2234],{},"As enterprises move toward governed AI and analytics platforms, they\nneed a better approach --- one where streaming pipelines can directly\nwrite to managed, governed tables without operational overhead.",[48,2236,2237,2238,2242],{},"StreamNative has been steadily expanding its lakehouse integration\ncapabilities to support this evolution. 
On February 3, 2025,\nStreamNative announced support for streaming data into ",[55,2239,2241],{"href":412,"rel":2240},[264],"external Delta\ntables in Databricks Unity\nCatalog",",\nenabling customers to operationalize real-time data within their open\nlakehouse architecture.",[48,2244,2245,2246,2251],{},"Today, we are excited to announce the support for ",[55,2247,2250],{"href":2248,"rel":2249},"https:\u002F\u002Fwww.unitycatalog.io\u002Fblogs\u002Fintroducing-unity-catalog-managed-tables",[264],"Unity Catalog\nManaged\nTables","\nintegration alongside the launch of the new StreamNative Kafka service.\nAs part of StreamNative's Lakehouse integration, this new capability\nenables organizations to seamlessly stream real-time data from\nStreamNative Kafka into the Databricks Lakehouse with enhanced\ngovernance, simplified table management, and improved performance\noptimization.",[48,2253,2254],{},"This integration builds on the Delta Kernel SDK and introduces support\nfor catalog commits, which coordinate writes through Unity Catalog for\ncentralized governance and better performance, enabling direct streaming\ningestion into managed tables governed by Unity Catalog.",[40,2256,2258],{"id":2257},"whats-new-direct-streaming-into-unity-catalog-managed-tables",[44,2259,2260],{},"What's New: Direct Streaming into Unity Catalog Managed Tables",[48,2262,2263],{},"Unity Catalog serves as the central governance layer for the Databricks\nData Intelligence Platform, providing unified metadata management,\naccess control, lineage, and auditing across data and AI workloads.",[48,2265,2266],{},"With the introduction of managed tables, Databricks further simplifies\ndata operations by allowing Unity Catalog to manage the entire lifecycle\nof table storage, optimization, and maintenance.",[48,2268,2269],{},[384,2270],{"alt":18,"src":2271},"\u002Fimgs\u002Fblogs\u002Fstreamnative-enables-real-time-streaming-to-unity-catalog-managed-tables-image1.png",[48,2273,2274],{},"Unlike external tables---where the data lifecycle is managed outside the\nplatform---managed tables allow Databricks to automatically optimize\ndata layout, performance, and storage costs while maintaining strong\ngovernance and interoperability across tools.",[48,2276,2277],{},"Key benefits of Unity Catalog managed tables include:",[321,2279,2280,2286,2292,2298],{},[324,2281,2282,2285],{},[44,2283,2284],{},"Automated storage and lifecycle management"," of Delta tables",[324,2287,2288,2291],{},[44,2289,2290],{},"Performance optimizations"," such as clustering, compaction, and statistics collection",[324,2293,2294,2297],{},[44,2295,2296],{},"Fine-grained governance and RBAC"," through Unity Catalog",[324,2299,2300,2303],{},[44,2301,2302],{},"Open interoperability"," with Delta Lake clients and external compute engines",[48,2305,2306],{},"These capabilities are why organizations standardize on Unity Catalog\nmanaged tables as their foundation for analytics and AI workloads.",[40,2308,2310],{"id":2309},"bringing-real-time-streaming-to-managed-tables",[44,2311,2312],{},"Bringing Real-Time Streaming to Managed Tables",[48,2314,2315],{},"With this new integration, StreamNative enables streaming pipelines to\ndirectly write into Unity Catalog managed tables.",[48,2317,2318],{},"The integration leverages the Delta Kernel SDK, an open interface that\nallows external systems to interact with Delta tables without requiring\nSpark. 
By supporting catalog-based commits, StreamNative can coordinate\nwrites through Unity Catalog, ensuring that all transactions are\nproperly governed and recorded.",[48,2320,2321],{},"This architecture allows organizations to combine:",[321,2323,2324,2327,2330],{},[324,2325,2326],{},"Real-time data streaming from Kafka or Pulsar",[324,2328,2329],{},"Open Delta Lake storage",[324,2331,2332],{},"Unified governance with Unity Catalog",[48,2334,2335],{},"The result is a fully governed pipeline from streaming ingestion to\nAI-ready lakehouse tables.",[40,2337,2338],{"id":1881},[44,2339,1882],{},[48,2341,2342],{},"The integration follows a streamlined architecture:",[48,2344,2345],{},[384,2346],{"alt":18,"src":2347},"\u002Fimgs\u002Fblogs\u002Fstreamnative-enables-real-time-streaming-to-unity-catalog-managed-tables-image7.png",[1666,2349,2350,2360,2370,2380],{},[324,2351,2352,2355],{},[44,2353,2354],{},"Create Managed Tables",[321,2356,2357],{},[324,2358,2359],{},"Managed tables are provisioned in Databricks Unity Catalog, where Databricks controls the table lifecycle, metadata, storage locations, and governance policies through the metastore, catalog, and schema hierarchy.",[324,2361,2362,2365],{},[44,2363,2364],{},"Write Topics Data (Parquet) to Managed Storage Location",[321,2366,2367],{},[324,2368,2369],{},"StreamNative continuously ingests Kafka and Pulsar topic data and writes it in Parquet format into the cloud storage, handling batching, file sizing, and schema mapping for efficient downstream processing.",[324,2371,2372,2375],{},[44,2373,2374],{},"Commit Parquet Files to Delta Tables",[321,2376,2377],{},[324,2378,2379],{},"Using the Delta Kernel SDK and catalog-based commit APIs, StreamNative performs transactional commits to the Delta log, ensuring ACID guarantees, schema enforcement, and immediate table consistency for downstream query engines.",[324,2381,2382,2385],{},[44,2383,2384],{},"Query Managed Tables for Analytics and AI Workloads",[321,2386,2387],{},[324,2388,2389],{},"Once committed, the data becomes immediately available through Unity Catalog for querying via Databricks SQL, Spark, and AI\u002FML workloads, benefiting from centralized governance, fine-grained access controls, and performance optimizations such as data skipping and caching.",[48,2391,2392],{},"This architecture allows organizations to eliminate complex batch\npipelines and instead move toward continuous streaming ingestion into\nthe lakehouse.",[40,2394,2396],{"id":2395},"streaming-data-to-managed-tables",[44,2397,2398],{},"Streaming Data to Managed Tables",[48,2400,2401],{},"This section outlines the steps required to configure StreamNative's native Kafka service to stream data into Unity Catalog Managed tables.",[32,2403,2405],{"id":2404},"step-1-register-unity-catalog-in-streamnative-cloud",[44,2406,2407],{},"Step 1: Register Unity Catalog in StreamNative Cloud",[48,2409,2410],{},"Register your Databricks Unity Catalog in StreamNative Cloud to enable topic-to-table streaming.",[48,2412,2413],{},[384,2414],{"alt":18,"src":2415},"\u002Fimgs\u002Fblogs\u002Fstreamnative-enables-real-time-streaming-to-unity-catalog-managed-tables-image2.png",[321,2417,2418,2420,2426,2429,2432,2435,2438],{},[324,2419,1986],{},[324,2421,2422,2423],{},"Select Register Catalog and choose ",[44,2424,2425],{},"Databricks Unity Catalog For Delta Lake",[324,2427,2428],{},"Provide workspace URL, catalog name",[324,2430,2431],{},"Enter authentication details (OAuth clientID \u002F secret or PAT token)",[324,2433,2434],{},"Grant 
catalog, schema, and table write permissions",[324,2436,2437],{},"Configure credentials in StreamNative",[324,2439,1995],{},[48,2441,2442],{},[384,2443],{"alt":18,"src":2444},"\u002Fimgs\u002Fblogs\u002Fstreamnative-enables-real-time-streaming-to-unity-catalog-managed-tables-image4.png",[48,2446,2447],{},"Once registered, StreamNative can map Kafka topics to Delta tables within the selected catalog.",[32,2449,2451],{"id":2450},"step-2-create-a-native-kafka-cluster-in-streamnative-cloud",[44,2452,2453],{},"Step 2: Create a Native Kafka Cluster In StreamNative Cloud",[48,2455,2456],{},"Create a StreamNative Kafka cluster and associate it with the registered catalog.",[48,2458,2459],{},[44,2460,2019],{},[48,2462,1424,2463,2465],{},[44,2464,2024],{}," to create a new cluster in StreamNative Cloud",[48,2467,2468],{},[384,2469],{"alt":18,"src":2470},"\u002Fimgs\u002Fblogs\u002Fstreamnative-enables-real-time-streaming-to-unity-catalog-managed-tables-image3.png",[48,2472,2473],{},"Configure the cluster by entering the details as shown below.",[48,2475,2476],{},[384,2477],{"alt":18,"src":2478},"\u002Fimgs\u002Fblogs\u002Fstreamnative-enables-real-time-streaming-to-unity-catalog-managed-tables-image5.png",[48,2480,2481],{},"Select the registered Unity Catalog during setup",[48,2483,2484],{},[384,2485],{"alt":18,"src":2486},"\u002Fimgs\u002Fblogs\u002Fstreamnative-enables-real-time-streaming-to-unity-catalog-managed-tables-image6.png",[48,2488,2489],{},"Note: While this example uses a StreamNative native Kafka Dedicated cluster, this lakehouse integration capability is also supported with StreamNative Pulsar clusters. Currently, this integration is supported on Dedicated clusters in Public Preview, with support for Serverless and BYOC deployments planned for future releases.",[32,2491,2493],{"id":2492},"step-3-validate-and-query-ingested-data",[44,2494,2495],{},"Step 3: Validate and query ingested data",[48,2497,2498],{},"Once configured, StreamNative automatically streams Kafka topic data into managed tables.",[48,2500,2501],{},[384,2502],{"alt":18,"src":2503},"\u002Fimgs\u002Fblogs\u002Fstreamnative-enables-real-time-streaming-to-unity-catalog-managed-tables-image8.png",[321,2505,2506,2509,2512],{},[324,2507,2508],{},"Topic data is written as Parquet files",[324,2510,2511],{},"StreamNative performs Delta commits",[324,2513,2514],{},"Tables become immediately queryable in Databricks",[48,2516,2517],{},[384,2518],{"alt":18,"src":2519},"\u002Fimgs\u002Fblogs\u002Fstreamnative-enables-real-time-streaming-to-unity-catalog-managed-tables-image9.png",[48,2521,2522],{},"You can then query the data using Databricks SQL or Spark.",[40,2524,2526],{"id":2525},"unified-streaming-and-lakehouse-architecture",[44,2527,2528],{},"Unified Streaming and Lakehouse Architecture",[48,2530,2531],{},"By combining StreamNative's real-time streaming platform with Databricks\nUnity Catalog managed tables, organizations can build a modern\narchitecture where:",[321,2533,2534,2537,2540],{},[324,2535,2536],{},"Streaming systems capture real-time operational data",[324,2538,2539],{},"Lakehouse tables store governed, analytics-ready datasets",[324,2541,2542],{},"AI and analytics engines operate on continuously updated data",[48,2544,2545],{},"This integration represents another step toward a fully unified\nstreaming and lakehouse ecosystem built on open technologies such as\nKafka, Pulsar, Delta Lake, and Unity Catalog.",[40,2547,2548],{"id":2146},[44,2549,2149],{},[48,2551,2552,2553,2556],{},"Support for streaming into 
",[44,2554,2555],{},"Unity Catalog Managed Tables"," is now\navailable in StreamNative Cloud in Private Preview.",[48,2558,2155],{},[321,2560,2561,2575],{},[324,2562,2563,758,2565,2568,2569,773,2571,777,2573,190],{},[44,2564,757],{},[55,2566,1675],{"href":761,"rel":2567},[264]," to experience how Kafka data from StreamNative can be seamlessly ingested into the Unity Catalog. Use promo code ",[44,2570,772],{},[44,2572,776],{},[44,2574,1683],{},[324,2576,1705,2577],{},[55,2578,1710],{"href":1708,"rel":2579},[264],{"title":18,"searchDepth":19,"depth":19,"links":2581},[2582,2583,2584,2585,2590,2591],{"id":2257,"depth":19,"text":2260},{"id":2309,"depth":19,"text":2312},{"id":1881,"depth":19,"text":1882},{"id":2395,"depth":19,"text":2398,"children":2586},[2587,2588,2589],{"id":2404,"depth":279,"text":2407},{"id":2450,"depth":279,"text":2453},{"id":2492,"depth":279,"text":2495},{"id":2525,"depth":19,"text":2528},{"id":2146,"depth":19,"text":2149},"StreamNative now supports streaming directly into Unity Catalog Managed Tables, enabling governed, AI-ready data pipelines from Kafka or Pulsar into the Databricks Lakehouse.","\u002Fimgs\u002Fblogs\u002Fblog-thumbnail-streamnative-enables-real-time-streaming-to-unity-catalog-managed-tables.png",{},"\u002Fblog\u002Fstreamnative-enables-real-time-streaming-to-unity-catalog-managed-tables",{"title":2204,"description":2592},"blog\u002Fstreamnative-enables-real-time-streaming-to-unity-catalog-managed-tables",[799,800,2599,303],"Databricks","0klqsgGbWYO4Tbzbmmlj3mhKOyAAgPGppXSDhMcL5dA",{"id":2602,"title":2603,"authors":2604,"body":2605,"category":1332,"createdAt":290,"date":6,"description":2997,"extension":8,"featured":7,"image":2998,"isDraft":294,"link":290,"meta":2999,"navigation":7,"order":296,"path":1286,"readingTime":290,"relatedResources":290,"seo":3000,"stem":3001,"tags":3002,"__hash__":3003},"blogs\u002Fblog\u002Fursa-for-kafka-native-apache-kafka-service-on-lakestream.md","Ursa For Kafka: Native Apache Kafka Service on Lakestream",[808,809,810,311],{"type":15,"value":2606,"toc":2981},[2607,2619,2632,2636,2639,2655,2658,2666,2671,2676,2680,2686,2694,2698,2701,2727,2730,2735,2740,2744,2782,2786,2793,2798,2801,2804,2811,2814,2818,2822,2825,2832,2836,2843,2850,2854,2857,2878,2882,2885,2888,2891,2898,2930,2939,2943,2949,2954,2959,2969,2972,2975,2978],[48,2608,2609,2610,2612,2613,2618],{},"In our ",[55,2611,1287],{"href":9},", we shared the story of how our journey -- from\nPulsar to lakehouse-native streaming -- crystallized into a new\narchitectural paradigm we call\n",[55,2614,2616],{"href":838,"rel":2615},[264],[44,2617,842],{},".\nToday, we're showing what Lakestream looks like in practice - We are\nexcited to announce a Native Kafka service enters Limited Public Preview\ntogether with a set of partners in the broader Kafka & Lakehouse\necosystem.",[48,2620,2621,2631],{},[55,2622,2625],{"href":2623,"rel":2624},"https:\u002F\u002Fstreamnative.io\u002Fdata-streaming\u002Fkafka",[264],[44,2626,2627],{},[2628,2629,2630],"span",{},"Ursa For Kafka\n(UFK)"," is\nnative Apache Kafka -- not compatible, native -- running on\nLakestream's lakehouse-native foundation. Every Kafka topic is a\nlakehouse table. No connectors. No ETL. Just Kafka, reimagined for the\nlakehouse era.",[40,2633,2635],{"id":2634},"why-we-built-ufk","Why We Built UFK",[48,2637,2638],{},"Lakestream's core insight is that interoperability belongs at the\nstorage and catalog layers, not the protocol layer. 
If that's true,\nthen any streaming protocol should be able to plug into Lakestream's\nfoundation -- and work better for it.",[48,2640,2641,2642,2654],{},"We'd already proven this with Kafka-compatible access to our\nlakehouse-native storage. It worked -- customers ran Kafka workloads at\nup to 95% lower cost, and we won the ",[55,2643,2646,2651],{"href":2644,"rel":2645},"https:\u002F\u002Fstreamnative.io\u002Fblog\u002Fursa-wins-vldb-2025-best-industry-paper-the-first-lakehouse-native-streaming-engine-for-kafka",[264],[44,2647,2648],{},[2628,2649,2650],{},"VLDB 2025 Best Industry\nPaper",[2628,2652,2653],{}," award,","\nvalidating the architecture at 5 GB\u002Fs sustained throughput. But\ncompatibility layers have limits. Edge cases in protocol behavior.\nFeatures that don't translate cleanly across the abstraction boundary.\nThe gap between \"compatible\" and \"native\" -- small in benchmarks,\nreal in production.",[48,2656,2657],{},"So we asked: why not make Kafka itself -- native Apache Kafka -- run\non Lakestream's foundation? Not Kafka-compatible. Not\nKafka-on-something. Apache Kafka with lakehouse-native storage\nunderneath.",[48,2659,2660,2661,2665],{},"That question became\n",[55,2662,2664],{"href":2623,"rel":2663},[264],"UFK",". And\nthe answer turned out to be remarkably elegant: take Apache Kafka 4.2,\nextend its local disk storage with Lakestream's storage layer, and let\nthe rest of the Kafka ecosystem work exactly as it always has. The\nprotocol stays the same. The storage becomes lakehouse-native. The data\nbecomes immediately available for analytics.",[48,2667,2668],{},[384,2669],{"alt":18,"src":2670},"\u002Fimgs\u002Fblogs\u002Fursa-for-kafka-native-apache-kafka-service-on-lakestream-image1.png",[48,2672,2673],{},[36,2674,2675],{},"Figure 1. Ursa For Kafka Architecture",[40,2677,2679],{"id":2678},"what-is-ursa-for-kafka","What Is Ursa For Kafka?",[48,2681,2682,2685],{},[44,2683,2684],{},"UFK is a native Apache Kafka fork -- built on Kafka 4.2+ -- that\nextends Kafka's local disk storage with Lakestream's lakehouse-native\nstorage foundation."," It's the lakehouse-native Kafka engine where\nevery topic is simultaneously a live event stream and an up-to-date\nlakehouse table.",[916,2687,2688],{},[48,2689,2690,2693],{},[44,2691,2692],{},"UFK extends Kafka with lakehouse-native storage -- it doesn't\nforce you to abandon what you have."," Mixed storage in one cluster:\nKeep some topics on traditional disk-based storage while moving others\nto lakehouse-native storage. No separate clusters needed. Dual\nprofiles: Support both cost-optimized (lakehouse-native, up to 95%\ncheaper) and latency-optimized (disk-based, single-digit ms) topic\nprofiles in the same cluster. Rollout at your own pace: Start with\nhigh-volume, latency-relaxed topics -- your biggest cost drivers.\nExpand to more topics when you're ready. No big bang migration\nrequired.",[32,2695,2697],{"id":2696},"how-it-works","How It Works",[48,2699,2700],{},"The architecture is straightforward -- and that's the point:",[321,2702,2703,2709,2715,2721],{},[324,2704,2705,2708],{},[44,2706,2707],{},"Kafka clients produce data"," through the standard Kafka protocol. Zero code changes. 
Every existing Kafka client, tool, and connector works.",[324,2710,2711,2714],{},[44,2712,2713],{},"Kafka brokers receive the data"," and write to Lakestream's storage layer: a distributed WAL (Write-Ahead Log) buffers writes for low-latency acknowledgment; object storage (S3, GCS, or Azure Blob) stores data durably in Parquet format; the Lakestream Catalog registers the data as Iceberg or Delta Lake table updates.",[324,2716,2717,2720],{},[44,2718,2719],{},"Kafka consumers read"," through the standard Kafka protocol -- exactly as they always have.",[324,2722,2723,2726],{},[44,2724,2725],{},"Analytics engines"," -- Spark, Trino, Snowflake, Databricks -- query the same data as tables. No connectors needed. The data is already in their native format.",[48,2728,2729],{},"The write path and read path for Kafka clients are unchanged. What's\ndifferent is what happens underneath: data lands in the lakehouse the\nmoment it's committed, not hours later via a batch pipeline.",[48,2731,2732],{},[384,2733],{"alt":18,"src":2734},"\u002Fimgs\u002Fblogs\u002Fursa-for-kafka-native-apache-kafka-service-on-lakestream-image3.png",[48,2736,2737],{},[36,2738,2739],{},"Figure 2. Stream-Table Duality",[32,2741,2743],{"id":2742},"key-capabilities","Key Capabilities",[321,2745,2746,2752,2758,2764,2770,2776],{},[324,2747,2748,2751],{},[44,2749,2750],{},"Native Kafka Protocol:"," Apache Kafka 4.2+ fork. Every Kafka client, tool, and connector works with zero modifications. This isn't a compatibility layer -- it IS Kafka.",[324,2753,2754,2757],{},[44,2755,2756],{},"Lakehouse-Native Storage:"," Every topic is stored as Iceberg or Delta Lake tables on object storage. Query your streaming data from any analytics engine that reads open table formats.",[324,2759,2760,2763],{},[44,2761,2762],{},"Leaderless Architecture:"," No leader elections. No partition rebalancing storms. Any broker can serve any partition. Brokers are stateless compute -- add or remove them like web servers.",[324,2765,2766,2769],{},[44,2767,2768],{},"Up to 95% Cost Reduction:"," Eliminates cross-AZ replication -- the single largest cost in cloud streaming infrastructure. Validated at 5 GB\u002Fs sustained throughput in production benchmarks.",[324,2771,2772,2775],{},[44,2773,2774],{},"Zero-Connector Lakehouse Integration:"," No Kafka Connect. No materialization pipelines. No sink connectors. Your Kafka topics ARE your lakehouse tables. Produce to a topic, query it from Snowflake.",[324,2777,2778,2781],{},[44,2779,2780],{},"Catalog Integrations:"," Works with Databricks Unity Catalog, Snowflake Open Catalog, and AWS S3 Tables out of the box. Your streaming data is governed by the same catalog as your batch data.",[40,2783,2785],{"id":2784},"ufk-proves-the-lakestream-vision","UFK Proves the Lakestream Vision",[48,2787,2788,2789,2792],{},"In our Lakestream post, we described the core insight: ",[44,2790,2791],{},"push\ninteroperability from the protocol layer down to storage and catalog.","\nIf the foundation is right, any protocol can become a lakehouse citizen.",[48,2794,2795],{},[44,2796,2797],{},"UFK is the clearest validation of this principle.",[48,2799,2800],{},"We didn't build a new protocol. We took the world's most popular\nstreaming protocol -- Apache Kafka, used by tens of thousands of\norganizations worldwide -- and plugged Lakestream underneath it. The\nprotocol stayed the same. The storage became lakehouse-native. The data\nbecame immediately available for analytics.",[48,2802,2803],{},"UFK didn't require reinventing Kafka. 
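To ground the "zero code changes" claim, here is a minimal producer sketch using the standard confluent-kafka Python client. The broker address and topic name are hypothetical placeholders, and nothing in the snippet is UFK-specific -- which is the point: the topic it writes to is also registered as an Iceberg or Delta Lake table underneath.

```python
from confluent_kafka import Producer

# Ordinary Kafka producer -- no UFK-specific APIs. The broker address and
# topic below are hypothetical placeholders for your cluster.
producer = Producer({"bootstrap.servers": "ufk-broker.example.com:9092"})

def on_delivery(err, msg):
    if err is not None:
        print(f"delivery failed: {err}")
    else:
        print(f"delivered to {msg.topic()}[{msg.partition()}]@{msg.offset()}")

producer.produce("orders", key=b"order-1001", value=b'{"amount": 42.5}',
                 on_delivery=on_delivery)
producer.flush()
# Underneath, the broker lands this data in object storage and registers it
# with the catalog, so "orders" is also queryable as a lakehouse table.
```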
It just gave Kafka a Lakestream\nfoundation.",[48,2805,2806,2807,2810],{},"That's the beauty of the Lakestream approach: ",[44,2808,2809],{},"the value isn't in the\nprotocol. It's in the storage and catalog."," Once those layers are\nright, every protocol benefits. Kafka is the first -- and given its\ndominance, the most impactful. But the same foundation already powers\nPulsar workloads, and it's designed to support any protocol that needs\nlakehouse-native streaming.",[48,2812,2813],{},"Kafka is just the beginning.",[40,2815,2817],{"id":2816},"what-ufk-unlocks-for-your-organization","What UFK Unlocks for Your Organization",[32,2819,2821],{"id":2820},"for-data-engineers","For Data Engineers",[48,2823,2824],{},"Stop maintaining Kafka Connect pipelines to your lakehouse. With UFK,\nevery topic IS a lakehouse table. Your producer code doesn't change.\nYour consumer code doesn't change. But now your analytics team can\nquery the same data from Spark or Trino -- immediately, not after a\nbatch window.",[48,2826,2827,2828,2831],{},"No more ",[44,2829,2830],{},"\"connector jungle\""," -- the fragile web of sink connectors,\nschema converters, dead-letter queues, and materialization jobs that\nbreak at 2 AM and page you on weekends. The connector between streaming\nand the lakehouse is the storage layer itself. There's nothing to\nbreak.",[32,2833,2835],{"id":2834},"for-platform-teams","For Platform Teams",[48,2837,2838,2839,2842],{},"UFK's leaderless architecture means no more ",[44,2840,2841],{},"partition rebalancing\nstorms"," during scaling events. No more leader election cascades when a\nbroker goes down. No more careful capacity planning to ensure leaders\nare evenly distributed. Brokers are stateless compute -- add or remove\nthem like web servers behind a load balancer.",[48,2844,2845,2846,2849],{},"The cost impact is significant: ",[44,2847,2848],{},"up to 95% cost reduction"," on\nstreaming infrastructure, validated at 5 GB\u002Fs sustained throughput. The\nsavings come from eliminating cross-AZ replication -- the single\nlargest cost driver in cloud-deployed streaming. At that kind of\nthroughput, the difference between traditional Kafka and UFK is hundreds\nof thousands of dollars per month.",[32,2851,2853],{"id":2852},"for-data-and-analytics-teams","For Data and Analytics Teams",[48,2855,2856],{},"Real-time data in your lakehouse without waiting for batch ETL windows\nor connector lag. Query a Kafka topic as an Iceberg table from\nSnowflake, Databricks, or any engine that reads open table formats. The\ndata is fresh -- not hours old, not minutes old, but current to the\nlast committed write.",[48,2858,2859,2860,2865,2866,2869,2870,2877],{},"Governed by the same catalog as your batch tables -- ",[55,2861,2862],{"href":2595},[44,2863,2864],{},"Unity Catalog",",\n",[44,2867,2868],{},"Snowflake Horizon Catalog",", or ",[55,2871,2874],{"href":2872,"rel":2873},"https:\u002F\u002Faws.amazon.com\u002Fblogs\u002Fstorage\u002Fseamless-streaming-to-amazon-s3-tables-with-streamnative-ursa-engine\u002F",[264],[44,2875,2876],{},"AWS S3 Tables",". One set of access\npolicies. One audit trail. One source of truth. No more governance gaps\nwhere real-time data lives outside the catalog.",[32,2879,2881],{"id":2880},"for-architects","For Architects",[48,2883,2884],{},"One less system to operate. The streaming platform and the lakehouse\ningestion layer collapse into one. Instead of Kafka + Kafka Connect +\nschema converter + sink connector + monitoring for all of the above, you\nhave UFK. 
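As an illustration of the analytics side, the sketch below reads a topic-backed table from Spark. It assumes a Spark session already configured against the Iceberg or Delta catalog your cluster writes to; the catalog, namespace, and table names are hypothetical placeholders, and the actual schema follows your topic.

```python
from pyspark.sql import SparkSession

# Sketch only: assumes the session is already configured for the lakehouse
# catalog your cluster writes to. All names below are hypothetical placeholders.
spark = SparkSession.builder.appName("query-topic-as-table").getOrCreate()

recent = spark.sql("""
    SELECT *
    FROM lakehouse.streaming.orders
    LIMIT 20
""")
recent.show(truncate=False)
```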
Simpler architecture. Fewer failure modes. Faster time to\ninsight.",[48,2886,2887],{},"The total cost of ownership story compounds: you save on streaming\ninfrastructure (leaderless, no cross-AZ replication), you save on\nconnector infrastructure (no Kafka Connect clusters to operate), and you\nsave on engineering time (no pipelines to maintain). The architecture is\nnot just cheaper -- it's fundamentally simpler.",[40,2889,2890],{"id":749},"Getting Started",[48,2892,2893,2894,2897],{},"UFK is available in ",[44,2895,2896],{},"Limited Public Preview"," starting April 7, 2026.",[321,2899,2900,2906,2912,2918,2924],{},[324,2901,2902,2905],{},[44,2903,2904],{},"Cloud support:"," Available on AWS and GCP today, with Azure expansion planned",[324,2907,2908,2911],{},[44,2909,2910],{},"Kafka compatibility:"," Works with existing Kafka clients version 0.9 and above -- zero code changes required",[324,2913,2914,2917],{},[44,2915,2916],{},"Table formats:"," Supports Apache Iceberg and Delta Lake",[324,2919,2920,2923],{},[44,2921,2922],{},"Catalog integrations:"," Databricks Unity Catalog, Snowflake Horizon Catalog, and AWS S3 Tables",[324,2925,2926,2929],{},[44,2927,2928],{},"Migration:"," Move from existing Kafka clusters via Universal Linking -- replicate topics from any Kafka deployment (Confluent, MSK, Redpanda, self-managed) into UFK with continuous synchronization",[48,2931,2932,2933,2938],{},"If you are operating some of the largest Kafka clusters in your industry\nand would like to explore having UFK enabled via in-place upgrade, we\nwould love to help and explore a partnership. ",[55,2934,2937],{"href":2935,"rel":2936},"https:\u002F\u002Fstreamnative.io\u002Fcontact",[264],"Talk to\nus"," -- we love to chat.",[40,2940,2942],{"id":2941},"connecting-the-arc","Connecting the Arc",[48,2944,2945,2946,190],{},"Every step of our journey -- from Pulsar's compute-storage separation\nto Kafka compatibility to lakehouse-native storage to UFK -- has been\nbuilding toward one vision: ",[44,2947,2948],{},"making streaming a first-class citizen of\nthe lakehouse",[48,2950,2951],{},[384,2952],{"alt":18,"src":2953},"\u002Fimgs\u002Fblogs\u002Fursa-for-kafka-native-apache-kafka-service-on-lakestream-image2.png",[48,2955,2956],{},[36,2957,2958],{},"Figure 3. From Apache Kafka to Lakestream Kafka",[48,2960,2961,2962,190],{},"That vision is\n",[55,2963,2965],{"href":838,"rel":2964},[264],[44,2966,2967],{},[2628,2968,842],{},[48,2970,2971],{},"UFK is the moment where that vision meets the Kafka world. Native Kafka.\nLakehouse-native storage. No compromises.",[48,2973,2974],{},"This is more than a product launch -- it's the first proof that the\nLakestream architecture delivers on its promise. Take the world's most\npopular streaming protocol, plug Lakestream underneath, and everything\njust works -- except now your streaming data is also your lakehouse\ndata.",[48,2976,2977],{},"This is what Lakestream looks like in practice. 
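For a rough sense of where the savings come from, here is a back-of-the-envelope sketch of the cross-AZ replication bill that a leaderless, object-storage design avoids at the 5 GB/s figure quoted above. The per-GB transfer rate and replication factor are assumptions for illustration only; actual cloud pricing and topology vary.

```python
# Back-of-the-envelope sketch of cross-AZ replication cost. The transfer rate
# and replication factor are illustrative assumptions, not quoted pricing.
throughput_gb_per_s = 5                      # sustained produce rate from the benchmark figure
seconds_per_month = 30 * 24 * 3600
ingested_gb = throughput_gb_per_s * seconds_per_month        # ~13 PB/month

replication_factor = 3                       # assumption: classic Kafka, one replica per AZ
cross_az_copies = replication_factor - 1
transfer_cost_per_gb = 0.02                  # assumption: ~$0.01 out + $0.01 in per GB

replication_cost = ingested_gb * cross_az_copies * transfer_cost_per_gb
print(f"ingested per month: {ingested_gb:,.0f} GB")
print(f"cross-AZ replication cost: ${replication_cost:,.0f}/month")
# -> roughly $500k/month at these assumed rates, consistent with the
#    "hundreds of thousands of dollars per month" figure above.
```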
And we're just getting\nstarted.",[48,2979,2980],{},"Let's build the future -- together.",{"title":18,"searchDepth":19,"depth":19,"links":2982},[2983,2984,2988,2989,2995,2996],{"id":2634,"depth":19,"text":2635},{"id":2678,"depth":19,"text":2679,"children":2985},[2986,2987],{"id":2696,"depth":279,"text":2697},{"id":2742,"depth":279,"text":2743},{"id":2784,"depth":19,"text":2785},{"id":2816,"depth":19,"text":2817,"children":2990},[2991,2992,2993,2994],{"id":2820,"depth":279,"text":2821},{"id":2834,"depth":279,"text":2835},{"id":2852,"depth":279,"text":2853},{"id":2880,"depth":279,"text":2881},{"id":749,"depth":19,"text":2890},{"id":2941,"depth":19,"text":2942},"Ursa For Kafka (UFK) is a native Apache Kafka fork running on Lakestream's lakehouse-native storage — every topic is simultaneously a live event stream and an up-to-date lakehouse table.","\u002Fimgs\u002Fblogs\u002Fblog-thumbnail-ursa-for-kafka-native-apache-kafka-service-on-lakestream.png",{},{"title":2603,"description":2997},"blog\u002Fursa-for-kafka-native-apache-kafka-service-on-lakestream",[799,800,1330,1332],"4YM5IMzLRM56dXJpFSXXLDkKW2Vr2Whwhh13Jtmx5EU",{"id":3005,"title":3006,"authors":3007,"body":3008,"category":289,"createdAt":290,"date":3299,"description":3300,"extension":8,"featured":7,"image":3301,"isDraft":294,"link":290,"meta":3302,"navigation":7,"order":296,"path":3303,"readingTime":290,"relatedResources":290,"seo":3304,"stem":3305,"tags":3306,"__hash__":3307},"blogs\u002Fblog\u002Fwe-are-a-kafka-company-too.md","We Are a Kafka Company, Too",[806],{"type":15,"value":3009,"toc":3290},[3010,3016,3019,3022,3025,3033,3036,3040,3043,3046,3049,3052,3055,3059,3062,3065,3068,3071,3074,3080,3099,3102,3106,3112,3118,3123,3127,3130,3133,3140,3143,3146,3150,3153,3179,3182,3190,3194,3197,3203,3206,3211,3214,3217,3222,3225,3236,3240,3246,3259,3262,3282,3285],[48,3011,3012,3013],{},"Let me say something that might surprise you: ",[44,3014,3015],{},"We are a Kafka company,\ntoo.",[48,3017,3018],{},"Yes, us. StreamNative. Founded by the creators of Apache Pulsar. The\nPulsar company. The \"not Kafka\" company.",[48,3020,3021],{},"And no, this isn't an April Fools' joke. (Although the timing is...\nconvenient.)",[48,3023,3024],{},"The truth is, we've been on a journey that none of us fully anticipated\nwhen we started. That journey has gifted us a lot of experience that\nhelped form the following conviction: we should operate a native Kafka\nservice.",[48,3026,3027,3028,190],{},"What in 2019 may have sounded to us like defeat - offering a competitor\nsystem as a first-class product - no longer holds true. In 2026 the\nmajority of engineers no longer define Kafka as a system, they define it\nas a ",[36,3029,3030],{},[44,3031,3032],{},"protocol",[48,3034,3035],{},"Supporting another protocol seamlessly is an expression of a platform's\nstrength. Let me explain:",[40,3037,3039],{"id":3038},"the-power-of-open-protocols","The Power of Open Protocols",[48,3041,3042],{},"There's a pattern in data infrastructure that's worth understanding\nbefore we tell our story: when you integrate with an open protocol or\nformat, you inherit an entire ecosystem overnight.",[48,3044,3045],{},"Consider what happened with open table formats. When Apache Iceberg and\nDelta Lake emerged as open standards for lakehouse storage, something\nremarkable followed. 
Any system that wrote data in these formats ---\nregardless of who built it --- instantly became queryable from\nSnowflake, Databricks, Spark, Trino, and dozens of other analytics\nengines (growing by the day). No partnerships required. No custom\nconnectors. The format was the integration.",[48,3047,3048],{},"We experienced this firsthand ourselves. When our storage engine, Ursa,\nbegan writing directly to Iceberg and Delta Lake formats on object\nstorage, we didn't need to build connectors to every analytics\nplatform. Our streaming data was simply there --- immediately\ndiscoverable and queryable from any engine that reads open table\nformats. The open format did the work for us.",[48,3050,3051],{},"The same principle applies to streaming protocols. The Kafka wire\nprotocol has become the lingua franca of data streaming --- not just\nbecause of Apache Kafka itself, but because dozens of systems\ncollectively made the protocol ubiquitous. Every cloud service, every\nvendor, every connector, every monitoring tool, every tutorial speaks\nKafka. The protocol has become the TCP\u002FIP of streaming.",[48,3053,3054],{},"When you attach yourself to an open protocol, your product benefits\nautomatically. That insight --- validated by our experience with\nlakehouse formats and now with the Kafka protocol --- is the thread that\nruns through everything we're about to share.",[40,3056,3058],{"id":3057},"how-we-got-here","How We Got Here",[48,3060,3061],{},"We started StreamNative a few years ago with Apache Pulsar -- a system\nwe built at Yahoo to handle the massive scale of unified messaging and\ndata streaming. Pulsar had the right architecture from the start:\ncompute-storage separation, multi-tenancy, multi-protocol support. It\nwas designed for the cloud before \"cloud-native\" was a buzzword.",[48,3063,3064],{},"And enterprises loved it. They deployed Pulsar for their most\nmission-critical workloads -- the ones where downtime isn't an option\nand data loss is unthinkable.",[48,3066,3067],{},"But along the way, we noticed a pattern. A very consistent pattern.",[48,3069,3070],{},"Most of the world lives in a split world. Organizations run their\nmission-critical messaging and queuing workloads on Pulsar -- and their\ndata streaming pipelines on Kafka. For better or for worse, the Kafka\nprotocol had become the default --- not because Kafka was\narchitecturally superior for those use cases, but because the sheer\nweight of ecosystem adoption made it the path of least resistance. Every\nvendor, every tool, every connector, every tutorial -- Kafka.",[48,3072,3073],{},"We have a long history of making StreamNative Kafka-friendly and are no\nstrangers to the API. In 2020, we released KoP -- Kafka-on-Pulsar ---\nan open-source protocol handler that let Pulsar brokers speak the Kafka\nwire protocol. At the time, we weren't conceding anything --- we were\nhedging our bets, meeting users where they were. Then came KSN --\nKafka-on-StreamNative --- a more deeply integrated, production-grade\nKafka compatibility layer. Each iteration brought us closer to full\nKafka compatibility. Customers could bring their Kafka workloads to our\nplatform without changing a line of code.",[48,3075,3076,3077,3079],{},"Over time, the picture became clear. It wasn't that Kafka-the-system\nbeat Pulsar. 
It was that the Kafka ",[44,3078,3032],{}," had won --- propelled\nnot by Confluent nor the open source project alone, but by the dozens of\nKafka-compatible systems that collectively made the wire protocol the\nindustry standard.",[48,3081,3082,3083,3085,3086,3089,3090,3098],{},"Then we built ",[44,3084,1332],{}," -- a lakehouse-native streaming storage engine\nthat writes directly to object storage in open formats like Iceberg and\nDelta Lake. The engine didn't require local disk, didn't require leader\nelections and didn't incur any cross-AZ replication costs. It happened\nto be ",[36,3087,3088],{},"really, really good"," at running Kafka workloads -- up to 95%\ncheaper, in fact. Good enough to ",[55,3091,3093,3094,3097],{"href":2644,"rel":3092},[264],"win the ",[44,3095,3096],{},"VLDB 2025\nBest Industry Paper"," award",",\nbeating submissions from Databricks, Meta, Alibaba, and many others.",[48,3100,3101],{},"And then we did something nobody expected -- including us.",[40,3103,3105],{"id":3104},"introducing-ursa-for-kafka","Introducing Ursa For Kafka",[48,3107,3108,3109,190],{},"We took Apache Kafka 4.2, extended its storage layer with Ursa, and\nbuilt a native Kafka service - ",[44,3110,3111],{},"Ursa For Kafka (UFK)",[48,3113,3114,3115],{},"Not Kafka-compatible. Not Kafka-on-something. ",[44,3116,3117],{},"Native Kafka with its\nclassic storage engine, AND with an additional lakehouse engine\nunderneath.",[48,3119,3120],{},[384,3121],{"alt":18,"src":3122},"\u002Fimgs\u002Fblogs\u002Fwe-are-a-kafka-company-too-image1.png",[40,3124,3126],{"id":3125},"why-build-on-native-kafka","Why build on Native Kafka?",[48,3128,3129],{},"A natural question. Why build directly on the Apache Kafka codebase\ninstead of continuing to build protocol-compatible systems on top of\ndifferent storage engines?",[48,3131,3132],{},"Because we tried that --- and learned exactly why it doesn't work\nlong-term.",[48,3134,3135,3136,3139],{},"We spent years building Kafka protocol compatibility layers. Each\ngeneration got closer, but we kept hitting the same fundamental problem:\n",[44,3137,3138],{},"the Kafka wire protocol is not just a specification --- it's a living\nsystem with undocumented behaviors, implicit client-broker contracts,\nand edge cases that no spec captures",". Kafka has 89 live request\nfamilies today, and when you account for backwards-compatible versions,\nyou're looking at hundreds of protocol permutations to implement and\nmaintain. Every time upstream Kafka evolved, we had to catch up --- and\nevery new version meant new surprises hiding in the gaps between what\nthe protocol says and what clients actually expect. We'll share the full\ncatalog of war stories in an upcoming post, but the lesson was\nunambiguous: reimplementing the Kafka protocol is a treadmill with no\nfinish line.",[48,3141,3142],{},"Building directly on native Apache Kafka changes the equation entirely.\nUFK inherits perfect protocol compatibility by simply using the same\ncode. When upstream Kafka introduces new APIs, changes behavior or fixes\na bug, we inherit it (for free). No reverse-engineering, no guessing at\nundocumented semantics, no playing catch-up. Our extension is scoped to\nthe storage layer --- the Kafka protocol handling, client interactions,\nand API surface remain the real thing.",[48,3144,3145],{},"This is not a one-off April Fools experiment. It's a high-conviction\nstrategic bet. 
We are extending native Apache Kafka with a lakestream\nfoundation --- and we plan to open source this work and invest in it for\nthe long term.",[40,3147,3149],{"id":3148},"what-does-it-mean-in-practice","What does it mean in practice?",[48,3151,3152],{},"It's just Kafka with the potential for richer topic storage options.",[321,3154,3155,3161,3167,3173],{},[324,3156,3157,3160],{},[44,3158,3159],{},"Your Kafka topics are simultaneously lakehouse tables"," --\nbecause Ursa writes directly to Iceberg and Delta Lake on object\nstorage. No expensive SSDs. No connectors. No ETL. No duplicate\nstorage costs.",[324,3162,3163,3166],{},[44,3164,3165],{},"No cross-AZ replication costs"," -- because Ursa's leaderless\narchitecture eliminates the single largest cost driver in cloud\nstreaming.",[324,3168,3169,3172],{},[44,3170,3171],{},"No connectors to get data into your lakehouse"," -- because the\ndata is already there. Produce to a Kafka topic, query it as an\nIceberg table from Spark, Snowflake, or Databricks.",[324,3174,3175,3178],{},[44,3176,3177],{},"Your Kafka clients work with zero changes"," -- because it IS the\nliteral Kafka codebase. Every client, every tool, every connector\nyou already use just work",[48,3180,3181],{},"We didn't replace Kafka. We gave it a lakehouse foundation.",[916,3183,3184],{},[48,3185,3186,3189],{},[44,3187,3188],{},"UFK extends Kafka -- it doesn't replace it."," You don't have to\ngo all-in. Move some topics to lakehouse-native storage while keeping\nothers on traditional disk-based storage -- in the same cluster.\nSupport both cost-optimized (lakehouse-native, up to 95% cheaper) and\nlatency-optimized (disk-based, single-digit ms) topic profiles side by\nside. Move topics between profiles as your needs evolve. Roll out at\nyour own pace. Start with your highest-volume, latency-relaxed topics\n-- your biggest cost drivers. Expand when you're ready. No big bang\nmigration required.",[40,3191,3193],{"id":3192},"so-what-about-pulsar","So... What About Pulsar?",[48,3195,3196],{},"Fair question. If you've followed us for any length of time, Pulsar is\nprobably what you associate with StreamNative. And here's the honest\nanswer:",[48,3198,3199,3202],{},[44,3200,3201],{},"Pulsar isn't going anywhere --- it's central to who we are."," Pulsar\ncontinues to power the most mission-critical, business-impacting\nworkloads for customers worldwide. Organizations choose Pulsar for its\nmulti-tenancy, its ability to handle both point-to-point queuing and\nordered log streaming in a single platform, its decoupled architecture\nand its battle-tested reliability -- those workloads continue to run,\nand we continue to invest in them.",[48,3204,3205],{},"But here's what Pulsar taught us -- and this is the part that matters\nmost:",[48,3207,3208],{},[44,3209,3210],{},"The future of streaming isn't about which protocol wins. It's about\nwhat's underneath the protocol.",[48,3212,3213],{},"Pulsar showed us that compute-storage separation changes everything.\nThat decoupling the broker from its storage unlocks a fundamentally\ndifferent operational model --- one where brokers are stateless,\nelastic, and cheap. We've been operating stateless, storage-separated\nstreaming at scale for over seven years through Pulsar --- long before\n\"diskless Kafka\" became a trend. 
That operational experience is baked\ninto every layer of Ursa and UFK.",[48,3215,3216],{},"But Ursa taught us something else, too --- something even more powerful.\nWhen we built Ursa and moved streaming data directly into open lakehouse\nformats, we discovered that lakehouse-native storage doesn't just save\nmoney. It completely dissolves the boundary between streaming and\nanalytics, unifying it all into a single, queryable system.",[48,3218,3219,3221],{},[44,3220,3111],{}," will show that any streaming protocol -- even\nthe world's most popular one -- can benefit from that foundation.",[48,3223,3224],{},"Pulsar remains at the heart of StreamNative --- it's the foundation\nthat taught us everything about cloud-native streaming. But we are not a\nPulsar company. We're not a Kafka company either. There are no such\ncompanies anymore - what truly matters is what is behind the protocol.",[48,3226,3227,3228,3231,3232,3235],{},"Instead, we define ourselves as a ",[44,3229,3230],{},"Lakestream company",". We ship a\n",[44,3233,3234],{},"streaming-meets-lakehouse solution"," that now speaks both protocols\nnatively. UFK is how we bring our lakestream vision to the Kafka world.",[40,3237,3239],{"id":3238},"try-it","Try It",[48,3241,3242,3243,3245],{},"UFK will be available as a Native Kafka service on StreamNative Cloud.\nIt will enter ",[44,3244,2896],{}," soon.",[48,3247,3248,3249,3258],{},"Here's the deal: ",[44,3250,3251,3252,3257],{},"anyone who ",[55,3253,3256],{"href":3254,"rel":3255},"https:\u002F\u002Fstreamnative.io\u002Fnative-kafka-service-waitlist",[264],"enrolls\nnow","\nwill receive $1,000 in credits"," to use exclusively for Kafka clusters\non StreamNative Cloud. No strings attached. Just sign up, spin up a\nNative Kafka cluster, point your existing Kafka clients at it, and see\nwhat happens.",[48,3260,3261],{},"What you'll find:",[321,3263,3264,3270,3276],{},[324,3265,3266,3269],{},[44,3267,3268],{},"Your existing Kafka clients, tools, and workflows"," -- they all\njust work with best-in-class support. Zero code changes.",[324,3271,3272,3275],{},[44,3273,3274],{},"Your Kafka topics"," -- they're now also Iceberg and Delta Lake\ntables, queryable from Spark, Trino, Snowflake, and Databricks.",[324,3277,3278,3281],{},[44,3279,3280],{},"Your infrastructure bill, transformed"," -- it's about to get a\nlot smaller.",[48,3283,3284],{},"If you use Pulsar, we are a Pulsar company. If you use Kafka, we're a\nKafka company, too. Come see what that means.",[48,3286,3287],{},[36,3288,3289],{},"Next week, we'll share more stories behind UFK. 
Stay tuned.",{"title":18,"searchDepth":19,"depth":19,"links":3291},[3292,3293,3294,3295,3296,3297,3298],{"id":3038,"depth":19,"text":3039},{"id":3057,"depth":19,"text":3058},{"id":3104,"depth":19,"text":3105},{"id":3125,"depth":19,"text":3126},{"id":3148,"depth":19,"text":3149},{"id":3192,"depth":19,"text":3193},{"id":3238,"depth":19,"text":3239},"2026-04-01","StreamNative announces Ursa For Kafka (UFK), a native Kafka service built on Apache Kafka 4.2 extended with the Ursa lakehouse storage engine, offering up to 95% cost savings and seamless Iceberg\u002FDelta Lake integration.","\u002Fimgs\u002Fblogs\u002Fwe-are-a-kafka-company-too-cover.png",{},"\u002Fblog\u002Fwe-are-a-kafka-company-too",{"title":3006,"description":3300},"blog\u002Fwe-are-a-kafka-company-too",[799,1332,800,1330,1331],"dU5PaTNVO4RX2dIM2PrXM8h-FV8B23-KC0-863ls1jg",{"id":3309,"title":3310,"authors":3311,"body":3312,"category":3550,"createdAt":290,"date":3551,"description":3552,"extension":8,"featured":294,"image":3553,"isDraft":294,"link":290,"meta":3554,"navigation":7,"order":296,"path":3555,"readingTime":3556,"relatedResources":290,"seo":3557,"stem":3558,"tags":3559,"__hash__":3561},"blogs\u002Fblog\u002Fintroducing-api-key-v2-simplified-authentication-for-streamnative-cloud.md","Introducing API Key v2: Simplified Authentication for StreamNative Cloud",[311],{"type":15,"value":3313,"toc":3540},[3314,3317,3324,3328,3331,3334,3348,3351,3355,3362,3365,3368,3373,3381,3386,3397,3400,3404,3407,3413,3419,3425,3431,3435,3438,3441,3449,3452,3456,3459,3462,3465,3476,3479,3482,3485,3499,3503,3506,3520,3524,3527,3530,3534,3537],[48,3315,3316],{},"At StreamNative, we continuously invest in improving the developer and operator experience across our cloud platform. As organizations scale their real-time data platforms, managing access securely and efficiently becomes increasingly important.",[48,3318,3319,3320,3323],{},"Today, we are excited to introduce ",[44,3321,3322],{},"API Key v2",", a major improvement to authentication in StreamNative Cloud that simplifies credential management, strengthens security controls, and lays the foundation for future platform capabilities. API Key v2 will be available starting March 17.",[40,3325,3327],{"id":3326},"the-challenge-with-instance-scoped-api-keys","The challenge with instance-scoped API keys",[48,3329,3330],{},"Until now, StreamNative Cloud has supported API Key v1, where API keys are scoped at the instance level. While this model works well for smaller environments, it can become operationally complex as organizations scale across multiple instances and clusters.",[48,3332,3333],{},"Customers told us they wanted:",[321,3335,3336,3339,3342,3345],{},[324,3337,3338],{},"A simpler way to manage authentication across their organization",[324,3340,3341],{},"Fewer credentials to manage as their environments grow",[324,3343,3344],{},"Better alignment with RBAC-driven security models",[324,3346,3347],{},"Expanded API key usage beyond just Pulsar clusters",[48,3349,3350],{},"API Key v2 was designed to address these needs.",[40,3352,3354],{"id":3353},"introducing-api-key-v2-organization-level-authentication","Introducing API Key v2: Organization-level authentication",[48,3356,3357,3358,3361],{},"With API Key v2, API keys are no longer tied to a specific instance. 
Instead, they operate at the ",[44,3359,3360],{},"organization level",", significantly simplifying authentication management.",[48,3363,3364],{},"Authorization continues to be enforced through role-based access control (RBAC), ensuring users and services still have precise, fine-grained permissions without requiring multiple instance-specific keys.",[48,3366,3367],{},"API Key v2 also expands API key authentication beyond the data plane.",[48,3369,3370],{},[44,3371,3372],{},"Previously:",[321,3374,3375,3378],{},[324,3376,3377],{},"API keys could only be used for Pulsar cluster authentication",[324,3379,3380],{},"Accessing the StreamNative Cloud API (control plane) required OAuth",[48,3382,3383],{},[44,3384,3385],{},"With API Key v2:",[321,3387,3388,3391,3394],{},[324,3389,3390],{},"API keys can be used for both Pulsar clusters and Cloud APIs",[324,3392,3393],{},"This enables more consistent automation workflows",[324,3395,3396],{},"It simplifies integration with platform tooling",[48,3398,3399],{},"Support for API key authentication with snctl and the Terraform provider for Cloud API access will be introduced in an upcoming release.",[40,3401,3403],{"id":3402},"key-benefits-of-api-key-v2","Key benefits of API Key v2",[48,3405,3406],{},"API Key v2 delivers several important improvements:",[48,3408,3409,3412],{},[44,3410,3411],{},"Organization-level scope","\nManage authentication once at the organization level instead of per instance.",[48,3414,3415,3418],{},[44,3416,3417],{},"Simplified access management","\nCreate and manage fewer credentials while maintaining strong access controls.",[48,3420,3421,3424],{},[44,3422,3423],{},"Improved operational efficiency","\nReduce operational overhead when managing multiple clusters.",[48,3426,3427,3430],{},[44,3428,3429],{},"Foundation for future enhancements","\nEnables upcoming security, automation, and platform capabilities.",[40,3432,3434],{"id":3433},"what-this-means-for-new-organizations","What this means for new organizations",[48,3436,3437],{},"Organizations created after March 17 will automatically use API Key v2. Any Pulsar clusters created after this date will also use API Key v2 and will no longer require instance-scoped API keys.",[48,3439,3440],{},"It is important to note:",[321,3442,3443,3446],{},[324,3444,3445],{},"New API keys created under API Key v2 will not work with older Pulsar clusters until those clusters are upgraded.",[324,3447,3448],{},"Existing clusters can continue using their current API keys until they are upgraded.",[48,3450,3451],{},"This approach ensures a smooth transition while maintaining backward compatibility.",[40,3453,3455],{"id":3454},"what-this-means-for-existing-organizations","What this means for existing organizations",[48,3457,3458],{},"Existing organizations currently using API Key v1 will continue to operate without disruption. All existing API keys will remain fully functional.",[48,3460,3461],{},"Over the coming months, StreamNative will gradually upgrade existing organizations to API Key v2. 
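As one illustration of data-plane authentication with an API key, the sketch below uses the Apache Pulsar Python client. The service URL and key are placeholders, and presenting the key as a bearer token via token authentication is an assumption here -- the connection details generated in the StreamNative Cloud console are the authoritative reference for your cluster.

```python
import pulsar

# Sketch only: service URL and key are placeholders. The assumption is that
# the API key is supplied as a bearer token via token authentication; check
# your cluster's connection details in the console for the exact scheme.
API_KEY = "<your-api-key-v2>"

client = pulsar.Client(
    "pulsar+ssl://your-cluster.example.streamnative.cloud:6651",
    authentication=pulsar.AuthenticationToken(API_KEY),
)
producer = client.create_producer("persistent://public/default/demo-topic")
producer.send(b"hello from an API Key v2 authenticated client")
client.close()
```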
Customers will receive advance communication before their organization is upgraded.",[48,3463,3464],{},"The upgrade process includes:",[321,3466,3467,3470,3473],{},[324,3468,3469],{},"Enabling API Key v2 at the organization level",[324,3471,3472],{},"Upgrading existing Pulsar clusters",[324,3474,3475],{},"Ensuring continuity of existing API keys",[48,3477,3478],{},"This process is handled by StreamNative engineering and requires no action from customers.",[48,3480,3481],{},"Customers who want to accelerate their upgrade timeline can contact StreamNative Support.",[48,3483,3484],{},"After your organization is upgraded:",[321,3486,3487,3490,3493,3496],{},[324,3488,3489],{},"New Pulsar clusters will automatically use API Key v2",[324,3491,3492],{},"Existing API keys will continue to work across all clusters",[324,3494,3495],{},"New API keys created after the upgrade will only work with upgraded clusters",[324,3497,3498],{},"The StreamNative Cloud console will show which API key version your organization is using",[40,3500,3502],{"id":3501},"migration-considerations","Migration considerations",[48,3504,3505],{},"When planning your transition to API Key v2, keep the following in mind:",[321,3507,3508,3511,3514,3517],{},[324,3509,3510],{},"Existing API keys will continue to function after migration",[324,3512,3513],{},"Migration can be completed without service disruption",[324,3515,3516],{},"New API keys created under API Key v2 will not work with clusters that have not yet been upgraded",[324,3518,3519],{},"Older clusters must be upgraded to work with the new API Key v2 credentials",[40,3521,3523],{"id":3522},"building-the-foundation-for-the-future","Building the foundation for the future",[48,3525,3526],{},"API Key v2 is an important step toward making StreamNative Cloud easier to operate at scale while strengthening security and improving automation capabilities. This enhancement reflects our ongoing commitment to simplifying platform operations while enabling enterprise-grade governance.",[48,3528,3529],{},"We will continue expanding authentication capabilities as part of our broader roadmap to support secure, AI-ready, and automation-driven data platforms.",[40,3531,3533],{"id":3532},"learn-more","Learn more",[48,3535,3536],{},"Learn more about API Key v2 authentication capabilities. 
If you are interested in upgrading or have questions, please contact StreamNative Support or your account team.",[48,3538,3539],{},"We appreciate your continued partnership and look forward to continuing to improve the security and usability of StreamNative Cloud.",{"title":18,"searchDepth":19,"depth":19,"links":3541},[3542,3543,3544,3545,3546,3547,3548,3549],{"id":3326,"depth":19,"text":3327},{"id":3353,"depth":19,"text":3354},{"id":3402,"depth":19,"text":3403},{"id":3433,"depth":19,"text":3434},{"id":3454,"depth":19,"text":3455},{"id":3501,"depth":19,"text":3502},{"id":3522,"depth":19,"text":3523},{"id":3532,"depth":19,"text":3533},"StreamNative Cloud","2026-03-17","StreamNative introduces API Key v2, a major improvement to authentication in StreamNative Cloud that simplifies credential management, strengthens security controls, and lays the foundation for future platform capabilities.","\u002Fimgs\u002Fblogs\u002Fapi-key-v2-blog-thumbnail.png",{},"\u002Fblog\u002Fintroducing-api-key-v2-simplified-authentication-for-streamnative-cloud","8 min read",{"title":3310,"description":3552},"blog\u002Fintroducing-api-key-v2-simplified-authentication-for-streamnative-cloud",[3550,3560],"API","d6T_z1vOg-mNN2UmaEiazOuRJP5PCH17HrqnGsske2M",{"id":3563,"title":3564,"authors":3565,"body":3566,"category":3550,"createdAt":290,"date":3980,"description":3981,"extension":8,"featured":294,"image":3982,"isDraft":294,"link":290,"meta":3983,"navigation":7,"order":296,"path":3984,"readingTime":3556,"relatedResources":290,"seo":3985,"stem":3986,"tags":3987,"__hash__":3990},"blogs\u002Fblog\u002Fannouncing-public-preview-of-the-streamnative-remote-mcp-server.md","Announcing Public Preview of the StreamNative Remote MCP Server",[311],{"type":15,"value":3567,"toc":3968},[3568,3571,3585,3592,3598,3608,3620,3627,3633,3640,3645,3649,3660,3665,3670,3684,3687,3709,3712,3715,3720,3726,3729,3734,3737,3743,3750,3764,3770,3773,3784,3790,3797,3800,3806,3813,3821,3824,3830,3837,3848,3855,3861,3864,3875,3881,3888,3899,3912,3923,3929,3932,3939,3959,3962],[48,3569,3570],{},"At StreamNative, we’re building toward a future where agentic AI systems can natively reason over, interact with, and operate real-time data infrastructure—safely, securely, and at scale.",[48,3572,3573,3574,3580,3581,3584],{},"As part of this journey, we introduced Agentic AI capabilities last year with the launch of our ",[55,3575,3577],{"href":3576},"\u002Fblog\u002Fintroducing-the-streamnative-mcp-server-connecting-streaming-data-to-ai-agents",[44,3578,3579],{},"Local MCP Server",", enabling AI agents to interact with StreamNative clusters through a standardized ",[44,3582,3583],{},"Model Context Protocol (MCP)"," interface.",[48,3586,3587,3588,3591],{},"Today, we’re taking the next step: the ",[44,3589,3590],{},"Public Preview of the StreamNative Remote MCP Server"," is now available out of the box in StreamNative Cloud.",[40,3593,3595],{"id":3594},"what-is-the-streamnative-remote-mcp-server",[44,3596,3597],{},"What Is the StreamNative Remote MCP Server?",[48,3599,3600,3601,3604,3605,190],{},"The ",[44,3602,3603],{},"Remote MCP Server"," exposes a secure, hosted MCP endpoint that allows AI agents and tools to interact with StreamNative clusters ",[44,3606,3607],{},"without requiring a locally deployed MCP process",[321,3609,3610,3615],{},[324,3611,3612,3614],{},[44,3613,3579],{},": Runs alongside your tooling or environment",[324,3616,3617,3619],{},[44,3618,3603],{},": Fully managed by StreamNative and accessible remotely via StreamNative 
Cloud",[48,3621,3622,3623,3626],{},"This dramatically lowers the barrier to integrating ",[44,3624,3625],{},"AI agents, copilots, and automation frameworks"," with StreamNative’s data streaming platform.",[40,3628,3630],{"id":3629},"managing-preview-features-in-streamnative-cloud",[44,3631,3632],{},"Managing Preview Features in StreamNative Cloud",[48,3634,3635,3636,3639],{},"We’ve introduced a new ",[44,3637,3638],{},"Preview Features"," section in StreamNative Cloud that gives users a simple, self-serve way to discover, enable, or disable preview capabilities directly from the console. This makes it easier to try new features at your own pace while maintaining control over what’s active in each environment. The screenshot below highlights how the StreamNative Remote MCP Server can be enabled directly from the Previews section with a single toggle.",[48,3641,3642],{},[384,3643],{"alt":18,"src":3644},"\u002Fimgs\u002Fblogs\u002F698b2f3494a17e215c0d36ca_7acec952.png",[40,3646,3648],{"id":3647},"connect-your-ai-tools","Connect Your AI Tools",[48,3650,3651,3652,3655,3656,3659],{},"Each StreamNative cluster exposes a ",[44,3653,3654],{},"Remote MCP endpoint"," along with a guided ",[44,3657,3658],{},"Connect Your AI Tool"," experience in the console. You can go from zero to connected in under a minute.",[48,3661,3662],{},[384,3663],{"alt":18,"src":3664},"\u002Fimgs\u002Fblogs\u002F698b2f3494a17e215c0d36c7_67526ebe.png",[48,3666,3667],{},[44,3668,3669],{},"1. Choose an authentication method",[321,3671,3672,3678],{},[324,3673,3674,3677],{},[44,3675,3676],{},"Interactive (OAuth 2.0)"," – Browser-based sign-in, best suited for development and experimentation",[324,3679,3680,3683],{},[44,3681,3682],{},"API Key"," – Service account–based authentication, ideal for automation and non-interactive workflows",[48,3685,3686],{},"**2. Select your AI tool or environment\n** StreamNative provides pre-generated commands for popular MCP-compatible tools, including:",[321,3688,3689,3694,3699,3704],{},[324,3690,3691],{},[44,3692,3693],{},"Claude Code",[324,3695,3696],{},[44,3697,3698],{},"Cursor",[324,3700,3701],{},[44,3702,3703],{},"VS Code",[324,3705,3706],{},[44,3707,3708],{},"cURL",[48,3710,3711],{},"**3. Copy and run the generated command\n**Based on your selected authentication method and tool, StreamNative generates a ready-to-run command that you can execute in your terminal to add the Remote MCP Server to your chosen environment.",[48,3713,3714],{},"Once connected, your AI tool can immediately start interacting with the StreamNative cluster through the MCP interface.",[48,3716,3717],{},[384,3718],{"alt":18,"src":3719},"\u002Fimgs\u002Fblogs\u002F698b2f3494a17e215c0d36cd_6d4694ef.png",[40,3721,3723],{"id":3722},"mcp-access-tool-permissions",[44,3724,3725],{},"MCP Access & Tool Permissions",[48,3727,3728],{},"Access Mode defines how much control the Remote MCP Server has over the cluster. 
Read-Only allows MCP clients to safely view metadata, topics, messages, and metrics without making changes, while Read\u002FWrite enables full operational access, including creating and managing topics, producing messages, and updating schemas.",[48,3730,3731],{},[384,3732],{"alt":18,"src":3733},"\u002Fimgs\u002Fblogs\u002F698b4c32011197da0a3b4573_5e87465b.png",[48,3735,3736],{},"Allowed Tools lets you precisely scope what MCP clients can do by enabling or disabling specific MCP tools (for example, topics, namespaces, tenants, or brokers), ensuring least-privilege access while supporting targeted agent workflows.",[40,3738,3740],{"id":3739},"what-can-you-do-with-the-remote-mcp-server",[44,3741,3742],{},"What Can You Do With the Remote MCP Server?",[48,3744,3745,3746,3749],{},"The Remote MCP Server supports ",[44,3747,3748],{},"most of the same cluster-level capabilities"," available in the Local MCP Server, enabling AI agents to:",[321,3751,3752,3755,3758,3761],{},[324,3753,3754],{},"Discover cluster metadata and configuration",[324,3756,3757],{},"Inspect topics, subscriptions, and schemas",[324,3759,3760],{},"Observe operational state and runtime details",[324,3762,3763],{},"Interact with cluster resources using MCP-standard tools",[48,3765,3766,3767],{},"***Example: ***",[36,3768,3769],{},"Ask your AI assistant “What topics have consumer lag?” or “Show me the schema for my orders topic” and get an immediate, contextual answer—no CLI commands, no dashboard clicks.",[48,3771,3772],{},"These capabilities make it possible to build AI-driven workflows such as:",[321,3774,3775,3778,3781],{},[324,3776,3777],{},"Conversational cluster exploration and diagnostics",[324,3779,3780],{},"Automated environment introspection for agents and copilots",[324,3782,3783],{},"Intelligent operational assistants for streaming platforms",[40,3785,3787],{"id":3786},"cluster-level-availability",[44,3788,3789],{},"Cluster-Level Availability",[48,3791,3792,3793,3796],{},"In this Public Preview, the Remote MCP Server is exposed at the ",[44,3794,3795],{},"cluster level",". Every StreamNative cluster—regardless of deployment type—provides its own MCP endpoint. Supported deployment models include Serverless, Dedicated, and BYOC (Bring Your Own Cloud).",[48,3798,3799],{},"Each cluster acts as a well-defined MCP boundary, making it easy for agents to reason about and interact with the specific streaming environment they’re targeting. Organization-level support with cross-cluster visibility is on the roadmap.",[40,3801,3803],{"id":3802},"current-limitations",[44,3804,3805],{},"Current Limitations",[48,3807,3808,3809,3812],{},"To ensure safety and control during Public Preview, some ",[44,3810,3811],{},"destructive or high-risk operations"," are intentionally restricted, including actions such as:",[321,3814,3815,3818],{},[324,3816,3817],{},"Deleting clusters",[324,3819,3820],{},"Other privileged administrative operations",[48,3822,3823],{},"We’ll continue to evolve the supported surface area based on customer feedback and usage patterns.",[40,3825,3827],{"id":3826},"why-this-matters-for-agentic-ai",[44,3828,3829],{},"Why This Matters for Agentic AI",[48,3831,3832,3833,3836],{},"Agentic systems need ",[44,3834,3835],{},"reliable, real-time context"," about the systems they operate on. 
With the Remote MCP Server:",[321,3838,3839,3842,3845],{},[324,3840,3841],{},"AI agents no longer need direct infrastructure access",[324,3843,3844],{},"MCP endpoints are secure, managed, and standardized",[324,3846,3847],{},"StreamNative Cloud becomes natively “AI-addressable” — meaning AI tools can discover and interact with your streaming infrastructure through a standard protocol, just like they would with any other API",[48,3849,3850,3851,3854],{},"This aligns with our broader vision of ",[44,3852,3853],{},"StreamNative as an AI-ready streaming platform",", where real-time data, governance, and agentic workflows come together seamlessly.",[40,3856,3858],{"id":3857},"whats-coming-next",[44,3859,3860],{},"What’s Coming Next",[48,3862,3863],{},"Looking ahead, we plan to extend Remote MCP Server support to the organization level, enabling:",[321,3865,3866,3869,3872],{},[324,3867,3868],{},"Cross-cluster visibility for agents",[324,3870,3871],{},"Organization-wide governance and policy enforcement",[324,3873,3874],{},"Listings on the Databricks MCP Marketplace, Docker MCP Catalog, and other marketplace directories",[40,3876,3878],{"id":3877},"get-started-today",[44,3879,3880],{},"Get Started Today",[48,3882,3883,3884,3887],{},"The StreamNative Remote MCP Server is available now in ",[44,3885,3886],{},"Public Preview"," at no additional cost. Enable it from your StreamNative Cloud Console and connect your first AI tool in minutes.",[48,3889,3890,3891,3898],{},"**➤  **Watch the ",[55,3892,3895],{"href":3893,"rel":3894},"https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=LvrP8TItgC0",[264],[44,3896,3897],{},"demo video"," for a quick start",[48,3900,3901,3904,3905],{},[44,3902,3903],{},"➤  Enable it now"," in the ",[55,3906,3909],{"href":3907,"rel":3908},"https:\u002F\u002Fconsole.streamnative.cloud",[264],[44,3910,3911],{},"StreamNative Cloud Console",[48,3913,3914,3915,3922],{},"**➤  **",[55,3916,3919],{"href":3917,"rel":3918},"https:\u002F\u002Fdocs.streamnative.io\u002Fcloud\u002Forca-engine\u002Fsn-remote-mcp\u002Fremote-mcp-access",[264],[44,3920,3921],{},"Read the documentation"," for a full walkthrough",[48,3924,3925,3928],{},[44,3926,3927],{},"➤  Explore the open-source MCP server"," on GitHub",[48,3930,3931],{},"‍",[3933,3934,3936],"h4",{"id":3935},"️-want-to-see-it-in-action",[44,3937,3938],{},"🎙️  Want to see it in action?",[48,3940,3941,3942,3945,3946],{},"Join us on ",[44,3943,3944],{},"February 26"," for a live webinar where we’ll demo the full Remote MCP Server setup, walk through governance controls, and show real-world use cases with Claude Code and Cursor. 
",[44,3947,3948],{},[2628,3949,3950],{},[44,3951,3952],{},[55,3953,3956],{"href":3954,"rel":3955},"https:\u002F\u002Fhs.streamnative.io\u002Fwebinar-introducing-the-streamnative-remote-mcp-server",[264],[44,3957,3958],{},"Register here",[48,3960,3961],{},"We’re excited to see what you build—and how agentic AI transforms the way teams interact with real-time data streaming platforms.",[48,3963,3964,3967],{},[44,3965,3966],{},"Happy streaming."," 🚀",{"title":18,"searchDepth":19,"depth":19,"links":3969},[3970,3971,3972,3973,3974,3975,3976,3977,3978,3979],{"id":3594,"depth":19,"text":3597},{"id":3629,"depth":19,"text":3632},{"id":3647,"depth":19,"text":3648},{"id":3722,"depth":19,"text":3725},{"id":3739,"depth":19,"text":3742},{"id":3786,"depth":19,"text":3789},{"id":3802,"depth":19,"text":3805},{"id":3826,"depth":19,"text":3829},{"id":3857,"depth":19,"text":3860},{"id":3877,"depth":19,"text":3880},"2026-02-10","StreamNative announces the public preview of its Remote MCP Server, enabling AI agents to securely interact with real-time streaming clusters in StreamNative Cloud.","\u002Fimgs\u002Fblogs\u002F698b2a240cbe95d63a4024e3_MCP-Server-Blog-Thumbnail.png",{},"\u002Fblog\u002Fannouncing-public-preview-of-the-streamnative-remote-mcp-server",{"title":3564,"description":3981},"blog\u002Fannouncing-public-preview-of-the-streamnative-remote-mcp-server",[3988,3989,3550],"Agentic AI","MCP","D_vNoX4AcEU2hG4IqXXkrzEPfhvSwjsBcoLaZzbBs08",{"id":3992,"title":3993,"authors":3994,"body":3995,"category":3550,"createdAt":290,"date":4470,"description":4471,"extension":8,"featured":294,"image":4472,"isDraft":294,"link":290,"meta":4473,"navigation":7,"order":296,"path":4474,"readingTime":4475,"relatedResources":290,"seo":4476,"stem":4477,"tags":4478,"__hash__":4479},"blogs\u002Fblog\u002Fintroducing-organization-settings-in-the-streamnative-cloud-console.md","Introducing Organization Settings in the StreamNative Cloud Console",[311],{"type":15,"value":3996,"toc":4447},[3997,4012,4019,4025,4032,4043,4054,4060,4063,4074,4077,4083,4086,4089,4118,4121,4127,4130,4136,4141,4159,4162,4168,4175,4183,4188,4191,4197,4208,4214,4217,4244,4247,4253,4256,4262,4274,4280,4296,4302,4308,4313,4316,4322,4327,4333,4345,4351,4362,4373,4379,4385,4396,4402,4413,4419,4430,4434,4441,4444],[48,3998,3999,4000,4003,4004,4007,4008,4011],{},"As organizations scale their use of StreamNative Cloud, the distinction between ",[44,4001,4002],{},"day-to-day operational resources"," and ",[44,4005,4006],{},"organization-level administration"," becomes increasingly important. Until now, some organization-scoped resources—most notably ",[36,4009,4010],{},"Secrets","—appeared alongside instance-level resources in the console. This blurred ownership boundaries, increased cognitive load, and made it harder for users to quickly find what they needed.",[48,4013,4014,4015,4018],{},"To address this, we’re introducing a new ",[44,4016,4017],{},"Organization Settings"," experience in the StreamNative Cloud console. 
This update brings a clearer ownership model, cleaner navigation, and a scalable foundation for future organization-level capabilities.",[40,4020,4022],{"id":4021},"why-organization-settings",[44,4023,4024],{},"Why Organization Settings?",[48,4026,4027,4028,4031],{},"At a high level, this change is about ",[44,4029,4030],{},"clarity and focus",":",[321,4033,4034,4037,4040],{},[324,4035,4036],{},"Organization-scoped resources should live in one predictable place",[324,4038,4039],{},"Runtime and operational workflows should stay focused on daily work",[324,4041,4042],{},"The console should reflect how resources are actually owned and managed",[48,4044,4045,4046,4049,4050,4053],{},"With Organization Settings, the StreamNative Cloud console now clearly separates the ",[44,4047,4048],{},"Org Working Area"," (where you run and operate services) from ",[44,4051,4052],{},"Org Settings"," (where you administer and govern the organization).",[40,4055,4057],{"id":4056},"whats-changing",[44,4058,4059],{},"What’s Changing",[48,4061,4062],{},"This update introduces three major improvements:",[1666,4064,4065,4068,4071],{},[324,4066,4067],{},"A simplified global profile dropdown",[324,4069,4070],{},"A cleaner Org Working Area focused on runtime resources",[324,4072,4073],{},"A brand-new Organization Settings area for administration",[48,4075,4076],{},"Let’s take a closer look at each.",[40,4078,4080],{"id":4079},"simplified-profile-dropdown-global",[44,4081,4082],{},"Simplified Profile Dropdown (Global)",[48,4084,4085],{},"We’ve streamlined the profile dropdown to make it faster and more predictable to use, while keeping essential actions easily accessible.",[48,4087,4088],{},"The updated profile dropdown now includes:",[321,4090,4091,4097],{},[324,4092,4093,4096],{},[44,4094,4095],{},"Current organization information"," Organization name",[324,4098,4099,4100,4115],{},"Organization ID (with copy support)\n",[44,4101,4102,4103,4108,4109,4114],{},"Switch Organization",[44,4104,4105],{},[44,4106,4107],{},"Notifications","Documentation",[44,4110,4111],{},[44,4112,4113],{},"Support","Logout",[384,4116],{"alt":18,"src":4117},"\u002Fimgs\u002Fblogs\u002F6979c0129824a4a0fdc61ef8_3e974462.png",[48,4119,4120],{},"This simplification removes unnecessary clutter and provides a consistent entry point for global actions across the console.",[40,4122,4124],{"id":4123},"org-working-area-focused-on-day-to-day-resources",[44,4125,4126],{},"Org Working Area: Focused on Day-to-Day Resources",[48,4128,4129],{},"The left-hand navigation in the Org Working Area is now intentionally focused on resources used for daily operations.",[32,4131,4133],{"id":4132},"resources",[44,4134,4135],{},"Resources",[48,4137,3600,4138,4140],{},[44,4139,4135],{}," section continues to be the primary workspace for developers and operators and includes:",[321,4142,4143,4148,4154],{},[324,4144,4145],{},[44,4146,4147],{},"Instances",[324,4149,4150,4153],{},[44,4151,4152],{},"UniLink"," (Public Preview)",[324,4155,4156],{},[44,4157,4158],{},"Cloud Environments",[48,4160,4161],{},"These are the resources you interact with regularly to build, run, and operate streaming workloads.",[32,4163,4165],{"id":4164},"admin-section",[44,4166,4167],{},"ADMIN Section",[48,4169,4170,4171,4174],{},"To clearly distinguish administrative workflows from runtime work, we’ve introduced a dedicated ",[44,4172,4173],{},"ADMIN"," section in the left navigation:",[321,4176,4177],{},[324,4178,4179,4182],{},[44,4180,4181],{},"Settings"," – the entry point into the new 
Organization Settings area",[48,4184,4185],{},[384,4186],{"alt":18,"src":4187},"\u002Fimgs\u002Fblogs\u002F6979c0129824a4a0fdc61ef2_c0738c1b.png",[48,4189,4190],{},"This separation ensures that org-level administration is always available—but never in the way of day-to-day work.",[40,4192,4194],{"id":4193},"organization-settings-a-dedicated-admin-experience",[44,4195,4196],{},"Organization Settings: A Dedicated Admin Experience",[48,4198,4199,4200,4203,4204,4207],{},"Clicking ",[44,4201,4202],{},"ADMIN → Settings"," takes you into a new ",[44,4205,4206],{},"Organization Settings mode",", designed specifically for organization-level administration.",[32,4209,4211],{"id":4210},"a-dedicated-settings-shell",[44,4212,4213],{},"A Dedicated Settings Shell",[48,4215,4216],{},"Once in Organization Settings, you’ll notice several UI cues that clearly indicate you’re in an administrative context:",[321,4218,4219,4228,4237],{},[324,4220,4221,4222,4225,4226],{},"A ",[44,4223,4224],{},"top navigation bar"," with the context set to ",[36,4227,4181],{},[324,4229,4221,4230,4233,4234,4236],{},[44,4231,4232],{},"breadcrumb"," showing ",[36,4235,4181],{}," as the current location",[324,4238,4239,4240,4243],{},"A clear ",[44,4241,4242],{},"Exit Settings"," button that takes you back to the Org Working Area",[48,4245,4246],{},"This makes it easy to move between operational work and administrative tasks without losing context.",[32,4248,4250],{"id":4249},"grouped-settings-navigation",[44,4251,4252],{},"Grouped Settings Navigation",[48,4254,4255],{},"Organization Settings uses a structured, grouped left-hand navigation that scales as new capabilities are added.",[3933,4257,4259],{"id":4258},"general",[44,4260,4261],{},"General",[321,4263,4264,4269],{},[324,4265,4266],{},[44,4267,4268],{},"Billing & Payment",[324,4270,4271],{},[44,4272,4273],{},"Organization Usage",[3933,4275,4277],{"id":4276},"access",[44,4278,4279],{},"Access",[321,4281,4282,4287,4292],{},[324,4283,4284],{},[44,4285,4286],{},"Users",[324,4288,4289],{},[44,4290,4291],{},"Service Accounts",[324,4293,4294],{},[44,4295,4279],{},[3933,4297,4299],{"id":4298},"security",[44,4300,4301],{},"Security",[321,4303,4304],{},[324,4305,4306],{},[44,4307,4010],{},[48,4309,4310],{},[384,4311],{"alt":18,"src":4312},"\u002Fimgs\u002Fblogs\u002F6979c0129824a4a0fdc61ef5_5cba1136.png",[48,4314,4315],{},"By grouping related capabilities together, Organization Settings makes it easier for org admins to discover and manage shared resources in one place.",[40,4317,4319],{"id":4318},"secrets-correct-ownership-clear-routing",[44,4320,4321],{},"Secrets: Correct Ownership, Clear Routing",[48,4323,4324,4325,190],{},"One of the most important changes in this update is the relocation of ",[44,4326,4010],{},[32,4328,4330],{"id":4329},"what-changed",[44,4331,4332],{},"What changed?",[321,4334,4335,4340],{},[324,4336,4337],{},[44,4338,4339],{},"Secrets have been removed from instance-level navigation",[324,4341,4342],{},[44,4343,4344],{},"Secrets now live under Organization Settings → Security",[32,4346,4348],{"id":4347},"why-this-matters",[44,4349,4350],{},"Why this matters",[48,4352,4353,4354,4357,4358,4361],{},"Secrets are organization-scoped by nature. While they may be ",[36,4355,4356],{},"referenced"," by instances or pool members, they are not ",[36,4359,4360],{},"owned"," by them. 
Moving Secrets into Organization Settings:",[321,4363,4364,4367,4370],{},[324,4365,4366],{},"Reflects the correct ownership model",[324,4368,4369],{},"Prevents confusion about where Secrets should be managed",[324,4371,4372],{},"Aligns the UI with how Secrets are actually used across the organization",[40,4374,4376],{"id":4375},"what-this-means-for-you",[44,4377,4378],{},"What This Means for You",[32,4380,4382],{"id":4381},"for-organization-admins",[44,4383,4384],{},"For Organization Admins",[321,4386,4387,4390,4393],{},[324,4388,4389],{},"All org-level resources are now centralized in one place",[324,4391,4392],{},"Administrative workflows are easier to find and manage",[324,4394,4395],{},"The UI clearly reflects responsibility and ownership",[32,4397,4399],{"id":4398},"for-developers-and-operators",[44,4400,4401],{},"For Developers and Operators",[321,4403,4404,4407],{},[324,4405,4406],{},"Cleaner navigation with fewer admin-only objects in daily workflows",[324,4408,4409,4410],{},"A predictable structure: ",[44,4411,4412],{},"Working Area for runtime work, Settings for administration",[32,4414,4416],{"id":4415},"for-the-platform",[44,4417,4418],{},"For the Platform",[321,4420,4421,4424,4427],{},[324,4422,4423],{},"Reduced cognitive load and fewer “where do I find this?” moments",[324,4425,4426],{},"A scalable information architecture for future org-level features",[324,4428,4429],{},"A stronger foundation for governance and administration as organizations grow",[40,4431,4432],{"id":2146},[44,4433,2149],{},[48,4435,4436,4437,4440],{},"The new Organization Settings experience is rolling out starting ",[44,4438,4439],{},"January 27th, 2026",". Once available, we recommend taking a few minutes to explore the updated navigation and familiarize yourself with the new layout.",[48,4442,4443],{},"As always, we welcome your feedback as you start using Organization Settings. Your input helps us continue improving the StreamNative Cloud experience.",[48,4445,4446],{},"Happy streaming!",{"title":18,"searchDepth":19,"depth":19,"links":4448},[4449,4450,4451,4452,4456,4460,4464,4469],{"id":4021,"depth":19,"text":4024},{"id":4056,"depth":19,"text":4059},{"id":4079,"depth":19,"text":4082},{"id":4123,"depth":19,"text":4126,"children":4453},[4454,4455],{"id":4132,"depth":279,"text":4135},{"id":4164,"depth":279,"text":4167},{"id":4193,"depth":19,"text":4196,"children":4457},[4458,4459],{"id":4210,"depth":279,"text":4213},{"id":4249,"depth":279,"text":4252},{"id":4318,"depth":19,"text":4321,"children":4461},[4462,4463],{"id":4329,"depth":279,"text":4332},{"id":4347,"depth":279,"text":4350},{"id":4375,"depth":19,"text":4378,"children":4465},[4466,4467,4468],{"id":4381,"depth":279,"text":4384},{"id":4398,"depth":279,"text":4401},{"id":4415,"depth":279,"text":4418},{"id":2146,"depth":19,"text":2149},"2026-01-28","Discover the new Organization Settings in the StreamNative Cloud Console. 
This update brings a clearer ownership model by separating operational workflows from organization-level administration, including the key relocation of Secrets to a dedicated admin area for improved governance and focus.","\u002Fimgs\u002Fblogs\u002F6979be9687c90d57e941a1a3_Organization-Settings-Blog-Thumbnail.png",{},"\u002Fblog\u002Fintroducing-organization-settings-in-the-streamnative-cloud-console","6 min read",{"title":3993,"description":4471},"blog\u002Fintroducing-organization-settings-in-the-streamnative-cloud-console",[3550],"43IjHMgBqQNP4ZPucfjaZ__aMnSvRglAhpEPxDuGrGw",{"id":4481,"title":4482,"authors":4483,"body":4485,"category":3550,"createdAt":290,"date":4705,"description":4706,"extension":8,"featured":294,"image":4707,"isDraft":294,"link":290,"meta":4708,"navigation":7,"order":296,"path":4709,"readingTime":3556,"relatedResources":290,"seo":4710,"stem":4711,"tags":4712,"__hash__":4713},"blogs\u002Fblog\u002Fstreamnative-ryft-real-time-iceberg-ingestion-intelligent-lakehouse-management.md","StreamNative + Ryft: Real-Time Iceberg Ingestion with Intelligent Lakehouse Management",[311,4484],"Yuval Yogev",{"type":15,"value":4486,"toc":4692},[4487,4501,4511,4517,4520,4523,4532,4538,4545,4549,4552,4567,4570,4574,4577,4603,4607,4610,4613,4618,4624,4628,4631,4634,4641,4645,4649,4652,4656,4659,4663,4666,4672,4684],[48,4488,4489,4490,4493,4494,4003,4497,4500],{},"Modern data platforms increasingly converge on ",[44,4491,4492],{},"open lakehouse architectures",", where real-time data ingestion, open table formats, and independent optimization layers work together. The combination of ",[44,4495,4496],{},"StreamNative",[44,4498,4499],{},"Ryft"," exemplifies this pattern - each system focusing on what it does best.",[48,4502,4503,4003,4505,4507,4508,4510],{},[44,4504,4496],{},[44,4506,4499],{}," address this end-to-end: StreamNative ensures reliable, low-latency ingestion of streaming data into ",[44,4509,1153],{},", while Ryft continuously manages those tables so they remain fast, cost-efficient, and compliant as they grow.",[40,4512,4514],{"id":4513},"the-challenge",[44,4515,4516],{},"The Challenge",[48,4518,4519],{},"Modern lakehouses face increasing pressure as data volumes grow and businesses demand lower latency for real-time, mission-critical analysis.",[48,4521,4522],{},"Handling this challenge requires a robust streaming system, capable of handling high throughput and ensuring reliable data ingestion at scale.",[48,4524,4525,4526,4531],{},"It also requires handling ",[55,4527,4530],{"href":4528,"rel":4529},"https:\u002F\u002Fwww.ryft.io\u002Fblog\u002Fstreaming-with-apache-iceberg-the-operational-problems-at-scale",[264],"challenges in the storage layer"," - the rapid generation of data and metadata files which can degrade performance if not managed effectively.",[40,4533,4535],{"id":4534},"architectural-overview",[44,4536,4537],{},"Architectural Overview",[48,4539,4540,4003,4542,4544],{},[44,4541,4496],{},[44,4543,4499],{}," work together to address those 2 challenges. 
This architecture offers a clean separation of concerns.",[32,4546,4548],{"id":4547},"streamnative-real-time-ingestion-into-iceberg","StreamNative: Real-Time Ingestion into Iceberg",[48,4550,4551],{},"StreamNative brings real-time streaming directly into the lakehouse by:",[321,4553,4554,4557,4564],{},[324,4555,4556],{},"Continuously ingesting high-volume, real-time Pulsar or Kafka topics",[324,4558,4559,4560,4563],{},"Writing data as ",[44,4561,4562],{},"Apache Iceberg tables",", rather than transient files",[324,4565,4566],{},"Enabling downstream analytics engines to query fresh data with minimal latency",[48,4568,4569],{},"This approach eliminates batch pipelines and ETL chains, making streaming data immediately available for analytics, AI, and operational use cases.",[32,4571,4573],{"id":4572},"ryft-optimizing-and-managing-iceberg-tables","Ryft: Optimizing and Managing Iceberg Tables",[48,4575,4576],{},"Once Iceberg tables are created by StreamNative, Ryft steps in to handle the heavy lifting of lakehouse operations, including:",[321,4578,4579,4585,4591,4597],{},[324,4580,4581,4584],{},[44,4582,4583],{},"Compaction and file optimization"," to improve query performance",[324,4586,4587,4590],{},[44,4588,4589],{},"Intelligent snapshot lifecycle"," to control table growth in streaming use cases by keeping daily or weekly snapshots",[324,4592,4593,4596],{},[44,4594,4595],{},"Data lifecycle management",", including data tiering and data retention for storage efficiency and compliance use cases",[324,4598,4599,4602],{},[44,4600,4601],{},"Governance & GDPR",": safe and efficient deletion of regulated or expired data",[32,4604,4606],{"id":4605},"flexible-and-open-lakehouse","Flexible and Open Lakehouse",[48,4608,4609],{},"This architecture supports any object storage and catalog - no need for heavy rewrites or migrations.",[48,4611,4612],{},"StreamNative and Ryft make your tables available and performant in real time, and your lakehouse remains fully open for any catalog and query engine.",[48,4614,4615],{},[384,4616],{"alt":18,"src":4617},"\u002Fimgs\u002Fblogs\u002F69704c0dcac67451d75efaaf_f1f341f9.png",[40,4619,4621],{"id":4620},"what-customers-get",[44,4622,4623],{},"What Customers Get",[32,4625,4627],{"id":4626},"real-time-data-without-long-term-degradation","Real-Time Data Without Long-Term Degradation",[48,4629,4630],{},"StreamNative writes streaming data directly into Iceberg tables at high throughput, making data immediately queryable. Ryft ensures those same tables stay fast and usable over time by performing intelligent compaction and snapshot management.",[48,4632,4633],{},"Since Ryft connects directly to the catalog and storage, it can be used together with StreamNative without changing any streaming pipelines.",[48,4635,4636,4637,4640],{},"Together, they form a ",[44,4638,4639],{},"low-latency, open lakehouse"," that is ready for fast AI and analytics use cases.",[32,4642,4644],{"id":4643},"key-use-cases","Key Use Cases",[3933,4646,4648],{"id":4647},"_1-real-time-analytics","1. Real-Time Analytics",[48,4650,4651],{},"Serve fresh, continuously updated data to analytical queries with low latency, enabling dashboards, alerting, and operational decision-making on live business events. Support high-concurrency access patterns without relying on precomputed aggregates or batch refresh cycles.",[3933,4653,4655],{"id":4654},"_2-ai-ml-pipelines","2. 
AI & ML Pipelines",[48,4657,4658],{},"Provide consistent, point-in-time datasets for feature engineering, model training, and evaluation, ensuring reproducibility across experiments. Enable reuse of the same data for offline training and online inference.",[3933,4660,4662],{"id":4661},"_3-production-workloads","3. Production Workloads",[48,4664,4665],{},"Power real-time, customer-facing applications such as personalization, recommendations, pricing, and fraud detection using shared lakehouse data as the system of record. Enable consistent, up-to-date data access across online services and batch pipelines without duplicating data into separate operational stores.",[40,4667,4669],{"id":4668},"final-takeaway",[44,4670,4671],{},"Final Takeaway",[48,4673,4674,4675,4003,4677,4683],{},"Together, ",[44,4676,4496],{},[55,4678,4681],{"href":4679,"rel":4680},"https:\u002F\u002Fryft.io",[264],[44,4682,4499],{}," enable real-time lakehouses that remain reliable at scale. Streaming data flows continuously into Apache Iceberg and becomes immediately available for analytics and AI, while the tables themselves are continuously kept performant, bounded, and compliant as data volumes grow. File layouts are optimized, snapshots and storage are controlled, and retention and GDPR policies are enforced at the table level, so teams can ingest aggressively without accumulating performance, cost, or governance debt over time.",[48,4685,4686,4691],{},[55,4687,4690],{"href":4688,"rel":4689},"https:\u002F\u002Fconsole.streamnative.cloud\u002Fsignup?from=site_sn-blog",[264],"Sign up for a trial ","and get started for free.",{"title":18,"searchDepth":19,"depth":19,"links":4693},[4694,4695,4700,4704],{"id":4513,"depth":19,"text":4516},{"id":4534,"depth":19,"text":4537,"children":4696},[4697,4698,4699],{"id":4547,"depth":279,"text":4548},{"id":4572,"depth":279,"text":4573},{"id":4605,"depth":279,"text":4606},{"id":4620,"depth":19,"text":4623,"children":4701},[4702,4703],{"id":4626,"depth":279,"text":4627},{"id":4643,"depth":279,"text":4644},{"id":4668,"depth":19,"text":4671},"2026-01-21","Enable a low-latency, open lakehouse with StreamNative and Ryft. Learn how they combine real-time Apache Iceberg ingestion with intelligent table management for peak performance, cost efficiency, and compliance in AI and analytics workloads.","\u002Fimgs\u002Fblogs\u002F69704a13cac67451d75dd9da_Blog-SN+Ryft.png",{},"\u002Fblog\u002Fstreamnative-ryft-real-time-iceberg-ingestion-intelligent-lakehouse-management",{"title":4482,"description":4706},"blog\u002Fstreamnative-ryft-real-time-iceberg-ingestion-intelligent-lakehouse-management",[800,1330,303],"LPFrGTSOZIl0nHdYVuNyHB0IVJ-ipnscs1-csz8E4Q4",{"id":4715,"title":4716,"authors":4717,"body":4718,"category":289,"createdAt":290,"date":5500,"description":5501,"extension":8,"featured":7,"image":5502,"isDraft":294,"link":290,"meta":5503,"navigation":7,"order":296,"path":5504,"readingTime":5505,"relatedResources":290,"seo":5506,"stem":5507,"tags":5508,"__hash__":5510},"blogs\u002Fblog\u002Fstreamnatives-2025-year-in-review.md","StreamNative’s 2025 Year in Review",[806],{"type":15,"value":4719,"toc":5490},[4720,4727,4733,4757,4776,4803,4834,4852,4858,4888,4918,4942,4948,4978,5013,5044,5050,5083,5110,5136,5142,5178,5200,5234,5240,5247,5310,5325,5341,5361,5367,5397,5435,5441,5475],[48,4721,4722,4723,4726],{},"Welcome to the new year! As we kick off 2026, we’re thrilled to take a moment to reflect on 2025—a year of remarkable growth, innovation, and community momentum at StreamNative. 
From major product milestones like ",[44,4724,4725],{},"Ursa Engine"," reaching general availability to breakthroughs in real-time AI integration, 2025 was a pivotal year that solidified StreamNative’s role at the forefront of lakehouse-native data streaming. In this review, we highlight the key product releases, community achievements, business developments, and events that defined our year – and share a glimpse of what’s ahead in 2026.",[40,4728,4730],{"id":4729},"ursa-engine-goes-ga-and-everywhere-lakehouse-native-streaming-at-scale",[44,4731,4732],{},"Ursa Engine Goes GA and Everywhere – Lakehouse-Native Streaming at Scale",[48,4734,4735,4736,4738,4739,4742,4743,4749,4750,4756],{},"2025 was a breakthrough year for ",[44,4737,4725],{},", StreamNative’s next-generation, lakehouse-native streaming engine for Apache Pulsar and Kafka. ",[44,4740,4741],{},"Ursa Engine reached General Availability (GA) on AWS"," in Q1, delivering on its promise to ",[55,4744,4746,758],{"href":4745},"\u002Fblog\u002Fannouncing-ursa-engine-ga-on-aws-leaderless-lakehouse-native-data-streaming-that-slashes-kafka-costs-by-95#:~:text=We%E2%80%99re%20excited%20to%20announce%20a,compared%20to%20traditional%20Kafka%20deployments",[36,4747,4748],{},"slash streaming costs by up to 95%","compared to traditional Kafka. Built on a leaderless, stateless architecture that writes data directly to cloud object storage in open table formats, Ursa dramatically reduces infrastructure overhead while remaining fully Kafka-compatible. Its innovative design was validated on the world stage when ",[55,4751,4753],{"href":4752},"\u002Fblog\u002Fursa-wins-vldb-2025-best-industry-paper-the-first-lakehouse-native-streaming-engine-for-kafka",[44,4754,4755],{},"our Ursa paper won the Best Industry Paper award at VLDB 2025",", underscoring Ursa as the first “lakehouse-native” streaming engine for Kafka.",[48,4758,4759,4760,4763,4764,4767,4768,4771,4772,4775],{},"We also put a name to the architectural shift we’ve been building toward: ",[36,4761,4762],{},"lakehouse-native data streaming",". By “lakehouse-native,” we mean a streaming system where ",[44,4765,4766],{},"open lakehouse tables (Iceberg\u002FDelta) on object storage are the primary storage layer",", not an after-the-fact destination fed by connector pipelines. Instead of “stream first, copy later,” Ursa makes it possible to ",[44,4769,4770],{},"write once into open table formats"," and make the same data immediately usable for streaming consumers ",[36,4773,4774],{},"and"," analytics\u002FAI engines through catalog integrations — reducing duplication, simplifying governance, and collapsing two infrastructures into one.",[48,4777,4778,4781,4782,4003,4786,4790,4791,4795,4796,4799,4800,4802],{},[44,4779,4780],{},"Ursa expanded to every cloud and cluster."," Following AWS GA, we introduced Ursa on ",[55,4783,4785],{"href":4784},"\u002Fblog\u002Fstreamnative-ursa-is-now-available-for-public-preview-on-microsoft-azure","Microsoft Azure",[55,4787,4789],{"href":4788},"\u002Fblog\u002Fannouncing-ursa-engine-preview-on-gcp","Google Cloud"," in Public Preview. By late 2025, organizations could deploy Ursa in their own accounts on all three major clouds, or consume it as a fully-managed service. 
Crucially, Ursa’s lakehouse storage tier ",[55,4792,4794],{"href":4793},"\u002Fblog\u002Fursa-everywhere-lakehouse-native-future-data-streaming#:~:text=our%20next,the%20Classic%20Engine%20to%20Ursa","became available for every StreamNative Cloud cluster (Serverless, Dedicated, BYOC)"," via a new tiered storage extension. This means even ",[36,4797,4798],{},"classic"," Pulsar clusters can now offload data to Iceberg\u002FDelta lakehouse tables, immediately making each topic a live stream ",[36,4801,4774],{}," an analytics-ready table. Users get the familiar Pulsar\u002FKafka experience while data automatically lands in their cloud storage (e.g. S3 or ADLS) as compacted Parquet files. This “Ursa Everywhere” approach allows seamless upgrades to the full Ursa engine in future, with data already in the right format and place – a pragmatic path to reduce total cost of ownership without disruptive migrations.",[48,4804,4805,4808,4809,4815,4816,4822,4823,4829,4830,4833],{},[44,4806,4807],{},"Deep integration with data lakehouse catalogs"," was another highlight. Ursa now natively integrates with popular governance and catalog systems to unify streaming and batch data under consistent governance. For example, ",[55,4810,4812],{"href":4811},"\u002Fblog\u002Fseamless-streaming-to-lakehouse-unveiling-streamnative-clouds-integration-with-databricks-unity-catalog",[44,4813,4814],{},"Databricks Unity Catalog integration"," allows streaming topics to register as Unity Catalog–governed Delta or Iceberg tables, so real-time data inherits the same access controls and lineage as the rest of the lakehouse. ",[55,4817,4819],{"href":2872,"rel":4818},[264],[44,4820,4821],{},"Amazon S3 Tables integration"," enables Ursa to write streams directly into Iceberg tables backed by AWS S3, using Iceberg’s REST catalog for centralized metadata. And ",[55,4824,4826],{"href":4825},"\u002Fblog\u002Fstreamnative-enables-seamless-streaming-into-apache-iceberg-tm-snowflake-open-catalog",[44,4827,4828],{},"Snowflake Open Catalog integration"," makes Ursa’s Iceberg tables discoverable and queryable from Snowflake, bridging real-time data into Snowflake’s analytical ecosystem. Together, these ",[44,4831,4832],{},"“streaming augmented lakehouse”"," capabilities brought truly unified governance: streaming topics and batch tables can be one and the same, controlled by the same catalog policies.",[48,4835,4836,4837,4840,4841,4847,4848,4851],{},"Finally, StreamNative’s ",[44,4838,4839],{},"Serverless"," offering reached ",[55,4842,4844],{"href":4843},"\u002Fblog\u002Fstreamnative-serverless-is-now-generally-available-on-aws-google-cloud-and-azure",[44,4845,4846],{},"General Availability on AWS, Google Cloud, and Azure"," in 2025. This Serverless mode delivers instant, elastic streams without cluster management, enabling teams to spin up Pulsar\u002FUrsa clusters on-demand across all major clouds. With seamless auto-scaling and multi-tenancy, the GA release of StreamNative Serverless opened real-time streaming to a wider audience by removing operational overhead. 
Developers can now build real-time applications faster with ",[36,4849,4850],{},"instant start, automatic scaling",", and support for both Pulsar and Kafka APIs on a unified serverless platform.",[40,4853,4855],{"id":4854},"adaptive-universal-linking-seamless-kafka-migrations",[44,4856,4857],{},"Adaptive Universal Linking – Seamless Kafka Migrations",[48,4859,4860,4861,4868,4869,4872,4873,4876,4877,4880,4881,4884,4885],{},"To ease the journey to Ursa and modern streaming, we ",[55,4862,4864,4865],{"href":4863},"\u002Fblog\u002Feffortless-kafka-migration-real-time-data-replication-with-streamnative-universal-linking","introduced ",[44,4866,4867],{},"Universal Linking (“UniLink”)"," – a powerful tool for ",[36,4870,4871],{},"seamless cross-cluster data migration",". In March, ",[44,4874,4875],{},"UniLink entered Public Preview"," as a ",[44,4878,4879],{},"“full-fidelity Kafka-to-Ursa replication tool”",". This allowed organizations to ",[44,4882,4883],{},"live-migrate"," from legacy Kafka (or classic Pulsar clusters) to Ursa Engine with zero downtime. UniLink continuously replicates topics, schemas, and consumer state from the source to Ursa, so teams can cut over applications at their own pace without data loss or dual-writes. By leveraging smart, zone-aware reads and writing directly to Ursa’s object storage, UniLink avoids broker bottlenecks and costly cross-AZ traffic during migration. This made migrating to Ursa’s leaderless architecture faster and cheaper, ",[36,4886,4887],{},"“replicating more while spending less, without compromise.”",[48,4889,4890,4891,4897,4898,4901,4902,4905,4906,4909,4910,4913,4914,4917],{},"Mid-year, ",[55,4892,4894],{"href":4893},"\u002Fblog\u002Foctober-2025-data-streaming-launch-adaptive-linking-cloud-spanner-connector-and-orca-with-langgraph",[44,4895,4896],{},"UniLink evolved with Adaptive Linking"," to support more flexible migration strategies. ",[44,4899,4900],{},"Two linking modes – stateful vs. stateless –"," were introduced to let teams choose how to handle consumer offsets during migration. In ",[44,4903,4904],{},"stateful mode",", UniLink preserves the exact offsets and ordering between source and destination clusters, so consumers see a continuous stream as if nothing changed. This allows a clean cutover (with full auditability) but requires coordinating a final producer switch in a maintenance window. In ",[44,4907,4908],{},"stateless mode",", UniLink does ",[36,4911,4912],{},"not"," preserve offsets on the target, which greatly relaxes rollout: consumers can start reading from the new cluster independently of when producers move. This mode shines for migrations that may stretch over weeks or involve many independent teams, as it tolerates offset discontinuities that downstream systems can handle. Together, these modes turn “all-or-nothing” migrations into an ",[36,4915,4916],{},"engineering choice"," – tightly coordinated when needed, or gradual and decoupled when possible.",[48,4919,4920,4921,4924,4925,4929,4930,4933,4934,4937,4938,4941],{},"UniLink also added support for ",[44,4922,4923],{},"topic rename mapping",", making re-platforming even smoother. This lets users migrate a topic from one name\u002Fnamespace to a different name on the new cluster – for example, mirror ",[4926,4927,4928],"code",{},"payments.orders"," into ",[4926,4931,4932],{},"finance_orders"," – without breaking schema compatibility or consumer group behavior. Organizations used this to reorganize and clean up topic taxonomy during migration (e.g. 
consolidating topics or aligning naming conventions) while UniLink kept the data and schema continuity intact. By the end of 2025, ",[44,4935,4936],{},"UniLink’s Adaptive Linking"," enabled truly seamless ",[44,4939,4940],{},"cross-cluster migrations",", whether upgrading from open-source Kafka, moving from self-managed Kafka to StreamNative Cloud, or consolidating multiple clusters. Companies could “link” their data streams over with confidence, knowing they can preserve critical ordering when required or opt for flexibility when speed is paramount.",[40,4943,4945],{"id":4944},"expanding-connectivity-snowflake-snowpipe-and-google-spanner-integration",[44,4946,4947],{},"Expanding Connectivity: Snowflake Snowpipe and Google Spanner Integration",[48,4949,4950,4951,4954,4955,4961,4962,4965,4966,4969,4970,4973,4974,4977],{},"We also significantly expanded our ",[44,4952,4953],{},"integrations and connectors"," in 2025, making it easier to connect diverse systems into the streaming platform. One major enhancement was ",[55,4956,4958],{"href":4957},"\u002Fblog\u002Fjanuary-data-streaming-launch-organization-profile-ursa-engine-on-azure-enhancements-for-streamnative-cloud-and-more#:~:text=Snowpipe%20Streaming%20Support%20in%20Snowflake,Sink%20Connector",[44,4959,4960],{},"Snowflake Snowpipe Streaming support"," in our Snowflake Sink Connector. The Snowflake Streaming Sink (introduced in private preview in late 2024) was upgraded with Snowpipe Streaming, enabling ",[44,4963,4964],{},"near-real-time loading of data into Snowflake"," tables. Instead of staging files on cloud storage and waiting for batch loads, the connector now uses Snowflake’s Snowpipe Streaming API to push messages directly into Snowflake as soon as they arrive. This delivers ",[44,4967,4968],{},"lower latency"," – data is queryable in Snowflake within seconds, not minutes. It also ",[44,4971,4972],{},"reduces cost and complexity"," by eliminating intermediate storage and batch jobs. In short, streaming pipelines from Pulsar\u002FUrsa into Snowflake became ",[44,4975,4976],{},"faster, cheaper, and simpler",", unlocking use cases like real-time analytics dashboards on Snowflake and up-to-date ML feature tables without complex ETL.",[48,4979,4980,4981,4984,4985,4988,4989,4992,4993,4996,4997,5000,5001,5004,5005,5008,5009,5012],{},"On the source side, StreamNative ",[44,4982,4983],{},"onboarded a suite of Debezium-powered CDC connectors"," in 2025, bringing a rich array of enterprise database integrations into the fold. We added fully-managed source connectors (built on Debezium Kafka Connect) for popular databases: ",[44,4986,4987],{},"MySQL, PostgreSQL, Microsoft SQL Server, MongoDB,"," and a ",[44,4990,4991],{},"universal JDBC"," connector for other relational DBs. These connectors capture ",[44,4994,4995],{},"change data capture (CDC)"," events from databases and stream them into Pulsar topics in real time – all as a native part of StreamNative Cloud (no self-managed Connect cluster needed). For example, the ",[44,4998,4999],{},"Debezium MySQL Source"," connector is available ",[36,5002,5003],{},"built-in"," on StreamNative Cloud; with a few clicks or CLI commands, users can start streaming MySQL binlog events into Pulsar. Similar connectors for Postgres, SQL Server, and MongoDB allow streaming inserts\u002Fupdates\u002Fdeletes with low latency. 
This year’s additions meant customers could use StreamNative Cloud as a ",[44,5006,5007],{},"universal data pipeline",", seamlessly integrating operational databases into their event streams. With these ",[44,5010,5011],{},"new CDC sources",", microservices can react to DB changes (e.g. an order status update) in real time, and data lakes can ingest fresh transactional data continuously rather than via nightly dumps.",[48,5014,5015,5016,5024,5025,5028,5029,5032,5033,5036,5037,5043],{},"Another noteworthy integration was the ",[55,5017,5018,758,5021],{"href":4893},[44,5019,5020],{},"Debezium Cloud Spanner Source",[44,5022,5023],{},"connector"," introduced in Q4. Google Cloud Spanner – a globally-distributed SQL database – can emit change streams, and StreamNative’s managed connector now taps into those to produce Pulsar events. This connector listens to Spanner’s change streams and publishes every row-level insert\u002Fupdate\u002Fdelete event into a Pulsar topic in near real-time. It is fully managed and ",[44,5026,5027],{},"handles all the heavy lifting"," (scaling, partitioning, offset management), so users simply provide their Spanner instance details and let the platform stream the changes. ",[44,5030,5031],{},"Google Spanner integration"," unlocks powerful patterns: for example, applications can subscribe to Spanner change topics to trigger downstream processes the moment critical data changes (fraud detection, cache updates), and analytics pipelines can keep BigQuery or lakehouse tables in sync without batch jobs. All Debezium-based connectors include rich observability (throughput, lag, error rates in our console) and are designed for reliability at scale. With ",[44,5034,5035],{},"Snowpipe Streaming + a growing connector roster",", 2025 solidified StreamNative’s vision of ",[55,5038,5040],{"href":5039},"\u002Fproducts\u002Funiconn",[44,5041,5042],{},"Universal Connectivity",": whatever data source or sink you use – cloud data warehouse, relational database, NoSQL store – we likely have a native integration to plug it into your streaming pipeline.",[40,5045,5047],{"id":5046},"orca-eventdriven-ai-agents-come-to-life",[44,5048,5049],{},"Orca: Event‑Driven AI Agents Come to Life",[48,5051,5052,5053,5059,5060,5063,5064,5070,5071,5074,5075,5078,5079,5082],{},"Perhaps the most futuristic development of 2025 was the advent of ",[55,5054,5056],{"href":5055},"\u002Fproducts\u002Forca-agent-engine",[44,5057,5058],{},"Orca",", Our new ",[44,5061,5062],{},"Event-Driven Agent Engine"," for AI. Unveiled at the Data Streaming Summit in San Francisco, ",[55,5065,5067],{"href":5066},"\u002Fblog\u002Fintroducing-orca-agent-engine-private-preview",[44,5068,5069],{},"Orca entered Private Preview"," as the industry’s first event-driven runtime for production AI agents. The idea behind Orca is simple but powerful: if your enterprise data already ",[36,5072,5073],{},"streams through Pulsar",", why not host your AI “agents” directly in the stream? Traditional LLM-powered agents often run as stateless APIs or notebook experiments, but ",[44,5076,5077],{},"Orca transforms AI agents from passive, request\u002Fresponse bots into persistent, real-time actors",". 
An Orca agent can subscribe to one or more topics, ",[36,5080,5081],{},"maintain state"," (memory) between events, take actions (call APIs or trigger workflows), and emit new events – all with the resilience and scalability of Pulsar behind it.",[48,5084,5085,5086,5089,5090,5093,5094,5097,5098,5101,5102,5105,5106,5109],{},"In practice, ",[44,5087,5088],{},"Orca provides a production-grade sandbox for autonomous AI",". Agents run inside a ",[36,5091,5092],{},"durable event loop",": they consume messages from streams (e.g. a customer event topic), use an LLM or other AI logic to decide on an output, and produce results or commands to other topics. Unlike ephemeral Lambda functions, Orca agents can keep long-lived state (via in-memory or streaming storage), allowing them to “remember” past interactions or maintain a chain of thought over time. The Orca engine handles ",[44,5095,5096],{},"concurrency, fault tolerance, and observability"," – multiple agents can coordinate, no single agent stalls the system, and every decision or action is logged and traceable. In essence, Orca enables an ",[44,5099,5100],{},"“agent mesh”"," architecture where multiple AI agents collaborate via the Pulsar event bus, sharing context and tasks in real time. Notably, Orca is ",[36,5103,5104],{},"polyglot",": it leverages Pulsar’s multi-protocol support, meaning it can work with ",[44,5107,5108],{},"OpenAI functions\u002Fagents, Google’s Agent Framework (ADK), LangChain\u002FLangGraph",", or custom Python agents without heavy rewrites.",[48,5111,3600,5112,5115,5116,5119,5120,5123,5124,5127,5128,5131,5132,5135],{},[44,5113,5114],{},"use cases for Orca are ground-breaking",". Imagine a cybersecurity agent that subscribes to network intrusion events and ",[36,5117,5118],{},"autonomously orchestrates"," containment actions, or a customer support AI that listens to user activity streams and ",[36,5121,5122],{},"proactively engages"," with personalized responses. With Orca, such agents run ",[36,5125,5126],{},"natively in the streaming platform",", eliminating latency and integration barriers. They don’t poll for data – they react ",[44,5129,5130],{},"the instant events occur",". StreamNative built Orca with enterprise needs in mind: integration with corporate single sign-on and secrets management, role-based controls on what tools an agent can use, and full audit logs of agent decisions. By year’s end, Orca remained in Private Preview (initially available for BYOC deployments), but it had already sparked imagination among early users. Orca’s debut signals that ",[44,5133,5134],{},"autonomous, event-driven AI is no longer science fiction","; it’s the next chapter of streaming, where data streams feed AI agents that continuously perceive and act.",[40,5137,5139],{"id":5138},"security-and-governance-rbac-ga-and-schema-governance-previews",[44,5140,5141],{},"Security and Governance: RBAC GA and Schema Governance Previews",[48,5143,5144,5145,5157,5158,5161,5162,5165,5166,5169,5170,5173,5174,5177],{},"StreamNative Cloud matured its enterprise security and governance features in 2025, making it easier for organizations to confidently run multi-tenant, production workloads. ",[55,5146,5148,5149,5152,5153,5156],{"href":5147},"\u002Fblog\u002Fq3-2025-data-streaming-launch-lakehouse-streaming-governed-analytics-and-event-driven-agents#:~:text=RBAC%20GA%3A%20least,tenant%20streaming","A major milestone was ",[44,5150,5151],{},"Role-Based Access Control (RBAC)"," reaching ",[44,5154,5155],{},"General Availability"," in Q3",". 
",[44,5159,5160],{},"RBAC in StreamNative Cloud is now GA",", bringing a consistent, fine-grained security model across all Pulsar and Kafka interfaces. This means platform admins can centrally define who is allowed to do what – e.g. ",[44,5163,5164],{},"who can create or delete topics, publish or subscribe on a given namespace, or evolve a schema"," – all through a unified roles and permissions system. Roles can mirror real-world teams and least-privilege principles (for example, a ",[36,5167,5168],{},"Data Producer"," role that grants publish rights on specific topics but no consume rights). These permissions apply uniformly whether clients connect via Pulsar protocols or the Kafka API, ensuring no backdoor by using a different interface. With RBAC GA, enterprises no longer need ad-hoc ACL scripts or manual enforcement – they get a ",[44,5171,5172],{},"single source of truth for access control",", manageable in the Cloud Console or via API\u002FTerraform for automation. As noted in the announcement, ",[36,5175,5176],{},"“consolidating onto one platform doesn’t mean compromising on governance”"," – RBAC provides the guardrails to confidently host many applications and teams on the same streaming cluster.",[48,5179,5180,5181,5184,5185,5188,5189,5192,5193,5196,5197,5199],{},"StreamNative also introduced new ",[44,5182,5183],{},"schema governance"," capabilities. Since Pulsar’s schema registry is built-in, RBAC now covers ",[44,5186,5187],{},"who can register or update schemas"," for each topic, adding protection against unauthorized or incompatible schema changes. Moreover, in January we launched ",[44,5190,5191],{},"Kafka Schema Registry RBAC"," in Private Preview. This feature extends fine-grained access control to the Kafka-compatibility Schema Registry API, allowing enterprises to enforce who can read or write schema definitions on a per-subject basis. By locking down schema evolution, companies can ensure only approved data models make it to production – a big win for compliance and data quality. These schema governance tools, combined with RBAC, move StreamNative Cloud toward a ",[36,5194,5195],{},"“secure by default”"," posture: no more open access by default; everything is governed by roles that map to business needs. It shifts access management from scattered configs to a single auditable model. And because RBAC applies to Pulsar ",[36,5198,4774],{}," Kafka endpoints, security teams have one framework to understand, rather than separate ACL systems.",[48,5201,5202,5203,5206,5207,5213,5214,5217,5218,5221,5222,5225,5226,5229,5230,5233],{},"Other enhancements focused on ",[44,5204,5205],{},"administrative ease and platform hardening",". We rolled out a ",[55,5208,5210],{"href":5209},"\u002Fblog\u002Fjanuary-data-streaming-launch-organization-profile-ursa-engine-on-azure-enhancements-for-streamnative-cloud-and-more",[44,5211,5212],{},"new Organization Profile page"," in the Console for centralized org management. Administrators can now easily update key info like billing contacts and technical contacts, ensuring they don’t miss critical notifications. The profile page also provides a clear overview of the organization’s clusters and resources in one place, simplifying management for large teams. Under the hood, we delivered a ",[44,5215,5216],{},"“slim” StreamNative Cloud container image"," that uses a Bill of Materials for dependency management. This trimmed the core image size to ~1 GB, improving startup times and reducing the attack surface for security. 
A smaller image means faster autoscaling and easier upgrades, as well as fewer components to monitor for vulnerabilities. This change, though not visible to end users, exemplifies our commitment to ",[36,5219,5220],{},"enterprise-grade reliability and security",". In sum, by end of 2025 StreamNative Cloud offered a much tighter security and governance story: ",[44,5223,5224],{},"GA-grade RBAC"," for all resources, ",[44,5227,5228],{},"schema controls"," to prevent data chaos, and polished admin experiences – all contributing to a ",[44,5231,5232],{},"trustworthy, governable streaming platform"," for the enterprise.",[40,5235,5237],{"id":5236},"business-growth-and-global-expansion",[44,5238,5239],{},"Business Growth and Global Expansion",[48,5241,5242,5243,5246],{},"StreamNative’s business saw robust growth in 2025, underpinned by new customer wins, cloud footprint expansion, and industry recognition. In 2025, we saw more “AI-native” products depend on ",[44,5244,5245],{},"continuous, high-volume event streams"," — because when your product reacts in real time, your data pipeline can’t be batch.",[48,5248,5249,5255,5256,5259,5260,5263,5264,5266,5272,5273,5276,5277,4003,5280,5283,5284,5287,5288,5291,5292,5294,5295,5301,5302,5305,5306,5309],{},[55,5250,5252],{"href":5251},"\u002Fsuccess-stories\u002Funify-achieves-real-time-go-to-market-scale-with-apache-pulsar-and-streamnative-cloud",[44,5253,5254],{},"Unify",", an AI-native go-to-market platform, built a real-time backbone on ",[44,5257,5258],{},"StreamNative Cloud + Apache Pulsar"," that ingests ",[44,5261,5262],{},"tens of millions of events per day",", replacing batch jobs and legacy queuing so their platform can react to buyer signals in seconds and trigger downstream workflows immediately.",[55,5265,758],{"href":5251},[55,5267,5269],{"href":5268},"\u002Fsuccess-stories\u002Fsafari-ai-cuts-cloud-costs-by-50-while-scaling-real-time-computer-vision-analytics-with-streamnative",[44,5270,5271],{},"Safari AI"," scaled real-time ",[44,5274,5275],{},"computer vision"," analytics on top of customers’ existing camera infrastructure — tracking operational metrics like occupancy and queue wait times — and as they grew to ",[44,5278,5279],{},"10,000+ pipelines",[44,5281,5282],{},"50,000+ cameras",", StreamNative helped them achieve a ",[44,5285,5286],{},"50% infrastructure cost reduction"," while maintaining ",[44,5289,5290],{},"sub‑10‑second"," end-to-end delivery for real-time metrics.",[55,5293,758],{"href":5268},"And in security and fraud prevention, ",[55,5296,5298],{"href":5297},"\u002Fsuccess-stories\u002Fhow-q6-cyber-tamed-85-billion-cyberthreat-records-with-apache-pulsar-streamnative-new",[44,5299,5300],{},"Q6 Cyber"," replaced ",[44,5303,5304],{},"Google Cloud Pub\u002FSub"," with StreamNative’s Pulsar platform to process ",[44,5307,5308],{},"85B+ cyberthreat records",", using StreamNative as the transport layer at the center of their architecture while retaining the control they needed via BYOC.",[48,5311,5312,5313,5316,5317,5320,5321,5324],{},"These fast-growing organizations chose StreamNative for its unique ability to handle ",[36,5314,5315],{},"both"," high-throughput streaming and mission-critical messaging on one platform – a perfect fit for AI use cases that ingest massive data streams and respond in milliseconds. 
We also continued to serve ",[44,5318,5319],{},"large enterprises"," modernizing their infrastructures: more Fortune 500 companies moved from self-managed Kafka or legacy messaging systems to StreamNative Cloud to cut costs and accelerate development. This broad adoption across startups and enterprises drove our ",[44,5322,5323],{},"cloud usage"," to new heights – in 2025, StreamNative’s Cloud business nearly tripled in revenue, while enterprise cloud revenue grew over 200% year over year—outpacing overall growth as large customers scaled mission-critical workloads.",[48,5326,5327,5328,5336,5337,5340],{},"On the global front, we made StreamNative Cloud more accessible than ever. In August, ",[55,5329,5331,5332,5335],{"href":5330},"\u002Fblog\u002Fstreamnative-cloud-now-available-for-public-preview-on-alibaba-cloud-marketplace#:~:text=We%E2%80%99re%20excited%20to%20announce%20that,on%20the%20Alibaba%20Cloud%20Marketplace","we launched ",[44,5333,5334],{},"StreamNative Cloud on Alibaba Cloud"," Marketplace",", entering the Chinese and Asia-Pacific cloud ecosystem. Now Alibaba Cloud users can subscribe to StreamNative’s fully-managed Pulsar\u002FUrsa service directly through their local cloud account. This public preview on Alibaba Cloud opened the door to organizations in regulated or region-specific markets who prefer Alibaba’s infrastructure. The offering brought ",[44,5338,5339],{},"StreamNative’s Data Streaming Platform"," (messaging + lakehouse streaming) to Alibaba’s customer base, with seamless integration to Alibaba services like OSS (object storage) for lakehouse tiered storage. In addition, we extended our marketplace availability – by end of year, StreamNative Cloud listings existed on all three major cloud marketplaces, simplifying procurement for cloud-first enterprises.",[48,5342,5343,5344,5347,5348,5352,5353,5356,5357,5360],{},"Industry analysts took note of StreamNative’s rise. ",[44,5345,5346],{},"Forrester Research"," included StreamNative in ",[55,5349,5350],{"href":57},[36,5351,62],{},", marking our first appearance in this influential evaluation of streaming vendors. We were ",[44,5354,5355],{},"recognized as a “Contender”"," in the Wave – an impressive showing for our debut year – with Forrester highlighting that “",[36,5358,5359],{},"StreamNative excels at messaging and resource optimization","” and supports real-time analytics and event-driven use cases with strong scalability. The report noted our cost-efficient, Kafka-compatible architecture as a key strength appreciated by customers. This independent validation echoed an earlier recognition from GigaOm, which named StreamNative a Leader in its 2024 Radar for Streaming Data Platforms. Such accolades boosted our credibility in the market and have driven an uptick in inbound interest from enterprises looking to modernize their data infrastructure.",[40,5362,5364],{"id":5363},"community-events-and-thought-leadership",[44,5365,5366],{},"Community Events and Thought Leadership",[48,5368,5369,5370,5377,5378,5384,5385,5388,5389,5392,5393,5396],{},"Throughout 2025, StreamNative invested heavily in community education and thought leadership, convening the ",[55,5371,5374],{"href":5372,"rel":5373},"https:\u002F\u002Fdatastreaming-summit.org\u002F",[264],[44,5375,5376],{},"Data Streaming Summit"," series as a forum for practitioners. 
In the spring, we hosted ",[55,5379,5381],{"href":5380},"\u002Fblog\u002Fdata-streaming-summit-virtual-2025-recap",[44,5382,5383],{},"Data Streaming Summit Virtual 2025"," (May 29), a free two-day online conference that attracted thousands of attendees from around the globe. The virtual summit featured ",[44,5386,5387],{},"36+ sessions"," over multiple tracks, showcasing the latest trends and best practices in real-time data. A central theme was the emergence of ",[44,5390,5391],{},"“Agentic AI”"," – the idea of AI agents driven by streaming data – which was fitting given our Orca announcement. Talks from industry leaders explored how real-time streaming, unified lakehouse architectures, and open source technologies are converging to enable this next wave of intelligent systems. Other sessions dove into Pulsar 4.1’s improvements, user case studies of Pulsar replacing Kafka, and deep-dives into Ursa’s design. By removing geographical barriers, the ",[44,5394,5395],{},"virtual summit democratized knowledge",", allowing anyone to learn from streaming experts. The engagement was tremendous – live Q&As, community Slack discussions, and thousands of views on session recordings.",[48,5398,5399,5400,5406,5407,5410,5411,5414,5415,5418,5419,5422,5423,5426,5427,5430,5431,5434],{},"Building on that momentum, ",[55,5401,5403],{"href":5402},"\u002Fblog\u002Fdata-streaming-summit-2025-on-demand-is-live",[44,5404,5405],{},"Data Streaming Summit San Francisco 2025"," took place in-person on September 29–30 at the Grand Hyatt SFO. This marked the return of an in-person community conference (after prior Pulsar Summits), and it did not disappoint. Over 300 practitioners gathered to network and learn. The summit offered ",[44,5408,5409],{},"30+ sessions across four dedicated tracks",": ",[36,5412,5413],{},"Deep Dive"," (covering architecture and internals), ",[36,5416,5417],{},"Use Cases"," (real-world deployments), ",[36,5420,5421],{},"AI + Stream Processing",", and ",[36,5424,5425],{},"Streaming Lakehouse",". The agenda was packed with exciting content – from how Netflix runs Kafka at massive scale, to insider talks from LinkedIn, Uber, and OpenAI on their streaming infrastructures. Notably, the event was intentionally ",[44,5428,5429],{},"vendor-neutral and multi-technology",". While StreamNative played host, speakers and sponsors came from across the ecosystem: Amazon Web Services, Redpanda, Confluent, RisingWave, and more. This fostered honest discussions on comparing approaches and the future direction of streaming. A highlight was a keynote panel on ",[36,5432,5433],{},"real-time AI in production",", featuring contributors from both Pulsar and Kafka communities discussing how streaming systems must evolve to support AI workloads. The energy at the summit was electric – it underscored that the real-time data community is vibrant and united by common challenges regardless of the tool. By convening these events (virtual and in-person), we continue to support the broad data streaming community and ecosystem, facilitating knowledge-sharing that benefits the entire industry.",[40,5436,5438],{"id":5437},"looking-ahead-to-2026",[44,5439,5440],{},"Looking Ahead to 2026",[48,5442,5443,5444,5447,5448,5451,5452,5455,5456,5459,5460,5463,5464,5467,5468,5471,5472,5474],{},"As we celebrate the successes of 2025, we’re already gearing up for what’s next. 
",[44,5445,5446],{},"Data streaming"," will continue to evolve from a siloed pipeline to an integrated ",[36,5449,5450],{},"“data backbone”"," for all enterprise analytics and AI. In 2026, StreamNative will double down on enabling the ",[44,5453,5454],{},"streaming lakehouse"," paradigm – expect even tighter integrations with lakehouse ecosystems, more connectors for real-time analytics, and features that make streaming data ",[36,5457,5458],{},"immediately usable"," for AI\u002FML. Our recently announced ",[44,5461,5462],{},"Agent Engine (Orca)"," will progress toward general availability, bringing ",[44,5465,5466],{},"event-driven agents"," into mainstream use. We plan to expand Orca’s capabilities, adding richer developer tooling, library integrations, and guardrails so that any organization can safely deploy AI agents that live in the stream. On the ",[44,5469,5470],{},"governance"," front, 2026 will see us delivering full ",[44,5473,5183],{}," and auditing features – from Schema Registry ACLs graduating to GA, to advanced schema validation and lineage tracking for streaming data.",[48,5476,5477,5478,5481,5482,5485,5486,5489],{},"In short, ",[44,5479,5480],{},"StreamNative’s vision for 2026"," is an open platform where data streams, batch data, and AI agents all come together in a governed, seamless fashion. We anticipate more enterprises will converge their messaging queues, streaming logs, and data lakes into one cohesive system – and we aim to be the backbone for that transformation. The team is already hard at work on ",[44,5483,5484],{},"Pulsar 5.0"," features, further performance optimizations, and one-click cloud experiences that push the envelope of simplicity and scale. Thank you to our customers, community, and partners for an incredible 2025 – and ",[44,5487,5488],{},"get ready for an even more exciting 2026",", where real-time data powers intelligence like never before!",{"title":18,"searchDepth":19,"depth":19,"links":5491},[5492,5493,5494,5495,5496,5497,5498,5499],{"id":4729,"depth":19,"text":4732},{"id":4854,"depth":19,"text":4857},{"id":4944,"depth":19,"text":4947},{"id":5046,"depth":19,"text":5049},{"id":5138,"depth":19,"text":5141},{"id":5236,"depth":19,"text":5239},{"id":5363,"depth":19,"text":5366},{"id":5437,"depth":19,"text":5440},"2026-01-06","Reflect on StreamNative’s 2025: Ursa Engine GA, lakehouse-native streaming, AI agents, global growth, and what’s ahead for data streaming in 2026.","\u002Fimgs\u002Fblogs\u002F695cfb58adb67ac386d1bab0_2025-in-review.png",{},"\u002Fblog\u002Fstreamnatives-2025-year-in-review","10 min read",{"title":4716,"description":5501},"blog\u002Fstreamnatives-2025-year-in-review",[1332,4152,5058,5509,3550,5376],"RBAC","R5lKty39nG1cUXzwZbrbmttRp7TdmzZxgErH-FooKnA",{"id":5512,"title":5513,"authors":5514,"body":5515,"category":3550,"createdAt":290,"date":5946,"description":5947,"extension":8,"featured":294,"image":5948,"isDraft":294,"link":290,"meta":5949,"navigation":7,"order":296,"path":5950,"readingTime":5505,"relatedResources":290,"seo":5951,"stem":5952,"tags":5953,"__hash__":5955},"blogs\u002Fblog\u002Fone-platform-two-profiles-streaming-for-latency-or-cost.md","One Platform, Two Profiles: Streaming for Latency or Cost",[311],{"type":15,"value":5516,"toc":5933},[5517,5520,5530,5536,5545,5548,5565,5568,5574,5584,5587,5607,5610,5616,5622,5627,5633,5639,5652,5658,5696,5701,5707,5745,5751,5762,5767,5805,5810,5815,5853,5859,5874,5880,5886,5889,5900,5903,5909,5916,5919,5923,5926],[48,5518,5519],{},"As streaming workloads continue to 
diversify, infrastructure requirements have become increasingly workload-specific. Low-latency, transactional event flows impose very different constraints than high-throughput ingestion pipelines feeding analytical systems or lakehouse storage.",[48,5521,5522,5523,5526,5527,190],{},"StreamNative’s evolution—from the Classic engine to the Ursa engine—has been driven by these differing requirements. ",[44,5524,5525],{},"Cluster Profiles"," formalize this evolution by allowing customers to explicitly select the infrastructure characteristics that best match their workload: ",[44,5528,5529],{},"latency optimization or cost optimization",[40,5531,5533],{"id":5532},"streamnative-classic-engine-latency-first-architecture",[44,5534,5535],{},"StreamNative Classic Engine: Latency-First Architecture",[48,5537,5538,5539,4003,5542,190],{},"The StreamNative Classic engine is based on the original Apache Pulsar architecture, built on ",[44,5540,5541],{},"ZooKeeper for metadata coordination",[44,5543,5544],{},"BookKeeper for durable log storage",[48,5546,5547],{},"This design offers:",[321,5549,5550,5553,5556,5559],{},[324,5551,5552],{},"Strong consistency and predictable write\u002Fread paths",[324,5554,5555],{},"Tight coupling between compute and storage for low-latency access",[324,5557,5558],{},"Mature operational semantics for real-time workloads",[324,5560,5561,5562],{},"Native support for both ",[44,5563,5564],{},"Pulsar and Kafka protocols",[48,5566,5567],{},"The Classic engine has been successfully deployed for latency-sensitive use cases where end-to-end response time and deterministic behavior are critical. However, this architecture inherently couples scaling and cost to persistent storage and network resources, making it less optimal for workloads prioritizing elastic scale and cost efficiency.",[40,5569,5571],{"id":5570},"ursa-engine-cloud-native-storage-decoupled-streaming",[44,5572,5573],{},"Ursa Engine: Cloud-Native, Storage-Decoupled Streaming",[48,5575,5576,5577,5580,5581,190],{},"To address scale, cost, and operational efficiency at cloud scale, StreamNative introduced the ",[44,5578,5579],{},"Ursa engine",", a next-generation streaming architecture designed around ",[44,5582,5583],{},"storage disaggregation",[48,5585,5586],{},"Key architectural characteristics include:",[321,5588,5589,5595,5601,5604],{},[324,5590,5591,5594],{},[44,5592,5593],{},"Object storage–based durability"," using S3, GCS, or Azure Blob Storage",[324,5596,5597,5600],{},[44,5598,5599],{},"Oxia"," for scalable, fault-tolerant metadata management",[324,5602,5603],{},"Decoupled compute and storage layers enabling independent scaling",[324,5605,5606],{},"Reduced reliance on replicated block storage and cross-AZ networking",[48,5608,5609],{},"By shifting durability to object storage and re-architecting metadata handling, Ursa significantly lowers infrastructure costs while enabling high-throughput streaming at scale. The Ursa engine is the strategic foundation for StreamNative’s future innovation and supports Kafka and Pulsar APIs without requiring protocol-specific infrastructure.",[40,5611,5613],{"id":5612},"abstracting-engines-with-cluster-profiles",[44,5614,5615],{},"Abstracting Engines with Cluster Profiles",[48,5617,5618,5619,190],{},"While both engines serve valid and important use cases, exposing engine choice directly to customers adds unnecessary complexity. 
Most users care less about internal architecture and more about ",[44,5620,5621],{},"performance characteristics, cost models, and operational behavior",[48,5623,5624,5626],{},[44,5625,5525],{}," provide a higher-level abstraction that maps workload intent to infrastructure behavior.",[40,5628,5630],{"id":5629},"cluster-profiles-in-detail",[44,5631,5632],{},"Cluster Profiles in Detail",[32,5634,5636],{"id":5635},"latency-optimized-cluster-profile",[44,5637,5638],{},"Latency Optimized Cluster Profile",[48,5640,3600,5641,5651],{},[55,5642,5645,758,5648],{"href":5643,"rel":5644},"https:\u002F\u002Fdocs.streamnative.io\u002Fcloud\u002Fclusters\u002Fcluster-profiles-overview",[264],[44,5646,5647],{},"Latency Optimized",[44,5649,5650],{},"profile"," is designed for workloads that require consistently low end-to-end latency and predictable performance under load.",[3933,5653,5655],{"id":5654},"technical-characteristics",[44,5656,5657],{},"Technical characteristics:",[321,5659,5660,5666,5672,5678,5684,5690],{},[324,5661,5662,5665],{},[44,5663,5664],{},"Block storage–backed persistence"," using Apache BookKeeper for durable, replicated log storage",[324,5667,5668,5671],{},[44,5669,5670],{},"ZooKeeper-based metadata coordination"," for leader election and cluster state management",[324,5673,5674,5677],{},[44,5675,5676],{},"Tightly coupled compute and storage"," optimized for low-latency read\u002Fwrite paths",[324,5679,5680,5683],{},[44,5681,5682],{},"Synchronous replication"," to minimize tail latency and ensure predictable durability guarantees",[324,5685,5686,5689],{},[44,5687,5688],{},"Optimized networking paths"," for fast broker-to-storage communication",[324,5691,5692,5695],{},[44,5693,5694],{},"Native support for Pulsar and Kafka APIs"," with consistent semantics",[48,5697,5698],{},[384,5699],{"alt":18,"src":5700},"\u002Fimgs\u002Fblogs\u002F694a7109bf7bc1950eef7b85_1e3b12f1.png",[3933,5702,5704],{"id":5703},"recommended-workloads",[44,5705,5706],{},"Recommended workloads:",[321,5708,5709,5715,5721,5727,5733,5739],{},[324,5710,5711,5714],{},[44,5712,5713],{},"Event-driven microservices"," where events synchronously trigger downstream services and user-facing actions",[324,5716,5717,5720],{},[44,5718,5719],{},"Real-time transaction processing"," such as payments, order placement, and inventory updates with strict SLAs",[324,5722,5723,5726],{},[44,5724,5725],{},"Fraud detection and risk scoring"," pipelines requiring immediate event evaluation before action is taken",[324,5728,5729,5732],{},[44,5730,5731],{},"Operational alerting and monitoring"," systems with low tolerance for delivery latency or jitter",[324,5734,5735,5738],{},[44,5736,5737],{},"Online personalization and recommendation triggers"," where user experience depends on sub-second responses",[324,5740,5741,5744],{},[44,5742,5743],{},"Control-plane and coordination messaging"," for distributed systems requiring fast and consistent state propagation",[32,5746,5748],{"id":5747},"cost-optimized-cluster-profile",[44,5749,5750],{},"Cost Optimized Cluster Profile",[48,5752,3600,5753,5761],{},[55,5754,5756,758,5759],{"href":5643,"rel":5755},[264],[44,5757,5758],{},"Cost Optimized",[44,5760,5650],{}," leverages the Ursa engine’s storage-disaggregated architecture to maximize efficiency at scale.",[3933,5763,5765],{"id":5764},"technical-characteristics-1",[44,5766,5657],{},[321,5768,5769,5775,5781,5787,5793,5799],{},[324,5770,5771,5774],{},[44,5772,5773],{},"Object storage–backed persistence"," using Amazon S3, Google Cloud Storage, or 
Azure Blob Storage",[324,5776,5777,5780],{},[44,5778,5779],{},"Oxia-based metadata management"," for scalable, fault-tolerant coordination",[324,5782,5783,5786],{},[44,5784,5785],{},"Decoupled compute and storage layers"," enabling independent scaling and elasticity",[324,5788,5789,5792],{},[44,5790,5791],{},"Asynchronous durability and batching"," optimized for throughput and cost efficiency",[324,5794,5795,5798],{},[44,5796,5797],{},"Reduced reliance on replicated block storage"," and cross–availability zone networking",[324,5800,5801,5804],{},[44,5802,5803],{},"Unified support for Pulsar and Kafka APIs"," without protocol-specific infrastructure",[48,5806,5807],{},[384,5808],{"alt":18,"src":5809},"\u002Fimgs\u002Fblogs\u002F694a7109bf7bc1950eef7b88_c26ba597.png",[3933,5811,5813],{"id":5812},"recommended-workloads-1",[44,5814,5706],{},[321,5816,5817,5823,5829,5835,5841,5847],{},[324,5818,5819,5822],{},[44,5820,5821],{},"High-volume event ingestion"," from IoT devices, mobile applications, or telemetry sources",[324,5824,5825,5828],{},[44,5826,5827],{},"Streaming pipelines feeding lakehouse platforms"," such as Apache Iceberg and Delta Lake for analytics and AI",[324,5830,5831,5834],{},[44,5832,5833],{},"Clickstream and behavioral analytics"," optimized for throughput and scalable processing",[324,5836,5837,5840],{},[44,5838,5839],{},"Long-term event retention and replay"," for compliance, auditing, or ML feature backfills",[324,5842,5843,5846],{},[44,5844,5845],{},"Data replication and fan-out pipelines"," across regions, clouds, or downstream systems",[324,5848,5849,5852],{},[44,5850,5851],{},"Batch-to-stream modernization workloads"," where elasticity and cost efficiency outweigh low-latency requirements",[40,5854,5856],{"id":5855},"cost-characteristics-by-cluster-profile",[44,5857,5858],{},"Cost Characteristics by Cluster Profile",[48,5860,5861,5862,5867,5868,5870,5871,5873],{},"StreamNative Cluster Profiles are designed to align ",[55,5863,5866],{"href":5864,"rel":5865},"https:\u002F\u002Fdocs.streamnative.io\u002Fcloud\u002Fbilling\u002Fbilling-overview#latency-optimized-byoc-%26-byoc-pro-clusters",[264],"infrastructure costs with workload requirements",": the ",[44,5869,5647],{}," profile prioritizes predictable, low-latency performance using tightly coupled compute and storage, resulting in higher infrastructure costs, while the ",[44,5872,5758],{}," profile leverages a storage-disaggregated, object-storage-backed architecture to significantly reduce storage, replication, and operational costs for high-throughput workloads.",[48,5875,5876],{},[384,5877],{"alt":5878,"src":5879},"__wf_reserved_inherit","\u002Fimgs\u002Fblogs\u002F694a6f8fbf7bc1950eee910a_iShot_2025-12-23_18.31.00.png",[40,5881,5883],{"id":5882},"unified-platform-consistent-apis",[44,5884,5885],{},"Unified Platform, Consistent APIs",[48,5887,5888],{},"Importantly, Cluster Profiles do not fragment the StreamNative platform. 
Across profiles, customers benefit from:",[321,5890,5891,5894,5897],{},[324,5892,5893],{},"A unified control plane and operational model",[324,5895,5896],{},"Consistent Kafka and Pulsar APIs",[324,5898,5899],{},"Shared security, governance, and observability capabilities",[48,5901,5902],{},"This allows teams to deploy multiple clusters with different profiles—aligned to workload requirements—without introducing platform sprawl.",[40,5904,5906],{"id":5905},"infrastructure-choice-as-a-first-class-concept",[44,5907,5908],{},"Infrastructure Choice as a First-Class Concept",[48,5910,5911,5912,5915],{},"Cluster Profiles represent a shift from engine-centric thinking to ",[44,5913,5914],{},"workload-driven infrastructure selection",". Instead of adapting applications to infrastructure constraints, customers can now select an infrastructure profile that aligns with latency, cost, and scale requirements—while remaining on a single, coherent streaming platform.",[48,5917,5918],{},"This approach reflects StreamNative’s broader strategy: evolving the platform architecture while simplifying how customers consume and operate streaming infrastructure.",[40,5920,5921],{"id":316},[44,5922,319],{},[48,5924,5925],{},"StreamNative Cluster Profiles let you select streaming infrastructure based on what matters most—low latency or cost efficiency—building on StreamNative’s evolution from the Classic engine to the cloud-native Ursa engine.",[48,5927,5928,5929],{},"To get started with StreamNative Cloud, ",[55,5930,5932],{"href":3907,"rel":5931},[264],"Sign up for a trial.",{"title":18,"searchDepth":19,"depth":19,"links":5934},[5935,5936,5937,5938,5942,5943,5944,5945],{"id":5532,"depth":19,"text":5535},{"id":5570,"depth":19,"text":5573},{"id":5612,"depth":19,"text":5615},{"id":5629,"depth":19,"text":5632,"children":5939},[5940,5941],{"id":5635,"depth":279,"text":5638},{"id":5747,"depth":279,"text":5750},{"id":5855,"depth":19,"text":5858},{"id":5882,"depth":19,"text":5885},{"id":5905,"depth":19,"text":5908},{"id":316,"depth":19,"text":319},"2025-12-23","StreamNative Cluster Profiles offer two paths: Latency Optimized for real-time performance (Classic) and Cost Optimized for cloud-native scale (Ursa). 
Find your perfect fit.","\u002Fimgs\u002Fblogs\u002F694a63c82f663883919c7e32_cloud-clusters.png",{},"\u002Fblog\u002Fone-platform-two-profiles-streaming-for-latency-or-cost",{"title":5513,"description":5947},"blog\u002Fone-platform-two-profiles-streaming-for-latency-or-cost",[3550,821,1332,5647,5954],"TCO","0B1Hd8bqBbOX3lEDFh9bCp4xD7YUDzZTH021rlU57Qk",{"id":25,"title":26,"authors":5957,"body":5958,"category":289,"createdAt":290,"date":291,"description":292,"extension":8,"featured":7,"image":293,"isDraft":294,"link":290,"meta":6120,"navigation":7,"order":296,"path":297,"readingTime":298,"relatedResources":290,"seo":6121,"stem":300,"tags":6122,"__hash__":305},[28],{"type":15,"value":5959,"toc":6109},[5960,5964,5968,5970,5980,5984,5988,5990,5994,6002,6006,6010,6012,6016,6018,6022,6026,6028,6030,6036,6040,6044,6046,6048,6050,6054,6058,6060,6062,6066,6068,6070,6080,6082,6084,6086,6088,6090,6092,6094,6103,6105],[32,5961,5962],{"id":34},[36,5963,38],{},[40,5965,5966],{"id":42},[44,5967,46],{},[48,5969,50],{},[48,5971,53,5972,63,5978],{},[55,5973,5974],{"href":57},[36,5975,5976],{},[44,5977,62],{},[44,5979,66],{},[48,5981,69,5982],{},[36,5983,72],{},[48,5985,5986],{},[36,5987,77],{},[48,5989,80],{},[40,5991,5992],{"id":83},[44,5993,86],{},[48,5995,89,5996,93,5998,97,6000,101],{},[44,5997,92],{},[44,5999,96],{},[44,6001,100],{},[48,6003,104,6004,108],{},[36,6005,107],{},[40,6007,6008],{"id":111},[44,6009,114],{},[48,6011,117],{},[48,6013,120,6014,108],{},[36,6015,123],{},[48,6017,126],{},[48,6019,129,6020,133],{},[36,6021,132],{},[40,6023,6024],{"id":136},[44,6025,139],{},[48,6027,142],{},[48,6029,145],{},[48,6031,148,6032,152,6034,156],{},[36,6033,151],{},[44,6035,155],{},[48,6037,159,6038],{},[36,6039,162],{},[40,6041,6042],{"id":165},[44,6043,168],{},[48,6045,171],{},[48,6047,174],{},[48,6049,177],{},[40,6051,6052],{"id":180},[44,6053,183],{},[48,6055,186,6056,190],{},[44,6057,189],{},[48,6059,193],{},[48,6061,196],{},[48,6063,199,6064,203],{},[44,6065,202],{},[48,6067,206],{},[208,6069],{},[32,6071,6072],{"id":212},[44,6073,215,6074,223],{},[44,6075,6076],{},[55,6077,6078],{"href":57},[44,6079,222],{},[225,6081,228],{"id":227},[225,6083,232],{"id":231},[225,6085,236],{"id":235},[225,6087,240],{"id":239},[225,6089,244],{"id":243},[225,6091,248],{"id":247},[208,6093],{},[252,6095,255,6096,259,6098,190],{"id":254},[36,6097,258],{},[55,6099,6101],{"href":262,"rel":6100},[264],[36,6102,267],{},[208,6104],{},[252,6106,6107],{"id":272},[36,6108,275],{},{"title":18,"searchDepth":19,"depth":19,"links":6110},[6111,6112,6113,6114,6115,6116,6117],{"id":34,"depth":279,"text":38},{"id":42,"depth":19,"text":46},{"id":83,"depth":19,"text":86},{"id":111,"depth":19,"text":114},{"id":136,"depth":19,"text":139},{"id":165,"depth":19,"text":168},{"id":180,"depth":19,"text":183,"children":6118},[6119],{"id":212,"depth":279,"text":288},{},{"title":26,"description":292},[302,303,304],{"id":6124,"title":6125,"authors":6126,"body":6128,"category":5376,"createdAt":290,"date":6280,"description":6281,"extension":8,"featured":294,"image":6282,"isDraft":294,"link":290,"meta":6283,"navigation":7,"order":296,"path":5402,"readingTime":3556,"relatedResources":290,"seo":6284,"stem":6285,"tags":6286,"__hash__":6287},"blogs\u002Fblog\u002Fdata-streaming-summit-2025-on-demand-is-live.md","Data Streaming Summit 2025 — On-Demand Is Live",[6127],"Kathy 
Song",{"type":15,"value":6129,"toc":6267},[6130,6144,6148,6151,6158,6162,6165,6168,6172,6175,6178,6182,6185,6188,6192,6195,6198,6202,6205,6208,6212,6215,6219,6222,6226,6229,6232,6235,6238,6241,6244,6247,6250,6254,6261,6264],[48,6131,6132,6133,6138,6139,6143],{},"We wrapped the ",[55,6134,6137],{"href":6135,"rel":6136},"https:\u002F\u002Fdatastreaming-summit.org\u002Fevent\u002Fdata-streaming-sf-2025",[264],"Data Streaming Summit San Francisco"," on September 30. Since then, we’ve been editing and polishing every talk so you can revisit the ideas—or catch the ones you missed. Today, the full set of DSS 2025 session videos is live ",[55,6140,6142],{"href":6141},"\u002Fdata-streaming-summit","on demand",", and we’re spotlighting the morning Keynote that set the tone for the day.",[40,6145,6147],{"id":6146},"watch-the-keynote","Watch the Keynote",[48,6149,6150],{},"“Streaming cost is very high. Streaming and analytics live in silos. And real-time AI is now a real requirement.” With that, StreamNative CEO Sijie Guo opened the second in-person Data Streaming Summit and framed the morning around three concrete goals: scale without runaway bills, erase the boundary between streams and tables, and prepare infrastructure for agents that operate on live data.",[3933,6152,6154],{"id":6153},"watch-the-dss-2025-keynote-now-︎",[55,6155,6157],{"href":6156},"\u002Fvideos\u002Fdss-san-francisco-2025-data-streaming-summit-keynote","Watch the DSS 2025 Keynote Now ▶︎",[40,6159,6161],{"id":6160},"keynote-recap-from-data-to-intelligence-cost-lakehouse-and-ai-agents","Keynote Recap: From Data to Intelligence — Cost, Lakehouse, and AI Agents",[48,6163,6164],{},"What followed was a tightly connected story that doubles as a blueprint for data streaming in the Agentic Era. The pattern is intentionally simple and repeatable: stream the data, accelerate the insights, and empower the agents. StreamNative showed how Ursa, a lakehouse‑native engine, is production‑ready and now plugs directly into classic Pulsar clusters, giving teams a way to bend the cloud cost curve without rewriting applications. StreamNative Cloud added managed Apache Iceberg tables in Databricks Unity Catalog, turning topics into queryable, governed tables the moment events arrive—no connectors, no cron jobs, no duplicate copies to keep in sync. And the new Orca Agent Engine places AI agents in the event fabric with state, governance, delayed delivery, and replay built in. Leaders from LinkedIn and OpenAI rounded out the morning with hard‑earned patterns for scaling beyond partition‑era limits, simplifying consumption for product teams, and preparing for the next 10×.",[48,6166,6167],{},"Sijie centered the conversation on three pressures nearly every team feels. Cost comes first: in public clouds, inter‑AZ data transfer, disk‑based replication, and monolithic cluster sizing conspire to make steady‑state streaming expensive. Silos come next: moving data from topics to tables taxes teams with a connector tax—extra compute, extra network hops, and duplicated data—just to make streams usable for analytics. And AI is no longer speculative: agents, retrieval, online features, and human‑in‑the‑loop workflows now depend on fresh signals and clean decoupling, with governance, end to end. 
The keynote answered each of these with incremental architecture—choices you can adopt at your own pace—that carry data from motion and rest to action.",[32,6169,6171],{"id":6170},"scale-without-runaway-cost-ursa-the-lakehousenative-engine","Scale Without Runaway Cost — Ursa, the Lakehouse‑Native Engine",[48,6173,6174],{},"Matteo Merli, StreamNative Co-founder and CTO, began by retracing the pain that motivated Ursa. Crossing availability‑zone boundaries for replication and connectors incurs hefty inter‑AZ cost; disk‑based replication forces brokers to chat constantly; and monolithic, partition‑bound clusters must be over‑provisioned to survive peaks. Downstream, every new sink process adds connector compute, network overhead, and duplicate storage. Ursa addresses these directly by treating object storage as primary for lakehouse‑native streams while preserving a disk write‑ahead path for low‑latency topics. The effect is one pipeline with two profiles—cost‑optimized and latency‑optimized—that still lands one canonical copy in the lakehouse so analytics never fall behind.",[48,6176,6177],{},"Earlier this year Ursa moved from preview to GA, gained production proof, and earned a Best Industry Paper award for its architecture. The most important operational detail is adoption: Ursa now ships as a storage extension inside classic Pulsar, so operators can enable lakehouse integration per namespace or per topic, keep disk where latency is critical, and let everything else flow to object storage over time—no migration day. In a 5 GB\u002Fs benchmark, the design removed inter‑AZ churn, trimmed over‑provisioned compute, and eliminated disk‑replication tax, yielding dramatic cost reductions.",[32,6179,6181],{"id":6180},"make-streams-firstclass-tables-unity-catalog-iceberg-natively-governed","Make Streams First‑Class Tables — Unity Catalog + Iceberg, Natively Governed",[48,6183,6184],{},"Kundan from StreamNative and Michelle from Databricks made the stream‑to‑table path feel built‑in. Over the past year, StreamNative Cloud added Iceberg REST catalogs, Delta Lake with Unity Catalog, Snowflake Open Catalog, and Amazon S3 Tables. On stage, the integration matured again: managed Iceberg tables in Unity Catalog now sit alongside Delta, so teams can choose their open format without changing their pipeline. In practice, you register a catalog once, point a cluster at it, and select the topics that should materialize as tables.",[48,6186,6187],{},"The “Acme Commerce” demo showed orders, products, and customers streaming into StreamNative Cloud; those topics immediately surfaced as Iceberg tables in Unity Catalog and landed as Parquet and manifests in S3. In Catalog Explorer, attribute‑based access control masked PII as soon as it arrived, and Genie answered natural‑language questions by generating SQL to confirm shape and freshness. The takeaway is strategic and simple: governed, open tables should be the default target for streaming data, and catalogs should manage access, discovery, lineage, metrics, and quality from the first write. This is how you erase the stream\u002Fwarehouse divide and run a streaming lakehouse that keeps analytics in lockstep with operations.",[32,6189,6191],{"id":6190},"data-in-action-orca-places-agents-in-the-event-fabric","Data in Action — Orca Places Agents in the Event Fabric",[48,6193,6194],{},"Neng Lu, Director of Platform Engineering at StreamNative, introduced the Orca Agent Engine with the keynote’s central premise: agents only become reliable when they are event‑driven. 
Orca is Python‑first and bring‑your‑own‑agent; if you’ve built on the OpenAI SDK or Google ADK, you package the agent and deploy it into the stream. The agent subscribes to topics, calls tools, and emits events. Under the hood, Orca guarantees at‑least‑once delivery, supports delayed messages for scheduled work, enables replay for backfill and recovery, enforces rate limits, and honors RBAC through StreamNative’s GA role‑based controls. Because Orca speaks MCP, agents can discover and invoke other tools and agents dynamically, turning brittle point‑to‑point chains into composable, event‑driven systems.",[48,6196,6197],{},"The live demo kept the code small and the lesson clear. A “weather agent” was zipped with a short YAML file and deployed, scaled from one to two replicas on command, and—after a restart—re‑hydrated context from persistent memory. A second agent queried the MCP registry, discovered the weather tool, and called it—proof that agents can cooperate through events rather than tight RPCs. Patterns like scheduled triggers, automatic retries, parallel fan‑out, agent meshes, and policy‑driven governance fall naturally out of this design, and Orca is available today across Serverless, Dedicated, and BYOC.",[32,6199,6201],{"id":6200},"architectures-that-validate-the-blueprint-linkedin-and-openai","Architectures That Validate the Blueprint — LinkedIn and OpenAI",[48,6203,6204],{},"LinkedIn unveiled Northgard, a next‑generation log store built for 32+ trillion records\u002Fday, 17+ PB\u002Fday, roughly 400,000 topics, and ~10,000 brokers across ~150 clusters. The shift is to make the segment—not the partition—the unit of replication. Topics are composed of ranges (sequences of segments). When a segment seals, the next segment chooses a fresh replica set—often including new brokers—so capacity is used immediately and clusters self‑balance without shuffling history. Metadata is sharded across Raft‑backed vnodes arranged on a consistent‑hash ring, so there is no single hot controller, and brokers only hold minimal global state. Operationally, brokers can be added without Cruise‑Control‑style moves, producer failover is near‑instant as a newly sealed segment takes over, and acknowledgments correspond to fsync on all replicas—on the order of every 10 ms, 20,000 records, or 10 MB.",[48,6206,6207],{},"OpenAI showed how streaming powers the company’s data flywheel—usage events train better models which drive more usage—plus experimentation, model distillation, conversation search and memory extraction, rate limiting, counters, and ML features (via Chronon). The stack pairs Kafka for durable storage with Flink for processing, deployed across regions with a developer experience that hides infrastructure complexity. A publish proxy called Prism distributes writes across clusters; on the other side, uForwarder (from Uber) pulls from Kafka and pushes to consumers, handling retries and DLQs while fanning out beyond partition counts. The trade‑offs are explicit—no global ordering or cross‑cluster partitioning, at‑least‑once delivery by default—while the remedies are practical: sort by logical clocks downstream when needed, and rely on idempotent sinks or de‑duplication for exactly‑once behavior. Adoption tells the story: 200+ processors, roughly 80 GB\u002Fs peak throughput, and growth around 3× per quarter. 
The team is exploring tiered storage, disaggregated brokers, self‑healing control planes, and a native lakehouse injection path to further collapse the stream\u002Fwarehouse divide.",[32,6209,6211],{"id":6210},"proof-in-the-wild-customer-impact-that-counts","Proof in the Wild — Customer Impact That Counts",[48,6213,6214],{},"Motorq, a connected‑vehicle intelligence platform, outlined a clear target: elastic scale without hand‑rolled isolation, native multi‑tenancy, lower cost, native lakehouse writes, and schema contracts to prevent drift. By leveraging StreamNative and Ursa, Motorq reports about 50% lower streaming cost, lakehouse latency down from about an hour to minutes, and ingestion cost ~60% lower. The pipeline itself is simpler—no custom connectors or sync jobs—while Schema Registry catches errors early. Features like Key_Shared scale consumers without giving up order per key, and Iceberg tables (via the REST catalog) expose insights to both internal teams and external partners.",[32,6216,6218],{"id":6217},"where-the-lakehouse-goes-next-a-fireside-on-streaming-lakehouse-and-agents","Where the Lakehouse Goes Next — A Fireside on Streaming, Lakehouse, and Agents",[48,6220,6221],{},"In the closing conversation, Sijie sat down with Reynold Xin, Databricks co‑founder and chief architect. Reynold revisited why the lakehouse exists: put data on open tables in object storage and run every workload there. He emphasized single‑file commit semantics—baked into Delta from the start and now surfacing across open formats—as essential for low‑latency ingestion. He argued that governance must reach streams, not just tables, and predicted that Parquet will evolve to be more stream‑friendly and faster to decode as the lakehouse becomes the default home for hot data. For agents, he offered a practical test: if you want them to act safely, infrastructure needs git‑for‑data—branching, checkpoint, replay—so exploration and automation don’t endanger production. Looking forward, he encouraged leaders to keep an open mind, experiment through the hype cycle, and optimize not only for throughput and cost but also for provisioning speed, elastic scale, and rapid branching—capabilities that matter when thousands of agents collaborate at machine bandwidth.",[40,6223,6225],{"id":6224},"real-world-insights-breakout-sessions-and-use-cases","Real-World Insights: Breakout Sessions and Use Cases",[48,6227,6228],{},"With the stage set by the keynote, the Summit’s breakout sessions dove into practical challenges and innovations from across the industry. With four tracks running in parallel, attendees had to choose from a smorgasbord of topics—but a few talks truly stood out.",[48,6230,6231],{},"OpenAI — “Streaming to Scale: Real-Time Infrastructure for AI.” A deep look at how OpenAI’s engineers manage streaming data pipelines to serve AI models in production. The talk covered the architectural choices that ensure AI workloads get the data they need with minimal latency, and how streaming fits into an AI-driven organization’s stack.",[48,6233,6234],{},"Netflix — “Kafka Under Pressure.” An eye-opening tale of pushing Apache Kafka to its limits. Netflix shared the trials and triumphs of operating Kafka at massive scale, what happens when you max out throughput, and how they addressed bottlenecks—from broker tuning to architectural guardrails—to keep the platform reliable.",[48,6236,6237],{},"Salesforce — “Streaming 300B+ Telemetry Events per Day with Flink.” Yes, 300 billion daily events. 
Salesforce discussed their unified observability pipeline on Apache Flink—stateful processing, exactly-once guarantees, and the operational practices that keep services performant and monitored in real time.",[48,6239,6240],{},"Uber — “Safe Streams at Scale.” A masterclass in reliability: guaranteeing delivery across geo-distributed datacenters and implementing guardrails that prevent bad data or spikes from cascading into outages.",[48,6242,6243],{},"Blueshift — “Building a Scalable Customer Engagement Pipeline with Pulsar.” A startup perspective on moving from legacy queues to Pulsar for event ingestion and notifications—lower latency, higher fault tolerance, and patterns any team can borrow when modernizing messaging.",[48,6245,6246],{},"Google — “Beyond Stream Ingestion with Just SQL.” An exploration of streaming analytics with familiar tools. The team showed how far standard SQL (on Pub\u002FSub + BigQuery and friends) can go—and when “just SQL” simplifies problems that once demanded bespoke stream processors.",[48,6248,6249],{},"These are just a few highlights—30+ sessions covered everything from emerging streaming benchmarks to fintech, IoT, and AI case studies. A common thread ran through the day: operationalizing streaming. It’s not only about fast pipelines; it’s about making them cost-efficient, reliable, and integrated with the rest of the data estate—lakehouse tables, ML features, governance, and quality. The community vibe matched the content: Pulsar committers chatting with Kafka veterans, cloud engineers swapping tips with AI researchers.",[40,6251,6253],{"id":6252},"start-watching","Start Watching",[48,6255,6256,6257,6260],{},"Begin with the Keynote to see where data streaming is headed and how teams are delivering it in production. From there, dive into the ",[55,6258,6259],{"href":6141},"full DSS 2025 on-demand playlist"," for the Netflix, OpenAI, and Uber architecture talks, Motorq’s customer story, and deep dives across Pulsar, Kafka, Flink, and Iceberg.",[48,6262,6263],{},"If you operate real-time systems, the path forward is more pragmatic than ever. Ursa lets you keep latency-sensitive topics on disk and move everything else to object storage while landing a consistent copy in the lakehouse. Unity Catalog integration removes the connector tax and brings governance to the moment of arrival. And Orca puts agents where they belong—in the stream—so they can perceive, reason, and act with the guarantees and controls you already trust.",[48,6265,6266],{},"Watch the Keynote, then explore the sessions that matter most to your roadmap. The videos are live; the ideas are yours to ship.",{"title":18,"searchDepth":19,"depth":19,"links":6268},[6269,6270,6278,6279],{"id":6146,"depth":19,"text":6147},{"id":6160,"depth":19,"text":6161,"children":6271},[6272,6273,6274,6275,6276,6277],{"id":6170,"depth":279,"text":6171},{"id":6180,"depth":279,"text":6181},{"id":6190,"depth":279,"text":6191},{"id":6200,"depth":279,"text":6201},{"id":6210,"depth":279,"text":6211},{"id":6217,"depth":279,"text":6218},{"id":6224,"depth":19,"text":6225},{"id":6252,"depth":19,"text":6253},"2025-11-11","Data Streaming Summit 2025 on-demand is live. 
Watch the keynote and 30+ sessions on streaming cost, lakehouse design, AI agents, and real-time architectures.","\u002Fimgs\u002Fblogs\u002F6912cc58492764d8ea702956_DSS-Video-on-demand.png",{},{"title":6125,"description":6281},"blog\u002Fdata-streaming-summit-2025-on-demand-is-live",[5376,1332,5058],"NlrhhlgxePaeawoT6kyPdrC4aKx86TKy4k5b1cO7acc",{"id":6289,"title":6290,"authors":6291,"body":6292,"category":6415,"createdAt":290,"date":6416,"description":6417,"extension":8,"featured":294,"image":6418,"isDraft":294,"link":290,"meta":6419,"navigation":7,"order":296,"path":4893,"readingTime":5505,"relatedResources":290,"seo":6420,"stem":6421,"tags":6422,"__hash__":6423},"blogs\u002Fblog\u002Foctober-2025-data-streaming-launch-adaptive-linking-cloud-spanner-connector-and-orca-with-langgraph.md","October 2025 Data Streaming Launch: Adaptive Linking, Cloud Spanner Connector, and Orca with LangGraph",[810],{"type":15,"value":6293,"toc":6409},[6294,6297,6301,6307,6310,6313,6321,6326,6331,6334,6338,6341,6344,6347,6350,6355,6358,6362,6373,6376,6379,6387,6395,6399,6402],[48,6295,6296],{},"Modern data teams keep circling the same three priorities: keep streaming costs under control, connect more systems with less custom glue, and turn real-time data into intelligent actions. This month’s release focuses on all three. We’re introducing capabilities that make cross-cluster streaming more cost-efficient, add first-class connectivity for Google Cloud Spanner, and advance event-driven agents with LangGraph support in Orca.",[40,6298,6300],{"id":6299},"unilink-adaptive-linking","UniLink Adaptive Linking",[48,6302,6303,6306],{},[55,6304,4152],{"href":6305},"\u002Fproducts\u002Funiversal-linking"," introduces Adaptive Linking with two modes—stateful and stateless—to match your migration strategy and rollout pace. The goal is to give teams precise control over offset semantics and rollout sequencing so migrations stop being all-or-nothing and start feeling like an engineering choice.",[48,6308,6309],{},"In stateful mode, UniLink preserves offsets across clusters. Consumers see a continuous stream, cutovers are clean, and rollback remains straightforward because source and destination positions align. To maintain that continuity, the final step requires all producers to switch from the old cluster to the new one in a coordinated window. Stateful is the right fit for strict auditability and environments where you must prove message order and position through a cutover.",[48,6311,6312],{},"In stateless mode, UniLink does not preserve offsets on the destination. That single decision dramatically relaxes producer rollout requirements. When some services need weeks or months to move, you can keep consumers on the destination and let producers migrate on their own schedule. Stateless shines in multi-team migrations, long-tail services, and pipelines built with processors that tolerate an offset change",[48,6314,6315,6316,4929,6318,6320],{},"To make re-platforming smoother, UniLink now supports topic-rename mapping. You can mirror ",[4926,6317,4928],{},[4926,6319,4932],{},", reorganize topics under different namespaces, or consolidate topics without breaking schemas or consumer-group behavior. 
Combined with mode selection, this lets you shape the migration to match your organization: coordinate tightly when you can, or stretch the rollout when you need to.",[48,6322,6323],{},[384,6324],{"alt":18,"src":6325},"\u002Fimgs\u002Fblogs\u002F6900d943c1bb5c3e3a997170_51bfaa71.png",[48,6327,6328],{},[384,6329],{"alt":18,"src":6330},"\u002Fimgs\u002Fblogs\u002F6900d943c1bb5c3e3a997173_f1c16865.png",[48,6332,6333],{},"CTA: To try out Adaptive Linking, start with stateful links for consumer moves, then switch to stateless as you migrate producers and reorganize topics with topic-rename mapping.",[40,6335,6337],{"id":6336},"debezium-cloud-spanner-source-streaming-spanner-changes-managed","Debezium Cloud Spanner Source: Streaming Spanner Changes, Managed",[48,6339,6340],{},"Connectivity expands this month with a Debezium Cloud Spanner Source connector. If Spanner powers your transactional workloads, you can now stream its change events into Kafka-compatible topics on StreamNative without custom pollers or batch jobs. The connector listens to Spanner change streams and emits per-row change events in near real time.",[48,6342,6343],{},"Because the connector is fully managed, setup is simple. Provide connection details, the project ID, Spanner instance ID, Spanner database ID, as well as the change stream name. Optionally, the user can also provide a start timestamp and an end timestamp; the platform handles scaling, partitioning, offset management, and delivery into your topics. Operations teams get the same observability they have for other connectors: throughput, lag, error rates, and retry visibility in one place.",[48,6345,6346],{},"This unlocks clear patterns. Microservices can subscribe to Spanner change topics to trigger workflows the moment a business event lands—order confirmations, fraud checks, fulfillment starts. Analytics teams can keep lakehouse tables fresh without nightly ETL, cutting latency from hours to minutes. Platform owners can centralize CDC across databases while keeping a single streaming backbone for transport, governance, and replay.",[48,6348,6349],{},"For developers already familiar with Debezium, the experience will feel natural: declarative configuration, a clear event model, and strong compatibility with the Kafka Connect ecosystem. For teams new to CDC, the value is turning operational state into event streams with a few clicks while the platform handles the heavy lifting.",[48,6351,6352],{},[384,6353],{"alt":18,"src":6354},"\u002Fimgs\u002Fblogs\u002F6900d943c1bb5c3e3a997176_fad90892.png",[48,6356,6357],{},"CTA: In Console, add a Cloud Spanner Source and stream your first table to a Kafka-compatible topic in minutes.",[40,6359,6361],{"id":6360},"orca-agent-engine-with-langgraph-private-preview","Orca Agent Engine with LangGraph (Private Preview)",[48,6363,6364,6365,6368,6369,6372],{},"During the Data Streaming Summit last month, we ",[55,6366,6367],{"href":5066},"announced"," the ",[55,6370,6371],{"href":5055},"Orca Agent Engine"," in Private Preview; today we’re expanding that preview with native LangGraph support, alongside integrations with the Google Agent Development Kit (ADK) and the OpenAI Agent SDK.",[48,6374,6375],{},"The idea remains the same: if your core systems publish events, your AI should live where those events happen. Orca gives agents a durable, stream-native runtime—subscribe to topics, maintain memory, call tools and services, and emit new events—with concurrency, fault tolerance, and observability built in. 
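To make that subscribe, process, emit loop concrete, here is a minimal conceptual sketch written against the open-source pulsar-client Python library. This is not Orca’s actual SDK or packaging format; the broker URL, topic names, and the placeholder decide() logic are illustrative assumptions, and in Orca the hosting, scaling, and acknowledgment handling shown here are managed by the engine.

```python
import json
import pulsar  # open-source Apache Pulsar client: pip install pulsar-client

# Illustrative assumptions: broker URL, topic names, and the rule-based logic below.
client = pulsar.Client("pulsar://localhost:6650")
consumer = client.subscribe("persistent://public/default/support-tickets", "triage-agent")
producer = client.create_producer("persistent://public/default/triage-decisions")

def decide(ticket: dict) -> dict:
    # Placeholder for the agent's reasoning (an LLM call, the OpenAI SDK, Google ADK, etc.).
    action = "escalate" if ticket.get("priority") == "high" else "auto-reply"
    return {"ticket_id": ticket.get("id"), "action": action}

while True:
    msg = consumer.receive()                      # perceive: consume a live event
    decision = decide(json.loads(msg.data()))     # reason: run the agent's logic
    producer.send(json.dumps(decision).encode())  # act: emit the decision as a new event
    consumer.acknowledge(msg)                     # at-least-once delivery semantics
```

Orca hosts and scales exactly this kind of loop for you, so the sketch is only meant to show the event-in, decision-out pattern that the LangGraph integration builds on.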
LangGraph adds a structured way to design agent workflows: explicit multi-step reasoning, tool use, retries, and memory, expressed as graphs that Orca executes against live streams. Because every perception and action flows through the log, you get persistent context, a complete audit trail, and replay for debugging or compliance—no more ephemeral prompts or black-box decisions.",[48,6377,6378],{},"This is how event-driven agents move past prototypes. A monitoring agent can listen to security alerts, correlate with asset inventories, open tickets, and post runbooks when thresholds trip. A revenue operations agent can watch order and payment topics, reconcile anomalies against a system of record, and notify finance with the exact items to investigate. A developer productivity agent can observe CI events and test outcomes, file issues with the right context, and propose fixes based on known patterns. In each case, the agent is a long-lived service woven into your event fabric, not a one-off request.",[48,6380,6381,6382],{},"Getting started is intentionally simple. Point a LangGraph agent at a single topic, let Orca handle scaling and backpressure, then grow into multi-topic workflows, shared tools, and cross-agent collaboration. ▶️ ",[55,6383,6386],{"href":6384,"rel":6385},"https:\u002F\u002Fyoutu.be\u002Fu8lEIc7kEsA",[264],"Watch the Demo",[48,6388,6389,6390,6394],{},"CTA: ",[55,6391,6393],{"href":6392},"\u002Fcontact","Request Orca Private Preview access"," and deploy your first LangGraph agent on live streams.",[40,6396,6398],{"id":6397},"putting-it-together","Putting It Together",[48,6400,6401],{},"Adaptive Linking removes migration timing risk. Start with stateful to move consumers first with offsets preserved; when you migrate producers, switch to stateless to relax rollout and finish at your pace. Topic-rename mapping lets you reorganize topics during the move instead of after. The Debezium Cloud Spanner Source turns Spanner changes into first-class streams, replacing nightly jobs and custom bridges with a managed connector that keeps downstream data fresh. Orca with LangGraph turns those streams into action: agents subscribe to events, keep state, call tools, and emit new events with full audit trails and replay.",[48,6403,6404,6405,6408],{},"CTA: Try different linking modes and topic-rename mapping for your next migration, configure the Cloud Spanner Source in Console, and ",[55,6406,6407],{"href":6392},"request access to Orca (Private Preview)"," with LangGraph—Adaptive Linking and the Spanner connector are available now in StreamNative Cloud, and Orca with LangGraph is available through the preview program.",{"title":18,"searchDepth":19,"depth":19,"links":6410},[6411,6412,6413,6414],{"id":6299,"depth":19,"text":6300},{"id":6336,"depth":19,"text":6337},{"id":6360,"depth":19,"text":6361},{"id":6397,"depth":19,"text":6398},"Agentic AI","2025-10-28","StreamNative's October 2025 launch introduces Adaptive Linking for cost-efficient migrations, a managed Cloud Spanner connector for real-time CDC, and Orca with LangGraph for event-driven AI agents. 
Simplify streaming, connect data sources, and build intelligent workflows.","\u002Fimgs\u002Fblogs\u002F6900d991c47c2404dda9b55f_Oct-Data-Streaming-Launch.png",{},{"title":6290,"description":6417},"blog\u002Foctober-2025-data-streaming-launch-adaptive-linking-cloud-spanner-connector-and-orca-with-langgraph",[4152,5058,3988],"tmGJ0LJuoSO81LrrxKKMGzE8lCJtD4jzXll0o2BftaM",{"id":6425,"title":6426,"authors":6427,"body":6428,"category":6415,"createdAt":290,"date":6416,"description":6488,"extension":8,"featured":294,"image":6489,"isDraft":294,"link":290,"meta":6490,"navigation":7,"order":296,"path":6491,"readingTime":3556,"relatedResources":290,"seo":6492,"stem":6493,"tags":6494,"__hash__":6495},"blogs\u002Fblog\u002Fwhat-is-orca-agent-engine.md","What is Orca Agent Engine?",[810],{"type":15,"value":6429,"toc":6483},[6430,6434,6437,6440,6443,6446,6450,6453,6476,6480],[40,6431,6433],{"id":6432},"introduction-of-orca-agent-engine","Introduction of Orca Agent Engine",[48,6435,6436],{},"Autonomous AI agents are moving from research labs to real-world production, but until now the infrastructure to support them at enterprise scale has been lacking. Many teams have tried building AI agents in notebooks or demos using various frameworks, only to hit walls in production due to fragmented data, brittle pipelines, and siloed agent processes. Orca Agent Engine (formerly called StreamNative Agent Engine) is our answer to this challenge – an event-driven runtime and infrastructure designed for always-on, real-time AI agents. It provides a unified streaming backbone so developers can bring their own AI agents and run them with live data in a robust, scalable way.",[48,6438,6439],{},"Orca is not just another agent framework or library – it’s a streaming-native infrastructure layer for deploying, coordinating, and scaling AI agents in production. Think of it as the “missing backbone” that takes you from a prototype agent in a notebook to a production-grade autonomous service. Built on Apache Pulsar’s battle-tested serverless computing foundation (Pulsar Functions), Orca enhances agents with event-driven capabilities. You simply package your existing agent code – whether built with Google’s Agent Development Kit (ADK), OpenAI’s agent APIs, or even plain Python – and deploy it on Orca. Once deployed, the agent automatically joins a shared event bus and registry, immediately tapping into live data streams, maintaining its own state, and emitting actions under the platform’s governance and observability.",[48,6441,6442],{},"In essence, Orca Agent Engine provides the always-on “nervous system” that modern AI agents need but traditional setups lack. Instead of isolated bots operating on stale snapshots of data, each agent connects to a real-time stream of events that delivers fresh, millisecond-level context. The shared event bus acts as a live context layer and communication channel for agents, so they can react to new events instantly and even talk to each other by publishing events. Orca also gives agents a built-in memory: each agent maintains a persistent, distributed state that is continually updated and externalized as events, available for recall or audit. No more “black-box” agents with hidden state – an agent’s observations and decisions become part of an event log that can be inspected and traced later, providing much-needed transparency.",[48,6444,6445],{},"Crucially, Orca’s architecture is cloud-native, scalable, and resilient by design. 
Because it leverages a streaming data platform under the hood, agents benefit from horizontal scaling, load balancing, and fault tolerance out of the box. Agents run as distributed functions across a cluster – there’s no single choke point. If one instance goes down, others seamlessly take over, preventing any single agent failure from breaking the workflow. In short, Orca handles the hard parts of running always-on, distributed agents – much like Kubernetes did for microservices, Orca provides that operational backbone for AI agents. Developers can focus on their agents’ logic and goals while the platform takes care of real-time data plumbing, scaling, and reliability.",[40,6447,6449],{"id":6448},"key-capabilities-of-orca-agent-engine","Key Capabilities of Orca Agent Engine",[48,6451,6452],{},"Orca introduces a new paradigm for building real-time, AI-driven applications. Its core capabilities include:",[321,6454,6455,6458,6461,6464,6467,6470,6473],{},[324,6456,6457],{},"Event-Driven Streaming Runtime: Agents are “always on,” continuously listening to event streams and emitting new events. Rather than waiting for HTTP requests or periodic batches, agents subscribe to Apache Pulsar or Apache Kafka topics and react the moment events occur. This streaming-first design lets AI agents operate on up-to-the-second information – perfect for scenarios where data never sleeps. One agent’s output can trigger other agents by publishing events, forming an asynchronous pipeline of decisions and actions driven entirely by data flows.",[324,6459,6460],{},"Shared Event Bus (Unified Nervous System): All agents (and other workflows or applications) communicate over a unified event bus, eliminating silos. This bus provides a shared context layer for your AI ecosystem: agents no longer poll for updates or run in isolation, but receive a live feed of context (e.g. sensor readings, user actions, database changes, or other agents’ outputs) and can broadcast their own insights or alerts to others. The result is a network of agents that collaborate in real time, share facts, and avoid redundant work. The event bus comes with built-in features like message ordering, persistence, back-pressure handling, and replay – thanks to Pulsar’s log storage and the Kafka-compatible Ursa engine – so agents can even “time-travel” by replaying past events to recover context or test new logic.",[324,6462,6463],{},"Persistent Streaming Memory: Each agent can maintain stateful memory beyond a single prompt-response cycle. Backed by a distributed state store, an agent’s intermediate results or important observations are logged as events and stored for future reference. In practice, this means an agent can “remember” context over long conversations or continually learn from new data, rather than being stateless between requests. Because this memory is externalized to the event stream, you gain full visibility into what the agent knows – every piece of state or decision rationale can be audited and replayed later. This tackles one of the biggest challenges of agentic AI: making their decision-making process transparent and reproducible.",[324,6465,6466],{},"MCP Integration: In modern agent systems, functions are tools—and Orca streamlines safe tool use via the Model Context Protocol (MCP). Introduced by Anthropic, MCP provides a uniform, secure way for agents to invoke external tools and access data. 
Orca embraces MCP so agents can call REST APIs, query databases, read from live streams, invoke cloud services, or even manage infrastructure (e.g., Pulsar clusters) through a single interface. Behind the scenes, StreamNative’s open-source MCP Server bridges Pulsar\u002FKafka with external systems and exposes integrations as on-demand, discoverable functions. Define a tool once—with schema and authorization—and any agent can use it without custom glue code or credential sprawl. Combined with Orca’s unified registry, tools and even other agents become callable MCP components, with dynamic discovery keeping capabilities up to date. The result is a governed, auditable tool ecosystem that expands what agents can do—from vector lookups to workflow execution—while preserving security and control.",[324,6468,6469],{},"Modular, Composable Agents: Orca encourages a decomposed, microservices-like approach to building complex agents, unlike monolithic chain-of-thought scripts. Complex tasks can be split into multiple specialized agents or functions that each handle a sub-task and communicate via events. For example, a “fast path” agent might apply quick rule-based decisions on incoming events, while a “smart path” agent performs deeper LLM-powered analysis on trickier cases – both orchestrated through the event bus. This modular design makes workflows dynamic and evolvable: agents can decide at runtime to invoke different tools or even spawn other agents based on the situation. You can add, remove, or update individual agents (much like updating microservices) without rewriting a giant centralized program. In essence, Orca enables building a collaborative agent mesh – a collection of agents that discover and call each other as needed to solve a problem together.",[324,6471,6472],{},"Unified Registry and Tool Directory: Every agent deployed via Orca is registered in a central registry alongside other components like connectors and functions. This acts as a directory of all “brains” (agents) and available tools, along with their metadata (interfaces, versions, owners, etc.). The benefit is twofold: (a) Operators get one control plane to manage and monitor all agents – you can see what agents are running, their status, update them, set permissions, etc. in one place. (b) Agents themselves can perform dynamic lookup of tools or peer agents at runtime. For instance, an orchestrator agent might query the registry to find a specialized “expert” agent or a function, then invoke it as a sub-task. This makes it much easier to build composed workflows where agents use other agents or services as tools, without hard-coding all integrations. The combination of the registry and the event bus enables late-binding and discovery of capabilities at runtime, adding tremendous flexibility.",[324,6474,6475],{},"Bring Your Own Agent (Framework-Agnostic): One of Orca’s biggest strengths is its openness to existing AI ecosystems. Orca is framework-agnostic – it doesn’t force you to rewrite your logic in a new DSL or adhere to a proprietary “agent” API. Instead, you can plug in the agents you’ve already built with the tools you love. Whether your agent is powered by Google’s Agent Development Kit or OpenAI’s Agents API, or just custom Python code, it can run on Orca without modification. This means developers can leverage popular frameworks and models (LangChain, LangGraph and others are on the roadmap) while still benefiting from Orca’s event-driven runtime and governance. 
In practice, because Orca Agent Engine is framework-agnostic, you can bring an existing agent (for example, an OpenAI agent you’ve already built) and see it immediately run on live streaming data. This lowers the barrier to moving from prototype to production – no need to rebuild your agent from scratch, simply deploy it on Orca and gain the streaming “superpowers” of the platform.‍",[40,6477,6479],{"id":6478},"operational-backbone-for-ai-agents","Operational Backbone for AI Agents",[48,6481,6482],{},"In summary, Orca Agent Engine transforms AI agents from stateless functions or chatbots into always-on, event-driven services with persistent memory and enterprise-grade observability. By leveraging Apache Pulsar or Kafka as a shared event bus, it enables agents to collaborate and discover each other via the Model Context Protocol (MCP) – creating an “agent mesh” where autonomous agents can coordinate actions and share context in real time. And with built-in audit logs and governance, every agent decision and action can be traced end-to-end, which is critical for trust and compliance in production AI systems. Orca Agent Engine provides a simple, neutral, and future-proof backbone for organizations looking to operationalize AI agents on live data streams. It bridges the gap between cutting-edge AI logic and reliable, scalable infrastructure, allowing developers and architects to focus on the what (the agent’s goals and logic) while Orca handles the how (the real-time data integration, scaling, fault tolerance, and oversight).",{"title":18,"searchDepth":19,"depth":19,"links":6484},[6485,6486,6487],{"id":6432,"depth":19,"text":6433},{"id":6448,"depth":19,"text":6449},{"id":6478,"depth":19,"text":6479},"Unlock enterprise-scale AI with Orca Agent Engine. Discover our event-driven runtime and infrastructure for deploying, coordinating, and scaling real-time AI agents in production, built on Apache Pulsar.","\u002Fimgs\u002Fblogs\u002F6900cba7f128ba20b0ccbfd6_what-is-Orca.png",{},"\u002Fblog\u002Fwhat-is-orca-agent-engine",{"title":6426,"description":6488},"blog\u002Fwhat-is-orca-agent-engine",[3988,821,5058,3989],"57oGG0-7llEXBLUNmL4ugBdbJBl7-Igkl06sh2Vn8UQ",{"id":6497,"title":6498,"authors":6499,"body":6502,"category":6415,"createdAt":290,"date":6599,"description":6600,"extension":8,"featured":294,"image":6601,"isDraft":294,"link":290,"meta":6602,"navigation":7,"order":296,"path":5066,"readingTime":3556,"relatedResources":290,"seo":6603,"stem":6604,"tags":6605,"__hash__":6606},"blogs\u002Fblog\u002Fintroducing-orca-agent-engine-private-preview.md","Introducing Orca - StreamNative’s Agent Engine is Now Available for Private Preview",[810,6500,6501],"Rui Fu","Pengcheng Jiang",{"type":15,"value":6503,"toc":6593},[6504,6512,6515,6523,6526,6529,6533,6536,6562,6566,6569,6572,6576,6579,6587,6590],[48,6505,6506,6507,6511],{},"Autonomous AI agents are moving from research labs to real-world production systems – but until now, the infrastructure to support them at enterprise scale has been lacking. ",[55,6508,6510],{"href":6509},"\u002Fblog\u002Fintroducing-the-streamnative-agent-engine#:~:text=another%20a%20script%20making%20API,than%20developing%20the%20agent%E2%80%99s%20logic","Many teams have experimented with agent frameworks in notebooks or demos, only to hit walls in production due to fragmented data, brittle pipelines, and siloed agent processes",". 
StreamNative’s answer to this challenge is Orca Agent Engine (formerly known as “StreamNative Agent Engine”), an event-driven runtime and agent infrastructure designed for always-on, real-time AI agents. Previously available in limited early access, Orca Agent Engine is now in Private Preview on StreamNative Cloud – ready for developers to bring their own AI agents and run them on a unified streaming backbone.",[40,6513,6426],{"id":6514},"what-is-orca-agent-engine",[48,6516,6517,6518,6522],{},"Orca Agent Engine is not another agent framework or library – it’s a streaming-native infrastructure layer for deploying, coordinating, and scaling AI agents in production. Think of it as ",[55,6519,6521],{"href":6520},"\u002Fblog\u002Fintroducing-the-streamnative-agent-engine#:~:text=It%E2%80%99s%20clear%20that%20a%20new,grade%20autonomous%20services","the “missing backbone” that takes you from a prototype agent in a notebook to production-grade autonomous services",". Evolved from Apache Pulsar’s battle-tested serverless compute foundation (Pulsar Functions), the Engine enhances agents with the event-driven capability. You simply package your existing agent code – whether built with Google’s Agent Development Kit (ADK) or OpenAI’s agent APIs, or even plain Python – and deploy it. Once deployed, the agent automatically joins a shared event bus and registry, immediately tapping into live data streams, maintaining its own state, and emitting actions – all under the governance and observability of the platform.",[48,6524,6525],{},"In essence, Orca Agent Engine provides the always-on “nervous system” that modern AI agents need but traditional setups lack. Instead of isolated bots operating on stale snapshots of data, every agent connects to its own data stream that delivers fresh, millisecond-level context. This event bus serves as a real-time context layer and communication channel for agents, so they can react to new events instantly and even talk to each other through events. The Engine also gives agents a built-in memory: each agent can own a persistent, distributed state that’s continually updated with invocations, externalized as events, and available for recall or audit. No more “black-box” agents with hidden state – an agent’s observations and decisions become part of an event log that can be inspected and traced later, providing much-needed transparency.",[48,6527,6528],{},"Crucially, Orca Agent Engine’s architecture is cloud-native, scalable, and resilient by design. Because it leverages a data streaming platform under the hood, agents benefit from horizontal scaling, load balancing, and fault tolerance out of the box. Agents run as distributed functions across a cluster – there’s no single choke point. If one instance goes down, the workload is seamlessly picked up by others, preventing any single agent failure from breaking your entire workflow. In short, the Engine handles the hard parts of running always-on, distributed agents – much like Kubernetes did for microservices, Orca provides that operational backbone for AI agents. You can focus on your agents’ logic and goals, while the platform takes care of real-time data plumbing, scaling, and reliability.",[40,6530,6532],{"id":6531},"key-capabilities-and-features","Key Capabilities and Features",[48,6534,6535],{},"Orca Agent Engine introduces a new paradigm for building real-time AI-driven applications. 
Here are some of its core capabilities in this Private Preview:",[321,6537,6538,6541,6544,6547,6550,6553,6556,6559],{},[324,6539,6540],{},"Event-Driven, Streaming Runtime: Agents are always on, continuously listening to event streams and emitting new events as output. Rather than waiting for requests or periodic batches, agents subscribe to Apache Pulsar or Apache Kafka topics and react the moment events occur. This streaming-first design means your AI agents operate on up-to-the-second information – perfect for scenarios where data never sleeps. Agents can even trigger one another by publishing events, forming an asynchronous pipeline of decisions and actions driven entirely by data flows.",[324,6542,6543],{},"Shared Event Bus (“Nervous System”): All agents – and other event producers\u002Fconsumers – communicate over a unified event bus, eliminating silos. This bus acts as a shared context layer for your AI ecosystem. Agents no longer live in isolation or rely on polling for updates; instead, they receive a live feed of context (sensor readings, user actions, database changes, other agents' results\u002Finstructions, etc.) and can broadcast their own insights or alerts. The result is a network of agents that can collaborate in real time, share facts, and avoid redundant work. The event bus also provides built-in capabilities like message ordering, persistence, rate limiting control and replay (thanks to Pulsar’s log storage and the Kafka-compatible Ursa Engine), so agents can even “time-travel” or recover context as needed.",[324,6545,6546],{},"Persistent Streaming Memory: Each agent can maintain stateful memory beyond a single prompt-response cycle. Backed by a distributed state store, an agent’s intermediate results or important observations are logged as events and stored for future reference. This means an agent can “remember” context over long conversations or continually learn from new data, without being limited by the stateless request\u002Fresponse pattern. Because the memory is externalized to the event stream, you gain full visibility into what the agent knows – every piece of state or decision rationale can be audited and replayed. This streaming memory model tackles one of the biggest challenges of agentic AI: making their decision-making process transparent and reproducible.",[324,6548,6549],{},"Modular & Composable Agents: Orca encourages a decomposed approach to agent design, unlike monolithic chain-of-thought scripts. Complex tasks can be split into multiple specialized agents or functions that each handle a sub-task and communicate via events. For example, one “fast path” agent might apply quick rule-based decisions on incoming events, while a second “smart path” agent performs deeper LLM-powered analysis on the trickier cases – all orchestrated through the event bus. This modular design means your workflows are dynamic and data-driven: agents can decide at runtime to invoke different tools or even spawn other agents based on the situation. It also makes the system evolvable – you can add, remove, or update individual agents (much like microservices) without refactoring a giant codebase. In the Private Preview, you can experiment with building multi-agent systems where agents discover and call each other as needed, forming an “Agent Mesh” of collaborating services.",[324,6551,6552],{},"Unified Registry and Tool Directory: Every agent deployed via Orca is registered in a central registry alongside other components like connectors and functions. 
This registry acts as a directory of all “brains” (agents) and available tools (services, functions, data connectors), along with their metadata. The benefit is twofold: (a) operators get a single control plane to manage and monitor all agents – you can see at a glance what agents are running, pause or update them, review their version and permissions, etc.; and (b) agents themselves can perform dynamic lookup of tools\u002Fpeers. For instance, an orchestrator agent could query the registry to find a specific expert agent or function and then invoke it as a sub-task. This makes it much easier to build composed workflows where agents use other agents as tools. All of this is achieved without hard-coding integrations – the registry and event bus enable late-binding and discovery of capabilities at runtime.",[324,6554,6555],{},"Bring Your Own Agent Framework: One of Orca’s biggest strengths is its framework-agnostic design. We know developers have already invested in popular AI agent frameworks and libraries – and we’re not reinventing yet another framework and asking you to abandon your existing agents. Instead, Orca lets you plug in the agents you’ve built with the tools you love. Whether your agent is powered by Google's Agent Development Kit (ADK) or OpenAI's  Agents SDK or just custom python code, it can run on the Orca Agent Engine without modification. (support for LangChain, LlamaIndex are also coming soon!). The Engine takes care of hosting your agent, connecting it to streams, managing its lifecycle, and scaling it out – no proprietary SDK or rewrite required. Different frameworks can even coexist: you might deploy one agent built with ADK and another with OpenAI’s framework, and they can communicate via the common event bus. This “bring-your-own-framework” approach future-proofs your architecture – you can adopt new agent libraries as they emerge, and Orca will support them as long as they can interface with a Python function. In the current Preview, Python-based agents are supported, with plans to extend to other runtimes as the ecosystem grows.",[324,6557,6558],{},"Tools & External Integrations via MCP: In modern agent systems, functions are tools – and Orca makes it easy for agents to safely use tools through the emerging Model Context Protocol (MCP) standard. MCP (initially introduced by Anthropic) provides a uniform, secure way for AI agents to invoke external tools and access data. The Orca Engine embraces MCP so that your agents can call REST APIs, query databases, read from live data streams, invoke cloud services, or even manage infrastructure (like a Pulsar cluster) via natural language commands, all through a common interface. Under the hood, StreamNative’s own MCP Server (an open-source component) bridges your Pulsar\u002FKafka event streams with external tools and APIs, exposing them as on-demand functions that agents can discover. This means you don’t have to write custom glue code for each integration or worry about leaked credentials – define the tool once (with proper auth and schema) and any agent can use it when needed. With MCP and the unified registry, functions and even other agents become callable MCP tools. Orca Engine’s MCP integration also implements the complete MCP Client specification with dynamic tool discovery: any deployment change to an agent (added\u002Fupdated\u002Fretired) is automatically propagated to other agents that subscribe to it as a tool, and the corresponding agent contexts refresh in near real time with updated capabilities and status. 
This drastically expands what your agents can do (e.g. fetch knowledge from a vector database, send an alert email, execute a workflow) while keeping the interactions governed and auditable.",[324,6560,6561],{},"Enterprise-Grade Observability & Control: Running autonomous agents in production demands robust monitoring, security, and governance – and Orca is built with these needs in mind. Because agents communicate via standard event streams (Pulsar\u002FKafka) and log every action as events, you get a traceable audit log of every decision and tool invocation. Integration with StreamNative Cloud’s monitoring stack means you can trace event flows end-to-end, measure agent latencies, and catch anomalies. The unified registry and control plane let you enforce role-based access control (RBAC) for agents and tools, manage secrets securely, and roll out updates in a controlled way. Need to pause an agent’s autonomy in an emergency? You can disable its event subscriptions with a click. Want to debug why an agent made a certain decision? You can replay its event inputs or inspect its state log. In Private Preview, you’ll have access to searchable agent logs and basic tracing. These features give enterprises the confidence to deploy “agentic AI” with appropriate safeguards – ensuring that even as agents make decisions, humans stay in the loop with visibility and override capabilities.",[40,6563,6565],{"id":6564},"current-preview-availability-and-getting-started","Current Preview Availability and Getting Started",[48,6567,6568],{},"The Private Preview of Orca Agent Engine launches today on StreamNative Cloud, and we’re excited to open it up for more developers to explore. Initially, the Agent Engine is available on Bring-Your-Own-Cloud (BYOC) deployments of StreamNative Cloud (across AWS, GCP, and Azure), so you can run it in your own cloud environment with StreamNative managing the service. This ensures your data stays within your control while you experiment with real-time agents. Support for dedicated cloud clusters and other deployment modes is on the roadmap as we gather feedback in this preview phase.",[48,6570,6571],{},"At this stage of Private Preview, our focus is on core functionality and stability. The Engine supports Python-based agents and integrates seamlessly with Pulsar and Kafka streams (so you can feed it data from either platform). Key capabilities like the event bus, agent registry, persistent memory\u002Fcontext, and MCP tool connectivity are ready for use. We encourage you to try out the current capabilities and share your feedback with us. Your input will shape the roadmap toward GA (General Availability), including support for additional languages, deeper tooling, and more out-of-the-box agent patterns.",[40,6573,6575],{"id":6574},"try-orca-agent-engine-today","Try Orca Agent Engine Today",[48,6577,6578],{},"With Orca Agent Engine now in Private Preview, you can start turning your AI models and scripts into fully operational, event-driven services. Imagine an agent that monitors streaming customer support tickets and autonomously answers common issues, or a swarm of agents that detect anomalies in IoT sensor data and proactively coordinate responses – the possibilities are vast when you combine continuous streams with autonomous reasoning. We believe this decomposable, streaming-native approach will unlock a new class of intelligent applications that are reactive, context-aware, and scalable by design.",[48,6580,6581,6582,6586],{},"Ready to get started? 
",[55,6583,6585],{"href":4688,"rel":6584},[264],"Sign up for the Private Preview"," through StreamNative Cloud (contact us via the StreamNative Console or your account team to enable the Orca Agent Engine preview on your cluster). Our documentation and quickstart guides will help you deploy your first agent in minutes. Point your agent to a live topic, deploy it via our CLI or console, and watch as it comes alive in the stream – processing events in real time. Because Orca Agent Engine is framework-agnostic, you can bring an existing agent (for example, an OpenAI agent you’ve already built) and see it immediately operate on live data streams. No more fake stub data or offline demos – your agent becomes a continuous online service.",[48,6588,6589],{},"We are incredibly excited to see what you build with Orca. This Private Preview is just the beginning, and we’ll be rapidly iterating on the Engine with new features and improvements. Our goal is to empower developers to easily create real-time, autonomous AI systems that can perceive, reason, and act on data as it flows. With a streaming backbone uniting your agents workflows and tools, the limitations of static prompts and isolated bots melt away. Welcome to the era of streaming agents on StreamNative Cloud – where your AI agents can truly live in the stream, today.",[48,6591,6592],{},"Join the preview, give Orca Agent Engine a try, and let us know your feedback. Together, let’s shape the future of event-driven AI!",{"title":18,"searchDepth":19,"depth":19,"links":6594},[6595,6596,6597,6598],{"id":6514,"depth":19,"text":6426},{"id":6531,"depth":19,"text":6532},{"id":6564,"depth":19,"text":6565},{"id":6574,"depth":19,"text":6575},"2025-09-30","StreamNative introduces Orca Agent Engine, an event-driven runtime for deploying, coordinating, and scaling AI agents in production. Now in Private Preview on StreamNative Cloud, it offers a streaming-native infrastructure for real-time AI.","\u002Fimgs\u002Fblogs\u002F68db7f5c0e4a6887bbf97ae1_Orca_no-logo.png",{},{"title":6498,"description":6600},"blog\u002Fintroducing-orca-agent-engine-private-preview",[3988,303],"aRYdaK4sqIMWci0p9DkajZhF0ScGfA0_ZJGZsO-pH2A",{"id":6608,"title":6609,"authors":6610,"body":6611,"category":3550,"createdAt":290,"date":6599,"description":6773,"extension":8,"featured":294,"image":6774,"isDraft":294,"link":290,"meta":6775,"navigation":7,"order":296,"path":6776,"readingTime":3556,"relatedResources":290,"seo":6777,"stem":6778,"tags":6779,"__hash__":6780},"blogs\u002Fblog\u002Fq3-2025-data-streaming-launch-lakehouse-streaming-governed-analytics-and-event-driven-agents.md","Q3 2025 Data Streaming Launch: Lakehouse Streaming, Governed Analytics, and Event-Driven Agents",[311],{"type":15,"value":6612,"toc":6766},[6613,6616,6638,6642,6653,6660,6663,6666,6672,6678,6682,6685,6688,6691,6694,6698,6702,6714,6717,6720,6723,6727,6731,6737,6740,6743,6750,6753,6757,6760,6763],[48,6614,6615],{},"Modern data teams face a three-stage bottleneck: data streaming is getting more expensive to run at scale, insights stall when streaming and analytics live in separate systems, and actions are delayed because AI can’t reliably operate on live context. 
This quarter’s Data Streaming launch tackles that end-to-end path with one coherent story: stream once, store in the open, govern centrally, make analytics immediately queryable, and operationalize real-time agents that act with confidence.",[48,6617,6618,6619,6623,6624,6628,6629,6633,6634,6637],{},"We’re announcing four upgrades that fit together as a single architectural arc. ",[55,6620,6622],{"href":6621},"\u002Fblog\u002Fursa-everywhere-lakehouse-native-future-data-streaming","Ursa’s lakehouse storage becomes available across every Classic Engine cluster"," so you can adopt lakehouse economics without a disruptive migration. StreamNative Cloud’s ",[55,6625,6627],{"href":6626},"\u002Fblog\u002Fstreamnative-expands-unitycatalog-integration-with-iceberg-tables","Unity Catalog integration expands with managed Apache Iceberg tables",", turning event streams into governed, query-ready Iceberg tables in Databricks. ",[55,6630,6632],{"href":6631},"\u002Fblog\u002Fannouncing-the-general-availability-of-role-based-access-control-in-streamnative-cloud","RBAC reaches General Availability"," in the StreamNative Cloud, bringing least-privilege access control to multi-tenant streaming. And ",[55,6635,6636],{"href":5066},"Orca—our event-driven Agent Engine—enters Private Preview",", giving enterprises an event-driven runtime where autonomous agents live on the same backbone as your data. Taken together, these updates lower total cost of ownership, collapse data silos, and make real-time AI practical in production.",[40,6639,6641],{"id":6640},"ursa-everywhere-the-lakehouse-native-path-to-data-streamingnow-for-every-classic-cluster","Ursa Everywhere: the lakehouse-native path to data streaming—now for every Classic cluster",[48,6643,6644,6645,6648,6649,6652],{},"When we introduced ",[55,6646,1332],{"href":6647},"\u002Fproducts\u002Fursa",", we set out to deliver a streaming engine that preserves the Kafka developer experience while fundamentally rethinking the storage and replication economics underneath. Ursa writes streams directly to cloud object storage in open table formats—think Apache Iceberg or Delta—rather than replicating messages across broker disks and exporting them later via external connectors. By eliminating leader-based broker replication and the “second pipeline” required to feed your data lake, Ursa’s architecture can reduce infrastructure costs by an order of magnitude while decoupling compute from storage for elastic scaling. That design has now moved from paper to practice; our ",[55,6650,6651],{"href":4752},"VLDB 2025 Best Industry Paper"," recognition validates the approach and its impact at scale.",[48,6654,6655,6656,6659],{},"Today we’re taking the next step: ",[55,6657,6658],{"href":6621},"Ursa’s storage layer is available as a lakehouse tier for all Classic Engine clusters—Serverless, Dedicated, and BYOC",". You keep Pulsar’s ultra-low-latency hot path in BookKeeper for operational workloads, and you continuously persist history into Iceberg\u002FDelta on S3, GCS, or Azure Blob as part of the same write. No extra ETL job, no duplicate pipeline, and no change for your producers or consumers. It’s the Classic engine you trust, with the lakehouse durability and open format your analytics estate expects.",[48,6661,6662],{},"This is more than offload. By standardizing on the Ursa streaming storage format beneath your Classic clusters, you’re laying the tracks for a future effortless upgrade. 
In the future, when your workload profile or cost targets point to Ursa brokers, you attach the new engine to the same object storage and take over serving from day one—no re-ingest, no backfill, no big-bang cutover. The protocols your apps see (Pulsar or Kafka) don’t change; only the engine does. It’s a clean swap of compute against a shared storage substrate that’s already been populated by your Classic clusters.",[48,6664,6665],{},"If your priority is TCO and lakehouse integration, this is the most pragmatic route to the lakehouse ecosystem today and a seamless Ursa upgrade tomorrow. Enable the Ursa storage tier on your Classic clusters, observe your data flow into your lakehouse, and enjoy the seamless analytics-ready experience.",[48,6667,6668,6669,190],{},"Get started: turn on Lakehouse Storage for a Classic cluster in StreamNative Cloud and pick your object store and table format. If you’d like architectural guidance or a cost-reduction analysis, ",[55,6670,6671],{"href":6392},"our team can help you model the impact",[48,6673,6674,6675,190],{},"You can read the detailed announcement in ",[55,6676,6677],{"href":6621},"this blog post",[40,6679,6681],{"id":6680},"streaming-that-arrives-governed-unity-catalog-with-managed-iceberg-tables","Streaming that arrives governed: Unity Catalog with managed Iceberg tables",[48,6683,6684],{},"Streaming is at its best when it doesn’t fork your architecture. With Ursa, events are written into columnar Parquet files and committed to Iceberg tables; with Unity Catalog, those tables are immediately discoverable, governed, and queryable in the same catalog where the rest of your lakehouse lives. That means real-time data becomes a first-class citizen of your analytics estate the moment it lands.",[48,6686,6687],{},"In practice, this looks simple. You stream to Pulsar or Kafka. Ursa writes and compacts to Iceberg, publishing new snapshots as files arrive. Unity Catalog registers those tables and enforces access controls and lineage in line with your enterprise policies. Your analysts and data scientists reach for the same SQL endpoints and notebooks they use today, and they see fresh, governed streaming tables without bespoke glue code or separate pipelines.",[48,6689,6690],{},"The payoff is speed and simplicity. BI dashboards and ML features no longer lag behind your operational reality. Governance does not regress the moment a pipeline becomes “real-time”. And because the tables are open Iceberg underneath, you keep maximum interoperability with Spark, Trino, Flink, and Snowflake, even as Unity Catalog provides a single place to manage discovery and permissions.",[48,6692,6693],{},"Try it now: connect your Ursa or Classic cluster to Databricks Unity Catalog, choose managed Iceberg, and publish a streaming topic as a table. You’ll go from events to governed SQL in minutes, not weeks.",[48,6695,6674,6696,190],{},[55,6697,6677],{"href":6626},[40,6699,6701],{"id":6700},"rbac-ga-least-privilege-access-for-multi-tenant-streaming","RBAC GA: least-privilege access for multi-tenant streaming",[48,6703,6704,6705,6709,6710,6713],{},"As organizations consolidate more teams and workloads onto a shared streaming backbone, centralized access control moves from nice-to-have to mandatory. 
",[55,6706,5151],{"href":6707,"rel":6708},"https:\u002F\u002Fdocs.streamnative.io\u002Fcloud\u002Fsecurity\u002Faccess\u002Frbac\u002Frbac-overview",[264]," in StreamNative Cloud is now ",[55,6711,6712],{"href":6631},"Generally Available",", bringing a consistent security model to Pulsar- and Kafka-compatible endpoints.",[48,6715,6716],{},"RBAC gives platform owners a single place to define who can create tenants and namespaces, who can publish or subscribe to which topics, who can evolve schemas, and how service accounts should be scoped. Roles can reflect the way your org actually works—a platform team with broad administrative rights, domain teams with namespace-level controls, application services with topic-specific produce or consume permissions—and those roles can be applied across clusters without brittle, per-cluster ACL sprawl. Changes are auditable. Rollouts are predictable. And the principle of least privilege becomes practical instead of aspirational.",[48,6718,6719],{},"You can manage RBAC interactively in the Console or declaratively via API and Terraform as part of your CI\u002FCD flows. Either way, you get a uniform security posture across protocols and deployments, so consolidating onto one platform doesn’t mean compromising on governance.",[48,6721,6722],{},"Enable it today: open the Accounts & Accesses → Access section in the Console, assign your first roles, and replace ad-hoc ACLs with a model that scales with your organization.",[48,6724,6674,6725,190],{},[55,6726,6677],{"href":6631},[40,6728,6730],{"id":6729},"orca-private-preview-a-backbone-for-real-time-event-driven-autonomous-agents","Orca Private Preview: a backbone for real-time, event-driven, autonomous agents",[48,6732,6733,6734,6736],{},"Enterprises have experimented with agent frameworks in notebooks and prototypes. The sticking point has been production: agents need live context, persistent memory, safe tool use, coordination, and observability—all the operational traits we already demand from distributed systems. ",[55,6735,5058],{"href":5066}," is our answer: an event-driven Agent Engine that runs on the same streaming fabric as your data so agents become long-lived services rather than one-off invocations.",[48,6738,6739],{},"With Orca, agents subscribe to topics, react to events as they happen, and emit new events for downstream agents and applications. They maintain a persistent state so knowledge accumulates across sessions. They discover and call tools or peer agents via a shared registry. And because every observation and action flows through durable event logs, you gain traceability and auditability: you can replay inputs, inspect memory, and understand why an agent did what it did. It’s the operational backbone autonomous systems have been missing.",[48,6741,6742],{},"This is infrastructure, not another framework. Bring the agents you already have—Python today, including those built with OpenAI or Google ADK—and let Orca host them as event-driven functions. Under the hood, the engine leans on battle-tested StreamNative primitives for scale, concurrency, and failure recovery, so you can focus on behavior and policy rather than plumbing and retries.",[48,6744,6745,6746,6749],{},"Orca is available ",[55,6747,6748],{"href":5066},"in Private Preview"," on StreamNative Cloud starting with BYOC, with additional deployment modes to follow. 
Early adopters are already using it to watch operational streams and take first-line actions, to orchestrate multi-agent workflows that triage and escalate based on data, and to close the loop between analytics signals and operational responses.",[48,6751,6752],{},"Try Orca: ask your account team to enable the Agent Engine preview, deploy a Python agent against a live topic, and watch it come alive in the stream. Our quickstarts get you from code to continuously-running agent in minutes.",[40,6754,6756],{"id":6755},"one-platform-one-path-from-data-insights-actions","One platform, one path from data → insights → actions",[48,6758,6759],{},"The industry doesn’t need many disjointed systems to move from raw events to business outcomes. It needs a single platform that lowers the cost of data streaming, makes insights immediately available under governance, and turns those insights into real-time actions. That’s the through-line of this launch. With Ursa storage available across Classic clusters, you get lakehouse economics now and a clean, no-migration path to Ursa brokers when you’re ready. With our expanded Unity Catalog integration for managed Iceberg tables, streams land as governed, query-ready assets the moment they arrive. With RBAC now GA, access control shifts from ad-hoc scripts to a single, auditable model that fits how enterprises actually operate. And with Orca in Private Preview, autonomous agents can finally live on the same backbone as your data—perceiving, deciding, and acting in real time.",[48,6761,6762],{},"Start where your bottleneck is sharpest. If you’re pushing for lakehouse integration and cost relief, enable the Ursa lakehouse tier on your Classic clusters and watch topic data from Pulsar or Kafka persist directly to Iceberg. If analytics friction is the blocker, connect your StreamNative cluster to Unity Catalog, publish your first topic, and run governed SQL over the table it creates—no second pipeline required. If security is the imperative, define roles once in RBAC and apply them uniformly across tenants, namespaces, topics, and schemas. And if your next step is AI, deploy an agent on Orca and let it live in the stream, with memory, observability, and guardrails from day one.",[48,6764,6765],{},"The path from data to insights to actions shouldn’t require a leap across tool silos. With StreamNative Cloud, it’s a continuous flow—practical, open, and intelligent—and it’s available today.",{"title":18,"searchDepth":19,"depth":19,"links":6767},[6768,6769,6770,6771,6772],{"id":6640,"depth":19,"text":6641},{"id":6680,"depth":19,"text":6681},{"id":6700,"depth":19,"text":6701},{"id":6729,"depth":19,"text":6730},{"id":6755,"depth":19,"text":6756},"StreamNative's Q3 2025 launch introduces Lakehouse Streaming, Governed Analytics, and Event-Driven Agents. 
Discover how Ursa, Unity Catalog, RBAC, and Orca are transforming data streaming, insights, and real-time actions.","\u002Fimgs\u002Fblogs\u002F68dbc129b6181ad3637a61d6_Q325-Product_no-logo.png",{},"\u002Fblog\u002Fq3-2025-data-streaming-launch-lakehouse-streaming-governed-analytics-and-event-driven-agents",{"title":6609,"description":6773},"blog\u002Fq3-2025-data-streaming-launch-lakehouse-streaming-governed-analytics-and-event-driven-agents",[800,1332,3988,5509],"N-EmYldyC73EIdQr4HtgPihJ6WHFViJRWRJC4cFOmB4",{"id":6782,"title":6783,"authors":6784,"body":6786,"category":1332,"createdAt":290,"date":6599,"description":6958,"extension":8,"featured":7,"image":6959,"isDraft":294,"link":290,"meta":6960,"navigation":7,"order":296,"path":6621,"readingTime":5505,"relatedResources":290,"seo":6961,"stem":6962,"tags":6963,"__hash__":6964},"blogs\u002Fblog\u002Fursa-everywhere-lakehouse-native-future-data-streaming.md","Ursa Everywhere: Paving the Path to a Lakehouse-Native Future for Data Streaming",[6785],"Matteo Meril",{"type":15,"value":6787,"toc":6951},[6788,6798,6802,6810,6818,6827,6831,6834,6842,6845,6859,6874,6878,6881,6893,6896,6904,6907,6910,6913,6917,6920,6923,6926,6929,6932,6935,6939,6942,6945,6948],[48,6789,6790,6791,6797],{},"Today at the ",[55,6792,6794],{"href":6135,"rel":6793},[264],[44,6795,6796],{},"Data Streaming Summit 2025",", we are thrilled to announce a major leap in StreamNative’s product evolution. Ursa – our next-generation lakehouse-native data streaming engine – is now being made available across every deployment model. In this announcement, we recap what Ursa is (including its recent accolade as VLDB 2025 Best Industry Paper), revisit the history of the Classic Pulsar Engine versus the new Ursa Engine, and unveil how we’re enabling Ursa’s storage layer as a Lakehouse extension for Classic Engine clusters. This new capability works as a tiered storage extension on all Classic clusters (Serverless, Dedicated, and BYOC), allowing current Pulsar users to start leveraging Ursa’s innovative lakehouse storage today. By doing so, we’re ensuring a smooth upgrade path from the Classic Engine to Ursa Engine in the near future. Our vision is to make Ursa’s stream storage format an open standard for streaming data, benefiting not just StreamNative customers but the broader Apache Pulsar and Kafka communities as well.",[40,6799,6801],{"id":6800},"ursa-engine-kafka-compatibility-meets-lakehouse-innovation","Ursa Engine: Kafka Compatibility Meets Lakehouse Innovation",[48,6803,6804,6805,6809],{},"Ursa Engine is our answer to the need for a more cost-effective, cloud-native streaming platform without sacrificing the developer experience that Apache Kafka made popular. In contrast to legacy architectures, Ursa is fully Kafka API-compatible yet fundamentally different under the hood. It is the first “lakehouse-native” streaming engine – built to write data directly to cloud object storage in open table formats (like Apache Iceberg and Delta Lake) instead of persisting to proprietary broker disks. 
By eliminating the traditional leader-based replication and external ETL connectors, ",[55,6806,6808],{"href":6807},"\u002Fblog\u002Fursa-wins-vldb-2025-best-industry-paper-the-first-lakehouse-native-streaming-engine-for-kafka#:~:text=the%20first%20and%20only%20%E2%80%9Clakehouse,cloud%20requirements%20for%20elasticity%2C%20high","Ursa’s architecture slashes streaming infrastructure costs by up to 10× (roughly 90–95% lower costs) while maintaining seamless compatibility with existing Kafka applications",". In other words, users get the same Kafka experience but backed by a modern, cloud-optimized design that decouples compute from storage for elastic scalability. This radical approach delivers high performance with dramatically lower operational overhead, allowing organizations to focus on data and workloads rather than low-level infrastructure.",[48,6811,6812,6813,6817],{},"Ursa’s innovative design has not gone unnoticed. This year, ",[55,6814,6816],{"href":6815},"\u002Fblog\u002Fursa-wins-vldb-2025-best-industry-paper-the-first-lakehouse-native-streaming-engine-for-kafka#:~:text=The%20Very%20Large%20Data%20Bases,to%20share%20what%20we%E2%80%99ve%20built","our paper “Ursa: A Lakehouse-Native Data Streaming Engine for Kafka” received the Best Industry Paper award at the prestigious VLDB 2025 conference",". The VLDB recognition underscores the significance of Ursa’s leaderless, lakehouse-integrated approach to streaming – validating that Ursa represents a breakthrough in marrying real-time streams with open data lakehouse systems. We’re incredibly honored by this award and energized to continue pushing the state of the art in streaming data technology.",[48,6819,6820,6821,6826],{},"(For those interested in the technical deep-dive, you can read ",[55,6822,6825],{"href":6823,"rel":6824},"https:\u002F\u002Fvldb.org\u002Fpvldb\u002Fvolumes\u002F18\u002Fpaper\u002FUrsa%3A%20A%20Lakehouse-Native%20Data%20Streaming%20Engine%20for%20Kafka",[264],"the VLDB 2025 paper"," for a comprehensive look at Ursa’s design.)",[40,6828,6830],{"id":6829},"from-classic-pulsar-to-ursa-a-tale-of-two-engines","From Classic Pulsar to Ursa: A Tale of Two Engines",[48,6832,6833],{},"To understand the importance of today’s announcement, it helps to look at how the Classic Pulsar Engine and the Ursa Engine differ. The Classic Engine refers to the original Apache Pulsar architecture that StreamNative Cloud has run for years. It relies on Apache ZooKeeper for metadata coordination and Apache BookKeeper for durable, low-latency storage of messages. This compute-and-storage-separation design powers many mission-critical systems by providing ultra-low latency message delivery and strong consistency. Classic Pulsar is also versatile – it supports not only the Pulsar protocol but also Kafka (via Kafka-on-StreamNative) and MQTT, allowing it to speak multiple messaging APIs on a single platform. Today, the Classic Engine remains the default in StreamNative Cloud, available in all deployment modes (Serverless, Dedicated, BYOC) and trusted for workloads that demand the absolute lowest end-to-end latencies.",[48,6835,6836,6837,6841],{},"Ursa Engine was born from the recognition that cloud-era workloads often prioritize cost efficiency and scalability alongside latency. Ursa is built on the Apache Pulsar foundation but reimagines key components for a more flexible and scalable architecture. 
Instead of ZooKeeper, Ursa uses ",[55,6838,5599],{"href":6839,"rel":6840},"https:\u002F\u002Fgithub.com\u002Foxia-db\u002Foxia",[264]," – a new scalable metadata store – to manage coordination. Instead of being tied only to BookKeeper for storage, Ursa’s brokers are stateless and leaderless, persisting data directly to cheap and reliable object storage (like AWS S3, GCS, Azure Blob) in open table formats. In short, Ursa shifts from the ZooKeeper-based, disk-centric model of Classic Pulsar toward a “headless” stream storage architecture, using Oxia for metadata and S3\u002FObject Store for durability. This design trades a bit of write latency for massive gains in throughput, cost efficiency, and simplicity.",[48,6843,6844],{},"Key differences between the Classic Engine and Ursa Engine include:",[321,6846,6847,6850,6853,6856],{},[324,6848,6849],{},"Metadata management: Classic Pulsar uses ZooKeeper for cluster metadata and coordination; Ursa replaces this with Oxia, a horizontally scalable and highly available metadata store. This removes the scaling and maintenance challenges of ZooKeeper in large clusters.",[324,6851,6852],{},"Storage layer: Classic relies on BookKeeper (persistent disks) for storing message data, which offers very low latency. Ursa uses cloud object storage as its primary storage, writing data as files in open formats (Iceberg\u002FDelta) for long-term durability. BookKeeper in Ursa is optional and only used for topics that demand the absolute lowest latency, whereas in Classic it’s the only storage.",[324,6854,6855],{},"Architecture: Classic Engine brokers use a leader-based model for each topic partition (managed by ZooKeeper), and data is replicated broker-to-bookie. Ursa’s brokers are leaderless and stateless – any broker can handle any partition – with replication offloaded to the shared storage layer. This eliminates leader election downtime and cross-datacenter replication traffic, simplifying operations.",[324,6857,6858],{},"Protocols and compatibility: Classic Engine supports Pulsar’s native protocol and Kafka out-of-the-box. Ursa Engine is currently focused on 100% Kafka API compatibility (Pulsar protocol support is on the roadmap). Despite different internals, Ursa presents the Kafka interface so that existing Kafka clients and applications work unchanged. (In StreamNative Cloud, you choose Classic vs Ursa engine when creating a cluster instance, but either way you can connect with Kafka clients.)",[48,6860,6861,6862,6866,6867,4003,6870,6873],{},"These changes make Ursa ideal for cost-sensitive, latency-relaxed workloads in the cloud, whereas Classic Pulsar excels for ultra-low-latency requirements. It’s worth noting that as of today, Ursa Engine has been available in ",[55,6863,6865],{"href":6864},"\u002Fblog\u002Fannouncing-ursa-engine-ga-on-aws-leaderless-lakehouse-native-data-streaming-that-slashes-kafka-costs-by-95","General Availability on AWS"," (with Public Preview on ",[55,6868,6869],{"href":4784},"Azure",[55,6871,6872],{"href":4788},"GCP","). Many of our customers run large-scale Classic clusters and are interested in Ursa’s benefits, but until now, moving from Classic to Ursa meant planning a migration or starting a new cluster. After all, you can’t simply “flip a switch” on a running Pulsar cluster to become an Ursa cluster – Ursa’s use of Oxia and S3 storage is fundamentally different and cannot be retrofitted into an existing Classic cluster without downtime or data migration. 
This is the challenge we set out to solve: how to bring Ursa’s advantages to existing Pulsar deployments in a seamless way.",[40,6875,6877],{"id":6876},"ursa-storage-extension-for-classic-pulsar-lakehouse-for-all-deployments","Ursa Storage Extension for Classic Pulsar: Lakehouse for All Deployments",[48,6879,6880],{},"Today’s announcement addresses that challenge head-on: we are introducing Ursa Stream Storage as a Lakehouse tiered storage extension for the Classic Engine. In practical terms, this means any Classic Pulsar cluster – including Serverless, Dedicated, and BYOC – can now take advantage of Ursa’s lakehouse-based storage layer without immediately switching to the Ursa Engine brokers. This extension works as a tiered storage plugin for Classic Pulsar clusters, allowing them to offload and store data in the same open table format that Ursa uses. With a configuration change, your Pulsar topics can be automatically persisted to long-term cloud storage (e.g. S3) in Apache Iceberg or Delta Lake format, alongside the traditional BookKeeper storage. Think of it as upgrading the back-end storage of your Classic cluster to speak the “Ursa language” of the lakehouse.",[48,6882,6883,6884,6888,6889,6892],{},"We first previewed this concept last year as “Pulsar’s Lakehouse Tiered Storage”. In late 2023, we showed ",[55,6885,6887],{"href":6886},"\u002Fblog\u002Fstreaming-lakehouse-introducing-pulsars-lakehouse-tiered-storage#:~:text=Apache%20Pulsar%20has%20been%20a,greatly%20benefit%20Apache%20Pulsar%20users","how Pulsar could adopt open, industry-standard storage formats as a tiered storage layer, instead of using Pulsar’s proprietary segment format for offloading",". By integrating with table formats like Delta Lake and Apache Iceberg, that development effectively transformed Apache Pulsar into a ",[55,6890,5425],{"href":6891},"\u002Fsolutions\u002Fstreaming-lakehouse",", allowing users to ingest data through Pulsar and have it land directly in their data lakehouse storage. Over the past year, we’ve refined and tested this approach with our users. Now, as a culmination of that work, Ursa Stream Storage is becoming available as a fully supported feature for all Classic Engine clusters – bringing the power of lakehouse tiered storage to every deployment.",[48,6894,6895],{},"What does this mean for Classic Pulsar users? In short, you get the best of both worlds:",[321,6897,6898,6901],{},[324,6899,6900],{},"Low-latency streaming from BookKeeper for your real-time consumers (ensuring no impact to the snappy performance you rely on for “hot” data), plus",[324,6902,6903],{},"Automatic long-term storage of all data in cost-efficient object storage as Iceberg\u002FDelta tables. This long-term tier is managed by the system – as messages age out from BookKeeper, they’re already safely stored in the lakehouse format, without any external connectors or ETL jobs needed.",[48,6905,6906],{},"Once enabled, the Ursa storage extension continuously converts your Pulsar topic streams into analytics-friendly parquet files in the background (using the same compaction approach pioneered by Ursa Engine). Your streaming data becomes immediately available for batch querying or AI\u002Fanalytics pipelines via tools like Spark, Trino, or Snowflake – no separate export step required. Essentially, Classic Pulsar clusters can now produce their own lakehouse tables as a byproduct of streaming, aligning with Ursa’s “stream-table duality” design. 
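As a rough sketch of what this looks like from the analytics side, the snippet below reads one topic's Iceberg table with PySpark. The catalog name, warehouse bucket, table identifier, and Iceberg runtime version are placeholders – substitute the object store and catalog your cluster actually writes to.

```python
# Sketch only: querying a topic's Iceberg table with PySpark. Catalog name,
# warehouse bucket, table identifier, and runtime package version are placeholders.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("query-streaming-lakehouse")
    # Pull in the Iceberg runtime and register a catalog over the same bucket
    # the storage extension writes to.
    .config("spark.jars.packages", "org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.6.1")
    .config("spark.sql.catalog.lake", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.lake.type", "hadoop")
    .config("spark.sql.catalog.lake.warehouse", "s3a://my-bucket/lakehouse/")  # placeholder
    .getOrCreate()
)

# The offloaded topic shows up as an ordinary Iceberg table, so plain SQL applies.
spark.sql("""
    SELECT *
    FROM lake.my_tenant.my_topic   -- placeholder table identifier
    LIMIT 100
""").show(truncate=False)
```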
And because the data is stored in an open format, you maintain full control and portability – the data in S3 is yours to query with any engine, or even to share across different systems.",[48,6908,6909],{},"From an architecture standpoint, this extension leverages Pulsar’s built-in tiered storage mechanism but swaps the storage format to the open table format. There’s no change required in your producers or consumers – Pulsar continues to serve data to them as it always did. Internally, new writing and offloading policies ensure that every message published to Pulsar is durably written to the object storage tier (and compacted into table format) in addition to the BookKeeper ledgers. This means durability and throughput actually increase (object storage can handle very high throughput writes), while BookKeeper handles the tail of the stream for ultra-fast reads.",[48,6911,6912],{},"Crucially, Ursa Stream Storage for Classic clusters is available across all our cloud deployment options. Whether you run on our multi-tenant Serverless offering, have an isolated Dedicated cluster, or deploy in your own cloud (BYOC), you can take advantage of this feature to turn your Pulsar cluster into a hybrid streaming\u002Flakehouse system. By making Ursa’s storage layer ubiquitous, we ensure that every StreamNative customer – not just those who spin up brand new Ursa clusters – can reap the benefits of lakehouse-native streaming.",[40,6914,6916],{"id":6915},"paving-the-way-for-seamless-upgrades-from-classic-to-ursa","Paving the Way for Seamless Upgrades from Classic to Ursa",[48,6918,6919],{},"Perhaps the most exciting aspect of offering Ursa’s storage layer on Classic Pulsar is that it paves a clear path to upgrade your streaming engine in the future. Adopting the Ursa storage extension today is essentially future-proofing your Pulsar deployment. Once your data is flowing into the Ursa (lakehouse) storage tier, the hardest part of an Ursa migration is already done! All of your topic history is sitting in object storage as Iceberg\u002FDelta tables, just as the Ursa Engine expects it. This means that when the time is right – for example, when Ursa Engine becomes generally available on your cloud of choice, or when your workload profile shifts to favor Ursa’s strengths – you can swap out the Classic Engine brokers for Ursa Engine brokers without re-ingesting or migrating data. The new Ursa brokers can attach to the existing S3 or Blob storage bucket and immediately take over serving the data from the same unified log\u002Ftable, picking up exactly where the Classic brokers left off (with full consistency).",[48,6921,6922],{},"In essence, enabling Ursa storage on a Classic cluster is like laying down railroad tracks for an eventual engine swap: you continue to run the Classic locomotive for now, but the tracks (data format) are already compatible with the new high-speed engine when you’re ready to switch. This approach minimizes risk and downtime. You don’t have to maintain two parallel pipelines or perform a big-bang migration of all your historical data. Your producers and consumers can remain connected during the transition, since the Kafka\u002FPulsar protocols they see don’t change – only the engine behind the scenes does. 
Our goal is to make moving to Ursa Engine as simple as a rolling upgrade when the time comes.",[48,6924,6925],{},"We understand that many organizations have significant investment in their existing Pulsar clusters (with tailored configurations, client applications, and operational knowledge). With the Ursa storage extension, we’re ensuring those investments continue to pay off. You can incrementally adopt Ursa’s benefits (like cost savings and lakehouse integration) without immediately changing your entire system. Over time, as you gain confidence and as Ursa Engine matures with more features (e.g. Pulsar protocol support, transactions, etc.), you’ll be well-prepared to upgrade your Classic brokers to Ursa brokers. StreamNative will be there to help guide this journey – from sizing the new cluster to orchestrating a cutover – but thanks to this unified storage layer, the journey will be much smoother than a conventional migration.",[48,6927,6928],{},"(As an analogy, consider how cloud databases allow storage to be detached from compute: you can spin up a new compute engine against the same storage. Similarly, Ursa Engine can be “attached” to the storage your Classic Pulsar has been populating, making the upgrade a swap of compute layers rather than a migration of data.)",[48,6930,6931],{},"It’s also worth noting that data governance and catalog integration become easier with this approach. Since your Classic cluster’s data is in Ursa stream storage format, it can be cataloged in systems like Snowflake or Databricks even before you move to Ursa Engine. This brings immediate benefits: for example, you could register your Pulsar topics (now as Iceberg tables) in Databricks Unity Catalog or Snowflake’s Open Catalog, enabling consistent data governance and discovery across streaming and batch worlds. Then, when you transition to Ursa Engine, those integrations remain in place – your data was already in the right format and cataloged under a unified schema. In short, Ursa storage on Classic not only eases the technical migration, but also bridges the gap in how streaming data is used in the broader data ecosystem.",[48,6933,6934],{},"Just as we introduced UniLink for Kafka users (a tool to live-replicate Kafka topics into Ursa Engine with zero downtime) to simplify their path forward, this Lakehouse storage extension serves the Pulsar community’s path to the future. We want every Pulsar user to confidently step into Ursa’s world, at their own pace, and with zero regret.",[40,6936,6938],{"id":6937},"towards-a-unified-standard-for-streaming-data","Towards a Unified Standard for Streaming Data",[48,6940,6941],{},"Beyond just StreamNative or Pulsar, we believe that Ursa’s approach heralds a broader industry shift – one that makes open data formats the backbone of streaming. By leveraging open table formats and cloud object storage as the substrate for streaming data, Ursa effectively turns streaming systems into an integral part of the data lakehouse architecture.",[48,6943,6944],{},"Looking ahead, we anticipate that the lakehouse-native streaming approach can be applied not only in StreamNative’s managed platform, but also in open-source Apache Pulsar and even Apache Kafka environments. The benefits of decoupling storage and compute, and using open formats, are not exclusive to Pulsar or Ursa – they are universal.",[48,6946,6947],{},"Ursa’s availability across every deployment marks a new chapter for StreamNative and our users. 
Whether you’re a long-time Pulsar user on our Classic Engine or a new user looking for cutting-edge streaming, there is now a clear, incremental path to the future. We invite all our customers to try out the Ursa storage extension on their Classic clusters and start experiencing the benefits of a lakehouse-native streaming architecture. We believe this advancement will not only empower our users with immediate improvements (cost savings, analytics integration, easier migrations), but also accelerate the industry’s move toward more open, unified, and intelligent data streaming systems.",[48,6949,6950],{},"The journey from Classic to Ursa represents more than just an upgrade – it’s the convergence of two worlds (fast streams and durable tables) into one powerful platform. We’re incredibly excited to see what you build with it. Here’s to ushering in the next era of streaming data, together!",{"title":18,"searchDepth":19,"depth":19,"links":6952},[6953,6954,6955,6956,6957],{"id":6800,"depth":19,"text":6801},{"id":6829,"depth":19,"text":6830},{"id":6876,"depth":19,"text":6877},{"id":6915,"depth":19,"text":6916},{"id":6937,"depth":19,"text":6938},"StreamNative announces Ursa, a lakehouse-native data streaming engine now available across all deployment models. Learn about Ursa's Kafka compatibility, VLDB 2025 Best Industry Paper award, and how its storage layer extends Classic Pulsar Engine clusters for a seamless upgrade path to a unified streaming standard.","\u002Fimgs\u002Fblogs\u002F68db7b1670511318e32f31dd_Ursa-every-where_no-logo.png",{},{"title":6783,"description":6958},"blog\u002Fursa-everywhere-lakehouse-native-future-data-streaming",[1332,800],"WElYC2SUGY5fCE722AM6JCNe_BfNiZCYPMtmgaSmPCA",{"id":6966,"title":6967,"authors":6968,"body":6970,"category":3550,"createdAt":290,"date":7161,"description":7162,"extension":8,"featured":294,"image":7163,"isDraft":294,"link":290,"meta":7164,"navigation":7,"order":296,"path":6631,"readingTime":4475,"relatedResources":290,"seo":7165,"stem":7166,"tags":7167,"__hash__":7168},"blogs\u002Fblog\u002Fannouncing-the-general-availability-of-role-based-access-control-in-streamnative-cloud.md","Announcing the General Availability of Role-Based Access Control in StreamNative Cloud",[6969,311],"Baodi Shi",{"type":15,"value":6971,"toc":7154},[6972,6975,6979,6993,6996,6999,7003,7063,7067,7076,7080,7083,7088,7090,7094,7097,7102,7105,7108,7113,7121,7124,7127,7130,7132,7149,7152],[48,6973,6974],{},"We’re thrilled to announce the general availability of Role-Based Access Control (RBAC) in StreamNative Cloud — a powerful capability designed to secure your entire data streaming infrastructure. RBAC is now enabled by default across all organizations and cluster types (Serverless, Dedicated, and BYOC), delivering a consistent and granular approach to permission management that simplifies how access is defined and enforced across every resource in your environment — from organization-wide policies down to individual topics.",[40,6976,6978],{"id":6977},"granular-hierarchical-permissions","Granular, Hierarchical Permissions",[48,6980,6981,6982,4003,6987,6992],{},"Role-Based Access Control (RBAC) is now the core mechanism for managing access in StreamNative Cloud. 
It enables you to assign fine granular permissions to ",[55,6983,6986],{"href":6984,"rel":6985},"https:\u002F\u002Fdocs.streamnative.io\u002Fcloud\u002Fsecurity\u002Fauthentication\u002Fuser-accounts",[264],"users",[55,6988,6991],{"href":6989,"rel":6990},"https:\u002F\u002Fdocs.streamnative.io\u002Fcloud\u002Fsecurity\u002Fauthentication\u002Fservice-accounts\u002Fservice-accounts",[264],"service accounts",", ensuring teams and applications have access only to the resources they need.",[48,6994,6995],{},"The permission model follows a clear hierarchy, cascading from the highest level (Organization) down to the most granular (Topic): Organization → Instance → Cluster → Tenant → Namespace → Topic",[48,6997,6998],{},"This structure allows you to grant broad permissions at an organizational or infrastructure scope—such as giving an operator read-only access to an entire cluster—or define narrowly scoped permissions at a resource or entity level, like restricting a service account to produce messages to a single topic.",[40,7000,7002],{"id":7001},"key-highlights","Key Highlights:",[321,7004,7005,7008,7033,7048],{},[324,7006,7007],{},"Generally available for all StreamNative Cloud users: RBAC is automatically enabled for all organizations, providing robust security from day one without any complex setup.",[324,7009,7010,7011,4003,7016,7021,7022,4003,7027,7032],{},"Comprehensive Predefined Roles: We’ve introduced a comprehensive set of predefined roles that span every scope of your cloud resources. From broad administrative roles such as ",[55,7012,7015],{"href":7013,"rel":7014},"https:\u002F\u002Fdocs.streamnative.io\u002Fcloud\u002Fsecurity\u002Faccess\u002Frbac\u002Fmanage-rbac-roles#org-admin",[264],"org-admin",[55,7017,7020],{"href":7018,"rel":7019},"https:\u002F\u002Fdocs.streamnative.io\u002Fcloud\u002Fsecurity\u002Faccess\u002Frbac\u002Fmanage-rbac-roles#billing-admin",[264],"billing-admin"," to fine-grained data-plane roles like ",[55,7023,7026],{"href":7024,"rel":7025},"https:\u002F\u002Fdocs.streamnative.io\u002Fcloud\u002Fsecurity\u002Faccess\u002Frbac\u002Fmanage-rbac-roles#topic-producer",[264],"topic-producer",[55,7028,7031],{"href":7029,"rel":7030},"https:\u002F\u002Fdocs.streamnative.io\u002Fcloud\u002Fsecurity\u002Faccess\u002Frbac\u002Fmanage-rbac-roles#topic-consumer",[264],"topic-consumer",", you now have the flexibility to enforce the principle of least privilege with precision.",[324,7034,7035,7036,7041,7042,7047],{},"Simplified Management: StreamNative Cloud lets you configure and oversee role assignments through the Cloud Console or automate them with the ",[55,7037,7040],{"href":7038,"rel":7039},"https:\u002F\u002Fdocs.streamnative.io\u002Ftools\u002Fcli\u002Fsnctl\u002Fsnctl-overview",[264],"snctl"," CLI and ",[55,7043,7046],{"href":7044,"rel":7045},"https:\u002F\u002Fdocs.streamnative.io\u002Ftools\u002Fterraform\u002Fterraform-provider-overview",[264],"Terraform provider",". 
This approach streamlines access control while providing clear visibility and auditability over who can access which resources.",[324,7049,7050,7051,7056,7057,7062],{},"Secure Access Across Users and Applications: Assign broad operational roles (such as ",[55,7052,7055],{"href":7053,"rel":7054},"https:\u002F\u002Fdocs.streamnative.io\u002Fcloud\u002Fsecurity\u002Faccess\u002Frbac\u002Fmanage-rbac-roles#cluster-operator",[264],"cluster-operator",") to human users managing infrastructure, and grant highly specific, granular roles (like ",[55,7058,7061],{"href":7059,"rel":7060},"https:\u002F\u002Fdocs.streamnative.io\u002Fcloud\u002Fsecurity\u002Faccess\u002Frbac\u002Fmanage-rbac-roles#namespace-topic-producer",[264],"namespace-topic-consumer",") to service accounts used by applications. This clear separation of duties strengthens security, enforces least-privilege access, and improves governance across automated workflows.",[40,7064,7066],{"id":7065},"quick-start-assign-a-role-in-1-minute","Quick Start: Assign a Role in 1 Minute",[48,7068,7069,7070,7075],{},"Getting started with RBAC is straightforward. For example, you can grant a new user ",[55,7071,7074],{"href":7072,"rel":7073},"https:\u002F\u002Fdocs.streamnative.io\u002Fcloud\u002Fsecurity\u002Faccess\u002Frbac\u002Fmanage-rbac-roles#org-readonly",[264],"org-readonly"," access to your entire organization to support auditing or compliance reviews.",[3933,7077,7079],{"id":7078},"manage-role-by-snctl","Manage Role by snctl",[48,7081,7082],{},"You can use snctl to grant a role to a user account or service account with just one command.",[48,7084,7085],{},[384,7086],{"alt":5878,"src":7087},"\u002Fimgs\u002Fblogs\u002F68d4ee915e452651bd3bb046_iShot_2025-09-25_15.25.45.png",[48,7089,3931],{},[3933,7091,7093],{"id":7092},"manage-role-by-console","Manage Role by Console",[48,7095,7096],{},"Alternatively, you can manage it on the console. From the User Menu, click 'Account & Access'.",[48,7098,7099],{},[384,7100],{"alt":18,"src":7101},"\u002Fimgs\u002Fblogs\u002F68d4edb39b182e0ed32d148a_d5cf40ac.png",[48,7103,7104],{},"On the access page, you can select the resource type, such as organization, and then view the permissions currently assigned under that resource.",[48,7106,7107],{},"You can click \"Add rolebinding\" to add a new role and select the corresponding service account or user account.",[48,7109,7110],{},[384,7111],{"alt":18,"src":7112},"\u002Fimgs\u002Fblogs\u002F68d4edb39b182e0ed32d148d_e3231da2.png",[48,7114,7115,7116,190],{},"Once applied, the account will be able to view all resources in the organization without being able to make any changes. For more usage examples, please refer to the ",[55,7117,7120],{"href":7118,"rel":7119},"https:\u002F\u002Fdocs.streamnative.io\u002Fcloud\u002Fsecurity\u002Faccess\u002Frbac\u002Fmanage-rbac-role-bindings",[264],"documentation",[48,7122,7123],{},"We invite you to explore the new Role-Based Access Control (RBAC) in StreamNative Cloud today. Log in to your console to review predefined roles, assign permissions, and experience how streamlined access management can enhance both security and productivity for your teams and applications.",[40,7125,7126],{"id":1727},"What’s Next",[48,7128,7129],{},"This release of predefined roles represents a significant milestone in our ongoing mission to deliver best-in-class security for your data streaming platform. By establishing a consistent and standardized framework for permission management, we’re laying the groundwork for more advanced capabilities. 
Over the coming months, we plan to introduce additional predefined roles tailored to a variety of operational and compliance scenarios — from fine-grained data-plane permissions to specialized administrative roles — making it easier to align access control with organizational policies.",[40,7131,2149],{"id":2146},[48,7133,7134,7138,7139,7143,7144,190],{},[55,7135,7137],{"href":3907,"rel":7136},[264],"Sign up for a trial"," and get started for free. ",[55,7140,7142],{"href":7141},"\u002Fdevelopers","Leverage the following resources"," to learn more about StreamNative Cloud. Visit your StreamNative Cloud Console today to explore the available roles and start securing your resources. To learn more about all the predefined roles and their specific permissions, check out our detailed",[55,7145,7148],{"href":7146,"rel":7147},"https:\u002F\u002Fdocs.streamnative.io\u002Fcloud\u002Fsecurity\u002Faccess\u002Frbac\u002Fmanage-rbac-roles",[264]," RBAC documentation",[48,7150,7151],{},"Happy (and secure) streaming!",[48,7153,3931],{},{"title":18,"searchDepth":19,"depth":19,"links":7155},[7156,7157,7158,7159,7160],{"id":6977,"depth":19,"text":6978},{"id":7001,"depth":19,"text":7002},{"id":7065,"depth":19,"text":7066},{"id":1727,"depth":19,"text":7126},{"id":2146,"depth":19,"text":2149},"2025-09-29","Secure your data streaming platform with RBAC in StreamNative Cloud! Now generally available with granular permissions from organization to topic level. Simplify access management, enforce least privilege, and strengthen security across Serverless, Dedicated, and BYOC clusters.","\u002Fimgs\u002Fblogs\u002F68d9f529c030647c5072932b_RBAC-GA-no-logo.png",{},{"title":6967,"description":7162},"blog\u002Fannouncing-the-general-availability-of-role-based-access-control-in-streamnative-cloud",[3550,302,4301],"7L1WiRImhlu59Lre38RnxzRLd6lP6SP_a4tGNzi7wJA",{"id":7170,"title":7171,"authors":7172,"body":7173,"category":7338,"createdAt":290,"date":7339,"description":7340,"extension":8,"featured":294,"image":7341,"isDraft":294,"link":290,"meta":7342,"navigation":7,"order":296,"path":7343,"readingTime":3556,"relatedResources":290,"seo":7344,"stem":7345,"tags":7346,"__hash__":7348},"blogs\u002Fblog\u002Fapache-pulsar-seven-years-on-what-we-built-what-we-learned-whats-next.md","Apache Pulsar, Seven Years On: What We Built, What We Learned, What’s Next",[6785,806],{"type":15,"value":7174,"toc":7331},[7175,7179,7187,7196,7200,7212,7224,7227,7242,7245,7249,7252,7260,7274,7280,7283,7294,7298,7301,7304,7307,7315,7319,7322,7325,7328],[40,7176,7178],{"id":7177},"a-vibrant-community-driving-innovation","‍A Vibrant Community Driving Innovation",[48,7180,7181,7182,7186],{},"Seven years ago, Apache Pulsar graduated to become a Top-Level Project at the Apache Software Foundation. In that time, its community has blossomed into one of the most vibrant and innovative in open source. What began as a project incubated at Yahoo has evolved into a global collaboration with hundreds of contributors. By 2025, Pulsar had crossed 700+ contributors on the main repository and amassed over 13,000 commits, alongside 14,000+ GitHub stars and thousands of users on Slack. The momentum only continues to build – the recent ",[55,7183,7185],{"href":7184},"\u002Fblog\u002Fapache-pulsar-4-1-release-announcement","Apache Pulsar 4.1 release"," alone incorporated 560+ community-driven improvements, a testament to the project’s accelerating innovation velocity. 
As we reflect on this journey, we are humbled by the passionate individuals worldwide who have shared our vision. Each Pull Request, each question answered on Slack, and each community meetup adds to a welcoming, can-do vibe that defines Pulsar. It’s no exaggeration to say the project’s stability, scalability, and security today are direct results of this community-powered effort. We are grateful to every one of you who has been part of Pulsar’s story so far.",[48,7188,7189,7190,7195],{},"Those community efforts have made Pulsar a truly battle-tested technology. Our commitment to open source means that every new feature and fix is driven by real-world needs. Our developer community has hosted Pulsar meetups and summits across continents, sharing knowledge and celebrating successes. Whether it’s late-night discussions on the mailing list or collaborative design of a ",[55,7191,7194],{"href":7192,"rel":7193},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Ftree\u002Fmaster\u002Fpip#list-of-pips",[264],"Pulsar Improvement Proposal (PIP)",", the energy and openness of this community continue to amaze us. The Apache way – “community over code” – is alive and well in Pulsar. Seven years in, we feel like we’re just getting started.",[40,7197,7199],{"id":7198},"_1-production-proven-distributed-message-queue","#1 Production-Proven Distributed Message Queue",[48,7201,7202,7203,7207,7208,7211],{},"Nothing speaks louder than real-world adoption. Pulsar today is the #1 production-proven distributed message queue for many of the world’s most demanding use cases. In the financial services sector, Pulsar has become a critical backbone for high-volume payment systems. For example, Tencent – one of Asia’s largest tech companies – ",[55,7204,7206],{"href":7205},"\u002Fwhitepapers\u002Fapache-pulsar-helps-tencent-process-financial-transactions#:~:text=Because%20Tencent%20had%20been%20unable,with%20virtually%20no%20data%20loss","chose Pulsar to redesign its billing platform that processes tens of billions of financial transactions with virtually zero data loss",". Handling hundreds of millions of dollars in transactions per day, Tencent’s billing service could not afford downtime or inconsistency. After evaluating many messaging systems, they found Pulsar’s enterprise-grade reliability and scalability to be unmatched – and indeed, after migrating, ",[55,7209,7210],{"href":7205},"they run at massive scale “with virtually no data loss”",". This kind of confidence is why Pulsar is trusted in payment processing and banking environments where every message (every transaction, trade, or tick) counts.",[48,7213,7214,7215,7219,7220,7223],{},"Even organizations outside traditional finance have benefited from Pulsar’s rock-solid design. Cisco’s IoT Control Center, for instance, replaced a legacy messaging broker with Pulsar to manage ",[55,7216,7218],{"href":7217},"\u002Fsuccess-stories\u002Fcisco#:~:text=%E2%80%8DCisco%27s%20IoT%20control%20center%20is,seamlessly%20integrate%20Pulsar%20into%20their","245 million connected devices and 4.5 billion API calls per month across 35,000 enterprise customers",". In such a massive IoT deployment – from connected cars to smart city sensors – Cisco needed a system that was, in their own words, “",[55,7221,7222],{"href":7217},"reliable, scalable, and had extremely light overhead. Everything needs to be geo-replicated and secure…","”. 
Pulsar met those requirements, providing the low-latency, geo-replication, and multi-tenancy needed to ensure that “devices should not lose connectivity no matter what.” This is a powerful endorsement: when a Fortune 100 company entrusts Pulsar with critical real-time infrastructure, it validates Pulsar’s production readiness on a grand scale.",[48,7225,7226],{},"The sports betting and online gaming industry is another domain where Pulsar’s strengths shine. In this high-stakes arena, real-time data is the lifeblood – odds and game events must propagate globally in milliseconds. We’ve seen leading betting platforms gravitate to Pulsar for its ultra-low latency and high throughput. Pulsar’s ability to handle millions of events per second with strict ordering, combined with features like geo-replication and partitioned topics, make it ideal for powering live odds feeds and in-game analytics. In sports betting, every millisecond of delay can mean lost revenue or arbitrage – Pulsar’s architecture was built to minimize such delays. While some of these companies prefer to keep a low profile, we can say confidently that Pulsar now underpins real-time betting systems that deliver seamless experiences to users even during the busiest sports events. It’s incredibly exciting to see Pulsar enabling new levels of performance in an industry where real-time is truly real-time.",[48,7228,7229,7230,7235,7236,7241],{},"Modern SaaS platforms have also embraced Pulsar to drive their core business workflows. Two notable examples are Iterable and Attentive – both high-growth marketing tech companies operating at massive scale. Iterable, a customer engagement platform, famously replaced RabbitMQ and even Kafka with Pulsar to unify its messaging backbone. Why? As Iterable’s engineers put it, ",[55,7231,7234],{"href":7232,"rel":7233},"https:\u002F\u002Fpulsar.apache.org\u002Fcase-studies\u002F#:~:text=Iterable",[264],"Pulsar provided the right balance of scalability, reliability, and rich features to consolidate multiple systems into one",". Pulsar’s unique combination of streaming and queueing in a single system allowed Iterable to handle billions of events per day for facilitating hyper-personalized real-time marketing and customer engagement. Attentive, an AI-powered marketing platform for leading brands, similarly chose Pulsar as the backbone of its messaging system, ensuring the delivery of ",[55,7237,7240],{"href":7238,"rel":7239},"https:\u002F\u002Fpulsar.apache.org\u002Fcase-studies\u002F#:~:text=Attentive",[264],"billions of messages with exceptional reliability and scale",". They leveraged Pulsar’s built-in subscription modes to achieve fine-grained message exclusivity and high fan-out at scale – crucial for their use case of sending personalized messages to millions of consumers. Other SaaS innovators like InnerSpace are using Pulsar to ingest and analyze sensor data in real time (improving workplace safety and operational efficiency). Across these examples, a common theme emerges: Pulsar’s multi-tenancy, horizontal scalability, and durability give companies the confidence to centralize on one messaging platform. They no longer need one system for queues and another for streaming – Pulsar handles both paradigms seamlessly, reducing complexity and operational burden.",[48,7243,7244],{},"Looking across industries, we see Pulsar enabling everything from online banking and payment processing, to ticketing and logistics, to social media and gaming. The breadth of adoption speaks to Pulsar’s flexibility. 
It can be a high-throughput event stream feeding big data pipelines, and it can act as a persistent queue guaranteeing message delivery for mission-critical workflows – all in the same architecture. Features like tiered storage mean Pulsar can retain data as long as needed (months or years of events) without compromising performance, allowing use cases like auditing and reprocessing. Features like geo-replication and multi-region clustering mean enterprises can deploy Pulsar across data centers and clouds for disaster recovery and data locality, with out-of-the-box support. Simply put, Pulsar today offers the most complete feature set in the messaging space, which is why so many organizations have standardized on it.",[40,7246,7248],{"id":7247},"why-pulsar-matters-in-the-ai-era","Why Pulsar Matters in the AI Era",[48,7250,7251],{},"We’re now living in the era of AI – where real-time data streams fuel intelligent applications and autonomous agents. In this landscape, a robust messaging foundation is more important than ever. Apache Pulsar was born cloud-native and event-driven, so it’s no surprise that many cutting-edge AI platforms have chosen Pulsar as their data backbone. The reason is simple: modern AI workflows often involve orchestrating many microservices, data pipelines, and model outputs in real time. To do this reliably at scale, you need a messaging layer that can handle high throughput, guarantee delivery, enforce schemas, and scale horizontally – exactly Pulsar’s strengths.",[48,7253,7254,7255,7259],{},"Take Tencent’s Angel PowerFL (Federated Learning) platform as an example. This distributed machine learning system at Tencent had stringent requirements for stability, low latency, and data privacy across trillions of training tasks. After benchmarking different solutions, ",[55,7256,7258],{"href":7257},"\u002Fblog\u002Fpowering-federated-learning-tencent-with-apache-pulsar#:~:text=how%20they%20solved%20those%20problems,the%20machine%20learning%20platform%20requires","the team adopted Pulsar for the federated data synchronization, concluding that Pulsar provided the stability, reliability, and scalability their ML platform required",". In production, Pulsar has lived up to the task, ensuring that model updates and gradients are streamed efficiently and securely between participants in the federated learning network. When an AI system is coordinating learning across banks or hospitals (where data can’t be centralized), Pulsar’s multi-tenant and geo-replicated design becomes a critical enabler – it allows data scientists to focus on models, knowing the data movement “just works.”",[48,7261,7262,7263,7268,7269,7273],{},"Another great example is TrustGraph, an open-source AI platform for building knowledge graphs and LLM-powered agents. ",[55,7264,7267],{"href":7265,"rel":7266},"https:\u002F\u002Fmemgraph.com\u002Fblog\u002Ftrustgraph-memgraph-knowledge-retrieval-complex-industries",[264],"TrustGraph’s architecture is built from the ground up on Pulsar’s publish-subscribe model",". Why? Because they needed a backbone that ensures real-time processing, fault tolerance, and parallel workflows as data flows through their pipeline of extractors, transformers, and AI agents. The TrustGraph founders, coming from enterprise AI backgrounds, ",[55,7270,7272],{"href":7271},"\u002Fblog\u002Fcase-study-apache-pulsar-as-the-event-driven-backbone-of-trustgraph","deliberately chose Pulsar to overcome the reliability and scaling limitations they saw in other frameworks",". 
Pulsar’s ability to handle streaming data and event-driven triggers means TrustGraph can chunk and analyze huge unstructured datasets (like entire law libraries or aerospace manuals) with a network of cooperating AI agents – all without breaking the flow of data. In short, Pulsar is the “glue” that holds together the complex moving parts of an AI system, from ingestion to inference.",[48,7275,7276,7277,7279],{},"We’ve also seen AI startups leveraging Pulsar to do things that simply weren’t possible with legacy queues or log systems. A company like ",[55,7278,5254],{"href":5251}," – which built an AI-driven go-to-market platform – is a great case in point. Backed by the OpenAI Startup Fund, Unify set out to deliver instant AI insights on streaming customer events. Early on, they realized that a patchwork of cron jobs and Amazon SQS queues wouldn’t scale or meet their latency goals. They turned to Pulsar (via StreamNative Cloud) to handle tens of millions of events per day in real time, powering an AI that scores leads and triggers workflows in seconds. Pulsar allowed them to consolidate what would have been multiple subsystems – message queuing, pub\u002Fsub, event storage, scheduling – into one simple platform. With features like message replay, delayed delivery, and topic compaction, Unify’s small engineering team achieved capabilities that rival those of much larger organizations. They can reprocess historical events to improve their ML models, schedule automated follow-ups without external schedulers, and guarantee that no data is lost even if an AI consumer goes down. As Unify’s founding ML engineer put it, Pulsar gave them “peace of mind” to deploy new AI features without worrying about missing events. This agility is priceless in the fast-moving AI domain.",[48,7281,7282],{},"Crucially, Pulsar’s design principles align with the needs of AI systems. Strict message ordering and backpressure management ensure that event streams remain consistent – so an AI’s decisions based on those events remain correct. Builtin Schema Registry support means producers and consumers can evolve data formats in a controlled way, and Pulsar will reject incompatible producers – preventing bad data from silently corrupting an ML pipeline. In fact, imagine an AI application trying to consume messages from a topic and suddenly encountering an unexpected schema change that breaks its parser. In Pulsar, that scenario is avoidable by design: you can enforce schemas at the topic level, something not possible in Kafka without external add-ons. Similarly, Pulsar’s Dead Letter Queue (DLQ) and Negative Acknowledgment features are a godsend for AI workflows. If an AI microservice fails to process certain events (perhaps an image is too large, or a model isn’t available), Pulsar can automatically route those events to a DLQ for later inspection or reprocessing. This kind of resiliency ensures that one hiccup in an AI pipeline doesn’t require shutting everything down – the show goes on, and engineers can address the outliers after. As AI applications mature, these operational safeguards separate the toy projects from the production-grade platforms.",[48,7284,7285,7286,7288,7289,7293],{},"Pulsar is even proving its value in cutting-edge areas like real-time computer vision and edge AI. ",[55,7287,5271],{"href":5268}," is a startup that helps enterprises monitor physical operations (think real-time occupancy counting, queue detection, etc.) using their existing security cameras and ML models. 
As they scaled to managing 10,000+ video streams from 50,000+ cameras, Safari AI found that Kafka and Kinesis were not cost-effective or agile enough. They migrated to Pulsar via StreamNative and ",[55,7290,7292],{"href":7291},"\u002Fsuccess-stories\u002Fsafari-ai-cuts-cloud-costs-by-50-while-scaling-real-time-computer-vision-analytics-with-streamnative#:~:text=%E2%80%8DThe%20implementation%20of%20StreamNative%27s%20platform,processing%20storage%2C%20providing","saw a 50% reduction in cloud costs while easily supporting their complex ML data pipelines",". In the words of Safari’s co-founder, “StreamNative’s resilience is critical to our SaaS operations… the best choice – cutting our costs by more than 50% while seamlessly supporting our ML data structure requirements.” With Pulsar’s tiered storage and schema management, Safari AI was able to retain a year’s worth of video event data for analysis, maintain sub-10s end-to-end latency in delivering metrics, and do it all without a large DevOps team. This story encapsulates why message queues like Pulsar are vital in the AI era: they let companies focus on building intelligent features rather than reinventing streaming infrastructure. As AI continues to proliferate – from real-time fraud detection, to autonomous vehicles, to personalized content feeds – we believe Apache Pulsar will be the go-to nervous system that connects data to intelligence reliably at scale.",[40,7295,7297],{"id":7296},"built-for-the-future-bring-pulsars-philosophy-to-kafka-via-ursa-engine","Built for the Future: Bringing Pulsar’s Philosophy to Kafka via the Ursa Engine",[48,7299,7300],{},"From the beginning, Apache Pulsar was designed to address the shortcomings we saw in earlier messaging systems like Kafka. Many of the features that Pulsar pioneered over the last 7 years have only become more relevant with time. We sometimes like to run a thought experiment: What if Apache Kafka had originally been designed with some of Pulsar’s core features? For instance, imagine if Kafka had clear multi-tenancy and isolation through namespaces by default – how much easier would self-service streaming be in large organizations! Imagine if Kafka could validate schemas at the broker and reject producers sending invalid data, preventing nasty surprises downstream. What if Kafka had built-in Dead Letter Queues and negative acknowledgments, allowing applications to handle failures gracefully without external tools? Or if it could run lightweight serverless functions directly on the cluster, enabling simple event transformations and routing on the fly? Some of you know that these aspects – multi-tenancy, strong schema enforcement, developer-friendly features – are very dear to us. These aren’t just “nice-to-haves” – they solve real pains for large organizations that use streaming at scale. The good news is that all of these “what if” features already exist today. They exist in Apache Pulsar. In many ways, Pulsar has been ahead of the curve, integrating capabilities that developers ended up needing as their deployments grew.",[48,7302,7303],{},"It’s gratifying to see the broader ecosystem acknowledge these innovations. We’ve watched over the years as Kafka users and cloud vendors bolted on solutions for some of these problems (Kafka “schema registry” servers, Kafka Streams and Connect, kludgy multi-tenant clusters, etc.), confirming that the problems Pulsar set out to solve were very much real. 
Pulsar’s holistic approach – a multi-layer architecture separating compute and storage, built-in geo-replication, first-class multi-tenancy – was the result of lessons learned operating global-scale messaging at Yahoo. That DNA of innovation continues to guide Pulsar’s evolution. The recent Pulsar 4.1 release is evidence: enhancements in 4.x have improved reliability, performance and operability. It’s no wonder the Pulsar community can implement over 560 improvements in one release cycle – we are moving quickly to keep Pulsar the most advanced platform of its kind.",[48,7305,7306],{},"Yet, we also recognize that not everyone is on Pulsar (yet!). There are many existing applications and data platforms built on Kafka that, for various reasons, cannot migrate easily. As enthusiasts of streaming tech, we want to see the benefits of Pulsar’s innovations shared as widely as possible, even by those who haven’t made the switch. That philosophy led us to our next big project: the Ursa Engine. Ursa is our effort to bring the core ideas of Pulsar – its architecture and lessons – to the Kafka ecosystem. We sometimes describe Ursa as “Pulsar’s technology applied to Kafka’s API”, though in truth it’s more than that. Under the hood, Ursa is a brand new streaming engine that combines a leaderless architecture with a lakehouse-centric storage model. In practical terms, it means Ursa can serve Kafka topics with Pulsar-like efficiency and scalability. It decouples the compute and storage like Pulsar does, using object storage and a lakehouse format (Apache Iceberg\u002FDelta) for message persistence. This eliminates a lot of the operational pain that Kafka clusters traditionally face around data retention and cluster rebalance. With Ursa, we can achieve cost-effective, high-throughput streaming without the overhead of maintaining multiple replicas of data on local disks – instead, data is persisted once to durable storage, and brokers are stateless processing nodes. This leaderless design also avoids the fragility of a single leader per partition; no more controller elections or hot partitions as in Kafka’s world. In short, Ursa takes the scalability of Pulsar’s architecture and makes it available to Kafka users, so they can grow beyond the limits of the old Kafka design.",[48,7308,7309,7310,7314],{},"We’re incredibly excited about Ursa, not only for what it does for Kafka compatibility, but also for what it means for the data streaming ecosystem. Ursa is fully compatible with Pulsar as well – it’s an engine that can speak multiple protocols (Kafka, Pulsar, etc.) on top of a next-gen storage layer. This is why we say Pulsar’s philosophy continues at the heart of Ursa: we’re effectively bringing Pulsar’s ideas into a form that can be adopted by the Kafka community, bridging two ecosystems for the benefit of all. The early results have been very promising. In fact, our Ursa Engine research was recognized with the ",[55,7311,7313],{"href":7312},"\u002Fblog#:~:text=6%20min%20read","VLDB 2025 Best Industry Paper award",", highlighting Ursa as the first “lakehouse-native streaming engine for Kafka.” Our vision is that in the coming years, whether you come from the Pulsar world or the Kafka world, you’ll have access to a unified data storage foundation that combines the best of both. Pulsar will continue to thrive and evolve (with a 5.0 LTS on the horizon and more novel features in development), and Kafka-based users will also be able to enjoy those advancements through the Ursa-powered storage foundation. 
It truly feels like we’re entering a new chapter where the lines between “Kafka or Pulsar” fade away, and the focus shifts to capabilities and outcomes. We want to make streaming data easier, more affordable, and more powerful for everyone.",[40,7316,7318],{"id":7317},"conclusion-a-personal-thank-you-and-onward-to-the-future","Conclusion: A Personal Thank-You and Onward to the Future",[48,7320,7321],{},"As we celebrate Apache Pulsar’s seven-year anniversary, we – Sijie and Matteo – want to take a moment to reflect on the journey with gratitude. When we started building Pulsar, we imagined a system that could serve as the unified messaging fabric for cloud applications; we believed in a design that challenged the status quo and put developers first. Seeing that vision validated – by a vibrant community and by adoption at some of the world’s top companies – is deeply rewarding on a personal level. More than anything, we are thankful to the Pulsar and broader data streaming community: every user, every contributor, every champion who advocated for Pulsar in their organization. You have made Pulsar not just a technology, but a movement. The energy and optimism we feel from this community keeps us motivated every single day.",[48,7323,7324],{},"The vibe around Pulsar has always been one of innovation and inclusivity. It’s not just about writing code; it’s about helping each other succeed with event-driven architectures, it’s about welcoming newcomers on Slack, it’s about continuing to push the boundaries of what a messaging system can do. To the many organizations that put their trust in Pulsar, we thank you for your confidence – your success stories are our proudest achievements. Knowing that Pulsar helped cut costs in half for a startup, or ensured zero data loss in a bank, or delivered instant experiences in a mobile app – that’s what this is all about.",[48,7326,7327],{},"Looking ahead, we are more excited than ever. The next wave of challenges – agentic workloads, global-scale data sharing, fully autonomous systems – are exactly the kinds of challenges Pulsar is built to handle. With the community’s help, Pulsar will continue to evolve rapidly. Features like unified stream\u002Ftable storage (via Ursa), deeper serverless function integration, and even more ecosystem connectors are on the horizon. We also remain committed to making Pulsar easy to adopt: from improving documentation and onboarding, to offering managed services and training, we want to ensure anyone who can benefit from Pulsar has a smooth path to do so.",[48,7329,7330],{},"In closing, we want to encourage everyone reading this: if you’re already part of the Pulsar community, thank you for an amazing seven years – let’s raise a toast to how far we’ve come. If you’re new to Pulsar or considering it, come join us! There’s never been a better time to get involved, whether by trying out Pulsar 4.1, contributing to a GitHub issue, or attending an upcoming Data Streaming Summit. We co-founders remain as approachable as ever – find us on Slack, at conferences, or via the Pulsar or StreamNative community channels – we love hearing your feedback and ideas. Apache Pulsar’s journey from an incubating project to a world-class messaging and streaming platform has been a thrilling ride, and it’s still early days. With this community and our relentless drive to innovate, we’re confident the best is yet to come. 
Here’s to the next seven years and beyond – onwards and upwards with Pulsar!",{"title":18,"searchDepth":19,"depth":19,"links":7332},[7333,7334,7335,7336,7337],{"id":7177,"depth":19,"text":7178},{"id":7198,"depth":19,"text":7199},{"id":7247,"depth":19,"text":7248},{"id":7296,"depth":19,"text":7297},{"id":7317,"depth":19,"text":7318},"Community","2025-09-25","Celebrate Apache Pulsar's 7 years of innovation! Discover how this #1 distributed message queue powers demanding use cases in finance, IoT, gaming, and AI, with features like geo-replication, tiered storage, and the new Ursa Engine.","\u002Fimgs\u002Fblogs\u002F68d4ddd72eeca005c8fc8334_Pulsar-7-years.png",{},"\u002Fblog\u002Fapache-pulsar-seven-years-on-what-we-built-what-we-learned-whats-next",{"title":7171,"description":7340},"blog\u002Fapache-pulsar-seven-years-on-what-we-built-what-we-learned-whats-next",[821,7347],"Intro","HNxZAtECzQDcoy_fKCmP2c5aJnk6PIqwP8S_MaGQ5lY",{"id":7350,"title":7351,"authors":7352,"body":7353,"category":7338,"createdAt":290,"date":7691,"description":7692,"extension":8,"featured":294,"image":7693,"isDraft":294,"link":290,"meta":7694,"navigation":7,"order":296,"path":7695,"readingTime":3556,"relatedResources":290,"seo":7696,"stem":7697,"tags":7698,"__hash__":7699},"blogs\u002Fblog\u002Flatency-numbers-every-data-streaming-engineer-should-know.md","Latency Numbers Every Data Streaming Engineer Should Know",[808,28],{"type":15,"value":7354,"toc":7683},[7355,7359,7362,7373,7376,7387,7390,7407,7410,7427,7430,7441,7444,7452,7455,7469,7472,7486,7490,7493,7504,7513,7517,7520,7523,7549,7568,7571,7575,7578,7589,7592,7596,7599,7608,7623,7632,7636,7639,7648,7651,7654,7662,7664,7672,7675],[3933,7356,7358],{"id":7357},"tldr","TL;DR",[48,7360,7361],{},"What “real-time” usually means",[321,7363,7364,7367,7370],{},[324,7365,7366],{},"Ultra-low latency (E2E): \u003C 5 ms — tight budgets; no cross-region hops; avoid disk fsync on the hot path.",[324,7368,7369],{},"Low latency (E2E): 5–100 ms — good for interactive dashboards, alerts, online features.",[324,7371,7372],{},"Latency-relaxed (E2E): > 100 ms to minutes — fine for near-real-time analytics\u002FETL; enables aggressive batching & cost savings.",[48,7374,7375],{},"Storage \u002F durability costs",[321,7377,7378,7381,7384],{},[324,7379,7380],{},"HDD seek\u002Ffsync: 5–20 ms (one flush can consume your entire ultra-low budget).",[324,7382,7383],{},"SATA\u002FNVMe SSD fsync: ~0.05–1 ms (device & kernel dependent).",[324,7385,7386],{},"Object storage PUT (e.g., S3): ~10–100 ms until write completes; listing\u002Fmetadata may add more.",[48,7388,7389],{},"Network reality (one way; round-trip is ~2×)",[321,7391,7392,7395,7398,7401,7404],{},[324,7393,7394],{},"Same host \u002F loopback: \u003C 0.1 ms (µs range).",[324,7396,7397],{},"Same rack \u002F same AZ: ~0.1–0.5 ms one way (~0.2–1 ms RTT).",[324,7399,7400],{},"Cross-AZ, same region: ~0.5–2 ms one way.",[324,7402,7403],{},"Cross-region, same continent: ~15–40 ms one way (~30–80 ms RTT).",[324,7405,7406],{},"Intercontinental: ~50–150 ms one way (~100–300+ ms RTT).",[48,7408,7409],{},"Broker publish (producer → log)",[321,7411,7412,7415,7418,7421,7424],{},[324,7413,7414],{},"acks=0\u002F1, same-AZ, SSD: ~0.2–2 ms per write (no replica wait).",[324,7416,7417],{},"acks=all (sync to quorum), same-AZ: ~0.3–5 ms (adds network + replica fsync).",[324,7419,7420],{},"Sync replication across AZs: +1–5 ms.",[324,7422,7423],{},"Sync replication across regions: +50–200+ ms (generally incompatible with sub-100 ms 
goals).",[324,7425,7426],{},"Producer batching (linger): adds +N ms intentionally (typical 5–50 ms) to trade latency for throughput.",[48,7428,7429],{},"Consumer side",[321,7431,7432,7435,7438],{},[324,7433,7434],{},"Long-poll \u002F push-like fetch: ~sub-millisecond to a few ms once available.",[324,7436,7437],{},"Polling interval (misconfigured): adds 0–500+ ms directly to E2E.",[324,7439,7440],{},"Light in-memory transform: typically \u003C 1 ms per record; heavy I\u002FO dominates instead.",[48,7442,7443],{},"End-to-end (wire → result)",[321,7445,7446,7449],{},[324,7447,7448],{},"Well-tuned, single-region, durable: commonly ~10–50 ms p50; watch p99.9 tails (GC, bursts) ~50–200+ ms.",[324,7450,7451],{},"With cross-region sync: expect ~100–300+ ms minimum, dominated by RTT.",[48,7453,7454],{},"Table visibility (e.g., Iceberg)",[321,7456,7457,7460,7463,7466],{},[324,7458,7459],{},"Commit interval governs freshness: “Fast” configs: ~5–30 s visibility.",[324,7461,7462],{},"Common enterprise configs: ~1–10 min visibility.\nRule: visibility ≈ commit cadence (plus seconds for object store\u002Fmetadata). Use shorter commits for freshness, longer for efficiency.\nSync vs async (what it costs)",[324,7464,7465],{},"Synchronous replication\u002Fcommits: add ≥ 1 RTT per replica\u002Fround, but give stronger durability\u002Fordering.",[324,7467,7468],{},"Asynchronous replication\u002Fcommits: near-local latency, but risk temporary lag or data loss on failure.",[48,7470,7471],{},"Handy heuristics",[321,7473,7474,7477,7480,7483],{},[324,7475,7476],{},"If your path includes disk fsync OR cross-AZ, budgeting \u003C 1 ms is unrealistic.",[324,7478,7479],{},"If your path includes cross-region sync, budgeting \u003C 100 ms is unrealistic.",[324,7481,7482],{},"To stay \u003C 10 ms E2E, keep everything in one AZ, avoid per-record fsync, and minimize batching\u002Fpoll delays.",[324,7484,7485],{},"For cost-optimized pipelines, aim seconds–minutes latency via batching and table commits; keep only hours of “hot” data in the stream.",[40,7487,7489],{"id":7488},"what-real-time-really-means-latency-classes","What Real-Time Really Means (Latency Classes)",[48,7491,7492],{},"In data streaming, “real-time” can mean different things depending on context and requirements. Generally, it implies that data flows and is processed with minimal delay. However, not all real-time systems demand the same speed. We can break down latency targets into a few categories for clarity:",[321,7494,7495,7498,7501],{},[324,7496,7497],{},"Ultra-Low Latency (\u003C 5 ms): This is the realm of hard real-time responsiveness. Systems requiring ultra-low latency (on the order of a few milliseconds or less) are typically found in high-frequency trading, real-time control systems, or in-memory data processing. Achieving sub-10ms end-to-end latency often means using highly optimized, specialized infrastructure – for example, colocating services in the same memory or machine, using kernel-bypass networking, or other techniques. It’s latency-optimized at the expense of higher cost or complexity, since even a single HDD disk seek (≈10 ms) would break this budget . 
(For perspective, 100 ms is often cited as the threshold where a response feels instantaneous to a human , so 10 ms is an order of magnitude faster than a typical UI interaction.)",[324,7499,7500],{},"Low Latency (5–100 ms): This range covers the interactive real-time experiences most users and applications consider “real-time.” Anything under a few hundred milliseconds generally feels immediate for interactive applications (the classic “\u003C100 ms” rule of thumb for instant UI feedback ). In data streaming, latencies in the tens of milliseconds up to a couple hundred milliseconds are often sufficient for use cases like live dashboards, online analytics, or alerting systems. Achieving 5–100 ms latency typically still requires streaming (event-at-a-time) processing rather than long micro-batches, but it’s more forgiving than the ultra-low range. Many real-time stream processing platforms (like Apache Flink with event-at-a-time processing) target latencies well under a second – often in the 10s of milliseconds or below if tuned correctly . This level usually involves some optimizations (e.g. in-memory buffering, minimal disk flushes), but may trade off a bit of throughput or cost to stay fast .",[324,7502,7503],{},"Latency-Relaxed (> 100 ms): When we go above a few hundred milliseconds, we’re in near-real-time or latency-relaxed territory. Latencies from several hundred milliseconds to a few seconds might be acceptable for certain analytics, reporting, or ETL scenarios where “real-time” means “within a second or two” rather than instant. In practice, many so-called real-time pipelines in industry tolerate second-level or even minute-level delays if the use case isn’t user-facing or time-critical . For example, updating an analytics dashboard every 5 seconds or even every minute can be fine for business intelligence needs. This latency-relaxed approach often allows more cost-optimized designs – e.g. using micro-batches, compressing data, writing to cheaper storage, etc., because we’re not racing the clock on each event. Essentially, if your application can accept >100 ms delays, you have freedom to batch and buffer data for efficiency, gaining throughput or reducing cost at the expense of immediacy. (In fact, many “real-time” data streaming systems integrating with data lakes or warehouses are happy with 1–5 minute latencies, which is near-real-time by broader definition.)",[48,7505,7506,7507,7512],{},"Latency vs. Cost Trade-offs: It’s important to recognize that pushing into ultra-low latency often comes with exponential cost or complexity. For instance, keeping data in an in-memory store or a hot streaming cluster for instant access is far more expensive than writing it to a data lake and querying it with a slight delay. ",[55,7508,7511],{"href":7509,"rel":7510},"https:\u002F\u002Fwww.infoq.com\u002Fpresentations\u002Fapache-iceberg-streaming\u002F#:~:text=In%20a%20Flink%20Meetup%2C%20Sundaram,to%20serve%20the%20batch%20workload",[264],"A study at Netflix found that storing long-term data in Kafka was 38× more expensive than storing it in an Apache Iceberg data lake",", so they keep only a few hours of hot data in Kafka and tier the rest to the data lake. In general, achieving lower latency might require more computing resources, more careful tuning, or specialized hardware, whereas relaxing latency requirements can dramatically lower costs by allowing more batching and using cheaper storage or network options. 
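Before settling on a latency class, it can help to add up the unavoidable floor costs of a candidate design. The sketch below is a back-of-envelope illustration only – the constants are the rough ballpark figures quoted in this post, not measurements of any particular system, and the class and method names are made up for this example:

```java
// Rough latency-floor estimator using the ballpark numbers discussed in this post.
// All constants are illustrative assumptions, not benchmarks of any specific system.
public final class LatencyBudget {

    static final double HDD_FSYNC_MS        = 10.0;   // spinning-disk flush
    static final double NVME_FSYNC_MS       = 0.1;    // fast SSD/NVMe flush
    static final double SAME_AZ_RTT_MS      = 0.5;    // round trip within one availability zone
    static final double CROSS_AZ_RTT_MS     = 2.0;    // round trip across AZs in one region
    static final double CROSS_REGION_RTT_MS = 150.0;  // intercontinental round trip

    /** Per-event floor: one durable write + one replication round trip + deliberate batching delay. */
    static double floorMs(double fsyncMs, double replicationRttMs, double lingerMs) {
        return fsyncMs + replicationRttMs + lingerMs;
    }

    public static void main(String[] args) {
        double singleAz    = floorMs(NVME_FSYNC_MS, SAME_AZ_RTT_MS, 5);
        double crossRegion = floorMs(NVME_FSYNC_MS, CROSS_REGION_RTT_MS, 5);
        System.out.printf("single-AZ, NVMe, 5 ms linger   -> ~%.1f ms floor%n", singleAz);
        System.out.printf("cross-region sync replication  -> ~%.1f ms floor%n", crossRegion);
        // If the floor already exceeds the target class (say, < 10 ms), tuning alone cannot save the design.
    }
}
```

If the floor alone already exceeds the target, the architecture has to change (drop a hop, relax durability, or relax the goal); no amount of tuning will close that gap.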
Always ask, “Do I need this result in milliseconds, or just soon enough?” – the answer will guide whether to optimize for lowest latency or prioritize simplicity and cost-efficiency.",[40,7514,7516],{"id":7515},"understanding-the-physics-of-latency-hardware-and-network","Understanding the Physics of Latency (Hardware and Network)",[48,7518,7519],{},"Real-time data streaming performance is grounded in some unavoidable physical realities. To design and troubleshoot streaming systems, engineers should know the ballpark latency of fundamental operations in hardware and networks – these form the lower bounds of any system’s latency.",[48,7521,7522],{},"Figure: Relative scale of latency for various operations (logarithmic). A CPU L1 cache reference happens in under a nanosecond, whereas a disk seek may take ~10 milliseconds, about 7 orders of magnitude slower . Network latencies vary by distance: a round-trip within the same data center is ~0.5 ms , while a transcontinental round trip (California ↔ Netherlands) is ~150 ms . These physical delays set the floor for streaming latency.",[48,7524,7525,7526,7531,7532,7537,7538,7542,7543,7548],{},"Storage Latency – HDD vs. SSD\u002FNVMe vs. Memory: Not all storage is equal. Traditional hard disk drives (HDDs) have mechanical seek times on the order of milliseconds. For example, ",[55,7527,7530],{"href":7528,"rel":7529},"https:\u002F\u002Fgist.github.com\u002Fhellerbarde\u002F2843375",[264],"a single disk seek or fsync (force write to disk) might take ~10–20 ms on a spinning disk",". This is 10,000× slower than in-memory operations. In a streaming context, if your pipeline depends on flushing to an HDD for durability on each message, ",[55,7533,7536],{"href":7534,"rel":7535},"https:\u002F\u002Fwww.percona.com\u002Fblog\u002Ffsync-performance-storage-devices\u002F#:~:text=In%20the%20above%20example%20using,at%20least%20four%20database%20connections",[264],"that alone could add tens of milliseconds latency (and severely cap throughput to only ~50–100 writes per second per disk)",". Modern solid-state drives (SSD) and especially NVMe drives are much faster. ",[55,7539,7541],{"href":7528,"rel":7540},[264],"An SSD can complete a random read in ~150 μs (0.15 ms)",", and high-end NVMe drives can fsync writes in under 1 ms. In fact, ",[55,7544,7547],{"href":7545,"rel":7546},"https:\u002F\u002Fwww.percona.com\u002Fblog\u002Ffsync-performance-storage-devices\u002F#:~:text=I%20tested%20your%20fsync,23%20250%20fsyncs%20%2F%20seconds",[264],"one test showed an Intel Optane NVMe (an extremely fast storage device) could sync a write in about 0.043 ms (that’s 43 microseconds) on average"," – over 400× faster than an HDD’s 18 ms flush. This huge difference means streaming platforms that write to disk (for example, persisting logs or state) can achieve far lower latency with SSD\u002FNVMe storage. Many distributed log systems (like Apache Kafka or Apache BookKeeper) rely on sequential writes which are faster, but they still benefit from SSDs for low latency commits. The key takeaway: if you need low latency and must touch disk, use the fastest storage possible (or amortize the cost of slow storage with batching), because hardware can be a limiting factor.",[48,7550,7551,7552,7556,7557,7562,7563,7567],{},"Network Latency – Local vs. Cross-Domain: When data doesn’t stay on one machine, network hops become a major contributor to latency. The speed of light (and network infrastructure) imposes delays that no amount of software optimization can eliminate. 
For instance, ",[55,7553,7555],{"href":7528,"rel":7554},[264],"a packet round-trip within the same data center (or Availability Zone) is often around 0.5 ms or less",". Cloud providers design AZ networks to be very fast and local – latencies ~sub-millisecond are typical within one zone or region. If you communicate across availability zones in the same region, latency might bump up to the low single-digit milliseconds. ",[55,7558,7561],{"href":7559,"rel":7560},"https:\u002F\u002Fcloudjourney.medium.com\u002Faws-network-latency-comparison-a59fea637524",[264],"Measurements in AWS show cross-AZ pings around 1–2 ms on average",". This is still quite low, but not zero – crossing outside a single facility adds a slight delay. Now, consider cross-region or long-distance communication: latency grows with distance. Sending data across a continent or ocean will take on the order of tens to hundreds of milliseconds. For example, ",[55,7564,7566],{"href":7528,"rel":7565},[264],"a round-trip from the west coast of the US to the Netherlands (~5,000 miles) is about 150 ms",". Even between New York and San Francisco (around 2,900 miles), a ping might be ~60–80 ms. The rule of thumb is ~5 μs of latency per km of fiber (or ~8 ms per 1000 miles) one way, plus routing overhead – so geography directly impacts network delay. For global data streaming, this means if you’re replicating or forwarding events to another region, you instantly introduce perhaps 50–200+ ms of latency just due to the speed of light and network hops.",[48,7569,7570],{},"Implications for Streaming Systems: All these physical latencies add up in a streaming pipeline. If your producer must write to disk, that disk’s latency is a hard floor on how fast you can acknowledge an event. If your stream has to replicate data to a far-away data center, you incur at least the network round-trip latency in doing so. For instance, a streaming system that synchronously replicates messages to another region will always have at least, say, ~100 ms latency minimum just from networking, no matter how optimized the code is. Likewise, if using a distributed storage, a cross-AZ write might add ~1–2 ms each way – seemingly small, but significant when you’re aiming for ~10 ms total. This is why geo-distributed streaming designs often involve trade-offs: either accept higher latency for stronger consistency (data replicated everywhere before use), or relax consistency (async replication) to keep latency low (more on this later). It’s also why co-locating stream processors close to their data sources and sinks is important for low latency – every meter of distance and every hardware boundary (memory vs disk vs network) adds delay. Understanding these baseline numbers (disk = milliseconds, local network = sub-ms, cross-country = dozens of ms) helps an engineer set realistic expectations and choose architectures that meet their latency goals.",[40,7572,7574],{"id":7573},"publish-consume-and-end-to-end-latency-in-streaming","Publish, Consume, and End-to-End Latency in Streaming",[48,7576,7577],{},"When we talk about “latency” in a data streaming context, it’s useful to distinguish where that latency comes from. Generally, the user cares about end-to-end latency – the delay from an event being produced to that event being fully processed\u002Fvisible at its destination. We can break this into two pieces: publish latency on the producer side, and consume latency on the consumer side. 
Let’s define each:",[321,7579,7580,7583,7586],{},[324,7581,7582],{},"Publish Latency (Producer → Broker): This is the time it takes for an event to go from the producer to being durably stored and available on the streaming platform. It includes network transit from the producer to the broker (which could be sub-ms in the same data center, or more if remote), plus any processing the broker does (e.g. writing to a log, replicating to followers, etc.). For example, a producer sending a message to Kafka will typically wait for an acknowledgment. If the broker writes to disk and replicates to followers before acknowledging (for durability), the publish latency includes the disk write and the network hop to followers. A synchronous publish (waiting for replicas) will have higher latency than an async fire-and-forget publish. Tuning factors like Kafka’s acks setting illustrate this: requiring acknowledgement from all replicas (acks=\"all\") adds latency but guarantees durability, whereas acks=1 or 0 responds faster but risks data loss. Batching on the producer side also affects publish latency – e.g., a producer might wait 50 ms to batch multiple events into one request for efficiency, which adds a fixed delay (this is configurable via linger time in Kafka producers, for instance). In summary, publish latency is influenced by network hop from producer to stream, broker processing (disk I\u002FO, etc.), and replication strategy.",[324,7584,7585],{},"Consume Latency (Broker → Consumer): This is the time it takes for a stored event to be delivered to or fetched by the consumer after it’s available on the broker. In a push-based system this can be very fast (brokers push immediately), whereas in pull-based systems (like Kafka’s default consumer model), there might be a slight delay depending on the polling interval. For example, if a consumer polls for new messages every 100 ms, then an event might sit up to 0–100 ms before the consumer picks it up. Many streaming frameworks and message queues allow long-polling or event-driven consumption to minimize this, so consume latency can often be only a few milliseconds or less once the data is available. That said, consumer side processing can add to latency as well – e.g., how quickly the consumer code or downstream system can process the event once received. If the consumer is doing heavy computation or is bottlenecked, that contributes to end-to-end latency. In practice, a well-tuned streaming consumer will fetch data almost as soon as it arrives (often yielding end-to-end latencies only marginally above the publish latency).",[324,7587,7588],{},"End-to-End Latency: This is what users ultimately experience – the total time from when data is generated to when it’s processed\u002Fuseable at its destination. End-to-end latency = publish latency + consume latency (plus any processing time in between). For example, suppose a sensor emits an event at time T0. It’s published to a stream and acknowledged at T0+20 ms, and a consumer picks it up at T0+30 ms, then processes and stores the result by T0+50 ms. The end-to-end latency is 50 ms. In a well-designed streaming pipeline, publish and consume latencies can often be on the order of only a few milliseconds each, yielding end-to-end latencies perhaps tens of milliseconds above the raw network and processing time. But if either side is misconfigured, latency can creep up. For instance, if the producer batches for 100 ms or the consumer only polls every 200 ms, those will directly add to end-to-end delay. 
P99 latency (99th percentile) is also critical – even if average latency is 50 ms, the slowest 1% of events might take significantly longer due to occasional stalls, GC pauses, or bursts of load. Streaming engineers need to monitor and optimize for these tail latencies as well, since a “real-time” system is only as responsive as its slowest pertinent result.",[48,7590,7591],{},"To reduce end-to-end latency, one typically does things like: use small batches (or no batching) on the producer, configure low linger or flush intervals, ensure the broker has adequate I\u002FO throughput (SSD disks, etc.), and have consumers that process promptly. However, each of these can impact throughput or resource usage – again underscoring the latency vs. throughput trade-off. It’s often a balancing act: “How many events per second can I handle at 50 ms latency?” versus “If I allow 500 ms latency, I could batch more and handle much higher throughput.” There’s no free lunch, but understanding where the latency comes from (network, disk, acks, poll intervals) helps target the right optimizations .",[40,7593,7595],{"id":7594},"data-visibility-latency-with-analytical-storage-eg-apache-iceberg","Data Visibility Latency with Analytical Storage (e.g. Apache Iceberg)",[48,7597,7598],{},"Streaming data often doesn’t end with an in-memory consumer; many pipelines flow into analytical databases or data lakes for further use (such as Apache Iceberg, Delta Lake, etc. which store data on cloud storage). It’s crucial to understand that these systems have a different model of latency – typically micro-batch commits – which can introduce a substantial delay in data visibility. “Data visibility latency” refers to how quickly data that was ingested into the table format becomes queryable or visible to downstream consumers (like analytics jobs or queries on the table).",[48,7600,7601,7602,7607],{},"With Apache Iceberg (a popular table format for data lakes), data is committed in snapshots. A streaming job (e.g., Flink writing to Iceberg) will buffer a set of events into a data file and then commit that file as a new table snapshot. This commit might happen, say, every 5 minutes or every 1 minute – it’s configurable, but there’s a trade-off. Frequent commits = lower latency, but too many small files and metadata overhead; Infrequent commits = higher latency, but more efficient batching . In practice, “",[55,7603,7606],{"href":7604,"rel":7605},"https:\u002F\u002Fwww.infoq.com\u002Fpresentations\u002Fapache-iceberg-streaming\u002F#:~:text=Kafka%2C%20write%20them%20in%20data,commit%20intervals%20are%20pretty%20common",[264],"One to 10 minutes commit intervals are pretty common.","” Many organizations choose to commit every few minutes. If you commit a new snapshot every 5 minutes, that means data written to the table may not be visible to readers until that commit occurs. The latency to visibility is thus on the order of the commit interval (plus a tiny processing lag). For example, if an event arrived just after the last commit, it might wait nearly the full interval (almost 5 minutes) before it’s committed and visible; on average, you’d see a couple minutes of latency. 
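As a quick sanity check on that arithmetic, here is a tiny, self-contained sketch of the visibility math; the commit-overhead constant is an assumption chosen for illustration, not a property of Iceberg itself:

```java
// Back-of-envelope visibility latency for a table committed every `commitIntervalSec`:
// an event arriving at a random point in the interval waits ~interval/2 on average
// (worst case ~the full interval), plus some overhead for the object-store upload and
// metadata swap. The overhead value below is an illustrative assumption.
public final class TableVisibility {

    static double averageSeconds(double commitIntervalSec, double commitOverheadSec) {
        return commitIntervalSec / 2.0 + commitOverheadSec;
    }

    static double worstCaseSeconds(double commitIntervalSec, double commitOverheadSec) {
        return commitIntervalSec + commitOverheadSec;
    }

    public static void main(String[] args) {
        double overhead = 5.0; // assumed seconds for file upload + snapshot commit
        for (double interval : new double[] {10, 60, 300}) { // 10 s, 1 min, 5 min commits
            System.out.printf("commit every %4.0f s -> avg ~%5.1f s, worst ~%5.1f s to visibility%n",
                    interval, averageSeconds(interval, overhead), worstCaseSeconds(interval, overhead));
        }
    }
}
```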
As one data engineer put it, this pattern is essentially incremental batch processing – you’ve traded sub-second streaming latency for minute-level latency in exchange for efficiency and lower cost .",[48,7609,7610,7611,7616,7617,7622],{},"To illustrate, ",[55,7612,7615],{"href":7613,"rel":7614},"https:\u002F\u002Fwww.infoq.com\u002Fpresentations\u002Fapache-iceberg-streaming\u002F#:~:text=The%20latency%20is%20most%20determined,latency%20is%20less%20than%2040",[264],"in one experiment with an Iceberg streaming source, using a 10-second commit interval caused the downstream consumer to see events with a median latency of ~10 seconds, and a max latency under ~40 seconds",". The median matched the commit interval (10s) because data arrives in chunks each commit. When the commit interval was larger, say 1 minute, the latency would similarly track around that scale. This stop-and-go pattern is due to Iceberg’s design: it offers atomic, transactional commits for reliability (no partial data visible), but that means holding data until commit. The benefit is strong consistency – readers either see an entire batch or nothing – and excellent throughput (writing big files to S3 efficiently), at the cost of latency. In many analytics scenarios, this is acceptable. As noted by experts, ",[55,7618,7621],{"href":7619,"rel":7620},"https:\u002F\u002Fwww.infoq.com\u002Fpresentations\u002Fapache-iceberg-streaming\u002F#:~:text=This%20paradigm%20of%20streaming%20from,This%20is%20probably%20mostly",[264],"a lot of streaming use cases are “fine with minute-level latency” and explicitly not looking for sub-second results",". By using a table format like Iceberg, they get a more cost-effective, simpler pipeline (no continuously running hot storage for every event) and can still achieve end-to-end latencies on the order of minutes, rather than hours for traditional batch .",[48,7624,7625,7626,7631],{},"However, if you need faster visibility, you can shorten the commit interval – some users commit every few seconds for “real-time analytics” tables. Iceberg (and similar systems) can support that, ",[55,7627,7630],{"href":7628,"rel":7629},"https:\u002F\u002Fwww.infoq.com\u002Fpresentations\u002Fapache-iceberg-streaming\u002F#:~:text=If%20we%20commit%20too%20frequently%2C,have%20snapshot%20at%20the%20end",[264],"but beware of generating too many small files and excessive metadata load if commits are too frequent",". There’s a sweet spot: commonly 1–10 minutes as mentioned, but some do 5 or 10 seconds in specialized cases. Also, some emerging techniques like continuous streaming sinks or table change feeders are making it possible to get lower latencies from data lakes by tailing the commits. Still, as a data streaming engineer, you should know to account for this extra latency when integrating streaming with data lake tables. The “real-time” portion might be snappy, but once you hand off to a system that commits to cloud storage, the latency jumps to whatever the commit policy is. Designing your pipeline, you’d decide: does this use case truly need second-level updates, or can it tolerate a 1-2 minute delay (with a big drop in cost)? Understanding data visibility latency ensures you set the right expectations with consumers of the data. 
If truly low latency is required all the way, you might keep the data in a fast store (like Kafka or a real-time database) rather than immediately landing it to Iceberg – or use hybrid approaches (hot data in Kafka for instant use, cold data in Iceberg for cost-efficiency).",[40,7633,7635],{"id":7634},"synchronous-vs-asynchronous-operations-impact-on-latency-and-consistency","Synchronous vs. Asynchronous Operations (Impact on Latency and Consistency)",[48,7637,7638],{},"A fundamental design choice in distributed streaming systems is whether operations are done synchronously (blocking\u002Fwaiting for a result) or asynchronously (proceeding without an immediate confirmation). This choice has big implications for latency (and data safety). Two areas where this comes up are data replication and event processing:",[48,7640,7641,7642,7647],{},"In synchronous replication, every event (or transaction) is durably stored in multiple places before the system acknowledges it as “done”. For a streaming platform, that could mean when a producer publishes a message, the broker waits until, say, 2 other nodes have written the message too (replicated) before sending an ACK to the producer. The obvious advantage is strong consistency and durability – you won’t lose data even if one node crashes right after. The cost, however, is extra latency. The producer’s publish latency now includes one or more network round-trips and disk writes to the replicas. If those replicas are in the same rack, the delay might be small (maybe 1–2 ms ); if they’re across data centers, it could be tens of milliseconds or more. Each additional replica or distant node increases the latency, because the commit has to travel further or to more endpoints. As CockroachDB’s engineers note, ",[55,7643,7646],{"href":7644,"rel":7645},"https:\u002F\u002Fwww.cockroachlabs.com\u002Fblog\u002Fdata-loss-prevention-during-outages-you-might-be-losing-data-without-knowing-it\u002F",[264],"synchronous replication is robust but comes “at the cost of very high write latency” in widespread clusters – it can be “crippling for many applications” if they can’t tolerate the wait",". Thus, not every streaming pipeline uses fully synchronous replication for all data; there’s often a configuration (like Kafka’s acks setting) to choose how many replicas must ack. If you set acks=all (fully sync), you get strongest durability with higher latency; if you set acks=1 (just leader ack), you get lower latency but risk that if the leader dies before followers catch up, that message could be lost.",[48,7649,7650],{},"In asynchronous replication, the idea is “send and pray”. The producer or primary doesn’t wait for the followers\u002Freplicas to confirm. For example, a database might commit locally and return “success” to the user, then ship the data to a backup server a moment later. In streaming, you might publish with acks=0 (no wait at all) or acks=1 (wait only for the leader’s own write). This minimizes latency – essentially you’re only as slow as the primary write, which could be just a local disk write. But the trade-off is obvious: if something fails at the wrong time, data might not make it to the backup. Asynchronous systems introduce the possibility of temporary inconsistency (followers lag behind) and data loss if the primary node crashes before sending out the buffered events. In practice, many systems choose a middle ground (like Kafka’s default acks=1 with replication – a good balance of some durability with minimal latency impact). 
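To make that knob concrete, here is a minimal producer sketch showing where the choice is expressed in a Kafka client; the broker address and topic name are placeholders, and the snippet is meant as an illustration rather than a tuning recommendation:

```java
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public final class AcksDemo {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder broker
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        // The durability/latency trade-off discussed above:
        //   acks=all -> wait for the in-sync replicas (adds a replication round trip, survives leader loss)
        //   acks=1   -> wait only for the leader's write (lower latency, small loss window)
        //   acks=0   -> fire-and-forget (lowest latency, no delivery guarantee)
        props.put(ProducerConfig.ACKS_CONFIG, "all");

        // Deliberate batching delay: trades a few milliseconds of latency for throughput.
        props.put(ProducerConfig.LINGER_MS_CONFIG, "5");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("events", "key", "value"),
                    (metadata, exception) -> {
                        if (exception != null) {
                            exception.printStackTrace(); // publish failed or timed out
                        }
                    });
            producer.flush(); // blocks until the broker has acknowledged per the acks setting
        }
    }
}
```

Flipping that single acks property between “1” and “all” is usually the quickest way to observe the latency difference described above.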
Also, some systems offer “semi-synchronous” modes or parallel async replication where at least one replica is waited on but others are async, to balance safety and speed.",[48,7652,7653],{},"Beyond replication, the sync vs async concept applies to processing workflows too. For instance, consider an event that triggers two downstream actions. If done synchronously, the first action must complete (and maybe respond) before the second starts – ensuring order or simplicity but incurring the cumulative latency of both steps. In an asynchronous (or concurrent) design, those actions could happen in parallel or in a fire-and-forget manner, reducing overall response time seen by the initial trigger. The downside is added complexity: you need to handle out-of-order completions, correlate results, and possibly deal with partial failures. Another example: a stream processing job might do an external database call for each event. If it does so synchronously, each event might incur, say, 50 ms to get a response, severely slowing the pipeline (and if hundreds are concurrent, they queue up). If instead the job uses an asynchronous, non-blocking approach (issuing requests without waiting, and handling responses as they come), it can pipeline those calls and achieve much higher throughput and lower per-event latency – at the cost of a more complex design (e.g. using async I\u002FO, callbacks or promises, etc.). The general rule is, synchronous = simpler but can add latency through waiting, asynchronous = faster throughput and lower waiting time, but more complex and potentially inconsistent intermediate state.",[48,7655,7656,7657,7661],{},"When designing a streaming system, decide where you need strong ordering or guarantees (which might force synchronous steps) and where you can afford asynchrony. For example, if losing one event is absolutely unacceptable, you’ll lean towards synchronous replication (and pay the latency cost). If throughput and speed are paramount and occasional loss is tolerable (or mitigated by upstream retry), you might go async. Modern cloud databases highlight this trade-off: ",[55,7658,7660],{"href":7644,"rel":7659},[264],"a fully synchronous commit across regions might be 200 ms latency (too slow for many apps), whereas an async commit is maybe 5–10 ms but risks a second of data if a failure occurs",". It’s all about business requirements. As a streaming engineer, knowing the latency hit of synchronous operations is key – it might be worth ~50 ms to guarantee order\u002Fdurability, or it might not, depending on your use case. Often, systems offer tunable consistency levels so you can choose per workload. But whichever path, the influence on latency must be understood and communicated.",[40,7663,2125],{"id":2122},[48,7665,7666,7667,7671],{},"Latency is a critical aspect of data streaming systems – it’s often the very reason we choose streaming over batch processing. But “real-time” isn’t one-size-fits-all: it spans from a few milliseconds to a few minutes, and knowing the difference is essential. Every data streaming engineer should internalize the key latency numbers and concepts: ",[55,7668,7670],{"href":7528,"rel":7669},[264],"how fast hardware can realistically move data (nanoseconds in CPU, microseconds in memory\u002FSSD, milliseconds on disk or across networks)",", and how those translate into end-to-end pipeline delays. Recognizing these orders of magnitude helps in design (e.g. you won’t expect a cross-country streaming pipeline to ever be 5 ms; physics won’t allow it). 
It also helps in debugging – if you see 100 ms delays, is it network? Disk? Queuing? The numbers give clues.",[48,7673,7674],{},"Furthermore, engineering for low latency is a balancing act with throughput, cost, and complexity. You’ve seen how batching and relaxing real-time requirements can cut costs dramatically (e.g. using Iceberg with minute-level latency vs. a live feed at sub-second latency) . Always tie your latency goals to business needs: if a dashboard updates in 5 seconds instead of 0.5 seconds, does it matter? If not, you might save a lot of money with a simpler, slightly slower design. On the other hand, if milliseconds matter (say, in fraud detection or user experience), then you know where to focus investment – like faster storage, avoiding cross-region hops, or using asynchronous processing to shave off waits.",[48,7676,7677,7678,7682],{},"In summary, the “latency numbers” every streaming engineer should know aren’t just abstract timings – they’re guideposts for making architectural decisions. By knowing what truly constitutes real-time in your context, understanding the physical limits, and measuring publish\u002Fconsume\u002Fvisibility delays in your pipeline, you can design streaming systems that meet their SLAs without guesswork. Jeff Dean’s ",[55,7679,7681],{"href":7528,"rel":7680},[264],"famous list of latency numbers"," taught programmers to respect the reality of time in computing; similarly, our tour of streaming latency shows that latency is a feature you must budget and engineer for. Equipped with this knowledge, you can more confidently build systems that strike the right balance between blinding speed and practical efficiency – delivering data when and where it’s needed, in real real-time.",{"title":18,"searchDepth":19,"depth":19,"links":7684},[7685,7686,7687,7688,7689,7690],{"id":7488,"depth":19,"text":7489},{"id":7515,"depth":19,"text":7516},{"id":7573,"depth":19,"text":7574},{"id":7594,"depth":19,"text":7595},{"id":7634,"depth":19,"text":7635},{"id":2122,"depth":19,"text":2125},"2025-09-24","Explore the essential latency numbers for data streaming engineers, covering ultra-low, low, and latency-relaxed systems, storage, network, and end-to-end considerations to optimize performance and cost","\u002Fimgs\u002Fblogs\u002F68d3a666c5599bb44a2e4a00_Latency-Numbers.png",{},"\u002Fblog\u002Flatency-numbers-every-data-streaming-engineer-should-know",{"title":7351,"description":7692},"blog\u002Flatency-numbers-every-data-streaming-engineer-should-know",[5647,303,1330],"trnXOj_hikw9-DiZuV3uATOAaVSRQ4yb7NJyS_TWqlI",{"id":7701,"title":7702,"authors":7703,"body":7706,"category":821,"createdAt":290,"date":7881,"description":7882,"extension":8,"featured":294,"image":7883,"isDraft":294,"link":290,"meta":7884,"navigation":7,"order":296,"path":7271,"readingTime":3556,"relatedResources":290,"seo":7885,"stem":7886,"tags":7887,"__hash__":7888},"blogs\u002Fblog\u002Fcase-study-apache-pulsar-as-the-event-driven-backbone-of-trustgraph.md","Case Study: Apache Pulsar as the Event-Driven Backbone of TrustGraph",[7704,7705],"Daniel Davis","Mark Adams",{"type":15,"value":7707,"toc":7873},[7708,7712,7715,7719,7722,7736,7739,7743,7746,7787,7790,7794,7797,7814,7817,7821,7824,7841,7844,7854,7860,7863,7867,7870],[40,7709,7711],{"id":7710},"introduction-the-challenge-of-building-an-ai-platform","‍Introduction: The Challenge of Building an AI Platform",[48,7713,7714],{},"Late one afternoon, a team of developers set out to build TrustGraph – an open-source AI product creation platform aimed at 
orchestrating sophisticated AI agents. They faced a familiar challenge: how to connect a constellation of microservices (knowledge extractors, vector indexers, agent runtimes, etc.) into one cohesive system that can scale and adapt dynamically. Traditional point-to-point integrations felt brittle and hard to scale. The team needed a nervous system for their platform – a messaging backbone that could seamlessly link all components in real-time. Enter Apache Pulsar, the technology that would become the high-performance core of TrustGraph’s event-driven architecture. Pulsar (with enterprise support from StreamNative) offered exactly what TrustGraph needed: a reliable publish\u002Fsubscribe foundation with the flexibility to handle everything from real-time agent queries to large-scale data ingestion. What follows is the story of how Pulsar powers TrustGraph, enabling developers to build modular AI systems that are scalable, resilient, and a joy to work with.",[40,7716,7718],{"id":7717},"why-trustgraph-chose-pulsar-as-its-backbone","Why TrustGraph Chose Pulsar as its Backbone",[48,7720,7721],{},"From the outset, the TrustGraph engineers recognized that building a scalable AI platform meant embracing event-driven design. They needed a messaging layer that could support diverse workloads – from synchronous API calls to asynchronous data pipelines – without becoming a bottleneck. Apache Pulsar stood out for several reasons:",[321,7723,7724,7727,7730,7733],{},[324,7725,7726],{},"It “just works” for ops: Pulsar provides an operations-friendly way to connect complex processing elements. Its simplicity in managing communication patterns and scaling freed the team from writing custom pipeline glue code. Site reliability engineers could focus on deploying and monitoring AI capabilities rather than debugging message passing.",[324,7728,7729],{},"Native Pub\u002FSub Model: Pulsar’s publish-subscribe architecture was a perfect fit for TrustGraph’s decoupled microservices. Components like the Knowledge Graph Builder, AI Agent Runtime, and data processors communicate by publishing events and subscribing to the topics they care about – no direct dependencies needed. This decoupling means each service can evolve or scale independently, a critical requirement for a modular AI platform.",[324,7731,7732],{},"Persistent and Non-Persistent Topics: Pulsar uniquely lets you choose between persistent and non-persistent messaging. TrustGraph leverages this to balance reliability vs. latency. For critical data (e.g. ingesting documents into a knowledge base), TrustGraph uses persistent topics to guarantee delivery – ensuring no data is lost even if a service goes down. Conversely, for high-speed, ephemeral interactions (like an AI agent responding to a user query), TrustGraph uses non-persistent topics to minimize overhead and latency. This flexible messaging guarantees that each use-case gets the right trade-off between speed and safety.",[324,7734,7735],{},"Multi-Tenancy and Isolation: Pulsar’s built-in multi-tenancy (via tenants and namespaces) proved invaluable for TrustGraph’s vision of dynamic “Flows.” A Flow in TrustGraph is essentially an isolated AI pipeline or workspace. Pulsar’s tenant\u002Fnamespace model allows TrustGraph to create isolated channels for each Flow, ensuring that projects or tenants don’t interfere with each other’s data streams. 
This strong isolation was critical for enabling TrustGraph to support multiple concurrent AI agent workflows in one cluster, whether they belong to different teams, customers, or use cases.",[48,7737,7738],{},"In summary, Pulsar provided the scalability, flexibility, and reliability that TrustGraph needed in a messaging backbone. As Mark Adams, Co-founder of TrustGraph, put it, building on Pulsar gave them confidence that the communication layer would not be the limiting factor in scaling intelligent agents. It laid a rock-solid foundation on which to construct an AI platform ready for both rapid iteration and production-grade stability.",[40,7740,7742],{"id":7741},"architecting-trustgraph-with-pulsar-key-patterns","Architecting TrustGraph with Pulsar: Key Patterns",[48,7744,7745],{},"With Apache Pulsar at its core, TrustGraph’s architecture evolved a set of powerful patterns. These patterns illustrate how Pulsar’s features are used in practice to create an event-driven, modular AI system:",[1666,7747,7748,7751,7761,7767,7770,7773],{},[324,7749,7750],{},"Dynamic and Scalable “Flows”: In TrustGraph, a Flow represents a configurable pipeline of AI tasks (for example, a data ingestion flow or an agent reasoning flow). Some services are global (shared across all Flows), while others are flow-specific. Pulsar enables this dynamic behavior through dynamic queue naming and creation.Global Services (like configuration, knowledge base, and librarian APIs) listen on well-known, fixed Pulsar topics since they are always available and shared.",[324,7752,7753,7754,7757,7758],{},"Flow-Hosted Services (like a GraphRAG processor, Agent runtime, or custom embeddings service) spin up when a new Flow is started. TrustGraph automatically generates unique Pulsar topics for that Flow’s services. For example, if a Flow is named ",[4926,7755,7756],{},"research-flow",", the GraphRAG service in that flow might publish\u002Fsubscribe on topics named:",[4926,7759,7760],{},"non-persistent:\u002F\u002Ftg\u002Frequest\u002Fgraph-rag:research-flow",[324,7762,7763,7766],{},[4926,7764,7765],{},"non-persistent:\u002F\u002Ftg\u002Fresponse\u002Fgraph-rag:research-flow"," Each new Flow gets its own set of topics, isolating its traffic. Multiple Flows can run concurrently without stepping on each other’s messages – a huge win for multi-project and multi-tenant deployments. When the Flow is stopped, its topics can be torn down just as easily. This dynamic provisioning of queues means the platform can scale out new pipelines on the fly with full isolation, all thanks to Pulsar’s flexible naming and multi-tenancy.",[324,7768,7769],{},"Diverse Communication Patterns (Pub\u002FSub Flexibility): TrustGraph doesn’t force a one-size-fits-all messaging style; instead, it uses Pulsar to support different interaction patterns within the platform:Request\u002FResponse Messaging: For interactive services—such as an AI Agent API or the GraphRAG query service—TrustGraph sets up dedicated request and response topics. For example, when a user’s query hits the Agent service, it is published to a request topic, the agent processes it, and the answer comes back on a response topic tied to that user’s session or flow. This pub\u002Fsub request-response pattern feels like a direct call from the client’s perspective, but under the hood it’s decoupled and asynchronous. The client can await a response without knowing which specific service instance will handle it. 
This pattern gives synchronous behavior on top of asynchronous internals, combining interactivity with scalability.",[324,7771,7772],{},"Fire-and-Forget Ingestion: For one-way data pipelines like ingesting documents, TrustGraph uses a simpler fire-and-forget approach. A client (say, a data loader component or a user uploading a file) will publish data to an ingestion topic and immediately move on. Downstream processor services (e.g. a Text Load service or a Triples Store loader) are subscribed and will process the data in due course. Crucially, these ingestion topics are persistent in Pulsar. This guarantees that if a processor is slow or temporarily down, the data remains in the queue until processed, ensuring no loss. Developers benefit by not having to babysit the pipeline – they trust Pulsar to eventually deliver data when the consumers are ready, improving the system’s resilience to spikes or faults.",[324,7774,7775,7776,7779,7780,1154,7783,7786],{},"Centralized, Push-Based Configuration: Running a complex AI platform means lots of configuration: prompts for the LLM, tool definitions for agents, pipeline parameters, etc. TrustGraph chose to manage configuration changes through Pulsar as well, turning config into an event stream. There is a dedicated Pulsar topic (e.g. ",[4926,7777,7778],{},"persistent:\u002F\u002Ftg\u002Fconfig\u002Fconfig",") that acts as a central config channel. Whenever an administrator or developer updates a configuration – for instance, adjusting a prompt template or adding a new tool plugin – that update is published as a message on the config topic. All services that care about config subscribe to this channel. TrustGraph’s services (built on common base classes ",[4926,7781,7782],{},"FlowProcessor",[4926,7784,7785],{},"AsyncProcessor",") are designed to receive these config events and reconfigure themselves on the fly. The moment a new Flow is launched or a parameter changes, every component gets the memo via Pulsar and updates its behavior without needing a restart. This push-based config distribution makes the platform highly dynamic – developers can deploy new capabilities or tune the system in real-time, and Pulsar ensures a consistent configuration state across the distributed system.",[48,7788,7789],{},"These patterns highlight a theme: Pulsar decouples parts of the system while keeping them coordinated. Dynamic topic creation lets TrustGraph scale out new processing flows easily. Multiple messaging patterns let each service communicate in the style that fits its role. A config event stream keeps everything in sync. All of it is implemented on Pulsar’s robust pub\u002Fsub substrate, meaning it inherits Pulsar’s strengths like horizontal scalability, durability, and back-pressure handling.",[40,7791,7793],{"id":7792},"benefits-to-developers-and-ai-teams","Benefits to Developers and AI Teams",[48,7795,7796],{},"By weaving Pulsar so deeply into its design, TrustGraph reaps numerous benefits that directly address pain points developers often face in building AI systems:",[321,7798,7799,7802,7805,7808,7811],{},[324,7800,7801],{},"Easier Scaling: Need to handle more load? Simply add more consumers to a Pulsar topic to scale out a microservice – no complex rebalancing needed. Because each TrustGraph component processes messages from a queue, scaling is as straightforward as running another instance that subscribes to the same topic. 
For example, if the AI Agent requests spike, the team can spin up additional agent service containers; Pulsar will automatically distribute requests among them. This elasticity means the system can handle varying workloads on different parts of the AI pipeline without a hitch.",[324,7803,7804],{},"Resilience and Fault Tolerance: Pulsar’s persistent messaging ensures critical data isn’t lost if something fails. Developers don’t have to write custom retry logic or worry about data gaps – if the Knowledge Graph builder goes down for a bit, all pending documents remain queued. When it comes back up, it picks up where it left off. Also, thanks to the decoupled design, a failure in one component (e.g., the vector embedding service) won’t crash the entire platform. Messages will queue up until that service recovers, while the rest of the system continues unaffected. This isolation containing failures makes the overall platform more robust in production.",[324,7806,7807],{},"Flexibility for New Features: The dynamic Flow architecture allows teams to deploy new pipelines or custom components without modifying the core system. Because Pulsar handles the routing, a new service can be introduced by simply defining the topics it will use and plugging it in. This pluggable architecture means TrustGraph can evolve quickly. For instance, a developer could add a new “Sentiment Analysis” microservice into a Flow by having it subscribe to an intermediate topic – no need for a full redeploy or breaking existing flows. Pulsar’s multi-tenant setup means this can happen in an isolated way, so experimentation in one Flow won’t disrupt others.",[324,7809,7810],{},"Better Observability: With Pulsar as the central hub for all messages, it provides a one-stop view into the system’s activity. TrustGraph takes advantage of Pulsar’s metrics – like message rates, consumer backlogs, throughput, and latency per topic – to give developers deep insight into how each part of the platform is performing. These metrics feed into Grafana dashboards where the team can see, for example, if the “ingestion queue” is backing up or if the “response times on the agent request topic” are rising. Such observability helps pinpoint bottlenecks quickly (maybe a vector DB is slow, causing a backlog) and aids in capacity planning. It essentially turns Pulsar into a stethoscope on the health of the AI platform.",[324,7812,7813],{},"Faster Iteration: Perhaps most importantly, this Pulsar-driven architecture empowers faster development cycles. Because adding new flows or services is low-friction, developers can prototype new AI capabilities without weeks of pipeline engineering. The combination of fewer bottlenecks, auto-scaling behavior, safe fault handling, and real-time config updates means the team spends less time on infrastructure and more on innovating AI features. In practice, that could mean quickly trying a new large language model in the Agent service or connecting an experimental knowledge source – TrustGraph will handle the messaging and integration details, so the developer can focus on the AI logic.",[48,7815,7816],{},"All these benefits fundamentally spring from Pulsar’s role as a unified messaging layer. 
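To make the scaling point concrete, here is a minimal sketch of such a worker using the plain Pulsar Java client. The service URL, topic, and subscription names are illustrative placeholders, not TrustGraph's actual identifiers.

```java
import org.apache.pulsar.client.api.*;

public class AgentRequestWorker {
    public static void main(String[] args) throws Exception {
        // Connect to the cluster; the service URL is a placeholder for illustration.
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650")
                .build();

        // Every worker subscribes to the same topic with the same subscription name.
        // With SubscriptionType.Shared, Pulsar distributes messages across all
        // connected consumers, so scaling out is just starting another process.
        Consumer<byte[]> consumer = client.newConsumer()
                .topic("persistent://public/default/agent-requests") // hypothetical topic
                .subscriptionName("agent-workers")
                .subscriptionType(SubscriptionType.Shared)
                .subscribe();

        while (true) {
            Message<byte[]> msg = consumer.receive();
            try {
                // The AI request-handling logic would go here.
                consumer.acknowledge(msg);
            } catch (Exception e) {
                // Ask Pulsar to redeliver later instead of losing the message.
                consumer.negativeAcknowledge(msg);
            }
        }
    }
}
```

Starting a second copy of this process is all it takes to add capacity: because both workers share the same subscription, Pulsar distributes incoming requests between them automatically.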
It abstracts away the hard parts of distributed communication (scaling, reliability, ordering, isolation), letting developers concentrate on building intelligent agents and knowledge pipelines.",[40,7818,7820],{"id":7819},"pulsar-in-action-a-day-in-the-life-with-trustgraph","Pulsar in Action: A Day in the Life with TrustGraph",[48,7822,7823],{},"To cement how Pulsar powers real-world usage of TrustGraph, let’s walk through a hypothetical scenario:",[48,7825,7826,7827,7829,7830,4003,7833,7836,7837,7840],{},"Meet Alice, an AI engineer at an enterprise, who is using TrustGraph to build a new AI-powered research assistant. She begins her day by defining a new processing Flow for the project, aptly named ",[4926,7828,7756],{},". When Alice starts this Flow via TrustGraph’s CLI, under the hood the platform spins up microservices for that Flow – an Agent service, a GraphRAG service, an Embeddings service, etc. – each with their own Pulsar topics. Alice doesn’t have to manually configure any queues; Pulsar automatically provisions topics like ",[4926,7831,7832],{},"tg\u002Frequest\u002Fgraph-rag:research-flow",[4926,7834,7835],{},"tg\u002Fresponse\u002Fgraph-rag:research-flow"," for her new Flow. Immediately, her Flow’s services begin running in isolation. In fact, a colleague can launch a separate ",[4926,7838,7839],{},"analysis-flow"," in parallel, and thanks to Pulsar, the two sets of services won’t conflict. This allows different teams to use TrustGraph on the same infrastructure, each with their own dedicated message streams.",[48,7842,7843],{},"Later that morning, Alice feeds a batch of documents (PDF reports) into TrustGraph for ingestion. As she uploads them via the Workbench UI, each document’s content is published as a message to the Text Load service’s Pulsar topic. The ingestion is designed as fire-and-forget – the upload request immediately returns, and Alice can go grab a coffee while TrustGraph pipelines the data. Pulsar’s persistent queue means even if the Text Load processor or downstream Knowledge Graph builder is busy, all documents will be queued reliably. After a brief break, Alice checks the dashboard: the documents are being processed one by one, and there are no errors. One of the processing containers did restart (maybe due to a transient error), but because of Pulsar, no data was lost and the pipeline resumed automatically once the service recovered. Alice silently thanks the decision to use Pulsar; in past projects with DIY messaging, a crash often meant writing custom retry logic or manual data cleanup, but not anymore.",[48,7845,7846,7847,7850,7851,7853],{},"In the afternoon, Alice decides to improve the AI agent’s behavior by tweaking its prompt and adding a new tool for it. She opens TrustGraph’s configuration UI and updates the prompt template and registers an external API as a new tool. The moment she hits “Save”, TrustGraph’s Config service publishes an update event to ",[4926,7848,7849],{},"tg\u002Fconfig\u002Fconfig"," topic. All running services in ",[4926,7852,7756],{}," receive this update within milliseconds, thanks to their Pulsar subscriptions. The Agent runtime immediately pulls in the new prompt and tool definitions – there’s no need to restart anything. Alice initiates a test query to her agent; it responds using the updated prompt format and can even call the new API tool as needed, all in real-time. This kind of live reconfiguration makes it incredibly easy for Alice to iterate on her AI agent’s capabilities. 
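As a rough illustration of what such a config subscriber can look like with the plain Pulsar Java client (a simplified sketch, not TrustGraph's actual FlowProcessor/AsyncProcessor code; the service URL and subscription naming are assumptions):

```java
import java.nio.charset.StandardCharsets;
import org.apache.pulsar.client.api.*;

public class ConfigWatcher {
    public static void main(String[] args) throws Exception {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650") // assumed local broker
                .build();

        // Each service instance keeps its own subscription on the shared config topic,
        // so every instance receives every config event (fan-out, not load-balanced).
        // In practice the subscription name would be derived from the service identity.
        client.newConsumer()
                .topic("persistent://tg/config/config")
                .subscriptionName("config-watcher-" + java.util.UUID.randomUUID())
                .subscriptionType(SubscriptionType.Exclusive)
                .subscriptionInitialPosition(SubscriptionInitialPosition.Earliest) // replay retained config
                .messageListener((consumer, msg) -> {
                    String update = new String(msg.getData(), StandardCharsets.UTF_8);
                    // A real service would re-parse prompts, tool definitions, or
                    // pipeline parameters here and swap them in without a restart.
                    System.out.println("Applying config update: " + update);
                    try {
                        consumer.acknowledge(msg);
                    } catch (PulsarClientException e) {
                        consumer.negativeAcknowledge(msg);
                    }
                })
                .subscribe();

        Thread.currentThread().join(); // keep the watcher running
    }
}
```

The moment a config message lands on the topic, every subscribed service applies it in place, which is exactly the restart-free behavior Alice relies on here.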
In traditional setups, such changes might require editing config files on multiple servers or restarting processes, disrupting the workflow. With Pulsar’s event-driven config, TrustGraph achieves seamless, centralized control.",[48,7855,7856,7857,7859],{},"Before wrapping up, Alice reviews the system’s performance. Using TrustGraph’s observability stack, she notices the message backlog on the ",[4926,7858,7756],{}," ingestion topic grew slightly during peak load, but then drained as additional consumers auto-scaled. The Grafana metrics (sourced from Pulsar) show healthy throughput. One insight stands out: the response queue for the Agent service shows occasional latency spikes. Investigating further, Alice realizes that complex user questions trigger multiple knowledge searches, slowing responses. She decides to allocate another instance of the GraphRAG service to that Flow to handle these heavy queries. Thanks to Pulsar, scaling out is straightforward – the new instance will simply become another consumer on the relevant topics. Sure enough, once deployed, the next test query is handled faster, as the load is now balanced. The bottleneck was resolved by a one-line configuration change to scale the service, without any code changes or downtime. This agility in tuning performance is a direct consequence of the Pulsar-based design.",[48,7861,7862],{},"By the end of the day, Alice has not only built a functioning AI research assistant, but she’s also iterated on it multiple times – all without struggling with messaging middleware. TrustGraph, empowered by Pulsar, took care of the heavy lifting: routing messages, preserving data, triggering reconfigurations, and scaling services on demand. For Alice, the developer experience is night-and-day compared to earlier projects. She can focus on crafting AI logic, confident that the event-driven backbone (powered by Pulsar and StreamNative’s expertise) will handle the rest.",[40,7864,7866],{"id":7865},"conclusion-pulsar-as-the-foundation-for-event-driven-ai","Conclusion: Pulsar as the Foundation for Event-Driven AI",[48,7868,7869],{},"The story of TrustGraph underscores a broader lesson for AI platform developers: a robust messaging backbone is the key to unlocking scalable, modular, event-driven systems. Apache Pulsar proved to be that backbone for TrustGraph – acting as the central nervous system that links independent AI modules into one intelligent whole. Its pub\u002Fsub model, dynamic queue management, multi-tenancy, and mix of persistent vs. transient messaging enabled TrustGraph to achieve a level of flexibility and resilience that would be hard to realize otherwise. By using Pulsar, the TrustGraph team and its users gained scalability, fault tolerance, and speed of iteration as first-class features of the architecture. Developers can add new capabilities without fear of breaking the system, ops engineers can sleep easier knowing spikes or failures won’t collapse the pipeline, and organizations can deploy multiple AI agent flows concurrently with confidence in their isolation and security.",[48,7871,7872],{},"In essence, Pulsar (with StreamNative’s enterprise support in the wings) serves as the foundation for TrustGraph’s vision of an AI platform. It demonstrates how an advanced event streaming technology can solve the pain points of building AI products: eliminating brittle point-to-point links, preventing data loss, simplifying scaling, and improving observability. 
For any team looking to build the next generation of AI systems – be it autonomous agents, real-time analytics, or context-driven LLM applications – the combination of TrustGraph’s modular framework and Pulsar’s event-driven backbone offers a compelling blueprint. Pulsar enabled TrustGraph to transform from an ambitious idea into a production-grade reality, reinforcing its role as a foundational enabler for event-driven AI platforms. The result is a story of technology empowering developers: with Apache Pulsar under the hood, TrustGraph can truly deliver on its promise of creating intelligent, context-aware AI agents at scale.",{"title":18,"searchDepth":19,"depth":19,"links":7874},[7875,7876,7877,7878,7879,7880],{"id":7710,"depth":19,"text":7711},{"id":7717,"depth":19,"text":7718},{"id":7741,"depth":19,"text":7742},{"id":7792,"depth":19,"text":7793},{"id":7819,"depth":19,"text":7820},{"id":7865,"depth":19,"text":7866},"2025-09-19","Discover how TrustGraph built its open-source AI platform on Apache Pulsar, using event-driven architecture to connect microservices, scale dynamic AI workflows, and ensure resilience. Learn why Pulsar became the backbone for modular, real-time AI systems.","\u002Fimgs\u002Fblogs\u002F68cd4268d2956659bed8f5f9_case-study-TrustGraph.png.png",{},{"title":7702,"description":7882},"blog\u002Fcase-study-apache-pulsar-as-the-event-driven-backbone-of-trustgraph",[821,3988],"D_U86o4OX7DXtc3ZmaGvCxk-LXY0GBILP_MlY3ohJ-o",{"id":7890,"title":7891,"authors":7892,"body":7893,"category":5376,"createdAt":290,"date":7981,"description":7982,"extension":8,"featured":294,"image":7983,"isDraft":294,"link":290,"meta":7984,"navigation":7,"order":296,"path":7985,"readingTime":7986,"relatedResources":290,"seo":7987,"stem":7988,"tags":7989,"__hash__":7990},"blogs\u002Fblog\u002Fdata-streaming-summit-spotlight-streaming-lakehouse-track.md","Data Streaming Summit Spotlight: Streaming Lakehouse Track",[6127],{"type":15,"value":7894,"toc":7973},[7895,7899,7902,7906,7909,7913,7927,7931,7954,7958,7961,7965],[40,7896,7898],{"id":7897},"theme","Theme",[48,7900,7901],{},"One governed data layer for streaming and analytics — Iceberg-native real-time with engines that span streams and tables.",[40,7903,7905],{"id":7904},"why-this-track-matters","Why this track matters",[48,7907,7908],{},"The Streaming-Augmented Lakehouse (SAL) is becoming the default blueprint: append-only streams landing in open tables, compaction for curated layers, and query engines that serve both interactive analytics and ML\u002FAI. This track shows how OpenAI, Dremio, Onehouse\u002FHudi, Doris, and StreamNative\u002FUrsa make it real — including governance with Unity Catalog.",[40,7910,7912],{"id":7911},"what-youll-learn","What you’ll learn",[321,7914,7915,7918,7921,7924],{},[324,7916,7917],{},"Designing SAL architectures: bronze append-only vs. 
compacted silver\u002Fgold and when to choose each.",[324,7919,7920],{},"Ingestion & concurrency at scale: non-blocking writers, CDC, and streaming compaction.",[324,7922,7923],{},"Governance & catalogs: Iceberg with Unity Catalog and cross-engine access patterns.",[324,7925,7926],{},"Engine choices: where Ursa, Flink, Dremio, Doris, and Hudi fit in end-to-end pipelines.",[40,7928,7930],{"id":7929},"highlights","Highlights",[321,7932,7933,7936,7939,7942,7945,7948,7951],{},[324,7934,7935],{},"Dremio — Streaming with Apache Iceberg",[324,7937,7938],{},"Motorq - Real-Time Lakehouse Ingestion with StreamNative’s Classic Engine and Ursa",[324,7940,7941],{},"Apache Doris (VeloDB) — Unlocking Real-Time Insights: Apache Doris in Stream and Lakehouse Integration",[324,7943,7944],{},"OpenAI — StreamLink: Real-Time Data Ingestion at OpenAI Scale",[324,7946,7947],{},"Onehouse\u002FHudi — High-Throughput Streaming in the Lakehouse with Non-Blocking Concurrency Control in Apache Flink & Hudi",[324,7949,7950],{},"Uber + Onehouse —Flink Streaming Ingestion to Cloud-Lake at Scale",[324,7952,7953],{},"Kentra — Schema Management and Streaming Data Products",[40,7955,7957],{"id":7956},"who-should-attend","Who should attend",[48,7959,7960],{},"Data platform teams standardizing on Iceberg or Delta Lake, analytics leaders, and architects bridging batch + stream under one governance model.",[40,7962,7964],{"id":7963},"join-us","Join us",[48,7966,7967,7972],{},[55,7968,7971],{"href":7969,"rel":7970},"https:\u002F\u002Fwww.eventbrite.com\u002Fe\u002Fdata-streaming-summit-san-francisco-2025-tickets-1432401484399?aff=oddtdtcreator",[264],"Register for DSS SF 2025"," (Sept 30) and bring your lakehouse questions. We look forward to seeing you there!",{"title":18,"searchDepth":19,"depth":19,"links":7974},[7975,7976,7977,7978,7979,7980],{"id":7897,"depth":19,"text":7898},{"id":7904,"depth":19,"text":7905},{"id":7911,"depth":19,"text":7912},{"id":7929,"depth":19,"text":7930},{"id":7956,"depth":19,"text":7957},{"id":7963,"depth":19,"text":7964},"2025-09-18","Discover the Streaming Lakehouse Track at DSS SF 2025 — learn how OpenAI, Dremio, Onehouse, Doris, and StreamNative are building Iceberg-native architectures that unify streaming and analytics with governance, compaction, and cross-engine access.","\u002Fimgs\u002Fblogs\u002F68b99f5b4dc2f17841943e4e_DSS-Agenda_four-tracks-1.png",{},"\u002Fblog\u002Fdata-streaming-summit-spotlight-streaming-lakehouse-track","4 min read",{"title":7891,"description":7982},"blog\u002Fdata-streaming-summit-spotlight-streaming-lakehouse-track",[5376,1332,1330,800,303],"NgONd3o388sIfDwQQXLGNrlL6gBRY4uslI0nXE5pwuo",{"id":7992,"title":7993,"authors":7994,"body":7995,"category":5376,"createdAt":290,"date":8050,"description":8051,"extension":8,"featured":294,"image":7983,"isDraft":294,"link":290,"meta":8052,"navigation":7,"order":296,"path":8053,"readingTime":7986,"relatedResources":290,"seo":8054,"stem":8055,"tags":8056,"__hash__":8059},"blogs\u002Fblog\u002Fdata-streaming-summit-spotlight-ai-stream-processing-track.md","Data Streaming Summit Spotlight: AI + Stream Processing Track",[6127],{"type":15,"value":7996,"toc":8044},[7997,8001,8004,8008,8031,8033,8036,8038],[40,7998,8000],{"id":7999},"why-this-track-exists","‍Why this track exists",[48,8002,8003],{},"AI is real-time or it isn’t useful. 
This track focuses on turning streams into features, decisions, and safe automation — with Apache Flink and event-driven runtimes.",[40,8005,8007],{"id":8006},"session-highlights","Session highlights",[321,8009,8010,8013,8016,8019,8022,8025,8028],{},[324,8011,8012],{},"Google — Google-Scale Stream Processing with Just SQL — simplifying massive pipelines.",[324,8014,8015],{},"Uber — Safe Streams at Scale — deployment safety for mission-critical Flink jobs.",[324,8017,8018],{},"Confluent — Flink Changelog Modes — correctness and efficiency for stateful processing.",[324,8020,8021],{},"Ververica — The Need for Speed — squeezing latency out of Flink.",[324,8023,8024],{},"Salesforce — Insights from 300B Telemetry Trace Spans\u002FDay with Flink — real-world scale and patterns.",[324,8026,8027],{},"FiveOneFour & SteamNative - Building Real-Time Data Architectures for AI Chat - Practical patterns for natural-feeling AI conversations.",[324,8029,8030],{},"StreamNative — From Events to Autonomy: An Event-Driven Runtime for Fully Autonomous Agents — bridging events and agents.",[40,8032,7957],{"id":7956},[48,8034,8035],{},"Flink practitioners, platform teams supporting AI\u002FML, and anyone standing up agentic systems.",[40,8037,7964],{"id":7963},[48,8039,8040,8043],{},[55,8041,7971],{"href":7969,"rel":8042},[264]," (Sept 30) and bring your questions. We look forward to seeing you there!",{"title":18,"searchDepth":19,"depth":19,"links":8045},[8046,8047,8048,8049],{"id":7999,"depth":19,"text":8000},{"id":8006,"depth":19,"text":8007},{"id":7956,"depth":19,"text":7957},{"id":7963,"depth":19,"text":7964},"2025-09-15","Discover the AI + Stream Processing track at Data Streaming Summit SF 2025. Explore sessions from Google, Uber, Confluent, Salesforce, Ververica, StreamNative & more on Flink, real-time pipelines, safe automation, and agentic systems. Register now!",{},"\u002Fblog\u002Fdata-streaming-summit-spotlight-ai-stream-processing-track",{"title":7993,"description":8051},"blog\u002Fdata-streaming-summit-spotlight-ai-stream-processing-track",[5376,8057,8058,303],"Flink","Event-Driven","x1lLPwlwz511LiSRKYIN9EKnI8KCkvxZWHWCohLXSp0",{"id":8061,"title":8062,"authors":8063,"body":8065,"category":821,"createdAt":290,"date":8489,"description":8490,"extension":8,"featured":294,"image":8491,"isDraft":294,"link":290,"meta":8492,"navigation":7,"order":296,"path":8493,"readingTime":3556,"relatedResources":290,"seo":8494,"stem":8495,"tags":8496,"__hash__":8497},"blogs\u002Fblog\u002Finside-apache-pulsars-millisecond-write-path-a-deep-performance-analysis.md","Inside Apache Pulsar’s Millisecond Write Path: A Deep Performance Analysis",[8064],"Renyi Wang",{"type":15,"value":8066,"toc":8475},[8067,8074,8077,8080,8083,8087,8090,8098,8103,8106,8110,8113,8118,8121,8126,8129,8143,8146,8149,8152,8156,8159,8170,8175,8179,8182,8202,8207,8211,8214,8225,8228,8242,8245,8248,8251,8254,8257,8260,8264,8267,8278,8283,8286,8290,8293,8296,8299,8304,8312,8314,8318,8324,8333,8340,8344,8347,8350,8353,8356,8359,8362,8365,8368,8371,8374,8378,8381,8384,8386,8389,8392,8395,8397,8400,8403,8407,8410,8415,8418,8435,8439,8442,8445,8451,8453,8458,8461,8464,8467,8469,8472],[48,8068,8069,8070],{},"‍Original author: Renyi Wang (Software Engineer, 360 Cloud Platform Messaging Middleware Team). 
The blog post was originally published at ",[55,8071,8072],{"href":8072,"rel":8073},"https:\u002F\u002Fmp.weixin.qq.com\u002Fs\u002FQa2uzvO0oiBD9caDi763xw",[264],[48,8075,8076],{},"Apache Pulsar is an excellent distributed messaging system, purpose-built for modern, real-time data needs. Its compute–storage separation architecture offers significant advantages over many other open-source messaging queues, enabling both scalability and operational flexibility. Pulsar also comes with powerful features out of the box, such as delayed message delivery and cross-cluster geo-replication, making it resilient in mission-critical deployments.",[48,8078,8079],{},"What truly sets Pulsar apart, however, is its ability to combine enterprise-grade durability with millisecond latency. In optimized environments, Pulsar can persist messages across replicas in as little as 0.3 milliseconds, while achieving throughput of over 1.5 million messages per second on a single producer thread—all while preserving strict message ordering. This blog post takes a deep dive into how Pulsar achieves these remarkable performance characteristics and what makes its design uniquely capable of delivering both speed and reliability.",[48,8081,8082],{},"Testing environment (context & limits). All latency numbers in this blog post were measured in a low-latency setup: a producer client, two Pulsar brokers, and three bookies deployed in the same data center and connected through a single switch. Storage was backed by NVMe SSDs, and replication was configured as ensemble=2, write quorum=2, ack quorum=2. Under these conditions, one-way broker↔bookie network transit is approximately 50 µs, which is essential to achieving ~0.3 ms end-to-end durable writes. Traversing additional switches, crossing racks\u002FAZs\u002Fregions, enabling heavier inline processing (e.g., TLS inspection), or using slower disks will increase observed latencies. A fuller breakdown of hardware and settings appears later in this post.",[40,8084,8086],{"id":8085},"pulsar-write-latency-breakdown","Pulsar Write Latency Breakdown",[48,8088,8089],{},"A Pulsar producer sending a message incurs latency in two main stages:",[1666,8091,8092,8095],{},[324,8093,8094],{},"Client to Broker: The time for the client to send the message to the Pulsar broker.",[324,8096,8097],{},"Broker to Bookies: The time for the broker to persist the message by writing it to multiple bookie storage nodes in parallel (replicas in Apache BookKeeper).",[48,8099,8100],{},[384,8101],{"alt":18,"src":8102},"\u002Fimgs\u002Fblogs\u002F68c2de286f9c7d21189703d6_98e8a10d.png",[48,8104,8105],{},"In other words, the end-to-end publish latency includes the network hop from client to broker, plus the broker’s internal processing and the storage write to BookKeeper (which itself replicates to multiple bookies). We will analyze each part in detail in the next sections.",[40,8107,8109],{"id":8108},"broker-side-latency-analysis","Broker-Side Latency Analysis",[48,8111,8112],{},"First, we will look into the broker latency. Pulsar provides a metric pulsar_broker_publish_latency (illustrated in Figure 2.) 
which provides insight into the total time a message spends on the broker side, encompassing the period from its initial receipt to its successful write to BookKeeper and the subsequent client callback completion.",[48,8114,8115],{},[384,8116],{"alt":18,"src":8117},"\u002Fimgs\u002Fblogs\u002F68c2de286f9c7d21189703e5_1852d47c.png",[48,8119,8120],{},"To understand where the broker spends time handling incoming writes, we used Alibaba Arthas to capture a thread-level flame graph. The flame graph revealed that most of the broker latency is spent in a few network I\u002FO operations (sending data to bookies and waiting for responses). Figure 3 illustrates the broker-side flame graph, showing how the workload is distributed across threads.",[48,8122,8123],{},[384,8124],{"alt":18,"src":8125},"\u002Fimgs\u002Fblogs\u002F68c2de286f9c7d21189703dc_d218e99d.png",[48,8127,8128],{},"From this flame graph, we identified four main threads on the broker that contribute to publish latency. The breakdown of their time is as follows:",[321,8130,8131,8134,8137,8140],{},[324,8132,8133],{},"BookKeeperClientWorker-1 – accounts for roughly 10% of the end-to-end latency. Within this thread, about half of the time is spent dequeuing write requests, and the other half executing the bookie callback and enqueuing the next task.",[324,8135,8136],{},"BookKeeperClientWorker-2 – about 25% of latency. Approximately 40% of its time is spent taking entries from the bookie queue, 40% sending the write requests to bookies over the network, and ~20% running the broker-side callback when a bookie write succeeds.",[324,8138,8139],{},"pulsar-io-1 – about 25% of latency. It spends ~30% of the time executing tasks to write to the bookie’s network queue, ~30% reading the bookie’s network response and putting data into a queue, and ~30% waiting (idle or blocking on I\u002FO).",[324,8141,8142],{},"pulsar-io-2 – about 30% of latency. Roughly 40% is spent reading the client’s incoming write request and adding it to the bookie request queue, 40% on sending the ACK response back to the client, and ~20% waiting (idle).",[48,8144,8145],{},"These four threads—BookKeeperClientWorker and pulsar-io—collaborate to manage data flow and client responses. Specifically, BookKeeperClientWorker threads are responsible for sending data to BookKeeper, ensuring durable storage. Concurrently, pulsar-io threads handle responding to the client, confirming data receipt and processing. This distributed handling means that on the broker, there isn't a single, dominant bottleneck; instead, latency is diffused across various networking and callback processing stages, leading to a more robust and efficient write path.",[48,8147,8148],{},"Bookie-Side Latency Analysis",[48,8150,8151],{},"After understanding publish latency at the broker side, we know that a significant amount of time is spent writing data to bookies. Let's now examine bookie write latency. Once the broker forwards a message to BookKeeper, the BookKeeper bookies (storage nodes) take over. Pulsar’s storage layer is backed by BookKeeper, so understanding BookKeeper’s write path is crucial for optimizing latency. Below, we examine Pulsar’s data storage model, the internals of a bookie’s write operation, and how we tuned it for maximum performance.",[32,8153,8155],{"id":8154},"data-storage-model-in-pulsarbookkeeper","Data Storage Model in Pulsar\u002FBookKeeper",[48,8157,8158],{},"Pulsar’s data model is designed to handle millions of topics with high throughput. 
Each topic in Pulsar is backed by a BookKeeper ledger (an append-only log stored on bookies). Multiple ledgers are aggregated and written to ledger files in bookies (similar to a commit log in a database or RocksDB), which is optimized for sequential writes. This is illustrated in Figure 4.",[321,8160,8161,8164,8167],{},[324,8162,8163],{},"Bookies batch and sort writes to optimize disk access. Within a single topic, data is written in order, which means when that topic is consumed, the data is mostly sequential on disk. This improves read efficiency by reducing random disk seeks (fewer RocksDB lookups and disk reads).",[324,8165,8166],{},"Bookies employ a multi-tier caching mechanism. When bookies write messages, the data is first written to a Write Cache (in-memory) as well as to the Write-Ahead Log (WAL), and later to the main ledger storage. Consumers read from the Read Cache if possible; if the data is not in cache, the consumer will query RocksDB (which indexes the ledger files) to find the data location on disk. That data is then fetched and also put into the Read Cache for future reads.",[324,8168,8169],{},"At any given time, on each bookie disk, only one ledger file is open for writes. Each bookie also uses a RocksDB instance (for indexing entry locations). This design maximizes sequential write throughput by appending to a single file per disk at a time.",[48,8171,8172],{},[384,8173],{"alt":18,"src":8174},"\u002Fimgs\u002Fblogs\u002F68c2de286f9c7d21189703df_e78030cd.png",[32,8176,8178],{"id":8177},"write-process-in-a-bookie","Write Process in a Bookie",[48,8180,8181],{},"When a Pulsar broker publishes a message to a topic, the message is internally forwarded to BookKeeper clients and then to the bookie storage nodes. The end-to-end write process on a bookie is illustrated in Figure 5 and described as follows (solid arrows indicate synchronous\u002Fblocking steps in the flow):",[1666,8183,8184,8187,8190,8193,8196,8199],{},[324,8185,8186],{},"Broker to bookies: The Pulsar broker’s BookKeeper client selects a set of bookies (the ensemble for that ledger, e.g. 2 or 3 replicas) and sends the write request to all bookies in parallel.",[324,8188,8189],{},"Write to cache: Each bookie receives the write request and immediately writes the entry to its in-memory write cache. (By default, the write cache is sized to 1\u002F4 of the bookie’s heap and is off-heap memory.) The write is acknowledged in memory and will be flushed to disk asynchronously (allowing batching).",[324,8191,8192],{},"Write to WAL (Journal): If journaling is enabled (it is by default), the bookie also appends the entry to a journal file (WAL) on disk. This is done to ensure durability. The journal write is buffered and triggers a flush based on the journal’s flush policy (detailed below).",[324,8194,8195],{},"Journal thread flush: The bookie’s Journal thread pulls pending entries from a queue and writes them into an in-memory buffer. When this buffer is full or a flush condition is met, the data is written to the OS page cache (accumulating data to eventually be written to the physical disk).",[324,8197,8198],{},"Force write to disk: A separate ForceWrite thread is responsible for ensuring durability. 
It takes data that has been written to the page cache and issues an fsync (flush) to force the data to persist to the physical disk (this is often the slowest step, as it involves actual disk I\u002FO).",[324,8200,8201],{},"Acknowledge back to broker: Once the data is safely written (WAL fsynced) on the bookie, it sends a write acknowledgment back to the broker. After the broker receives write acknowledgments from a quorum of bookies, the broker then knows this entry is durably stored and can trigger the client’s callback to signal a successful publish.",[48,8203,8204],{},[384,8205],{"alt":18,"src":8206},"\u002Fimgs\u002Fblogs\u002F68c2de286f9c7d21189703e2_df1c90ff.png",[32,8208,8210],{"id":8209},"journal-flush-policy-tuning","Journal Flush Policy Tuning",[48,8212,8213],{},"Our previous analysis showed that journal flushing contributes to bookie write latency. BookKeeper flushes the journal (WAL) to disk when any of the following conditions is met (whichever comes first):",[321,8215,8216,8219,8222],{},[324,8217,8218],{},"Max wait time: 1 ms by default. The journal thread will flush the accumulated writes if 1 millisecond has passed since the last flush, even if there is little data (ensuring low latency).",[324,8220,8221],{},"Max batch size: 512 KB of data by default. If the buffered writes reach 512KB, it will flush immediately (to optimize throughput by writing larger sequential chunks).",[324,8223,8224],{},"Flush when idle: Disabled by default. If this is enabled, the journal will also flush as soon as the write queue becomes empty (i.e., no more writes to batch). This avoids waiting when there's a lull in traffic. We enabled this in our test to reduce latency for sporadic writes.",[48,8226,8227],{},"For our extreme latency tuning, we adjusted the bookie configuration as follows:",[321,8229,8230,8233,8236,8239],{},[324,8231,8232],{},"Enabled force flush of journal data from page cache to disk on each flush (journalSyncData=true) to ensure data is actually on disk before acknowledging.",[324,8234,8235],{},"Ensured the journal is actually written (journalWriteData=true, which is usually true by default).",[324,8237,8238],{},"Enabled flushing even when the queue is not full (journalFlushWhenQueueEmpty=true), which is safe for high-IOPS SSDs. This makes even single entries get flushed without delay (useful under light load; under heavy load, the batch triggers will naturally dominate).",[324,8240,8241],{},"Aligned the journal writes to disk sector boundaries for efficiency: we set journalAlignmentSize=4096 and readBufferSizeBytes=4096 (4 KB) to match the SSD’s physical sector size. (Note: The alignment settings require journalFormatVersionToWrite=5 or higher to take effect.)",[48,8243,8244],{},"These settings optimize the bookie to flush to NVMe disks as quickly as possible, trading off some CPU\u002FIO overhead for the lowest possible latency.",[48,8246,8247],{},"journalSyncData=true",[48,8249,8250],{},"journalWriteData=true",[48,8252,8253],{},"journalFlushWhenQueueEmpty=true",[48,8255,8256],{},"journalAlignmentSize=4096",[48,8258,8259],{},"readBufferSizeBytes=4096",[32,8261,8263],{"id":8262},"bookie-side-flame-graph-analysis","Bookie-Side Flame Graph Analysis",[48,8265,8266],{},"After applying the above optimizations, we profiled the bookie’s performance. The thread-level flame graph on the bookie side (Figure 5) showed that there were no abnormal blocking delays in the write path—each thread is doing its part efficiently. 
The time breakdown across key bookie threads was:",[321,8268,8269,8272,8275],{},[324,8270,8271],{},"Journal thread – ~17% of the total request latency. It spends roughly 30% of its time reading entries from the journal queue, 30% writing data into the OS page cache, and ~25% enqueueing data into the force-write queue (handing off to the ForceWrite thread).",[324,8273,8274],{},"ForceWrite thread – ~48% of latency (this is where the heavy disk I\u002FO happens). About 10% of its time is spent dequeuing data from the force-write queue, ~80% on forcing the data from page cache to disk (fsync calls), and ~10% handling the completion (notifying and queuing the response back to the network thread).",[324,8276,8277],{},"Bookie I\u002FO thread – ~28% of latency. This thread handles network I\u002FO. Around 30% of the time goes to parsing incoming write requests and adding them to the journal queue, ~30% executing tasks in the network queue (sending the acknowledgment back to the broker), and ~30% waiting (idle or blocking on network waits).",[48,8279,8280],{},[384,8281],{"alt":18,"src":8282},"\u002Fimgs\u002Fblogs\u002F68c2de286f9c7d21189703d9_4528a36f.png",[48,8284,8285],{},"With these optimizations, the bookie effectively pipelines the work: writing to the journal and flushing to disk happens concurrently with network communication. No single thread is stalling the process significantly, and the overall bookie write path is highly efficient.",[40,8287,8289],{"id":8288},"performance-testing","Performance Testing",[48,8291,8292],{},"Having optimized brokers and bookies for low latency, we conducted end-to-end throughput and latency tests to determine Pulsar's performance limits. To focus on single-thread performance and preserve message ordering, we utilized a single topic (partition) and producer. We tested both synchronous and asynchronous publishing modes with varying message sizes. The test environment and results are detailed below.",[48,8294,8295],{},"Test Environment: Brokers were deployed on 2 nodes (each 4 CPU cores, 16 GB RAM, 25 Gb Ethernet). Bookies were on 3 nodes (each with four 4 TB NVMe SSDs, and 25 Gb network). All nodes are deployed under the same network switch. 
The topic was configured with 2 bookie replicas (ensemble size=2, write quorum=2, ack quorum=2), meaning each message is written to 2 bookies and acknowledged when both succeed (for strong durability).",[48,8297,8298],{},"Before testing, we created a partitioned topic with 1 partition and set the persistence to 2 replicas and ack quorum 2:",[8300,8301,8303],"h1",{"id":8302},"create-a-single-partition-topic","Create a single-partition topic",[48,8305,8306,8307,8311],{},"bin\u002Fpulsar-admin --admin-url ",[55,8308,8309],{"href":8309,"rel":8310},"http:\u002F\u002F192.0.0.1:8080",[264]," topics create-partitioned-topic persistent:\u002F\u002Fpublic\u002Fdefault\u002Ftest_qps -p 1",[48,8313,3931],{},[8300,8315,8317],{"id":8316},"set-the-namespace-persistence-bookie-ensemble-size-2-write-quorum-2-ack-quorum-2","Set the namespace persistence: bookie ensemble size = 2, write quorum = 2, ack quorum = 2",[48,8319,8306,8320,8323],{},[55,8321,8309],{"href":8309,"rel":8322},[264]," namespaces set-persistence public\u002Fdefault \\",[8325,8326,8331],"pre",{"className":8327,"code":8329,"language":8330},[8328],"language-text","--bookkeeper-ensemble 2 \\\n\n--bookkeeper-write-quorum 2 \\\n\n--bookkeeper-ack-quorum 2\n","text",[4926,8332,8329],{"__ignoreMap":18},[48,8334,8335,8336,8339],{},"For synchronous publishing (each send waits for acknowledgment before sending the next), with pulsar-perf tool we used ",[4926,8337,8338],{},"--batch-max-messages 1"," and we used a single producer thread with no batching (to observe latency per message):",[8300,8341,8343],{"id":8342},"synchronous-publish-single-thread-measuring-latency","Synchronous publish, single thread, measuring latency",[48,8345,8346],{},"bin\u002Fpulsar-perf produce persistent:\u002F\u002Fpublic\u002Fdefault\u002Ftest_qps \\",[48,8348,8349],{},"-u pulsar:\u002F\u002F192.0.0.1:6650 \\",[48,8351,8352],{},"--disable-batching \\",[48,8354,8355],{},"--batch-max-messages 1 \\",[48,8357,8358],{},"--max-outstanding 1 \\",[48,8360,8361],{},"--rate 500000 \\",[48,8363,8364],{},"--test-duration 120 \\",[48,8366,8367],{},"--busy-wait \\",[48,8369,8370],{},"--size 1024 > sync_1024.log &",[48,8372,8373],{},"For asynchronous publishing with batching, we allowed a large batch and higher outstanding messages to maximize throughput (while preserving message order on a single thread). We also enabled compression (LZ4) to improve throughput for larger messages:",[8300,8375,8377],{"id":8376},"asynchronous-publish-batching-and-compression-enabled-measuring-throughput","Asynchronous publish, batching and compression enabled, measuring throughput",[48,8379,8380],{},"export OPTS=\"-Xms10g -Xmx10g -XX:MaxDirectMemorySize=10g\"",[48,8382,8383],{},"bin\u002Fpulsar-perf produce persistent:\u002F\u002Fpublic\u002Fdefault\u002Ftest_qps_async \\",[48,8385,8349],{},[48,8387,8388],{},"--batch-max-messages 10000 \\",[48,8390,8391],{},"--memory-limit 2G \\",[48,8393,8394],{},"--rate 2000000 \\",[48,8396,8367],{},[48,8398,8399],{},"--compression LZ4 \\",[48,8401,8402],{},"--size 1024 > async_1024.log &",[32,8404,8406],{"id":8405},"throughput-and-latency-results","Throughput and Latency Results",[48,8408,8409],{},"After running the tests for long enough duration in each scenario, we gathered the maximum sustainable throughput (QPS) and the average latency observed, for various message sizes. 
The results are summarized below:",[48,8411,8412],{},[384,8413],{"alt":5878,"src":8414},"\u002Fimgs\u002Fblogs\u002F68c2d9ea6f9c7d211893391f_iShot_2025-09-11_22.16.55.png",[48,8416,8417],{},"Key Takeaways:",[321,8419,8420,8423,8426,8432],{},[324,8421,8422],{},"In synchronous mode, a single producer could send ~3200–3400 messages per second for small messages (up to 16KB), limited by the one-at-a-time round-trip to the broker. The average end-to-end latency for each message (client send -> stored on 2 bookies -> ack received) was only about 0.3 milliseconds! This is incredibly low and mainly consists of network propagation and context switching time. Even a 512 KB message was acknowledged in ~1.4 ms on average (throughput ~2.75 Gb\u002Fs), showing Pulsar’s ability to handle large messages with low latency.",[324,8424,8425],{},"In asynchronous mode with batching, Pulsar achieved over 1 million writes per second on a single producer thread to a single topic. With 1 KB messages, we saw about 1.06 million msgs\u002Fs (~8.3 Gb\u002Fs). With compression enabled (LZ4), the throughput increased to about 1.5 million msgs\u002Fs for 128-byte messages (since compression reduces the data size, effectively pushing more messages through per second). The trade-off was a higher average latency of ~5–6 ms (because batches of messages are sent and flushed together).",[324,8427,8428,8429,8431],{},"At very high QPS with small messages (128 B and 1 KB), throughput is constrained primarily by per-message CPU overhead on the broker\u002Fbookie (Netty, callbacks), callback scheduling and GC, plus the journal\u002Fforce-write (fsync) pipeline on bookies — the link is ",[44,8430,4912],{}," NIC-limited in these cases. As message size grows (e.g., 16 KB), the bottleneck shifts toward NIC and disk throughput, while GC remains a secondary factor. In such tests, the 25 Gb\u002Fs network was nearly saturated (e.g., ~9.5 Gb\u002Fs per bookie for 16 KB messages, which is ~19 Gb\u002Fs total for 2 bookies).",[324,8433,8434],{},"Importantly, even in asynchronous mode, Pulsar maintains message order. The Pulsar client library and broker ensure that callbacks are executed in order for a given producer, so batching does not reorder messages. Also, using multiple threads did not improve throughput for a single topic\u002Fpartition because Pulsar uses a single IO thread per partition to preserve ordering (all messages for one partition go through the same channel and IO thread).",[32,8436,8438],{"id":8437},"disk-io-microbenchmark","Disk I\u002FO Microbenchmark",[48,8440,8441],{},"To better understand the lower bound of latency, we also measured the raw disk performance for fsync on the NVMe drives. Using fio, we simulated a single-thread writing 1KB to the page cache and immediately fsyncing (forced flush to disk):",[48,8443,8444],{},"fio --name=fsync_test --filename=\u002Fdata2\u002Ftestfile --bs=1k --size=1k --rw=write \\",[8325,8446,8449],{"className":8447,"code":8448,"language":8330},[8328],"--ioengine=sync --fsync=1 --numjobs=1 --iodepth=1 --direct=0 \\\n\n--group_reporting --runtime=60 --time_based\n",[4926,8450,8448],{"__ignoreMap":18},[48,8452,3931],{},[48,8454,8455],{},[384,8456],{"alt":18,"src":8457},"\u002Fimgs\u002Fblogs\u002F68c2de286f9c7d21189703e8_4c5e2de6.png",[48,8459,8460],{},"The result showed that an NVMe disk can handle a single-threaded sequential write+fsync in roughly 44 microseconds on average (about 18 µs to write to the page cache and 26 µs to flush to disk). 
In our Pulsar bookie tests, a single message fsync (journal write) took on the order of ~100 µs. The slight increase is due to additional overhead in the bookie (thread context switches, queue synchronization, etc., as seen in the flame graph breakdown).",[48,8462,8463],{},"Another factor in end-to-end latency is network propagation. Within the same availability zone (low network latency environment), we observed a one-way network transit time of roughly 0.05 ms (50 µs) between broker and bookie. Since our test used two bookies and required both to acknowledge, the client’s message experienced two network hops (to two bookies) plus the return hop from the broker.",[48,8465,8466],{},"Combining these factors: ~100 µs to durably write to an NVMe on each bookie, plus ~50 µs network each way, plus some processing overhead, it matches our observed ~0.3 ms end-to-end latency for a synchronous write with 2 replicas. This confirms that Pulsar’s architecture, when running on high-performance hardware (NVMe SSDs, 25GbE network), can indeed achieve sub-millisecond durable message writes.",[40,8468,2125],{"id":2122},[48,8470,8471],{},"This deep dive into Apache Pulsar’s performance demonstrates its ability to achieve ultra-low latency and high throughput with the right tuning and hardware. By leveraging a tiered architecture (separating compute and storage), optimizing write paths, and batching intelligently, Pulsar was able to reliably persist messages to multiple NVMe-backed replicas and acknowledge the client in about 0.3 milliseconds on average. In asynchronous mode, a single producer on one topic achieved on the order of 1 million messages per second, and up to 1.5 million msgs\u002Fs with compression, all while preserving message ordering.",[48,8473,8474],{},"Such performance is impressive for a distributed messaging system with strong durability guarantees. It showcases that Apache Pulsar’s design – with its write-ahead logs, caches, and efficient BookKeeper storage – can push the boundaries of messaging speed. For developers with demanding low-latency, high-throughput messaging needs, Pulsar’s architecture offers a compelling solution that can deliver lightning-fast data streaming without sacrificing reliability.",{"title":18,"searchDepth":19,"depth":19,"links":8476},[8477,8478,8484,8488],{"id":8085,"depth":19,"text":8086},{"id":8108,"depth":19,"text":8109,"children":8479},[8480,8481,8482,8483],{"id":8154,"depth":279,"text":8155},{"id":8177,"depth":279,"text":8178},{"id":8209,"depth":279,"text":8210},{"id":8262,"depth":279,"text":8263},{"id":8288,"depth":19,"text":8289,"children":8485},[8486,8487],{"id":8405,"depth":279,"text":8406},{"id":8437,"depth":279,"text":8438},{"id":2122,"depth":19,"text":2125},"2025-09-11","Discover how Apache Pulsar achieves sub-millisecond durable writes and 1M+ msgs\u002Fsec throughput. 
A deep dive into its high-performance write path design.","\u002Fimgs\u002Fblogs\u002F68c2e16ccf31a040e2f87e3e_Inside-Apache-Pulsar’s-Millisecond-Write-Path-no-logo.png",{},"\u002Fblog\u002Finside-apache-pulsars-millisecond-write-path-a-deep-performance-analysis",{"title":8062,"description":8490},"blog\u002Finside-apache-pulsars-millisecond-write-path-a-deep-performance-analysis",[821,7347],"-kXZWqQ36BCf0Ky-Gs33qYeEiJS9BPoc8x3EGB8FMZA",{"id":8499,"title":8500,"authors":8501,"body":8502,"category":7338,"createdAt":290,"date":8632,"description":8633,"extension":8,"featured":294,"image":8634,"isDraft":294,"link":290,"meta":8635,"navigation":7,"order":296,"path":7184,"readingTime":4475,"relatedResources":290,"seo":8636,"stem":8637,"tags":8638,"__hash__":8639},"blogs\u002Fblog\u002Fapache-pulsar-4-1-release-announcement.md","Apache Pulsar 4.1: 560+ Improvements That Prove Open Source Innovation Velocity",[28],{"type":15,"value":8503,"toc":8624},[8504,8507,8511,8514,8531,8535,8538,8541,8545,8549,8552,8555,8559,8562,8565,8569,8572,8576,8579,8583,8586,8590,8593,8596,8600,8603,8606,8609,8612,8615],[48,8505,8506],{},"Today marks a significant milestone for the Apache Pulsar ecosystem with the release of version 4.1, featuring over 560 fixes and improvements. This isn't just another release—it's a testament to what happens when an active, growing community collaborates at scale to push the boundaries of real-time messaging and streaming.",[40,8508,8510],{"id":8509},"the-numbers-tell-the-story","The Numbers Tell the Story",[48,8512,8513],{},"While proprietary messaging platforms release incremental updates on corporate timelines, Apache Pulsar 4.1 demonstrates the raw innovation velocity that only open source can deliver. These 560+ improvements span every critical aspect of the platform:",[321,8515,8516,8519,8522,8525,8528],{},[324,8517,8518],{},"Security hardening with proactive CVE fixes and enhanced TLS authentication",[324,8520,8521],{},"Performance optimizations across broker operations and load balancing",[324,8523,8524],{},"Developer experience upgrades including new CLI capabilities and better debugging tools",[324,8526,8527],{},"Operational excellence improvements for production deployments",[324,8529,8530],{},"Reliability enhancements for mission-critical workloads",[40,8532,8534],{"id":8533},"community-driven-innovation-at-scale","Community-Driven Innovation at Scale",[48,8536,8537],{},"What makes this release particularly compelling isn't just the volume of improvements—it's the collaborative process behind them. Our vibrant Apache Pulsar community has grown substantially, bringing together developers from organizations worldwide who are solving real production challenges and contributing those solutions back to the ecosystem.",[48,8539,8540],{},"This community-driven approach creates a virtuous cycle: more users means more diverse use cases, which generates more contributions, which results in a more robust platform for everyone. It's a competitive advantage that proprietary platforms simply cannot match.",[40,8542,8544],{"id":8543},"key-highlights-for-developers","Key Highlights for Developers",[3933,8546,8548],{"id":8547},"performance-at-scale-pip-430-broker-cache-improvements","Performance at Scale: PIP-430 Broker Cache Improvements",[48,8550,8551],{},"One of the standout improvements in 4.1 is PIP-430, which addresses fundamental inefficiencies in broker cache eviction mechanisms. 
This improvement draws inspiration from cutting-edge research, specifically the S3FIFO cache eviction algorithm that demonstrated superior performance across thousands of real-world cache traces. While PIP-430 doesn't implement S3FIFO directly, it applies similar principles of intelligent eviction based on access patterns rather than simple timestamp-based approaches.",[48,8553,8554],{},"The technical impact is substantial: PIP-430 introduces a centralized eviction mechanism using a global queue that tracks cached entries in insertion order, replacing expensive per-ledger iteration with a single periodic task. The new \"expected read count\" strategy intelligently retains entries with higher utility, particularly benefiting high fan-out catch-up reads and Key_Shared subscriptions. For operators running brokers with large numbers of active topics, this translates to reduced CPU overhead, improved cache hit rates, and decreased load on BookKeeper and tiered storage.",[3933,8556,8558],{"id":8557},"native-queuing-semantics-building-on-years-of-production-success","Native Queuing Semantics: Building on Years of Production Success",[48,8560,8561],{},"While Apache Kafka has spent over two years struggling to implement basic queuing semantics through KIP-932—with the feature still marked as \"early access\" and lacking essential capabilities like Dead Letter Queues—Apache Pulsar 4.1 continues to refine its battle-tested queuing foundation. Pulsar has offered true queuing semantics through shared subscriptions since its early days, enabling multiple consumers to process messages from the same partition with individual message acknowledgment.",[48,8563,8564],{},"The DLQ improvements in 4.1, including PIP-399's enhanced metric reporting for delayed queues, build upon this mature queuing infrastructure. Pulsar's shared subscription model naturally supports complex retry scenarios, poison message handling, and flexible consumer scaling—features that Kafka is still trying to implement through shared groups. When your queuing semantics are native to the platform rather than bolted on as an afterthought, you can focus on sophisticated enhancements rather than basic functionality.",[3933,8566,8568],{"id":8567},"enhanced-security-posture","Enhanced Security Posture",[48,8570,8571],{},"Pulsar 4.1 includes multiple security improvements, addressing CVEs promptly and enhancing authentication mechanisms. For enterprises evaluating messaging platforms, this proactive security stance demonstrates the platform's production readiness.",[3933,8573,8575],{"id":8574},"developer-experience-pip-435-cli-enhancements","Developer Experience: PIP-435 CLI Enhancements",[48,8577,8578],{},"PIP-435 adds timestamp-based message consumption capabilities to the client CLI, enabling developers to consume messages within specific time ranges. This seemingly simple addition addresses a common debugging and data recovery scenario that previously required custom tooling.",[3933,8580,8582],{"id":8581},"operational-excellence","Operational Excellence",[48,8584,8585],{},"Blue-green migration improvements, enhanced monitoring capabilities, and better resource management features demonstrate Pulsar's commitment to operational simplicity at enterprise scale. 
The release also includes fixes for metric reporting (like the delayed queue metrics in PIP-399) that improve observability in production environments.",[40,8587,8589],{"id":8588},"setting-the-pace-not-following-it","Setting the Pace, Not Following It",[48,8591,8592],{},"The velocity demonstrated in Apache Pulsar 4.1 represents something fundamental about open source innovation. While competitors are constrained by quarterly roadmaps and corporate decision-making processes, the Pulsar community responds directly to real-world needs with rapid iteration and deployment.",[48,8594,8595],{},"This isn't just about feature velocity—it's about the quality of innovation that emerges when the people building the software are the same people using it in production every day.",[40,8597,8599],{"id":8598},"a-massive-thank-you","A Massive Thank You",[48,8601,8602],{},"None of this would be possible without our incredible Apache Pulsar community. From the contributors who submitted patches and fixes, to the organizations that shared their production experiences, to the maintainers who reviewed and integrated hundreds of contributions—this release represents a truly collaborative effort.",[48,8604,8605],{},"The growing energy and engagement in the Pulsar community continues to prove that Apache Pulsar isn't just relevant—it's setting the standard for what modern messaging and streaming platforms should be.",[40,8607,8608],{"id":1727},"What's Next",[48,8610,8611],{},"Apache Pulsar 4.1 is available now. For organizations evaluating messaging platforms, this release demonstrates not just current capabilities, but the innovation trajectory that only a thriving open source community can deliver.",[48,8613,8614],{},"The question isn't whether Apache Pulsar is relevant—it's whether your current messaging platform can innovate at this pace.",[48,8616,8617,8618,8623],{},"Ready to experience the Apache Pulsar difference? 
Check out the",[55,8619,8622],{"href":8620,"rel":8621},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fmilestone\u002F40?closed=1",[264]," full release notes"," and join our growing community.",{"title":18,"searchDepth":19,"depth":19,"links":8625},[8626,8627,8628,8629,8630,8631],{"id":8509,"depth":19,"text":8510},{"id":8533,"depth":19,"text":8534},{"id":8543,"depth":19,"text":8544},{"id":8588,"depth":19,"text":8589},{"id":8598,"depth":19,"text":8599},{"id":1727,"depth":19,"text":8608},"2025-09-10","Discover Apache Pulsar 4.1: 560+ improvements driven by open-source innovation, enhancing security, performance, developer experience, and operational excellence for real-time messaging and streaming.","\u002Fimgs\u002Fblogs\u002F68c1823dc431fb8796642d5b_Pulsar-4.1.png",{},{"title":8500,"description":8633},"blog\u002Fapache-pulsar-4-1-release-announcement",[821,7347,302],"OTBLoPry0phC-ul1NYROncefJW8CT4Fri8bIHlKgD9w",{"id":8641,"title":8642,"authors":8643,"body":8644,"category":821,"createdAt":290,"date":8985,"description":8986,"extension":8,"featured":294,"image":8987,"isDraft":294,"link":290,"meta":8988,"navigation":7,"order":296,"path":8989,"readingTime":3556,"relatedResources":290,"seo":8990,"stem":8991,"tags":8992,"__hash__":8993},"blogs\u002Fblog\u002Fpulsar-newbie-guide-for-kafka-engineers-part-7-pulsar-security-for-kafka-admins.md","Pulsar Newbie Guide for Kafka Engineers (Part 7): Pulsar Security for Kafka Admins",[808,809,810],{"type":15,"value":8645,"toc":8977},[8646,8649,8652,8656,8659,8662,8682,8685,8696,8699,8704,8707,8710,8724,8727,8730,8733,8737,8740,8748,8753,8761,8764,8767,8770,8773,8778,8781,8786,8789,8797,8800,8804,8821,8825,8828,8836,8839,8856,8861,8869,8872,8888,8894,8900,8904,8921,8925,8945,8948,8950,8952,8954,8961,8964,8971],[48,8647,8648],{},"‍TL;DR",[48,8650,8651],{},"This post covers how authentication and authorization work in Pulsar, drawing parallels to Kafka’s security. Pulsar supports pluggable auth mechanisms (TLS, OAuth2, JWT tokens, Kerberos, etc.) and fine-grained authorization (permissions at topic or namespace level). We’ll explain how to set up common auth methods (e.g., mutual TLS or token-based auth, akin to Kafka SASL\u002FSSL), and how Pulsar’s multi-tenant design makes ACL management both powerful and a bit different from Kafka’s. In short: Pulsar can do everything Kafka can (SSL encryption, SASL auth, ACLs) and more, like multi-tenant isolation and token-based auth built in.",[40,8653,8655],{"id":8654},"authentication-in-pulsar","Authentication in Pulsar",[48,8657,8658],{},"Pulsar, like Kafka, can be run in plaintext (no auth) or with authentication enabled. By default, Pulsar has no authentication required (open cluster). 
In production you’ll want to enable one or more auth methods.",[48,8660,8661],{},"Supported auth providers in Pulsar include:",[321,8663,8664,8667,8670,8673,8676,8679],{},[324,8665,8666],{},"TLS authentication (using client certificates – similar to Kafka’s SSL client auth).",[324,8668,8669],{},"Token-based authentication (using JWTs or arbitrary tokens).",[324,8671,8672],{},"OAuth2 (good for cloud identity integration).",[324,8674,8675],{},"Kerberos (yes, Pulsar can use Kerberos SASL via JAAS, which is analogous to Kafka’s GSSAPI).",[324,8677,8678],{},"Basic (username\u002Fpassword, typically not used in production unless over TLS).",[324,8680,8681],{},"Athenz (Yahoo’s system) for those in specific environments.",[48,8683,8684],{},"These correspond to Kafka’s SASL mechanisms:",[321,8686,8687,8690,8693],{},[324,8688,8689],{},"Kafka SASL_SSL with PLAIN or SCRAM -> Pulsar’s “Basic” or token can cover this (though token is more robust because it’s not plaintext creds each time, it’s signed tokens).",[324,8691,8692],{},"Kafka SASL_SSL with GSSAPI (Kerberos) -> Pulsar’s Kerberos option.",[324,8694,8695],{},"Kafka SSL client certs -> Pulsar TLS auth.",[48,8697,8698],{},"Configuring Authentication:\nIn Pulsar, you enable authentication on brokers (and proxies if using). For example, in broker.conf:",[48,8700,8701],{},[384,8702],{"alt":5878,"src":8703},"\u002Fimgs\u002Fblogs\u002F68c04db18a1ab7720303be65_iShot_2025-09-09_23.54.13.png",[48,8705,8706],{},"This would enable both token and TLS auth providers. Pulsar can allow multiple auth methods at once (it will try to authenticate clients using each provider in turn).",[48,8708,8709],{},"Clients then must authenticate using one of these:",[321,8711,8712,8715,8718,8721],{},[324,8713,8714],{},"For token: they provide a token (usually a JWT) on connect (e.g., using AuthenticationFactory.token(\"tokenString\") in Java client).",[324,8716,8717],{},"For TLS: they present a client certificate signed by a trusted CA to the broker.",[324,8719,8720],{},"For OAuth2: they perform the OAuth flow (Pulsar client library supports it by obtaining an access token and passing it).",[324,8722,8723],{},"For Kerberos: similar to Kafka, you’d configure JAAS and such, and the Pulsar client will do SASL handshake (if using the Kafka-on-Pulsar protocol, or perhaps in proxy).",[48,8725,8726],{},"A common modern approach is token auth, which is easier to manage than setting up a Kerberos infrastructure or distributing certs. You create a token (JWT) for a role (like “app1”) and the broker uses a secret or public key to verify it. Pulsar has utilities to create tokens (e.g., bin\u002Fpulsar tokens create --subject app1 --private-key my-sec.key). The broker config has the public key to validate.",[48,8728,8729],{},"Encryption in Transit: Pulsar can (and should) run on TLS. You’d configure brokerServicePortTls=6651, etc., to have brokers use TLS for their internal comms and for client connections (set up TLS listeners). This is analogous to Kafka’s SSL encryption. So you can have pulsar+ssl:\u002F\u002F URLs for clients. Typically, enable tlsAllowInsecure=false and provide certs. Pulsar supports both a plaintext and TLS port simultaneously if desired (similar to Kafka’s multiple listeners).",[48,8731,8732],{},"One difference: Pulsar’s multi-tenancy means you might issue different credentials per tenant or role, and those roles are part of the token or cert CN. In Kafka, an ACL entry might refer to a Kafka principal like User:alice. In Pulsar, a “role” is similar to a principal. 
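To make the client side of this concrete, here is a minimal sketch of a Java client connecting to a TLS listener with token authentication. It is not taken from this post: the service URL, trust-certificate path, token value, and topic are placeholders for your own environment.

```java
import org.apache.pulsar.client.api.AuthenticationFactory;
import org.apache.pulsar.client.api.Producer;
import org.apache.pulsar.client.api.PulsarClient;
import org.apache.pulsar.client.api.Schema;

public class TokenAuthExample {
    public static void main(String[] args) throws Exception {
        // Connect over the TLS listener (pulsar+ssl) and present a JWT on connect.
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar+ssl://broker.example.com:6651")        // placeholder URL
                .tlsTrustCertsFilePath("/etc/pulsar/certs/ca.cert.pem")    // placeholder CA path
                .authentication(AuthenticationFactory.token("eyJhbGciOi...")) // placeholder JWT
                .build();

        // The broker authenticates the connection before any produce/consume is allowed.
        Producer<String> producer = client.newProducer(Schema.STRING)
                .topic("persistent://iot/sensors/readings")                // placeholder topic
                .create();
        producer.send("hello with auth");

        producer.close();
        client.close();
    }
}
```

The same builder accepts other Authentication implementations (TLS client certificates, OAuth2), so switching mechanisms does not change the rest of the client code.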
For instance, a token might carry “subject”:”alice”, making alice the role.",[40,8734,8736],{"id":8735},"authorization-in-pulsar","Authorization in Pulsar",[48,8738,8739],{},"Authorization (which resources a client can access) in Pulsar is integrated with tenants and namespaces:",[321,8741,8742,8745],{},[324,8743,8744],{},"Pulsar supports ACLs at the namespace (and topic) level. You grant a role with produce\u002Fconsume permission on a namespace or a specific topic.",[324,8746,8747],{},"For example, to allow role “sensor-app” to produce to topics in iot\u002Fsensors namespace:",[48,8749,8750],{},[384,8751],{"alt":5878,"src":8752},"\u002Fimgs\u002Fblogs\u002F68c04eb27031f73a29738660_iShot_2025-09-09_23.58.33.png",[321,8754,8755,8758],{},[324,8756,8757],{},"And perhaps another for consumption.",[324,8759,8760],{},"These permissions are stored in ZK\u002Fmetadata and enforced by brokers. If a client with role sensor-app tries to produce to a topic in that namespace and it has produce permission, fine. If not, it gets an unauthorized error.",[48,8762,8763],{},"Important: Pulsar’s tenants add an admin layer. A role can be made a tenant admin (which allows creating namespaces, etc.). Also, Pulsar has the concept of superusers (configured in broker.conf) who have carte blanche (like Kafka’s super.users setting).",[48,8765,8766],{},"Kafka’s authorization works with topics, cluster, group resources, etc., via ACLs. Pulsar’s is a bit simpler in that it’s mostly produce\u002Fconsume on topics (or namespaces) and admin operations on namespaces\u002Ftenants. Kafka’s granular actions (Describe, Create, Delete) have equivalents like Pulsar admin API can be controlled if you integrate with something like function-worker but by default, Pulsar doesn’t have separate ACL for “can this role create a topic” aside from being tenant admin.",[48,8768,8769],{},"Default Authorization: If you enable authorization (authorizationEnabled=true on broker), and no permission is set for a role on a resource, the access is denied. You must populate grants as needed. You can automate that or use Pulsar’s REST\u002FCLI to do it.",[48,8771,8772],{},"Example for a Kafka ACL scenario:\nIn Kafka, you might do:",[48,8774,8775],{},[384,8776],{"alt":5878,"src":8777},"\u002Fimgs\u002Fblogs\u002F68c04f1a09cd07f68b37b5aa_iShot_2025-09-10_00.00.18.png",[48,8779,8780],{},"In Pulsar:",[48,8782,8783],{},[384,8784],{"alt":5878,"src":8785},"\u002Fimgs\u002Fblogs\u002F68c04f3c828ea16c8b3b038d_iShot_2025-09-10_00.00.50.png",[48,8787,8788],{},"This would allow alice to both produce and consume on all topics in that namespace. If you want to restrict to a single topic, Pulsar CLI has topics grant-permission too, but typically namespace-level is used for manageability.",[48,8790,8791,8792,8796],{},"Role token vs role name: If using token auth, the token maps to a role (subject) internally. If using TLS, the client certificate’s CN or SAN is the role. If using Kerberos, the short Kerberos principal becomes the role (like “",[55,8793,8795],{"href":8794},"mailto:alice@EXAMPLE.COM","alice@EXAMPLE.COM","” might map to “alice” depending on config). Ensure consistency so the role in the auth method matches what you use in grant-permission.",[48,8798,8799],{},"Multi-tenancy means you likely segregate roles by tenant. For example, tenant “finance” might have roles “financeApp1”, etc. 
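Grants like these can also be scripted with the Java admin client instead of pulsar-admin. The sketch below is illustrative only; the admin URL, token, namespace, and role names are placeholders.

```java
import java.util.EnumSet;
import org.apache.pulsar.client.admin.PulsarAdmin;
import org.apache.pulsar.client.api.AuthenticationFactory;
import org.apache.pulsar.common.policies.data.AuthAction;

public class GrantPermissions {
    public static void main(String[] args) throws Exception {
        // The admin client authenticates with its own (superuser or tenant-admin) token.
        PulsarAdmin admin = PulsarAdmin.builder()
                .serviceHttpUrl("https://broker.example.com:8443")            // placeholder admin URL
                .authentication(AuthenticationFactory.token("eyJhbGciOi..."))  // placeholder token
                .build();

        // Allow one role to produce into the iot/sensors namespace and another to consume from it.
        admin.namespaces().grantPermissionOnNamespace(
                "iot/sensors", "sensor-app", EnumSet.of(AuthAction.produce));
        admin.namespaces().grantPermissionOnNamespace(
                "iot/sensors", "sensor-dashboard", EnumSet.of(AuthAction.consume));

        admin.close();
    }
}
```

Under the hood this hits the same admin REST endpoints as the CLI, so either route leaves the permissions stored in the cluster metadata and enforced by the brokers.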
Pulsar’s authorization is cluster-wide but often you manage per tenant.",[40,8801,8803],{"id":8802},"kafka-vs-pulsar-security-quick-comparison","Kafka vs Pulsar Security Quick Comparison",[321,8805,8806,8809,8812,8815,8818],{},[324,8807,8808],{},"TLS Encryption: Both Kafka and Pulsar can use TLS for encryption. Setup is similar (keystores, truststores or PEM files, configuring listeners). Pulsar documentation provides steps to enable TLS on brokers and clients.",[324,8810,8811],{},"Client Auth: Kafka’s options are SASL (with PLAIN, SCRAM, GSSAPI, OAUTHBEARER, etc.) or mTLS. Pulsar offers similar range: you can do TLS + token (somewhat like SASL + SSL), or just TLS with client certs, or OAuth2 (similar to SASL\u002FOAUTHBEARER concept), or Kerberos.",[324,8813,8814],{},"Token vs SASL\u002FPLAIN: Pulsar’s JWT token is a stateless way of auth, which many prefer to storing passwords on brokers (like Kafka’s SCRAM). Tokens can have expiry, and Pulsar brokers can auto-expire auth sessions or revalidate (token auth can be configured to not require re-auth each time or to periodically require).",[324,8816,8817],{},"Authorization Domains: Kafka’s ACLs are global to cluster (no concept of tenant). Pulsar’s ACLs logically separate by tenant\u002Fnamespace – which is nice because e.g. role “alice” can be allowed in tenantA without affecting tenantB. Kafka would need you to prefix resources or run separate clusters.",[324,8819,8820],{},"Superuser\u002FAdmin Rights: Kafka ACL has CLUSTER action for cluster-wide operations. Pulsar uses a superuser list in config (e.g., admin roles) to allow all actions. Also, tenant admins can be designated to manage their tenant.",[40,8822,8824],{"id":8823},"setting-up-a-simple-auth-scenario","Setting Up a Simple Auth Scenario",[48,8826,8827],{},"Imagine you have a Pulsar cluster and you want:",[321,8829,8830,8833],{},[324,8831,8832],{},"Clients to authenticate via token.",[324,8834,8835],{},"One tenant per team.",[48,8837,8838],{},"Steps:",[1666,8840,8841,8844,8847,8850,8853],{},[324,8842,8843],{},"Enable TLS on broker (so token isn’t sniffable): Set up broker certificates and enable TLS listener.",[324,8845,8846],{},"Enable authentication with AuthenticationProviderToken. Provide a secret key to broker to verify tokens (e.g., a public key if tokens are signed with private key).",[324,8848,8849],{},"Create tokens for roles: Use Pulsar token tool to generate JWTs for roles like “team1-producer”, “team1-consumer”. Distribute those securely to apps (like how you’d distribute Kafka credentials).",[324,8851,8852],{},"Enable authorization and create a tenant “team1”, assign allowed clusters.",[324,8854,8855],{},"Grant permissions: For namespace team1\u002Fns1, grant produce to team1-producer role, consume to team1-consumer role:",[48,8857,8858],{},[384,8859],{"alt":5878,"src":8860},"\u002Fimgs\u002Fblogs\u002F68c04fbf828ea16c8b3ba498_iShot_2025-09-10_00.03.00.png",[1666,8862,8863,8866],{},[324,8864,8865],{},"Now those roles can only do those actions in that namespace. If team1-consumer tries to produce, it fails authorization.",[324,8867,8868],{},"Client config: Producer passes its token (with role team1-producer) to Pulsar client config, uses service URL with TLS. 
Consumer similarly uses its token.",[48,8870,8871],{},"From that point, the broker will enforce:",[321,8873,8874,8877,8883],{},[324,8875,8876],{},"Only connections with valid tokens signed by our secret key are accepted (others get auth error).",[324,8878,8879,8880],{},"Those connections can only do what they’re permitted. If someone with team1-produ",[36,8881,8882],{},"cer token tries to subscribe (consume), broker returns authorization error.",[324,8884,8885],{},[36,8886,8887],{},"Multi-tenancy: If someone tries to use the wrong tenant (say team1-producer sending to tenant2’s topic), it’s not authorized because team1-producer role likely has no perms on tenant2.",[48,8889,8890,8893],{},[36,8891,8892],{},"Kerberos scenario:","* If you have Kafka using Kerberos, Pulsar can integrate too. Pulsar’s Kafka-on-Pulsar (KoP) can even accept SASL GSSAPI and map to a token internally. But natively, you can enable AuthenticationProviderKerberos. The setup is similar to a Hadoop or Kafka Kerberos setup (JAAS config, keytabs for broker and client). Many find tokens simpler nowadays.*",[48,8895,8896,8899],{},[36,8897,8898],{},"OAuth2 scenario:","* Pulsar can delegate auth to an OAuth2 server (like Auth0, Azure AD, etc.), which is analogous to Kafka’s OAUTHBEARER SASL where clients present a bearer token from some IdP. It’s a modern approach for cloud. StreamNative’s docs have details on enabling that. Essentially, broker trusts a certain JWT issuer, clients get a JWT from that issuer and present to broker.*",[40,8901,8903],{"id":8902},"auditing-and-best-practices","Auditing and Best Practices",[321,8905,8906,8909,8912,8915,8918],{},[324,8907,8908],{},"Use secure connections (TLS) so that authentication tokens or credentials aren’t intercepted.",[324,8910,8911],{},"Limit superuser access; ideally only infrastructure accounts are superusers.",[324,8913,8914],{},"Each tenant can have admins – you might grant a team lead a token that allows them to manage their tenant (create namespaces, etc.) without making them cluster superuser.",[324,8916,8917],{},"Rotate tokens or credentials periodically. Pulsar tokens can have expiration, and brokers can require refresh. Pulsar’s broker supports token revocation lists as well (so you can invalidate a token before expiry by configuring a callback or cache clear).",[324,8919,8920],{},"Monitor auth failures in broker logs or metrics – repeated failures could indicate an attack or misconfig.",[40,8922,8924],{"id":8923},"key-takeaways","Key Takeaways",[321,8926,8927,8930,8933,8936,8939,8942],{},[324,8928,8929],{},"Pulsar supports a wide array of authentication mechanisms – from TLS certs to OAuth2 and Kerberos. This gives parity with Kafka’s SASL\u002FSSL capabilities, and then some (built-in JWT support which Kafka doesn’t have out-of-the-box).",[324,8931,8932],{},"Multi-tenancy and roles: Pulsar’s security model is built around roles and tenants. Roles (identities) can be granted permissions on resources (like produce\u002Fconsume on a namespace). Tenants group those permissions logically. Kafka’s ACLs are global and need to be managed for each topic; Pulsar can cut down on admin by granting at namespace (group of topics) level.",[324,8934,8935],{},"Token-based auth (JWT) in Pulsar provides a simple way to issue credentials without a heavy infrastructure like Kerberos. 
It’s analogous to SASL\u002FPLAIN or OAUTHBEARER but with cryptographic verification and optional expiration.",[324,8937,8938],{},"Authorization (ACLs) in Pulsar is straightforward: specify role, actions, and resource (namespace or topic). Kafka ACLs have more operations but Pulsar covers the main ones (produce, consume, admin).",[324,8940,8941],{},"Because of multi-tenancy, isolation is stronger: one tenant’s admin can’t affect another tenant. In Kafka, an admin could theoretically create topics or ACLs anywhere if they had rights. In Pulsar, you’d typically give each tenant a scoped admin.",[324,8943,8944],{},"Setting up TLS and auth in Pulsar is a bit of a learning curve but very similar conceptually to Kafka – certificates, trust, config files. If you’ve secured Kafka, securing Pulsar is very achievable.",[48,8946,8947],{},"Coming up, Part 8 will move into operations: Load Balancing with ExtensibleLoadManager, where we discuss how Pulsar’s brokers distribute load – something we touched on with bundles, now more on the load manager mechanism and how it compares to Kafka’s static partition assignment.",[48,8949,3931],{},[208,8951],{},[48,8953,3931],{},[48,8955,8956,8957,8960],{},"Want to go deeper into real-time data and streaming architectures? Join us at the ",[55,8958,5405],{"href":6135,"rel":8959},[264]," on September 29–30 at the Grand Hyatt at SFO.",[48,8962,8963],{},"30+ sessions | 4 tracks | Real-world insights from OpenAI, Netflix, LinkedIn, Paypal, Uber, AWS, Google, Motorq, Databricks, Ververica, Confluent & more!",[48,8965,8966],{},[55,8967,8970],{"href":8968,"rel":8969},"https:\u002F\u002Fdatastreaming-summit.org\u002Fevent\u002Fdata-streaming-sf-2025\u002Fschedule",[264],"[Explore the Full Agenda]",[48,8972,8973],{},[55,8974,8976],{"href":7969,"rel":8975},[264],"[Register Now]",{"title":18,"searchDepth":19,"depth":19,"links":8978},[8979,8980,8981,8982,8983,8984],{"id":8654,"depth":19,"text":8655},{"id":8735,"depth":19,"text":8736},{"id":8802,"depth":19,"text":8803},{"id":8823,"depth":19,"text":8824},{"id":8902,"depth":19,"text":8903},{"id":8923,"depth":19,"text":8924},"2025-09-09","Learn Pulsar security for Kafka admins: explore authentication methods (TLS, JWT, OAuth2, Kerberos), fine-grained authorization, multi-tenancy, and best practices for securing your streaming infrastructure","\u002Fimgs\u002Fblogs\u002F68c04c79ca9f71615177cbe3_SN-sm-Pulsar-for-Kafka-Engineers-series-7.png",{},"\u002Fblog\u002Fpulsar-newbie-guide-for-kafka-engineers-part-7-pulsar-security-for-kafka-admins",{"title":8642,"description":8986},"blog\u002Fpulsar-newbie-guide-for-kafka-engineers-part-7-pulsar-security-for-kafka-admins",[821,7347,799],"BbdbItjCXdwFlXVxBv3rMtQZ9VKZEY-Ly-OTVDIqwRo",{"id":8995,"title":8996,"authors":8997,"body":8998,"category":5376,"createdAt":290,"date":9049,"description":9050,"extension":8,"featured":294,"image":7983,"isDraft":294,"link":290,"meta":9051,"navigation":7,"order":296,"path":9052,"readingTime":7986,"relatedResources":290,"seo":9053,"stem":9054,"tags":9055,"__hash__":9056},"blogs\u002Fblog\u002Fdata-streaming-summit-spotlight-use-cases-track.md","Data Streaming Summit Spotlight: Use Cases Track",[6127],{"type":15,"value":8999,"toc":9043},[9000,9003,9006,9008,9031,9033,9036,9038],[40,9001,9002],{"id":7999},"Why this track exists",[48,9004,9005],{},"Real impact beats theory. 
This track showcases end-to-end architectures, operational lessons, and business results from teams running streaming in production.",[40,9007,8007],{"id":8006},[321,9009,9010,9013,9016,9019,9022,9025,9028],{},[324,9011,9012],{},"Blueshift — Next-Gen Data Infra with Apache Pulsar — how a marketing platform scales resilient, low-latency engagement.",[324,9014,9015],{},"OpenAI — Streaming to Scale: Real-Time Infrastructure for AI — building and operating AI-driven real-time services.",[324,9017,9018],{},"PuppyGraph — A Survey of Cybersecurity Data Infra and How to Simplify It with Ursa and Graph.",[324,9020,9021],{},"Netflix — Kafka Under Pressure — resilience patterns for one of the world’s most demanding streaming footprints.",[324,9023,9024],{},"Credit Karma — Real-Time User Behavior Tracking for AI-Driven Recommendations — in-session inference and feedback loops.",[324,9026,9027],{},"Oracle — Operating Telemetry Pipelines at Exabyte Scale — when observability meets AI services.",[324,9029,9030],{},"Toyota - Complex Business Requirements + Lots of Data = High Cost? — patterns to bend the cost curve without cutting capability.",[40,9032,7957],{"id":7956},[48,9034,9035],{},"Engineering leaders, product owners, and architects looking for “copy-and-adapt” playbooks.",[40,9037,7964],{"id":7963},[48,9039,9040,8043],{},[55,9041,7971],{"href":7969,"rel":9042},[264],{"title":18,"searchDepth":19,"depth":19,"links":9044},[9045,9046,9047,9048],{"id":7999,"depth":19,"text":9002},{"id":8006,"depth":19,"text":8007},{"id":7956,"depth":19,"text":7957},{"id":7963,"depth":19,"text":7964},"2025-09-08","The Use Cases Track at Data Streaming Summit SF 2025 highlights end-to-end architectures, resilience patterns, and business results you can adapt to your own streaming systems. Discover how leading companies like OpenAI, Netflix, Credit Karma, Toyota, and more are scaling real-time data in production.",{},"\u002Fblog\u002Fdata-streaming-summit-spotlight-use-cases-track",{"title":8996,"description":9050},"blog\u002Fdata-streaming-summit-spotlight-use-cases-track",[5376,821,4301,303],"1PUIubgF4sEatVFAKgQNE4hENjPmILFukuPoZ4HE87g",{"id":9058,"title":9059,"authors":9060,"body":9061,"category":5376,"createdAt":290,"date":9137,"description":9138,"extension":8,"featured":294,"image":7983,"isDraft":294,"link":290,"meta":9139,"navigation":7,"order":296,"path":9140,"readingTime":7986,"relatedResources":290,"seo":9141,"stem":9142,"tags":9143,"__hash__":9145},"blogs\u002Fblog\u002Fdata-streaming-summit-2025-spotlight-deep-dive-track.md","Data Streaming Summit 2025 Spotlight: Deep Dive Track",[6127],{"type":15,"value":9062,"toc":9130},[9063,9067,9069,9072,9074,9091,9093,9116,9118,9121,9123,9128],[225,9064,9066],{"id":9065},"theme-how-modern-streaming-really-works-the-internals-proofs-and-patterns-that-cut-latency-raise-reliability-and-make-real-time-safe-at-scale","Theme: How modern streaming really works — the internals, proofs, and patterns that cut latency, raise reliability, and make real-time safe at scale.",[40,9068,7905],{"id":7904},[48,9070,9071],{},"If you own the platform or design the architecture, you need to see under the hood. The Deep Dive Track takes you past marketing claims and into the mechanics: deterministic ordering, broker-side compute, safety nets for “exactly-once,” schema governance that spans systems, and emerging log abstractions. 
It’s also where we stress-test long-held assumptions (e.g., Kafka guarantees) against today’s workloads and failure modes.",[40,9073,7912],{"id":7911},[321,9075,9076,9079,9082,9085,9088],{},[324,9077,9078],{},"Correctness at high throughput: Techniques for key-ordered processing, idempotency, and end-to-end guarantees.",[324,9080,9081],{},"Smarter brokers: Where to put filters, routes, and personalization logic to shrink fan-out cost and tail latencies.",[324,9083,9084],{},"Governance that scales: How schema registries interact with Pulsar\u002FKafka and multi-engine estates.",[324,9086,9087],{},"Failure as a feature: New approaches to chaos\u002Ffault injection that validate streaming promises in production.",[324,9089,9090],{},"New theory, new primitives: Real-time OLAP, and toward log abstractions built for AI- and agent-driven systems.",[40,9092,7930],{"id":7929},[321,9094,9095,9098,9101,9104,9107,9110,9113],{},[324,9096,9097],{},"Mastering Key-Ordered Message Processing in Apache Pulsar",[324,9099,9100],{},"Everything You Wanted from Broker-Side Filtering (and More): Building Personalized Feeds with Apache Pulsar",[324,9102,9103],{},"Solve a Crime in 15 mins with Kafka and AI",[324,9105,9106],{},"Unified Governance: Integrating Apache Pulsar with External Schema Registries",[324,9108,9109],{},"Have Your Real-time OLAP and Upsert It Too",[324,9111,9112],{},"Are Your Kafka Guarantees Actually Guaranteed?",[324,9114,9115],{},"LazyLog: A New Log Abstraction for Low-Latency Applications",[40,9117,7957],{"id":7956},[48,9119,9120],{},"Principal\u002Fstaff engineers, platform owners, SREs, and architects designing next-gen real-time platforms.",[40,9122,7964],{"id":7963},[48,9124,9125,8043],{},[55,9126,7971],{"href":7969,"rel":9127},[264],[48,9129,3931],{},{"title":18,"searchDepth":19,"depth":19,"links":9131},[9132,9133,9134,9135,9136],{"id":7904,"depth":19,"text":7905},{"id":7911,"depth":19,"text":7912},{"id":7929,"depth":19,"text":7930},{"id":7956,"depth":19,"text":7957},{"id":7963,"depth":19,"text":7964},"2025-09-04","Explore the Deep Dive Track at Data Streaming Summit 2025—where engineers and architects uncover how modern streaming really works. 
Learn advanced patterns for key-ordered processing, broker-side compute, schema governance, and new log abstractions built for scale.",{},"\u002Fblog\u002Fdata-streaming-summit-2025-spotlight-deep-dive-track",{"title":9059,"description":9138},"blog\u002Fdata-streaming-summit-2025-spotlight-deep-dive-track",[5376,9144,303],"Transactions","7nC0likDc3vCY3dkPwb86LynUnO0jp_GJW-mivXNqWI",{"id":9147,"title":9148,"authors":9149,"body":9150,"category":821,"createdAt":290,"date":9137,"description":9509,"extension":8,"featured":294,"image":9510,"isDraft":294,"link":290,"meta":9511,"navigation":7,"order":296,"path":9512,"readingTime":3556,"relatedResources":290,"seo":9513,"stem":9514,"tags":9515,"__hash__":9516},"blogs\u002Fblog\u002Fpulsar-newbie-guide-for-kafka-engineers-part-6-schema-management-in-pulsar.md","Pulsar Newbie Guide for Kafka Engineers (Part 6): Schema Management in Pulsar",[808,809,810],{"type":15,"value":9151,"toc":9496},[9152,9154,9157,9161,9164,9167,9178,9181,9184,9187,9192,9195,9199,9202,9219,9222,9227,9230,9233,9237,9240,9243,9247,9250,9255,9260,9268,9273,9275,9286,9289,9292,9303,9307,9310,9324,9327,9341,9345,9348,9351,9354,9358,9361,9364,9368,9371,9379,9382,9387,9391,9394,9399,9402,9405,9410,9413,9418,9421,9424,9429,9432,9435,9438,9443,9446,9448,9468,9471,9473,9475,9477,9482,9484,9489,9494],[48,9153,8648],{},[48,9155,9156],{},"Pulsar has a built-in schema registry that travels with the cluster – no extra servers needed. Producers and consumers can define a schema (Avro, JSON, Protobuf, etc.), and Pulsar brokers ensure compatibility and store schema versions. We’ll cover how to use Pulsar’s schema management, compare it to Kafka’s external Schema Registry (like Confluent’s), and show CLI tools (pulsar-admin schemas) to upload or fetch schemas. Key point: Pulsar enforces schema compatibility at the broker if configured, preventing bad or incompatible data from being written, which is a powerful feature for data quality.",[40,9158,9160],{"id":9159},"schemas-101-in-pulsar","Schemas 101 in Pulsar",[48,9162,9163],{},"In Apache Kafka, schema management is typically handled outside of Kafka brokers – for example, using Confluent Schema Registry. Producers write a schema ID with data, and consumers retrieve the schema from the registry. Kafka itself is schema-agnostic (it treats messages as byte arrays).",[48,9165,9166],{},"Apache Pulsar, on the other hand, treats schema as a first-class concept:",[321,9168,9169,9172,9175],{},[324,9170,9171],{},"Pulsar messages are still bytes on the wire\u002Fstorage, but you can associate a schema (structure) with a topic, and Pulsar will validate incoming messages against it.",[324,9173,9174],{},"The broker stores schema definitions in a schema registry (backed by its metadata store \u002F BookKeeper) and assigns each a version.",[324,9176,9177],{},"When a producer or consumer connects with a schema, the broker can check compatibility with existing schema for that topic and either allow or reject the new schema if it’s incompatible (depending on your policy).",[48,9179,9180],{},"This is huge: it means you cannot accidentally write a message with the wrong schema (if schema enforcement is on). In Kafka, nothing stops a producer from writing gibberish or an unexpected format, aside from conventions.",[48,9182,9183],{},"By default, Pulsar’s schema registry supports Avro, JSON, Protobuf, and a few others out-of-the-box. 
You define schemas using these or provide your own POJOs (in Java, for example, Pulsar can derive an Avro schema from a class).",[48,9185,9186],{},"Example:",[48,9188,9189],{},[384,9190],{"alt":5878,"src":9191},"\u002Fimgs\u002Fblogs\u002F68b9a716e2de9ebd3427f216_iShot_2025-09-04_22.49.44.png",[48,9193,9194],{},"This will register the Avro schema of the Purchase class for the topic purchases if not already present. If a schema is already there, the broker will check compatibility. If it’s compatible (say you added a new optional field), it will add a new version; if it’s incompatible (breaking change), it can reject the producer’s attempt to connect.",[32,9196,9198],{"id":9197},"schema-compatibility","Schema Compatibility",[48,9200,9201],{},"Similar to Confluent’s registry where you set backward\u002Fforward compatibility rules, Pulsar allows setting a compatibility strategy per namespace (or cluster). Options include:",[321,9203,9204,9207,9210,9213,9216],{},[324,9205,9206],{},"AlwaysCompatible (no checks, anything goes),",[324,9208,9209],{},"Backward (new schema can read data written with old schema),",[324,9211,9212],{},"Forward (old readers can read new schema data),",[324,9214,9215],{},"Full (both backward and forward),",[324,9217,9218],{},"etc.",[48,9220,9221],{},"By default, Pulsar’s policy might be BACKWARD (depending on version). You can adjust it:",[48,9223,9224],{},[384,9225],{"alt":5878,"src":9226},"\u002Fimgs\u002Fblogs\u002F68b9a7890d3ad75a8b20a2d0_iShot_2025-09-04_22.51.47.png",[48,9228,9229],{},"This means producers can evolve the schema (add fields, make them optional, etc.) as long as new consumers could still decode old messages (backward compatibility). If someone tries to remove a required field (breaking change), the broker will reject that new schema – the producer won’t be allowed to send data with that schema.",[48,9231,9232],{},"This broker-side enforcement is a strong safety net. Kafka’s approach is more “honor system” with Schema Registry; the brokers don’t know about schemas.",[32,9234,9236],{"id":9235},"schemas-and-topics","Schemas and Topics",[48,9238,9239],{},"Every topic can have at most one schema at a time (with multiple versions over time). If a producer without a schema connects to a topic that has a schema, what happens? By default, Pulsar will allow it if schema validation is not enforced, but it will treat their bytes as bytes schema. This can be dangerous (someone writing raw bytes to a structured topic), so Pulsar has a setting schemaValidationEnforced you can enable to require that producers use the topic’s schema. If enabled, a producer who doesn’t have a schema or has one that doesn’t match will be rejected.",[48,9241,9242],{},"For Kafka folks: this is like forcing all producers to go through Schema Registry and not allow schema-less writes – something you can’t do in Kafka natively.",[32,9244,9246],{"id":9245},"using-the-cli-for-schemas","Using the CLI for Schemas",[48,9248,9249],{},"Pulsar provides pulsar-admin schemas commands to manually manage schemas if needed:",[321,9251,9252],{},[324,9253,9254],{},"Upload a schema: If you want to pre-register a schema for a topic before producing, you can use schemas upload. 
For example, you have an Avro schema file MyType.avsc, you can do:",[48,9256,9257],{},[384,9258],{"alt":5878,"src":9259},"\u002Fimgs\u002Fblogs\u002F68b9a843d2f8a76dc043bd40_iShot_2025-09-04_22.54.51.png",[321,9261,9262,9265],{},[324,9263,9264],{},"This registers the schema (Schema type AVRO, schema definition in JSON inside the avsc). After this, any producer must conform to this schema.",[324,9266,9267],{},"Get schema: You can retrieve the latest schema or a specific version:",[48,9269,9270],{},[384,9271],{"alt":5878,"src":9272},"\u002Fimgs\u002Fblogs\u002F68b9a8f3da7915efcc203426_iShot_2025-09-04_22.57.49.png",[48,9274,3931],{},[321,9276,9277,9280,9283],{},[324,9278,9279],{},"This will output the schema JSON, including type, definition, and version.",[324,9281,9282],{},"Delete schema: schemas delete topic if you want to remove it (topic becomes schema-less). Pulsar requires the topic to be unused to delete a schema, typically.",[324,9284,9285],{},"Schema versions: If you want an older version, schemas get --version N topic.",[48,9287,9288],{},"These are akin to calls you’d make to a Schema Registry REST API in Kafka, but now it’s built-in.",[48,9290,9291],{},"Example scenario: Suppose you have a Kafka topic using Avro and Confluent Schema Registry. To migrate to Pulsar:",[1666,9293,9294,9297,9300],{},[324,9295,9296],{},"You could take the Avro schema from Schema Registry (as an Avro schema JSON or .avsc) and use pulsar-admin schemas upload to register it on the Pulsar topic.",[324,9298,9299],{},"Then produce data using Pulsar’s Avro schema. Pulsar will tag messages with a schema version. Consumers can ask the broker for the schema if they don’t have it (the Java client does this automatically).",[324,9301,9302],{},"Evolve the schema: if you change your Avro schema (say add a field), a Pulsar producer will send that as a new schema info. The broker will check compatibility; if okay, it will register that as a new version (v2) and allow it. Consumers that connect later will get the latest schema by default, and can also fetch older schema if needed.",[32,9304,9306],{"id":9305},"built-in-vs-external-schema-registry","Built-in vs External Schema Registry",[48,9308,9309],{},"Benefits of Pulsar’s approach:",[321,9311,9312,9315,9318,9321],{},[324,9313,9314],{},"No separate service to maintain – the schema registry is part of Pulsar’s metadata.",[324,9316,9317],{},"Automatic enforcement: less risk of bad data.",[324,9319,9320],{},"Schema travels with topic data (conceptually): when a consumer connects and gets data, if it has the wrong schema version, the broker can supply the needed schema. This is similar to how Confluent clients fetch from the registry by ID, but in Pulsar it’s handled seamlessly by the broker.",[324,9322,9323],{},"Supports schema on multi-tenancy: each tenant\u002Fnamespace can have its own compatibility setting. Admins can enforce that all topics in a namespace use a certain compatibility mode, etc.",[48,9325,9326],{},"Considerations:",[321,9328,9329,9332,9335,9338],{},[324,9330,9331],{},"If you don’t use schemas, Pulsar treats messages as just bytes (Schema.BYTES). That’s fine; you can still use Pulsar like Kafka without Schema Registry. But if you care about data formats, why not use the built-in feature?",[324,9333,9334],{},"Schemas add a tiny overhead: a schema version is attached to messages. 
But it’s negligible (a small int).",[324,9336,9337],{},"Pulsar’s Schema Registry is currently not as feature-rich as Confluent’s in terms of storing schema IDs for massive numbers of subjects or global compatibility across many topics. It’s more tightly coupled to topics. But for most uses, that’s exactly what you want.",[324,9339,9340],{},"You might not have GUI tooling like Confluent’s UI for schema browsing (unless using StreamNative Console or something that surfaces it). However, CLI and REST API (yes, Pulsar has admin REST endpoints for schemas as well) are available.",[32,9342,9344],{"id":9343},"schema-evolution-example","Schema Evolution Example",[48,9346,9347],{},"Imagine you have a JSON schema (basically, a structured JSON expected). Pulsar can infer it or you can define it. You publish a few messages with schema v1. Now you need to add a field. If you set compatibility to FULL, you must add it in a backward-compatible way (e.g., make it optional or provide a default). You then update your consumer code and producer code to use the new schema. The producer sends the new schema upon the first message. The broker checks (okay, new field has default, it’s backward compatible) – good. It registers schema version 2.",[48,9349,9350],{},"Existing consumers that haven’t updated schema can actually continue – Pulsar will deliver data and can provide the old schema version if needed. But typically, if the consumer is using a structured API, it will need the new class to parse the new messages. This is similar to Kafka – you need to update consumers to handle new fields, or they’ll ignore unknown JSON fields perhaps.",[48,9352,9353],{},"One nice thing: Pulsar’s client libraries often auto-handle schema versioning. For Avro, if a consumer still has old schema, it might drop unknown fields but not crash. If using generic records, you could even access fields dynamically.",[32,9355,9357],{"id":9356},"multi-language-and-schema","Multi-language and Schema",[48,9359,9360],{},"Pulsar supports schemas across Java, Python, Go, C++ etc. The client libraries can all fetch and decode using the stored schema info. For example, a Java producer could write Avro, and a Python consumer can consume by getting the Avro schema and using Avro library to decode.",[48,9362,9363],{},"Kafka in multi-lang with Avro often relies on everyone using Confluent’s wire format and each language having the Avro schema available. Pulsar simplifies that by making the broker the authority on schema.",[32,9365,9367],{"id":9366},"schema-enforcement-modes","Schema Enforcement Modes",[48,9369,9370],{},"As mentioned:",[321,9372,9373,9376],{},[324,9374,9375],{},"isAllowAutoUpdateSchema: When true (default), producers can auto-update schema if compatible. If false, the broker will not allow new schema versions via producers – you’d have to manually update via admin. In many cases, leaving it true is fine for agility.",[324,9377,9378],{},"schemaValidationEnforced: When true, any producer without a matching schema is denied. For instance, if some rogue app tries to write raw bytes or a different schema, it’s blocked. Kafka has no such enforcement – if an app doesn’t use the registry, it can still write and cause problems down the line. 
Pulsar can prevent that scenario.",[48,9380,9381],{},"You can enable schema validation at broker startup (config) or namespace level:",[48,9383,9384],{},[384,9385],{"alt":5878,"src":9386},"\u002Fimgs\u002Fblogs\u002F68b9ab070f3a2a40799bd6d2_iShot_2025-09-04_23.06.35.png",[40,9388,9390],{"id":9389},"cli-walkthrough-example","CLI Walkthrough Example",[48,9392,9393],{},"Let’s say we have a JSON schema for an Order:",[48,9395,9396],{},[384,9397],{"alt":5878,"src":9398},"\u002Fimgs\u002Fblogs\u002F68b9ab3b7fedca7b98bb390d_iShot_2025-09-04_23.07.22.png",[48,9400,9401],{},"(This is roughly how Pulsar stores JSON schema internally as JSON string.)",[48,9403,9404],{},"We can upload it:",[48,9406,9407],{},[384,9408],{"alt":5878,"src":9409},"\u002Fimgs\u002Fblogs\u002F68b9ab6630dc028fd1cebf4d_iShot_2025-09-04_23.08.16.png",[48,9411,9412],{},"This registers the schema (version 1). Now produce a message:",[48,9414,9415],{},[384,9416],{"alt":5878,"src":9417},"\u002Fimgs\u002Fblogs\u002F68b9abb8433830977baf5b77_iShot_2025-09-04_23.09.31.png",[48,9419,9420],{},"(Assuming -s could take a schema definition file to know how to serialize; in practice, you might write a small app or use pulsar-perf with a schema.)",[48,9422,9423],{},"Now consumer:",[48,9425,9426],{},[384,9427],{"alt":5878,"src":9428},"\u002Fimgs\u002Fblogs\u002F68b9abf992d8dd0a6b52f2a9_iShot_2025-09-04_23.10.41.png",[48,9430,9431],{},"This will receive messages and output them. The client automatically gets the schema from the broker (because we specified schema-type JSON, it knows to fetch the schema to decode into JSON).",[48,9433,9434],{},"If we change the schema (add a field customerName), we’d update the schema JSON and do an upload or let a producer auto-update it (if using an API, the producer would send the new schema on connect).",[48,9436,9437],{},"Using pulsar-admin schemas get persistent:\u002F\u002Fpublic\u002Fdefault\u002Forders we would see versions and definitions. It might output something like:",[48,9439,9440],{},[384,9441],{"alt":5878,"src":9442},"\u002Fimgs\u002Fblogs\u002F68b9ac5f3f260ae722705f32_iShot_2025-09-04_23.12.13.png",[48,9444,9445],{},"This confirms what’s stored.",[40,9447,8924],{"id":8923},[321,9449,9450,9453,9456,9459,9462,9465],{},[324,9451,9452],{},"Pulsar’s integrated schema registry provides schema storage and enforcement without an external service. It ensures that producers and consumers agree on data format, improving reliability.",[324,9454,9455],{},"Schema compatibility checks are done by brokers. You can configure strategies (BACKWARD, FORWARD, FULL, etc.) similar to Kafka’s Schema Registry rules. If a producer’s schema change is incompatible, the broker will refuse it – preventing bad data from ever being written.",[324,9457,9458],{},"No separate schema IDs to manage in your app – the Pulsar client handles schema versioning. When a consumer receives a message, it can ask the broker for the schema at that version if it doesn’t have it, ensuring it can decode the message.",[324,9460,9461],{},"CLI and Admin APIs allow manual schema operations: uploading a schema preemptively, deleting or fetching schemas. This is useful for governance or when migrating from an external schema store.",[324,9463,9464],{},"Enabling schema validation enforcement (schemaValidationEnforced) provides strong guarantees that all producers adhere to the declared schema. 
This is something Kafka cannot do natively – in Pulsar you can catch rogue producers at publish time.",[324,9466,9467],{},"For a Kafka engineer, using Pulsar schemas means a more streamlined architecture (one less component to run) and potentially safer schema evolution. It might require refactoring your producers\u002Fconsumers to use Pulsar’s Schema API rather than raw byte producers, but the benefits in data quality can be worth it.",[48,9469,9470],{},"In the next section, Part 7, we’ll look at Pulsar Security for Kafka Admins – covering authentication\u002Fauthorization and how Pulsar’s multi-tenant security compares to Kafka’s ACLs and SASL.",[48,9472,3931],{},[208,9474],{},[48,9476,3931],{},[48,9478,8956,9479,8960],{},[55,9480,5405],{"href":6135,"rel":9481},[264],[48,9483,8963],{},[48,9485,9486],{},[55,9487,8970],{"href":8968,"rel":9488},[264],[48,9490,9491],{},[55,9492,8976],{"href":7969,"rel":9493},[264],[48,9495,3931],{},{"title":18,"searchDepth":19,"depth":19,"links":9497},[9498,9507,9508],{"id":9159,"depth":19,"text":9160,"children":9499},[9500,9501,9502,9503,9504,9505,9506],{"id":9197,"depth":279,"text":9198},{"id":9235,"depth":279,"text":9236},{"id":9245,"depth":279,"text":9246},{"id":9305,"depth":279,"text":9306},{"id":9343,"depth":279,"text":9344},{"id":9356,"depth":279,"text":9357},{"id":9366,"depth":279,"text":9367},{"id":9389,"depth":19,"text":9390},{"id":8923,"depth":19,"text":8924},"Learn how Apache Pulsar’s built-in schema registry simplifies schema management for Kafka engineers. Explore schema enforcement, compatibility strategies, CLI tools, and how Pulsar prevents bad data at the broker—without needing an external Schema Registry.","\u002Fimgs\u002Fblogs\u002F68b9a6bd89138fb38f8e8af0_SN-sm-Pulsar-for-Kafka-Engineers-series-6.png",{},"\u002Fblog\u002Fpulsar-newbie-guide-for-kafka-engineers-part-6-schema-management-in-pulsar",{"title":9148,"description":9509},"blog\u002Fpulsar-newbie-guide-for-kafka-engineers-part-6-schema-management-in-pulsar",[821,7347,799],"q_tQuIHYRSlAflfZ6W-u3FtJau0MtzARZq_7mGMsbAc",{"id":9518,"title":9519,"authors":9520,"body":9521,"category":6415,"createdAt":290,"date":9628,"description":9629,"extension":8,"featured":294,"image":9630,"isDraft":294,"link":290,"meta":9631,"navigation":7,"order":296,"path":9632,"readingTime":3556,"relatedResources":290,"seo":9633,"stem":9634,"tags":9635,"__hash__":9637},"blogs\u002Fblog\u002Ffrom-functions-to-agents-what-changes-in-the-runtime.md","From Functions to Agents: What Changes in the Runtime (Event-Driven Agents, Part 2)",[810,6500,6501],{"type":15,"value":9522,"toc":9621},[9523,9526,9529,9533,9536,9539,9542,9545,9549,9552,9569,9572,9576,9579,9600,9603,9606,9610,9613],[40,9524,9525],{"id":42},"‍Introduction",[48,9527,9528],{},"The shift from stateless serverless functions to persistent, event-driven agents represents a major evolution in how we build and run cloud applications. Traditional serverless functions (FaaS) are short-lived and stateless – they execute on demand and terminate, treating each invocation in isolation. In contrast, agents are long-running, context-aware event processors that stay alive to continuously react, reason, and learn from streaming data. In this post (part 2 of our series), we explore how runtime responsibilities change when moving from functions to agents. 
We’ll clarify what an “agent runtime” needs to provide and how modern streaming platforms support this shift.",[40,9530,9532],{"id":9531},"from-stateless-functions-to-persistent-agents","From Stateless Functions to Persistent Agents",[48,9534,9535],{},"Stateless serverless functions excel at executing discrete logic in response to events or requests. A function runs with no memory of past invocations – it processes the input and produces an output, then ends. This simplicity makes functions easy to scale horizontally and manage, since each event can be handled by a fresh instance without worrying about prior state. Frameworks like AWS Lambda or Apache Pulsar Functions brought a serverless feel to event processing: you write a small function for each message and let the platform handle scaling and fault tolerance. This lightweight approach drastically lowered the barrier to processing streams – no clusters to manage, just write your function and deploy. The trade-off, however, is that each function handles a narrow task in isolation, without context from previous events. If you need to maintain state or memory (say, to track a running average or user session), a purely stateless function must rely on external storage or context passed in every time.",[48,9537,9538],{},"Persistent event-driven agents take a different approach. An agent is more like a continuously running microservice with a brain – it doesn’t spin up every time for each event, but instead subscribes to streams of events and maintains context over time. Agents can perceive incoming events, remember what happened before, and make decisions or trigger actions based on both current and past data. This means an agent can implement more autonomous, adaptive behavior, not just a fixed input-output transformation. For example, imagine an IoT sensor application: a stateless function could process each temperature reading independently, but an agent could continuously update a running average and detect anomalies over time. Using an agent with state, you can update a counter and average with each reading (as shown in the Pulsar Functions example below), enabling continuous monitoring in-context. In essence, while a function might be a single if\u002Felse check, an agent is an ongoing control loop that learns and reacts. This opens the door to systems that are more autonomous and goal-driven, not just static event processors.",[48,9540,9541],{},"Why make this change? Certain problems simply cannot be solved elegantly with ephemeral, stateless logic. Consider an AI-driven support bot: a stateless version would treat every user query independently, leading to repetitive or generic answers. A stateful agent can carry on a conversation, remembering the user’s context and refining answers. Or consider fraud detection on a stream of transactions: a stateless function might flag one transaction at a time, whereas an agent can notice patterns across many events (maintaining a sliding window of behavior). By evolving from functions to agents, we enable contextual awareness – the runtime can persist data, patterns, or ML model state between events. The result is more intelligent responses and the ability to handle complex, long-lived tasks.",[48,9543,9544],{},"That said, agents introduce new challenges. They need to run continuously (not just for milliseconds), hold state safely, and coordinate with other agents. 
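To ground the running-average example mentioned above, here is a minimal sketch of a stateful Pulsar Function acting as a simple agent. It assumes readings arrive as doubles on the function's input topic; the anomaly threshold and the choice to emit a string alert are illustrative, not the original code from this series.

```java
import org.apache.pulsar.functions.api.Context;
import org.apache.pulsar.functions.api.Function;

// Hypothetical stateful "agent-like" function: it keeps a running average of
// temperature readings and emits an alert when a reading deviates too far.
public class TemperatureMonitor implements Function<Double, String> {

    private static final double THRESHOLD = 10.0; // illustrative anomaly threshold

    @Override
    public String process(Double reading, Context context) {
        // Durable, per-function state: survives restarts and rescheduling.
        context.incrCounter("count", 1);
        // Counters are longs, so keep the sum in millidegrees for precision.
        context.incrCounter("sumMilli", Math.round(reading * 1000));

        long count = context.getCounter("count");
        double avg = context.getCounter("sumMilli") / 1000.0 / count;

        if (count > 10 && Math.abs(reading - avg) > THRESHOLD) {
            // The returned value is published to the function's output topic.
            return String.format("anomaly: reading=%.1f avg=%.1f", reading, avg);
        }
        return null; // returning null publishes nothing for this event
    }
}
```

Because the counters live in Pulsar's state store rather than in process memory, a rescheduled or restarted instance resumes with the same running average it had before.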
Running one agent in isolation is only the beginning – true agentic systems will involve a fleet of agents working together, which brings new infrastructure requirements. We next examine how the runtime’s responsibilities shift to meet these needs.",[40,9546,9548],{"id":9547},"shifting-runtime-responsibilities-ephemeral-vs-always-on","Shifting Runtime Responsibilities: Ephemeral vs. Always-On",[48,9550,9551],{},"Moving from functions to agents shifts a lot of work from the application logic to the runtime platform. A serverless function runtime (like a FaaS platform) is responsible for quickly scheduling function instances on-demand, passing in an event, then tearing down the instance. In an agent-based system, the runtime must provide a richer, always-on environment. Key shifts in runtime responsibilities include:",[321,9553,9554,9557,9560,9563,9566],{},[324,9555,9556],{},"Continuous Execution & Event Streaming: Instead of invoking code only per event, an agent runtime keeps agents alive and fed with events. The runtime must connect each agent to event sources (e.g. subscription to a message topic) so it can receive a continuous stream of messages. This is fundamentally different from a function that wakes up with a single event – an agent’s event loop never really stops. The platform needs to handle event subscriptions, backpressure, and delivery of events to agents in real-time. In practice, this means treating a streaming event bus as the default I\u002FO for agents. Each agent listens on certain topics or event types and reacts as events arrive, rather than being invoked via direct calls. This decoupled, publish\u002Fsubscribe model provides a “nervous system” for agents to sense the world and communicate with each other.",[324,9558,9559],{},"State Management and Context: In a stateless function model, any persistence (counters, caches, DB lookups) is external to the function. But an agent runtime is expected to give agents a way to remember information between events. This could be in-memory state that the agent process holds, or more robustly, state backed by a distributed store for durability. For example, Pulsar Functions allow storing key-value state that is persisted in a distributed storage tier, which an agent can use as “memory”. The runtime should expose easy APIs for an agent to put\u002Fget state, counters, or context data. Moreover, this state should be checkpointed or replicated so that if the agent is restarted on another node, it can resume with its prior context. Providing streaming memory in the runtime makes agents context-aware by design – they don’t start from scratch on each event. As a bonus, because agents’ state changes can be logged as events, we gain an audit trail of an agent’s thinking process, which improves observability and debugging. (By contrast, a stateless function’s internal variables vanish after each invocation, making it hard to trace how a decision was made.)",[324,9561,9562],{},"Long-Lived Compute & Resource Management: Running dozens of always-on agents is more akin to running a microservice cluster than executing isolated lambdas. The agent runtime must therefore take on concerns like scheduling agents across a cluster, managing their lifecycles, and handling failures. If an agent crashes or a node running it goes down, the runtime should automatically restart that agent elsewhere to keep the system running. 
It should also manage scaling: for example, if an agent is consuming a high-volume stream, the platform might spawn multiple instances (or partitions) of that agent to handle the load – akin to how stream processing jobs scale by partitioning data. This is tricky when state is involved, but techniques like sharding by key or using consumer group semantics can distribute events among agent instances while keeping each instance’s state separate. In essence, the runtime needs to provide the same reliability mechanisms that distributed stream processors or message consumers use – e.g. horizontal scaling, work partitioning, and fault recovery – but now applied to AI agents. One industry guide notes that traditional FaaS platforms fall short for stateful, long-running services, and combining stream processing with functions is needed to get correct, resilient behavior under failures. An agent runtime fulfills that by marrying the elastic scheduling of serverless with the durable state management of streaming systems.",[324,9564,9565],{},"Inter-Agent Communication & Composition: In a non-trivial agent system, agents will talk to other agents. We want to avoid tightly coupling agents (like one agent calling another directly), since that creates brittle dependencies. Instead, the runtime should encourage event-driven composition – agents emitting events that other agents consume, forming an indirect cooperation. This was illustrated in our example of Agent A raising an “anomaly.alert” event that Agent B listens for, rather than calling B’s API directly. The runtime’s role here is to provide a common event hub and possibly higher-level orchestration. By having all agents communicate via the event bus, the platform enables loose coupling and dynamic workflows (similar to how microservices communicate via an event broker). Complex sequences can be achieved by chaining events through multiple agents, rather than one monolithic function. In fact, breaking a complex task into multiple smaller event-driven agents is a recommended pattern – it allows independent scaling and updates of each piece. The runtime may also maintain an Agent Registry as a directory of all active agents and the event types they handle (so you can discover producers\u002Fconsumers of certain events). While the registry concept is beyond basic runtime, it highlights that in an agent system the platform, not the individual code, must handle discoverability and coordination at scale.",[324,9567,9568],{},"Observability, Security, and Governance: An agent that runs continuously and makes autonomous decisions needs oversight. The runtime should therefore provide built-in logging, tracing, and monitoring of agents’ actions. When every input, output, and intermediate step can be captured as an event or logged with context, we get a transparent view of what the agent is doing and why. This is critical in enterprise settings – you need to answer “who did what, when?” even for AI-driven actions. The platform might tag events with agent IDs, maintain audit logs of tool calls (e.g. if an agent triggers an external API), and allow operators to set guardrails (like rate limits or circuit breakers if an agent goes haywire). Security is another runtime concern: agents must be authenticated and authorized when accessing the event bus or external systems, just like any microservice. The runtime may manage credentials or tokens for agents and ensure each agent only sees the event streams it’s permitted to. 
Overall, the agent runtime is responsible for providing production-grade controls around these always-on autonomous programs, akin to how Kubernetes or FaaS platforms provide monitoring and security for microservices. Without such support, running hundreds of agents could become unmanageable or risky. (Imagine debugging a bug in an AI agent if you had no trace of its decisions – not acceptable in most orgs!)",[48,9570,9571],{},"In summary, the move to agents shifts us from a world of fleeting stateless functions to one of persistent services that think and act. The runtime environment must evolve from merely executing code to hosting living, stateful processes. An often-cited analogy is that agents are like microservices that reason – they need the same infrastructure as microservices (for availability, scaling, communication), plus additional support for memory and intelligent behavior. This raises the question: how do we practically provide such an agent runtime? The good news is we don’t have to start from scratch – streaming platforms are stepping up to fill this role.",[40,9573,9575],{"id":9574},"building-an-agent-runtime-on-streaming-platforms","Building an Agent Runtime on Streaming Platforms",[48,9577,9578],{},"The capabilities described above might sound ambitious, but modern streaming data platforms (like Apache Pulsar or Apache Kafka ecosystems) already offer many of these pieces. In fact, an event streaming platform is a natural foundation for an agentic runtime, because it was designed to feed continuous streams of data to long-running consumers with scalability and fault-tolerance. Let’s break down how streaming infrastructure supports the shift from functions to agents:",[321,9580,9581,9584,9587,9594,9597],{},[324,9582,9583],{},"Unified Event Bus: At the heart of any streaming platform is a publish\u002Fsubscribe log or message queue. This serves as the communication backbone for agents. All events that agents produce or consume flow through topics on this bus. Because topics support multiple subscribers and decouple senders from receivers, agents can easily form dynamic networks of interactions. For example, multiple anomaly-detection agents can all subscribe to the same “errors” topic, and multiple responder agents can act on an “alert” topic – without any of them explicitly calling each other. The platform ensures each agent gets the events it’s interested in (with filtering, partitioning, and backpressure handling under the hood). Importantly, stream brokers provide retention and replay of events. If an agent goes down for a minute, it can come back and replay missed events from the log, so no critical data is lost – something you’d have to build manually in a traditional RPC system. As one guide puts it, a data streaming platform acts as the “central nervous system” for agents, letting them collaborate in a loosely coupled but coordinated way. This real-time event backbone is a prerequisite for scalable, context-sharing agents.",[324,9585,9586],{},"Embedded Computation (Stream Functions): Platforms like Pulsar and Kafka have introduced ways to run user code directly in the messaging layer. Pulsar Functions, for instance, are lightweight functions-as-a-service that consume topics and produce results to other topics. This is essentially the same pattern an agent follows (read events, do some processing, emit new events). By leveraging such frameworks, we can deploy agents onto the stream platform itself. 
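As a rough sketch of that pattern (the class name, topic names, and trigger condition below are illustrative only, not part of any product API), an agent written as a Pulsar Function consumes events, keeps a little durable state, and emits new events for downstream agents:

```java
import org.apache.pulsar.client.api.Schema;
import org.apache.pulsar.functions.api.Context;
import org.apache.pulsar.functions.api.Function;

// Illustrative agent packaged as a Pulsar Function: the platform subscribes it
// to an input topic and invokes process() once per event.
public class AnomalyAgent implements Function<String, Void> {
    @Override
    public Void process(String event, Context context) throws Exception {
        // Runtime-managed state serves as the agent's durable memory.
        context.incrCounter("events-seen", 1);

        if (event.contains("error")) {
            // Emit an event for downstream agents instead of calling them directly.
            context.newOutputMessage("persistent://public/default/anomaly.alert", Schema.STRING)
                   .value("alert after " + context.getCounter("events-seen") + " events: " + event)
                   .sendAsync();
        }
        return null; // nothing written to the default output topic
    }
}
```

Deployed with pulsar-admin functions create, such a function is wired to its input topic, tracked, and restarted on failure by the platform rather than by bespoke orchestration code.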
In fact, the StreamNative Agent Engine (early access) does exactly this – it builds on Pulsar’s function runtime to host AI agents. Each agent is packaged like a serverless function and deployed to the cluster, automatically wired into the event bus and registered in a directory. Under the covers, the function runtime has been tweaked to handle long-lived AI workloads, but the core is standard and battle-tested. This means the heavy lifting of scaling and restarting instances is largely solved by the existing function scheduler. Apache Pulsar’s Function Worker, for example, can run many functions (now agents) across the cluster, track their status, and restart them on failure. Similarly, Kafka Streams and Kafka-based frameworks allow stateful stream processing in applications and could be extended to agent logic. The bottom line: streaming platforms give us a serverless execution environment where code can run near the data stream, continuously and with managed parallelism. Adapting that to agents is often a matter of adding the right libraries (for AI reasoning, etc.), not inventing a whole new orchestration system.",[324,9588,9589,9590,9593],{},"Stateful Stream Processing: One of the breakthroughs in stream processing has been the ability to maintain state with strong consistency (think of Apache Flink or Kafka Streams state stores). These same capabilities can back an agent’s memory. Pulsar Functions, for instance, offer a state API that stores state in a distributed storage, accessible across function restarts. Kafka Streams uses embedded RocksDB state stores for stateful processing. By tapping into these, an agent runtime lets agents store their context locally but durably. For the agent developer, it might be as simple as using a ",[4926,9591,9592],{},"context.putState(\"key\", value) API ","(like in Pulsar Functions) or calling a state store in Kafka Streams. The streaming platform handles replication of that state behind the scenes. This fulfills the agent’s need for memory without introducing a separate database for developers to worry about. Additionally, because state is tied to event processing transactions in some frameworks, we can get exactly-once processing – meaning an event and the state update associated with it will be atomic. That guarantee is crucial when an agent, say, updates its knowledge base upon receiving an event; we wouldn’t want to lose or double-apply those updates if a failure happens. In short, streaming platforms have evolved to support stateful functions, and those are a perfect substrate for agents. We leverage the fact that stream processors already solved consistency, checkpointing, and scaling of stateful tasks.",[324,9595,9596],{},"Coordination via Consumer Groups: How to scale out multiple instances of an agent? Streaming platforms use consumer group protocols to divide partitions of a topic among consumers. This same mechanism can be used to run N instances of an agent in parallel (each handling a subset of events). For example, Kafka’s rebalance protocol or Pulsar’s subscription modes ensure that if you have, say, 3 instances of an agent and 10 partitions, each instance gets some partitions assigned. If one instance dies, its partitions are redistributed to the others. This provides automatic load balancing and failover for agents at the event ingestion level. The agent runtime can simply manage agent instances as consumers in a group. The result: dynamic scaling and recovery come “for free” from the streaming platform’s consumer infrastructure. 
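To make that concrete, here is a minimal sketch of one agent instance (the topic, subscription name, and URL are placeholders): every instance joins the same subscription, and the broker spreads keys across instances and reassigns them if one goes away.

```java
import org.apache.pulsar.client.api.*;

// Sketch of a single agent instance. All instances share one subscription,
// so the broker load-balances events across them and handles failover.
public class OrderAgentInstance {
    public static void main(String[] args) throws Exception {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650")
                .build();

        Consumer<String> consumer = client.newConsumer(Schema.STRING)
                .topic("persistent://public/default/orders")
                .subscriptionName("order-agent")                 // shared by all instances
                .subscriptionType(SubscriptionType.Key_Shared)   // same key -> same instance
                .subscribe();

        while (true) {
            Message<String> event = consumer.receive();
            // ... agent logic: update per-key state, emit follow-up events ...
            consumer.acknowledge(event);
        }
    }
}
```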
Agents can increase or decrease in number, and the system will balance the work accordingly – much easier than having to manually orchestrate which agent handles what. This again shows how an agent runtime can repurpose proven components of stream processing.",[324,9598,9599],{},"Built-in Observability and Governance: Streaming systems are designed for high-throughput, observable data flows. They often integrate with monitoring tools, and every message can carry metadata (timestamps, IDs, etc.). By running agents on the streaming platform, we inherit a lot of this observability. We can trace an event from its origin through the topics into the agent’s processing and out to the events the agent produces. In fact, because agents emit events for their actions, we can log those to a separate audit topic. For example, an agent’s decision or outcome might be published as an event that a monitoring service subscribes to, creating an audit log in real-time. The platform also provides central control: you can update or pause an agent by updating its subscription or deployment in the cluster, much as you would manage a streaming job. And since all agents run on a common substrate, things like security policies (who can publish\u002Fsubscribe to which topic) uniformly apply to agent communication. This avoids the patchwork of ad-hoc integration you’d have if each agent were a standalone script with its own connections. In effect, the streaming platform serves as both the data layer and the control plane for your agents. This convergence is powerful – it means fewer moving parts and a single unified infrastructure for real-time data and AI agents.",[48,9601,9602],{},"It’s worth noting that both the open-source community and vendors are actively working on making streaming platforms more “agent-friendly.” For example, the concept of an Agent Registry can be built on top of the function metadata store (as noted with Pulsar’s function worker metadata being extended for agent descriptors). And the emerging Model Context Protocol (MCP) is being integrated so agents can call external tools\u002Fservices in a standardized way – with streaming runtimes acting as the glue (this will be discussed in a later post). The trend is clear: we are repurposing battle-tested stream processing tech to serve AI agents. By doing so, we avoid reinventing wheels around messaging, state, and reliability, which not only reduces engineering overhead but also significantly accelerates time to market.",[48,9604,9605],{},"As one eBook put it, “agents leverage event streaming to collaborate without rigid dependencies,” and a streaming platform connects data sources, processes events in motion, and enforces governance – exactly what’s needed for a robust agent ecosystem.",[40,9607,9609],{"id":9608},"conclusion-and-next-steps","Conclusion and Next Steps",[48,9611,9612],{},"The evolution from stateless functions to persistent agents is ultimately about bringing intelligence closer to the data. We began with simple functions triggered by events, which was great for modularizing logic but limited in context. Now, by running agents that live in the stream, we enable continuous reasoning on real-time data streams. This shift requires the runtime to take on new responsibilities – from managing state and long-lived processes to brokering rich inter-agent communication. Fortunately, streaming platforms like Pulsar and Kafka have grown into exactly the kind of always-on, scalable backbone that agents need. 
They provide the connective tissue (event bus), the muscle (compute runtime), and the memory (state stores) to support autonomous, event-driven agents at scale.",[48,9614,9615,9616,9620],{},"As we continue this series, we will delve into specific aspects like multi-agent coordination, open protocols for tool integration, and design patterns for agent systems. The journey from functions to agents is just one step toward a new paradigm of real-time, intelligent applications. Now is a great time to start experimenting with these concepts yourself. ",[55,9617,9619],{"href":7969,"rel":9618},[264],"Sign up for the Data Streaming Summit"," (Training & Workshop on September 29 and Conference on September 30, 2025). These events will showcase cutting-edge developments in streaming and AI agents, and offer a hands-on chance to apply what we’ve discussed. Whether you’re a developer or an architect, embracing an agent-driven runtime could be the key to building the next generation of reactive, smart services. Come join us and be part of this real-time revolution! (We look forward to seeing the innovative agents you create.)",{"title":18,"searchDepth":19,"depth":19,"links":9622},[9623,9624,9625,9626,9627],{"id":42,"depth":19,"text":9525},{"id":9531,"depth":19,"text":9532},{"id":9547,"depth":19,"text":9548},{"id":9574,"depth":19,"text":9575},{"id":9608,"depth":19,"text":9609},"2025-09-03","Discover how cloud runtimes evolve from stateless serverless functions to persistent, event-driven agents. Learn the key shifts in execution, state management, and streaming infrastructure that enable intelligent, always-on applications.","\u002Fimgs\u002Fblogs\u002F68b865c3eeff71354f5353cb_Event-Driven-Agents,-Part-2.png",{},"\u002Fblog\u002Ffrom-functions-to-agents-what-changes-in-the-runtime",{"title":9519,"description":9629},"blog\u002Ffrom-functions-to-agents-what-changes-in-the-runtime",[3988,821,9636],"Functions","Er2gJWTzLnmWTehNTwb6Nond1hhiIgDZxkHwdArAnjI",{"id":9639,"title":9640,"authors":9641,"body":9642,"category":821,"createdAt":290,"date":9628,"description":9929,"extension":8,"featured":294,"image":9930,"isDraft":294,"link":290,"meta":9931,"navigation":7,"order":296,"path":9932,"readingTime":5505,"relatedResources":290,"seo":9933,"stem":9934,"tags":9935,"__hash__":9936},"blogs\u002Fblog\u002Fpulsar-newbie-guide-for-kafka-engineers-part-5-retention-ttl-compaction.md","Pulsar Newbie Guide for Kafka Engineers (Part 5): Retention, TTL & Compaction",[808,809,810],{"type":15,"value":9643,"toc":9920},[9644,9646,9649,9653,9656,9659,9667,9670,9673,9676,9679,9684,9687,9690,9693,9701,9705,9708,9711,9714,9719,9722,9725,9728,9731,9734,9738,9741,9749,9752,9760,9763,9766,9769,9773,9776,9779,9782,9787,9792,9794,9808,9811,9814,9817,9820,9823,9827,9830,9844,9847,9851,9869,9871,9894,9897,9899,9901,9903,9908,9910,9915],[48,9645,8648],{},[48,9647,9648],{},"Pulsar offers flexible message retention policies and features like Time-to-Live (TTL) and Topic Compaction, which differ from Kafka’s approach. By default, Pulsar retains messages until they are acknowledged (no time limit) and deletes them immediately once acknowledged. But you can configure retention to keep acknowledged messages for a duration or size (like Kafka’s log retention), as well as TTL to discard unacknowledged messages after a while (prevent infinite backlog). Pulsar also supports log compaction to keep the latest value per key, similar to Kafka’s compaction but implemented via a separate compacted ledger. 
We’ll explain these settings and how to use them to manage Pulsar topic storage, using Kafka’s behavior as a reference point.",[40,9650,9652],{"id":9651},"message-retention-in-pulsar-vs-kafka","Message Retention in Pulsar vs Kafka",[48,9654,9655],{},"Kafka’s model: In Kafka, retention is typically time-based or size-based per topic. For example, you might retain logs for 7 days or 10 GB. Kafka does not consider whether a message was consumed – it will delete messages older than the retention period regardless of consumer status. This means Kafka brokers can delete old data even if some slow consumer hasn’t processed it yet (that consumer would then miss those messages).",[48,9657,9658],{},"Pulsar’s default model: Pulsar, being a messaging system with acknowledgments, by default behaves differently:",[321,9660,9661,9664],{},[324,9662,9663],{},"Pulsar will keep all unacknowledged messages indefinitely (in storage) by default, to ensure consumers can get them whenever they come online.",[324,9665,9666],{},"Once a message is acknowledged by all subscriptions, Pulsar will immediately mark it for deletion (it can be deleted from storage).",[48,9668,9669],{},"In other words, Pulsar’s out-of-the-box behavior is: “retain data as long as someone still needs it; delete it as soon as nobody needs it.” This is more akin to a traditional messaging queue – messages don’t pile up once consumed.",[48,9671,9672],{},"This is basically opposite to Kafka’s strategy of time-based retention. If you hooked up a Pulsar topic with no special retention config and a consumer, and that consumer always stays caught up (acking messages), the topic would use almost no storage (only very recent unacked messages). In Kafka, the topic would accumulate data up to the retention period regardless of consumption.",[48,9674,9675],{},"Configurable Retention: Pulsar allows you to alter this behavior via retention policies. You can set a retention period (time and\u002For size) for messages even after acknowledgment. For instance, you might say: “Keep messages for 1 day or 1 GB, whichever comes first, even after consumers ack them.” That way, consumers could potentially reconnect within a day and replay data, or you could attach a new subscription within a day to reprocess history.",[48,9677,9678],{},"This is done at the namespace or topic level using pulsar-admin namespaces set-retention. For example:",[48,9680,9681],{},[384,9682],{"alt":5878,"src":9683},"\u002Fimgs\u002Fblogs\u002F68b859eb4fa7539ffa13f99c_iShot_2025-09-03_23.08.15.png",[48,9685,9686],{},"This would keep acknowledged messages for 24 hours or until 1 GB per topic is reached. After that, older messages are removed (even if not acked? Actually, acked messages only – unacked are still kept as backlog; more on that next).",[48,9688,9689],{},"To clarify: Pulsar retention policy applies to acknowledged messages (the ones that normally would be deleted immediately). Unacknowledged messages are governed by TTL (time-to-live) settings, not the retention policy.",[48,9691,9692],{},"So you have two separate concepts:",[321,9694,9695,9698],{},[324,9696,9697],{},"Retention (Acknowledged messages): Keep some history of consumed messages.",[324,9699,9700],{},"TTL (Time-to-Live for Unacknowledged messages): After a certain time, treat unacknowledged messages as acknowledged (essentially drop them).",[40,9702,9704],{"id":9703},"time-to-live-ttl-for-unacked-messages","Time-to-Live (TTL) for Unacked Messages",[48,9706,9707],{},"Why TTL? 
Consider a scenario where a consumer goes offline or is very slow – by default, Pulsar will keep feeding it its backlog forever. If that backlog grows massive, it could consume a lot of storage. In Kafka, if a consumer falls behind beyond retention, it just misses data (or if using a compacted topic, older state vanishes). Pulsar gives an option to say: “If messages haven’t been acknowledged for X time, we assume they won’t be and we discard them.”",[48,9709,9710],{},"This is message TTL (a per-namespace or per-topic setting). For example, set TTL to 7 days and any message not acknowledged more than 7 days of being published will be automatically marked as acknowledged (expired) and won’t be deliverable to consumers. It essentially protects the system from an infinite backlog due to a stuck consumer.",[48,9712,9713],{},"Using pulsar-admin:",[48,9715,9716],{},[384,9717],{"alt":5878,"src":9718},"\u002Fimgs\u002Fblogs\u002F68b85a2ad18fa42d7c6ae957_iShot_2025-09-03_23.09.19.png",[48,9720,9721],{},"(604800 seconds is 7 days). This would mean messages older than 7 days that are still unacked are expired.",[48,9723,9724],{},"From the docs: “If disk space is a concern, you can set a time to live (TTL) that determines how long unacknowledged messages will be retained. The TTL parameter is like a stopwatch attached to each message... when it expires, Pulsar automatically moves the message to the acknowledged state (and thus makes it ready for deletion)”.",[48,9726,9727],{},"That nicely summarizes TTL: after TTL, a message is considered acknowledged (even if the consumer never acked it), so it will be removed like any other acked message.",[48,9729,9730],{},"TTL is somewhat analogous to Kafka’s retention for the tail of the log, but specifically for unconsumed messages. Kafka doesn’t differentiate – it just kills old records. Pulsar, with TTL, gives you a safety net: normally you might not want to lose unconsumed messages, but at some point, you might prefer dropping them than letting them endlessly accumulate.",[48,9732,9733],{},"Backlog Quota: Another related concept is backlog quota. You can set a limit on how large a backlog (unacked messages) can grow (by size or time), and what to do when that limit is reached (e.g., reject producers, or start discarding oldest messages). This is configured separately (set-backlog-quota). For example, you might allow up to 50 GB of backlog; if more, either block producers (to exert backpressure) or throw oldest messages away. 
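Pulling the retention and TTL settings together, here is a minimal sketch using the Java admin client (the admin URL and the public/default namespace are assumptions); it applies the same policies as the pulsar-admin commands shown in the screenshots above:

```java
import org.apache.pulsar.client.admin.PulsarAdmin;
import org.apache.pulsar.common.policies.data.RetentionPolicies;

// Illustrative sketch: configure retention for acknowledged messages and a TTL
// for unacknowledged ones (URL and namespace are placeholders).
public class RetentionAndTtl {
    public static void main(String[] args) throws Exception {
        PulsarAdmin admin = PulsarAdmin.builder()
                .serviceHttpUrl("http://localhost:8080")
                .build();

        // Keep acknowledged messages for 24 hours or up to 1 GB per topic.
        admin.namespaces().setRetention("public/default",
                new RetentionPolicies(24 * 60 /* minutes */, 1024 /* MB */));

        // Expire messages that stay unacknowledged for more than 7 days.
        admin.namespaces().setNamespaceMessageTTL("public/default", 7 * 24 * 3600);

        admin.close();
    }
}
```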
Backlog quota policies can complement TTL for robust control.",[40,9735,9737],{"id":9736},"kafka-like-retention-in-pulsar","Kafka-like Retention in Pulsar",[48,9739,9740],{},"If a Kafka engineer wants to emulate Kafka’s log retention (i.e., retain data for X days regardless of consumption), you can do that by:",[321,9742,9743,9746],{},[324,9744,9745],{},"Setting a retention period for acknowledged messages (so data sticks around even if consumed).",[324,9747,9748],{},"Also potentially setting a TTL for unacknowledged to that same period (so that if a consumer is not there, we don’t keep forever beyond that period).",[48,9750,9751],{},"For example, to mimic “retain messages for 7 days no matter what”:",[321,9753,9754,9757],{},[324,9755,9756],{},"Set namespace retention to 7 days (acknowledged messages retained 7 days).",[324,9758,9759],{},"Set TTL to 7 days (unacknowledged messages expire after 7 days).",[48,9761,9762],{},"Now Pulsar will behave more like Kafka: any message will exist for at most 7 days, whether or not it’s consumed.",[48,9764,9765],{},"However, be careful: If you have TTL=7d and your consumer is down for 8 days, it will lose messages from that gap (similar to Kafka consumer falling behind retention). If you truly never want to lose unconsumed data, you might leave TTL off (infinite) but then you rely on disk capacity or backlog quotas to handle runaway consumers.",[48,9767,9768],{},"By default, Pulsar doesn’t expire unacked messages (TTL off) and doesn’t retain acked messages (retention 0). So default is “only store what’s needed”. Kafka default is typically something like “store for a week”.",[40,9770,9772],{"id":9771},"compaction-maintaining-latest-state-per-key","Compaction: Maintaining Latest State Per Key",[48,9774,9775],{},"Kafka’s log compaction feature allows topics to retain only the latest value for each key (removing older values, except the latest and maybe some history). This is useful for state change events or last-known-value semantics. Pulsar offers a similar feature: Topic Compaction.",[48,9777,9778],{},"However, the implementation has a twist. In Pulsar, compaction doesn’t rewrite the existing data in place (since data is stored in BK ledgers). Instead, running compaction produces a new compacted ledger that contains the latest values per key. Consumers can then choose to read from the compacted ledger if they want a compressed view of the topic.",[48,9780,9781],{},"In practice:",[321,9783,9784],{},[324,9785,9786],{},"You trigger compaction manually via CLI or set it to run periodically. For example:",[48,9788,9789],{},[384,9790],{"alt":5878,"src":9791},"\u002Fimgs\u002Fblogs\u002F68b85b1a91e7c2e2f07fcc0c_iShot_2025-09-03_23.13.23.png",[48,9793,3931],{},[321,9795,9796,9799,9802,9805],{},[324,9797,9798],{},"This will initiate compaction. The broker goes through the topic’s backlog and builds a new ledger with only the latest message for each key.",[324,9800,9801],{},"After compaction, the topic has two sets of data: the full log (uncompacted backlog) and a compacted snapshot. Pulsar retains both. Why? Because some consumers might want to read the full log (e.g., if they’re processing every change), while others might want just the latest state.",[324,9803,9804],{},"A consumer can choose to read from the compacted view by setting readCompacted(true) on the consumer (only allowed for subscriptions with certain types, typically exclusive or failover subs, since shared subs could break the model). 
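For example, here is a sketch of triggering compaction and then reading the compacted view with a failover consumer (the topic, subscription name, and URLs are placeholders):

```java
import org.apache.pulsar.client.admin.PulsarAdmin;
import org.apache.pulsar.client.api.*;

public class CompactedReader {
    public static void main(String[] args) throws Exception {
        String topic = "persistent://public/default/user-status"; // placeholder topic

        // Trigger compaction (same effect as `pulsar-admin topics compact <topic>`).
        try (PulsarAdmin admin = PulsarAdmin.builder()
                .serviceHttpUrl("http://localhost:8080").build()) {
            admin.topics().triggerCompaction(topic);
        }

        // Read the compacted view: latest value per key, then live data.
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650").build();
        Consumer<String> consumer = client.newConsumer(Schema.STRING)
                .topic(topic)
                .subscriptionName("latest-state-reader")
                .subscriptionType(SubscriptionType.Failover) // exclusive/failover only
                .readCompacted(true)
                .subscribe();

        Message<String> msg = consumer.receive();
        System.out.printf("key=%s value=%s%n", msg.getKey(), msg.getValue());
        consumer.acknowledge(msg);
        client.close();
    }
}
```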
When readCompacted is true, the broker will serve from the compacted ledger (for earlier data) and then live data for new writes, essentially giving an experience similar to Kafka’s compacted topic.",[324,9806,9807],{},"Compaction respects retention: if retention has removed some messages entirely, those won’t be in the compacted log either. Also, compaction doesn’t delete the original data immediately; it just provides a compacted copy. The older ledgers remain (and could still be consumed normally or for auditing). You can configure Pulsar to truncate older ledgers once compacted up to a point, but by default, you might manually manage that or rely on retention.",[48,9809,9810],{},"One key difference: Kafka’s compacted topics can still optionally have a retention time to delete old tombstones or limit log size. Pulsar’s compaction essentially ensures at least the latest per key is kept, and if you want old data removed beyond that, you’d use retention or TTL.",[48,9812,9813],{},"Tombstones: Pulsar honors the concept of a null message as a deletion marker (tombstone). If a message with key K and null value is published, compaction will remove K from the compacted log (so it won’t appear at all for consumers reading compacted). This is like Kafka’s tombstone mechanic.",[48,9815,9816],{},"One limitation mentioned: “Pulsar is slightly less flexible in this regard. Messages can only be removed from the compact ledger via explicit deletion by key, otherwise you can expect to store at least the latest message for all keys”. This means Pulsar compacted topics always keep the last value for each key until you explicitly delete by sending a null (Kafka allows you to also set a retention on compacted topics to eventually drop even the last values after a time if needed). Pulsar’s approach is “keep last forever (or until explicit tombstone)”.",[48,9818,9819],{},"Use cases: If you want a topic that holds, say, the latest status of each user, you would use compaction. Produce updates with a key (user ID) and value (status). Compaction will ensure only the most recent status per user is kept in the compacted view. A new consumer can read the compacted log from start and quickly get the latest state of all users without going through all historical changes.",[48,9821,9822],{},"Running compaction doesn’t block the topic – you can run it while publishing is happening. It’s an operation that reads the backlog and writes a new ledger. It might consume resources, so schedule it appropriately (e.g., off-peak).",[40,9824,9826],{"id":9825},"putting-it-all-together","Putting It All Together",[48,9828,9829],{},"Let’s consider how you might configure a Pulsar namespace for different scenarios, drawing parallels to Kafka:",[321,9831,9832,9835,9838,9841],{},[324,9833,9834],{},"Ephemeral stream (like Kafka’s default): If you want data to vanish after some time regardless of consumption (like a Kafka topic with 7-day retention and maybe consumers that are expected to keep up or else miss data), you’d set a retention period (time-based) and maybe TTL the same or slightly larger. For example, retention 7 days, TTL 7 days. This way, acked or not, after 7 days data is gone. Consumers that fall behind by more than 7 days lose data. This is a trade-off for bounded storage.",[324,9836,9837],{},"Work queue (at-least-once, but not infinite backlog): Perhaps you have a queue that should not grow unbounded if consumers are down. You might set TTL for unacked messages to, say, 2 days. 
If consumers are down for >2 days, those tasks expire. But if they come back before that, they get everything. You might not bother retaining acknowledged messages at all in this case.",[324,9839,9840],{},"Durable log (don’t lose anything; like Kafka with infinite retention or very long retention): Keep TTL off (or extremely high) so you never drop unacked messages. Consumers can always come back and get their backlog. Also, maybe set retention for acked messages to some large value if you want the ability to re-read even after ack (like an audit trail). Or use pulsar-admin topics terminate to mark an endpoint and handle archival externally. Keep an eye on storage though – infinite retention needs infinite storage or periodic offloading to cold storage (Pulsar has tiered storage to move old ledger data to, e.g., S3).",[324,9842,9843],{},"Compacted topic for state: Set topic to compacted. Also likely set a retention policy so that even after compaction, you keep data (compaction will keep last keys by design, but what about keys that got tombstoned? They’ll be removed in compacted log, but original ledger entries may still exist until retention kicks in). Usually, you combine compaction with an infinite retention (or very long) but it’s compacted so storage doesn’t blow up with old updates. You may still want to purge tombstoned keys after some time – which retention can do for the underlying data.",[48,9845,9846],{},"How to trigger compaction: Kafka’s compaction runs continuously in the background on brokers. Pulsar’s approach is manual or scheduled. In a production Pulsar cluster, you’d typically run an automatic compaction periodically for the topics that need it (via a scheduler or perhaps using Pulsar Functions or external scripts to call the compact command). There’s also a “threshold” based compaction strategy (for example, compact when backlog reaches a certain size). Check Pulsar docs for auto-compaction configs if needed.",[40,9848,9850],{"id":9849},"monitoring-and-admin-for-retentionttl","Monitoring and Admin for Retention\u002FTTL",[321,9852,9853,9860,9863,9866],{},[324,9854,9855,9856],{},"pulsar-admin topics stats ",[9857,9858,9859],"topic",{}," will show retention stats and backlog size. You can see how many messages are stored, backlog size, etc.",[324,9861,9862],{},"If a backlog is consuming too much space, as an admin you might decide to set a TTL or remove a subscription (if a subscription is not needed but still has backlog, dropping it will free those messages).",[324,9864,9865],{},"Pulsar has a concept of inactive subscriptions (subscriptions that have no consumers but still have backlog). If a subscription lingers with backlog and no consumers, those messages will sit forever unless TTL or an admin explicitly expires them. Kafka doesn’t have that scenario because if no consumer reads, data still gets deleted by time. Pulsar’s durability means you should watch for ghost subscriptions. If using Pulsar as a Kafka replacement where you only care about consumer groups that are active, make sure to clean up subscriptions when they are no longer needed (or set a TTL\u002Fbacklog quota so they don’t live forever).",[324,9867,9868],{},"Tiered storage: If you need long retention but don’t want to burden hot storage, Pulsar can offload older ledger data to cloud storage. 
That’s beyond our scope here, but know that infinite retention is possible by pushing old data out to cheaper storage, somewhat analogous to Kafka’s tiered storage solutions.",[40,9870,8924],{"id":8923},[321,9872,9873,9876,9879,9882,9885,9888,9891],{},[324,9874,9875],{},"By default, Pulsar retains unacknowledged messages forever and immediately deletes acknowledged messages. This ensures no data loss for slow consumers by default, unlike Kafka which will eventually delete old messages regardless of consumer progress.",[324,9877,9878],{},"Pulsar’s retention policy allows you to keep acknowledged messages for a configured time\u002Fsize. This can make Pulsar topics behave more like Kafka logs, where data is available for reprocessing or late joiners for a window of time after consumption.",[324,9880,9881],{},"TTL (Time-to-Live) deals with the flip side: unacknowledged messages. It sets a limit on how long a message can remain unconsumed before Pulsar drops it. This prevents unbounded growth of backlog if consumers disappear. Kafka’s equivalent (not direct) is just its retention policy which would also delete data not consumed; Pulsar distinguishes between consumed and not consumed.",[324,9883,9884],{},"Log Compaction in Pulsar allows keeping the latest value per key, similar to Kafka’s compacted topics. Pulsar’s compaction generates a separate compacted view that consumers can opt into. Use compaction for stateful topics where you only care about the latest update per key (with tombstones to delete keys).",[324,9886,9887],{},"By combining retention, TTL, and compaction settings, Pulsar gives fine-grained control over data lifespan:You can achieve at-least-once delivery with bounded storage (via TTL).",[324,9889,9890],{},"You can achieve replay of recent history (via retention of acked messages).",[324,9892,9893],{},"You can maintain a compact state topic for lookup of current values (via compaction).\nFor a Kafka engineer, remember that Pulsar does not, by default, throw away data after X days blindly – you must configure it to do so if that’s desired. Conversely, you must monitor and manage backlogs or use TTL to avoid a stuck consumer filling up storage, a scenario Kafka would handle by data expiration but Pulsar will handle by pausing producers or requiring admin action if no TTL\u002Fquota set. Pulsar provides the tools to do this safely and more flexibly.",[48,9895,9896],{},"Next up, in Part 6, we’ll explore Schema Management in Pulsar, where we’ll see how Pulsar’s built-in schema registry compares to Kafka’s schema registry concept and how to enforce schema evolution rules on topics.",[48,9898,3931],{},[208,9900],{},[48,9902,3931],{},[48,9904,8956,9905,8960],{},[55,9906,5405],{"href":6135,"rel":9907},[264],[48,9909,8963],{},[48,9911,9912],{},[55,9913,8970],{"href":8968,"rel":9914},[264],[48,9916,9917],{},[55,9918,8976],{"href":7969,"rel":9919},[264],{"title":18,"searchDepth":19,"depth":19,"links":9921},[9922,9923,9924,9925,9926,9927,9928],{"id":9651,"depth":19,"text":9652},{"id":9703,"depth":19,"text":9704},{"id":9736,"depth":19,"text":9737},{"id":9771,"depth":19,"text":9772},{"id":9825,"depth":19,"text":9826},{"id":9849,"depth":19,"text":9850},{"id":8923,"depth":19,"text":8924},"Explore how Apache Pulsar handles message retention, Time-to-Live (TTL), and topic compaction compared to Kafka. 
Learn how to configure retention policies, prevent infinite backlogs, and use compaction to maintain the latest state per key.","\u002Fimgs\u002Fblogs\u002F68b858ba57c99ef2f3be6848_SN-sm-Pulsar-for-Kafka-Engineers-series-5.png",{},"\u002Fblog\u002Fpulsar-newbie-guide-for-kafka-engineers-part-5-retention-ttl-compaction",{"title":9640,"description":9929},"blog\u002Fpulsar-newbie-guide-for-kafka-engineers-part-5-retention-ttl-compaction",[821,799,7347],"yHLPW6Kt0G88nGXYz8ASLHlqoYs5eCwv92nAAhHr_N8",{"id":9938,"title":9939,"authors":9940,"body":9941,"category":6415,"createdAt":290,"date":10046,"description":10047,"extension":8,"featured":294,"image":10048,"isDraft":294,"link":290,"meta":10049,"navigation":7,"order":296,"path":10050,"readingTime":3556,"relatedResources":290,"seo":10051,"stem":10052,"tags":10053,"__hash__":10055},"blogs\u002Fblog\u002Fthe-event-driven-agent-era-why-streams-matter-now.md","The Event‑Driven Agent Era: Why Streams Matter Now (Event-Driven Agents, Part 1)",[810,6500,6501],{"type":15,"value":9942,"toc":10038},[9943,9946,9949,9953,9956,9959,9962,9966,9969,9972,9977,9980,9983,9986,9990,9993,10013,10016,10019,10021,10024,10027,10031],[40,9944,9945],{"id":42},"‍Introduction‍",[48,9947,9948],{},"AI has evolved through distinct waves — from early predictive models to the recent generative AI boom — and now into the era of agentic AI, where autonomous agents can act, adapt, and collaborate in real time. In this new phase, simply having smart models isn’t enough; the way these AI agents communicate and act upon incoming data becomes critical. This post kicks off our series by exploring why event-driven data streams are the key to unlocking the full potential of AI agents in this real-time, event-driven era. We’ll discuss the limitations of traditional architectures and how streaming-centric designs address those challenges, enabling more scalable and responsive AI systems.",[40,9950,9952],{"id":9951},"the-event-driven-agent-era-at-a-glance","The Event-Driven Agent Era at a Glance",[48,9954,9955],{},"In the agentic AI era, software “agents” don’t just generate content or predictions — they have the agency to make decisions and take actions autonomously. Think of an AI agent as a specialized microservice with a brain: it perceives incoming data, reasons over state, and initiates actions (including communicating with other agents). However, to be truly useful, such agents require more than just clever models and tools. They need a robust infrastructure to support real-time data flow, security to protect and govern sensitive information, and scalability to operate in concert across an organization. In other words, the underlying architecture must empower agents to work together timely and seamlessly, not in isolation.",[48,9957,9958],{},"Traditional request-response architectures (e.g. REST APIs, synchronous workflows) are ill-suited for this new breed of AI. Rigid, query-driven designs can’t keep up with dynamic, continuous streams of events. In a request-driven system, each agent or service must explicitly poll or call others to get updates, introducing delays and tight coupling. Such architectures quickly become bottlenecks that limit scalability and make coordination between multiple agents a nightmare. Agents might be starved of fresh data or spend too much time waiting on each other. 
Clearly, the old approaches break down when faced with the speed and complexity of modern AI applications.",[48,9960,9961],{},"What today’s AI agents demand is the ability to react asynchronously and in real time to an ever-changing environment. This calls for embracing an Event-Driven Architecture (EDA) powered by a streaming platform. In an event-driven system, the mindset flips: instead of agents making direct calls for data, they subscribe to relevant data streams and publish events when they have new information or outcomes. Agents become producers and consumers of events, all flowing through a central event hub or bus. This way, each agent is always working with the latest data and can react the moment something important happens, without being explicitly invoked by others. As we’ll see, this shift to streaming events brings major benefits for building AI that is robust, scalable, flexible and responsive.",[40,9963,9965],{"id":9964},"why-traditional-architectures-fall-short","Why Traditional Architectures Fall Short",[48,9967,9968],{},"Let’s examine the shortcomings of the traditional approach through a familiar analogy: microservices. Early on, distributed systems often involved tightly-coupled services making synchronous calls to each other. This led to complex webs of dependencies — if one service slowed down or failed, it directly impacted others. Similarly, a naive implementation of multiple AI agents might have them calling each other’s APIs or writing to the same databases, resulting in tangled interactions and fragile systems. In fact, deploying hundreds of independent AI agents without a structured communication framework would result in chaos: agents could become fragmented, inefficient, and unreliable just like uncoordinated microservices.",[48,9970,9971],{},"Microservice architecture evolved to solve these issues by introducing asynchronous, event-driven communication. Instead of every service talking to every other service directly, events are exchanged via a message broker or streaming platform. This decouples the senders and receivers — services simply emit events and any interested parties consume them. The breakthrough came with event-driven architectures where services react to changes asynchronously, enabling real-time responsiveness and better scalability. AI agents today need the same kind of shift in how they interact.",[48,9973,9974],{},[384,9975],{"alt":18,"src":9976},"\u002Fimgs\u002Fblogs\u002F68b71f8f21bd8cfde92d4b85_4974b309.png",[48,9978,9979],{},"Figure 1: Shifting from tightly coupled point-to-point interactions (left) to an event broker model (right) simplifies communication and scaling for microservices.",[48,9981,9982],{},"The diagram above illustrates how a set of services (or agents) can be coordinated via an event broker. On the left, each microservice must know about and directly communicate with others, forming a brittle network. On the right, each one simply emits events and listens for events on the broker, decoupling their logic. For AI agents, this means they don’t need hard-coded knowledge of each other. An agent can announce, “I have new data” or “an anomaly occurred” as an event, and any other agent interested in that type of event can respond accordingly. This architecture drastically reduces interdependencies and avoids the spaghetti mess of direct integrations. 
It also provides a natural buffer and fault tolerance: if an agent goes down, the event stream can persist the data until it comes back or a replacement picks up, instead of causing a cascade of failures.",[48,9984,9985],{},"Another limitation of traditional designs is the reliance on static data or batch processing. Many AI workflows historically operated on stale data snapshots — e.g. daily batch updates or periodic database queries. For an autonomous agent making decisions, stale data can be disastrous (imagine a trading agent acting on last week’s prices!). Request\u002Fresponse systems can’t easily push updates to agents in real time; at best, agents would have to constantly poll for changes, which doesn’t scale and still introduces latency. By contrast, event streaming ensures that whenever new data is available, it gets delivered to subscribers immediately. There are no artificial delays waiting for the next batch or API call – agents are always working with live feeds. As a result, they can respond to opportunities and risks instantaneously and accurately.",[40,9987,9989],{"id":9988},"why-streams-matter-for-ai-agents","Why Streams Matter for AI Agents",[48,9991,9992],{},"Streaming data platforms and event-driven architecture directly address the challenges above, providing a foundation for real-time, scalable intelligence. Here are some of the key benefits of adopting streams in an AI agent system:",[321,9994,9995,9998,10001,10004,10007,10010],{},[324,9996,9997],{},"Real-Time Data Access: Agents receive data as continuous streams of events, eliminating batch delays. This ensures decisions are made on the freshest possible information, not outdated snapshots. Anomaly detectors, for example, can catch issues within seconds of occurrence, because they don’t have to wait for a scheduled job or request to pull data.",[324,9999,10000],{},"Loose Coupling of Agents: Agents communicate through the event bus rather than direct one-to-one calls. This decoupled architecture reduces complexity and interdependence between agents. Each agent simply declares the types of events it produces and consumes. The benefit is easier integration of new agents and the ability to change or scale parts of the system without breaking everything else.",[324,10002,10003],{},"Scalability and Flexibility: Because agents process events asynchronously, you can scale out the number of agents or instances of an agent type seamlessly. New agents can join the system without disrupting existing workflows. Workloads naturally distribute via the streaming platform (much like how consumer groups work in Kafka or Pulsar), enabling the system to handle spikes in events or additional tasks by just adding consumers. This architecture is future-proof in that you can plug in any model or tool as an agent, and as long as it adheres to the event interfaces, it participates in the ecosystem without special coordination code.",[324,10005,10006],{},"Fresh, AI-Ready Data Streams: An event-driven platform can preprocess and transform data on the fly, preparing it for AI consumption. For instance, streaming pipelines might convert raw text into vector embeddings in real time and publish those embeddings as events. Agents subscribing to these enriched event streams get the benefit of up-to-date features and context (e.g. updated user profiles, live sensor readings) without needing to batch-retrain or constantly query databases. 
The streaming layer effectively feeds intelligent agents with a constant flow of relevant, structured information, which is crucial for techniques like real-time recommendation or reinforcement learning.",[324,10008,10009],{},"Resilience and Fault Tolerance: The event log acts as a durable buffer. If an agent goes down momentarily, it can replay missed events from the stream once it recovers, ensuring no data or trigger is lost. This is far more robust than synchronous RPC calls which might fail or time out. The decoupling also localizes failures — one stalled agent doesn’t directly block others, as long as the event stream is flowing. Overall system reliability increases, which is important as you scale to dozens or hundreds of agents.",[324,10011,10012],{},"Multiplexing and Replay: A streaming platform enables an upstream agent to publish results to a topic once, while multiple downstream agents can independently subscribe to that topic to consume events in real time. This pattern eliminates the need for direct integrations, ensuring communication remains flexible, decoupled, and highly scalable. New downstream agents can be introduced seamlessly without requiring changes to upstream logic. In addition, the ability to replay or rewind past events is critical for scenarios such as debugging, retraining models, or restoring state after failures. Rather than losing historical context, agents can reprocess prior events deterministically to achieve consistent outcomes. By combining multiplexed delivery with reliable replay, streaming platforms provide the foundation for resilient, scalable, and intelligent agent ecosystems.",[48,10014,10015],{},"Beyond these benefits, an event-driven approach inherently supports better governance and observability. All events flowing through a central platform can be logged, monitored, and even audited in real time. This is a big advantage for organizations concerned with compliance, security, or just debugging complex agent behaviors. A streaming platform can enforce data quality checks and security policies on the events, ensuring that agents only operate on trustworthy data. Traditional point-to-point integrations struggle to offer such a unified view and control. By using a centralized streaming backbone, you gain a “single source of truth” for what each agent did and why, since their inputs and outputs are all events on the bus.",[48,10017,10018],{},"Finally, streams enable what is called a “shift-left” approach – moving computation and decision-making closer to the data source. Instead of waiting for data to be stored and then processed, the processing happens in motion. This reduces latency dramatically. In practice, that means an AI agent can trigger actions in near real time as events arrive, which is critical for use cases like fraud detection, IoT automation, or personalized user experiences. The faster an agent can react to an event, the more value it can provide.",[40,10020,2125],{"id":2122},[48,10022,10023],{},"The emergence of event-driven streaming architecture marks a turning point for AI systems. We no longer have to bolt intelligent agents onto brittle, request-driven frameworks and hope for the best. By making events the lingua franca of our AI agents, we enable them to truly operate in real time and at scale. Streams provide a live data fabric that keeps agents in sync and informed, while avoiding the pitfalls of tight coupling and stale data. 
In short, streams matter now because they transform isolated AI capabilities into a coordinated, adaptive system of agents. Organizations that embrace this streaming-first approach will be positioned to build smarter, more responsive and better scalable AI solutions that can evolve with the ever-increasing pace of AI revolution.",[48,10025,10026],{},"This is just the beginning of our exploration. In upcoming posts, we will dive deeper into how to design and implement event-driven agent systems — from the anatomy of an AI agent, to multi-agent design patterns, to real-world architecture examples. We’ll illustrate how an enterprise-grade streaming platform (using platforms like StreamNative Cloud) forms the backbone of the agentic AI stack, and how you can start building your own event-driven agents.",[40,10028,10030],{"id":10029},"next-step-get-involved","Next Step-Get Involved",[48,10032,10033,10034,10037],{},"To learn more and see these concepts in action, be sure to join us at the ",[55,10035,5376],{"href":6135,"rel":10036},[264]," on September 30, 2025. It’s a great opportunity to hear from experts and architects who are building real-time AI systems (and yes, agentic AI will be a hot topic!).",{"title":18,"searchDepth":19,"depth":19,"links":10039},[10040,10041,10042,10043,10044,10045],{"id":42,"depth":19,"text":9945},{"id":9951,"depth":19,"text":9952},{"id":9964,"depth":19,"text":9965},{"id":9988,"depth":19,"text":9989},{"id":2122,"depth":19,"text":2125},{"id":10029,"depth":19,"text":10030},"2025-09-02","Discover why event-driven data streams are essential for the new era of agentic AI. This post explores the limitations of traditional architectures and how streaming platforms enable scalable, real-time AI agents for enhanced communication and responsiveness.","\u002Fimgs\u002Fblogs\u002F68b71c9594d68e8b827439af_Event-Driven-Agents-01.png",{},"\u002Fblog\u002Fthe-event-driven-agent-era-why-streams-matter-now",{"title":9939,"description":10047},"blog\u002Fthe-event-driven-agent-era-why-streams-matter-now",[3988,10054,8058,303],"GenAI","3Ys0JhqiaD52phNpBJPzI3rwH_VtVn6Q9EIb85p9W7k",{"id":10057,"title":10058,"authors":10059,"body":10060,"category":3550,"createdAt":290,"date":10046,"description":10315,"extension":8,"featured":294,"image":10316,"isDraft":294,"link":290,"meta":10317,"navigation":7,"order":296,"path":10318,"readingTime":4475,"relatedResources":290,"seo":10319,"stem":10320,"tags":10321,"__hash__":10323},"blogs\u002Fblog\u002Funlocking-real-time-data-synergy-tidb-cloud-changefeed-integrates-with-streamnative-cloud.md","Unlocking Real-Time Data Synergy: TiDB Cloud Changefeed Integrates with StreamNative Cloud",[6969],{"type":15,"value":10061,"toc":10301},[10062,10065,10070,10074,10077,10080,10094,10098,10101,10105,10124,10128,10131,10147,10152,10156,10159,10176,10181,10185,10188,10196,10201,10205,10208,10211,10216,10219,10235,10239,10242,10245,10249,10252,10267,10271,10274,10283,10287],[48,10063,10064],{},"We are excited to announce a strategic collaboration between TiDB Cloud and StreamNative, enabling you to stream real-time data from TiDB Cloud directly into the StreamNative Cloud ecosystem. 
This integration utilizes the new changefeed feature in TiDB Cloud to capture data changes as they occur and seamlessly stream them to topics hosted on StreamNative Cloud.",[48,10066,10067],{},[384,10068],{"alt":18,"src":10069},"\u002Fimgs\u002Fblogs\u002F68b6c22e25979387f7076ce1_bedb9a92.png",[40,10071,10073],{"id":10072},"unlocking-real-time-data-synergy","Unlocking Real-Time Data Synergy",[48,10075,10076],{},"In today's fast-paced digital landscape, the ability to process and react to data in real-time is essential. TiDB Cloud offers a powerful, distributed SQL database that excels at handling large-scale, high-concurrency workloads. StreamNative's ONE platform, powered by Apache Pulsar, on the other hand, provides a highly scalable, low-latency messaging and streaming platform designed for real-time data ingestion and distribution. The new changefeed feature bridges these two powerful platforms, enabling seamless data flow from TiDB Cloud to StreamNative Cloud.",[48,10078,10079],{},"This integration allows users to capture data changes in TiDB Cloud and stream them directly into Apache Pulsar topics on StreamNative Cloud. This opens up a multitude of use cases, including:",[321,10081,10082,10085,10088,10091],{},[324,10083,10084],{},"Real-time Analytics: Feed operational data from TiDB Cloud into Pulsar for immediate processing and analytical insights.",[324,10086,10087],{},"Event-Driven Applications: Build reactive applications that respond instantly to database changes.",[324,10089,10090],{},"Data Synchronization: Keep various data systems consistent by propagating changes efficiently.",[324,10092,10093],{},"Data Lake Ingestion: Stream TiDB Cloud data into data lakes for historical analysis and machine learning.",[40,10095,10097],{"id":10096},"getting-started-with-tidb-cloud-changefeed-and-streamnative-cloud","Getting Started with TiDB Cloud Changefeed and StreamNative Cloud",[48,10099,10100],{},"Getting started with this powerful integration is straightforward. Here's a quick guide to help you set up your first changefeed:",[32,10102,10104],{"id":10103},"prerequisites","Prerequisites",[321,10106,10107,10114,10121],{},[324,10108,10109],{},[55,10110,10113],{"href":10111,"rel":10112},"https:\u002F\u002Fconsole.streamnative.cloud\u002Fsignup?from=site_landing-page",[264],"StreamNative Cloud account",[324,10115,10116],{},[55,10117,10120],{"href":10118,"rel":10119},"http:\u002F\u002Ftidbcloud.com",[264],"TiDB Cloud account",[324,10122,10123],{},"Basic understanding of Apache Pulsar topics and subscriptions",[32,10125,10127],{"id":10126},"step-1-prepare-your-streamnative-cloud-environment","Step 1: Prepare Your StreamNative Cloud Environment",[48,10129,10130],{},"Before configuring the changefeed, you need to set up the destination topic and get authentication details from your StreamNative Cloud console.",[1666,10132,10133,10141,10144],{},[324,10134,10135,10136,10140],{},"Log in to your StreamNative Cloud console. You can create a ",[55,10137,10139],{"href":10138},"\u002Fdeployment#serverless","serverless"," cluster to get started quickly.",[324,10142,10143],{},"Create a Topic: Navigate to the Topics section, select your desired tenant and namespace, and create a new topic (e.g., tidb-changes). Note the full topic name, as you will need it later.",[324,10145,10146],{},"Get Authentication Key: Navigate to 'Pulsar Clients‘, select your service account, then click Create API Key to generate a token. 
Copy this token securely.",[48,10148,10149],{},[384,10150],{"alt":18,"src":10151},"\u002Fimgs\u002Fblogs\u002F68b6c22e25979387f7076ce4_f225cf2b.png",[32,10153,10155],{"id":10154},"step-2-configure-tidb-cloud-changefeed","Step 2: Configure TiDB Cloud Changefeed",[48,10157,10158],{},"With your StreamNative Cloud API key ready, you can now create the changefeed in TiDB Cloud.",[1666,10160,10161,10164,10167,10170,10173],{},[324,10162,10163],{},"Log in to your TiDB Cloud console.",[324,10165,10166],{},"Navigate to the Changefeed section and create a new changefeed.",[324,10168,10169],{},"Select Pulsar as the destination.",[324,10171,10172],{},"Under Connection, provide your Pulsar Broker URL and select the Token Auth Type.",[324,10174,10175],{},"Enter the Pulsar Broker URL and the API key you obtained in the previous step.",[48,10177,10178],{},[384,10179],{"alt":18,"src":10180},"\u002Fimgs\u002Fblogs\u002F68b6c22e25979387f7076ce7_b5ae3bee.png",[32,10182,10184],{"id":10183},"step-3-create-a-table-and-insert-data-in-tidb-cloud","Step 3: Create a Table and Insert Data in TiDB Cloud",[48,10186,10187],{},"Now that the changefeed is set up, you can generate change events by executing SQL commands in the TiDB Cloud Shell.",[1666,10189,10190,10193],{},[324,10191,10192],{},"Open the TiDB Cloud Shell for your cluster.",[324,10194,10195],{},"Run the following SQL statements to create a users table and insert two records:",[48,10197,10198],{},[384,10199],{"alt":5878,"src":10200},"\u002Fimgs\u002Fblogs\u002F68b6c1406e2ced80728482b6_iShot_2025-09-02_18.04.33.png",[32,10202,10204],{"id":10203},"step-4-verify-data-in-streamnative-cloud","Step 4: Verify Data in StreamNative Cloud",[48,10206,10207],{},"The change events from the SQL commands will be streamed to your StreamNative Apache Pulsar topic. You can use a Pulsar consumer to verify that the data has been received correctly.",[48,10209,10210],{},"The consumer will receive the change events in a structured JSON format. The first event will be the DDL for the CREATE TABLE command, followed by separate DML events for each INSERT operation.",[48,10212,10213],{},[384,10214],{"alt":5878,"src":10215},"\u002Fimgs\u002Fblogs\u002F68b6c1600660dd9d6a3c382d_iShot_2025-09-02_18.05.05.png",[48,10217,10218],{},"For detailed instructions and advanced configurations, please refer to the official documentation:",[321,10220,10221,10228],{},[324,10222,10223],{},[55,10224,10227],{"href":10225,"rel":10226},"https:\u002F\u002Fdocs.streamnative.io\u002Fcloud\u002Fget-started\u002Fquickstart-console",[264],"StreamNative Documentation",[324,10229,10230],{},[55,10231,10234],{"href":10232,"rel":10233},"https:\u002F\u002Fdocs.pingcap.com\u002Ftidbcloud\u002Ftidb-cloud-quickstart\u002F",[264],"TiDB Cloud Documentation",[40,10236,10238],{"id":10237},"empowering-developers-and-enterprises","Empowering Developers and Enterprises",[48,10240,10241],{},"This collaboration between TiDB Cloud and StreamNative is a testament to our shared commitment to fostering open-source innovation and empowering our users with best-in-class data infrastructure. 
By seamlessly integrating TiDB Cloud's robust data capabilities with Apache Pulsar's real-time streaming prowess, we are providing a comprehensive solution for building next-generation, data-intensive applications.",[48,10243,10244],{},"We invite you to explore this new feature and experience the synergy of TiDB Cloud and StreamNative Cloud firsthand.",[40,10246,10248],{"id":10247},"about-streamnative","About StreamNative",[48,10250,10251],{},"StreamNative is the leading contributor to Apache Pulsar and the provider of StreamNative Cloud, a fully-managed Apache Pulsar service. StreamNative empowers organizations to build and deploy real-time applications at scale, leveraging the power of Apache Pulsar for event streaming, messaging, and queuing.",[321,10253,10254,10260],{},[324,10255,10256,10257],{},"Learn more about ",[55,10258,3550],{"href":10259},"\u002F",[324,10261,10262,10266],{},[55,10263,10265],{"href":4688,"rel":10264},[264],"Sign up"," for a trial to get $200 credit",[40,10268,10270],{"id":10269},"about-tidb-cloud","About TiDB Cloud",[48,10272,10273],{},"PingCAP is the company behind TiDB, the open-source, distributed SQL database that enables developers and DBAs to build scalable, resilient, and real-time applications. TiDB Cloud is PingCAP's fully-managed database-as-a-service, simplifying database operations and allowing users to focus on innovation.",[321,10275,10276],{},[324,10277,10256,10278],{},[55,10279,10282],{"href":10280,"rel":10281},"https:\u002F\u002Fdocs.pingcap.com\u002F",[264],"TiDB Cloud",[40,10284,10286],{"id":10285},"follow-us-for-more-updates","Follow us for more updates",[321,10288,10289,10294],{},[324,10290,10291],{},[55,10292,4496],{"href":10293},"\u002Fblog",[324,10295,10296],{},[55,10297,10300],{"href":10298,"rel":10299},"https:\u002F\u002Fwww.pingcap.com\u002Fblog\u002F",[264],"PingCAP",{"title":18,"searchDepth":19,"depth":19,"links":10302},[10303,10304,10311,10312,10313,10314],{"id":10072,"depth":19,"text":10073},{"id":10096,"depth":19,"text":10097,"children":10305},[10306,10307,10308,10309,10310],{"id":10103,"depth":279,"text":10104},{"id":10126,"depth":279,"text":10127},{"id":10154,"depth":279,"text":10155},{"id":10183,"depth":279,"text":10184},{"id":10203,"depth":279,"text":10204},{"id":10237,"depth":19,"text":10238},{"id":10247,"depth":19,"text":10248},{"id":10269,"depth":19,"text":10270},{"id":10285,"depth":19,"text":10286},"Discover how TiDB Cloud Changefeed integrates with StreamNative Cloud, enabling real-time data streaming for analytics, event-driven apps, and more.","\u002Fimgs\u002Fblogs\u002F68b6bfc241900232a1e4ebfc_SN+PingCAP.png",{},"\u002Fblog\u002Funlocking-real-time-data-synergy-tidb-cloud-changefeed-integrates-with-streamnative-cloud",{"title":10058,"description":10315},"blog\u002Funlocking-real-time-data-synergy-tidb-cloud-changefeed-integrates-with-streamnative-cloud",[3550,10322],"BYOC","B-DZDknoWp6M1ZakQEPP9oNPT_UU6fA8hNABfd_A7yI",{"id":10325,"title":10326,"authors":10327,"body":10328,"category":1332,"createdAt":290,"date":10046,"description":10497,"extension":8,"featured":7,"image":10498,"isDraft":294,"link":290,"meta":10499,"navigation":7,"order":296,"path":4752,"readingTime":4475,"relatedResources":290,"seo":10500,"stem":10501,"tags":10502,"__hash__":10504},"blogs\u002Fblog\u002Fursa-wins-vldb-2025-best-industry-paper-the-first-lakehouse-native-streaming-engine-for-kafka.md","Ursa Wins VLDB 2025 Best Industry Paper: The First Lakehouse-Native Streaming Engine for 
Kafka",[806,6785],{"type":15,"value":10329,"toc":10491},[10330,10349,10360,10366,10370,10381,10384,10392,10400,10429,10436,10441,10444,10448,10456,10459,10463,10474],[48,10331,3600,10332,10337,10338,10342,10343,10348],{},[55,10333,10336],{"href":10334,"rel":10335},"https:\u002F\u002Fvldb.org\u002F2025\u002F",[264],"Very Large Data Bases (VLDB) conference"," — the flagship annual conference for databases and data management—highlights breakthroughs that advance our field. This year, our paper, “",[55,10339,10341],{"href":6823,"rel":10340},[264],"Ursa: A Lakehouse-Native Data Streaming Engine for Kafka","”, received the ",[55,10344,10347],{"href":10345,"rel":10346},"https:\u002F\u002Fvldb.org\u002F2025\u002F?conference-awards",[264],"Best Industry Paper ","award at VLDB 2025. Chosen from a highly competitive set of industry submissions from leading tech companies like Databricks, Meta, Alibaba, the recognition is both humbling and energizing. We’re grateful to the community and excited to share what we’ve built.",[48,10350,10351,10352,10354,10355,10359],{},"This paper introduces ",[55,10353,1332],{"href":6647},", our next-generation data streaming engine that is fully Kafka-compatible yet fundamentally different under the hood. Ursa is the first and only “lakehouse-native” streaming engine – built to write data directly to lakehouse table formats like Apache Iceberg and Delta Lake instead of using traditional broker disks. By eliminating Kafka’s usual leader-based replication and external connector jobs, Ursa ",[55,10356,10358],{"href":10357},"\u002Fblog\u002Fhow-we-run-a-5-gb-s-kafka-workload-for-just-50-per-hour","slashes streaming infrastructure costs by up to 10× while maintaining seamless compatibility with the Kafka API",". Its design meets modern cloud requirements for elasticity, high availability, and efficiency: users get the same Kafka experience, but backed by an architecture that leverages cloud object storage and shared metadata services to decouple compute from storage. The result is a system that lets you think in terms of data and workloads – not low-level infrastructure – and delivers consistent performance across cloud environments with dramatically lower operational overhead.",[48,10361,10362],{},[55,10363,10365],{"href":6823,"rel":10364},[264],"Read The Award-Winning Paper",[40,10367,10369],{"id":10368},"what-the-vldb-best-industry-paper-means","What the VLDB Best Industry Paper Means",[48,10371,3600,10372,10375,10376,10380],{},[55,10373,10336],{"href":10334,"rel":10374},[264]," is the database community’s flagship annual gathering. Each year, its program committee runs a rigorous, multi-stage review to surface standout research and industrial system papers that pair technical originality with real-world impact. From the industry track, only one paper is selected for the ",[55,10377,10379],{"href":10345,"rel":10378},[264],"Best Industry Paper"," award—recognizing work that advances production systems with novel, practical techniques. 
In 2025, our Ursa paper received this distinction, underscoring the significance of our leaderless, lakehouse-native approach to modern cloud data streaming.",[40,10382,10341],{"id":10383},"ursa-a-lakehouse-native-data-streaming-engine-for-kafka",[48,10385,10386,10387,10391],{},"We ",[55,10388,10390],{"href":10389},"\u002Fblog\u002Fursa-reimagine-apache-kafka-for-the-cost-conscious-data-streaming","launched Ursa a year ago"," to answer a clear gap: teams want Kafka’s developer experience without inheriting the cost and operational weight of legacy, leader-based & disk-centric designs. Rather than lift-and-shift Kafka onto cloud VMs or ask customers to run everything in their own account, we took the harder route—rethinking the engine from first principles. Ursa keeps full Kafka compatibility while rebuilding the core around three ideas: leaderless architecture, lakehouse-native storage, and cloud-native elasticity. The result is a platform designed for today’s data stacks: no leader\u002Ffollower bottlenecks, no hard dependency on expensive disks, and a direct path from streams to tables.",[48,10393,10394,10395,10399],{},"Ursa architecture at a glance: Over the course of developing the Ursa engine, our team faced numerous technical hurdles and achieved a series of breakthroughs to realize this new design. You can read all the details in ",[55,10396,10398],{"href":6823,"rel":10397},[264],"the paper itself",". Below are a few of the key innovations that define Ursa’s architecture:",[321,10401,10402,10405,10413,10426],{},[324,10403,10404],{},"Leaderless architecture with zone-local brokers – Ursa gets rid of the single-leader-per-partition model entirely. Every broker is an equal peer that can accept writes, and data is replicated implicitly via a shared durable storage layer rather than broker-to-broker copying. This means there are no leader elections and no inter-zone replication traffic bogging down the system. Thanks to built-in zone affinity, producers and consumers connect to local brokers in their nearest availability zone, eliminating cross-AZ network hops while still ensuring data durability and high availability across zones. In short, Ursa’s brokers are leaderless and stateless – which greatly simplifies operations and improves resiliency by removing a whole class of failure scenarios and network overhead.",[324,10406,10407,10408,10412],{},"Lakehouse-native storage – Unlike conventional streaming engines that store logs on local disks and later copy data via connectors into analytic data lakes, Ursa writes data directly to cloud object storage in open table formats (e.g. Iceberg, Delta Lake) in real time. This lakehouse-native design means your streaming data is immediately available as table-formatted files in the data lake, ",[55,10409,10411],{"href":10410},"\u002Fproducts\u002Fursa#:~:text=Stream","eliminating the need for separate ETL pipelines or sink connectors"," to get data into analytics systems. Long-term data is stored in highly durable, cost-effective object storage instead of expensive replicated  local disks, which cuts storage costs massively and enables easy integration with tools like Spark, Trino, or Snowflake for batch queries. 
By leveraging the cloud’s built-in durability and scalability, Ursa avoids the 3× data duplication of traditional triple replication and instead stores the data durably in the lakehouse.",[324,10414,10415,10416,10420,10421,10425],{},"“Stream–table” duality through built-in compaction – Ursa introduces the powerful concept of stream-table duality, ",[55,10417,10419],{"href":10418},"\u002Fblog\u002Fleaderless-architecture-and-lakehouse-native-storage-for-reducing-kafka-cost#:~:text=This%20approach%20represents%20Ursa%E2%80%99s%20key,outlining%20the%20specifications%20of%20this","unifying real-time streams and batch tables on the same underlying data",". To achieve this, Ursa’s storage engine continuously compacts the append-only log data into columnar files in the background. Incoming records are first written to a row-based write-ahead log (WAL) on object storage for low-latency streaming. Then a background compaction service aggregates and converts these raw log segments into compressed Parquet files, ",[55,10422,10424],{"href":10423},"\u002Fblog\u002Fleaderless-architecture-and-lakehouse-native-storage-for-reducing-kafka-cost#:~:text=If%20data%20retention%20is%20short%2C,integration%20with%20the%20lakehouse%20ecosystem","organizing them into Apache Iceberg\u002FDelta Lake table formats",". The result is that each Kafka topic is simultaneously a live event stream and an up-to-date analytics table. Applications can consume recent data via the Kafka API with minimal latency, or query historical data via SQL on the lakehouse – all from a single storage source without needing to maintain dual pipelines. This duality not only simplifies data architecture but also improves efficiency, since one storage tier serves both streaming and analytic use cases.",[324,10427,10428],{},"Pluggable write-ahead log for cost\u002Flatency tradeoffs – A one-size-fits-all approach doesn’t work for every workload, so Ursa’s design allows choosing different WAL storage modes per use case. For latency-relaxed workloads that can tolerate slightly higher tail latency, Ursa can use cloud object storage as the WAL (i.e. writing events directly to S3, GCS, etc.), which eliminates virtually all cross-AZ network costs and offers unbeatable durability at the expense of a bit more write latency. Meanwhile, for latency-sensitive topics that demand single-digit millisecond persistence, Ursa supports a latency-optimized WAL (for example, using Apache BookKeeper - a fast distributed log service with replication) to ensure quicker acknowledgments. This pluggable WAL architecture lets you optimize each stream for what matters most – cost or latency – or even run hybrid modes. Thanks to multi-tenant storage profiles, each tenant or topic can select the ideal storage policy (object-storage WAL vs. low-latency WAL), balancing throughput, latency, and cost requirements as needed. In practice, this flexibility means Ursa can handle a wide spectrum of use cases, from cost-efficient data ingestion, to mission-critical low-latency streams, all within one platform.",[48,10430,10431,10432,190],{},"In addition to the above, Ursa’s cloud-native architecture cleanly separates compute from storage. Brokers are effectively stateless processors that can be scaled up or down independently of storage capacity – no more overprovisioning large disks on every broker “just in case.” This makes elasticity trivial and allows deploying Ursa in multi-cloud and Bring-Your-Own-Cloud (BYOC) environments with ease. 
In fact, Ursa is designed to run natively in your own cloud account (on AWS, GCP, Azure, etc.), with the streaming service leveraging your cloud’s object store for data persistence. You only pay for the throughput and storage you actually use, not idle capacity. All of these innovations contribute to Ursa’s dramatic cost savings and operational simplicity: by eliminating inter-zone data transfer, local disks, external connectors, and duplicate data copies, Ursa can deliver the same or better streaming performance at ",[55,10433,10435],{"href":10434},"\u002Fblog\u002Fhow-we-run-a-5-gb-s-kafka-workload-for-just-50-per-hour#:~:text=In%20contrast%2C%20traditional%20streaming%20systems%E2%80%94like,Ursa%20mitigates%20these%20costs%20by","a fraction of the cost of traditional Kafka deployments",[48,10437,10438],{},[384,10439],{"alt":18,"src":10440},"\u002Fimgs\u002Fblogs\u002F68b698c5d7f1555ba95235e2_90d4eed3.png",[48,10442,10443],{},"Figure 1. Data Streaming Cost Comparison: Ursa vs. Kafka (lower is better). For more details, read the Ursa paper.",[40,10445,10447],{"id":10446},"team-community","Team & Community",[48,10449,10450,10451,10455],{},"Ursa is the result of an incredible team effort ",[55,10452,10454],{"href":10453},"\u002Fblog\u002Fstream-table-duality-and-the-vision-of-enabling-data-sharing","over multiple years",". We want to thank our engineering team at StreamNative – both current and former members – who worked tirelessly to turn the vision of a truly lakehouse-native streaming engine into reality. This achievement is a testament to their technical skill, creativity, and perseverance in tackling hard distributed systems problems.",[48,10457,10458],{},"We are also deeply grateful to the open-source communities that laid the groundwork for Ursa. Apache Kafka® provided the ubiquitous protocol and community that we built upon, and Apache Pulsar® (along with its BookKeeper store) inspired many of the cloud-native ideas in Ursa’s design. Without the innovations in these projects and the feedback from their user communities, Ursa would not have been possible. We thank everyone in the Kafka, Pulsar, Iceberg & DeltaLake ecosystems for their contributions, as well as our customers and partners for continually pushing us with real-world requirements and feedback. This award is as much a recognition of the community’s progress as it is of our own work.",[40,10460,10462],{"id":10461},"see-ursa-in-action","See Ursa in Action",[48,10464,10465,10466,10469,10470,190],{},"As mentioned above, Ursa is not just an idea on paper or a prototype in a lab – it’s already running in production and proving its value. In fact, Ursa has been powering StreamNative’s managed cloud service, where it recently sustained ",[55,10467,10468],{"href":10357},"5 GB\u002Fs of Kafka throughput at only ~5% of the cost"," of other platforms. In real terms, that’s on the order of a 10× cost reduction for the same workload, validating Ursa’s cost-efficiency claims in practice. We invite you to ",[55,10471,10473],{"href":4688,"rel":10472},[264],"experience Ursa for yourself",[48,10475,10476,10477,10481,10482,10486,10487,10490],{},"For those interested in learning more, we encourage you to read ",[55,10478,10480],{"href":6823,"rel":10479},[264],"the VLDB paper"," for a deep dive into Ursa’s design and internals. If you have any questions, feel free to reach out to ",[55,10483,10485],{"href":10484},"mailto:ursa-paper@streamnative.io","ursa-paper@streamnative.io",", or meet us in person. 
We are presenting our research at the VLDB conference in London this week and will also be sharing insights about Ursa and the future of streaming at the upcoming ",[55,10488,6796],{"href":6135,"rel":10489},[264]," in San Francisco on September 29–30, 2025. We look forward to seeing you there, and to continuing the conversation about the industry’s shift toward lakehouse-native, leaderless streaming architectures. The journey has only begun, and we’re excited about what’s next!",{"title":18,"searchDepth":19,"depth":19,"links":10492},[10493,10494,10495,10496],{"id":10368,"depth":19,"text":10369},{"id":10383,"depth":19,"text":10341},{"id":10446,"depth":19,"text":10447},{"id":10461,"depth":19,"text":10462},"Ursa wins the VLDB 2025 Best Industry Paper for introducing the first lakehouse-native streaming engine for Kafka! Discover how Ursa delivers 10× cost savings, leaderless architecture, and seamless Kafka compatibility by writing directly to Iceberg and Delta Lake.","\u002Fimgs\u002Fblogs\u002F68b694de0660dd9d6a274f4c_VLDB-best-industry-paper.png",{},{"title":10326,"description":10497},"blog\u002Fursa-wins-vldb-2025-best-industry-paper-the-first-lakehouse-native-streaming-engine-for-kafka",[1332,799,10503,800],"Benchmarks","kebvqPg80ztB_et2IKcEaEP4hJb-Nkv9X7o3nzJYsog",{"id":10506,"title":10507,"authors":10508,"body":10509,"category":821,"createdAt":290,"date":10720,"description":10721,"extension":8,"featured":294,"image":10722,"isDraft":294,"link":290,"meta":10723,"navigation":7,"order":296,"path":10724,"readingTime":3556,"relatedResources":290,"seo":10725,"stem":10726,"tags":10727,"__hash__":10728},"blogs\u002Fblog\u002Fpulsar-newbie-guide-for-kafka-engineers-part-4-subscriptions-consumers.md","Pulsar Newbie Guide for Kafka Engineers (Part 4): Subscriptions & Consumers",[808,809,810],{"type":15,"value":10510,"toc":10709},[10511,10513,10516,10520,10529,10532,10546,10554,10557,10561,10564,10568,10571,10574,10578,10581,10584,10587,10590,10593,10601,10604,10607,10611,10614,10617,10620,10623,10627,10630,10633,10636,10639,10642,10645,10648,10659,10661,10664,10678,10681,10684,10686,10688,10690,10695,10697,10702,10707],[48,10512,8648],{},[48,10514,10515],{},"This post dives into how Apache Pulsar handles subscriptions and consumers, which is Pulsar’s equivalent to Kafka’s consumer groups. Pulsar requires consumers to specify a subscription name, which acts like a consumer group ID in Kafka. You can have multiple subscriptions on the same topic (for multi-group fan-out) and each subscription can have one or more consumer instances attached. Pulsar offers four subscription types – Exclusive, Failover, Shared, and Key_Shared – that determine how messages are delivered to consumers. Exclusive (the default) and Failover ensure only one consumer (or one active consumer at a time) receives all messages (preserving order like Kafka) of one or multiple partitions. Shared and Key_Shared allow multiple consumers to split the messages of a partition: Shared distributes messages round-robin (like a queue, higher throughput but no global order guarantee), while Key_Shared also distributes messages but guarantees ordering per message key. Pulsar’s broker tracks a subscription cursor (like an offset) for each subscription to maintain where consumers left off, and unacknowledged messages form a backlog (analogous to Kafka’s consumer lag) that you can monitor. 
In short, Pulsar’s flexible subscription model lets you achieve Kafka-like streaming and RabbitMQ-like queuing patterns on the same platform.",[40,10517,10519],{"id":10518},"understanding-pulsar-subscriptions-vs-kafka-consumer-groups","Understanding Pulsar Subscriptions vs Kafka Consumer Groups",[48,10521,10522,10523,10528],{},"If you come from Kafka, you’re used to consumer groups: a named group of consumers where each Kafka partition is consumed by one member of the group. Pulsar approaches this concept with subscriptions. A subscription in Pulsar is essentially a named rule for consuming a topic – think of it as a durable consumer group on a single topic. Consumers subscribe by specifying a subscription name, and Pulsar will ensure messages are delivered according to the subscription’s type (more on types soon). Under the hood, when a subscription is created, ",[55,10524,10527],{"href":10525,"rel":10526},"https:\u002F\u002Fpulsar.apache.org\u002Fdocs\u002Fnext\u002Fconcepts-messaging\u002F#:~:text=,record%20the%20last%20consumed%20position",[264],"Pulsar sets up a cursor to track the subscription’s position in the topic",". This cursor is stored durably (in BookKeeper) so that if the consumer disconnects or the broker restarts, the subscription’s last read position is remembered. In Kafka, the consumer group’s offsets serve a similar purpose (often stored in an internal topic). In Pulsar, the broker itself manages the offsets (cursors), which simplifies offset management – you don’t need an external store or to manually commit offsets, it’s handled by the act of acknowledging messages.",[48,10530,10531],{},"Because Pulsar decouples the subscription from the physical consumer, you can have multiple subscriptions on one topic just by using different subscription names. Each subscription name represents an independent feed of the topic. For example, if you have two separate services that need the same data, you can have one Pulsar topic with two subscriptions (say “serviceA” and “serviceB”), and each subscription will get every message published – effectively duplicating the stream, like two separate Kafka consumer groups reading the same topic. Pulsar keeps track of a cursor for each subscription, and each subscription has its own backlog (messages published to the topic that have not yet been acknowledged on that subscription). This is powerful: it means Pulsar inherently supports fan-out (pub-sub) as well as work-queue sharing patterns on the same data. You could have one subscription where only one consumer reads all messages (stream processing), and another subscription on the same topic where a pool of consumers share the messages (distributed queue processing).",[48,10533,10534,10535,10539,10540,10545],{},"Let’s clarify some terminology: acknowledgment in Pulsar is the act of a consumer confirming it has processed a message. When a message is acknowledged, the subscription’s cursor moves forward, and the message is considered consumed for that subscription. Acking in Pulsar is analogous to committing an offset in Kafka, but it’s automatic when you use Pulsar’s APIs (or CLI) unless you disable auto-ack. Importantly, Pulsar supports two acknowledgment modes: individual acks (ack each message) and cumulative acks (acknowledge all messages up to a given position in one go). Cumulative acks are useful in Exclusive\u002FFailover subscriptions to advance the cursor in bulk, but they aren’t supported in Shared mode (since out-of-order consumption makes it tricky). 
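To make the cursor and acknowledgment mechanics concrete, here's a minimal sketch of what this looks like in the Pulsar Java client (illustrative topic and subscription names, assuming a broker reachable at pulsar://localhost:6650):

```java
import org.apache.pulsar.client.api.*;

public class AckExample {
    public static void main(String[] args) throws Exception {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650")   // placeholder broker URL
                .build();

        // The subscription name plays the role of a Kafka group.id; the broker
        // durably tracks this subscription's cursor for you.
        Consumer<String> consumer = client.newConsumer(Schema.STRING)
                .topic("persistent://public/default/orders")
                .subscriptionName("order-processor")
                // Like auto.offset.reset=earliest for a brand-new subscription;
                // the default is Latest (tail new messages only).
                .subscriptionInitialPosition(SubscriptionInitialPosition.Earliest)
                .subscribe();

        Message<String> msg = consumer.receive();

        // Individual ack: advances the cursor for this one message (the common case).
        consumer.acknowledge(msg);

        // Cumulative ack: acknowledges everything up to and including this message.
        // Only meaningful for Exclusive/Failover subscriptions, not Shared.
        // consumer.acknowledgeCumulative(msg);

        consumer.close();
        client.close();
    }
}
```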
In practice, individual ack is common and is what the Pulsar client does by default. Unacknowledged messages remain in the subscription backlog and will be redelivered later or to other consumers if possible. The backlog is essentially the number of messages pending acknowledgment – similar to the idea of “consumer lag” in Kafka (how many messages behind the tip of the log you are). You can view the backlog and other stats with pulsar-admin topics stats as we saw in ",[55,10536,10538],{"href":10537},"\u002Fblog\u002Fpulsar-newbie-guide-for-kafka-engineers-part-1-kafka---pulsar-cli-cheatsheet","the first blog",", which shows the subscription cursor position and backlog for each subscription. Pulsar will retain messages as long as there is at least one subscription that hasn’t acknowledged them (or until retention limits kick in). If a topic has no subscriptions or if all subscriptions have acknowledged a message, that message can be deleted (",[55,10541,10544],{"href":10542,"rel":10543},"https:\u002F\u002Fpulsar.apache.org\u002Fdocs\u002Fnext\u002Fconcepts-messaging\u002F#:~:text=By%20default%2C%20messages%20of%20a,see%20message%20retention%20and%20expiry",[264],"Pulsar doesn’t require a fixed retention period if messages are consumed, unless you’ve enabled time or size-based retention","). This is a key difference: Kafka stores messages for a time window regardless of consumption, whereas Pulsar by default can delete acknowledged data (making it more storage-efficient for queue use cases), while still allowing you to configure retention to keep data for replay if needed.",[48,10547,10548,10549,10553],{},"Another difference is how new consumers start consuming. In Kafka, when a new consumer group is created, it has a auto.offset.reset policy (earliest or latest). In Pulsar, when you subscribe to a topic with a new subscription name, you also choose where to start: by default it’s at the latest position (meaning you only get messages published from that point onwards), but you can specify -p Earliest (or subscriptionInitialPosition in the client API) to consume from the beginning of the topic’s backlog. We demonstrated this in our ",[55,10550,10552],{"href":10551},"\u002Fblog\u002Fpulsar-newbie-guide-for-kafka-engineers-part-1-kafka---pulsar-cli-cheatsheet#:~:text=consume%20messages%3A","first blog post"," by using -p Earliest for the console consumer to read existing messages. So remember, if you create a new subscription and want to replay from the start, specify the initial position accordingly; otherwise you might think “nothing is coming through” simply because by default it’s tailing new messages only.",[48,10555,10556],{},"In summary, think of a Pulsar subscription as the combination of Kafka’s consumer group concept and offset tracking mechanism, managed for you by Pulsar’s broker. Next, let’s explore the different subscription types that Pulsar offers – this is where Pulsar really shines in flexibility compared to Kafka.",[40,10558,10560],{"id":10559},"subscription-types-in-pulsar","Subscription Types in Pulsar",[48,10562,10563],{},"Pulsar has four subscription types: Exclusive, Failover, Shared, and Key_Shared. These define how messages are delivered when multiple consumers attach to the same subscription name on a topic. (If consumers use different subscription names, they’re completely isolated – each subscription sees all messages independently, as discussed.) 
By choosing a subscription type, you control whether a subscription behaves more like a traditional stream (one consumer getting all messages in order) or a queue (multiple consumers dividing up messages) or a blend of both. Let’s break down each type:",[32,10565,10567],{"id":10566},"exclusive","Exclusive",[48,10569,10570],{},"Exclusive is the default subscription type in Pulsar. As the name suggests, an Exclusive subscription only allows one consumer at a time to attach to all the partitions of a topic. If a second consumer tries to subscribe with the same subscription name while an active consumer is already attached, the broker will refuse the second consumer (the consumer will get an error indicating the subscription is already taken). This is akin to a Kafka consumer group with a single member (and Kafka would similarly not use a second member if there’s only one partition – it would just sit idle). Exclusive subscriptions guarantee that the entire topic’s messages go to one consumer, preserving the message order end-to-end, since no other consumer is concurrently receiving messages.",[48,10572,10573],{},"Because only one consumer can consume, Exclusive subscriptions aren’t about scaling out consumption; they are useful for strict ordering or when you truly only want one consumer processing a given stream of data. One common pattern is to use multiple exclusive subscriptions on the same topic to implement a pub-sub fan-out: for example, you might have two different services that need the data from topic X. You can have Service A use subscription “subA” (exclusive) and Service B use subscription “subB” (exclusive). Both services will get all messages from topic X independently (since they are on different subscriptions), each in order for themselves. This is exactly how Pulsar enables pub-sub – multiple exclusive subscriptions on the same topic – analogous to having multiple Kafka consumer groups reading the same topic. The difference is that in Pulsar, the broker tracks the cursor for each subscription and retains messages until each subscription acknowledges them, so you get durable pub-sub with one topic rather than duplicating data. In short, Exclusive = one consumer at a time, simplest model. It’s also the fallback: if you don’t specify a subscription type, you’ll get Exclusive by default.",[32,10575,10577],{"id":10576},"failover","Failover",[48,10579,10580],{},"Failover subscriptions allow multiple consumers to attach to the same subscription, but still only one consumer actively receives messages from a partition at any given time. The idea is to have a primary consumer and one or more backup consumers. If the primary (master) consumer disconnects or becomes unreachable, one of the backups is promoted to be the new primary and continues consuming from where the previous one left off. This provides high availability for consumption: if you have a critical processing pipeline, you can run a standby consumer that will take over automatically if the main consumer fails, minimizing downtime or data buildup.",[48,10582,10583],{},"How does Pulsar choose the primary and the failover order? By default, it’s based on the order in which consumers subscribe (or you can assign each consumer a priority level). The first consumer to attach becomes the master for the topic (or for each partition of the topic, if partitioned – more on that in a second). Second becomes the next in line, and so on. 
All consumers beyond the first are in a standby mode – they are connected and ready, but they do not receive messages while the master is active. They typically sit idle (Pulsar might send occasional heartbeats to them to know they’re alive, but no actual message traffic).",[48,10585,10586],{},"When the master consumer disconnects (or you deliberately close it, or it crashes), Pulsar will automatically start delivering messages to the next consumer in line. Any messages that were sent to the original consumer but not acknowledged will be redelivered to the new consumer as well, so no messages are lost. The newly promoted consumer continues from the last acknowledged position of the previous one, maintaining continuity. Message order is preserved under Failover because at any given time, each partition’s messages are processed by a single consumer. It’s similar to Exclusive in that sense (one-at-a-time consumption), except it permits a standby to take over instantly on failure. In fact, from an ordering standpoint, Exclusive and Failover are the same (strict ordering); the difference is Failover gives you redundancy.",[48,10588,10589],{},"A key detail for those coming from Kafka: with partitioned topics, Failover will assign the master role per partition. This means if you have a topic with 10 partitions and two consumers in a failover subscription, Pulsar will try to balance such that each consumer is master for some of the partitions. For example, consumer A might be primary for partitions 0-4 and consumer B for partitions 5-9 (the assignment is done by the broker, trying to even it out). In that case, both consumers are actually active simultaneously, but on different partitions. If one consumer dies, the other will take over all partitions. This behavior is analogous to Kafka’s consumer group rebalancing (each consumer gets some partitions). However, if the topic is non-partitioned (a single partition essentially), then only one consumer (the first or highest priority) gets all the messages, and the others truly get nothing until failover occurs. So, Failover mode can act both as a pure hot-standby (in single-partition topics) or as a load-sharing mechanism across partitions (in multi-partition topics). The main point remains: for each partition, one consumer is doing the work at a time. This guarantees order per partition and no duplicate processing. (If a new consumer with higher priority joins, it can even preempt and become the master for partitions, but that’s an edge scenario.)",[48,10591,10592],{},"In practice, you’d use Failover when you need reliability – e.g., you have a critical consumer and you want a backup to seamlessly continue if the primary fails. It’s common in scenarios where processing order matters but you also want quick failover for HA. If you tested this with the Pulsar CLI, you could do something like:",[321,10594,10595,10598],{},[324,10596,10597],{},"Terminal 1: pulsar-client consume -s mySub -t Failover -p Earliest -n 0 persistent:\u002F\u002Fpublic\u002Fdefault\u002Fmy-topic",[324,10599,10600],{},"Terminal 2: pulsar-client consume -s mySub -t Failover -p Earliest -n 0 persistent:\u002F\u002Fpublic\u002Fdefault\u002Fmy-topic",[48,10602,10603],{},"Both will connect. You then publish some messages (using pulsar-client produce). You’ll notice only one of the two terminals is printing the messages – that’s the master. If you stop Terminal 1 (the master), Terminal 2 will immediately start receiving any new messages. 
Any messages that Terminal 1 did not ack before it went down will be redelivered to Terminal 2 as well. This behavior confirms the failover: one active consumer at a time, automatic hand-off on failure. This is different from Kafka where if a consumer in a group dies, there is a rebalance delay and then other consumers resume partitions; Pulsar’s failover is near-instant for new messages because the standby is already connected and ready.",[48,10605,10606],{},"One caveat: if a failover happens at an awkward time, there is a possibility of a couple of messages being processed out of order or twice (for example, the old consumer got a batch of messages but crashed before acking some, and the new consumer might receive some of those messages again while the old one may have actually processed some before crashing). Pulsar’s documentation notes that in some cases you may see a duplicate or an out-of-order message around the switchover. But in general, failover mode is designed to hand off smoothly with minimal duplication.",[32,10608,10610],{"id":10609},"shared-round-robin","Shared (Round-Robin)",[48,10612,10613],{},"With a Shared subscription, multiple consumers can connect to the same subscription on a topic partition and receive messages concurrently. Unlike Exclusive\u002FFailover, where only one consumer gets all messages, in Shared mode the broker will round-robin dispatch messages to consumers. Each message from the topic goes to one of the consumers in the group (never to more than one), distributing the load. Effectively, this turns your topic + subscription into a work queue – multiple consumers are pulling from the same queue of messages, each handling different messages in parallel. This is great for scaling out processing: if one consumer instance isn’t fast enough to keep up with the topic’s throughput, you can add a second, third, etc., on the same subscription and Pulsar will spread the messages among them.",[48,10615,10616],{},"Because messages are distributed, ordering is not guaranteed across the subscription as a whole. If message A and then B are published, it’s possible A goes to Consumer 1 and B goes to Consumer 2, and Consumer 2 might process B before Consumer 1 processes A. There’s no coordination to preserve publish order in a Shared subscription – the goal is throughput and load balancing. If ordering is important, Shared might not be the right choice (or you’d need to ensure all related messages go to the same consumer, which is what Key_Shared is for). Shared subs also do not support cumulative ack (since each consumer may be at a different position in the stream, there’s no single “up to X” point that makes sense to ack collectively) – consumers should ack messages individually.",[48,10618,10619],{},"One of the big advantages of Shared mode is how it handles slow or stuck consumers. Since each message is delivered to one consumer at a time, if that consumer fails to acknowledge (maybe it died or is hanging), Pulsar can detect that (via ack timeouts or the TCP connection closing) and will redeliver those unacked messages to another consumer in the group. For example, if Consumer A received Message 5 but never acked it (maybe Consumer A crashed), after a timeout, Pulsar will requeue Message 5 and send it to Consumer B (assuming Consumer B is healthy). This ensures that a bad consumer doesn’t black-hole messages – the work will be picked up by someone else. 
Meanwhile, other messages that were sent to other consumers can continue being processed; one slow consumer doesn’t block the others. This contrasts with Kafka’s model where if a consumer in a group slows down on a partition, that partition’s consumption lags behind (since Kafka won’t hand those messages to a different consumer unless the first consumer is considered dead and a rebalance happens). Pulsar’s Shared mode provides a more dynamic load balancing: each message is assigned to a consumer, and if that consumer can’t handle it, it can be reassigned. This is why Pulsar can achieve true queue semantics on a stream. It’s very much like how a RabbitMQ work queue would behave – many consumers pulling tasks off a queue, each ACKing tasks as done, and the system requeuing unacked tasks if a worker goes away.",[48,10621,10622],{},"In terms of usage, Shared subscriptions are ideal when you have independent messages and you want to maximize parallel processing. If ordering doesn’t matter (or you only care about per-message handling, not sequence), use Shared to scale out. For example, imagine a thumbnail generation service where each message is “generate a thumbnail for image X”. The order doesn’t matter at all – you just want to process as many as possible in parallel. A Pulsar topic with a Shared subscription and many consumers allows you to spin up N workers and they’ll automatically load balance the tasks. Each consumer will acknowledge as it finishes a message; the subscription’s cursor advances per message as a result of those acks (the cursor essentially will mark the message as consumed when acked, but since messages might be out-of-order, the cursor might have holes – which is fine, those holes are the backlog of unacked messages). The Pulsar admin stats will show how many messages are in backlog (i.e., not yet acked). In a healthy steady state, backlog stays near zero as consumers keep up; if consumers fall behind, backlog grows (like a queue depth). You can always add more consumers to that subscription to catch up if needed – Pulsar will incorporate them and start sharing messages with the new consumers immediately.",[32,10624,10626],{"id":10625},"key_shared","Key_Shared",[48,10628,10629],{},"Key_Shared is the newest addition (relative to others) to Pulsar’s subscription types. It’s like an enhanced version of Shared that strikes a balance between ordering and parallelism. In a Key_Shared subscription, multiple consumers can attach and all will receive messages, but messages that share the same key will always go to the same consumer. In other words, Pulsar will hash or map message keys to specific consumers, and ensure that the order of messages for each key is preserved on that consumer. If that consumer disconnects, the messages for that key will be routed to another consumer, but always in a way that maintains the ordering from the last acked message onwards (Pulsar will not suddenly deliver older unacked messages of that key to a new consumer out of order).",[48,10631,10632],{},"This mode is extremely useful when your messages have some natural key (like user ID, or order ID, or device ID) and you want to ensure all messages for that entity are processed in order, but you don’t care about ordering across different entities. Kafka achieves something similar by requiring you to put all messages for an entity on the same partition – which then ties parallelism to partition count. 
Pulsar’s Key_Shared does this dynamically with consumers: you could have a single topic (single partition if you want) and still scale out consumption by key. The broker handles the assignment of keys to consumers. In fact, if you add more consumers, Pulsar can redistribute the hash ranges of keys among them automatically. If a consumer leaves, its key range is taken over by others. This all happens behind the scenes, giving you the effect of partitioning without manual partition management.",[48,10634,10635],{},"From the application perspective, Key_Shared means: “I have multiple consumers, but I want to ensure no two consumers ever process the same key’s messages concurrently or out of order.” It provides ordering per key and load-balancing across keys. A classic use case might be an event stream where events are tagged with a customer ID and you want per-customer ordering (maybe to avoid race conditions updating a customer’s state), but you also want to process different customers in parallel. With Kafka, you’d need as many partitions as you have parallelism (and all messages for a customer must go to the same partition). With Pulsar Key_Shared, you can spin up multiple consumers for a topic and Pulsar will ensure messages with the same key always go to the same consumer. For example, imagine tracking user activity where each message has a user_id as the key: With 10 consumers: All events for key:\"user_789\" will consistently go to the same consumer (let's say Consumer #3). Other users like key:\"user_456\" and key:\"user_123\" will each be consistently routed to their own assigned consumers. When you scale to 20 consumers: key:\"user_789\" might get reassigned to Consumer #7, but all their events will still go to just that one consumer. This gives you parallel processing across different users while maintaining strict ordering per individual user. The key-to-consumer assignment is handled automatically by Pulsar's hash-based distribution. This is handled by one of the available key distribution strategies (like auto-split ranges or consistent hashing), but you usually don’t need to worry about the exact algorithm as a user – just know that it balances keys.",[48,10637,10638],{},"In summary, Key_Shared = like Shared (multiple consumers in parallel), but with ordering guaranteed on a per-key basis. It’s the best of both worlds for many scenarios, giving you scaling with correctness. Key_Shared is often recommended when your use case can leverage message keys to delineate order boundaries – for instance, any stateful processing per entity should use Key_Shared if you want to scale out that processing. If ordering doesn’t matter at all, plain Shared is fine; if global order matters, you’d stick to Exclusive\u002FFailover. Key_Shared fills the gap of “order matters per entity, but not globally.”",[32,10640,10641],{"id":9825},"Putting it All Together",[48,10643,10644],{},"The beauty of Pulsar is that you can mix and match these subscription types to fit your needs, even on the same topic. For example, you could have one subscription on a topic using Key_Shared with 5 consumers processing events in parallel, and another subscription on the same topic using Exclusive to feed a separate system that needs the full ordered stream. The publisher only writes the message once, but Pulsar can deliver it in multiple ways to different subscribers. 
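Here's a rough sketch of that mix-and-match in the Java client — two subscriptions on one topic, one Key_Shared for parallel keyed workers and one Exclusive for a fully ordered feed. Names and URLs are illustrative, and in practice each consumer would run in its own process:

```java
import org.apache.pulsar.client.api.*;

public class MixedSubscriptions {
    public static void main(String[] args) throws Exception {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650")   // placeholder broker URL
                .build();

        String topic = "persistent://public/default/user-activity";

        // Subscription 1: Key_Shared workers processing events in parallel,
        // with per-key (e.g. per-user) ordering preserved. Start several of these.
        Consumer<String> keyedWorker = client.newConsumer(Schema.STRING)
                .topic(topic)
                .subscriptionName("activity-workers")
                .subscriptionType(SubscriptionType.Key_Shared)
                .subscribe();

        // Subscription 2: an Exclusive consumer on the same topic that receives
        // every message in publish order, e.g. to feed a downstream audit system.
        Consumer<String> auditStream = client.newConsumer(Schema.STRING)
                .topic(topic)
                .subscriptionName("audit-stream")
                .subscriptionType(SubscriptionType.Exclusive)
                .subscribe();

        // Publishers set a key so Key_Shared can route all of a user's events
        // to the same worker.
        Producer<String> producer = client.newProducer(Schema.STRING).topic(topic).create();
        producer.newMessage().key("user_789").value("login").send();

        producer.close();
        keyedWorker.close();
        auditStream.close();
        client.close();
    }
}
```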
This is something not easily done in Kafka without duplicating data or using external systems – Pulsar’s design cleanly separates the publish side from the subscribe side through these named subscriptions.",[48,10646,10647],{},"To reinforce these concepts, it’s worth comparing with Kafka’s approach:",[321,10649,10650,10653,10656],{},[324,10651,10652],{},"In Kafka, if you want to do pub-sub (fan-out) you typically create multiple consumer groups. In Pulsar, you create multiple subscriptions (which is effectively the same idea). Each subscription has its own cursor and backlog.",[324,10654,10655],{},"In Kafka, if you want a work-queue pattern, you’d create a consumer group with multiple consumers. Kafka will then assign partitions to consumers (can’t have more consumers than partitions effectively) and you get parallelism at the partition level, but strict ordering within each partition. If one message in a partition is slow or causes an error, it blocks everything behind it in that partition until it’s handled or skipped. In Pulsar, for work-queue, you use a Shared subscription on a topic (which could even be a single partition topic). You get parallelism per message, not just per partition, and a slow message doesn’t block others – it can be retried elsewhere while other messages still flow to other consumers. This is a major difference in the consumption model and is one of Pulsar’s key advantages for certain workloads.",[324,10657,10658],{},"Key_Shared doesn’t really have a direct equivalent in Kafka. Kafka would require you partition by key to get key-ordering, but that then ties you to a static number of partitions and possibly uneven key distribution. Pulsar’s Key_Shared is more flexible and dynamic in that regard (you can increase consumers on the fly and it will redistribute keys, whereas Kafka partition count is fixed once topic is created, unless you manually add partitions which is a heavyweight operation and can cause ordering issues of its own for existing keys).",[40,10660,2125],{"id":2122},[48,10662,10663],{},"In this part, we corrected and clarified Pulsar’s subscription and consumer mechanics. We learned that Pulsar’s subscription name is analogous to Kafka’s consumer group – it’s how Pulsar tracks a consumer or group of consumers reading a topic. Pulsar’s broker maintains a subscription cursor for each subscription to know which messages have been acknowledged (processed), ensuring durability and allowing consumers to pick up where they left off after disconnects. We also reviewed the four subscription types in Pulsar and how they map to messaging patterns:",[321,10665,10666,10669,10672,10675],{},[324,10667,10668],{},"Exclusive: Single-consumer, like Kafka only allows one consumer (ensures total order).",[324,10670,10671],{},"Failover: Single active consumer with standby failovers, same as Kafka (ensures order, with quick failover on consumer loss).",[324,10673,10674],{},"Shared: Multiple consumers, competing for messages in a queue-like fashion (higher throughput via parallelism, no overall order guarantee, built-in replay of unacked messages).",[324,10676,10677],{},"Key_Shared: Multiple consumers with ordering per key (best for parallel processing when per-key order matters, effectively combining ordering and load balancing).",[48,10679,10680],{},"Pulsar gives you the freedom to use the right tool for the job – or even use both at the same time on the same data. You can have stream processing and queue processing co-exist on one topic through different subscriptions. 
This flexibility is one of the reasons Kafka engineers find Pulsar intriguing: it’s like having Kafka and RabbitMQ in one system. By leveraging subscription types, you can implement complex messaging workflows without deploying multiple platforms.",[48,10682,10683],{},"Now that you understand subscriptions and consumers in Pulsar, you’re well-equipped to build systems that take advantage of Pulsar’s dual nature of streaming and queuing. In the next part of the Pulsar Newbie Guide, we’ll continue our journey (stay tuned!). Meanwhile, feel free to experiment with subscription settings in a test environment to solidify your understanding – Pulsar’s CLI and admin tools make it easy to observe how messages flow under each mode. Happy Pulsar-ing!",[48,10685,3931],{},[208,10687],{},[48,10689,3931],{},[48,10691,8956,10692,8960],{},[55,10693,5405],{"href":6135,"rel":10694},[264],[48,10696,8963],{},[48,10698,10699],{},[55,10700,8970],{"href":8968,"rel":10701},[264],[48,10703,10704],{},[55,10705,8976],{"href":7969,"rel":10706},[264],[48,10708,3931],{},{"title":18,"searchDepth":19,"depth":19,"links":10710},[10711,10712,10719],{"id":10518,"depth":19,"text":10519},{"id":10559,"depth":19,"text":10560,"children":10713},[10714,10715,10716,10717,10718],{"id":10566,"depth":279,"text":10567},{"id":10576,"depth":279,"text":10577},{"id":10609,"depth":279,"text":10610},{"id":10625,"depth":279,"text":10626},{"id":9825,"depth":279,"text":10641},{"id":2122,"depth":19,"text":2125},"2025-08-29","Part 4 of the Pulsar Newbie Guide for Kafka Engineers explores how Apache Pulsar handles subscriptions and consumers—its equivalent to Kafka consumer groups. Learn how Pulsar’s four subscription types (Exclusive, Failover, Shared, and Key_Shared) enable both streaming and queuing patterns, offering greater flexibility, scalability, and ordering guarantees than Kafka alone.","\u002Fimgs\u002Fblogs\u002F68b1b4c6fd73cfc228f21a9a_04.-Subscriptions-&-Consumers-1.png",{},"\u002Fblog\u002Fpulsar-newbie-guide-for-kafka-engineers-part-4-subscriptions-consumers",{"title":10507,"description":10721},"blog\u002Fpulsar-newbie-guide-for-kafka-engineers-part-4-subscriptions-consumers",[821,7347,799],"xFFXz9-9vQh-d1SeFydlulcqAzWcUuDMPKgq1zzwzgY",{"id":10730,"title":10731,"authors":10732,"body":10733,"category":821,"createdAt":290,"date":11035,"description":11036,"extension":8,"featured":294,"image":11037,"isDraft":294,"link":290,"meta":11038,"navigation":7,"order":296,"path":11039,"readingTime":5505,"relatedResources":290,"seo":11040,"stem":11041,"tags":11042,"__hash__":11044},"blogs\u002Fblog\u002Fqueues-are-just-subscriptions-demystifying-shared-and-failover-modes.md","Queues Are Just Subscriptions: Demystifying Shared and Failover Modes (Pulsar Guide for RabbitMQ\u002FJMS Engineers 4\u002F10)",[808,809,810],{"type":15,"value":10734,"toc":11026},[10735,10738,10741,10745,10748,10762,10765,10773,10777,10780,10783,10786,10800,10803,10811,10814,10819,10822,10825,10842,10845,10848,10852,10855,10858,10861,10864,10867,10870,10881,10884,10887,10890,10895,10898,10901,10904,10907,10911,10914,10917,10920,10924,10927,10938,10941,10952,10956,10959,10964,10967,10970,10972,10995,10998,11001,11003,11005,11007,11012,11014,11019,11024],[48,10736,10737],{},"‍TL;DR:",[48,10739,10740],{},"In Pulsar, you don’t create a “queue” – you create a subscription. By having multiple consumers share the same subscription, Pulsar will distribute messages among them (just like multiple consumers on a RabbitMQ queue). 
This is Pulsar’s Shared subscription mode, which provides load-balanced consumption. For high availability (active-passive consumers), Pulsar offers Failover subscriptions, where one consumer is active and others stand by to take over on failure. This post explains how these subscription modes work, how they correspond to the traditional queue semantics, and when to use each. After reading, you’ll understand that to Pulsar, a queue is essentially a named subscription with possibly many consumers attached, and how Pulsar manages who gets what message in different scenarios.",[40,10742,10744],{"id":10743},"revisiting-pulsar-subscriptions-vs-queues","Revisiting Pulsar Subscriptions vs Queues",[48,10746,10747],{},"We established earlier that Pulsar uses the concept of a subscription to handle what we think of as queues. A subscription represents a group of consumers with a given name on a topic. If one consumer is attached to that subscription, it will get all messages (and ordering is preserved). If multiple consumers attach to the same subscription, Pulsar must decide how to split messages between them. Pulsar offers a few policies (subscription types) to govern this distribution. The two most relevant for “queuing” are Shared and Failover (there is also Exclusive, which is the default single-consumer case, and Key_Shared, which we will cover in a later post about ordering).",[321,10749,10750,10753,10756,10759],{},[324,10751,10752],{},"Exclusive subscription: Only one consumer can attach at a time. If another tries, it gets an error. This is essentially a 1-to-1 mapping (like a JMS Queue that only one consumer can have at a time, or a JMS durable topic subscription being consumed by only one process). Exclusive is Pulsar’s default type; it ensures strict ordering and simplest semantics (no concurrency).",[324,10754,10755],{},"Shared subscription: Multiple consumers can attach; Pulsar will round-robin or load-balance messages across them. This is the true “competing consumers” setup akin to multiple consumers on a RabbitMQ queue or multiple listeners on the same JMS Queue.",[324,10757,10758],{},"Failover subscription: Multiple consumers can attach, but one is designated as the primary (active consumer) and receives all messages. Others sit idle (or if the topic is partitioned, each partition might have a primary on different consumers). If the primary dies or disconnects, one of the backups takes over and continues from where it left off. This provides high availability without duplicate processing.",[324,10760,10761],{},"Key_Shared subscription: A variant of shared where messages with the same key always go to the same consumer, to preserve ordering per key. 
This one combines aspects of parallelism and ordering and will be discussed in the context of ordering in one of our future blog posts.",[48,10763,10764],{},"For this discussion, focus on Shared vs Failover, as they essentially cover the two major ways you might use a queue in a system:",[321,10766,10767,10770],{},[324,10768,10769],{},"Shared: for scaling out throughput (many workers share the load of a queue).",[324,10771,10772],{},"Failover: for hot standby (only one worker at a time, but seamlessly switch if it fails).",[40,10774,10776],{"id":10775},"shared-subscription-pulsars-competing-consumers","Shared Subscription: Pulsar’s Competing Consumers",[48,10778,10779],{},"When you create a subscription and attach multiple consumers to it in Shared mode (also called “round-robin” mode in some Pulsar documentation), the broker will deliver each message from the topic to only one of the consumers on that subscription, distributing messages in a round-robin or weighted round-robin manner. Essentially, the subscription behaves like a classic queue – each message goes to one consumer – and multiple consumers means parallel processing of different messages.",[48,10781,10782],{},"Analogy: This is just like having a single RabbitMQ queue with multiple consumers. RabbitMQ will deliver each message in the queue to one consumer, fairly balancing by prefetch, etc. Pulsar does the same for a shared sub: each message in the subscription backlog is given to one of the available consumers.",[48,10784,10785],{},"Some points about Shared subscription behavior:",[321,10787,10788,10791,10794,10797],{},[324,10789,10790],{},"If a consumer hangs (doesn’t ack a message), that message will eventually be re-dispatched to another consumer (via ack timeout or if the consumer disconnects). So work isn’t lost; another consumer can pick it up, similar to how RabbitMQ requeues unacked messages if a consumer dies.",[324,10792,10793],{},"If a new consumer joins a shared subscription, the broker will start including it in the distribution of new messages. If a consumer leaves, the broker redistributes any unacked messages that were on that consumer to others.",[324,10795,10796],{},"Order is not guaranteed across different consumers in a shared subscription. If you care about ordering, you either stick to one consumer (Exclusive or Failover) or use Key_Shared (ensures ordering per key). Shared basically is aimed at throughput scaling at the cost of ordering.",[324,10798,10799],{},"Shared subscriptions support parallel processing: Each consumer can process messages independently. If one is slow, others still get new messages. This can maximize throughput.",[48,10801,10802],{},"Use cases for Shared:",[321,10804,10805,10808],{},[324,10806,10807],{},"Work queues: e.g., tasks that can be processed in parallel (transcoding jobs, sending emails, etc.). You create one subscription (like “task-queue”), spin up N consumer instances with that subscription name, and Pulsar will divide tasks among them – voila, a distributed work queue.",[324,10809,10810],{},"Scaling message consumption: If one consumer can’t keep up with the topic’s message rate, add more consumers on the same subscription to increase aggregate throughput.",[48,10812,10813],{},"How to create a Shared subscription:\nWhen using the Pulsar client API, you specify subscription type Shared. 
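In copy-pasteable form, a worker built on a Shared subscription looks roughly like this — an illustrative sketch with placeholder names and broker URL; the screenshot below shows the same idea in context:

```java
import org.apache.pulsar.client.api.*;

public class SharedQueueWorker {
    public static void main(String[] args) throws Exception {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650")   // placeholder broker URL
                .build();

        // Every worker instance uses the same subscription name; Shared mode
        // makes them competing consumers on the same "queue".
        Consumer<String> worker = client.newConsumer(Schema.STRING)
                .topic("persistent://public/default/task-queue")
                .subscriptionName("task-queue")
                .subscriptionType(SubscriptionType.Shared)
                .subscribe();

        while (true) {
            Message<String> task = worker.receive();
            try {
                System.out.println("processing: " + task.getValue());
                worker.acknowledge(task);                 // done: removed from the backlog
            } catch (Exception e) {
                worker.negativeAcknowledge(task);         // failed: redeliver, possibly to another worker
            }
        }
    }
}
```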
For instance:",[48,10815,10816],{},[384,10817],{"alt":5878,"src":10818},"\u002Fimgs\u002Fblogs\u002F68b09173123f71a994361f6b_iShot_2025-08-29_01.26.45.png",[48,10820,10821],{},"If you omit subscriptionType, it defaults to Exclusive (only one consumer at a time). So explicitly set it to Shared if you plan to attach multiple consumers. All consumers should use the same subscription name and same topic obviously.",[48,10823,10824],{},"Under the hood, what’s happening is:",[321,10826,10827,10830,10833,10836,10839],{},[324,10828,10829],{},"The first consumer to subscribe with that name will create the subscription on the broker.",[324,10831,10832],{},"Additional consumers join that existing subscription. The broker keeps track of all active consumers for the sub.",[324,10834,10835],{},"For each message, the broker chooses a consumer (basically cycling through them) and sends the message to that consumer.",[324,10837,10838],{},"The subscription’s backlog is decreased when a message is acknowledged by a consumer.",[324,10840,10841],{},"If a consumer disconnects with messages unacked, those messages will be re-dispatched to remaining consumers.",[48,10843,10844],{},"From a RabbitMQ perspective, you can consider Pulsar’s topic as analogous to an exchange+queue that all consumers draw from. The difference: you didn’t have to explicitly declare a queue and bind – you simply used a subscription name. Pulsar took care of tracking the offsets for that subscription.",[48,10846,10847],{},"JMS perspective: JMS 2.0 introduced the idea of a Shared Durable Subscription for topics, which allowed multiple consumers on the same durable subscription (to load balance topic messages). That’s quite analogous to Pulsar’s shared subscription on a topic. For JMS Queues, multiple consumers inherently share the queue. So Pulsar’s Shared subscription is fulfilling the role of both these concepts: a group of consumers sharing the work of one message stream.",[40,10849,10851],{"id":10850},"failover-subscription-high-availability-consumer","Failover Subscription: High Availability Consumer",[48,10853,10854],{},"In a Failover subscription, multiple consumers can attach, but Pulsar will only deliver messages to one “primary” consumer at a time. If that consumer disconnects or times out, Pulsar automatically fails over to the next consumer in line, which will then start receiving messages from where the previous one left off.",[48,10856,10857],{},"Think of Failover as an active-standby cluster. One consumer is doing all the work until it can’t, then a standby takes over. This is useful when you have a service where only one instance should be active (maybe because processing must be single-threaded or use a resource exclusively), but you want a hot backup to take over instantly on failure. It’s also useful to implement ordered processing with high availability – you want only one consumer at a time to preserve ordering, but still want fault tolerance.",[48,10859,10860],{},"How Pulsar picks the primary: When consumers connect in Failover mode, they have a priority (you can set a priority level, default often 0, and an internal lexicographical order on consumer names as a tiebreaker). The broker will choose the highest priority consumer as primary. If equal priority, the one that connected first (or lexicographically smallest name, depending on version) becomes primary. Others are essentially parked.",[48,10862,10863],{},"For a non-partitioned topic, it’s straightforward: Primary consumer gets 100% of messages. 
If it dies, next in line gets all new messages (and any that were unacked by the first consumer will be delivered to the new primary). For partitioned topics, each partition has its own primary assignment; the broker might distribute partitions among consumers if you have multiple partitions, but within each partition only one consumer is active. (This means in a failover subscription with a partitioned topic, you could actually have all consumers active – but each on different partitions. It’s more complex, but effectively it’s like each partition is a mini-topic with failover selection, possibly balancing partition primaries across consumers, as described in the docs.)",[48,10865,10866],{},"Behavior on failover: Let’s say Consumer A is primary, Consumer B is standby. A is chugging along. Suddenly A’s process crashes or network breaks. Pulsar notices A’s connection is gone; it then promotes B to primary. B will start receiving any messages that were next. If A had some messages in flight (unacked) when it died, those will become available to B (after what is effectively an immediate redelivery – Pulsar doesn’t wait for ack timeouts on failover; once A is gone, its unacked messages are free to deliver to B). This ensures minimal interruption – B can continue processing the queue almost where A left off.",[48,10868,10869],{},"Use cases for Failover:",[321,10871,10872,10875,10878],{},[324,10873,10874],{},"Situations where you want only one consumer to handle messages, perhaps because processing should not be parallel (maybe a legacy system can’t handle concurrent processing, or order must be absolutely preserved end-to-end).",[324,10876,10877],{},"High availability: e.g., a singleton service that you run two instances of for redundancy, but only one should actually do work at a time. If the active one fails, the backup seamlessly takes over.",[324,10879,10880],{},"Think of an example: processing bank transactions from a topic. You might decide to use one consumer to ensure they are strictly sequential (no parallel processing that could reorder things), but you want a standby instance in case the main one goes down, so you’re not stuck waiting for manual intervention. Failover subscription is perfect here.",[48,10882,10883],{},"Comparison to RabbitMQ: RabbitMQ doesn’t have an explicit “failover consumer” concept. If you wanted active-passive, you might just run one consumer at a time. If it dies, something else would have to start consuming. With HA queues (mirrored queues in RabbitMQ), multiple nodes have the data, but only one node’s consumers actually consume at a time. Achieving seamless failover of consumption is typically done at application level for Rabbit (like using heartbeats to detect a dead consumer and then starting another consumer). Pulsar builds that logic in – you can start two consumers and know one will be idle until needed.",[48,10885,10886],{},"Comparison to JMS: JMS does not have “failover subscriptions” per se, but many JMS brokers would effectively behave similarly if you have multiple consumers on a queue – one might get all messages if it has a higher priority or some brokers allow exclusive consumer concept. For example, ActiveMQ has an exclusive consumer feature for queues: one consumer gets all messages until it dies, then another takes over. 
Pulsar’s failover is akin to that but at the subscription level.",[48,10888,10889],{},"How to use Failover:",[48,10891,10892],{},[384,10893],{"alt":5878,"src":10894},"\u002Fimgs\u002Fblogs\u002F68b091aec08d2c1d210258d3_iShot_2025-08-29_01.28.04.png",[48,10896,10897],{},"And similarly on another instance with consumerName “Consumer-B”. (Consumer names aren’t mandatory but help in logging and also serve as tie-breakers for ordering sometimes.)",[48,10899,10900],{},"If you want to designate priority, there’s subscriptionInitialPosition (to set where to start) and, more relevantly, ConsumerBuilder.priorityLevel(int) to give one consumer a higher priority. Higher-priority consumers always take precedence in failover, and note that a lower number means a higher priority (0 is the maximum). If you set one consumer with priority level 0 and another with 1, the one with 0 will always be chosen while it is connected; the other is basically ignored until the priority-0 consumer is gone. By default, all consumers have priority 0, so the broker falls back to the tie-breaker (consumer name ordering, or effectively whoever connects first).",[48,10902,10903],{},"To test failover, you can simulate a failure by killing the primary consumer – you should see the second consumer’s receive() calls start returning messages. (A minimal Java sketch of this two-consumer setup appears below, after the discussion of cursors and partitions.)",[48,10905,10906],{},"One thing to mention: in failover mode, only one consumer receives messages at a time, so you’re not scaling throughput here, just providing redundancy. If you attach two consumers because you thought it would speed things up – it won’t, because only one is active. For throughput scaling, use Shared.",[40,10908,10910],{"id":10909},"under-the-hood-cursor-and-partition-details","Under the Hood: Cursor and Partition details",[48,10912,10913],{},"Each subscription (regardless of type) has a cursor – essentially a pointer in the partition log that tracks how far along consumption has gone. In a shared subscription, the cursor moves as messages are acknowledged (which can happen out of order if multiple consumers acknowledge at different times; the cursor might actually mark the lowest acked point plus bitmaps of acked\u002Funacked above that – but that’s an internal detail). In an exclusive or failover subscription, since only one consumer is doing sequential work, the cursor just moves sequentially with acks (or cumulatively).",[48,10915,10916],{},"For failover, when primary switches to secondary, the same subscription cursor is now being consumed by the new consumer. It picks up wherever the cursor last was. Any messages that were delivered to the first consumer but not acked are still marked unacked in the subscription, so the broker knows to redeliver them to the new consumer.",[48,10918,10919],{},"Partitioned topics and failover:\nIf the topic has multiple partitions (say 4 partitions), and you have two consumers in failover, the broker could assign half the partitions to one consumer as primary for those, and the other half to the second consumer as primary for those, in order to utilize both consumers (this is optional and based on how priorities or names are sorted). This way, even in failover mode, both consumers might actually be active – but each on different partitions – so you get some throughput scaling too. If one consumer fails, the other takes over all partitions. This is a neat Pulsar nuance: failover subs can give you a mix of HA and load-spreading across partitions. 
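To make the failover setup described above concrete, here is a minimal sketch with the Pulsar Java client (the service URL, topic, subscription, and consumer names are illustrative; recall that for priorityLevel, a lower value means higher priority):

```java
import org.apache.pulsar.client.api.Consumer;
import org.apache.pulsar.client.api.Message;
import org.apache.pulsar.client.api.PulsarClient;
import org.apache.pulsar.client.api.SubscriptionType;

public class FailoverConsumer {
    public static void main(String[] args) throws Exception {
        // Run one instance as "Consumer-A" and another as "Consumer-B",
        // e.g. java FailoverConsumer Consumer-B
        String name = args.length > 0 ? args[0] : "Consumer-A";

        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650")        // illustrative service URL
                .build();

        Consumer<byte[]> consumer = client.newConsumer()
                .topic("persistent://public/default/transactions")   // illustrative topic
                .subscriptionName("txn-processor")
                .subscriptionType(SubscriptionType.Failover)
                .consumerName(name)   // useful for logging and as a tie-breaker
                .priorityLevel(0)     // 0 is the highest priority; give the preferred primary a lower value than the standby
                .subscribe();

        // Both instances run this loop, but only the current primary actually
        // receives messages; the standby's receive() blocks until failover.
        while (true) {
            Message<byte[]> msg = consumer.receive();
            // ... process in strict order ...
            consumer.acknowledge(msg);
        }
    }
}
```

With a partitioned topic, both instances could end up active at the same time – each as the primary for a different set of partitions – which is exactly the nuance noted above.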
However, if you want strict ordering across the whole topic, you wouldn’t partition the topic in the first place.",[40,10921,10923],{"id":10922},"summing-up-shared-vs-failover-vs-exclusive","Summing Up Shared vs Failover vs Exclusive",[48,10925,10926],{},"It’s helpful to summarize with an analogy:",[321,10928,10929,10932,10935],{},[324,10930,10931],{},"Exclusive: One cashier at a store, one line of customers. If that cashier is out, the store is closed until a new one arrives.",[324,10933,10934],{},"Failover: Two cashiers are present, but only one’s counter is open and taking customers; the other is in the back room on standby. If the first one has to step away, the second immediately opens their counter and continues serving the line. Customers always see exactly one open counter, so they go in order to that one.",[324,10936,10937],{},"Shared: Two (or more) cashiers actively open, each handling their own line (or a shared line that dispatches customers to them). Customers (messages) get assigned to whichever cashier is free next (round-robin). This way, customers are served faster in parallel, but they are not in one single ordered line – effectively each cashier has their portion of the load.",[48,10939,10940],{},"From RabbitMQ perspective:",[321,10942,10943,10946,10949],{},[324,10944,10945],{},"Exclusive = similar to having a single consumer on a queue.",[324,10947,10948],{},"Failover = no direct analog managed by Rabbit, but you could simulate by ensuring only one consumer connects (and use client-side logic to failover).",[324,10950,10951],{},"Shared = typical multiple consumers on a queue scenario.",[40,10953,10955],{"id":10954},"setting-it-in-pulsar-and-best-practices","Setting it in Pulsar and Best Practices",[48,10957,10958],{},"If you’re configuring via pulsar-client CLI for a quick test:",[321,10960,10961],{},[324,10962,10963],{},"There’s a flag for subscription type (-t Exclusive|Shared|Failover|Key_Shared). e.g., pulsar-client consume my-topic -s subName -t Shared -n 0 will allow multiple instances of that command to share messages.",[48,10965,10966],{},"A note on message ordering and Shared subscriptions: When using Shared, since ordering isn’t guaranteed, be mindful if message order matters to your application’s logic. If it does, you either need to include ordering info in the message (like a sequence number and have the consumer sort or detect out-of-order), or avoid parallel consumption for those streams, or use Key_Shared to at least order per key. Many use-cases (like processing independent tasks) don’t need global ordering, so Shared is fine.",[48,10968,10969],{},"A note on failover with multiple partitions: If you truly want a single consumer to handle all messages in order, don’t partition the topic (a single topic is single-partition by default). If you have a partitioned topic and want to use failover, be aware multiple consumers might each handle different partitions simultaneously. If that’s not desired, stick to 1 partition.",[40,10971,8924],{"id":8923},[321,10973,10974,10977,10980,10983,10986,10989,10992],{},[324,10975,10976],{},"“Queues” in Pulsar are achieved by shared subscriptions: To have a queue with multiple workers, simply create a subscription and start multiple consumers with that subscription name and SubscriptionType.Shared. Pulsar will load-balance the messages across them. 
You don’t create a separate queue object – the act of consumers sharing the subscription is what creates the queue-like behavior.",[324,10978,10979],{},"Failover subscriptions provide exclusive consumption with automatic failover: Only one consumer receives messages until it fails, then the next takes over. Use this for scenarios requiring a single consumer processing stream with high availability.",[324,10981,10982],{},"Exclusive vs Failover vs Shared: Exclusive (the default) ensures only one consumer – any additional consumer with the same subscription name is rejected. Failover allows standby consumers. Shared allows concurrent consumers. Choose based on your needs: concurrency vs ordering vs HA.",[324,10984,10985],{},"The subscription name is the queue name: If you connect 5 consumers to topic “alpha” with subscription “orders-sub” in Shared mode, those 5 effectively form the “orders-sub” queue group for topic alpha. If you had another subscription “billing-sub” on the same topic, that’s a separate queue group (receiving its own copy of messages, like a separate RabbitMQ queue bound to the same exchange).",[324,10987,10988],{},"JMS and Rabbit equivalence: Pulsar Shared subs = JMS queue with multiple consumers, or JMS shared durable subscription; Pulsar Failover subs = JMS exclusive consumer or concept of exclusive queue consumption with automatic handover (not natively in JMS, but ActiveMQ’s exclusive consumer feature is similar). From Rabbit’s view, a Pulsar shared subscription is just like how Rabbit distributes messages to multiple consumers on one queue.",[324,10990,10991],{},"No manual ack requeue hassle: In RabbitMQ, if a consumer didn’t ack, you either had to rely on the death of consumer for requeue or use basic.nack. In Pulsar’s shared subscription, if a consumer disconnects or negative-acks, Pulsar will readily redeliver unacknowledged messages to another consumer. So the queue processing will continue. This is managed by Pulsar’s subscription state.",[324,10993,10994],{},"Queues are durable via subscriptions: Because Pulsar subscriptions retain messages until acked, a shared subscription with zero consumers still holds messages (like a durable queue would) and any new consumer that comes in will get them. That’s similar to a RabbitMQ durable queue sitting around until a consumer attaches. Pulsar does that automatically – the subscription (queue) exists as long as it has messages or a consumer attached. (You can administratively delete a subscription if needed, analogous to deleting a queue.)",[48,10996,10997],{},"To wrap up, Pulsar’s subscription model might have seemed abstract at first, but now we see it maps cleanly to queue semantics. “Shared” is what gives Pulsar the power to act like a traditional queue system for distributing tasks. Meanwhile, “Failover” and “Exclusive” cover scenarios requiring strict ordering or single-consumer behavior.",[48,10999,11000],{},"In the next post, we’ll talk about message durability, retention, expiration, and dead-letter topics. In other words, what happens when messages aren’t consumed, how to not lose them, and how Pulsar handles things like TTL and DLQs compared to RabbitMQ’s similar features. 
If you’ve ever set a queue to expire messages or configured a Dead Letter Exchange in RabbitMQ, the Pulsar way of achieving that is coming up next!",[48,11002,3931],{},[208,11004],{},[48,11006,3931],{},[48,11008,8956,11009,8960],{},[55,11010,5405],{"href":6135,"rel":11011},[264],[48,11013,8963],{},[48,11015,11016],{},[55,11017,8970],{"href":8968,"rel":11018},[264],[48,11020,11021],{},[55,11022,8976],{"href":7969,"rel":11023},[264],[48,11025,3931],{},{"title":18,"searchDepth":19,"depth":19,"links":11027},[11028,11029,11030,11031,11032,11033,11034],{"id":10743,"depth":19,"text":10744},{"id":10775,"depth":19,"text":10776},{"id":10850,"depth":19,"text":10851},{"id":10909,"depth":19,"text":10910},{"id":10922,"depth":19,"text":10923},{"id":10954,"depth":19,"text":10955},{"id":8923,"depth":19,"text":8924},"2025-08-28","Explore how Apache Pulsar implements queues through subscriptions. Learn the differences between Shared, Failover, and Exclusive subscription modes, their use cases for load balancing, high availability, and message ordering, and how Pulsar maps traditional RabbitMQ and JMS queue semantics to modern streaming patterns.","\u002Fimgs\u002Fblogs\u002F68b093978d9ff520a3aa4955_04.-Queues-Are-Just-Subscriptions.png",{},"\u002Fblog\u002Fqueues-are-just-subscriptions-demystifying-shared-and-failover-modes",{"title":10731,"description":11036},"blog\u002Fqueues-are-just-subscriptions-demystifying-shared-and-failover-modes",[821,7347,11043],"RabbitMQ","zKBgjvyJssQFpThMHzY9NanUFnKDiLowp4yKuzboDR8",{"id":11046,"title":11047,"authors":11048,"body":11049,"category":3550,"createdAt":290,"date":11175,"description":11176,"extension":8,"featured":294,"image":11177,"isDraft":294,"link":290,"meta":11178,"navigation":7,"order":296,"path":11179,"readingTime":11180,"relatedResources":290,"seo":11181,"stem":11182,"tags":11183,"__hash__":11184},"blogs\u002Fblog\u002Fstreamnative-cloud-now-available-for-public-preview-on-alibaba-cloud-marketplace.md","StreamNative Cloud Now Available for Public Preview on Alibaba Cloud Marketplace",[311],{"type":15,"value":11050,"toc":11168},[11051,11059,11064,11070,11074,11077,11080,11094,11098,11101,11115,11119,11122,11126,11129,11153,11157],[48,11052,11053,11054,190],{},"We’re excited to announce that StreamNative Cloud — the unified platform for real-time data streaming, stream processing, and lakehouse integration — is now available for ",[55,11055,11058],{"href":11056,"rel":11057},"https:\u002F\u002Fmarketplace.alibabacloud.com\u002Fproducts\u002F56730001\u002Fsgcmgj00036040.html?spm=a3c0i.26795044.0.0.11472faaaOixiP&innerSource=search",[264],"public preview on the Alibaba Cloud Marketplace",[48,11060,11061],{},[384,11062],{"alt":18,"src":11063},"\u002Fimgs\u002Fblogs\u002F68adb1409d2804d409b1064a_AD_4nXfriqlk7SY2AR-OVZiwAcysx7-uCE7th65x9rFOHtuBGozVyr5dllrXixx2inkKJ7uULhSLLrNXEcbLEFz4YnWnc8w4Poyl4zXyGfdgN0t_QSMNJiXAmM8hxKPGD76fN6aplo6zBA.png",[48,11065,11066,11067,11069],{},"Powered by our high-performance ",[55,11068,1332],{"href":6647}," streaming engine, StreamNative Cloud offers both Apache Pulsar and Apache Kafka APIs in a single platform, enabling organizations to consolidate their streaming workloads, reduce infrastructure costs, and accelerate time-to-value for real-time applications.",[40,11071,11073],{"id":11072},"the-streamnative-one-platform-on-alibaba-cloud","The StreamNative ONE Platform on Alibaba Cloud",[48,11075,11076],{},"With this launch, Alibaba Cloud customers can now access the StreamNative ONE Platform, designed to unify messaging, streaming, and 
lakehouse-ready storage in a fully managed cloud service.",[48,11078,11079],{},"Key capabilities include:",[321,11081,11082,11085,11088,11091],{},[324,11083,11084],{},"Multi-Protocol Streaming – Run Pulsar- and Kafka-based workloads on the same platform without replatforming.",[324,11086,11087],{},"Lakehouse-Ready Architecture – Stream data directly into open table formats like Apache Iceberg on Alibaba Cloud Object Storage Service (OSS) for analytics and AI.",[324,11089,11090],{},"Global-Scale Messaging & Event Streaming – Leverage Pulsar’s infinite topic scalability, multi-tenancy, and geo-replication.",[324,11092,11093],{},"Cost Optimization at Scale – Ursa’s leaderless architecture eliminates cross-AZ replication overhead, reducing networking and storage costs.",[40,11095,11097],{"id":11096},"why-this-matters-for-alibaba-cloud-customers","Why This Matters for Alibaba Cloud Customers",[48,11099,11100],{},"By bringing StreamNative Cloud to the Alibaba Cloud Marketplace, we make it easier for organizations to:",[321,11102,11103,11106,11109,11112],{},[324,11104,11105],{},"Procure and deploy StreamNative Cloud directly through their Alibaba Cloud account.",[324,11107,11108],{},"Integrate with Alibaba Cloud services like OSS, MaxCompute, Hologres, and AI\u002FML offerings.",[324,11110,11111],{},"Simplify architecture by consolidating streaming, messaging, and data lakehouse ingestion into one platform.",[324,11113,11114],{},"Accelerate innovation with built-in schema governance, security, and streaming analytics capabilities.",[40,11116,11118],{"id":11117},"powered-by-ursa-for-pulsar-and-kafka","Powered by Ursa for Pulsar and Kafka",[48,11120,11121],{},"At the heart of StreamNative Cloud is Ursa, our next-generation streaming engine built for cloud efficiency and scale. 
Ursa powers both Pulsar- and Kafka-compatible workloads within StreamNative Cloud, enabling customers to modernize streaming architectures without costly migrations.",[40,11123,11125],{"id":11124},"how-to-get-started","How to Get Started",[48,11127,11128],{},"Getting started is simple:",[1666,11130,11131,11138,11146],{},[324,11132,11133,11134],{},"Visit the ",[55,11135,11137],{"href":11056,"rel":11136},[264],"StreamNative Cloud listing on Alibaba Cloud Marketplace.",[324,11139,11140,11141],{},"Follow the steps to ",[55,11142,11145],{"href":11143,"rel":11144},"https:\u002F\u002Fdocs.streamnative.io\u002Fcloud\u002Fbilling\u002Fbilling-alibaba",[264],"Get started with StreamNative Cloud on Alibaba Cloud Marketplace with Pay-As-You-Go.",[324,11147,11148],{},[55,11149,11152],{"href":11150,"rel":11151},"https:\u002F\u002Fdocs.streamnative.io\u002Fcloud\u002Fbilling\u002Fbilling-alibaba#create-your-pulsar-instance-and-pulsar-cluster",[264],"Launch your first cluster in minutes and connect your streaming workloads.",[40,11154,11156],{"id":11155},"start-streaming-today","Start Streaming Today",[48,11158,11159,11160,11164,11165,11167],{},"With ",[55,11161,11163],{"href":11056,"rel":11162},[264],"StreamNative Cloud now on the Alibaba Cloud Marketplace",", you can unify Pulsar, Kafka, and lakehouse integration in a single managed platform — powered by ",[55,11166,1332],{"href":6647}," — and ready for your most demanding real-time use cases.",{"title":18,"searchDepth":19,"depth":19,"links":11169},[11170,11171,11172,11173,11174],{"id":11072,"depth":19,"text":11073},{"id":11096,"depth":19,"text":11097},{"id":11117,"depth":19,"text":11118},{"id":11124,"depth":19,"text":11125},{"id":11155,"depth":19,"text":11156},"2025-08-26","StreamNative Cloud, powered by Ursa, is now publicly available on Alibaba Cloud Marketplace, offering a unified platform for Apache Pulsar and Kafka, real-time data streaming, and lakehouse integration to accelerate innovation and reduce costs.","\u002Fimgs\u002Fblogs\u002F68adb0209d2804d409b01171_SN+Alicloud.png",{},"\u002Fblog\u002Fstreamnative-cloud-now-available-for-public-preview-on-alibaba-cloud-marketplace","3 min read",{"title":11047,"description":11176},"blog\u002Fstreamnative-cloud-now-available-for-public-preview-on-alibaba-cloud-marketplace",[10322,1332,3550],"RNCy1Jph8VQ8F07HjmCxCJCLmUVrChnO5blFeddiit4",{"id":11186,"title":11187,"authors":11188,"body":11189,"category":3550,"createdAt":290,"date":11345,"description":11346,"extension":8,"featured":294,"image":11347,"isDraft":294,"link":290,"meta":11348,"navigation":7,"order":296,"path":4843,"readingTime":4475,"relatedResources":290,"seo":11349,"stem":11350,"tags":11351,"__hash__":11352},"blogs\u002Fblog\u002Fstreamnative-serverless-is-now-generally-available-on-aws-google-cloud-and-azure.md","StreamNative Serverless is Now Generally Available on AWS, Google Cloud, and Azure",[311],{"type":15,"value":11190,"toc":11333},[11191,11199,11202,11206,11209,11212,11216,11221,11225,11228,11232,11236,11241,11245,11250,11258,11261,11265,11271,11274,11278,11281,11295,11298,11307,11310,11314,11317,11322,11326],[48,11192,11193,11194,11198],{},"On September 9, 2024, we introduced",[55,11195,11197],{"href":11196},"\u002Fblog\u002Fintroducing-streamnative-serverless-instant-start-seamless-scaling-and-effortless-data-streaming"," StreamNative Serverless in Public Preview",", bringing instant start, seamless scaling, and effortless data streaming to developers and enterprises. 
Since then, we’ve seen tremendous adoption from organizations that want a fully managed, elastic, and cost-effective way to build and run real-time applications without worrying about infrastructure.",[48,11200,11201],{},"Today, we are excited to announce that StreamNative Serverless is now Generally Available (GA) on AWS, Google Cloud, and Azure. This marks a major milestone in making streaming and messaging accessible to everyone, regardless of scale or expertise.",[40,11203,11205],{"id":11204},"why-streamnative-serverless","Why StreamNative Serverless?",[48,11207,11208],{},"With StreamNative Serverless, anyone can get started with powerful event streaming and messaging capabilities in just a few clicks. There’s no infrastructure to provision, no clusters to size, and no operational overhead. The result is a pay-as-you-go model that ensures cost-effectiveness while giving developers and enterprises the freedom to innovate faster.",[48,11210,11211],{},"The true power of StreamNative Serverless lies in its ability to unify data streaming and messaging under a single, fully managed platform with multi-protocol support. Unlike traditional services locked to a single interface, StreamNative Serverless natively supports Pulsar, Kafka, and MQTT protocols, giving teams the flexibility to connect diverse applications, systems, and devices without rewriting code or re-architecting pipelines. Whether you’re building event-driven microservices, streaming analytics, or IoT applications, this seamless protocol interoperability ensures that developers can work with the tools and APIs they already know while benefiting from the elasticity, scalability, and simplicity of a serverless environment.",[32,11213,11215],{"id":11214},"core-capabilities-of-streamnative-serverless","Core Capabilities of StreamNative Serverless",[48,11217,11218],{},[384,11219],{"alt":5878,"src":11220},"\u002Fimgs\u002Fblogs\u002F68a6c96308627d7bfd835023_iShot_2025-08-21_15.22.59.png",[40,11222,11224],{"id":11223},"elastic-throughput-units-etus","Elastic Throughput Units (ETUs)",[48,11226,11227],{},"At the heart of StreamNative Serverless is the Elastic Throughput Unit (ETU). ETUs provide a simple, elastic, and predictable way to measure and consume streaming capacity. 
Each ETU provides throughput and message capacity across multiple dimensions:",[32,11229,11231],{"id":11230},"serverless-etu-capacity-and-limits","Serverless ETU Capacity and Limits",[3933,11233,11235],{"id":11234},"per-etu-capacity","Per ETU capacity",[48,11237,11238],{},[384,11239],{"alt":5878,"src":11240},"\u002Fimgs\u002Fblogs\u002F68a6c98f35f4dacd3d98ac98_iShot_2025-08-21_15.23.52.png",[3933,11242,11244],{"id":11243},"cluster-maximum-capacity","Cluster maximum capacity",[48,11246,11247],{},[384,11248],{"alt":5878,"src":11249},"\u002Fimgs\u002Fblogs\u002F68a6c9fa264fb398c03165db_iShot_2025-08-21_15.25.17.png",[321,11251,11252,11255],{},[324,11253,11254],{},"Minimum ETU per cluster: 1",[324,11256,11257],{},"Elasticity: Scale up seamlessly as workloads grow and scale back down when demand subsides.",[48,11259,11260],{},"This means developers and operators no longer need to size clusters upfront or worry about over-provisioning—ETUs handle elasticity for you.",[40,11262,11264],{"id":11263},"the-power-of-uniconn-one-api-unlimited-connectivity","The Power of UniConn: One API, Unlimited Connectivity",[48,11266,11159,11267,11270],{},[55,11268,11269],{"href":5039},"Universal Connectivity for Kafka and Pulsar (UniConn)",", StreamNative Serverless makes it simple to connect to any ecosystem. Whether you’re integrating with Pulsar IO connectors or Kafka Connect-based connectors, you can easily bridge data across systems and clouds. This gives you access to a rich portfolio of pre-built connectors, reducing the time and effort needed to wire up modern data architectures.",[48,11272,11273],{},"With UniConn, debugging and troubleshooting connectors is simple and intuitive. Users can view and filter log files directly within the platform, making it easy to trace connector activity, identify issues, and resolve errors quickly. This streamlined debugging experience reduces downtime and accelerates development by giving teams clear visibility into connector performance and behavior.",[40,11275,11277],{"id":11276},"functions-for-stream-processing","Functions for Stream Processing",[48,11279,11280],{},"StreamNative Serverless also supports Functions, enabling lightweight stream processing and event-driven applications. Functions can be written in a variety of runtimes, including:",[321,11282,11283,11286,11289,11292],{},[324,11284,11285],{},"Java",[324,11287,11288],{},"Python",[324,11290,11291],{},"Node.js",[324,11293,11294],{},"WebAssembly (WASM)",[48,11296,11297],{},"This makes it easier for developers to write business logic directly within the data pipeline without deploying external processing frameworks.",[48,11299,4221,11300,11306],{},[55,11301,11303],{"href":11302},"\u002Fblog\u002Fintroducing-pulsar-functions-on-streamnative-cloud",[44,11304,11305],{},"common use case for StreamNative Functions is building lightweight, custom stream processing logic directly within the data pipeline",". Developers can use Functions to filter, transform, or enrich events in real time, implement routing logic to send data to the right topics or systems, or perform aggregations for analytics and monitoring. By embedding this logic close to the data, Functions simplify architectures and reduce the need for external processing frameworks.",[48,11308,11309],{},"StreamNative now features notifications for Functions—a new capability designed to help teams proactively monitor critical workloads. 
With built-in, preconfigured notification rules like function‑crash‑loop‑backoff and function‑oom‑killed, users gain visibility into common failure scenarios. Notifications can be enabled or disabled via the console, and when triggered, email alerts are sent immediately, along with follow-up emails once the incident is resolved. Recipients default to your organization’s technical contact (or billing contact if no technical contact is set), ensuring rapid awareness and response to runtime issues.",[40,11311,11313],{"id":11312},"serverless-pricing","Serverless Pricing",[48,11315,11316],{},"StreamNative Serverless pricing is simple and transparent, based on the Elastic Throughput Unit (ETU). Each ETU costs $0.10 per hour (about $73 per month), and customers can start with just 1 ETU minimum. As workloads grow, additional ETUs can be added seamlessly, ensuring that you only pay for the capacity you actually use. This flexible, pay-as-you-go model means there’s no need to commit to large upfront infrastructure costs—you scale up or down as your data streaming demands change, with full cost predictability.",[48,11318,11319],{},[384,11320],{"alt":18,"src":11321},"\u002Fimgs\u002Fblogs\u002F68a6cbb64535ae9f087d36f6_AD_4nXfA3GVXbNPMfy4efESDq99TaOmEYfzToZpD8JmadV7-W-Z04g06Lx1Y6Wf4q2XXtnYBnUL4kilT4PNW3jMK4eODSgI3_OrOVWO2FIWLTwGsLpHMrjsiqGTJvYn9p1urq2DI70WcSA.png",[40,11323,11325],{"id":11324},"get-started-with-streamnative-serverless","Get Started with StreamNative Serverless",[48,11327,11328,11329,190],{},"With GA availability on AWS, Google Cloud, and Azure, StreamNative Serverless is ready for production workloads of any size. Whether you’re building next-generation data applications, connecting disparate systems, or processing streams in real time, StreamNative Serverless delivers an elastic, cost-effective, and developer-friendly platform. ",[55,11330,11332],{"href":4688,"rel":11331},[264],"Sign up for a trial to get started",{"title":18,"searchDepth":19,"depth":19,"links":11334},[11335,11338,11341,11342,11343,11344],{"id":11204,"depth":19,"text":11205,"children":11336},[11337],{"id":11214,"depth":279,"text":11215},{"id":11223,"depth":19,"text":11224,"children":11339},[11340],{"id":11230,"depth":279,"text":11231},{"id":11263,"depth":19,"text":11264},{"id":11276,"depth":19,"text":11277},{"id":11312,"depth":19,"text":11313},{"id":11324,"depth":19,"text":11325},"2025-08-21","StreamNative Serverless is now GA on AWS, Google Cloud, and Azure! 
Deliver real-time apps faster with instant start, seamless scaling, multi-protocol support (Pulsar, Kafka, MQTT), and pay-as-you-go pricing—without managing infrastructure.","\u002Fimgs\u002Fblogs\u002F68a6c8a208627d7bfd830126_Serverless-GA.png",{},{"title":11187,"description":11346},"blog\u002Fstreamnative-serverless-is-now-generally-available-on-aws-google-cloud-and-azure",[3550,10322,4839],"fbMRiC9Q7dnTIc5PEaUhNrVyz-e7fok4QwMd1twlccI",{"id":11354,"title":11355,"authors":11356,"body":11357,"category":3550,"createdAt":290,"date":11503,"description":11504,"extension":8,"featured":294,"image":11505,"isDraft":294,"link":290,"meta":11506,"navigation":7,"order":296,"path":11507,"readingTime":11508,"relatedResources":290,"seo":11509,"stem":11510,"tags":11511,"__hash__":11513},"blogs\u002Fblog\u002Fstreamnative-announces-general-availability-of-universal-connectivity-uniconn.md","StreamNative Announces General Availability of Universal Connectivity (UniConn)",[311],{"type":15,"value":11358,"toc":11493},[11359,11366,11375,11379,11382,11386,11390,11393,11418,11422,11425,11430,11435,11439,11442,11447,11451,11454,11478,11483,11485],[48,11360,11193,11361,11365],{},[55,11362,11364],{"href":11363},"\u002Fblog\u002Frevolutionizing-data-connectivity-introducing-streamnatives-universal-connectivity-uniconn-for-seamless-real-time-data-access"," Universal Connectivity (UniConn)"," in public preview—a unified framework in StreamNative Cloud designed to simplify and accelerate real-time data access.",[48,11367,11368,11369,11374],{},"Today, we are excited to announce that UniConn is now generally available for all customers on StreamNative Cloud. With this milestone, enterprises can confidently move workloads into production using ",[55,11370,11373],{"href":11371,"rel":11372},"https:\u002F\u002Fdocs.streamnative.io\u002Fconnect\u002Foverview",[264],"UniConn’s broad connector ecosystem",", advanced observability, and seamless integration with their data platform stack.",[40,11376,11378],{"id":11377},"the-power-of-uniconn","The Power of UniConn",[48,11380,11381],{},"UniConn gives developers the flexibility to leverage the best of both worlds—connectors from Pulsar IO and Kafka Connect—to build streaming pipelines with ease. Whether you’re moving data between messaging systems, databases, data warehouses, or SaaS platforms, UniConn provides a single, consistent way to deploy, monitor, and manage connectors in StreamNative Cloud.",[40,11383,11385],{"id":11384},"whats-new-in-ga","What’s New in GA",[32,11387,11389],{"id":11388},"_1-expanded-connector-portfolio","1. Expanded Connector Portfolio",[48,11391,11392],{},"Our connector library now includes 50+ production-ready connectors, empowering you to integrate with a wide range of systems. Highlights include:",[321,11394,11395,11403,11406,11409,11412,11415],{},[324,11396,11397,11402],{},[55,11398,11401],{"href":11399,"rel":11400},"https:\u002F\u002Fdocs.streamnative.io\u002Fconnect\u002Fconnectors\u002Fsnowflake-streaming-sink\u002Fcurrent\u002Fsnowflake-streaming",[264],"Snowflake Snowpipe Streaming Connector"," – Stream data into Snowflake with sub-second latency.",[324,11404,11405],{},"New Kafka Sink Connectors – We added some new connectors since the Public Preview launch BigQuery Sink Connector",[324,11407,11408],{},"Elasticsearch Sink Connector",[324,11410,11411],{},"JDBC Sink Connector",[324,11413,11414],{},"MongoDB Sink Connector",[324,11416,11417],{},"Snowflake Sink Connector",[32,11419,11421],{"id":11420},"_2-enhanced-debugging-experience","2. 
Enhanced Debugging Experience",[48,11423,11424],{},"We’ve made it easier than ever to troubleshoot connectors. Quickly pinpoint issues without sifting through irrelevant log entries.",[321,11426,11427],{},[324,11428,11429],{},"Filter connector logs by INFO, DEBUG, or TRACE levels.",[48,11431,11432],{},[384,11433],{"alt":18,"src":11434},"\u002Fimgs\u002Fblogs\u002F68a29b7869de716b8bd32365_AD_4nXcfZrTOEFzLD_WoeqJffOLvC_Ptkf6fpdkDbIHIh7aBSVbOO_TcJIUR1TikbVktth1kJRzW-wClLk3RkVNPjCbIsY23iebP5jpi-5NBfQHLk5csTIrUkErz1Eyg9_dU3hZqmTb_Sw.png",[32,11436,11438],{"id":11437},"_3-enhanced-actions-menu","3. Enhanced actions menu",[48,11440,11441],{},"The enhanced actions menu for Pulsar and Kafka-based connectors provides a unified and streamlined experience for managing connector lifecycles.",[48,11443,11444],{},[384,11445],{"alt":18,"src":11446},"\u002Fimgs\u002Fblogs\u002F68a29b7869de716b8bd3233f_AD_4nXd4Rfx923OiiXH4a54yW5qEYYC0BX7A33a12QtpPOn0hUXk91Ayl0_2IoxhkQTzjswVdPwFmxzkIVvFSOdZZt-V4KXx8_5_DsGDJlTUIhUV9qwrzDgU60pRBotPDn2HKPPxdxdP.png",[32,11448,11450],{"id":11449},"_4-metrics-monitoring-at-your-fingertips","4. Metrics & Monitoring at Your Fingertips",[48,11452,11453],{},"All connector metrics are now exposed and can be integrated into your observability stack:",[321,11455,11456,11470],{},[324,11457,11458,11459,4003,11464,11469],{},"Consume ",[55,11460,11463],{"href":11461,"rel":11462},"https:\u002F\u002Fdocs.streamnative.io\u002Fcloud\u002Flog-and-monitor\u002Fcloud-metrics-api#pulsar-resource-metrics",[264],"Pulsar metrics",[55,11465,11468],{"href":11466,"rel":11467},"https:\u002F\u002Fdocs.streamnative.io\u002Fcloud\u002Flog-and-monitor\u002Fcloud-metrics-api#kafka-connect-metrics",[264],"Kafka connect metrics"," via Metrics API help fine-tune performance and troubleshoot issues.",[324,11471,11472,11477],{},[55,11473,11476],{"href":11474,"rel":11475},"https:\u002F\u002Fdocs.datadoghq.com\u002Fintegrations\u002Fstreamnative\u002F",[264],"Native Datadog integration"," pulls connector metrics directly from StreamNative Cloud, giving you real-time visibility into performance and health.",[48,11479,11480],{},[384,11481],{"alt":18,"src":11482},"\u002Fimgs\u002Fblogs\u002F68a29b7869de716b8bd32348_AD_4nXdWfNtT5C3ELXXOonZzXrmT07VCcnsYuflKt_N3diqkh0kAbfMnZh3DT1eTSZchuVzKcZo5aRb9-6PZEaWziepTq40G_duZUPHhJNjuZDGm7FU406P26-unxFqdCYD3l-xIX5x6.png",[40,11484,3880],{"id":3877},[48,11486,11487,11488,11492],{},"Experience the full power of Universal Connectivity in production.\n",[55,11489,11491],{"href":4688,"rel":11490},[264],"Sign up for a free trial of StreamNative Cloud"," and start building real-time data pipelines with our growing connector ecosystem.",{"title":18,"searchDepth":19,"depth":19,"links":11494},[11495,11496,11502],{"id":11377,"depth":19,"text":11378},{"id":11384,"depth":19,"text":11385,"children":11497},[11498,11499,11500,11501],{"id":11388,"depth":279,"text":11389},{"id":11420,"depth":279,"text":11421},{"id":11437,"depth":279,"text":11438},{"id":11449,"depth":279,"text":11450},{"id":3877,"depth":19,"text":3880},"2025-08-18","UniConn is now GA! Unlock 50+ production-ready connectors, seamless integrations, and real-time monitoring in StreamNative Cloud. 
Build streaming pipelines faster than ever.","\u002Fimgs\u002Fblogs\u002F68a29a8ddcac67ecc010c809_Unicon-GA.png",{},"\u002Fblog\u002Fstreamnative-announces-general-availability-of-universal-connectivity-uniconn","5 min read",{"title":11355,"description":11504},"blog\u002Fstreamnative-announces-general-availability-of-universal-connectivity-uniconn",[11512,3550],"UniConn","oXIhmBFCePa1Uo0ayk4_mHeTQL8woErfgzdS1U4zTKU",{"id":11515,"title":11516,"authors":11517,"body":11518,"category":5376,"createdAt":290,"date":11672,"description":11673,"extension":8,"featured":294,"image":7983,"isDraft":294,"link":290,"meta":11674,"navigation":7,"order":296,"path":11675,"readingTime":11508,"relatedResources":290,"seo":11676,"stem":11677,"tags":11678,"__hash__":11679},"blogs\u002Fblog\u002Fdata-streaming-summit-san-francisco-2025-schedule-announced.md","Explore the Future of Data Streaming and AI — Data Streaming Summit San Francisco 2025 Schedule Announced",[6127],{"type":15,"value":11519,"toc":11662},[11520,11523,11526,11529,11532,11536,11539,11550,11553,11557,11560,11564,11567,11571,11585,11588,11592,11615,11619,11633,11637,11640,11643,11650,11657,11660],[48,11521,11522],{},"‍Date: Sept 30, 2025",[48,11524,11525],{},"‍Location: Grand Hyatt at SFO‍",[48,11527,11528],{},"Tracks: Deep Dive · Use Cases · AI + Stream Processing · Streaming Lakehouse",[48,11530,11531],{},"This year’s Data Streaming Summit brings the most open, practitioner-driven agenda we’ve ever hosted: 30+ sessions across four focused tracks featuring builders from OpenAI, Netflix, Uber, Google, Blueshift, Confluent, Ververica, StreamNative, and many more.",[40,11533,11535],{"id":11534},"why-this-year-matters","Why This Year Matters",[48,11537,11538],{},"We’re at a pivotal moment for the industry. Data Streaming platforms are evolving rapidly to meet three core challenges:",[1666,11540,11541,11544,11547],{},[324,11542,11543],{},"Escalating costs of running traditional streaming infrastructure.",[324,11545,11546],{},"Data silos between operational streams and analytical systems.",[324,11548,11549],{},"The AI imperative — delivering insights and actions in milliseconds, not minutes.",[48,11551,11552],{},"At DSS SF 2025, we’ll explore how innovation in architecture, governance, and open standards is addressing these challenges — and shaping the next decade of data streaming systems.",[40,11554,11556],{"id":11555},"opening-keynotes","Opening Keynotes",[48,11558,11559],{},"The morning keynotes will feature industry leaders from OpenAI, Motorq, and StreamNative, sharing how innovation in streaming, analytics, and AI is redefining the future of data infrastructure. (More keynote speakers to be announced.)",[40,11561,11563],{"id":11562},"special-session","Special Session",[48,11565,11566],{},"We’re excited to host a fireside chat with Reynold Xin (Co-founder and Chief Architect, Databricks) on lakehouse, data streaming, and AI. 
This conversation will bring together insights from one of the most influential voices in the modern data ecosystem, exploring how these three pillars are converging to shape the future of real-time data infrastructure.",[40,11568,11570],{"id":11569},"four-concurrent-tracks-30-sessions","Four Concurrent Tracks — 30+ Sessions",[321,11572,11573,11576,11579,11582],{},[324,11574,11575],{},"Deep Dive — internals, performance tuning, and operational excellence.",[324,11577,11578],{},"Use Cases — real-world architectures delivering measurable results.",[324,11580,11581],{},"AI + Stream Processing — pipelines powering intelligent systems.",[324,11583,11584],{},"Streaming Lakehouse — unifying real-time and historical data under a single governance layer.",[48,11586,11587],{},"Hands-on engineering insights you can apply immediately — from low-latency patterns to Iceberg-powered streaming lakehouses, schema governance, and Pulsar\u002FKafka operations at scale.",[40,11589,11591],{"id":11590},"dont-miss-these-highlights","Don’t Miss These Highlights",[321,11593,11594,11597,11600,11603,11606,11609,11612],{},[324,11595,11596],{},"Blueshift — Next-Gen Data Infra: Building Resilient, Scalable Architecture with Apache Pulsar (Use Cases)",[324,11598,11599],{},"OpenAI — Streaming to Scale: Real-Time Infrastructure for AI (Use Cases)",[324,11601,11602],{},"Salesforce — Insights from Streaming 300B Telemetry Trace Spans per Day with Flink (AI + Stream Processing)",[324,11604,11605],{},"Netflix — Kafka Under Pressure: Netflix’s Blueprint for Unshakeable Kafka Resilience (Use Cases)",[324,11607,11608],{},"Uber — Safe Streams at Scale: Uber’s Deployment Safety Framework for Flink Jobs (AI + Stream Processing)",[324,11610,11611],{},"Google — Beyond Stream Ingestion: Building Google-Scale, AI-Powered Stream Processing Pipelines with Just SQL (AI + Stream Processing)",[324,11613,11614],{},"OpenAI — StreamLink: Real-Time Data Ingestion at OpenAI Scale (Streaming Lakehouse)",[40,11616,11618],{"id":11617},"and-deeper-dives-youll-talk-about-all-quarter","And Deeper Dives You’ll Talk About All Quarter",[321,11620,11621,11624,11627,11630],{},[324,11622,11623],{},"Beyond CAP: A New Theorem for Modern Data Streaming — Main Hall, StreamNative",[324,11625,11626],{},"Are Your Kafka Guarantees Actually Guaranteed? — Main Hall, Antithesis",[324,11628,11629],{},"High-Throughput Streaming in the Lakehouse with Non-Blocking Concurrency Control in Flink & Hudi — Streaming Lakehouse, Onehouse",[324,11631,11632],{},"Flink Changelog Modes — Why You Should Care — AI + Stream Processing, Confluent",[40,11634,11636],{"id":11635},"closing-panel-discussion","Closing Panel Discussion",[48,11638,11639],{},"A conversation with industry leaders on the future of data streaming. 
(Full details coming soon.)",[40,11641,11642],{"id":7963},"Join Us",[48,11644,11645,11646],{},"📅 Check the full schedule: ",[55,11647,11649],{"href":8968,"rel":11648},[264],"Schedule",[48,11651,11652,11653],{},"🎟 Get an earlybird ticket:",[55,11654,11656],{"href":7969,"rel":11655},[264]," Eventbrite Registration",[48,11658,11659],{},"Looking forward to seeing you at the Grand Hyatt at SFO!",[48,11661,3931],{},{"title":18,"searchDepth":19,"depth":19,"links":11663},[11664,11665,11666,11667,11668,11669,11670,11671],{"id":11534,"depth":19,"text":11535},{"id":11555,"depth":19,"text":11556},{"id":11562,"depth":19,"text":11563},{"id":11569,"depth":19,"text":11570},{"id":11590,"depth":19,"text":11591},{"id":11617,"depth":19,"text":11618},{"id":11635,"depth":19,"text":11636},{"id":7963,"depth":19,"text":11642},"2025-08-14","Join data streaming leaders at DSS SF 2025, Sept 30 at the Grand Hyatt at SFO. 30+ sessions across 4 tracks with speakers from OpenAI, Netflix, Uber, Google & more — exploring AI, streaming lakehouses, and the future of real-time data.",{},"\u002Fblog\u002Fdata-streaming-summit-san-francisco-2025-schedule-announced",{"title":11516,"description":11673},"blog\u002Fdata-streaming-summit-san-francisco-2025-schedule-announced",[5376,800,821,1332,799],"TMBZ3nYVVxi5UtHvLwBoqUyXlxKhS5gv3P7x-rb_B1I",{"id":11681,"title":11682,"authors":11683,"body":11684,"category":821,"createdAt":290,"date":11672,"description":11892,"extension":8,"featured":294,"image":11893,"isDraft":294,"link":290,"meta":11894,"navigation":7,"order":296,"path":11895,"readingTime":4475,"relatedResources":290,"seo":11896,"stem":11897,"tags":11898,"__hash__":11900},"blogs\u002Fblog\u002Fkafkas-approach-to-multi-region-streaming.md","Kafka’s Approach to Multi-Region Streaming (Geo-Replicated Architectures with Kafka and Pulsar 2\u002F6)",[6785],{"type":15,"value":11685,"toc":11884},[11686,11688,11705,11709,11712,11715,11719,11722,11725,11736,11739,11750,11753,11756,11760,11763,11780,11783,11786,11790,11793,11807,11810,11814,11817,11828,11831,11834,11836,11859,11861,11863,11865,11870,11872,11877,11882],[48,11687,8648],{},[321,11689,11690,11693,11696,11699,11702],{},[324,11691,11692],{},"Apache Kafka achieves geo-replication through separate clusters and replication tools. A Kafka cluster is usually confined to one region, so multi-region streaming means running multiple clusters and copying data between them.",[324,11694,11695],{},"MirrorMaker 2 (MM2) is Kafka’s primary tool for cross-cluster replication. It runs as a Kafka Connect-based process that consumes from source topics and produces to target clusters, propagating messages continuously for disaster recovery, data distribution, or migrations.",[324,11697,11698],{},"Kafka’s geo-replication is asynchronous and typically active-passive: one cluster is primary (active) and others are secondary copies. Active-active (bidirectional) setups are possible with MirrorMaker 2 but introduce complexity (e.g. naming conflicts, offset translation).",[324,11700,11701],{},"Newer Kafka features like Confluent’s Cluster Linking (in Confluent Platform) aim to simplify multi-region replication by replicating data within brokers and preserving offsets. 
However, in open-source Kafka, MirrorMaker 2 remains the go-to solution for multi-region continuity.",[324,11703,11704],{},"Key challenges of Kafka’s approach include extra operational overhead (managing MirrorMaker connectors), offset inconsistencies between clusters, and ensuring failover processes so consumers can switch clusters with minimal downtime.",[40,11706,11708],{"id":11707},"kafka-clusters-and-geographic-boundaries","Kafka Clusters and Geographic Boundaries",[48,11710,11711],{},"By design, an Apache Kafka cluster is a tightly coupled set of brokers with a single distributed log. While Kafka’s broker replication (the intra-cluster replication) keeps data redundant across nodes and Availability Zones, it doesn’t by itself replicate data across distant regions. Stretching a single Kafka cluster over high-latency links (e.g. brokers spread across continents) is not recommended – it would suffer from slow replication and leader election issues. Instead, the typical approach is to deploy independent Kafka clusters per region or datacenter, then mirror data between clusters for geo-replication.",[48,11713,11714],{},"In a multi-region Kafka deployment, you might have a cluster “Kafka-US-East” and another “Kafka-US-West,” each serving local clients. To keep them in sync (say, to have US-West as a backup for US-East, or to aggregate events globally), Kafka provides MirrorMaker. The fundamental idea is simple: use a Kafka consumer-producer pair to continuously copy messages from topics in one cluster to topics in another. This can be one-way (unidirectional replication) or two-way (bi-directional between two clusters, if needed).",[40,11716,11718],{"id":11717},"mirrormaker-2-kafkas-geo-replication-workhorse","MirrorMaker 2: Kafka’s Geo-Replication Workhorse",[48,11720,11721],{},"MirrorMaker 2 (MM2) is the evolution of Kafka’s original MirrorMaker (MM1) and is the standard way to replicate data across Kafka clusters in open-source Kafka. MirrorMaker 2 is built on the Kafka Connect framework, which makes it essentially a set of connectors (source, heartbeat, and checkpoint connectors) dedicated to replication. Running MirrorMaker 2 means running a Connect cluster (which can be on separate machines or even co-located on the Kafka brokers) configured to replicate topics of interest from one cluster to another.",[48,11723,11724],{},"Some key aspects of MirrorMaker 2 architecture:",[321,11726,11727,11730,11733],{},[324,11728,11729],{},"It uses Kafka Connect Source connectors to consume from the source cluster’s topics and Connect Sink (producer) connectors to write to the target cluster. Internally, it’s effectively the same as a custom consumer reading from Cluster A and producing to Cluster B, but packaged and managed as a fault-tolerant Connect job.",[324,11731,11732],{},"MirrorMaker 2 includes a checkpoint connector that periodically records the source consumer group offsets in the target cluster (stored in special internal topics). This facilitates offset translation: if consumers fail over to the target cluster, they can know where to resume consuming in the new cluster based on their positions in the old cluster.",[324,11734,11735],{},"Heartbeat connectors emit periodic heartbeats used to monitor the health and lag of the replication process. 
This helps in monitoring whether the mirror is “caught up” or if it’s falling behind.",[48,11737,11738],{},"Using MirrorMaker 2, operators can implement various geo-replication patterns:",[321,11740,11741,11744,11747],{},[324,11742,11743],{},"Active-Passive (Disaster Recovery): Continuously replicate from a primary cluster to a secondary cluster that sits idle until needed. In a DR scenario, the secondary has all the data up to the failure point and can take over serving consumers. This is illustrated as the classic use case – if Cluster A goes down, Cluster B (which has been mirroring A) becomes the new source of truth.",[324,11745,11746],{},"Active-Active (Bidirectional): Set up MM2 to replicate in both directions between two clusters. This way, both clusters receive each other’s data. This scenario can support active-active applications (where each region produces some unique data and needs a global view). However, careful configuration is needed to avoid loops – typically, MirrorMaker applies cluster prefixes to mirrored topics to distinguish them. For example, data from Cluster A mirrored to Cluster B might be stored under topic name “A.topicName” on Cluster B. This prevents re-mirroring the same data back and forth endlessly.",[324,11748,11749],{},"Fan-out or Aggregation: One cluster can mirror to multiple target clusters (fan-out), or multiple source clusters can mirror into one central cluster (aggregation). For instance, you might aggregate logs from regional Kafka clusters into one global Kafka cluster for analytics. MirrorMaker 2 allows multiple independent source->target flows to run concurrently.",[48,11751,11752],{},"Operationally, MirrorMaker 2 is a separate component to manage. A best practice is to run the MirrorMaker (Connect) workers in the target region (the cluster receiving the data) for efficiency. This “consume remote, produce local” pattern leverages the fact that Kafka producers can be more latency-sensitive than consumers. By consuming over the WAN (which the source cluster can handle) and producing into the local cluster, you reduce latency and the chance of network issues affecting the producer side. It also avoids burdening the source cluster with additional producer load. In practice, that means if you’re mirroring from Region A to Region B, you’d deploy the MirrorMaker connectors in Region B close to the B brokers, so that writes are fast and any network slowness only impacts the reads from A.",[48,11754,11755],{},"Kafka’s documentation and community also emphasize monitoring replication lag – the delay between events in the source being available in the target. MirrorMaker 2’s checkpointing and heartbeats help track this. If lag grows unexpectedly, it might indicate network issues or that the MirrorMaker is underscaled (e.g., not enough consumer threads).",[40,11757,11759],{"id":11758},"challenges-and-considerations-in-kafkas-geo-replication","Challenges and Considerations in Kafka’s Geo-Replication",[48,11761,11762],{},"While MirrorMaker 2 gets the job done, it introduces some challenges that architects need to consider:",[321,11764,11765,11768,11771,11774,11777],{},[324,11766,11767],{},"Operational Complexity: You’re effectively running a Kafka Connect cluster alongside your Kafka clusters. This means additional moving parts – configuration, scaling of connect workers, monitoring another system’s health. It’s an extra layer where things can fail independently. 
As noted in Kafka Improvement Proposals, MirrorMaker (being external to the brokers) can experience outages even when the Kafka clusters are healthy. This requires robust monitoring and possibly automation to restart or reconfigure mirroring if it stops.",[324,11769,11770],{},"Delayed Consistency: Kafka’s geo-replication via MM2 is asynchronous. There will always be some lag between the source and replica – hopefully only a few seconds or less, but it could be more under load or network strain. During that lag, data written to the source may not yet be in the target. If a disaster strikes at that moment, those last few messages might be missing on the backup (this defines your Recovery Point Objective). Most setups aim for seconds of lag at most.",[324,11772,11773],{},"Consumer Offset Translation: One of the trickiest aspects is handling consumer failover. In Kafka, consumer group offsets are stored in a special topic (__consumer_offsets). When you replicate data to a second cluster, the consumer offsets in that cluster are not automatically in sync with the primary. MirrorMaker 2’s checkpoint connector addresses this by translating and syncing offsets periodically. Still, if a failover happens, consumers might have to seek to the translated offset positions in the new cluster. There’s a potential for confusion or even duplicate\u002Fmissed messages if the offset sync isn’t perfectly up-to-date. Confluent’s enterprise feature “Cluster Linking” improves this by preserving offsets exactly (since it effectively ships log segments over), but in open source Kafka, some manual intervention or careful planning is needed to smoothly switch consumers to a new cluster.",[324,11775,11776],{},"Topic Naming and Filtering: By default, MirrorMaker 2 will replicate all topics (or a whitelist\u002Fblacklist pattern). In multi-region architectures, it’s common to use prefixes to prevent collision or loops. For example, cluster “east” might prepend east. to all its topic names when mirroring to “west”, so on West cluster you have east.topic1 as the mirror of topic1 from East. This means consumers on the West cluster might subscribe to a different topic name. It’s a design decision: do you keep the same topic names on the secondary (and be careful to avoid feeding mirrored data back), or do you use distinct names? The Dattell comparison notes that Kafka “topics in the remote cluster must have a different name than the original” in two-way replication setups – because without that, the mirrored data would confuse the mirror maker on the return path. This complexity is something architects must plan for (often by segregating namespaces or using the built-in naming conventions of MM2, which by default uses prefixes for active\u002Factive links).",[324,11778,11779],{},"Active-Active Conflict Handling: If you do attempt an active-active deployment (both clusters accepting writes), you must ensure the same data isn’t produced in both places or you’ll have duplicates. Typically, this is done by geo-partitioning the writes (each region writes a subset of messages that the other doesn’t). For instance, Region A handles all events for customers in Americas, Region B for customers in EMEA, and they replicate to each other so both have the full global dataset read-only. 
There’s no inherent conflict resolution in Kafka’s mirror mechanism; it’s up to your application design to avoid two regions producing the same logical record.",[48,11781,11782],{},"Despite these challenges, Kafka’s approach has been battle-tested in industry. Companies like Uber have built complex mirroring topologies for multi-region Kafka, using tools like uReplicator (Uber’s enhancement of MirrorMaker) to handle massive scale. MirrorMaker 2 significantly improved on MirrorMaker 1 by adding the offset sync and better fault tolerance via Connect.",[48,11784,11785],{},"Additionally, Confluent Cluster Linking (a feature of Confluent Platform since version 7.x) provides an integrated alternative to MirrorMaker. Cluster Linking allows one Kafka cluster’s brokers to directly stream data from another cluster’s brokers, preserving message offsets and transaction semantics. It essentially treats the log as the unit of replication rather than going through producers\u002Fconsumers. For those using Confluent, this can simplify multi-region setup: you configure a link and the clusters share topics with the same names, making failover easier (no renaming). However, Cluster Linking is not part of Apache Kafka open source (as of 2025, proposals like KIP-786\u002F986 are still in discussion to bring similar ideas into OSS). So, in this series, we’ll focus on the open-source tools, namely MirrorMaker 2, while acknowledging these newer developments.",[40,11787,11789],{"id":11788},"example-disaster-recovery-failover-with-kafka","Example: Disaster Recovery Failover with Kafka",[48,11791,11792],{},"To ground this, imagine you have two Kafka clusters: Cluster A (primary) in us-east, and Cluster B (secondary) in us-west. MirrorMaker 2 is set up to replicate all topics from A to B. Producers and consumers normally all connect to Cluster A. Suddenly, Cluster A experiences a regional outage (network cut or major failure). Here’s what happens in a well-planned setup:",[1666,11794,11795,11798,11801,11804],{},[324,11796,11797],{},"Detection: Monitoring alerts that Cluster A is down. Also, MirrorMaker on B will start failing to fetch new data (or heartbeat fails).",[324,11799,11800],{},"Failover Switch: Operators (or an automated system) update client connection info to point producers\u002Fconsumers to Cluster B, the DR cluster.",[324,11802,11803],{},"Consumer Offset Sync: Thanks to MirrorMaker’s checkpoints, Cluster B knows the offsets of consumer groups from A. Consumers either automatically start at the correct position (if using Confluent cluster linking or a custom failover script using the stored checkpoints), or they may need to be pointed to the translated offsets MirrorMaker maintained.",[324,11805,11806],{},"Resume Operations: Producers now send to Cluster B’s topics, and consumers read from Cluster B. All data up until the outage was already in Cluster B (maybe lagging by a second or two), so the stream continues with minimal interruption.",[48,11808,11809],{},"When Cluster A is restored, one could optionally mirror data back or even make Cluster A the primary again (fail-back). However, doing so without duplicates requires careful coordination (perhaps wiping Cluster A and re-mirroring from B to A to get it caught up). 
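To make the offset-translation step concrete, here is a minimal sketch of how a consumer group might be repositioned on the DR cluster using MirrorMaker 2's checkpoint data. It assumes MM2 is replicating from a source cluster aliased "A" into Cluster B, that the connect-mirror-client library is available, and that the bootstrap address and group name are placeholders rather than anything from this series.

```java
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.connect.mirror.RemoteClusterUtils;

import java.time.Duration;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

public class DrFailoverSeek {
    public static void main(String[] args) throws Exception {
        // Connection info for the DR cluster (Cluster B); address is a placeholder.
        String drBootstrap = "kafka-b.us-west.example.com:9092";

        Map<String, Object> mm2Props = new HashMap<>();
        mm2Props.put("bootstrap.servers", drBootstrap);

        // Ask MM2's checkpoint data on Cluster B for the translated offsets of the
        // consumer group that had been running against Cluster A (source alias "A").
        Map<TopicPartition, OffsetAndMetadata> translated =
                RemoteClusterUtils.translateOffsets(
                        mm2Props, "A", "orders-processor", Duration.ofSeconds(30));

        Properties consumerProps = new Properties();
        consumerProps.put("bootstrap.servers", drBootstrap);
        consumerProps.put("group.id", "orders-processor");
        consumerProps.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        consumerProps.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(consumerProps)) {
            // Pin the group to the translated positions so it resumes on Cluster B
            // roughly where it left off on Cluster A, instead of reprocessing everything.
            consumer.assign(translated.keySet());
            translated.forEach((tp, offset) -> consumer.seek(tp, offset.offset()));
            consumer.commitSync(translated);
        }
    }
}
```

Whether this runs as a one-off failover script or inside the clients' startup path is a design choice; the point is that MM2's checkpoints are what let consumers resume on Cluster B without reprocessing from the beginning.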
This is why many Kafka deployments in practice run in an active-passive DR mode, where the passive is only used during failover, rather than true active-active.",[40,11811,11813],{"id":11812},"cross-region-considerations-for-kafka","Cross-Region Considerations for Kafka",[48,11815,11816],{},"A few additional points architects should note:",[321,11818,11819,11822,11825],{},[324,11820,11821],{},"Networking and Security: Cross-data-center links should be high-bandwidth and secure. MirrorMaker will happily saturate a network pipe with data. Ensure you have enough throughput between regions and use encryption (Kafka supports SSL for inter-cluster traffic).",[324,11823,11824],{},"Ordering Guarantees: MirrorMaker does not guarantee global ordering. Each cluster maintains its own order per partition. During failover, consumers might observe a slight disordering around the switch (especially if some messages were in transit). In critical systems, you might implement an “event ID” to deduplicate or reorder if needed across clusters.",[324,11826,11827],{},"Schema and Metadata: Kafka doesn’t automatically replicate schema registry data or other metadata. If you use Confluent Schema Registry, you’d need to replicate schema definitions to the DR site (Confluent offers Schema Linking for that). Likewise, ACLs and security settings need to be mirrored via external processes (or managed uniformly via an IaC approach).",[48,11829,11830],{},"In summary, Kafka’s multi-region story is one of multiple clusters bridged by replication processes. It gives you flexibility – you can tailor what to replicate and where – but demands careful setup. The benefit is proven technology and the vast Kafka ecosystem support. The downside is complexity in operating those bridges and in handling edge cases during failover.",[48,11832,11833],{},"In the next part, we’ll see how Apache Pulsar tackles geo-replication from a different angle – baking it into the system itself. Pulsar’s approach can simplify some of the above challenges (like built-in offset management across regions and no need for external processes). But it comes with its own trade-offs, as we’ll explore.",[40,11835,8924],{"id":8923},[321,11837,11838,11841,11844,11847,11850,11853,11856],{},[324,11839,11840],{},"Kafka itself confines replication to within a cluster (for fault tolerance within a region). Geo-replication in Kafka is achieved by running multiple clusters and copying data between them using tools like MirrorMaker 2.",[324,11842,11843],{},"MirrorMaker 2 (Kafka Connect-based) is the de facto solution for multi-region Kafka. It continuously consumes from source topics and produces to target clusters, supporting DR, data migration, and multi-region data distribution use cases.",[324,11845,11846],{},"Kafka geo-replication is asynchronous and eventually consistent. There will be a replication lag, so a backup cluster may be seconds behind the primary. Active-passive (one-way replication) is simpler; active-active (bi-directional) requires careful configuration (like distinct topic names or prefixes to avoid infinite loops).",[324,11848,11849],{},"Operating Kafka across regions introduces extra overhead: you must manage MirrorMaker\u002FConnect processes, handle offset translation so consumers can fail over without reprocessing data, and ensure monitoring of replication lag and health.",[324,11851,11852],{},"Newer features (e.g. 
Confluent Cluster Linking) improve the story by removing the external process and preserving offsets, but these are not part of open-source Kafka yet. Open-source users rely on MM2 or third-party tools for multi-region setups.",[324,11854,11855],{},"In practice, architecting Kafka for multi-region involves planning for failover procedures (how to switch clusters), client reconfiguration, and possibly tooling to synchronize metadata like schemas and ACLs across clusters.",[324,11857,11858],{},"Bottom line: Kafka can certainly be made to work in multi-region and hybrid-cloud scenarios – it powers many global systems – but it requires careful architecture. Next, we’ll look at Pulsar, which takes a more integrated approach to geo-replication, potentially simplifying multi-region streaming for architects.",[48,11860,3931],{},[208,11862],{},[48,11864,3931],{},[48,11866,8956,11867,8960],{},[55,11868,5405],{"href":6135,"rel":11869},[264],[48,11871,8963],{},[48,11873,11874],{},[55,11875,8970],{"href":8968,"rel":11876},[264],[48,11878,11879],{},[55,11880,8976],{"href":7969,"rel":11881},[264],[48,11883,3931],{},{"title":18,"searchDepth":19,"depth":19,"links":11885},[11886,11887,11888,11889,11890,11891],{"id":11707,"depth":19,"text":11708},{"id":11717,"depth":19,"text":11718},{"id":11758,"depth":19,"text":11759},{"id":11788,"depth":19,"text":11789},{"id":11812,"depth":19,"text":11813},{"id":8923,"depth":19,"text":8924},"Learn how Apache Kafka achieves multi-region streaming through separate clusters and replication tools like MirrorMaker 2, and explore the challenges and considerations for geo-replication.","\u002Fimgs\u002Fblogs\u002F689e25766d743c5fe4a4afc9_02.-Kafka's-Multi-Region-Streaming.png",{},"\u002Fblog\u002Fkafkas-approach-to-multi-region-streaming",{"title":11682,"description":11892},"blog\u002Fkafkas-approach-to-multi-region-streaming",[799,11899,821],"Geo-Replication","5Qds30KN_Ch3IriG8ZSauAXeiSe0Q4gC0d_GdQJPF-g",{"id":11902,"title":11903,"authors":11904,"body":11905,"category":821,"createdAt":290,"date":12098,"description":12099,"extension":8,"featured":294,"image":12100,"isDraft":294,"link":290,"meta":12101,"navigation":7,"order":296,"path":12102,"readingTime":4475,"relatedResources":290,"seo":12103,"stem":12104,"tags":12105,"__hash__":12107},"blogs\u002Fblog\u002Fpulsar-newbie-guide-for-kafka-engineers-part-3-ledgers-bookies.md","Pulsar Newbie Guide for Kafka Engineers (Part 3): Ledgers & Bookies",[808,809,810],{"type":15,"value":11906,"toc":12090},[11907,11909,11912,11916,11919,11922,11933,11936,11939,11950,11954,11957,11960,11971,11974,11977,11980,11983,11987,11990,12004,12007,12011,12025,12029,12040,12043,12046,12048,12065,12067,12069,12071,12076,12078,12083,12088],[3933,11908,7358],{"id":7357},[48,11910,11911],{},"This post dives into how Pulsar stores data using Apache BookKeeper – a departure from Kafka’s file-based storage per broker. Pulsar uses bookies (storage nodes) and ledgers (append-only logs spread across bookies) to durably persist messages. We’ll explain what ledgers are (think of them like distributed log segments) and how Pulsar brokers write to and read from bookies. The result is a decoupled architecture: brokers are stateless serving layers, and bookies handle data persistence and replication. 
Kafka engineers will learn how Pulsar achieves durability and high availability through this two-layer design.",[40,11913,11915],{"id":11914},"pulsars-storage-architecture-vs-kafkas","Pulsar’s Storage Architecture vs Kafka’s",[48,11917,11918],{},"In Apache Kafka, each broker stores the data (messages) for the partitions it owns on its local disk. Replication is broker-to-broker; each partition has leader and follower brokers. In Apache Pulsar, the approach is different: brokers do NOT keep long-term data on their local disk. Instead, Pulsar leverages Apache BookKeeper as a separate storage layer. Brokers in Pulsar act more like stateless proxy\u002Fdispatchers, while bookies (BookKeeper servers) provide the durable storage.",[48,11920,11921],{},"Let’s break it down:",[321,11923,11924,11927,11930],{},[324,11925,11926],{},"A bookie is a process (and node) in a Pulsar cluster whose job is to store data. It’s part of the BookKeeper ensemble. Think of bookies as analogous to Kafka brokers in terms of holding data, but they don’t talk to clients – they only store and serve data to Pulsar brokers.",[324,11928,11929],{},"A ledger is BookKeeper’s term for a log, similar to a Kafka log segment but distributed. More formally, “A ledger is an append-only data structure with a single writer that is assigned to multiple BookKeeper storage nodes, or bookies”. Each ledger is replicated to some number of bookies (by default, 2 or 3 copies depending on config).",[324,11931,11932],{},"Pulsar topics are composed of one or more ledgers. Over time, as one ledger fills or a time\u002Fsize threshold is reached, the broker will create a new ledger for the topic. Earlier ledgers can be deleted when they’re fully consumed (if no longer needed due to retention).",[48,11934,11935],{},"Analogy: If you’re a Kafka engineer, imagine if each partition’s log was broken into segments (Kafka already does that) but each segment could live on different nodes from the broker and is replicated independently. That’s roughly how Pulsar uses ledgers. The broker writes a batch of messages to a ledger (which goes to multiple bookies) and when done, it can move to a new ledger. The metadata of which ledgers form a topic is stored (in ZooKeeper or metadata store). Consumers don’t need to know about ledgers; they just ask the broker for data, and the broker reads from the bookies as needed.",[48,11937,11938],{},"Why go to this trouble? Decoupling storage from brokers has several benefits:",[321,11940,11941,11944,11947],{},[324,11942,11943],{},"Independent Scaling: Need more storage? Add more bookies; need more throughput\u002Fconnection handling? Add more brokers. In Kafka, adding a broker adds both storage and serving capacity, but you have to rebalance data to use it. Pulsar can instantly use a new bookie for new ledgers without moving any existing data, and new brokers can start serving immediately by coordinating with existing bookies.",[324,11945,11946],{},"No Single Broker Hotspot for a Topic: In Kafka, one partition’s writes are handled by one broker at a time (the leader). In Pulsar, a topic’s ledgers could be on many bookies, and writes go to multiple bookies in parallel (though orchestrated by the broker). Read throughput can also be spread if needed.",[324,11948,11949],{},"Seamless Broker Restarts: If a Pulsar broker goes down, another broker can take over serving a topic by reading the ledgers from bookies – since the data wasn’t on the broker’s disk exclusively. 
In Kafka, if a broker goes down, partitions it hosted need a leader election and the new leader may serve stale data until it catches up, etc.",[40,11951,11953],{"id":11952},"ledgers-how-they-work","Ledgers: How They Work",[48,11955,11956],{},"A ledger is written by a Pulsar broker and replicated by BookKeeper to a set of bookies. By default, Pulsar might use a quorum of 2 or 3 bookies for each ledger (configurable with ensemble size, write quorum, ack quorum). For example, a common configuration is an ensemble of 3, write quorum 2, ack quorum 2 – meaning each entry (message) is written to 2 of 3 bookies and considered committed when 2 have written it (this gives one bookie tolerance for failure).",[48,11958,11959],{},"Key properties of ledgers:",[321,11961,11962,11965,11968],{},[324,11963,11964],{},"Only one writer (the broker) appends to a ledger. There’s no contention on writes – similar to how a Kafka partition is written by one leader.",[324,11966,11967],{},"Once a ledger is closed (either the broker closes it or the broker crashes, triggering a recovery close), it becomes read-only. No more writes happen to that ledger. This is like Kafka log segments being immutable once closed.",[324,11969,11970],{},"Ledgers have a unique ID and are internally stored on bookies with that ID. The mapping of which ledgers belong to a topic is stored in metadata (ZooKeeper) as a list.",[48,11972,11973],{},"When a consumer asks for data, the Pulsar broker handling that topic will read from the ledgers. If the data is recent, chances are it might still be in broker cache (Pulsar brokers cache recent entries in memory for speed). If not, the broker retrieves entries from the bookies. This process is transparent to the client.",[48,11975,11976],{},"Importantly, cursor (subscription) positions are also stored in BookKeeper as special ledgers (called cursor or managed ledger metadata). This means the acked positions are durable and if a broker or client crashes, the subscription doesn’t lose track of where it was.",[48,11978,11979],{},"Caption: A Pulsar cluster with brokers (serving layer) and bookies (storage layer). Brokers publish entries to BookKeeper ledgers on multiple bookies, ensuring durability and replication. Consumers can connect to any broker to fetch data; the broker will read from bookies as needed. This architecture decouples message storage from the brokers’ lifecycle and load, enabling Pulsar’s horizontal scalability.",[48,11981,11982],{},"Some concrete numbers: BookKeeper is designed to handle many ledgers concurrently – thousands or more – and spread them across bookies. By using multiple disks (one for write-ahead journal, and others for storage) on each bookie, it can handle heavy concurrent writes and reads without the two interfering too much. This is why Pulsar can have very high throughput with many topics: the writes aren’t bottlenecked on a single disk or single node.",[40,11984,11986],{"id":11985},"lifecycle-of-a-topics-data","Lifecycle of a Topic’s Data",[48,11988,11989],{},"Consider a Pulsar topic (non-partitioned for simplicity). When messages start coming in:",[1666,11991,11992,11995,11998,12001],{},[324,11993,11994],{},"First Ledger: The broker creates a new ledger for that topic (say ledger ID 101). BookKeeper assigns 2 or 3 bookies to store it (e.g., bookies A, B, C with a quorum of 2). The broker writes each message to bookies A and B (for example) and once both have it, the message is considered persisted. 
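Since ensemble size, write quorum, and ack quorum come up repeatedly here, the following is a hedged sketch of setting them per namespace with Pulsar's Java admin client; the admin URL, tenant, and namespace are placeholders, and the same defaults can also be configured cluster-wide in broker.conf.

```java
import org.apache.pulsar.client.admin.PulsarAdmin;
import org.apache.pulsar.common.policies.data.PersistencePolicies;

public class SetLedgerQuorums {
    public static void main(String[] args) throws Exception {
        // HTTP admin endpoint of the broker; placeholder address.
        try (PulsarAdmin admin = PulsarAdmin.builder()
                .serviceHttpUrl("http://localhost:8080")
                .build()) {

            // Ensemble of 3 bookies, write quorum 2, ack quorum 2: each entry is
            // striped across 3 bookies, written to 2 of them, and acknowledged to
            // the broker once 2 copies are persisted.
            PersistencePolicies policies =
                    new PersistencePolicies(3, 2, 2, 0.0 /* mark-delete rate limit */);

            admin.namespaces().setPersistence("my-tenant/my-namespace", policies);

            System.out.println(admin.namespaces()
                    .getPersistence("my-tenant/my-namespace"));
        }
    }
}
```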
This happens for all messages until perhaps a threshold.",[324,11996,11997],{},"Ledger Rollover: Pulsar may roll over to a new ledger after some conditions – e.g., a certain number of entries or time or broker restart. Let’s say after 1 million messages or a few minutes, ledger 101 is closed and ledger 102 is created on possibly a different set of bookies (could be A, C, D).",[324,11999,12000],{},"Subsequent Ledgers: This continues; a topic will have a sequence of ledgers. If a broker crashes in the middle of writing ledger 102, BookKeeper can recover that ledger (finish it) and a new broker will open a new ledger 103 to continue from the last position.",[324,12002,12003],{},"Deletion of Ledgers: If topic data expires due to retention or all messages in a ledger have been acknowledged by all subscriptions, Pulsar can delete that ledger to free storage. Essentially, older ledgers get trimmed off the “log” once they are not needed. This is similar in effect to Kafka’s log retention deletion, but happens at a ledger granularity.",[48,12005,12006],{},"Ledgers thus allow Pulsar to have an infinite log without one huge file: old ledgers can be removed and new ledgers added. Also, if using tiered storage (another feature), older ledgers can even be offloaded to cold storage (like S3) seamlessly, since they are self-contained units of data.",[40,12008,12010],{"id":12009},"bookkeeper-vs-kafka-storage-summing-up-differences","BookKeeper vs Kafka Storage: Summing up Differences",[321,12012,12013,12016,12019,12022],{},[324,12014,12015],{},"Disaggregated vs Integrated: Pulsar+BookKeeper is a two-layer system (compute\u002Fserve vs storage), whereas Kafka is one-layer (brokers do both). This means Pulsar can scale storage independently and recover faster from broker failure since data lives separately.",[324,12017,12018],{},"Fine-grained Replication: Every message in Pulsar is written to multiple bookies immediately (satisfying the ack quorum). In Kafka, the leader writes to its disk then sends to followers; there can be a replication lag. Pulsar’s BookKeeper replication is effectively parallel writes – which can reduce the window of data loss if a node dies (as long as ack quorum was met, data is safe).",[324,12020,12021],{},"Throughput and Ordering: In Kafka, a single partition is served by one broker, and order is per partition. In Pulsar, order is maintained per topic (or partition) as well, but the data serving could come from multiple bookies to one or more brokers. Pulsar ensures ordering by having the broker orchestrate reads\u002Fwrites of a topic’s ledgers in sequence. You won’t get out-of-order messages because the broker knows the ledger sequence and the entry IDs within ledgers.",[324,12023,12024],{},"Managed Ledger Abstraction: Pulsar introduces the concept of a “managed ledger” – this is an abstraction that represents the log for a topic, composed of one or more ledgers internally. The managed ledger handles creating new ledgers, closing them, and keeping track of cursors (subscription positions) in those ledgers. Kafka doesn’t expose anything like this because it doesn’t need to – the log is just the file on disk. 
But Pulsar’s managed ledger is a powerful component that handles a lot of complexity for you (e.g., it knows when a ledger can be deleted because all subscriptions have passed it).",[40,12026,12028],{"id":12027},"operational-notes-for-kafka-folks","Operational Notes for Kafka Folks",[321,12030,12031,12034,12037],{},[324,12032,12033],{},"Bookie failures: If a bookie dies, any ledgers that had entries on that bookie still have at least one other copy (if replication was set to >=2). BookKeeper will auto-replicate the missing fragments to another bookie to re-establish the desired redundancy. This is somewhat analogous to Kafka’s leader election and replication catch-up, but at a fragment level. As an operator, replacing a bookie doesn’t require moving whole topics manually; BookKeeper handles re-replication of just the missing data fragments.",[324,12035,12036],{},"Adding bookies: New bookies will start receiving ledger fragments for new ledgers immediately (Pulsar will typically not move existing ledger data onto them, which is fine as new data balances out). This is much less work than Kafka reassigning lots of partitions to a new broker.",[324,12038,12039],{},"Monitoring: Monitor bookie disk usage and bookie journals. If a bookie becomes slow or falls behind, it could bottleneck writes. Pulsar brokers expose metrics and stats for managed ledgers and bookie client performance. Tools like pulsar-admin brokers healthcheck and BookKeeper’s own metrics (like ledger count, journal latency) are valuable.",[48,12041,12042],{},"By now, hopefully the mysterious terms “bookies and ledgers” have solid meaning. You can see that Pulsar’s durability and streaming prowess come from BookKeeper under the hood. It provides Pulsar with the ability to handle per-message replication and storage in a very scalable way.",[48,12044,12045],{},"In the next part, we will shift gears to the consumption side of Pulsar – Subscriptions & Consumers. This is where Pulsar’s flexibility (multiple subscription modes) shows how it can act like Kafka or a traditional messaging system depending on what you need.",[40,12047,8924],{"id":8923},[321,12049,12050,12053,12056,12059,12062],{},[324,12051,12052],{},"Pulsar offloads data storage to Apache BookKeeper. Brokers are stateless serving nodes, while bookies store message data. This decoupling is a fundamental difference from Kafka’s monolithic broker storage.",[324,12054,12055],{},"Ledgers are BookKeeper’s unit of storage – think of them as distributed log segments. A Pulsar topic’s data consists of a sequence of ledgers, each replicated to multiple bookies for durability. This provides built-in multi-copy storage and automatic failover at the storage layer.",[324,12057,12058],{},"Pulsar achieves durability by writing messages to multiple bookies (e.g., 2-3 copies) before acknowledging the producer. This is analogous to Kafka’s acks=all with min.insync.replicas, but Pulsar’s design makes every write go to a quorum of bookies in parallel.",[324,12060,12061],{},"Brokers do not hold persistent data, so recovery from failures is fast – any broker can serve a topic by reading from the ledgers on bookies. In Kafka, a broker failure means its partitions need a new leader and possibly data catch-up; in Pulsar, the data was never lost (other brokers can pick up where it left off).",[324,12063,12064],{},"Operating Pulsar involves managing both brokers and bookies. 
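As a small illustration of the tooling mentioned above, this hedged sketch (admin URL and topic name assumed) runs the broker health check and lists the ledgers currently backing a topic, roughly what pulsar-admin topics stats-internal prints on the CLI.

```java
import org.apache.pulsar.client.admin.PulsarAdmin;
import org.apache.pulsar.common.policies.data.PersistentTopicInternalStats;

public class InspectTopicLedgers {
    public static void main(String[] args) throws Exception {
        String topic = "persistent://my-tenant/my-namespace/orders"; // placeholder topic

        try (PulsarAdmin admin = PulsarAdmin.builder()
                .serviceHttpUrl("http://localhost:8080") // placeholder admin endpoint
                .build()) {

            // Java equivalent of `pulsar-admin brokers healthcheck` from the text above.
            admin.brokers().healthcheck();

            // Internal stats expose the managed ledger: the list of ledgers that
            // currently make up the topic, plus cursor (subscription) positions.
            PersistentTopicInternalStats stats = admin.topics().getInternalStats(topic);
            stats.ledgers.forEach(l ->
                    System.out.printf("ledger %d: %d entries, %d bytes%n",
                            l.ledgerId, l.entries, l.size));
        }
    }
}
```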
The complexity of BookKeeper is mostly hidden from the end user, but understanding it helps in tuning (e.g., ledger sizes, number of bookie nodes). For a Kafka engineer, the concept of separating serving and storage may be new, but it’s the backbone of Pulsar’s scalability and reliability.",[48,12066,3931],{},[208,12068],{},[48,12070,3931],{},[48,12072,8956,12073,8960],{},[55,12074,5405],{"href":6135,"rel":12075},[264],[48,12077,8963],{},[48,12079,12080],{},[55,12081,8970],{"href":8968,"rel":12082},[264],[48,12084,12085],{},[55,12086,8976],{"href":7969,"rel":12087},[264],[48,12089,3931],{},{"title":18,"searchDepth":19,"depth":19,"links":12091},[12092,12093,12094,12095,12096,12097],{"id":11914,"depth":19,"text":11915},{"id":11952,"depth":19,"text":11953},{"id":11985,"depth":19,"text":11986},{"id":12009,"depth":19,"text":12010},{"id":12027,"depth":19,"text":12028},{"id":8923,"depth":19,"text":8924},"2025-08-12","Learn how Apache Pulsar stores data using Apache BookKeeper's ledgers and bookies, providing a decoupled, scalable, and highly available storage architecture compared to Kafka.","\u002Fimgs\u002Fblogs\u002F689b52280db497fdd1646215_03.-Ledgers-&-Bookies.png",{},"\u002Fblog\u002Fpulsar-newbie-guide-for-kafka-engineers-part-3-ledgers-bookies",{"title":11903,"description":12099},"blog\u002Fpulsar-newbie-guide-for-kafka-engineers-part-3-ledgers-bookies",[821,7347,12106],"BookKeeper","qJg79HFRylQqr5BzcJdQ45JWqH8OkP0FTzH6XJMbzOU",{"id":12109,"title":12110,"authors":12111,"body":12112,"category":821,"createdAt":290,"date":12247,"description":12248,"extension":8,"featured":294,"image":12249,"isDraft":294,"link":290,"meta":12250,"navigation":7,"order":296,"path":12251,"readingTime":4475,"relatedResources":290,"seo":12252,"stem":12253,"tags":12254,"__hash__":12255},"blogs\u002Fblog\u002Fwhy-geo-replication-matters-for-multi-cloud-and-hybrid-streaming.md","Why Geo-Replication Matters for Multi-Cloud and Hybrid Streaming (Geo-Replicated Architectures with Kafka and Pulsar 1\u002F6)",[6785],{"type":15,"value":12113,"toc":12240},[12114,12116,12130,12134,12137,12140,12143,12147,12150,12153,12156,12160,12163,12180,12183,12187,12190,12193,12196,12198,12215,12217,12219,12221,12226,12228,12233,12238],[48,12115,8648],{},[321,12117,12118,12121,12124,12127],{},[324,12119,12120],{},"Geo-replication is the practice of copying streaming data across multiple regions or cloud environments in real time. It underpins disaster recovery (DR), high availability (HA), and low-latency local access in modern data architectures.",[324,12122,12123],{},"In multi-region, multi-cloud, and hybrid-cloud deployments, geo-replication ensures your streaming platform continues running despite regional outages. It keeps data redundant and consistent across data centers, enabling failover without data loss.",[324,12125,12126],{},"Geo-replication also improves performance by serving users from the closest region and reducing cross-region traffic. Even if one cloud or data center fails, others have up-to-date data to take over.",[324,12128,12129],{},"Both Apache Kafka and Apache Pulsar support geo-replication (Kafka via external tools, Pulsar built-in). This series will explore how each approaches it and how to bridge Kafka and Pulsar for a resilient, hybrid streaming ecosystem.",[40,12131,12133],{"id":12132},"the-need-for-geo-replication-in-modern-streaming","The Need for Geo-Replication in Modern Streaming",[48,12135,12136],{},"Today’s data streaming applications demand global reliability. 
Whether you’re connecting user activity logs from multiple continents or ensuring an IoT pipeline never goes down, having your data in one place is a risk. Geo-replication – replicating data across geographically distributed clusters – addresses this by keeping multiple copies of data in different locations. If one region experiences a disaster or outage, another region’s cluster can immediately take over with an up-to-date copy of the data. In other words, geo-replication is the linchpin of an effective disaster recovery plan for streaming systems.",[48,12138,12139],{},"High availability goes hand-in-hand with disaster recovery. In a multi-region deployment, even a complete data center outage won’t halt your event streams – consumers and producers can fail over to a healthy region with minimal disruption. For example, a mission-critical Kafka cluster can continue serving applications from a secondary region if the primary region goes down. A geo-replicated Pulsar topic remains available and consistent in surviving regions even if one cluster is offline. The result is near-zero downtime and business continuity for streaming services.",[48,12141,12142],{},"Geo-replication also enables low-latency data access for globally distributed users. By replicating data to multiple geographic regions, you can serve users from the cluster nearest to them, avoiding high latencies of cross-continent data fetches. As an added benefit, this often reduces cloud egress costs and network bottlenecks. Apache Pulsar’s documentation notes that geo-replication provides “low-latency access to data for consumers in different locations,” since data is available in-region rather than halfway around the world. In Apache Kafka ecosystems, it’s common to replicate topics to local clusters on each coast or each continent, so consumers always read from a nearby cluster. In summary, geo-replication brings data closer to your users, improving performance and user experience.",[40,12144,12146],{"id":12145},"multi-region-multi-cloud-and-hybrid-cloud-contexts","Multi-Region, Multi-Cloud, and Hybrid-Cloud Contexts",[48,12148,12149],{},"Modern architectures increasingly span multiple clouds and on-premises data centers. You might have a streaming pipeline that collects events in your private data center (for compliance reasons) but aggregates and analyzes them in a public cloud. Or, you might deploy clusters in AWS, GCP, and Azure to avoid vendor lock-in. Geo-replication is critical in these scenarios to keep data flowing across heterogeneous environments. It ensures that a message produced in one environment (say, an on-prem Kafka cluster) can be automatically copied to another environment (say, a cloud-based Pulsar cluster) for backup or combined processing.",[48,12151,12152],{},"In hybrid-cloud streaming, where an organization runs streaming platforms both on-premises and in the cloud, geo-replication enables a unified, resilient data fabric. For example, an on-prem Pulsar cluster can continuously replicate topics to a Pulsar cluster in the cloud, providing an off-site backup and feeding cloud-based analytics. Conversely, a cloud Kafka service could replicate to an on-prem Kafka cluster to satisfy data residency requirements or to integrate with local systems. 
The ability to bridge on-prem and cloud through geo-replication means you can migrate or burst workloads to the cloud without stopping the data flow.",[48,12154,12155],{},"Multi-cloud setups benefit similarly: if you have Kafka clusters in AWS and Azure, setting up geo-replication (through Kafka’s MirrorMaker or Confluent Cluster Linking) between them means each cloud has the full data stream. Users or services in each cloud get local access with minimal latency, and if one cloud has an outage, the other can pick up seamlessly. Geo-replication essentially decouples your streaming availability from any single region or provider’s uptime.",[40,12157,12159],{"id":12158},"key-benefits-recap","Key Benefits Recap",[48,12161,12162],{},"To summarize the benefits of geo-replication in hybrid streaming:",[321,12164,12165,12168,12171,12174,12177],{},[324,12166,12167],{},"Disaster Recovery: By maintaining live copies of data in multiple locations, geo-replication provides strong fault tolerance. If one region or cluster fails due to network outages, power loss, etc., consumers and producers can fail over to a replica in another region with no data loss. Your data streaming applications continue operating even if an entire region goes offline.",[324,12169,12170],{},"High Availability & Resilience: Even during normal operations, geo-replication keeps your system resilient to localized failures. Individual brokers or entire clusters can be taken down for maintenance or due to incidents, and clients can switch to a healthy cluster. The system remains continuously available, meeting the strict uptime requirements of modern applications.",[324,12172,12173],{},"Low Latency for Global Users: Geo-replication improves performance by placing data near users. A message produced in Europe can be consumed from a European cluster, one produced in Asia from an Asian cluster, etc., after being replicated. This avoids long WAN round-trips for data access. In Apache Pulsar, for example, producers and consumers can operate in different regions while still achieving low latency, thanks to geo-replication delivering messages to all regions in parallel.",[324,12175,12176],{},"Data Locality & Compliance: In multi-national operations, you may need data to reside in certain countries or clouds for compliance. Geo-replication lets you funnel specific data streams to specific regions (e.g. replicate only a subset of topics to a European cluster to comply with EU data residency, while keeping full copies in a U.S. cluster). Kafka’s MirrorMaker 2 supports filtering specific topics for replication, allowing data isolation strategies for security\u002Fprivacy.",[324,12178,12179],{},"Scalability and Load Balancing: By distributing streaming load across regions, geo-replication can also act as a load-balancing mechanism. Perhaps one region produces the majority of events and another region mostly consumes; replicating data both ways can balance read load. In Kafka, an “active-active” deployment with bidirectional mirroring can enable regional services to produce and consume on their local cluster while exchanging data with other regions as needed. (We’ll discuss the complexity of active-active setups later in the series.)",[48,12181,12182],{},"In short, geo-replication is not just a “nice-to-have” but often a requirement for enterprise streaming systems. 
It’s what turns a single-region message bus into a globally resilient streaming platform.",[40,12184,12186],{"id":12185},"kafka-and-pulsar-different-approaches-same-goals","Kafka and Pulsar: Different Approaches, Same Goals",[48,12188,12189],{},"Both Apache Kafka and Apache Pulsar recognize the importance of geo-replication, but they implement it in distinct ways. Kafka historically relies on external tools (like MirrorMaker) or add-on features to replicate across clusters, whereas Pulsar builds geo-replication into its core brokers. The next posts in this series will dive into each: we’ll explore Kafka’s approach to multi-region streaming (and the challenges of using MirrorMaker or other techniques), and then Pulsar’s built-in geo-replication and multi-cloud design that make it stand out.",[48,12191,12192],{},"Throughout, we’ll also touch on the reality that many organizations use both Kafka and Pulsar. Perhaps you use Kafka for some legacy systems and Pulsar for new workloads, and you want them to interoperate. We’ll look at cross-platform streaming as a secondary theme – for instance, how to bridge a Kafka pipeline with a Pulsar pipeline in a hybrid cloud. In Part 5, we’ll specifically introduce StreamNative’s UniLink, a tool designed to bridge Kafka and Pulsar streams in a seamless, low-friction way.",[48,12194,12195],{},"By the end of this series, you’ll have a clear understanding of not only why geo-replication is critical for hybrid and multi-cloud streaming, but also how to implement it using Kafka, Pulsar, or a combination of both. You’ll be equipped with an architect’s perspective on designing a resilient, geographically distributed streaming platform.",[40,12197,8924],{"id":8923},[321,12199,12200,12203,12206,12209,12212],{},[324,12201,12202],{},"Geo-replication is essential for disaster recovery and high availability in streaming systems. It keeps your data streaming even if a whole region or cloud goes down.",[324,12204,12205],{},"By replicating data closer to end users, geo-replication also reduces latency and improves performance for global applications.",[324,12207,12208],{},"Multi-region and multi-cloud architectures rely on geo-replication to synchronize data across environments – whether on-premises to cloud, or AWS to Azure – ensuring consistency and compliance.",[324,12210,12211],{},"Apache Kafka and Apache Pulsar both support geo-replication but via different means: Kafka typically uses external tools (MirrorMaker2, etc.), whereas Pulsar has out-of-the-box geo-replication built into the broker layer.",[324,12213,12214],{},"In hybrid setups that use both Kafka and Pulsar, bridging the two ecosystems is possible (e.g., via connectors or specialized tools like StreamNative UniLink). This allows organizations to leverage the strengths of each in a single resilient platform. 
Next, we’ll examine Kafka’s native approach to geo-replication in depth.",[48,12216,3931],{},[208,12218],{},[48,12220,3931],{},[48,12222,8956,12223,8960],{},[55,12224,5405],{"href":6135,"rel":12225},[264],[48,12227,8963],{},[48,12229,12230],{},[55,12231,8970],{"href":8968,"rel":12232},[264],[48,12234,12235],{},[55,12236,8976],{"href":7969,"rel":12237},[264],[48,12239,3931],{},{"title":18,"searchDepth":19,"depth":19,"links":12241},[12242,12243,12244,12245,12246],{"id":12132,"depth":19,"text":12133},{"id":12145,"depth":19,"text":12146},{"id":12158,"depth":19,"text":12159},{"id":12185,"depth":19,"text":12186},{"id":8923,"depth":19,"text":8924},"2025-08-07","Explore why geo-replication is essential for modern, multi-cloud, and hybrid streaming architectures. Learn how Apache Kafka and Apache Pulsar handle it—and how to bridge both for resilience, performance, and compliance.","\u002Fimgs\u002Fblogs\u002F6894cb6bc37a442a5f8d8458_01.-Why-Geo-Replication-Matters-for-Multi-Cloud-and-Hybrid-Streaming.png",{},"\u002Fblog\u002Fwhy-geo-replication-matters-for-multi-cloud-and-hybrid-streaming",{"title":12110,"description":12248},"blog\u002Fwhy-geo-replication-matters-for-multi-cloud-and-hybrid-streaming",[11899,799,821],"GY0rGlBC5vKJeQmgJiN-V7FFyB5o53wnyLouvw3ilC8",{"id":12257,"title":12258,"authors":12259,"body":12260,"category":821,"createdAt":290,"date":12488,"description":12489,"extension":8,"featured":294,"image":12490,"isDraft":294,"link":290,"meta":12491,"navigation":7,"order":296,"path":12492,"readingTime":3556,"relatedResources":290,"seo":12493,"stem":12494,"tags":12495,"__hash__":12496},"blogs\u002Fblog\u002Fat-least-once-exactly-once-and-acks-in-pulsar.md","At-Least-Once, Exactly-Once, and Acks in Pulsar (Pulsar Guide for RabbitMQ\u002FJMS Engineers 3\u002F10)",[808,809,810],{"type":15,"value":12261,"toc":12477},[12262,12264,12267,12271,12274,12288,12291,12294,12297,12301,12304,12307,12310,12314,12317,12321,12324,12327,12335,12338,12341,12345,12348,12351,12354,12358,12361,12364,12367,12372,12375,12378,12382,12385,12411,12415,12418,12421,12423,12446,12449,12452,12454,12456,12458,12463,12465,12470,12475],[3933,12263,10737],{"id":7357},[48,12265,12266],{},"Pulsar ensures at-least-once delivery by persisting messages until consumers acknowledge them. In practice, this is similar to RabbitMQ’s and JMS’s default behavior – you won’t lose messages, but you could see duplicates if something fails and a message is re-delivered. Pulsar also offers features for effectively-once processing: it has automatic message deduplication on the broker side and introduced transactions for true end-to-end exactly-once semantics in complex workflows. In this post, we’ll explain how acknowledgments work in Pulsar (individual vs cumulative acks), how Pulsar handles redeliveries and duplicates, and how you can achieve “exactly-once” delivery guarantees using Pulsar’s features (which is something neither RabbitMQ nor JMS natively provide without external coordination).",[40,12268,12270],{"id":12269},"understanding-at-least-once-delivery-in-pulsar","Understanding At-Least-Once Delivery in Pulsar",[48,12272,12273],{},"By default, Pulsar follows an at-least-once delivery model. This means every message sent to Pulsar will be delivered to consumers at least once. It might be delivered more than once in some failure scenarios, but Pulsar will never intentionally drop a message that hasn’t been acknowledged. 
Let’s break down what that means:",[321,12275,12276,12279,12282,12285],{},[324,12277,12278],{},"When a message is published to a Pulsar topic, it’s stored durably (on disk via BookKeeper) and added to each subscription’s backlog.",[324,12280,12281],{},"A consumer receives the message and processes it. Until the consumer sends an acknowledgment back to the broker, that message remains in the backlog marked as “unacknowledged”.",[324,12283,12284],{},"If the consumer fails to ack (e.g., it crashes or times out), the broker will re-deliver that message (either to the same consumer once it reconnects, or to another consumer if using a shared subscription).",[324,12286,12287],{},"Because of this re-delivery on no-ack, the consumer might end up seeing the same message again – hence “at least once”.",[48,12289,12290],{},"This is analogous to how RabbitMQ works when you use manual acknowledgments (basic_ack). If a RabbitMQ consumer dies before acking, RabbitMQ will requeue and redeliver the message to another consumer (or the same one when it comes back), resulting in a potential duplicate delivery to the application. JMS similarly, in CLIENT_ACK or transactional sessions, will redeliver unacked messages on restart or rollback.",[48,12292,12293],{},"Pulsar’s ack mechanism: In Pulsar, an acknowledgment is an explicit signal. The default mode in the client API is manual ack – your consumer code calls consumer.acknowledge(messageId) (or acknowledges cumulatively, which we’ll discuss shortly). Pulsar then knows it can mark that message as processed. Only after acking does Pulsar consider the message permanently done for that subscription. Until then, it’s retained.",[48,12295,12296],{},"Now, how long will Pulsar wait to redeliver if a consumer doesn’t ack? Pulsar has a concept of acknowledgment timeout. If you set an ack timeout on the consumer (say 30 seconds), then if the consumer hasn’t acked a given message within 30s of receiving it, the broker considers that consumer “stuck” and will try delivering that message to another consumer (in a shared subscription scenario) or the same consumer again. The message is not removed until acknowledged. If ack timeout is not set, Pulsar will only redeliver on certain events like the client disconnecting. Additionally, a consumer can explicitly negative-ack a message (tell the broker “I’ve failed to process this, please redeliver sooner”) to speed up re-delivery without waiting for a timeout.",[40,12298,12300],{"id":12299},"cumulative-vs-individual-acknowledgments","Cumulative vs Individual Acknowledgments",[48,12302,12303],{},"Pulsar has a feature JMS does not: cumulative acknowledgment. This is a bit like saying “I acknowledge everything up to message X”. It’s useful for high throughput when you process messages in order and want to reduce ack traffic. For example, if a consumer receives messages 1,2,3,...10 sequentially, instead of sending ten separate acks, it could send one cumulative ack for message 10, which signals to the broker “messages 1 through 10 are all acknowledged”. This only works in subscription modes where ordering is guaranteed (exclusive or failover subs) and not in shared mode (because shared mode might deliver out-of-order across consumers). 
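To make the two ack styles concrete, here is a minimal sketch (service URL, topic, and subscription names are placeholders) of a consumer on an ordered Failover subscription that acknowledges a whole run of messages with a single cumulative ack.

```java
import org.apache.pulsar.client.api.*;

public class CumulativeAckExample {
    public static void main(String[] args) throws Exception {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650") // placeholder broker address
                .build();

        // Failover subscription = single active consumer, so ordering is preserved
        // and cumulative acknowledgment is safe to use.
        Consumer<String> consumer = client.newConsumer(Schema.STRING)
                .topic("orders")
                .subscriptionName("order-processor")
                .subscriptionType(SubscriptionType.Failover)
                .subscribe();

        Message<String> last = null;
        for (int i = 0; i < 10; i++) {
            last = consumer.receive();
            process(last.getValue()); // application logic
        }
        // One ack covers this message and everything before it on the subscription.
        consumer.acknowledgeCumulative(last);

        consumer.close();
        client.close();
    }

    private static void process(String value) { /* ... */ }
}
```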
It’s an optimization detail, but good to know Pulsar supports it.",[48,12305,12306],{},"Individual ack is the normal mode: ack each message as you process it, which you’d do in a shared subscription or any scenario where you might skip around.",[48,12308,12309],{},"Negative ack (nack): Pulsar consumers can send a negative acknowledgment for a message they failed to process, prompting immediate re-delivery (instead of waiting for a timeout). RabbitMQ has a similar concept: basic.nack\u002Fbasic.reject to requeue or dead-letter a message. JMS typically would rely on not acknowledging or rolling back to signal failure.",[40,12311,12313],{"id":12312},"at-most-once-mode","At-Most-Once Mode?",[48,12315,12316],{},"At-most-once would mean a message is either delivered once or not at all (no duplicates, but possibly dropped on failure). Pulsar by design doesn’t drop messages without ack. However, if you wanted at-most-once behavior (for example, maybe you don’t care if the message is lost on failure, but you want to avoid duplicates at all cost), the way to approximate that is to enable auto-ack (so the client acks as soon as it receives a message, before processing). In that case, if the app crashes during processing, the message was already acked and will not be redelivered – so you might have lost it (didn’t finish processing) = at-most-once. But that’s usually not what you want for reliable systems. Pulsar defaults to at-least-once to favor reliability.",[40,12318,12320],{"id":12319},"dealing-with-duplicates-effective-exactly-once-processing","Dealing with Duplicates: Effective “Exactly-Once” Processing",[48,12322,12323],{},"Pure exactly-once delivery (where the messaging system guarantees a message is never delivered more than once to any consumer) is a hard problem in distributed systems, especially without heavy transactional coordination. Neither RabbitMQ nor JMS brokers guarantee exactly-once delivery to consumers out of the box – they guarantee at-least-once, and it’s up to the consumer to handle deduplicating if needed.",[48,12325,12326],{},"Pulsar’s improvements: Pulsar provides a couple of features to minimize duplicates:",[1666,12328,12329,12332],{},[324,12330,12331],{},"Message Deduplication on the broker: Pulsar brokers can detect and eliminate duplicates that occur due to producer retries. For example, if a producer sends the same message again (maybe it didn’t get an ack and retried), Pulsar can discard the duplicate if it has the same unique sequence ID from that producer. This is a server-side dedup so that the topic doesn’t even get the duplicate persisted. To use it, you enable broker deduplication (and the producer must either provide sequence IDs or let Pulsar auto-assign them). This is great for preventing the classic duplicate that happens when a producer retry goes through (e.g., network glitch causing the producer to think message wasn’t sent and send it again). Pulsar will store a small cache of recent message IDs to compare and drop dups. RabbitMQ doesn’t have an equivalent feature – if a producer re-sends, Rabbit will just queue it again, so consumers may see duplicates if the producer logic doesn’t handle it. JMS doesn’t standardize this either (some JMS brokers have “duplicate delivery check” features, but not universal).",[324,12333,12334],{},"Transactions and Exactly-Once Semantics: Pulsar introduced a transaction mechanism that allows a producer and consumer to participate in an atomic operation. 
Essentially, a consumer can consume messages and produce results to another topic within a transaction, and commit it such that either both the ack and the new message publish happen or neither do. With this, Pulsar can achieve end-to-end exactly-once in a pipeline (e.g., when using Flink or Pulsar Functions). If the transaction is aborted, Pulsar will roll back (meaning it will not ack the inputs, so they’ll be redelivered, and it will discard any outputs). If committed, it will make sure the ack is persisted and outputs are visible, exactly once. This feature is powerful for streaming jobs that read from a topic and write to another – it prevents duplicates in the output even if the job restarts. Implementing that in RabbitMQ or JMS typically involves external transactions (like using a database as a fence or two-phase commit between the queue and the processing outcome). Pulsar has it built-in for its ecosystem (since 2.8.0).",[48,12336,12337],{},"It’s worth noting that “exactly-once” in messaging is often achieved at the processing level rather than literally one and only one delivery. Pulsar’s documentation talks about “effectively-once” processing, meaning through deduplication + proper design you can ensure each effect (like a database update or a downstream event) happens once. The broker may deliver something twice, but your application or the system deduplicates such that the end result doesn’t double-count.",[48,12339,12340],{},"Where JMS stands: JMS doesn’t guarantee exactly-once delivery either. The closest is if you use JMS in a transacted session, you can get exactly-once processing within that transaction – either you consume and commit (so you won’t see it again) or rollback (so it’s as if you never got it). But that’s still at-least-once at the system level; exactly-once globally requires coordination outside JMS (like the two-phase commit with XA if integrating with a database).",[40,12342,12344],{"id":12343},"handling-acknowledgments-in-practice","Handling Acknowledgments in Practice",[48,12346,12347],{},"RabbitMQ users: Think of Pulsar’s ack like basic_ack. You should ack after you’ve processed the message. If you fail to process, you can either not ack (and allow redelivery) or negative ack (to expedite requeue). There’s no direct equivalent of RabbitMQ’s basic.reject requeue=false (which dead-letters or drops a message) except to implement a Dead Letter Topic policy or simply ack & drop. We’ll cover Dead Letter Topics in the next post, but basically, Pulsar can automatically route messages that keep failing to a special “DLQ” topic after a max redelivery count.",[48,12349,12350],{},"JMS users: Pulsar’s manual ack is like CLIENT_ACKNOWLEDGE mode (where you call message.acknowledge()). If you used AUTO_ACK in JMS, then to replicate that you’d just call ack as soon as you get the message or use a listener that auto-acks. Pulsar doesn’t have the concept of DUPS_OK_ACKNOWLEDGE (which JMS had for potentially lazy acks). And for JMS transacted sessions, the analogy would be using Pulsar transactions if you truly need atomic consume+produce. But for most cases, you commit processing by acking the message.",[48,12352,12353],{},"A nice thing about Pulsar: acknowledgments can be asynchronous (non-blocking). When you call consumer.acknowledgeAsync(msgId), the client will send the ack to broker in the background while your code can move on. 
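Putting these pieces together, here is a hedged sketch of a typical at-least-once consumer loop on a shared subscription: ack on success, negative-ack on failure, with an ack timeout as a safety net. Names and the service URL are placeholders.

```java
import org.apache.pulsar.client.api.*;
import java.util.concurrent.TimeUnit;

public class AtLeastOnceConsumerLoop {
    public static void main(String[] args) throws Exception {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650") // placeholder broker address
                .build();

        Consumer<String> consumer = client.newConsumer(Schema.STRING)
                .topic("payments")
                .subscriptionName("payment-workers")
                .subscriptionType(SubscriptionType.Shared)
                .ackTimeout(30, TimeUnit.SECONDS) // redeliver if we never ack
                .subscribe();

        while (true) {
            Message<String> msg = consumer.receive();
            try {
                handle(msg.getValue()); // application logic
                // Non-blocking ack: the client sends it in the background.
                consumer.acknowledgeAsync(msg.getMessageId());
            } catch (Exception e) {
                // Tell the broker we failed so it redelivers promptly
                // instead of waiting for the ack timeout.
                consumer.negativeAcknowledge(msg);
            }
        }
    }

    private static void handle(String value) { /* ... */ }
}
```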
This helps keep throughput high (you don’t wait for an ack round-trip each time).",[40,12355,12357],{"id":12356},"exactly-once-processing-with-pulsar-a-quick-example","Exactly-Once Processing with Pulsar: A Quick Example",[48,12359,12360],{},"To illustrate how Pulsar can do what others can’t, let’s outline a scenario:",[48,12362,12363],{},"Suppose we have a system that reads messages from an “input” topic, does some transformation, and writes to an “output” topic. We want to ensure that each input message results in exactly one output message, even if crashes happen.",[48,12365,12366],{},"Using plain at-least-once, if our consumer processes a message and publishes the result, but crashes before acking, Pulsar will redeliver that input message and the consumer will process it again, producing a duplicate output. How to avoid that?",[321,12368,12369],{},[324,12370,12371],{},"With Pulsar Transactions: We can start a Pulsar transaction, consume the message, produce the output message within the transaction, then commit the transaction. Pulsar will ensure the ack for input and the publish for output are atomic. If crash happens before commit, none of it is visible (so no ack, input will replay, but also no output published). If commit succeeds, input is acked and output is published once. This way, the output topic will not have duplicates, and input won’t be reprocessed erroneously.",[48,12373,12374],{},"Without transactions, one could still achieve idempotency by including a unique identifier from the input in the output and having consumers or downstream deduplicate, but that’s more work on the user side. Pulsar’s transactions aim to handle it in the messaging layer.",[48,12376,12377],{},"It’s advanced and currently used with frameworks like Flink for exactly-once streaming jobs. For many use cases, enabling broker deduplication is sufficient to avoid producer-side duplicates, and carefully handling consumer logic (so it can tolerate the rare duplicate by ignoring if it sees one) achieves effectively-once processing.",[40,12379,12381],{"id":12380},"acknowledgment-api-summary","Acknowledgment API Summary",[48,12383,12384],{},"Here’s a quick summary of Pulsar acknowledgment-related APIs and features:",[321,12386,12387,12390,12393,12396,12399,12402,12405,12408],{},[324,12388,12389],{},"consumer.acknowledge(msgId) – ack a single message.",[324,12391,12392],{},"consumer.acknowledgeCumulative(msgId) – ack this and all earlier messages in the subscription (only for ordered subs) in one go.",[324,12394,12395],{},"consumer.negativeAcknowledge(msgId) – signal a failure on this message; broker will redeliver it after a short delay (by default).",[324,12397,12398],{},"Ack timeout (set via ConsumerBuilder.ackTimeout(duration)) – if set, broker will automatically treat unacked messages as needing redelivery after this timeout.",[324,12400,12401],{},"By default, no ack timeout is set, so broker waits indefinitely until the consumer dies or negative acks.",[324,12403,12404],{},"Pulsar will mark acknowledged messages as deletable. If all subscriptions ack a message, it’s removed from storage (unless retention is keeping it for some time).",[324,12406,12407],{},"Unacknowledged messages live in the backlog. If a consumer reconnects, it’ll receive those messages.",[324,12409,12410],{},"Exactly-once via transactions: Use the transactional API (PulsarClient.newTransaction) to encompass consume and produce operations. 
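Here is a hedged sketch of that transactional consume-transform-produce flow, assuming a Pulsar 2.8+ broker with the transaction coordinator enabled and placeholder topic and subscription names.

```java
import org.apache.pulsar.client.api.*;
import org.apache.pulsar.client.api.transaction.Transaction;
import java.util.concurrent.TimeUnit;

public class ExactlyOnceTransform {
    public static void main(String[] args) throws Exception {
        // Broker must have transactions enabled (transactionCoordinatorEnabled=true).
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650") // placeholder broker address
                .enableTransaction(true)
                .build();

        Consumer<String> input = client.newConsumer(Schema.STRING)
                .topic("input")
                .subscriptionName("transformer")
                .subscribe();
        Producer<String> output = client.newProducer(Schema.STRING)
                .topic("output")
                .sendTimeout(0, TimeUnit.SECONDS) // required for transactional sends
                .create();

        Message<String> msg = input.receive();

        Transaction txn = client.newTransaction()
                .withTransactionTimeout(5, TimeUnit.MINUTES)
                .build()
                .get();
        try {
            // Publish the result and ack the input inside the same transaction.
            output.newMessage(txn).value(transform(msg.getValue())).sendAsync();
            input.acknowledgeAsync(msg.getMessageId(), txn);
            txn.commit().get(); // both become visible atomically
        } catch (Exception e) {
            txn.abort().get();  // neither the ack nor the output is applied
        }

        client.close();
    }

    private static String transform(String value) { return value.toUpperCase(); }
}
```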
This is a more complex API, not used unless you specifically need it.",[40,12412,12414],{"id":12413},"what-about-ordering-and-redelivery-ordering","What About Ordering and Redelivery Ordering?",[48,12416,12417],{},"One nuance: There is no ordering guarantee for Pulsar’s Shared subscription. If ordering is crucial, you would typically use a Failover subscription (1 active consumer) or Key_Shared (to maintain per-key order). In those cases, if a message is not acked, you usually stop processing subsequent ones (or use cumulative ack) to maintain order.",[48,12419,12420],{},"Using negative ack on an Exclusive or Failover sub can break ordering if you continue with later messages. So the recommended pattern is, if you care about order, don’t ack out of order. Handle the failure out-of-band (like send to DLQ) or stop consumption until you can ack.",[40,12422,8924],{"id":8923},[321,12424,12425,12428,12431,12434,12437,12440,12443],{},[324,12426,12427],{},"At-least-once is the default: Pulsar, like RabbitMQ and JMS, will do everything to ensure a message is not lost – storing it until acknowledged. This means duplicates are possible on failures. You should design consumers to handle the occasional duplicate message.",[324,12429,12430],{},"Acks are explicit and crucial: Your Pulsar consumers must acknowledge messages after processing. Until you ack, the broker assumes you haven’t finished and will resend if needed. Pulsar gives you tools like cumulative ack and ack timeouts to manage this efficiently.",[324,12432,12433],{},"No auto-drop: Pulsar won’t drop messages that aren’t acked (unless you explicitly configure a TTL). There’s no equivalent of JMS’s Session.AUTO_ACKNOWLEDGE where messages are implicitly acked upon receipt – in Pulsar, ack happens when you call it (or if using a listener, when the framework acks after your callback returns).",[324,12435,12436],{},"Duplicates mitigation: Pulsar broker can deduplicate messages on the producer side when enabled, eliminating duplicates caused by producer retries. This is something RabbitMQ doesn’t do internally.",[324,12438,12439],{},"Exactly-once capabilities: Pulsar is one of the few messaging systems in its class that provides a transactional mechanism for true exactly-once delivery in complex workflows. This is advanced and typically used with stream processing frameworks, but it’s there. For simpler cases, you can often reach “effectively-once” by using deduplication and careful consumer design.",[324,12441,12442],{},"Comparison to RabbitMQ\u002FJMS transactions: RabbitMQ’s handling of acknowledgments is simpler (it has no multi-message transactions beyond acknowledging multiple deliveries in one go). JMS has the notion of sessions and transactions, but coordinating an exactly-once outcome often required XA transactions with an external resource. Pulsar’s built-in transaction support and end-to-end exactly-once for consume-process-produce scenarios is a step beyond what traditional brokers offer, giving Pulsar an edge for building reliable data pipelines.",[324,12444,12445],{},"Negative acks and redelivery: You can signal failures explicitly with negative acks, and Pulsar will requeue the message for redelivery quickly, helping you implement retry logic. This is similar to basic.nack in RabbitMQ.",[48,12447,12448],{},"In summary, Pulsar’s acknowledgment and delivery semantics are robust and similar to what queue veterans expect, with some extra goodies (like dedup and transactions) for those who need that extra level of guarantee. 
In the next post, we’ll look at how Pulsar’s concept of subscriptions can be used to mimic various queueing patterns, specifically focusing on how Shared and Failover subscription modes work – essentially, how Pulsar “queues” actually operate under the hood.",[48,12450,12451],{},"Stay tuned to understand how “Queues are just subscriptions” in Pulsar and how that simplifies scaling and failover.",[48,12453,3931],{},[208,12455],{},[48,12457,3931],{},[48,12459,8956,12460,8960],{},[55,12461,5405],{"href":6135,"rel":12462},[264],[48,12464,8963],{},[48,12466,12467],{},[55,12468,8970],{"href":8968,"rel":12469},[264],[48,12471,12472],{},[55,12473,8976],{"href":7969,"rel":12474},[264],[48,12476,3931],{},{"title":18,"searchDepth":19,"depth":19,"links":12478},[12479,12480,12481,12482,12483,12484,12485,12486,12487],{"id":12269,"depth":19,"text":12270},{"id":12299,"depth":19,"text":12300},{"id":12312,"depth":19,"text":12313},{"id":12319,"depth":19,"text":12320},{"id":12343,"depth":19,"text":12344},{"id":12356,"depth":19,"text":12357},{"id":12380,"depth":19,"text":12381},{"id":12413,"depth":19,"text":12414},{"id":8923,"depth":19,"text":8924},"2025-08-06","Learn about Pulsar's message delivery guarantees, from at-least-once to effectively-once processing, and how it handles acknowledgments, redeliveries, and duplicates, with comparisons to RabbitMQ and JMS.","\u002Fimgs\u002Fblogs\u002F689cad93a3e3355c0da233ee_03.-At-Least-Once,-Exactly-Once,-and-Acks-in-Pulsar-1.png",{},"\u002Fblog\u002Fat-least-once-exactly-once-and-acks-in-pulsar",{"title":12258,"description":12489},"blog\u002Fat-least-once-exactly-once-and-acks-in-pulsar",[821,7347,11043],"JhOgW9g9if7QRbReh5KgVAnq8uv_uDVX14hEVAaiYho",{"id":12498,"title":12499,"authors":12500,"body":12501,"category":821,"createdAt":290,"date":12488,"description":12752,"extension":8,"featured":294,"image":12753,"isDraft":294,"link":290,"meta":12754,"navigation":7,"order":296,"path":12755,"readingTime":4475,"relatedResources":290,"seo":12756,"stem":12757,"tags":12758,"__hash__":12759},"blogs\u002Fblog\u002Fgoodbye-exchanges-how-pulsar-replaces-fanout-routing-and-headers.md","Goodbye Exchanges: How Pulsar Replaces Fanout, Routing, and Headers (Pulsar Guide for RabbitMQ\u002FJMS Engineers 2\u002F10)",[808,809,810],{"type":15,"value":12502,"toc":12742},[12503,12506,12509,12513,12516,12530,12533,12536,12539,12543,12546,12549,12552,12555,12558,12562,12565,12568,12571,12574,12578,12581,12584,12587,12590,12595,12598,12601,12604,12612,12616,12619,12622,12630,12633,12638,12641,12644,12647,12651,12654,12665,12668,12671,12674,12682,12685,12689,12692,12695,12697,12714,12717,12719,12721,12723,12728,12730,12735,12740],[48,12504,12505],{},"TL;DR:",[48,12507,12508],{},"Pulsar does away with RabbitMQ’s separate Exchange object – but it still lets you implement all the same messaging patterns (fanout broadcasts, selective routing, and even content-based routing) using topics, subscriptions, and a bit of application logic. In this post, we explain how to achieve RabbitMQ’s exchange types in Pulsar’s world. Fanout exchange? Just use one topic with multiple subscriptions (each subscription will get a copy of every message). Direct or topic exchanges (routing keys)? Use separate topics or metadata keys to route messages to where they need to go. Headers exchange (content-based routing)? Pulsar doesn’t route on message properties by itself, but we’ll show how you can use Pulsar Functions or client-side filtering to accomplish the same goal. 
By the end, you’ll see that although Pulsar’s model is simpler (just producers and topics), it’s flexible enough to replace the complex exchange bindings of RabbitMQ.",[40,12510,12512],{"id":12511},"recap-what-exchanges-do-rabbitmq-refresher","Recap: What Exchanges Do (RabbitMQ Refresher)",[48,12514,12515],{},"In RabbitMQ, an exchange is the routing intermediary that takes messages from producers and decides which queue(s) to send them to based on some rules. RabbitMQ has several built-in exchange types:",[321,12517,12518,12521,12524,12527],{},[324,12519,12520],{},"Direct exchange: routes messages to queues whose binding key exactly matches the message’s routing key. E.g., send with routing key \"us-west\", goes to the queue bound with \"us-west\".",[324,12522,12523],{},"Fanout exchange: routes messages to all bound queues, ignoring any routing key (broadcast).",[324,12525,12526],{},"Topic exchange: routes messages based on wildcard pattern matching of the routing key against the queue binding patterns (e.g., \"orders.*\" might catch \"orders.new\").",[324,12528,12529],{},"Headers exchange: routes based on message header values instead of a routing key (matching on a set of header key-value pairs).",[48,12531,12532],{},"These allow RabbitMQ to do complex in-broker routing logic. JMS, on the other hand, doesn’t have an explicit exchange concept; JMS Topics broadcast to all subscribers by default, and JMS Queue is point-to-point. Some JMS brokers offer filtering via message selectors, which let a consumer ask for only messages with certain properties, effectively offloading filtering logic to the broker.",[48,12534,12535],{},"Now, Apache Pulsar doesn’t use exchanges at all – producers send messages directly to a topic. So how can we replicate what exchanges do? The key is to remember that Pulsar topics are cheap and flexible, and consumers have the power to choose what they subscribe to (including using wildcard topic names). Also, Pulsar messages can carry a key and properties which applications can leverage for routing decisions.",[48,12537,12538],{},"Let’s go through each pattern:",[40,12540,12542],{"id":12541},"fanout-broadcast-one-message-to-all-subscribers","Fanout (Broadcast) – One Message to All Subscribers",[48,12544,12545],{},"RabbitMQ fanout exchange: Producer sends to an exchange of type “fanout”, which delivers the message to every queue bound to that exchange. Every consumer on those queues gets a copy of the message.",[48,12547,12548],{},"Pulsar approach: Use a single topic and give each subscribing group its own subscription name. As we saw in the first post, if two different subscriptions exist on the same topic, each subscription will receive every message. This naturally implements fanout. The producer just publishes to the topic normally – no special routing logic needed. Pulsar will ensure that Subscription A, Subscription B, etc., each get the message.",[48,12550,12551],{},"Example: Imagine you have an event that multiple services need to know about (like a “user.signup” event that both an email service and an analytics service should process). In RabbitMQ you might use a fanout exchange “user-events” bound to two queues (“emailQ” and “analyticsQ”). In Pulsar, you simply define a topic, say user-events, and have the email service subscribe with subscription name “email-service-sub” and the analytics service with “analytics-service-sub”. When a new user event is published to user-events topic, both subscriptions will get it (each service’s consumer gets its own copy). 
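A minimal Java sketch of that setup, using the topic and subscription names from the example (the local service URL is our own assumption):

```java
import org.apache.pulsar.client.api.*;

public class FanoutExample {
    public static void main(String[] args) throws Exception {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650")
                .build();

        // Each service uses its own subscription name, so each one gets every message.
        Consumer<String> emailService = client.newConsumer(Schema.STRING)
                .topic("user-events")
                .subscriptionName("email-service-sub")
                .subscribe();

        Consumer<String> analyticsService = client.newConsumer(Schema.STRING)
                .topic("user-events")
                .subscriptionName("analytics-service-sub")
                .subscribe();

        // One publish...
        Producer<String> producer = client.newProducer(Schema.STRING)
                .topic("user-events")
                .create();
        producer.send("user.signup:alice");

        // ...is delivered to both subscriptions independently.
        System.out.println(emailService.receive().getValue());
        System.out.println(analyticsService.receive().getValue());

        client.close();
    }
}
```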
Under the hood, Pulsar retained the message until both subscriptions acknowledged it.",[48,12553,12554],{},"This pattern is straightforward: one topic, multiple subscriptions = broadcast. No exchange object or binding configuration required. When a new service needs the data, you just give it a new subscription name on that topic and it will start receiving all new messages from that point forward.",[48,12556,12557],{},"One thing to note: By default, if a new subscription is created, it begins at the latest message (i.e., it won’t see old messages sent before it existed). You can override this by specifying subscription options (like starting at earliest or a specific timestamp). But the typical behavior in pub-sub is that new subscribers only get new messages from the time they subscribe.",[40,12559,12561],{"id":12560},"direct-routing-pointing-messages-to-specific-consumers-or-queues","Direct Routing – Pointing Messages to Specific Consumers or Queues",[48,12563,12564],{},"RabbitMQ direct exchange: You have multiple routing keys and want messages to go only to the queue that is bound for that key. For example, in a stock trading system, you might tag price updates with a stock symbol and deliver each update only to the queue (service) handling that symbol.",[48,12566,12567],{},"Pulsar approach: In Pulsar, producers choose the topic to send to. So the simplest way to do what a direct exchange does is to use separate topics in the first place. For instance, instead of one exchange “prices” with routing keys for each symbol, you might have topics named prices.AAPL, prices.GOOG, etc. The producer for Apple stock updates simply sends to prices.AAPL topic, the Google producer to prices.GOOG, and so on. Consumers subscribe to the topic(s) they care about.",[48,12569,12570],{},"This might seem like moving complexity to the producer (since it must decide the topic), but remember in RabbitMQ the producer had to know the routing key and exchange anyway – not much different. In Pulsar, “topic” effectively replaces “exchange+routingKey” combination.",[48,12572,12573],{},"But what if you truly want to send to one topic and have the broker decide which subscriber should get it based on some key? Pulsar doesn’t have an exchange to do that routing decision for multiple subscriptions – typically you’d just use separate topics. However, Pulsar does allow consumers to use a topics pattern to subscribe to multiple topics in one go. This is analogous to RabbitMQ’s topic exchange wildcards but done on the consumer side. Let’s cover that next.",[40,12575,12577],{"id":12576},"topic-pattern-wildcard-similar-to-topic-exchanges","Topic Pattern (Wildcard) – Similar to Topic Exchanges",[48,12579,12580],{},"RabbitMQ topic exchange: Allows wildcard matching of routing keys. For example, route messages with routing key “error.crITICAL” to queues bound with pattern “error.*” or “#.CRITICAL”.",[48,12582,12583],{},"Pulsar approach: Instead of having one topic and multiple wildcard bindings, Pulsar encourages using the topic naming to categorize messages, and then consumers can subscribe using a regex pattern that matches multiple topics. Pulsar clients support subscribing to a regex pattern which will include all topics that match (and even auto-subscribe to new ones that match in the future).",[48,12585,12586],{},"For example, you could name topics by region: logs.us-west, logs.us-east, logs.eu, etc. A consumer can subscribe with a pattern logs.* to get all regions, or maybe logs.us-* to get just U.S. logs. 
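In code, a pattern subscription looks roughly like the following in the Java client (the screenshot a little further down shows the post’s own snippet; here we assume the logs topics live under the public/default namespace):

```java
import java.util.regex.Pattern;
import org.apache.pulsar.client.api.*;

public class PatternSubscription {
    public static void main(String[] args) throws Exception {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650")
                .build();

        // Subscribe to every topic in public/default whose name starts with "logs".
        // New matching topics created later are picked up automatically.
        Consumer<String> consumer = client.newConsumer(Schema.STRING)
                .topicsPattern(Pattern.compile("persistent://public/default/logs.*"))
                .subscriptionName("all-logs-sub")
                .subscriptionType(SubscriptionType.Shared)
                .subscribe();

        while (true) {
            Message<String> msg = consumer.receive();
            System.out.println(msg.getTopicName() + " -> " + msg.getValue());
            consumer.acknowledge(msg);
        }
    }
}
```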
This is powerful because it pushes the categorization to the topic namespace rather than a separate exchange layer.",[48,12588,12589],{},"How to use: In the Java client, you might do:",[48,12591,12592],{},[384,12593],{"alt":5878,"src":12594},"\u002Fimgs\u002Fblogs\u002F689363687196382ef5228f05_1.png",[48,12596,12597],{},"This one consumer will receive messages from all topics whose name starts with logs. (in that namespace). If new topics like logs.apac are created later and match the regex, the client will automatically pick them up.",[48,12599,12600],{},"This covers many “dynamic routing” cases. If you’re using JMS, think of it like having multiple Topics and using a wildcard to subscribe to many at once (JMS itself doesn’t have regex topic subscribe, but some brokers do). In RabbitMQ terms, we’ve sort of inlined the topic exchange’s logic into the topic naming scheme and consumer’s pattern.",[48,12602,12603],{},"Why use multiple topics instead of one with selective routing? Two main reasons:",[1666,12605,12606,12609],{},[324,12607,12608],{},"Isolation & scaling: In Pulsar (and Kafka as well), topics are the unit of parallelism and storage. Keeping disparate streams separate as different topics can be beneficial for performance and clarity. If you had one mega-topic with many different types of messages and you rely on filtering, you might be doing extra work reading messages that you then ignore. Multiple topics let you only consume what you need and allow the broker to manage them independently.",[324,12610,12611],{},"Simplicity of broker design: By not having complex server-side routing rules, Pulsar stays simpler and focuses on throughput and storage. The trade-off is that the application (or at least the naming convention) makes routing decisions. It might feel like a step backward if you enjoyed RabbitMQ’s built-in routing logic, but it actually aligns with how modern log-based systems (Kafka, Pulsar, etc.) operate – they favor partitioning streams by topic and key rather than inside-broker filtering.",[40,12613,12615],{"id":12614},"headers-or-content-based-routing-achieving-it-in-pulsar","Headers or Content-Based Routing – Achieving it in Pulsar",[48,12617,12618],{},"RabbitMQ headers exchange: You can route messages based on arbitrary header fields (like \"department: finance\" or \"priority: high\"), by matching those headers in the exchange routing logic. JMS’s equivalent is message selectors, where a consumer asks the broker, “Only give me messages where department='finance',” and the broker filters them.",[48,12620,12621],{},"Pulsar approach: Pulsar brokers do not examine message properties or content for routing. All consumers of a topic see all messages (unless it’s a shared subscription where broker is just load-balancing them out). So to do content-based filtering\u002Frouting, you have two main options:",[321,12623,12624,12627],{},[324,12625,12626],{},"Consumer-side filtering: A consumer can subscribe to the topic and then in your consumer code, check message properties or content and decide to process or skip. Unwanted messages can simply be acknowledged (to discard them) or not acknowledged (which would eventually dead-letter them if using DLQ, as we’ll cover in Post 5). 
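As a quick sketch of that first option – assuming the producer sets a department property, which is our own example rather than anything built into Pulsar – consumer-side filtering can be as simple as:

```java
import org.apache.pulsar.client.api.*;

public class ConsumerSideFilter {
    public static void main(String[] args) throws Exception {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650")
                .build();

        Consumer<String> consumer = client.newConsumer(Schema.STRING)
                .topic("tickets")
                .subscriptionName("finance-sub")
                .subscribe();

        while (true) {
            Message<String> msg = consumer.receive();
            if ("finance".equals(msg.getProperty("department"))) {
                System.out.println("processing " + msg.getValue()); // relevant message
            }
            consumer.acknowledge(msg); // ack either way, so skipped messages are simply discarded
        }
    }
}
```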
This is akin to JMS selectors, except Pulsar doesn’t have a built-in selector syntax – you implement the “if” logic yourself in code.",[324,12628,12629],{},"Pulsar Functions (server-side): Pulsar provides Pulsar Functions, a lightweight server-side compute framework, which can be used to do content-based routing on the broker side. Essentially, you write a small function (in Java or Python, etc.) that triggers on each message of an input topic and then publishes it to one or more output topics based on content. This is exactly how you’d implement a headers exchange in Pulsar terms: one input topic, and the function will examine message properties or payload and then forward it to specific topic(s). We’ll cover Pulsar Functions in detail in Post 8, but to illustrate, consider this example:",[48,12631,12632],{},"Suppose you want to route incoming support tickets to different topics by urgency: normal vs. urgent. In RabbitMQ, you might attach a header \"priority\":\"urgent\" and have a headers exchange send urgent ones to an urgentTickets queue. In Pulsar, you could simply have an input topic tickets and two output topics tickets.normal and tickets.urgent. Then deploy a Pulsar Function that does:",[48,12634,12635],{},[384,12636],{"alt":5878,"src":12637},"\u002Fimgs\u002Fblogs\u002F689364ba8e7cd40388ea5092_2.png",[48,12639,12640],{},"This function subscribes to tickets (input) and republishes to the appropriate topic. Consumers can then just subscribe to tickets.urgent or tickets.normal. Yes, this means an extra step (the function) – but this is effectively how you’d implement content routing without burdening the core broker with property filtering logic. Pulsar Functions are integrated and can run on the Pulsar cluster, making this fairly seamless (we’ll see more later).",[48,12642,12643],{},"If writing a Pulsar Function is overkill for your scenario, you can also design producers to send messages to different topics directly if they know where things should go. Often, adding a bit of routing logic in producers or a dedicated router service keeps Pulsar’s usage clear: each topic has a defined purpose or category of message.",[48,12645,12646],{},"Real-world tip: Many Pulsar deployments use a combination of naming conventions and simple processing functions to replace what was done with RabbitMQ’s exchanges. For example, StreamNative’s documentation suggests using separate topics or Pulsar Functions for scenarios that Rabbit would solve with a headers exchange or complex bindings. While Pulsar doesn’t natively match on message content, it’s built to work with these extension points (functions or connectors) to cover that gap.",[40,12648,12650],{"id":12649},"no-exchanges-but-not-less-powerful","No Exchanges, But Not Less Powerful",[48,12652,12653],{},"At first, coming from RabbitMQ, it may feel like Pulsar lacks a feature because there’s no direct analog of an exchange. However, you’ll find that:",[321,12655,12656,12659,12662],{},[324,12657,12658],{},"Fanout is trivial in Pulsar: just multiple subscriptions. 
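For reference, a routing function along the lines of the ticket example above could look roughly like this (the post’s screenshot shows its own version; the topic names and the priority property come from that example, everything else is illustrative):

```java
import org.apache.pulsar.client.api.Schema;
import org.apache.pulsar.functions.api.Context;
import org.apache.pulsar.functions.api.Function;

// Routes each ticket from the input topic to tickets.urgent or tickets.normal
// based on a "priority" message property.
public class TicketRouter implements Function<String, Void> {
    @Override
    public Void process(String ticket, Context context) throws Exception {
        String priority = context.getCurrentRecord().getProperties()
                .getOrDefault("priority", "normal");
        String target = "urgent".equals(priority) ? "tickets.urgent" : "tickets.normal";
        context.newOutputMessage(target, Schema.STRING).value(ticket).sendAsync();
        return null; // nothing is written to the function's default output topic
    }
}
```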
No need to declare an exchange and bind queues – any consumer with a new subscription name automatically creates a “tap” on the topic and gets everything.",[324,12660,12661],{},"Selective routing can be achieved by topic design and consumer patterns, which, while requiring more upfront design, results in a more type-safe or schema-like separation of streams (instead of a single exchange carrying many message types).",[324,12663,12664],{},"Content-based routing isn’t built into the broker core, but Pulsar Functions provide that capability in a decoupled way, and consumers can always self-filter if needed.",[48,12666,12667],{},"Another advantage of not having exchanges is simplification of the system’s moving parts. You don’t have to manage exchange durability, binding lifecycles, etc. The trade-off is that you, the developer, decide topic organization that suits your routing needs. Pulsar’s philosophy is to keep the messaging model simpler (just pub-sub streams) and push specialized routing logic to the edges or to its lightweight compute layer.",[48,12669,12670],{},"To make it concrete, let’s compare side-by-side a scenario in RabbitMQ vs Pulsar:",[48,12672,12673],{},"Scenario: A producer emits events of types A, B, C into RabbitMQ. Consumers for A, B, C should get only their respective events.",[321,12675,12676,12679],{},[324,12677,12678],{},"RabbitMQ solution: Producer sends all events to an exchange “events_exch” with routing key “A”, “B”, or “C” per event type. Three queues (QueueA, QueueB, QueueC) are bound to the exchange with binding keys “A”, “B”, “C” respectively. Each consumer group listens to one queue. RabbitMQ exchange ensures A events go only to QueueA, etc.",[324,12680,12681],{},"Pulsar solution: Create three topics: events-A, events-B, events-C. Producer sends each event to the topic corresponding to its type (this logic can be in producer code or perhaps a Pulsar Function that reads a combined topic and splits them – but simpler is producer knows where to send). Consumers just subscribe to the one topic they need (with their subscription name). No other filtering needed – they only get the type they subscribed to. If having separate topics for each type seems heavy, one could also send all events to a single events topic, and have consumers filter messages of type A vs B vs C by examining a property. But splitting by topic usually scales better and is clearer.",[48,12683,12684],{},"The Pulsar approach treats topic names as the routing key namespace. And because topics are not expensive (Pulsar can handle many topics – hundreds or thousands easily), this is usually fine. In RabbitMQ, having tons of exchanges or queues can become hard to manage; in Pulsar, splitting streams by topic is normal practice.",[40,12686,12688],{"id":12687},"headers-to-properties-mapping","Headers to Properties Mapping",[48,12690,12691],{},"For JMS users, Pulsar messages support properties (key-value pairs) on messages that are analogous to JMS message properties or RabbitMQ headers. You set them in the producer and retrieve on consumer. Pulsar doesn’t do anything with these properties by itself (no broker-side filter), but they are very useful in Pulsar Functions or consumer logic. So you could carry a header like department:finance in a Pulsar message property and either have a specific topic for finance messages or a function that looks at that property to route the message accordingly.",[48,12693,12694],{},"One more related Pulsar feature: Message Key hashing and Key_Shared subscription. 
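Before we get to that, here’s a small sketch of the properties-and-key mapping just described – the property name, key, topic, and subscription are all made up for illustration:

```java
import org.apache.pulsar.client.api.*;

public class PropertiesAndKey {
    public static void main(String[] args) throws Exception {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650")
                .build();

        // Subscribe first so the demo message below is actually received.
        Consumer<String> consumer = client.newConsumer(Schema.STRING)
                .topic("events")
                .subscriptionName("props-demo-sub")
                .subscribe();

        Producer<String> producer = client.newProducer(Schema.STRING)
                .topic("events")
                .create();

        // Properties play the role of RabbitMQ headers / JMS properties;
        // the key feeds partition hashing and Key_Shared dispatch, not topic routing.
        producer.newMessage()
                .key("account-42")
                .property("department", "finance")
                .value("expense-report-created")
                .send();

        Message<String> msg = consumer.receive();
        System.out.println(msg.getKey() + " / " + msg.getProperty("department"));

        client.close();
    }
}
```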
This is more about ordering and load-balancing than routing to different topics, but worth a mention. Pulsar allows a producer to tag a message with a key, and if the topic is partitioned, that key will consistently hash to the same partition. Consumers with Key_Shared subscription ensure all messages with the same key go to the same consumer. This is not exactly like routing keys to different queues – it’s more for ordering guarantees across consumers – but it highlights that Pulsar does pay attention to the message key for partition distribution. We’ll talk about this in Post 7 (Ordering Guarantees). Just note that “routing key” in Rabbit isn’t the same as Pulsar’s “message key”: Rabbit’s routing key chooses which queue, Pulsar’s key chooses which partition\u002Fconsumer, not a different topic.",[40,12696,8924],{"id":8923},[321,12698,12699,12702,12705,12708,12711],{},[324,12700,12701],{},"No Exchange Object in Pulsar: Producers send directly to topics. This simplifies the topology – you don’t configure fanout or direct exchanges – but you use topics and subscriptions creatively to get the same results.",[324,12703,12704],{},"Fanout = multiple subscriptions: To broadcast a message, have multiple subscriptions on a topic. Pulsar will deliver each message to each subscription’s backlog. This covers RabbitMQ’s fanout exchange and JMS topic use cases easily.",[324,12706,12707],{},"Direct\u002FTopic routing = use topic names and patterns: Instead of one exchange with many routing keys, you might create multiple Pulsar topics (perhaps sharing a naming convention). Consumers can use regex subscription patterns to subscribe to multiple topics if needed (like topic wildcards). Essentially, designing a good topic naming scheme replaces a lot of what exchange bindings do.",[324,12709,12710],{},"No built-in content-based routing: Pulsar brokers don’t filter by headers\u002Fproperties like RabbitMQ’s headers exchange or JMS selectors. To implement that, you can use Pulsar Functions (to route messages based on content to different topics) or simply subscribe to the whole topic and filter in your code. Pulsar Functions provide an in-cluster way to emulate content-based routing logic.",[324,12712,12713],{},"Simplicity and flexibility: Pulsar’s approach might require a bit more thinking about topic taxonomy upfront, but it also means the messaging layer is straightforward and high-performance. You won’t accidentally create an exchange binding loop or have to debug complex routing rules – the routing is mostly by topic design or small functions that you control.",[48,12715,12716],{},"In the next post, we’ll dive into message delivery guarantees in Pulsar: how it achieves at-least-once delivery, how acknowledgments work, and what it means to get “effectively-once” or “exactly-once” processing. 
If you’re curious how Pulsar handles reliability compared to JMS acknowledgments or RabbitMQ’s acknowledges and redeliveries, read on!",[48,12718,3931],{},[208,12720],{},[48,12722,3931],{},[48,12724,8956,12725,8960],{},[55,12726,5405],{"href":6135,"rel":12727},[264],[48,12729,8963],{},[48,12731,12732],{},[55,12733,8970],{"href":8968,"rel":12734},[264],[48,12736,12737],{},[55,12738,8976],{"href":7969,"rel":12739},[264],[48,12741,3931],{},{"title":18,"searchDepth":19,"depth":19,"links":12743},[12744,12745,12746,12747,12748,12749,12750,12751],{"id":12511,"depth":19,"text":12512},{"id":12541,"depth":19,"text":12542},{"id":12560,"depth":19,"text":12561},{"id":12576,"depth":19,"text":12577},{"id":12614,"depth":19,"text":12615},{"id":12649,"depth":19,"text":12650},{"id":12687,"depth":19,"text":12688},{"id":8923,"depth":19,"text":8924},"Discover how Apache Pulsar replaces RabbitMQ’s exchanges—fanout, direct, topic, and headers—with a simpler yet equally powerful model using topics, subscriptions, and application logic. Learn how to implement familiar messaging patterns in Pulsar’s world.","\u002Fimgs\u002Fblogs\u002F689360cf7196382ef51fb769_02.-Replaces-Fanout,-Routing,-and-Headers-with-Pulsar.png",{},"\u002Fblog\u002Fgoodbye-exchanges-how-pulsar-replaces-fanout-routing-and-headers",{"title":12499,"description":12752},"blog\u002Fgoodbye-exchanges-how-pulsar-replaces-fanout-routing-and-headers",[821,7347,11043],"bxsEpjLcb2p5ICO2uXCophBX5wKUX7JE8mWMU61e3BI",{"id":12761,"title":12762,"authors":12763,"body":12764,"category":821,"createdAt":290,"date":12978,"description":12979,"extension":8,"featured":294,"image":12980,"isDraft":294,"link":290,"meta":12981,"navigation":7,"order":296,"path":12982,"readingTime":4475,"relatedResources":290,"seo":12983,"stem":12984,"tags":12985,"__hash__":12986},"blogs\u002Fblog\u002Fpulsar-newbie-guide-for-kafka-engineers-part-2-tenants-namespaces-bundles.md","Pulsar Newbie Guide for Kafka Engineers (Part 2): Tenants, Namespaces & Bundles",[808,809,810],{"type":15,"value":12765,"toc":12969},[12766,12768,12771,12775,12778,12781,12784,12787,12790,12793,12797,12800,12803,12806,12809,12812,12815,12819,12822,12825,12828,12832,12839,12842,12847,12850,12858,12861,12866,12869,12872,12876,12879,12882,12890,12893,12897,12902,12905,12922,12925,12927,12944,12946,12948,12950,12955,12957,12962,12967],[48,12767,8648],{},[48,12769,12770],{},"This post demystifies Pulsar’s multi-tenancy model – namely tenants, namespaces, and bundles. Pulsar is built for multi-tenancy from the ground up, unlike Kafka’s single-tenant assumption. A tenant is like a top-level account or project, a namespace is a grouping of topics within a tenant (with its own policies), and bundles are internal shards of a namespace used for load distribution. We’ll explain how these relate to Kafka’s concepts (e.g., Kafka has no direct equivalent, often one Kafka cluster = one tenant). By the end, you’ll understand how Pulsar isolates workloads and balances load across brokers seamlessly using bundles.",[40,12772,12774],{"id":12773},"understanding-tenants-and-namespaces","Understanding Tenants and Namespaces",[48,12776,12777],{},"Tenants in Pulsar are the highest-level grouping. You can think of a tenant as an account or a logical business unit. For example, if multiple teams or applications share a Pulsar cluster, you might create a tenant for each team (e.g., finance, iot, analytics). 
This is fundamentally different from Kafka, where usually a whole cluster is managed as one unit (multi-tenancy in Kafka is often achieved by separate clusters or naming conventions). Pulsar was designed with multi-tenancy from day one: “Pulsar was created from the ground up as a multi-tenant system”. Each tenant can be restricted to certain clusters (in geo-replication scenarios) and have its own admin policies like storage quotas or auth rules.",[48,12779,12780],{},"A namespace in Pulsar is a subdivision of a tenant, used to group topics. If a tenant is like a project, namespaces are like environments or categories within that project. For instance, under tenant finance, you might have namespaces transactions, audits, realtime etc. Technically, a namespace is identified by tenant\u002Fnamespace (for example, finance\u002Ftransactions). Namespaces are important because they are the unit at which many policies are applied (retention, TTL, anti-affinity, etc.) – “The configuration policies set on a namespace apply to all topics in that namespace”. In Kafka terms, you might compare a namespace to a group of topics that share configs, but Kafka doesn’t have a first-class entity like this – Pulsar’s namespaces are a unique feature to simplify administration.",[48,12782,12783],{},"Every Pulsar topic name includes the tenant and namespace as a prefix. The full name format is:",[48,12785,12786],{},"persistent:\u002F\u002Ftenant\u002Fnamespace\u002FtopicName",[48,12788,12789],{},"For example: persistent:\u002F\u002Fpublic\u002Fdefault\u002Fmy-topic. Here public is the tenant (a built-in tenant that comes with Pulsar), and default is the namespace. This structure is why in Part 1 we always included public\u002Fdefault in our topic names. It ensures multi-tenant isolation in the topic naming itself.",[48,12791,12792],{},"Why this matters: In a Kafka cluster, all topics share the same broker namespace and you’d need to implement multi-tenancy via conventions (like topic name prefixes per team) or separate clusters. Pulsar’s approach ensures strong isolation – you can apply permissions at tenant or namespace level, set storage quotas per tenant, and so on, which is much cleaner for multi-tenant use cases.",[40,12794,12796],{"id":12795},"creating-and-managing-tenants-namespaces","Creating and Managing Tenants & Namespaces",[48,12798,12799],{},"To create a tenant in Pulsar (which an admin would do), you’d use:",[48,12801,12802],{},"bin\u002Fpulsar-admin tenants create my-tenant",[48,12804,12805],{},"This sets up a new tenant. Typically, you will specify which Pulsar clusters this tenant can use (in a multi-cluster deployment) with --allowed-clusters, but in a standalone or single cluster context, defaults are fine. After creating a tenant, you create namespaces under it:",[48,12807,12808],{},"bin\u002Fpulsar-admin namespaces create my-tenant\u002Fmy-namespace",[48,12810,12811],{},"This will create a namespace called my-namespace for my-tenant. By default, a new namespace is created with some number of bundles (more on bundles next) – often 4 bundles by default unless configured otherwise. You’ll notice a public tenant with public\u002Fdefault namespace automatically present for convenience.",[48,12813,12814],{},"You can list tenants with pulsar-admin tenants list and list namespaces in a tenant with pulsar-admin namespaces list tenant (as we saw in Part 1). Also, each namespace can be configured with policies: e.g., message TTL, retention, max consumers, etc., via pulsar-admin namespaces set-policy... 
commands or in configuration.",[40,12816,12818],{"id":12817},"what-are-bundles","What are Bundles?",[48,12820,12821],{},"Pulsar takes the scalability of topics further with namespace bundles. A bundle is essentially a subset of topics in a namespace, defined by a hash range, that can be treated as a unit for broker assignment. When Pulsar needs to balance load, it moves bundles between brokers, rather than individual topics. This is a key difference from Kafka: in Kafka, the unit of load distribution is a partition of a topic. In Pulsar, the unit is a bundle, which may comprise many topics (or just a few) in that namespace. This design allows Pulsar to handle thousands of topics without each topic individually overloading the metadata layer.",[48,12823,12824],{},"Formally, “a namespace bundle is a virtual group of topics that belong to the same namespace”. Each bundle is identified by a range of the 32-bit hash space (0x00000000 to 0xffffffff). By default, a new namespace is created with 4 bundles (which splits that hash range into 4 quarters). You can also specify a different number of bundles at creation time, e.g., --bundles 16 for finer granularity.",[48,12826,12827],{},"Analogy: If a namespace is like a big bucket of topics, bundles break that bucket into (hash) slices. Each slice can be independently moved to a different broker. Kafka doesn’t have an exact analog since Kafka topics are independent; you might compare a Pulsar bundle to a group of partitions from possibly multiple topics that a broker handles. But for simplicity: think of bundles as load-balancing shards of a namespace.",[40,12829,12831],{"id":12830},"bundle-load-balancing-in-action","Bundle Load Balancing in Action",[48,12833,12834,12835,12838],{},"Why bundles? Suppose you have 1000 topics in one namespace all on one broker – not ideal if traffic grows. Pulsar’s broker load manager monitors broker load and can move a busy bundle to another broker to spread out load. If one bundle (hash range) of topics is hot (lots of traffic), the broker can unload that bundle: “if the broker gets overloaded with the number of bundles, ",[2628,12836,12837],{},"you"," can unload a bundle from that broker, so it can be served by another broker”. This automatic movement is like Kafka’s partition rebalance, but occurs transparently and can be automatic (depending on load manager settings) – no manual partition reassignment needed for routine balancing.",[48,12840,12841],{},"How it works: Each topic’s name is hashed to determine which bundle it falls into. All topics in the same bundle live on the same broker. If that broker is strained, Pulsar can split the bundle into smaller ranges or move some bundles away:",[321,12843,12844],{},[324,12845,12846],{},"Splitting a Bundle: Pulsar supports splitting a bundle that’s too busy into two smaller bundles. This is done by admin command or automatically by some load managers. For example, to split a specific bundle range:",[48,12848,12849],{},"bin\u002Fpulsar-admin namespaces split-bundle --bundle 0x00000000_0x7fffffff my-tenant\u002Fmy-namespace",[321,12851,12852,12855],{},[324,12853,12854],{},"This would split the bundle covering hashes 0x00000000 to 0x7fffffff into two equal halves. 
After splitting, one or both of those new bundles could be unloaded to other brokers.",[324,12856,12857],{},"Unloading a Bundle: You can force a bundle off a broker (causing it to be reassigned) with:",[48,12859,12860],{},"bin\u002Fpulsar-admin namespaces unload --bundle range my-tenant\u002Fmy-namespace",[321,12862,12863],{},[324,12864,12865],{},"For example: ... unload --bundle 0x80000000_0xffffffff my-tenant\u002Fmy-namespace to unload that specific bundle. The brokers will auto-determine which broker should own it next (often the least loaded one).",[48,12867,12868],{},"These operations might sound low-level, but Pulsar can handle much of this automatically. By default, you usually won’t manually split\u002Funload unless debugging or pre-scaling.",[48,12870,12871],{},"From a user perspective, this is mostly transparent. You publish and consume from topics as normal; behind the scenes Pulsar may move the bundle containing your topic to a different broker, but your producers\u002Fconsumers follow automatically (Pulsar clients get redirected to the new broker). There is no need to manually rebalance as you might in Kafka when adding brokers – Pulsar’s design allows new brokers to immediately take over bundles and thus traffic.",[40,12873,12875],{"id":12874},"why-kafka-engineers-should-care","Why Kafka Engineers Should Care",[48,12877,12878],{},"If you ran a Kafka cluster with many topics, you might have encountered the challenge of too many open file handles or uneven load because one topic or partition was hot. Pulsar’s tenants and namespaces encourage you to logically organize topics (so you can apply policies easily), and bundles ensure dynamic load distribution. It’s like having an automatic partition rebalancer always on.",[48,12880,12881],{},"A Kafka engineer might ask: “Can’t we just have one big topic with partitions in Kafka to distribute load?” Yes, but then all those messages are interrelated. Pulsar’s bundles let completely unrelated topics still share brokers efficiently. It’s a different approach to scaling:",[321,12883,12884,12887],{},[324,12885,12886],{},"In Kafka, you scale by partitioning a topic and manually balancing partitions across brokers (each partition is stuck to a broker unless you move it).",[324,12888,12889],{},"In Pulsar, you scale by having many topics in a namespace and letting Pulsar move bundles of topics around as needed. You also partition individual topics for parallelism (similar to Kafka), but even those partitions are managed under the hood by bundles and can move.",[48,12891,12892],{},"Important note on topic naming and discovery: Because topics include tenant and namespace, tools and APIs will often ask for those. For example, to subscribe or produce you give the full name or use the client API with tenant\u002Fns parameters. If you attempt to use just a topic local name without tenant\u002Fns in Pulsar, it assumes the public\u002Fdefault namespace by default – which is fine for quick tests but not in a structured multi-tenant environment.",[40,12894,12896],{"id":12895},"recap-and-cli-tips","Recap and CLI Tips",[321,12898,12899],{},[324,12900,12901],{},"You can see all bundles in a namespace via:",[48,12903,12904],{},"bin\u002Fpulsar-admin namespaces bundles tenant\u002Fnamespace",[321,12906,12907,12910,12913,12916,12919],{},[324,12908,12909],{},"This will list the bundle ranges currently defined.",[324,12911,12912],{},"Default bundle count is 4. 
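If you drive this from code rather than the CLI, the Java admin client exposes the same operations. A rough sketch – assuming a local standalone cluster, a recent client version (for TenantInfo.builder()), and illustrative names and policy values:

```java
import java.util.Set;
import org.apache.pulsar.client.admin.PulsarAdmin;
import org.apache.pulsar.common.policies.data.RetentionPolicies;
import org.apache.pulsar.common.policies.data.TenantInfo;

public class NamespaceSetup {
    public static void main(String[] args) throws Exception {
        PulsarAdmin admin = PulsarAdmin.builder()
                .serviceHttpUrl("http://localhost:8080")
                .build();

        // Tenant with the clusters it may use (here: a standalone cluster).
        admin.tenants().createTenant("my-tenant",
                TenantInfo.builder().allowedClusters(Set.of("standalone")).build());

        // Pre-create the namespace with 16 bundles so load can spread across brokers.
        admin.namespaces().createNamespace("my-tenant/my-namespace", 16);

        // Example namespace-level policy: keep acknowledged data for 60 minutes / up to 512 MB.
        admin.namespaces().setRetention("my-tenant/my-namespace",
                new RetentionPolicies(60, 512));

        admin.close();
    }
}
```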
If you expect a very large number of topics or high throughput, you might pre-create the namespace with more bundles (e.g., 16 or 32) so load can spread over more brokers from the start.",[324,12914,12915],{},"‍Tenants and Namespaces creation requires superuser or tenant admin rights. In a managed environment, tenants would be set up by an admin, namespaces would be set up by either a super user or a tenant admin, and developers might only have access to create topics within an assigned namespace.",[324,12917,12918],{},"Pulsar isolates data by tenant: one tenant cannot access another’s topics unless explicitly granted permission. This is analogous to completely separate Kafka clusters for different tenants from a security perspective. We’ll touch on permission management more in the Security part.",[324,12920,12921],{},"The multi-tenancy doesn’t add overhead in usage – it’s mostly an organizational and isolation benefit. Producing\u002Fconsuming is the same, just with a qualified topic name.",[48,12923,12924],{},"Now that we have a handle on how Pulsar’s namespace system differs from Kafka, in the next part, we’ll delve into how Pulsar stores data with BookKeeper – covering Ledgers & Bookies – which is another big architectural difference from Kafka’s log segments on brokers.",[40,12926,8924],{"id":8923},[321,12928,12929,12932,12935,12938,12941],{},[324,12930,12931],{},"Tenants in Pulsar are top-level containers for isolating multiple applications or teams in one cluster. Kafka has no native tenant concept – Pulsar allows true multi-tenancy in a single cluster.",[324,12933,12934],{},"Namespaces are subdivisions of a tenant, grouping topics and applying policies collectively. Think of them as analogous to Kafka topic prefixes or categories, but enforceable and configurable at the broker level.",[324,12936,12937],{},"Bundles are Pulsar’s unit of horizontal scalability within a namespace – a hash range of topics that can move between brokers for load balancing. Kafka’s closest equivalent is partition reassignments, but Pulsar does it proactively and in a grouped manner.",[324,12939,12940],{},"Multi-tenancy and bundles mean adding a new broker to Pulsar can automatically relieve hotspots (brokers share the work by moving bundles), whereas in Kafka you’d manually rebalance partitions or use external tools.",[324,12942,12943],{},"Organizing topics into tenants\u002Fnamespaces is crucial for Pulsar usage. It might feel unfamiliar to Kafka users, but it provides powerful isolation and flexibility (per-namespace retention settings, encryption policies, etc.). Embrace these concepts to fully leverage Pulsar’s strengths.",[48,12945,3931],{},[208,12947],{},[48,12949,3931],{},[48,12951,8956,12952,8960],{},[55,12953,5405],{"href":6135,"rel":12954},[264],[48,12956,8963],{},[48,12958,12959],{},[55,12960,8970],{"href":8968,"rel":12961},[264],[48,12963,12964],{},[55,12965,8976],{"href":7969,"rel":12966},[264],[48,12968,3931],{},{"title":18,"searchDepth":19,"depth":19,"links":12970},[12971,12972,12973,12974,12975,12976,12977],{"id":12773,"depth":19,"text":12774},{"id":12795,"depth":19,"text":12796},{"id":12817,"depth":19,"text":12818},{"id":12830,"depth":19,"text":12831},{"id":12874,"depth":19,"text":12875},{"id":12895,"depth":19,"text":12896},{"id":8923,"depth":19,"text":8924},"2025-08-05","Discover how Apache Pulsar's built-in multi-tenancy model works—learn about tenants, namespaces, and bundles, and how they compare to Kafka. 
This guide helps Kafka engineers understand Pulsar’s approach to isolation, scalability, and dynamic load balancing.","\u002Fimgs\u002Fblogs\u002F689221d6579ce83032ccb088_02.-Tenants,-Namespaces-&-Bundles.png",{},"\u002Fblog\u002Fpulsar-newbie-guide-for-kafka-engineers-part-2-tenants-namespaces-bundles",{"title":12762,"description":12979},"blog\u002Fpulsar-newbie-guide-for-kafka-engineers-part-2-tenants-namespaces-bundles",[821,7347,799],"q8tKCMYUJI5Wxyyt3Cm43-qSFBo3CF1Vq8M7pFZdV00",{"id":12988,"title":12989,"authors":12990,"body":12991,"category":821,"createdAt":290,"date":13175,"description":13176,"extension":8,"featured":294,"image":13177,"isDraft":294,"link":290,"meta":13178,"navigation":7,"order":296,"path":13179,"readingTime":3556,"relatedResources":290,"seo":13180,"stem":13181,"tags":13182,"__hash__":13183},"blogs\u002Fblog\u002Fpulsar-101-for-queue-users-queues-topics-and-subscriptions-explained.md","Pulsar 101 for Queue Users: Queues, Topics, and Subscriptions Explained (Pulsar Guide for RabbitMQ\u002FJMS Engineers 1\u002F10)",[808,809,810],{"type":15,"value":12992,"toc":13168},[12993,12995,12998,13002,13005,13013,13016,13019,13023,13026,13029,13032,13035,13039,13042,13045,13049,13052,13055,13058,13061,13067,13070,13073,13076,13080,13082,13088,13091,13094,13097,13100,13104,13107,13115,13118,13120,13140,13143,13145,13147,13149,13154,13156,13161,13166],[3933,12994,12505],{"id":7357},[48,12996,12997],{},"Apache Pulsar doesn’t use “queues” in the same way as RabbitMQ or JMS. Instead, Pulsar has topics that serve as the central pipeline for messages, and subscriptions that track consumption progress (acting like logical queues). In this post, we explain how a Pulsar topic plus a subscription can replicate the behavior of a queue or a JMS topic. By the end, you’ll understand Pulsar’s publish\u002Fsubscribe model, how it retains messages, and why you don’t explicitly create queue objects in Pulsar. We’ll also run through a quick example with Pulsar’s CLI and client API to demonstrate these concepts in action.",[40,12999,13001],{"id":13000},"topics-vs-queues-the-key-concept-shift","Topics vs Queues: The Key Concept Shift",[48,13003,13004],{},"If you’re coming from RabbitMQ or JMS, you’re used to queues as named containers that hold messages until consumers retrieve them. In Pulsar, there is no separate queue object – the Pulsar equivalent is achieved via topics and subscriptions. All messages in Pulsar are published to a topic, and consumers receive messages by attaching to a subscription on that topic. A subscription is Pulsar’s mechanism for tracking which messages have been delivered and acknowledged for a group of consumers.",[321,13006,13007,13010],{},[324,13008,13009],{},"Topic: In Pulsar, a topic is a category or feed name to which producers publish messages. It’s similar to a RabbitMQ exchange or JMS destination in that producers write to it. Topics in Pulsar can be persistent (durable storage) or non-persistent (in-memory), but by default we deal with persistent topics that durably store messages.",[324,13011,13012],{},"Subscription: A subscription is like a named pointer into a topic’s message stream. Consumers specify a subscription name when they subscribe. Pulsar will then deliver messages on that topic to the consumer and mark them as delivered for that subscription. If no consumer is currently active, the subscription retains all messages until they can be delivered. This is analogous to a durable queue holding messages for a consumer group. 
In fact, from a queue user’s perspective, a subscription is the queue – it accumulates unacknowledged messages for later processing, providing load balancing or broadcast behavior depending on how many consumers attach to it.",[48,13014,13015],{},"A single Pulsar topic can have multiple subscriptions. Each subscription acts as an independent feed of the topic. If you create two subscriptions on the same topic (say “SubA” and “SubB”), each will receive all messages published to the topic (each subscription has its own backlog). This is how Pulsar implements a pub\u002Fsub pattern (multiple subscribers each get their own copy of each message). On the other hand, if multiple consumers share the same subscription name (e.g. two consumer processes both subscribing with name “SubA”), then they will form a consumer group and Pulsar will distribute messages of that subscription’s backlog among them – effectively acting like a queue with competing consumers. We’ll dive deeper into those modes in a later post, but it’s important to grasp that in Pulsar, “queues” don’t exist as standalone objects – they emerge from how you use subscription names.",[48,13017,13018],{},"JMS perspective: In JMS, you have the concepts of Queue (point-to-point) and Topic (publish-subscribe). Pulsar’s model unifies these. A Pulsar topic can do both – if you have one subscription (like one queue) all consumers can share it and get load-balanced messages (point-to-point), or if you create multiple independent subscriptions, each behaves like a JMS durable topic subscription receiving a copy of each message (pub-sub). You don’t decide upfront whether a Pulsar topic is “queue-like” or “topic-like” – it can handle both patterns simultaneously. This flexibility is initially confusing but powerful once understood.",[40,13020,13022],{"id":13021},"how-pulsar-stores-and-delivers-messages","How Pulsar Stores and Delivers Messages",[48,13024,13025],{},"One major difference from RabbitMQ: Pulsar topics are backed by a persistent log (by default). When a message is published to a Pulsar topic, the broker writes it to durable storage (Apache BookKeeper bookies) and it remains there until it’s acknowledged by all subscriptions that are consuming that topic. This means if you have a subscription with no active consumers, the messages will sit in storage (in a backlog) indefinitely until a consumer comes along to consume\u002Facknowledge them. Pulsar will not drop messages for an inactive subscription unless you configure explicit expiration (TTL) – so it behaves like a durable queue that keeps data safe even if consumers are offline, similar to JMS durable subscriptions.",[48,13027,13028],{},"By contrast, in RabbitMQ, once a message reaches a queue, if no consumer is attached the message will still sit in RAM\u002Fdisk – so that part is similar. But RabbitMQ will drop messages on a non-durable queue if the broker restarts or if TTLs expire, etc., whereas Pulsar’s storage is persistent by default (we’ll cover durability in a later post). The key takeaway is that Pulsar decouples the storage of messages from the delivery. The topic is the stored log of messages, and the subscription is a position in that log. As consumers acknowledge messages, the subscription’s position moves forward and Pulsar knows it can safely remove acknowledged data (or retain it if you’ve enabled a retention policy for replay).",[48,13030,13031],{},"Acknowledgments: Pulsar uses acknowledgments to know when it can remove messages from the subscription backlog. 
When a consumer has processed a message, it sends an acknowledgement to the broker. Pulsar then marks that message as delivered for that subscription, and once all subscriptions have acknowledged a given message, it can be deleted from storage (or archived if retention is enabled). If a message is never acknowledged (perhaps the consumer crashed), the message remains in the backlog and will be redelivered to the next consumer that comes along on that subscription (ensuring at-least-once delivery, which we’ll discuss in Post 3).",[48,13033,13034],{},"One nice aspect: you don’t have to explicitly “create” a subscription ahead of time in code. If a consumer subscribes to a topic with a new subscription name, Pulsar will automatically start a subscription with that name. The first time you run a consumer with subscriptionName = \"my-subscription\" on topic persistent:\u002F\u002Fpublic\u002Fdefault\u002Fmy-topic, Pulsar creates the subscription state and begins tracking message delivery for it. It’s analogous to how RabbitMQ’s queues might be declared, but in Pulsar it often happens implicitly on first use (unless you disabled auto-creation). Topics themselves can also auto-create when first referenced (depending on broker settings), meaning you might not have to run an admin command to make a topic in simple cases – publishing to or subscribing from my-topic can make it appear. (In production, you might pre-create topics or use namespace policies to control this behavior, but it’s a convenient feature for development.)",[40,13036,13038],{"id":13037},"quick-walkthrough-producing-and-consuming-with-a-subscription","Quick Walkthrough: Producing and Consuming with a Subscription",[48,13040,13041],{},"Let’s solidify these ideas with a quick example. Suppose you want to use Pulsar to replicate a simple RabbitMQ work queue. In Rabbit, you’d have a queue (let’s call it “tasks”) and you’d basicPublish messages to that queue. In Pulsar, we’ll use a topic, say “tasks-topic”. We want one consumer group processing it (could be one or many consumers), so we’ll use one subscription name, say “tasks-subscription”.",[48,13043,13044],{},"Step 1: Produce messages to the topic. We can use Pulsar’s CLI or a client library to send messages. For example, using the Pulsar CLI producer:",[8300,13046,13048],{"id":13047},"using-pulsar-client-cli-to-produce-some-messages-to-a-topic","Using pulsar-client CLI to produce some messages to a topic",[48,13050,13051],{},"$ bin\u002Fpulsar-client produce persistent:\u002F\u002Fpublic\u002Fdefault\u002Ftasks-topic -m \"Task-1\" -m \"Task-2\" -m \"Task-3\"",[48,13053,13054],{},"This will create the topic tasks-topic (if not already created) in the public\u002Fdefault namespace and publish three messages: “Task-1”, “Task-2”, “Task-3”. In Pulsar, topics have a fully qualified name including a tenant and namespace; public\u002Fdefault is the default namespace most beginners use.",[48,13056,13057],{},"Step 2: Consume messages with a subscription. Now let’s start a consumer to receive these tasks. 
We’ll give it the subscription name “tasks-subscription”:",[48,13059,13060],{},"$ bin\u002Fpulsar-client consume persistent:\u002F\u002Fpublic\u002Fdefault\u002Ftasks-topic \\",[8325,13062,13065],{"className":13063,"code":13064,"language":8330},[8328],"-s \"tasks-subscription\" -n 0 -p Earliest -t Shared\n",[4926,13066,13064],{"__ignoreMap":18},[48,13068,13069],{},"Here, -s specifies the subscription name, -p Earliest sets the consumer to read from the beginning of the topic, and -n 0 means consume indefinitely. When this consumer starts, Pulsar will see that subscription “tasks-subscription” exists (if first time, it creates it) and begin delivering messages. Since we published three tasks, the consumer should receive Task-1, Task-2, Task-3. After processing each, it will ack (the CLI consumer acks messages after printing them). Pulsar then marks those messages as acknowledged on “tasks-subscription” and clears them from the backlog.",[48,13071,13072],{},"If we were to run another consumer with the same Shared or Key_Shared subscription (tasks-subscription) at the same time, Pulsar would distribute the messages between the two consumers. For instance, one consumer might get Task-1, the other gets Task-2, etc., in a round-robin fashion. That’s the queue\u002Fcompeting-consumer pattern. But if we instead run a second consumer with a different subscription name (say “tasks-subscription-2”), that second consumer will receive a full copy of all messages independently. In our example, starting a second subscription after the tasks were sent wouldn’t receive those three, but if we send more tasks, each subscription would get its own copy.",[48,13074,13075],{},"To illustrate, let’s do that:",[8300,13077,13079],{"id":13078},"start-a-second-consumer-with-a-different-subscription","Start a second consumer with a different subscription",[48,13081,13060],{},[8325,13083,13086],{"className":13084,"code":13085,"language":8330},[8328],"-s \"tasks-subscription-2\" -n 0 -t Shared\n",[4926,13087,13085],{"__ignoreMap":18},[48,13089,13090],{},"Now publish another message:",[48,13092,13093],{},"$ bin\u002Fpulsar-client produce persistent:\u002F\u002Fpublic\u002Fdefault\u002Ftasks-topic -m \"Task-4\"",[48,13095,13096],{},"Now, both the first consumer (tasks-subscription) and the second consumer (tasks-subscription-2) will receive “Task-4”. They are on different subscriptions, so each maintains its own position in the topic. This mimics a pub-sub (fan-out) scenario: you effectively have two “queues” listening to the same topic. In RabbitMQ terms, it’s as if we had an exchange with two queues bound to it, so a message went to both queues. Pulsar did that internally with one topic and two subs.",[48,13098,13099],{},"Conversely, if we had launched multiple consumers all using the same subscription “tasks-subscription”, then only one of them would get each Task-4 message (preventing duplicate processing), similar to multiple workers pulling from one RabbitMQ queue.",[40,13101,13103],{"id":13102},"when-to-use-multiple-subscriptions-vs-shared-consumers","When to Use Multiple Subscriptions vs Shared Consumers",[48,13105,13106],{},"You might be wondering: how do I choose to use separate subscriptions or not? It depends on your use case:",[321,13108,13109,13112],{},[324,13110,13111],{},"Multiple independent subscriptions (each with its own name): Use this when you want to broadcast messages to multiple independent groups of consumers. 
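(As a quick aside on the walkthrough above: if you’d rather use the Java client than the CLI consumer, a rough equivalent of that consume command – same topic and subscription names, with -p Earliest mapped to the subscription’s initial position – looks like this:)

```java
import org.apache.pulsar.client.api.*;

public class TasksConsumer {
    public static void main(String[] args) throws Exception {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650")
                .build();

        // Roughly equivalent to:
        //   pulsar-client consume persistent://public/default/tasks-topic \
        //     -s "tasks-subscription" -n 0 -p Earliest -t Shared
        Consumer<String> consumer = client.newConsumer(Schema.STRING)
                .topic("persistent://public/default/tasks-topic")
                .subscriptionName("tasks-subscription")
                .subscriptionType(SubscriptionType.Shared)
                .subscriptionInitialPosition(SubscriptionInitialPosition.Earliest)
                .subscribe();

        while (true) {
            Message<String> msg = consumer.receive();
            System.out.println("got: " + msg.getValue());
            consumer.acknowledge(msg);
        }
    }
}
```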
For example, one microservice (with subscription “serviceA-sub”) might process every message for one purpose, and another microservice (“serviceB-sub”) might also need every message for a different purpose. Each subscription will get its own copy of each message. This is analogous to multiple RabbitMQ queues bound to a fanout or topic exchange – each queue (subscription) gets the message.",[324,13113,13114],{},"Shared subscription (multiple consumers, same name): Use this when you have a pool of consumers that collectively handle the messages in a load-balanced way. This is the classic worker queue scenario: many consumers, but each message should be processed only once by one of them. In Pulsar, they simply share the same subscription name and set the subscription type to “Shared” (more on subscription types in Post 4) to compete for messages from that one subscription backlog. This is analogous to multiple consumers on one RabbitMQ queue – the queue distributes messages to one consumer each.",[48,13116,13117],{},"The beauty of Pulsar is that the topic is decoupled from these patterns. You don’t need separate physical queues for fan-out versus work distribution. It’s all about how you use subscription names. For someone used to JMS, think of Pulsar’s topic as either a JMS Topic or Queue depending on how you subscribe: if each consumer uses a different durable subscription name, it acts like a JMS Topic (each durable sub gets all messages). If consumers share a subscription, it acts like a JMS Queue (messages go to one consumer). In fact, the Pulsar \u003C-> JMS mapping (via Starlight for JMS) literally treats a JMS Queue as a Pulsar topic with a single shared durable subscription under the hood.",[40,13119,8924],{"id":8923},[321,13121,13122,13125,13128,13131,13134,13137],{},[324,13123,13124],{},"Topics are the core entity in Pulsar: producers write to topics, and topics store messages durably. There is no standalone “queue” object as in RabbitMQ; Pulsar topics + subscriptions cover that functionality.",[324,13126,13127],{},"Subscriptions act like durable queues: a subscription retains unacknowledged messages and ensures consumers receive them. Each subscription has its own backlog of messages on a topic. A subscription can have one or many consumers attached.",[324,13129,13130],{},"Multiple subscription names = pub-sub: If you create multiple subscriptions on the same topic, each behaves like an independent stream of all messages (fan-out). In other words, Pulsar can deliver one message to multiple subscriber groups without duplicating producers or topics.",[324,13132,13133],{},"Shared subscription (one name, many consumers) = queue load-balancing: If consumers share the same subscription, Pulsar distributes messages among them (each message to only one consumer) – analogous to multiple consumers on one queue. You get parallel processing without double-handling of the same message.",[324,13135,13136],{},"Durable by default: Pulsar’s persistent topics retain messages until acknowledged by the subscription. Consumers can be offline, and on return they will get the backlog. This is similar to JMS durable subscriptions and unlike non-durable transient queues – Pulsar defaults to durability so you don’t lose messages.",[324,13138,13139],{},"No need to pre-create queues: You typically don’t pre-declare a “queue” in Pulsar. 
You decide on topic names and subscription names in your client, and Pulsar will manage the rest (topics and subs can auto-create on first use, unless locked down by config).",[48,13141,13142],{},"Armed with this knowledge, you’re ready to dive deeper. Next, we’ll explore how Pulsar handles the routing patterns that RabbitMQ implements with exchanges. If you’re wondering “how do I do a fanout or direct routing in Pulsar without exchanges?”, stay tuned for the next post!",[48,13144,3931],{},[208,13146],{},[48,13148,3931],{},[48,13150,8956,13151,8960],{},[55,13152,5405],{"href":6135,"rel":13153},[264],[48,13155,8963],{},[48,13157,13158],{},[55,13159,8970],{"href":8968,"rel":13160},[264],[48,13162,13163],{},[55,13164,8976],{"href":7969,"rel":13165},[264],[48,13167,3931],{},{"title":18,"searchDepth":19,"depth":19,"links":13169},[13170,13171,13172,13173,13174],{"id":13000,"depth":19,"text":13001},{"id":13021,"depth":19,"text":13022},{"id":13037,"depth":19,"text":13038},{"id":13102,"depth":19,"text":13103},{"id":8923,"depth":19,"text":8924},"2025-08-04","New to Apache Pulsar? Learn how Pulsar topics and subscriptions map to traditional queues and pub-sub systems like RabbitMQ and JMS. Understand core concepts, message delivery, and how to model familiar patterns using Pulsar’s unified architecture.","\u002Fimgs\u002Fblogs\u002F6890c928669a9565ea05e386_01.-Pulsar-101-for-Queue-Users.png",{},"\u002Fblog\u002Fpulsar-101-for-queue-users-queues-topics-and-subscriptions-explained",{"title":12989,"description":13176},"blog\u002Fpulsar-101-for-queue-users-queues-topics-and-subscriptions-explained",[821,7347,11043],"AdtfIqkw5vIrjuK7Ht_4tKRHaJ-mMPZb56H8YhvwDOU",{"id":13185,"title":13186,"authors":13187,"body":13188,"category":821,"createdAt":290,"date":13482,"description":13483,"extension":8,"featured":294,"image":13484,"isDraft":294,"link":290,"meta":13485,"navigation":7,"order":296,"path":13486,"readingTime":3556,"relatedResources":290,"seo":13487,"stem":13488,"tags":13489,"__hash__":13490},"blogs\u002Fblog\u002Fpulsar-newbie-guide-for-kafka-engineers-part-1-kafka---pulsar-cli-cheatsheet.md","Pulsar Newbie Guide for Kafka Engineers (Part 1): Kafka → Pulsar CLI Cheatsheet",[808,809,810],{"type":15,"value":13189,"toc":13476},[13190,13192,13195,13197,13200,13209,13213,13216,13221,13224,13231,13236,13239,13246,13251,13254,13257,13260,13263,13268,13271,13274,13279,13282,13285,13290,13293,13296,13303,13308,13311,13317,13320,13325,13328,13331,13362,13366,13369,13377,13380,13386,13394,13397,13408,13411,13417,13422,13425,13427,13451,13453,13455,13457,13462,13464,13469,13474],[3933,13191,7358],{"id":7357},[48,13193,13194],{},"This post provides a quick cheatsheet mapping common Kafka CLI commands to Apache Pulsar. We’ll show how to create topics, list topics, produce and consume messages, and check metadata using Pulsar’s CLI tools. For each Kafka command, you’ll see the Pulsar equivalent (using pulsar-admin, pulsar-client, etc.) so you can hit the ground running with Pulsar. Bottom line: Pulsar’s CLI is just as powerful as Kafka’s, with a single unified tool for administration and simple clients for testing.",[40,13196,46],{"id":42},[48,13198,13199],{},"If you’re familiar with Kafka’s command-line tools (like kafka-topics.sh, kafka-console-producer.sh, kafka-console-consumer.sh), you’ll be glad to know Pulsar offers similar capabilities. Pulsar’s main CLI tool is pulsar-admin, which lets you manage topics, tenants, namespaces, subscriptions and more. 
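As a quick orientation, a few read-only commands illustrate pulsar-admin's resource/operation layout on a local install (the cluster name "standalone" below is the usual default for a standalone setup):

```bash
bin/pulsar-admin tenants list                 # tenants in the cluster
bin/pulsar-admin namespaces list public       # namespaces under the "public" tenant
bin/pulsar-admin topics list public/default   # topics in a namespace
bin/pulsar-admin brokers list standalone      # brokers in the "standalone" cluster
```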
There’s also pulsar-client for producing\u002Fconsuming messages and pulsar-perf for performance testing. This section will translate your Kafka CLI know-how to Pulsar commands.",[48,13201,13202,13203,13208],{},"Environment Assumption: We assume you ",[55,13204,13207],{"href":13205,"rel":13206},"https:\u002F\u002Fpulsar.apache.org\u002Fdocs\u002F4.0.x\u002Fgetting-started-standalone\u002F",[264],"have a Pulsar installation (or standalone)"," and have the bin directory in your PATH. In examples, we use bin\u002Fpulsar-admin and bin\u002Fpulsar-client as if running from the Pulsar install directory. Adjust accordingly for your setup.",[40,13210,13212],{"id":13211},"equivalents-of-common-kafka-commands","Equivalents of Common Kafka Commands",[48,13214,13215],{},"Let’s go through typical tasks:",[321,13217,13218],{},[324,13219,13220],{},"Listing Topics: In Kafka, you might run kafka-topics.sh --bootstrap-server localhost:9092 --list. In Pulsar, topics are scoped by namespace (more on that in Part 2). To list all topics in a namespace, use pulsar-admin topics list. For example, to list topics in the public\u002Fdefault namespace (the default namespace in a standalone cluster):",[48,13222,13223],{},"bin\u002Fpulsar-admin topics list public\u002Fdefault",[48,13225,13226,13227],{},"This will output all topics under public\u002Fdefault. You can also list all namespaces in a tenant (bin\u002Fpulsar-admin namespaces list ",[13228,13229,13230],"tenant",{},") or all tenants (bin\u002Fpulsar-admin tenants list) similar to how Kafka has --list for topics and maybe uses separate tooling for multi-tenant setups (which Pulsar handles natively).",[321,13232,13233],{},[324,13234,13235],{},"Creating a Topic: Kafka’s kafka-topics.sh --create ... lets you create a topic (and specify partitions). Pulsar can auto-create topics when producers send data, but you can also explicitly create them. Use pulsar-admin topics create. Pulsar topics have a persistent or non-persistent prefix. By default, use persistent topics. For example:",[48,13237,13238],{},"bin\u002Fpulsar-admin topics create \"persistent:\u002F\u002Fpublic\u002Fdefault\u002Fmy-topic\"",[48,13240,13241,13242],{},"This creates a topic named “my-topic” in the public\u002Fdefault namespace. If you wanted a non-persistent topic (rare for newbies; non-persistent means messages not durably stored), you’d specify the non-persistent:\u002F\u002F prefix. If the namespace or tenant doesn’t exist, Pulsar will throw an error (you should create the tenant\u002Fnamespace first, see Part 2). By default, a new topic is non-partitioned (single partition in Kafka terms), but you can create a partitioned topic by adding -p ",[13243,13244,13245],"num",{}," to specify number of partitions.",[321,13247,13248],{},[324,13249,13250],{},"Creating a Partitioned Topic: In Kafka, the --partitions flag on create or the kafka-topics.sh --alter can set partitions. In Pulsar, a partitioned topic is a higher-level construct consisting of multiple internal topic partitions. To create one, add -p (partitions) flag:",[48,13252,13253],{},"bin\u002Fpulsar-admin topics create-partitioned-topic \\",[48,13255,13256],{},"--partitions 4 \\",[48,13258,13259],{},"persistent:\u002F\u002Fpublic\u002Fdefault\u002Fmy-partitioned-topic",[48,13261,13262],{},"This will create 4 internal partitions (my-partitioned-topic-partition-0 to -3). 
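To verify what was just created, a couple of follow-up commands help (a sketch; double-check the flags against your pulsar-admin version):

```bash
# Show partition metadata for the logical topic (expect "partitions": 4)
bin/pulsar-admin topics get-partitioned-topic-metadata \
  persistent://public/default/my-partitioned-topic

# List all partitioned topics in the namespace
bin/pulsar-admin topics list-partitioned-topics public/default

# Partition counts can later be increased (never decreased)
bin/pulsar-admin topics update-partitioned-topic \
  --partitions 8 persistent://public/default/my-partitioned-topic
```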
In Pulsar’s view this is one logical topic with 4 partitions (comparable to a Kafka topic with 4 partitions).",[321,13264,13265],{},[324,13266,13267],{},"Producing Messages (Console Producer): Kafka’s kafka-console-producer.sh allows sending test messages from the terminal. Pulsar provides pulsar-client for a similar purpose. For example, to send a message \"Hello Pulsar\":",[48,13269,13270],{},"bin\u002Fpulsar-client produce persistent:\u002F\u002Fpublic\u002Fdefault\u002Fmy-topic --messages \"Hello Pulsar\"",[48,13272,13273],{},"This will produce a message to my-topic. The CLI will print confirmation of send success. You can send multiple messages by separating with commas or using --messages multiple times.",[321,13275,13276],{},[324,13277,13278],{},"Consuming Messages (Console Consumer): Kafka’s console consumer reads messages from a topic. Pulsar’s equivalent is also via pulsar-client. To consume messages:",[48,13280,13281],{},"bin\u002Fpulsar-client consume persistent:\u002F\u002Fpublic\u002Fdefault\u002Fmy-topic -s \"my-subscription\" -n 0 -p Earliest",[48,13283,13284],{},"This command will subscribe to my-topic with subscription name \"my-subscription\" and print messages to the console. The -n 0 means consume indefinitely (or you can specify a number of messages to consume). Pulsar requires a subscription name for consumers – think of it like a consumer group ID in Kafka (see Part 4 for details on subscriptions). If the subscription doesn’t exist, it will be created on the fly. You’ll start seeing any messages (including the ones produced above) printed out. Each message will be acknowledged as it’s consumed by default.",[321,13286,13287],{},[324,13288,13289],{},"Viewing Topic Details and Stats: Kafka has tools like kafka-topics.sh --describe and kafka-consumer-groups.sh to show topic configs or consumer group offsets. Pulsar consolidates much of this in pulsar-admin commands. For example, to get topic statistics:",[48,13291,13292],{},"bin\u002Fpulsar-admin topics stats persistent:\u002F\u002Fpublic\u002Fdefault\u002Fmy-topic",[48,13294,13295],{},"This will output stats including the number of messages published, number of subscriptions, backlog (unacknowledged messages count per subscription), and other metrics. It’s similar to Kafka’s topic description and consumer lag combined – you can see who’s connected, backlog (like consumer lag), etc.",[48,13297,13298,13299],{},"To see internal stats including storage size, use topics stats-internal. For partitioned topics, use topics stats ",[13300,13301,13302],"partitioned-topic",{}," to get aggregated stats for all partitions.",[321,13304,13305],{},[324,13306,13307],{},"Managing Subscriptions: Kafka uses consumer group commands to manage offsets (like resetting to earliest). Pulsar allows resetting a subscription cursor to a specific message or time. For example, to rewind a subscription to the earliest message:",[48,13309,13310],{},"bin\u002Fpulsar-admin topics reset-cursor persistent:\u002F\u002Fpublic\u002Fdefault\u002Fmy-topic \\",[8325,13312,13315],{"className":13313,"code":13314,"language":8330},[8328],"--subscription my-subscription --time 0\n",[4926,13316,13314],{"__ignoreMap":18},[48,13318,13319],{},"This moves the subscription cursor to the very beginning (timestamp 0 as a special value) so the consumer can replay from the start. You can also use a message ID (--messageId) if you have a specific ledger:entry position. 
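For example, here is a hedged sketch of the message-ID variant (the ledgerId:entryId value is a placeholder; run `pulsar-admin topics reset-cursor --help` to confirm the flags for your release):

```bash
# Rewind the subscription to an explicit position (placeholder ledgerId:entryId)
bin/pulsar-admin topics reset-cursor persistent://public/default/my-topic \
  --subscription my-subscription --messageId 12345:0

# Confirm where the cursor now sits by checking the subscription's backlog
bin/pulsar-admin topics stats persistent://public/default/my-topic
```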
This is roughly analogous to Kafka’s --reset-offsets tool but more granular.",[321,13321,13322],{},[324,13323,13324],{},"Deleting Topics: Kafka’s kafka-topics.sh --delete marks a topic for deletion (if enabled). In Pulsar, you can delete topics with:",[48,13326,13327],{},"bin\u002Fpulsar-admin topics delete persistent:\u002F\u002Fpublic\u002Fdefault\u002Fmy-topic",[48,13329,13330],{},"However, Pulsar won’t delete a topic with active producers\u002Fconsumers by default. You can force delete with -f (and -d to also delete schema) if needed. Deleting a partitioned topic requires delete-partitioned-topic command (which removes all partitions).",[321,13332,13333,13336,13342,13352,13359],{},[324,13334,13335],{},"Other Handy Commands: A few more worth noting:List Tenants: bin\u002Fpulsar-admin tenants list – shows all tenants (Kafka has no direct analog since it’s not multi-tenant by default).",[324,13337,13338,13339],{},"List Namespaces: bin\u002Fpulsar-admin namespaces list ",[13228,13340,13341],{}," – shows namespaces in a tenant (akin to logical group of topics; again, Kafka doesn’t have this concept natively).",[324,13343,13344,13345],{},"Examine Messages: bin\u002Fpulsar-admin topics peek-messages -s ",[13346,13347,13348,13349],"sub",{}," -n 1 ",[9857,13350,13351],{}," lets you peek at a subscription’s unacknowledged message(s) without consuming\u002Facking. This is useful to inspect queued messages for a subscription (Kafka doesn’t have an exact equivalent, since unconsumed messages are just those at an offset the consumer hasn't reached).",[324,13353,13354,13355],{},"Skip Messages: bin\u002Fpulsar-admin topics skip ",[13356,13357,13358],"args",{}," can skip messages on a subscription (acknowledging them without consuming) – helpful for clearing a backlog without reading everything.",[324,13360,13361],{},"Shell Completion and Help: pulsar-admin supports --help on any subcommand, and you can use tab completion if configured. The CLI is well-documented, and you can refer to the official docs for all flags.",[40,13363,13365],{"id":13364},"example-workflow","Example Workflow",[48,13367,13368],{},"Let’s tie it together with a quick example scenario:",[1666,13370,13371,13374],{},[324,13372,13373],{},"Create a Topic: Suppose you want to create a topic for an application, analogous to Kafka. Run pulsar-admin topics create persistent:\u002F\u002Fpublic\u002Fdefault\u002Fapp-events.",[324,13375,13376],{},"Produce some messages: Use pulsar-client to send a few test messages:",[48,13378,13379],{},"bin\u002Fpulsar-client produce persistent:\u002F\u002Fpublic\u002Fdefault\u002Fapp-events \\",[8325,13381,13384],{"className":13382,"code":13383,"language":8330},[8328],"--messages \"event1\",\"event2\",\"event3\"\n",[4926,13385,13383],{"__ignoreMap":18},[1666,13387,13388,13391],{},[324,13389,13390],{},"Each comma-separated string is sent as a separate message.",[324,13392,13393],{},"Consume the messages: In a separate shell, start a consumer:",[48,13395,13396],{},"bin\u002Fpulsar-client consume persistent:\u002F\u002Fpublic\u002Fdefault\u002Fapp-events -s tester -n 3 -p Earliest",[1666,13398,13399,13402,13405],{},[324,13400,13401],{},"This will receive 3 messages from the tester subscription and then exit. You should see “event1”, “event2”, “event3” output.",[324,13403,13404],{},"Check stats: Now check pulsar-admin topics stats persistent:\u002F\u002Fpublic\u002Fdefault\u002Fapp-events. 
It should show no backlog for subscription \"tester\" (since we consumed and acked all messages), and it will show the total messages published, throughput, etc..",[324,13406,13407],{},"Experiment with reset: If you run the consumer again with the same subscription, you won’t get any messages (they were already acknowledged). But you can reset the subscription to earliest:",[48,13409,13410],{},"bin\u002Fpulsar-admin topics reset-cursor persistent:\u002F\u002Fpublic\u002Fdefault\u002Fapp-events \\",[8325,13412,13415],{"className":13413,"code":13414,"language":8330},[8328],"--subscription tester --time 0\n",[4926,13416,13414],{"__ignoreMap":18},[1666,13418,13419],{},[324,13420,13421],{},"Now running the consumer again will re-read the messages from the start (very useful for replays, similar to Kafka consumer groups that seek to beginning).",[48,13423,13424],{},"This workflow shows how Pulsar’s CLI can accomplish what Kafka engineers expect, often with even more flexibility (for example, the ability to peek or reset cursors easily).",[40,13426,8924],{"id":8923},[321,13428,13429,13439,13442,13445,13448],{},[324,13430,13431,13432,758,13435,13438],{},"Pulsar’s main admin tool is pulsar-admin, which combines the functionality of multiple Kafka scripts (topics, configs, consumer groups) into one CLI. Use pulsar-admin ",[2628,13433,13434],{},"resource",[2628,13436,13437],{},"operation"," format (e.g., topics list, namespaces create, brokers list) to manage the cluster.",[324,13440,13441],{},"Pulsar topics are referred to by a full name including tenant and namespace (e.g., persistent:\u002F\u002Ftenant\u002Fnamespace\u002Ftopic). Ensure you include the full name in CLI commands to avoid confusion. The default tenant\u002Fnamespace in a new standalone cluster is public\u002Fdefault.",[324,13443,13444],{},"Producing and consuming test messages is easy with pulsar-client (no coding required). This parallels Kafka’s console producer\u002Fconsumer tools and is great for smoke testing topics.",[324,13446,13447],{},"Many Kafka concepts (like consumer group offset resets, topic inspections, etc.) are available via Pulsar CLI: you can reset subscriptions, skip messages, and even peek at messages without consuming – features that can simplify troubleshooting.",[324,13449,13450],{},"Pulsar’s CLI embraces Pulsar’s multi-tenancy and segmentation. You’ll routinely use tenant and namespace in commands (unlike Kafka). This guides us into the next post, where we’ll dive into Tenants, Namespaces & Bundles – the foundations of Pulsar’s multi-tenant architecture.",[48,13452,3931],{},[208,13454],{},[48,13456,3931],{},[48,13458,8956,13459,8960],{},[55,13460,5405],{"href":6135,"rel":13461},[264],[48,13463,8963],{},[48,13465,13466],{},[55,13467,8970],{"href":8968,"rel":13468},[264],[48,13470,13471],{},[55,13472,8976],{"href":7969,"rel":13473},[264],[48,13475,3931],{},{"title":18,"searchDepth":19,"depth":19,"links":13477},[13478,13479,13480,13481],{"id":42,"depth":19,"text":46},{"id":13211,"depth":19,"text":13212},{"id":13364,"depth":19,"text":13365},{"id":8923,"depth":19,"text":8924},"2025-07-30","Discover how to transition from Kafka to Pulsar with this CLI cheatsheet. 
Learn equivalent commands for creating topics, producing\u002Fconsuming messages, and managing Pulsar resources for a smooth migration.","\u002Fimgs\u002Fblogs\u002F688a3f966e25861a466483c8_01.-Kafka-→-Pulsar-CLI-Cheatsheet.png",{},"\u002Fblog\u002Fpulsar-newbie-guide-for-kafka-engineers-part-1-kafka-pulsar-cli-cheatsheet",{"title":13186,"description":13483},"blog\u002Fpulsar-newbie-guide-for-kafka-engineers-part-1-kafka---pulsar-cli-cheatsheet",[7347,821,799],"O05ZPZEkxNtxvqcdgHdk1yLdxqS__GBQM-dJUJMUsdc",{"id":13492,"title":13493,"authors":13494,"body":13495,"category":3550,"createdAt":290,"date":13602,"description":13603,"extension":8,"featured":294,"image":13604,"isDraft":294,"link":290,"meta":13605,"navigation":7,"order":296,"path":13606,"readingTime":4475,"relatedResources":290,"seo":13607,"stem":13608,"tags":13609,"__hash__":13610},"blogs\u002Fblog\u002Fintroducing-streamnative-cloud-notifications-for-functions.md","Introducing StreamNative Cloud Notifications for Functions",[311,6969],{"type":15,"value":13496,"toc":13597},[13497,13500,13504,13507,13510,13513,13516,13524,13527,13530,13535,13537,13540,13545,13548,13553,13556,13561,13563,13566,13569,13572,13574,13595],[48,13498,13499],{},"We’re excited to announce the launch of notifications capabilities for StreamNative Functions—a new feature designed to help teams monitor critical workloads running in their clusters. This is the first step in bringing a broader notifications experience to StreamNative Cloud, starting with Functions and expanding to other services in the future.",[40,13501,13503],{"id":13502},"built-in-notification-rules-for-functions","Built-In Notification Rules for Functions",[48,13505,13506],{},"With this initial release, notification is scoped specifically to StreamNative Functions. Our built-in notification rules allow you to quickly gain visibility into issues before they impact production. Today, we offer two types of preconfigured rules that cover common failure scenarios and health checks.",[48,13508,13509],{},"When a rule is triggered, StreamNative Cloud will automatically send an email notification so you can take prompt action.",[48,13511,13512],{},"Key highlights:",[48,13514,13515],{},"Enable or Disable Rules From Console:\nYou can easily turn notifications on or off for each rule directly from the console, giving you fine-grained control over what you monitor.Two types of rules are available today, covering common failure scenarios and health checks.",[321,13517,13518,13521],{},[324,13519,13520],{},"function-crash-loop-backoff",[324,13522,13523],{},"function-oom-killed",[48,13525,13526],{},"More built-in rules are planned for future releases.",[48,13528,13529],{},"By default, the notifications are disabled. 
Users will have to enable them.",[48,13531,13532],{},[384,13533],{"alt":18,"src":13534},"\u002Fimgs\u002Fblogs\u002F687a39511325bafdfe20a825_AD_4nXeZL1OIEtsnHBqzQaUqt-65zc6OrJy2-Av8D9HAujARoOMndT3KccMtHaDJdjNXxwv6xwO0mqgn_HbWsZrYmGkt_iZY9LHbypMSX5Uqc0UO9FFZ15d35JcQW7-KJclDKfow2Pr8Rw.png",[48,13536,3931],{},[48,13538,13539],{},"Immediate Email Notifications:\nAs soon as a notification is triggered in your cluster, an email is dispatched to ensure you never miss a critical event.",[48,13541,13542],{},[384,13543],{"alt":18,"src":13544},"\u002Fimgs\u002Fblogs\u002F687a39511325bafdfe20a82e_AD_4nXfk-0SdDuFv5XDOI4NkYmA-GyyJ6qg-ArW80s_wZusHt9hTQKwQUasaB9ZycHwRq_5U9PlEyy0DNmAz0wgWcDbzTU__zJwVSuuoo1XomFVCMHiqMxo5SKGDHi1z_iQ40SFCJQjzeg.png",[48,13546,13547],{},"Once the incident is resolved, an additional email is sent to notify users of the resolution.",[48,13549,13550],{},[384,13551],{"alt":18,"src":13552},"\u002Fimgs\u002Fblogs\u002F687a39511325bafdfe20a829_AD_4nXceqaaJTp5b9T8NtNz0V5rqXil0TjMhXs2STei4XhxBG-dIuDzI3nv48GiGsLocIf2-bG4RyEq8vdXsbE83JvJmUy8Xwi3YBRE1ZNw-iwm75nar6xLzh2Ix1-NcrlNH4dqpXtr-Cw.png",[48,13554,13555],{},"Flexible Recipient Management:\nEmail notifications are sent to your configured recipients. By default, notifications go to your organization’s technical contact email, or the billing contact email if a technical contact isn’t specified.",[48,13557,13558],{},[384,13559],{"alt":18,"src":13560},"\u002Fimgs\u002Fblogs\u002F687a39511325bafdfe20a820_AD_4nXcom0N5EBthjmI39rgiMUWusAUi2OtPyc62FgjPlb-Hk5CY57j-f5MATxPTb4Xe6U5hUnT0agDs19M45a577yinHkv2swtVpEbnx2S2NWZ5K1sV_1LgTn0Io9xHVexvd-CmmNEB9g.png",[48,13562,3931],{},[40,13564,13565],{"id":1727},"What’s Next?",[48,13567,13568],{},"This is just the beginning. Over the coming months, we’ll expand notification rules to cover other components across the StreamNative Cloud platform so you can monitor everything from Pulsar brokers and connectors to storage integrations—all within a unified experience.",[48,13570,13571],{},"Stay tuned for future updates as we continue to build out a robust observability toolkit to help you operate data streaming workloads with confidence.",[40,13573,2149],{"id":2146},[48,13575,13576,13581,13582,13586,13587,13594],{},[55,13577,13579],{"href":3907,"rel":13578},[264],[44,13580,7137],{},"** **and get started for free. ",[55,13583,13584],{"href":7141},[44,13585,7142],{}," to learn more about StreamNative Cloud. Visit your StreamNative Cloud Console to ",[55,13588,13591],{"href":13589,"rel":13590},"https:\u002F\u002Fdocs.streamnative.io\u002Fcloud\u002Flog-and-monitor\u002Fmanage-notifications",[264],[44,13592,13593],{},"enable notifications for your Functions"," and take control of your monitoring workflows.",[48,13596,4446],{},{"title":18,"searchDepth":19,"depth":19,"links":13598},[13599,13600,13601],{"id":13502,"depth":19,"text":13503},{"id":1727,"depth":19,"text":13565},{"id":2146,"depth":19,"text":2149},"2025-07-18","Get real-time email alerts for critical issues like crashes and OOM kills. Monitor your data streaming workloads effortlessly with built-in rules—enabled directly from your console. 
Start for free today!","\u002Fimgs\u002Fblogs\u002F687a37ec61b3bce553cc7eb3_Cloud-notifications-for-functions.png",{},"\u002Fblog\u002Fintroducing-streamnative-cloud-notifications-for-functions",{"title":13493,"description":13603},"blog\u002Fintroducing-streamnative-cloud-notifications-for-functions",[3550,9636],"9SPcgSWFaS1RM_L9u1gVYqkjNJuq8pTilfseNTUGDEs",{"id":13612,"title":13613,"authors":13614,"body":13615,"category":7338,"createdAt":290,"date":13705,"description":13706,"extension":8,"featured":294,"image":13707,"isDraft":294,"link":290,"meta":13708,"navigation":7,"order":296,"path":13653,"readingTime":4475,"relatedResources":290,"seo":13709,"stem":13710,"tags":13711,"__hash__":13712},"blogs\u002Fblog\u002Fbeyond-the-broker-standardizing-the-streaming-api.md","Beyond the Broker: Standardizing the Streaming API",[806,28],{"type":15,"value":13616,"toc":13703},[13617,13620,13623,13655,13657,13659,13662,13665,13673,13676,13679,13682,13685,13688,13691,13694,13697,13700],[48,13618,13619],{},"Navigate the series — De-composing Streaming Systems:",[48,13621,13622],{},"This article is one chapter in a five-part deep dive into the future of real-time data. Explore the rest of the series here:",[321,13624,13625,13631,13637,13643,13649],{},[324,13626,13627],{},[55,13628,13630],{"href":13629},"\u002Fblog\u002Fwhy-streams-need-their-iceberg-moment","Part 1 — Why Streams Need Their Iceberg Moment",[324,13632,13633],{},[55,13634,13636],{"href":13635},"\u002Fblog\u002Fanatomy-of-a-stream-data-vs-metadata-vs-protocol","Part 2 — Anatomy of a Stream: Data vs Metadata vs Protocol",[324,13638,13639],{},[55,13640,13642],{"href":13641},"\u002Fblog\u002Finside-stream-format-a-table-for-infinite-logs","Part 3 — Inside Stream Format: A Table for Infinite Logs",[324,13644,13645],{},[55,13646,13648],{"href":13647},"\u002Fblog\u002Fcatalogs-for-streams-lessons-from-icebergs-rest-spec","Part 4 — Catalogs for Streams: Lessons from Iceberg’s REST Spec",[324,13650,13651],{},[55,13652,13654],{"href":13653},"\u002Fblog\u002Fbeyond-the-broker-standardizing-the-streaming-api","Part 5 — Beyond the Broker: Standardizing the Streaming API.",[208,13656],{},[48,13658,3931],{},[48,13660,13661],{},"In the messaging and streaming arena, there has never been a one-size-fits-all protocol. Apache Kafka, RabbitMQ, Apache Pulsar, NATS, MQTT, AMQP – each was created with different assumptions and goals. This diversity is reminiscent of the early database world with multiple query languages, before SQL became a standard. But unlike databases, streaming systems have fundamental semantic differences that make a single unified “standard API” challenging. Instead of forcing one protocol to rule them all, the emerging consensus is to embrace multiple protocols (each optimized for certain use cases) and ensure they can interoperate at a deeper level. It’s analogous to the data lakehouse philosophy: multiple query engines can coexist (Spark, Trino, TensorFlow), as long as they operate on the same unified data storage. Similarly, multiple streaming protocols might coexist while sharing the same underlying event streams.",[48,13663,13664],{},"First, let’s acknowledge why multiple protocols exist and persist:",[321,13666,13667,13670],{},[324,13668,13669],{},"Different messaging semantics: Kafka popularized the idea of a durable log with replayable events and consumer-driven offsets – great for streaming analytics and event sourcing. 
RabbitMQ (AMQP) and similar MQs focus on push-based, worker queue semantics (each message goes to one consumer, often for task processing) with features like acknowledgments, routing keys, and transactions for reliability in business processes. Pulsar designed a system to handle both patterns (pub-sub and work-queues) in one, introducing the concept of exclusive vs shared subscriptions. Meanwhile, systems like MQTT and NATS cater to lightweight, transient messaging (IoT devices, in-memory microservices) where low overhead and simplicity matter more than durability. No single protocol covers all these scenarios perfectly, because optimizing for one can mean trade-offs for another (e.g., a design for ultra-low latency ephemeral messaging might not guarantee durability or ordering needed for financial event streams).",[324,13671,13672],{},"Historical ecosystems: Companies and open-source communities have built rich ecosystems around these protocols. Kafka, for example, has an entire ecosystem of connectors, Stream processing libraries, and a large installed base. JMS (Java Messaging Service) tried to standardize an API for message queues, but it mainly provided a common abstraction in Java – it didn’t unify wire protocols across vendors. The inertia of existing applications means any “new standard” would have to either seamlessly emulate these protocols or convince everyone to rewrite their systems, which is unlikely.",[48,13674,13675],{},"That said, we do see a convergence in capabilities. Modern Kafka is adding features that look more like traditional queues: for instance, Kafka 4.0 introduced “Queues for Kafka” (KIP-932), which enables true shared consumption where a group of consumers can cooperatively consume from a topic without fixed partitions. This essentially gives Kafka point-to-point queue semantics (multiple consumers dividing up messages of a topic) similar to JMS or Pulsar’s shared subscription. On the flip side, Apache Pulsar from day one offered both queue and pub-sub in one API (you can create a subscription as exclusive, shared, or failover), and even introduced transaction support to match Kafka’s exactly-once features. RabbitMQ has added streams (a new data structure for persistent logs) to catch up with high-throughput use cases that Kafka handles. We see that protocols are evolving and borrowing features: the gaps are narrowing.",[48,13677,13678],{},"However, this doesn’t mean they are becoming identical or that one will subsume all others. Each community prioritizes different aspects – for example, Kafka prioritizes throughput and a simple partition model, Pulsar prioritizes multi-tenancy and infinite retention via tiered storage, RabbitMQ prioritizes flexible routing and ease of use for work queues, etc. Therefore, expecting a single standard API (akin to ODBC or JDBC in databases) to replace these is unrealistic in the near term. The richer the semantics, the harder to standardize without lowest-common-denominator.",[48,13680,13681],{},"So, what’s the path beyond the broker? It’s to look below or behind the broker API – towards the storage and data layer. Instead of standardizing the API that producers\u002Fconsumers use, standardize how the data is stored and shared so that different APIs can access it. This is exactly how the lakehouse works for batch data: engines don’t need the same API, they just need to agree on the format of data (Parquet, Iceberg metadata). 
In streaming, this could mean agreeing on a common log or table format for the messages, and building adapters so that a Kafka client and a Pulsar client, for example, could read from the same stream of events. We already discussed how Ursa writes data to open formats – envision that a Kafka application writes to a stream and a Pulsar application reads from that same stream’s storage, each using their own API, but the data interchange happens at the storage layer in Parquet\u002FJSON format. StreamNative’s platform actually moves in this direction: they allow Kafka clients to produce to a Pulsar-managed topic (via KSN: Kafka on StreamNative which uses Pulsar underneath). In that scenario, Pulsar’s broker is translating the Kafka protocol into the underlying Pulsar log, and because Pulsar offloads data to tiered storage in open format, any other protocol handler or tool that knows how to interpret that format could also consume it.",[48,13683,13684],{},"In essence, multi-protocol streaming is becoming a reality, much like multi-engine lakehouses. Apache Pulsar’s architecture supports pluggable protocol handlers – already there are implementations for Kafka (so Kafka apps talk to Pulsar as if it were a Kafka broker), for AMQP (starlight for RabbitMQ), and others. This means one data stream can be accessed via multiple APIs. Another approach is at the ingestion level: for instance, an event could be produced via an HTTP API (e.g., a REST call) and consumed via a WebSocket or Kafka API – again multiple interfaces to the same stream. The data layer unifies them. We can draw a parallel to multi-engine lakehouse: in a lakehouse, you don’t force all queries to use one SQL dialect or one engine; you let each engine do what it’s best at (Spark for large ETL, Pandas for small-scale data science, Dremio\u002FTrino for ad-hoc SQL) but ensure they operate on the same single source of truth. For streaming, one protocol might be best for one scenario (say MQTT for IoT ingestion, because it’s lightweight), another for another scenario (Kafka API for connecting to legacy systems that speak Kafka), and a third for something else (Pulsar’s own API for its rich feature set). If they all write to\u002Fread from the same stored stream, we’ve achieved interoperability without forcing a single API.",[48,13686,13687],{},"Let’s consider a concrete example: imagine an e-commerce company with a stream of orders. Some internal systems are built with Kafka and use its API to produce and consume order events. Meanwhile, a new microservices team prefers Pulsar for its flexibility and multi-tenancy. In a traditional world, you’d either run two parallel pipelines (duplication) or try to bridge them with connectors (added complexity). In the emerging world, you could use a unified storage format for the “orders” stream – say an Iceberg table or a distributed log on S3. The Kafka producers send events, a Pulsar cluster (with a Kafka protocol handler) ingests them into that storage, and Pulsar consumers or even Athena queries can access the data. Both teams see the same events consistent in storage, even if one team thinks in terms of Kafka topics and the other in Pulsar subscriptions. 
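A sketch of what that looks like in practice, assuming a Pulsar cluster with the Kafka protocol handler enabled on port 9092 and the default mapping of Kafka topic names into the public/default namespace (the hostname is a placeholder):

```bash
# The Kafka team keeps its existing tooling, simply pointed at the Pulsar endpoint
bin/kafka-console-producer.sh \
  --bootstrap-server pulsar-broker.example.com:9092 \
  --topic orders

# The Pulsar team consumes the very same stream of order events
bin/pulsar-client consume persistent://public/default/orders \
  -s "order-analytics" -n 0
```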
This scenario is already hinted at by cloud offerings: for example, Cloudera’s cloud platform had a unified messaging where multiple interfaces sat on top of the same store, and Azure’s Event Hubs can speak Kafka protocol while using its own storage underneath.",[48,13689,13690],{},"It’s worth noting that attempts have been made to define a common messaging API (e.g., AMQP as an open wire protocol, and the OpenMessaging initiative under Linux Foundation). AMQP is used by many systems (including RabbitMQ, Apache Qpid, Azure Service Bus) – it provides a standardized wire format for messaging operations. Yet Kafka notably did not adopt AMQP, and neither did most log-based systems, because it didn’t align with their design. OpenMessaging aimed to be a cloud-era abstraction to allow applications to be messaging-system-agnostic. It defined some common concepts (Message, Producer, Consumer, Namespace) and even a benchmark suite. However, it’s still not broadly accepted as the API – partly because again, the lowest common feature set may be too limiting, and performance optimizations often rely on protocol-specific tweaks.",[48,13692,13693],{},"Given this reality, the industry trend is toward protocol adapters and bridges rather than a new unified protocol. Multi-protocol brokers like Pulsar can natively speak multiple languages to clients. Kafka itself, via the community (Confluent’s efforts or others), might gain bridging capabilities (e.g., ingest MQTT directly, etc.). There’s also the idea of event formats like CloudEvents (a CNCF standard for event message schema) to standardize the content of messages even if transport differs.",[48,13695,13696],{},"The phrase “beyond the broker” implies we should look past the broker-specific APIs to the underlying substrate of streaming. That substrate is the log of events itself. Standardizing that – via open file formats, shared object stores, and common metadata – is more feasible and arguably more useful than trying to get everyone to use the same client API. It means, for example, a company could run multiple broker technologies (Kafka for some parts of the workload, Pulsar for others, maybe AWS Kinesis for something else) but decide that all will offload their data to a unified storage layer (say an S3 data lake in Iceberg format). In that unified storage, each topic\u002Fstream from any source is just a table or folder. Consumers that really don’t care about the live sub-second latency could even read directly from that store (batch or micro-batch style), while real-time consumers attach to the brokers. Over time, as brokers themselves evolve to separate compute\u002Fstorage (which Pulsar already does, Kafka is also evolving with Tiered Storage), the storage becomes the source of truth, and brokers are more like caching and routing layers.",[48,13698,13699],{},"To sum up, trying to standardize the streaming API is a bit like trying to standardize programming languages – it’s not necessary if you can standardize the ABI or runtime under the hood. Each protocol will continue to serve its niche and play to its strengths – there is no one-size-fits-all protocol, and that’s okay. The focus should instead be on interoperability: ensuring data can flow from one system to another with minimal friction. The unified log\u002Ftable storage approach is a promising path to achieve this. It decouples the “language” of the streams from the data itself. In practical terms, we’ll see more systems where a single stream of events can be accessed via multiple APIs. 
It’s already happening with Pulsar’s multi-protocol support and Kafka’s foray into queue semantics.",[48,13701,13702],{},"In the future, we might not need to ask “should I use Kafka or Pulsar or RabbitMQ for this?” as an either-or question. We might publish data, and that data can be consumed by any number of different protocol clients depending on what’s convenient – much like data in a lakehouse can be queried by SQL, or read via Python, or processed with R, all equally. The broker becomes less of a monolith that holds data hostage in its format, and more of a serving layer. Going beyond the broker means designing streaming systems where the value lies in the data and its open accessibility, rather than in proprietary APIs. It’s an exciting convergence of ideas: messaging systems learning from data lakes, and vice versa. By standardizing on storage and embracing multiple protocols, we get the reliability and maturity of existing systems without forcing a single new standard. In short, the future of streaming will be multi-protocol, and that’s not a drawback but a strength – as long as we ensure they can all talk to each other’s data. The lakehouse for streams is on the horizon, and it speaks many languages fluently.",{"title":18,"searchDepth":19,"depth":19,"links":13704},[],"2025-07-17","Explore the future of streaming APIs beyond Kafka, Pulsar, and RabbitMQ. Why a single standard won’t work—and how multi-protocol interoperability (like data lakehouses) is the real solution. Learn how open storage formats bridge Kafka, Pulsar, and more.","\u002Fimgs\u002Fblogs\u002F6878c2fd98a89a75de63a8a2_beyond-the-broker.png",{},{"title":13613,"description":13706},"blog\u002Fbeyond-the-broker-standardizing-the-streaming-api",[800,821,799,3550],"IAtlazJ2f7deoNEMqK-KRzVc7j1-ZI7G0PpIQbz3j6c",{"id":13714,"title":13715,"authors":13716,"body":13717,"category":1332,"createdAt":290,"date":13804,"description":13805,"extension":8,"featured":294,"image":13806,"isDraft":294,"link":290,"meta":13807,"navigation":7,"order":296,"path":13647,"readingTime":4475,"relatedResources":290,"seo":13808,"stem":13809,"tags":13810,"__hash__":13811},"blogs\u002Fblog\u002Fcatalogs-for-streams-lessons-from-icebergs-rest-spec.md","Catalogs for Streams: Lessons from Iceberg’s REST Spec",[806,28],{"type":15,"value":13718,"toc":13802},[13719,13721,13723,13745,13747,13749,13752,13755,13766,13769,13772,13779,13782,13796,13799],[48,13720,13619],{},[48,13722,13622],{},[321,13724,13725,13729,13733,13737,13741],{},[324,13726,13727],{},[55,13728,13630],{"href":13629},[324,13730,13731],{},[55,13732,13636],{"href":13635},[324,13734,13735],{},[55,13736,13642],{"href":13641},[324,13738,13739],{},[55,13740,13648],{"href":13647},[324,13742,13743],{},[55,13744,13654],{"href":13653},[208,13746],{},[48,13748,3931],{},[48,13750,13751],{},"When you adopt the idea of streams as tables, a new question arises: How do we track and discover all these streaming tables? In traditional streaming platforms, the “catalog” of topics (streams) is often just the broker or cluster itself – for example, Kafka brokers know what topics exist, and clients ask the broker for metadata. There isn’t a global, standardized catalog for streams akin to a Hive Metastore or Glue Catalog in the batch world. However, as streaming data starts living in open table formats, the need for a stream catalog becomes clear. 
We want a central place to register, enumerate, and manage stream metadata (namespaces, schemas, retention policies, etc.), ideally in a vendor-neutral, interoperable way. Here is where lessons from Apache Iceberg’s REST Catalog specification can be applied.",[48,13753,13754],{},"Iceberg’s REST Catalog spec was introduced to solve a metadata interoperability problem for tables. Previously, each deployment might use a different catalog backend (Hive Metastore, AWS Glue, etc.), making it hard to integrate across systems. The Iceberg REST spec defines a uniform HTTP API for table operations – creating tables, listing tables and namespaces, retrieving table metadata, and committing changes (snapshots) – regardless of the underlying implementation. This standardization brought several benefits that are just as relevant for streaming catalogs:",[321,13756,13757,13760,13763],{},[324,13758,13759],{},"Interoperability: A RESTful catalog API means any client (in any language) can manage and query the metadata of data objects using simple HTTP calls. For streams, this could mean different streaming engines or services could all register their streams in one central catalog service.",[324,13761,13762],{},"Decoupling Metadata Store: The spec abstracts what the metadata is (tables and schemas) from where it is stored. In Iceberg’s case, you can have a REST catalog backed by a relational DB, NoSQL store, or even a Git repo – clients don’t need to know. Similarly, a stream catalog could be backed by a highly available service (perhaps built on a consensus DB or cloud service), but clients just see a uniform REST interface.",[324,13764,13765],{},"Multi-Tenancy and Cloud-Native Design: REST catalogs are designed to be cloud-friendly (HTTP-based, stateless) and support auth tokens for multi-tenant security. A streams catalog should offer the same, since organizations will have many teams registering streams and need access control and auditing at a central point.",[48,13767,13768],{},"How would a catalog for streams differ from one for tables? The core entities are similar – we have namespaces (or tenants), stream names, and schema – but streams also have traits like partitions, replication factors, and retention policies. Operations on a stream (like “create stream”, “delete stream”) are analogous to table operations. Iceberg’s spec already covers creating and dropping tables and even transactions for commits. One can imagine extending a similar RESTful approach: e.g., \"POST \u002Fv1\u002Fstreams\" to create a new stream in the catalog (with parameters like number of partitions, etc.), or \"GET \u002Fv1\u002Fstreams\u002F{name}\" to fetch metadata about a stream (its schema, location, status). The key lesson from Iceberg is to use open and standard APIs for these operations, rather than proprietary RPCs tied to one vendor’s platform.",[48,13770,13771],{},"In fact, we’re starting to see this pattern. StreamNative’s Ursa engine, when writing Pulsar streams into Iceberg tables, uses Iceberg’s REST Catalog under the hood. When a new topic is created in Ursa, it calls the Iceberg REST API to create a corresponding table for that stream. The catalog (which could be an AWS Glue or a Snowflake’s Iceberg implementation, etc.) now knows about the table for that stream. This means any external tool or analytics service can discover the stream’s data via standard catalog queries. 
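To make that discovery path concrete, here is a hedged sketch against an Iceberg REST catalog (the host, token, and names are placeholders; the exact paths, including an optional prefix segment, are defined by the Iceberg REST OpenAPI spec):

```bash
# List the namespaces the catalog knows about
curl -s -H "Authorization: Bearer $CATALOG_TOKEN" \
  https://catalog.example.com/v1/namespaces

# List the tables (i.e., registered streams) under one namespace
curl -s -H "Authorization: Bearer $CATALOG_TOKEN" \
  https://catalog.example.com/v1/namespaces/analytics/tables

# Load one table's metadata: schema, snapshots, and storage location
curl -s -H "Authorization: Bearer $CATALOG_TOKEN" \
  https://catalog.example.com/v1/namespaces/analytics/tables/orders
```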
For example, AWS’s analytic services (like Athena or SageMaker) can list and query those Iceberg tables once they are registered, without special integration to Ursa. The stream metadata (table schemas, partition info) lives in the same catalog as batch tables, breaking down the wall between real-time and batch datasets.",[48,13773,13774,13775,13778],{},"Figure: ",[55,13776,13777],{"href":6647},"StreamNative Ursa"," integrates with an Iceberg REST Catalog to map streaming topics into table metadata on cloud object storage (Amazon S3 in this case). Each Pulsar\u002FKafka topic (left) gets an Iceberg table in a catalog (center), stored under a namespace corresponding to the topic’s tenant and namespace. This allows external query engines and services (right) to discover and query stream data using the standard table interface, treating streams as just another set of tables.",[48,13780,13781],{},"From these lessons, a vision emerges for catalogs for streams:",[321,13783,13784,13787,13790,13793],{},[324,13785,13786],{},"Streams should be first-class entries in a unified metadata store. Whether it’s an Iceberg REST catalog or another open standard, we need a place where all data streams are registered just like tables. This makes streams discoverable by data analysts and engineers who might not be familiar with the messaging system details.",[324,13788,13789],{},"The catalog would store stream schema (much like a table schema), and possibly stream-specific properties (number of partitions, retention period, etc.). It could also track current status (for instance, is the stream live or paused) and the mapping to storage (e.g., the cloud bucket or path where the stream’s table data lives).",[324,13791,13792],{},"By using a REST API or similar open interface, any tool or platform can integrate to create or query streams. Imagine a CI\u002FCD pipeline calling \"DELETE \u002Fv1\u002Fstreams\u002Forders\" to clean up a stream, or a data catalog UI listing all streams under a project by calling \"GET \u002Fv1\u002Fnamespaces\u002FprojectX\u002Fstreams\". This decoupling means your streaming metadata isn’t locked inside a single vendor’s broker – it’s accessible and portable.",[324,13794,13795],{},"Importantly, a stream catalog can help manage consistency between multiple protocols. If the same underlying stream is accessible via, say, a Pulsar API and a Kafka API (multi-protocol access), a shared catalog entry can represent that one logical stream. Clients of either protocol could then consult the same catalog to understand the stream’s schema and history.",[48,13797,13798],{},"By looking at Iceberg’s REST spec, we also learn the value of transactions in metadata for streaming. In Iceberg, when data is appended to a table, the commit is a transactional API call to update the table state (with optimistic concurrency control). Ursa leverages this by committing each batch of events as an Iceberg transaction, ensuring no partial or corrupt metadata states. A future streams catalog spec might similarly allow committing offsets or watermarks as part of metadata. 
For instance, a commit could encapsulate “I’ve added these new files (or log segments) to the stream’s storage, corresponding to events up to timestamp X.” Having a standardized way to commit and track stream progress in the catalog could enable cross-system consistency (imagine a Flink job advancing a streaming query and recording its point of consistency in the catalog).",[48,13800,13801],{},"In summary, the world of streaming is borrowing the playbook of data lakehouse metadata. Apache Iceberg’s REST catalog spec teaches us that open, RESTful metadata services can foster interoperability across diverse tools. Applying this to streams means treating streams similarly to tables in our organizational data catalog. It’s a shift from the siloed view (where only the message broker knows about the stream) to a global view where streams are discoverable data assets. The payoff is huge: easier integration of real-time data in analytics, unified governance (one can apply data policies uniformly), and the ability to mix streaming and batch sources seamlessly in data pipelines. As streaming data continues to grow, adopting standard cataloging practices will ensure that real-time datasets don’t become second-class citizens in the data ecosystem. Instead, they will be as easily searched, understood, and integrated as any table – thanks to lessons learned from Iceberg and the lakehouse community.",{"title":18,"searchDepth":19,"depth":19,"links":13803},[],"2025-07-16","Learn how applying Apache Iceberg's REST Catalog specification to streaming data can create a unified, vendor-neutral catalog for streams, improving discoverability, interoperability, and governance for real-time datasets.","\u002Fimgs\u002Fblogs\u002F687781c4a55fe30abb368c41_catalogs-for-streams.png",{},{"title":13715,"description":13805},"blog\u002Fcatalogs-for-streams-lessons-from-icebergs-rest-spec",[1332,1330],"n1jWUb3eBTLmcm60DdUzFGtxLm1iVCOMdJXPi3VgqYI",{"id":13813,"title":13814,"authors":13815,"body":13816,"category":6415,"createdAt":290,"date":13984,"description":13985,"extension":8,"featured":294,"image":13986,"isDraft":294,"link":290,"meta":13987,"navigation":7,"order":296,"path":13845,"readingTime":4475,"relatedResources":290,"seo":13988,"stem":13989,"tags":13990,"__hash__":13991},"blogs\u002Fblog\u002Fone-bus-many-voices-why-protocol-flexibility-matters-for-ai-agents.md","One Bus, Many Voices: Why Protocol Flexibility Matters for AI Agents",[807],{"type":15,"value":13817,"toc":13980},[13818,13821,13824,13846,13848,13850,13853,13862,13866,13869,13883,13886,13889,13900,13903,13906,13920,13923,13927,13941,13944,13947,13949,13966,13969,13972,13975],[48,13819,13820],{},"Explore the Series — Building AI Agents with Apache Pulsar:",[48,13822,13823],{},"This article is part of a three-part deep dive into how messaging architectures—especially the Pulsar protocol—can meet the evolving infrastructure demands of AI agents. From speed and flexibility to built-in resilience, this series unpacks the core messaging principles that power more capable and reliable AI systems.",[321,13825,13826,13833,13840],{},[324,13827,13828,13829],{},"Part 1 - ",[55,13830,13832],{"href":13831},"\u002Fblog\u002Fstreams-vs-queues-why-your-agents-need-both--and-why-pulsar-protocol-delivers","Streams vs. 
Queues: Why Your Agents Need Both—and Why Pulsar Protocol Delivers",[324,13834,13835,13836],{},"Part 2 - ",[55,13837,13839],{"href":13838},"\u002Fblog\u002Freliability-that-thinks-ahead-how-pulsar-helps-agents-stay-resilient","Reliability That Thinks Ahead: How Pulsar Helps Agents Stay Resilient",[324,13841,13842,13843],{},"Part 3 - ",[55,13844,13814],{"href":13845},"\u002Fblog\u002Fone-bus-many-voices-why-protocol-flexibility-matters-for-ai-agents",[208,13847],{},[48,13849,3931],{},[48,13851,13852],{},"AI agent ecosystems are rarely homogenous – they often involve a mix of languages, frameworks, and device types, each with its own preferred communication protocol. You might have edge IoT sensors speaking MQTT, web services using REST or AMQP, and data pipelines built on Kafka. Integrating all these “voices” into a cohesive system can be a daunting task if your messaging infrastructure is inflexible. In this final post of our series, we explore how Apache Pulsar’s pluggable protocol architecture enables multiple protocols (Pulsar, Kafka, MQTT, etc.) to coexist on a single event bus. We’ll see how this flexibility reduces system sprawl and accelerates development of AI agents, compared to Apache Kafka’s single-protocol model that often requires bolting on additional components.",[48,13854,13855,13856,4003,13859,13861],{},"(Earlier in this series, we discussed Pulsar’s support for multiple messaging patterns and its robust delivery guarantees (",[55,13857,13858],{"href":13831},"Streams vs Queues: Why Your Agents Need Both—and Why Pulsar Protocol Delivers",[55,13860,13839],{"href":13838},"). Now we look at another dimension of flexibility: multiple messaging protocols on one platform.)",[40,13863,13865],{"id":13864},"diverse-agents-diverse-protocols","Diverse Agents, Diverse Protocols",[48,13867,13868],{},"Let’s set the scene with an example: imagine a smart city AI system with various agents:",[321,13870,13871,13874,13877,13880],{},[324,13872,13873],{},"IoT sensors (traffic cameras, weather stations) that send data via MQTT – a lightweight pub\u002Fsub protocol common in IoT.",[324,13875,13876],{},"Backend analytics microservices written in Java using a Kafka client library (because the team has Kafka experience).",[324,13878,13879],{},"Legacy systems or edge devices using AMQP (the protocol behind RabbitMQ and other message brokers) for certain messaging needs.",[324,13881,13882],{},"Perhaps some mobile apps or web dashboards that communicate via WebSockets or REST.",[48,13884,13885],{},"In a traditional setup, you might deploy Kafka for the analytics pipeline, RabbitMQ for the AMQP devices, and an MQTT broker (like EMQX or Mosquitto) for the sensors. You’d then stitch these together: e.g., use Kafka Connect or custom bridges to pipe MQTT data into Kafka, and vice versa, or have services subscribe to multiple systems. This “many systems” approach leads to what we call system sprawl – multiple messaging infrastructures to operate and integrate. It introduces latency at the boundaries, increased ops overhead, and more points of failure.",[48,13887,13888],{},"Apache Kafka’s approach: Kafka uses its own proprietary binary protocol for client communication. Out of the box, Kafka speaks only Kafka protocol. If you have non-Kafka clients (MQTT, AMQP, etc.), Kafka by itself cannot talk to them. 
You’d typically deploy auxiliary services:",[321,13890,13891,13894,13897],{},[324,13892,13893],{},"For MQTT, one approach is using a bridge or proxy: for instance, Confluent (Kafka’s company) provided an MQTT proxy that translates MQTT to Kafka, or you run a separate MQTT broker and use a Kafka Connect source\u002Fsink to move data between MQTT and Kafka.",[324,13895,13896],{},"For AMQP (e.g., RabbitMQ), you might have to consume from RabbitMQ and republish to Kafka (or vice versa) via a custom connector or application.",[324,13898,13899],{},"Each additional protocol usually means an additional layer or service to translate. This not only adds complexity, but can also limit functionality. For example, if you bridge MQTT to Kafka, features like MQTT’s persistent sessions or Kafka’s exactly-once might not translate perfectly through the bridge.",[48,13901,13902],{},"In short, Kafka’s single-protocol design means that if everything isn’t speaking Kafka, you need glue code or middleware. Many architectures with Kafka end up with a patchwork of brokers: Kafka + a message queue + an MQTT broker, etc., which is exactly what we want to avoid if possible.",[48,13904,13905],{},"Apache Pulsar’s approach: Pulsar was built with a concept of pluggable protocol handlers, enabling it to natively support multiple protocols on the same server. In practice, this means you can configure a Pulsar cluster to understand Kafka’s protocol, MQTT, AMQP, and more – all while storing and delivering messages using Pulsar’s backend. The Pulsar community has developed KoP (Kafka-on-Pulsar), MoP (MQTT-on-Pulsar), and AoP (AMQP-on-Pulsar) among other plugins. When these are enabled, a Pulsar broker effectively “speaks” the respective protocol:",[321,13907,13908,13917],{},[324,13909,13910,13911,13916],{},"KoP: ",[55,13912,13915],{"href":13913,"rel":13914},"https:\u002F\u002Fdzone.com\u002Farticles\u002Funderstanding-kafka-on-pulsar-kop#:~:text=Apache%20Pulsar%20and%20Apache%20Kafka,Pulsar%20%28KoP%29%20comes%20in",[264],"Kafka on Pulsar"," allows Kafka clients (producers\u002Fconsumers using the Kafka API) to connect to Pulsar as if it were a Kafka broker. The Pulsar broker listens on Kafka’s port (e.g., 9092) and understands Kafka protocol messages. This means an existing application coded to use the Kafka Java client can switch to Pulsar by just pointing it to the Pulsar cluster (with KoP enabled), no code change. The data it produces\u002Fconsumes is actually stored in Pulsar topics, not Kafka logs, but the application is none the wiser. This capability dramatically eases migrations – teams can move to Pulsar without rewriting their whole codebase at once. Moreover, once on Pulsar, those apps gain access to Pulsar’s features like multi-tenancy and infinite log retention on cheaper storage tiers, which Kafka lacks or requires add-ons for.",[324,13918,13919],{},"MoP: MQTT on Pulsar works similarly for IoT scenarios. Your swarm of MQTT devices can connect to Pulsar brokers (with MoP enabled) using standard MQTT protocols. They publish and subscribe as if to a regular MQTT broker; under the hood Pulsar stores those messages in its distributed log. This means you don’t need a dedicated MQTT broker for your sensors – Pulsar handles it. And all the nice Pulsar features (like durability, geo-replication, tiered storage of old data) become available to the MQTT streams as well. For example, MQTT is often used with ephemeral brokers that might lose data if a consumer isn’t online. 
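As a rough illustration of how these pluggable handlers are switched on, the broker configuration sketch below uses property names taken from the KoP and MoP plugins; they vary by release, so treat them as assumptions to verify against each handler's documentation:

```bash
# Append protocol-handler settings to the broker configuration
# (property names come from the KoP/MoP plugins and may differ by version)
cat >> conf/broker.conf <<'EOF'
messagingProtocols=kafka,mqtt
protocolHandlerDirectory=./protocols
kafkaListeners=PLAINTEXT://0.0.0.0:9092
mqttListeners=mqtt://0.0.0.0:1883
EOF
```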
Pulsar’s storage ensures even if an IoT device goes offline, data can be retained until it comes back or can be replayed later.",[48,13921,13922],{},"Beyond these, Pulsar’s design allows adding other protocols relatively easily. In fact, there’s also WebSocket support, and even experiments with other systems (there was an integration called RocketMQ-on-Pulsar, etc.). The key is that Pulsar’s brokers translate whatever protocol into the Pulsar internal message format and back. All messages, regardless of ingress method, end up in the same durable, scalable storage and can be routed to any consumer. This unified bus can drastically simplify an AI architecture.",[40,13924,13926],{"id":13925},"why-does-this-matter-for-ai-agents","Why Does This Matter for AI Agents?",[1666,13928,13929,13932,13935,13938],{},[324,13930,13931],{},"Easier integration of heterogeneous components: AI systems are evolving rapidly, and new tools or services come with their own interfaces. With Pulsar, you don’t have to constrain every component to one protocol. If your robotics team likes MQTT for device telemetry and your data science team likes Spark consuming from Kafka topics, that’s fine – both can work with the same Pulsar cluster. The MQTT devices publish to Pulsar (via MoP) and the Spark job (via KoP) can subscribe to that data, all in real time. No need to maintain a bridge or duplicate the data in two systems. This means you can plug in new agent components faster. The learning curve is lower too: developers can use the client libraries they are already familiar with (Kafka client, MQTT client, etc.) to interface with Pulsar. It lowers the barrier to adoption for various teams contributing to the agent ecosystem.",[324,13933,13934],{},"Reduced system sprawl and cost: Running one Pulsar cluster to handle multiple messaging needs is generally more efficient than running 2–3 separate systems (Kafka + RabbitMQ + MQTT broker). There’s less hardware overhead and fewer subsystems to monitor. For architects, this means fewer single-purpose data silos. Pulsar can act as a “single source of truth” event bus where all agent communications converge, even if they speak different protocols. Maintenance and scaling efforts focus on one system. It’s worth noting that Pulsar’s multi-protocol support doesn’t significantly degrade its performance; in many cases, the overhead of protocol translation is small compared to network and IO costs. So you can simplify your stack without sacrificing throughput.",[324,13936,13937],{},"Protocol-agnostic data flow: Because Pulsar decouples the storage of messages from the protocol, an event produced via one protocol can be consumed via another. For instance, an MQTT sensor publishes a message on topic “sensor\u002Ftemperature,” which is stored in Pulsar. A Kafka client could subscribe to the equivalent Pulsar topic (through KoP) and get those temperature events as if they were coming from Kafka. This inter-protocol bridging is automatic in Pulsar – the topic is the common denominator. In Kafka world, doing such bridging often requires writing a Kafka Connector or a custom adapter service that reads from one system and writes to another, introducing additional latency and points of failure. Pulsar’s unified approach enables more real-time and straightforward data sharing across heterogeneous agents.",[324,13939,13940],{},"Future-proofing and innovation: With Pulsar’s plugin model, you’re less likely to hit a dead end when new tech comes along. 
If tomorrow a new standard protocol gains popularity in the AI\u002Fagents space, there’s a path to support it on Pulsar by writing a new protocol handler. In contrast, with Kafka you might have to wait for the ecosystem to build a stable connector or gateway, or run that new system separately. Pulsar’s flexibility thus acts as a hedge against changing technology. It also means you can gradually transition systems: for example, run Pulsar with KoP to serve your existing Kafka-based apps, and over time migrate those apps to use Pulsar’s native API if desired (for even more features). During the migration, they continue to interoperate. This “have your cake and eat it” approach speeds up adoption — companies like Tencent, for instance, have used Pulsar to replace Kafka under the hood for certain use cases, precisely because they could do so without telling all upstream\u002Fdownstream apps to change at once.",[48,13942,13943],{},"Let’s illustrate with a scenario: suppose our smart city project initially used Kafka for aggregating events at the city level, and an MQTT broker for field devices. As it grows, the team finds maintaining two systems cumbersome. They decide to consolidate on Pulsar. They enable MoP, point all devices to the Pulsar endpoint (speaking MQTT) – devices don’t even notice the difference except perhaps improved reliability. They enable KoP, redirect existing Kafka clients (data sinks, analytics jobs) to Pulsar – those applications continue running as before. Now all data is flowing through one platform. Immediately, they notice benefits: data from devices is available to Kafka-based consumers with lower latency (no intermediate bridge needed). When a new AI agent service is developed in Python, the developers have options – they could use Pulsar’s native Python client. Either way, they tap into the same live data streams. The operational complexity drops, and the development agility increases (each team can work in the environment that suits them, while the system integrators ensure everything connects through Pulsar).",[48,13945,13946],{},"Meanwhile, Apache Kafka by itself would have pushed the team towards either writing a lot of integration code or standardizing on one protocol (often forcing everything into Kafka’s orbit). Some teams do end up standardizing on Kafka for all components (using Kafka clients everywhere). That can work for certain cases, but in contexts like IoT or edge AI, Kafka’s client library might be too heavy for small devices, or it may lack features like MQTT’s simple subscribe semantics or HTTP-based ingestion, etc. Pulsar avoids that “one size must fit all” trap by natively embracing multiple standards.",[48,13948,8417],{},[321,13950,13951,13954,13957,13960,13963],{},[324,13952,13953],{},"Pulsar’s multi-protocol support (KoP, MoP) allows one Pulsar cluster to natively handle Kafka and MQTT clients and more. This means AI agents and devices can communicate using their protocol of choice while sharing a common event bus.",[324,13955,13956],{},"Easier integration and migration: Kafka clients can migrate to Pulsar without code changes and immediately leverage Pulsar’s advanced features. MQTT devices can connect directly to Pulsar and benefit from its durable storage and scaling. 
This flexibility accelerates deployment of new agent components and integration of legacy systems.",[324,13958,13959],{},"Reduced complexity: Instead of running separate messaging systems for different parts of your AI platform (and maintaining bridges between them), Pulsar provides a unified infrastructure. Fewer moving parts lead to lower latency and easier operations. For example, integrating MQTT with Kafka otherwise requires connectors or proxies, adding operational burden – Pulsar eliminates that by doing it natively.",[324,13961,13962],{},"Protocol transparency: In Pulsar, an event doesn’t care how it was produced or consumed. A message from an MQTT device can be consumed by a Kafka client or vice versa through the Pulsar broker, enabling cross-ecosystem data flow with no extra code. Your AI agents can thus share information more freely, which is vital for building collaborative, real-time intelligent systems.",[324,13964,13965],{},"Future-proof and extensible: Pulsar’s design anticipates that the tech landscape is varied. As your agent architecture evolves, Pulsar can adapt – supporting new protocols or standards as needed. It gives architects confidence that adopting Pulsar means adopting a platform, not just a single-protocol tool.",[48,13967,13968],{},"In summary, Apache Pulsar serves as a “one bus for many voices.” It lets all the players in your AI system – be they tiny IoT sensors or big data crunching services – communicate through a common medium without forcing them to all speak the same dialect. This reduces friction and speeds up development, because you can choose the best protocol or tool for each job and rely on Pulsar to bridge the gaps. By contrast, Kafka’s more siloed approach often means additional layers or a push to consolidate on Kafka’s API, which isn’t always practical.",[48,13970,13971],{},"For developers and system architects, this protocol agility can be a revelation. It becomes significantly easier to incorporate diverse components into your real-time AI platform. Need to plug in a new third-party service that only knows how to write to Kafka? No problem – point it at Pulsar KoP and you’re done. Want to ingest data from an existing MQTT broker network? Pulsar can be that broker. The end result is an accelerated deployment cycle for AI agents: you spend less time building glue code or deploying connectors, and more time on the agents’ logic and insights.",[48,13973,13974],{},"This concludes our three-part exploration of why Apache Pulsar offers unique advantages for building reasoning and reactive AI agents. We’ve seen how Pulsar’s unified approach to streams and queues, its resilient delivery guarantees, and its protocol flexibility all contribute to a more powerful and adaptable infrastructure. For teams pushing the boundaries of AI applications – where real-time data and robust messaging are key – Pulsar provides a solid foundation that can evolve with your needs.",[48,13976,13977],{},[55,13978,13979],{"href":6392},"Try out Pulsar!",{"title":18,"searchDepth":19,"depth":19,"links":13981},[13982,13983],{"id":13864,"depth":19,"text":13865},{"id":13925,"depth":19,"text":13926},"2025-07-15","Discover how Apache Pulsar's flexible protocol architecture enables diverse AI agents and systems (Kafka, MQTT, AMQP, etc.) 
to communicate on a single event bus, reducing system sprawl and accelerating development compared to single-protocol alternatives.","\u002Fimgs\u002Fblogs\u002F68763bb779c00796695f20cd_one-bus.png",{},{"title":13814,"description":13985},"blog\u002Fone-bus-many-voices-why-protocol-flexibility-matters-for-ai-agents",[3988,821,10054],"stQft1mxkVi5UlcMcc4WMqXPOQOS7e_ObAff8946No4",{"id":13993,"title":13994,"authors":13995,"body":13996,"category":290,"createdAt":290,"date":14106,"description":14107,"extension":8,"featured":294,"image":14108,"isDraft":294,"link":290,"meta":14109,"navigation":7,"order":296,"path":14110,"readingTime":4475,"relatedResources":290,"seo":14111,"stem":14112,"tags":14113,"__hash__":14114},"blogs\u002Fblog\u002Fstreamnative-universal-linking-expanded-capabilities-now-in-public-preview.md","StreamNative Universal Linking: Expanded Capabilities Now in Public Preview",[311],{"type":15,"value":13997,"toc":14100},[13998,14005,14008,14012,14015,14020,14023,14028,14030,14034,14037,14040,14045,14049,14052,14057,14062,14067,14072,14075,14079,14087,14098],[48,13999,14000,14001,14004],{},"We’re excited to share new enhancements to ",[55,14002,14003],{"href":4863},"StreamNative Universal Linking",", now available in Public Preview. This powerful tool streamlines Kafka workload migrations and hybrid deployments by enabling seamless replication from any Kafka-compatible source—including Redpanda, Amazon MSK, and Apache Kafka—into StreamNative Cloud.",[48,14006,14007],{},"In this blog, we’ll highlight some of the latest updates designed to make Universal Linking even more flexible and developer-friendly.",[40,14009,14011],{"id":14010},"provider-specific-ui-for-kafka-sources","Provider-Specific UI for Kafka Sources",[48,14013,14014],{},"StreamNative Universal Linking features a native, user-friendly UI that streamlines the setup process for connecting to various Kafka providers. Whether users are working with Confluent, Redpanda, Amazon MSK, or open-source Kafka, the interface dynamically adjusts to capture provider-specific configuration fields. This tailored experience simplifies the authentication setup by guiding users through the exact parameters required for each provider, reducing complexity and ensuring secure, accurate connections with minimal effort.",[48,14016,14017],{},[384,14018],{"alt":18,"src":14019},"\u002Fimgs\u002Fblogs\u002F687088149933dd444fb2cb2d_AD_4nXcqZnUBVAbu-BDVhzsgIQ0RGV9ZIpYeEqLhYut7IkK2UhUKDP7OwXJO_vmJ8WUAcYejWI0gqx3SDPLs1_PMcpqi2X191Rj49jsMpztq-stsVDivmqkE7n9jdaV524T-Qa8xDowv.png",[48,14021,14022],{},"StreamNative Universal Linking offers a native UI that simplifies connecting to different Schema Registry providers by dynamically displaying provider-specific authentication fields, making configuration fast and error-free.",[48,14024,14025],{},[384,14026],{"alt":18,"src":14027},"\u002Fimgs\u002Fblogs\u002F687088149933dd444fb2cb33_AD_4nXedLFuUBbDd2Dl3gwpOvmwLy0VSq8sOUbP9wD8SvH9Vp__BD5f7QILUIGB0HgsKm8DGzBJds8w7sUuSML1QD7Khr9c26u27YUr0khuBVFjVJr3f2F4dLPBuKfsA84Pay4_3uLAn.png",[48,14029,3931],{},[40,14031,14033],{"id":14032},"expanded-authentication-options","Expanded Authentication Options",[48,14035,14036],{},"Universal Linking now supports a broader set of authentication mechanisms to connect to source Kafka clusters. 
Whether you're running on Redpanda, Amazon MSK, or another Kafka-compatible platform, you can now use various authentication methods — SCRAM-SHA, or TLS-based certs—to securely link your systems with ease.",[48,14038,14039],{},"This update enables secure, enterprise-grade connectivity for a growing set of Kafka sources, helping teams avoid the complexity of manual migration or duplicate tooling.",[48,14041,14042],{},[384,14043],{"alt":18,"src":14044},"\u002Fimgs\u002Fblogs\u002F687088149933dd444fb2cb26_AD_4nXeiKn7wz2CDQBSPD9qK7x_3eVpdfs90_D1DJp9HI2Nv9pfXoXPGkD5KVwC_2OGf63Pu-Y90LyPfowus5fge5xG_1ucKlsSw8wjg8nxSQ_H7SiMp2xLAAWdP4RA_XDWDoyaOGDPaVw.png",[40,14046,14048],{"id":14047},"enhanced-console-ui-to-view-replicated-kafka-data-and-schemas","Enhanced Console UI to view replicated Kafka data and schemas",[48,14050,14051],{},"A major UI upgrade in the StreamNative Cloud Console offers increased transparency and control over your replication workflows:",[321,14053,14054],{},[324,14055,14056],{},"Replicated Consumer Groups: Easily view all consumer groups that are part of the source Kafka cluster and track their metadata in real time.",[48,14058,14059],{},[384,14060],{"alt":18,"src":14061},"\u002Fimgs\u002Fblogs\u002F687088149933dd444fb2cb29_AD_4nXdsvE83UxzpJJ7oRrCs6yKW40wJJJ7v2mGZHFmw51Y44rxCakqDZ9hMWMjFiW630nlTK7wAcruwQbnkWauXZPFJsm-X7qWDv0mBkclixliNo0YXs-OtOiZkKsh-k77mBfDIRYBgBQ.png",[321,14063,14064],{},[324,14065,14066],{},"Schema Visibility: Access details of schemas registered and synced during replication, allowing you to verify that your topic data and schema evolution are accurately preserved in StreamNative Cloud.",[48,14068,14069],{},[384,14070],{"alt":18,"src":14071},"\u002Fimgs\u002Fblogs\u002F687088149933dd444fb2cb30_AD_4nXcAm3fNv_5XV7KlDeExoJObhLvHfGLh4zlbv5rDz_5o79nGi_GI9FOWvfV0K5UQX-nO37sRdG7GdkMyrzRmuNkykVUXa7UhsZPRnxDXinIz5XF6kOQDYk_at9rlMeDAravDv-3b9g.png",[48,14073,14074],{},"These updates give developers and operators better observability across both data and metadata, ensuring a smoother and more confident migration journey.",[40,14076,14078],{"id":14077},"try-it-today","Try It Today",[48,14080,14081,14082,190],{},"‍If you're considering migrating from Kafka or Redpanda, or integrating with StreamNative Cloud for hybrid or multi-cloud data architectures, now’s the perfect time to explore Universal Linking. ",[55,14083,14086],{"href":14084,"rel":14085},"https:\u002F\u002Fyoutu.be\u002FK04u9USGW8c",[264],"Watch a quick video to view all the new features",[48,14088,14089,14090,1154,14094,14097],{},"👉 ",[55,14091,14093],{"href":3907,"rel":14092},[264],"Start your free trial",[55,14095,14096],{"href":6392},"reach out to our team"," to get early access and guidance on your migration journey.",[48,14099,3931],{},{"title":18,"searchDepth":19,"depth":19,"links":14101},[14102,14103,14104,14105],{"id":14010,"depth":19,"text":14011},{"id":14032,"depth":19,"text":14033},{"id":14047,"depth":19,"text":14048},{"id":14077,"depth":19,"text":14078},"2025-07-11","Discover the latest enhancements to StreamNative Universal Linking, now in Public Preview. 
Streamline Kafka workload migrations from Redpanda, Amazon MSK, and more—with expanded authentication options, provider-specific UI, and improved observability tools.","\u002Fimgs\u002Fblogs\u002F68708696037e6ad896a05171_image-112.png",{},"\u002Fblog\u002Fstreamnative-universal-linking-expanded-capabilities-now-in-public-preview",{"title":13994,"description":14107},"blog\u002Fstreamnative-universal-linking-expanded-capabilities-now-in-public-preview",[799,4152,1332],"LCg0vrNY1bZnT_ABa-ugOw980FwbsJJpmKO_e_cn_jg",{"id":14116,"title":14117,"authors":14118,"body":14119,"category":290,"createdAt":290,"date":14212,"description":14213,"extension":8,"featured":294,"image":14214,"isDraft":294,"link":290,"meta":14215,"navigation":7,"order":296,"path":13641,"readingTime":11508,"relatedResources":290,"seo":14216,"stem":14217,"tags":14218,"__hash__":14219},"blogs\u002Fblog\u002Finside-stream-format-a-table-for-infinite-logs.md","Inside Stream Format: A Table for Infinite Logs",[806,28],{"type":15,"value":14120,"toc":14209},[14121,14123,14125,14147,14149,14151,14154,14165,14168,14182,14185,14189,14192,14206],[48,14122,13619],{},[48,14124,13622],{},[321,14126,14127,14131,14135,14139,14143],{},[324,14128,14129],{},[55,14130,13630],{"href":13629},[324,14132,14133],{},[55,14134,13636],{"href":13635},[324,14136,14137],{},[55,14138,13642],{"href":13641},[324,14140,14141],{},[55,14142,13648],{"href":13647},[324,14144,14145],{},[55,14146,13654],{"href":13653},[208,14148],{},[48,14150,3931],{},[48,14152,14153],{},"Streaming data has historically been treated differently from batch data. Streams are often seen as infinite logs – unbounded sequences of events – whereas batch processing uses static tables. Modern stream processing frameworks like Apache Spark or Apache Flink have blurred this line by treating streaming data as an “infinite table” that is processed incrementally. This insight is powerful: if we can model an ever-growing log as a table, we unlock the rich ecosystem of tools and guarantees from the data lake world (SQL queries, schema evolution, ACID transactions, etc.) for real-time data. In essence, a stream can be viewed as a continuously appending table, where each new event is like a new row added forever.",[48,14155,14156,14158,14159,14164],{},[55,14157,1332],{"href":6647},", StreamNative’s data streaming engine, embodies this concept by storing streams in open table formats. In Ursa’s architecture, each topic (stream) is ",[55,14160,14163],{"href":14161,"rel":14162},"https:\u002F\u002Faws.amazon.com\u002Fblogs\u002Fstorage\u002Fseamless-streaming-to-amazon-s3-tables-with-streamnative-ursa-engine\u002F#:~:text=StreamNative%20integrates%20with%20S3%20Tables,integration%20involves%20three%20key%20steps",[264],"materialized as a table on object storage",". When data flows into Ursa, it is immediately written in a table-friendly way: using a combination of a row-oriented Write-Ahead Log (WAL) for fast appends, and columnar files (Parquet) for efficient long-term storage. In practice, this means incoming events are first captured in small WAL files (ensuring low-latency writes and durability), and then compacted into larger Parquet files for analytics-friendly storage. Ursa’s stream format effectively turns a live log into a table with partitions and snapshots, making it queryable by engines like Spark or Trino without any ETL step.",[48,14166,14167],{},"What makes this approach powerful is the use of open table standards (such as Apache Iceberg, Delta Lake, or Hudi) as the underlying format. 
Ursa doesn’t invent a proprietary storage format; it leverages proven table formats that support schema evolution, indexing, and transactional updates. For example, with Delta Lake as the format, the infinite log of events gains features like ACID transactions and time-travel queries (since Delta maintains a transaction log of all changes). In Ursa’s case, the streaming engine writes data in Lakehouse table format, meaning every event appended to the stream is also an insert into an Iceberg\u002FDelta table. This bridge between streaming and table paradigms yields huge benefits:",[321,14169,14170,14173,14176,14179],{},[324,14171,14172],{},"Immediate Queryability: As soon as data lands in the stream, it’s part of a table that can be queried with SQL or read by any tool that understands Parquet. There’s no need to wait for a batch ETL job to dump the stream into a database – the stream is the database.",[324,14174,14175],{},"Unified Storage: Instead of keeping “hot” data in a messaging system and “cold” data in a separate warehouse, Ursa’s format uses a single storage layer (cloud object storage) for both real-time and historical data. A Pulsar or Kafka topic managed by Ursa will offload older segments to cheap storage in open format, effectively retaining infinite history at low cost.",[324,14177,14178],{},"Schema and Governance: By treating streams as tables, you can enforce schemas on event data and manage them with the same governance tools used for batch data. Schema registries and table catalogs ensure that as your infinite log evolves, consumers always know the data schema and can handle changes safely.",[324,14180,14181],{},"Interoperability: Perhaps most importantly, an open table format for logs means you are not locked into one vendor’s tools. Multiple frameworks (Flink, Spark, Pandas, Presto, etc.) can all read from the same streaming table. This is analogous to how many query engines share access to a Parquet\u002FDelta Lake files on a data lake. In streaming, Ursa’s format makes the “log as a table” accessible to any engine or language, fostering a rich ecosystem instead of a siloed stream.",[48,14183,14184],{},"Ursa’s implementation serves as a reference architecture of this general idea. It demonstrates that you can achieve real-time streaming performance while simultaneously structuring the data as a table in cloud storage. Other systems are moving in a similar direction. For instance, Redpanda has introduced an option to automatically write Kafka topic data into Iceberg table format on S3, and Cloud providers are enabling streaming inserts into table formats. The big picture is a shift toward treating streaming data as first-class table data, eliminating the divide between “streams” and “tables.” By using a table for infinite logs, organizations get the best of both worlds: the continuous, low-latency updates of messaging systems and the strong consistency, queryability, and openness of data lake tables.",[40,14186,14188],{"id":14187},"ursa-stream-format-in-action","Ursa Stream Format in Action",[48,14190,14191],{},"To cement the concept, let’s walk through how Ursa’s stream-table format works when data arrives:",[1666,14193,14194,14197,14200,14203],{},[324,14195,14196],{},"Incoming events → WAL: When a producer publishes messages to a topic, Ursa immediately writes these events to a write-ahead log file on object storage. This WAL is a lightweight append-only file that accumulates recent events quickly (much like Kafka’s segment logs, but stored in the cloud). 
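Before going further, a quick illustration of the "the stream is the database" idea mentioned in the queryability point above. This is a minimal Spark sketch that queries a stream's materialized table directly; it assumes a Spark session already wired to the table catalog (Iceberg or Delta) backing the object store, and the catalog, namespace, table, and column names are all placeholders.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class QueryStreamTableSketch {
    public static void main(String[] args) {
        // Assumes the session is configured with a catalog (here called "lake")
        // that points at the object-storage location holding the stream's table.
        SparkSession spark = SparkSession.builder()
                .appName("query-stream-as-table")
                .getOrCreate();

        // Hypothetical table: the "sensor_temperature" topic materialized in table format.
        Dataset<Row> recent = spark.sql(
                "SELECT device_id, avg(temperature) AS avg_temp "
                + "FROM lake.iot.sensor_temperature "
                + "WHERE event_time > current_timestamp() - INTERVAL 1 HOUR "
                + "GROUP BY device_id");

        recent.show();
    }
}
```

No export job or connector sits between the producers and this query; the same files the streaming engine wrote are the files Spark reads.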
It ensures durability and low latency. Once the WAL reaches a certain size or time threshold, Ursa will rotate it.",[324,14198,14199],{},"WAL → Columnar Files: Ursa’s engine continuously takes those WAL segments and compacts them into columnar Parquet files. During this compaction, it can also partition the data (e.g., by event time or key) and sort it, which optimizes later queries. The Parquet files constitute the permanent storage of the stream’s data, organized in the directory structure of an Iceberg\u002FDelta table (with partition folders, metadata files, etc.). Each compaction may also create a new snapshot in the table’s metadata, much like a batch job commit.",[324,14201,14202],{},"Metadata Management: Alongside data files, Ursa updates the table format’s metadata (for example, Iceberg manifest lists or Delta transaction log) to record the new Parquet files and delete the WAL segments that have been compacted. This metadata update is atomic and transactional, thanks to the table format. It’s as if every so often the “infinite table” of the stream gets a new committed batch of rows. These frequent small transactions keep the table up-to-date with the stream.",[324,14204,14205],{},"Retention & Evolution: Because the data is in a table format, enforcing retention policies (e.g. drop or archive data older than 1 year) becomes a matter of table maintenance (expiring or deleting old partitions) rather than broker-specific cleanup. Likewise, if the schema of the stream changes (new fields, etc.), the table schema can evolve using the format’s schema evolution features. Ursa’s approach handles this seamlessly, syncing Pulsar topic schemas with the table schema so that both stream consumers and batch readers see a consistent view.",[48,14207,14208],{},"In summary, inside Ursa’s stream format, a log truly behaves like an ever-growing table. This design meets the needs of high-throughput streaming (via append-optimized logs) and the needs of analytics (columnar storage, indexing, schema management) at once. The concept is vendor-neutral: any streaming system could, in theory, adopt a similar architecture of writing to an open table format. The advantage of Ursa is providing this out-of-the-box, turning Apache Pulsar (or Kafka, via compatibility mode) into a “lakehouse-native” streaming system. The takeaway lesson is that as data platforms evolve, the line between streaming and batch storage is disappearing. By viewing streams as infinite tables, we gain a unified data foundation that simplifies architectures and accelerates data access for all use cases.",{"title":18,"searchDepth":19,"depth":19,"links":14210},[14211],{"id":14187,"depth":19,"text":14188},"2025-07-10","Discover how StreamNative's Ursa engine unifies streaming and batch data by treating infinite logs as continuously appending tables. 
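The "Metadata Management" step in the walkthrough above can be pictured with a small sketch against the open-source Apache Iceberg API. This is not Ursa's internal code; it is an illustrative approximation with placeholder warehouse paths, table names, and file statistics, and it assumes an unpartitioned table spec.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.iceberg.DataFile;
import org.apache.iceberg.DataFiles;
import org.apache.iceberg.FileFormat;
import org.apache.iceberg.Table;
import org.apache.iceberg.catalog.TableIdentifier;
import org.apache.iceberg.hadoop.HadoopCatalog;

public class CommitCompactedFilesSketch {
    public static void main(String[] args) {
        // Placeholder warehouse and table name standing in for the stream's table.
        HadoopCatalog catalog = new HadoopCatalog(new Configuration(), "s3a://my-bucket/warehouse");
        Table table = catalog.loadTable(TableIdentifier.of("iot", "sensor_temperature"));

        // A Parquet file produced by compacting WAL segments (path and stats are placeholders).
        DataFile compacted = DataFiles.builder(table.spec())
                .withPath("s3a://my-bucket/warehouse/iot/sensor_temperature/data/part-00001.parquet")
                .withFormat(FileFormat.PARQUET)
                .withFileSizeInBytes(128L * 1024 * 1024)
                .withRecordCount(1_000_000)
                .build();

        // One atomic metadata commit: the new file becomes visible as a table snapshot,
        // and stream consumers and batch readers both see the same committed state.
        table.newAppend().appendFile(compacted).commit();
    }
}
```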
Learn how open table formats enable immediate SQL queryability, unified storage, and seamless interoperability for your data lakehouse.","\u002Fimgs\u002Fblogs\u002F686f4b318f8b4fbe732e7dc9_inside-stream-format.png",{},{"title":14117,"description":14213},"blog\u002Finside-stream-format-a-table-for-infinite-logs",[800,1332,12106],"1U0c6XB7uYwb93OqRhTcT7k03aZhfhzzFS8xYUYgLNQ",{"id":14221,"title":14222,"authors":14223,"body":14224,"category":1332,"createdAt":290,"date":14383,"description":14384,"extension":8,"featured":294,"image":14385,"isDraft":294,"link":290,"meta":14386,"navigation":7,"order":296,"path":13635,"readingTime":3556,"relatedResources":290,"seo":14387,"stem":14388,"tags":14389,"__hash__":14390},"blogs\u002Fblog\u002Fanatomy-of-a-stream-data-vs-metadata-vs-protocol.md","Anatomy of a Stream: Data vs Metadata vs Protocol",[806,28],{"type":15,"value":14225,"toc":14376},[14226,14228,14230,14252,14254,14256,14263,14266,14270,14273,14276,14279,14282,14285,14289,14292,14295,14298,14301,14304,14307,14311,14314,14317,14320,14323,14326,14329,14332,14336,14339,14350,14353,14356,14360,14363,14370,14373],[48,14227,13619],{},[48,14229,13622],{},[321,14231,14232,14236,14240,14244,14248],{},[324,14233,14234],{},[55,14235,13630],{"href":13629},[324,14237,14238],{},[55,14239,13636],{"href":13635},[324,14241,14242],{},[55,14243,13642],{"href":13641},[324,14245,14246],{},[55,14247,13648],{"href":13647},[324,14249,14250],{},[55,14251,13654],{"href":13653},[208,14253],{},[48,14255,3931],{},[48,14257,14258,14259,14262],{},"In our previous post - ",[55,14260,14261],{"href":13629},"Why Streams Need Their Iceberg Moment",", we introduced the vision of a three-layer architecture for streaming systems inspired by the Apache Iceberg-fueled lakehouse revolution. Now, let’s dissect each of those layers in detail. How exactly do we separate a streaming system into data, metadata, and protocol, and what does each part do? In this deep dive, we’ll explore the anatomy of a stream through the lens of these three layers. We’ll also draw parallels to the lakehouse model (the separation of table storage, table metadata, and query engine) to solidify the concepts.",[48,14264,14265],{},"Modern streaming platforms can be reimagined as three cooperative components:",[40,14267,14269],{"id":14268},"_1-the-data-layer-stream-storage-reimagined","1. The Data Layer: Stream Storage Reimagined",[48,14271,14272],{},"What it is: The data layer is responsible for storing the actual events\u002Fmessages that flow through streams. In a traditional broker, this corresponds to the log files on disk (e.g. Kafka’s segment files or Pulsar’s BookKeeper ledgers). In the new layered model, the data layer is broken out as an independent, stream storage service or medium.",[48,14274,14275],{},"Today’s situation: In Kafka’s classic design, each broker stores segments of the log on its local filesystem, and brokers replicate these segments to each other for fault tolerance. Apache Pulsar took a different approach by using Apache BookKeeper bookies as a distributed storage cluster for logs, separate from the broker that handles clients. Pulsar’s architecture was an early step toward decoupling: the broker became stateless for data, and storage responsibilities were offloaded to bookies. However, even BookKeeper clusters have their own disks and replication to manage, and the data is in a proprietary format not directly queryable by external analytics engines. 
Most streaming stacks still treat stored data as ephemeral: something to keep around for a retention period and then drop or offload elsewhere for analysis.",[48,14277,14278],{},"The new approach: In a truly disaggregated data layer, stream data lives in a scalable, low-cost store that can retain data indefinitely and make it accessible beyond just the streaming consumers. The prime candidate for this is cloud object storage (like Amazon S3, Google Cloud Storage, etc.) or a distributed file system, combined with an open file format. Instead of brokers writing to local disk, imagine that when a producer sends an event, it gets persisted directly into an “open table” format file on S3 (or a similar store). For example, a stream’s data could be stored as a set of Parquet files in a directory, or as append-only files following a format that external readers understand. This is analogous to how Iceberg stores table data as files in object storage. The data layer would handle chunking and writing events in an efficient way (perhaps buffering in memory and writing larger blocks, similar to how columnar file formats work). It would also handle replication or durability – with cloud storage, replication is often built-in (multi-AZ redundancy), so we avoid the cost of triple-replicating at the application level.",[48,14280,14281],{},"A key benefit of this approach is “streaming data = lakehouse data”. The moment an event is written, it’s not only available to stream consumers, but it’s also sitting in an analytics-friendly store. Your streaming history is your data lake. This eliminates the need for separate ETL processes to copy data from a message queue into data lake storage. Anyone with access to the storage (and proper permissions) could run SQL queries or AI model training on the live data, using tools like Spark or Flink, because it’s already in an accessible format. In practice, there will be indexes or additional metadata to help readers efficiently find their position in the stream (e.g. an index mapping offsets to file ranges), but those are part of the metadata layer (covered next).",[48,14283,14284],{},"Challenges and considerations: Decoupling storage like this raises questions about latency and throughput. Directly writing to object storage can introduce higher latency than writing to a local disk due to network hops. However, there are ways to mitigate this: use caching for hot data, memory buffers, and perhaps a tiered approach (write immediately to a WAL in memory or fast storage, then flush asynchronously to S3). Many systems (including StreamNative Ursa) use such tricks to get the best of both worlds – the durability of object storage with the latency of fast storage. Another consideration is format: should streams be stored as pure log (sequence of events) or also translated into columnar formats for efficiency? Projects like Apache Iceberg are exploring streaming ingest, where streams of inserts can continuously produce new table files, so the line between “stream” and “table” blurs. Regardless of implementation details, the data layer’s overarching role is clear: keep all the data safe and accessible, decoupled from the serving compute. This layer can be scaled by simply adding storage capacity (or letting the cloud storage scale automatically), independent of how many clients or how much compute power is needed to serve the data.",[40,14286,14288],{"id":14287},"_2-the-metadata-layer-the-stream-catalog-and-brain","2. 
The Metadata Layer: The Stream Catalog and Brain",[48,14290,14291],{},"What it is: The metadata layer manages all the information about the streams – their definitions, the location of data, the consumer positions, and any coordination needed. Think of it as the catalog or meta-store for streaming data. In databases, this would be your system tables or Hive Metastore; in Iceberg’s world, it’s the catalog services that track table schema and snapshots.",[48,14293,14294],{},"Today’s situation: In current streaming systems, metadata is often entangled with the brokers or tied to external systems like ZooKeeper. For example, Kafka (pre-KIP-500) stored partition assignments, leader election, and consumer group offsets in ZooKeeper (and later in internal topics on brokers). The metadata about what topics exist, how many partitions, who is the leader, etc., was spread across ZK and the brokers’ memory\u002Fstate. Pulsar, on the other hand, uses a pluggable metadata store (like ZooKeeper, Etcd, or Oxia) to store metadata about topics, cursors (subscriptions), and so forth. This means scaling or changing how metadata is handled can be as complex as the data scaling problem itself. A lot of the “complexity” of running a Kafka cluster, for instance, has historically been about managing this metadata: ensuring ZooKeeper is healthy, performing controlled leader elections, and recently, migrating to the new Kafka Raft metadata quorum (as ZooKeeper is phased out). The bottom line is that metadata hasn’t been a first-class, independent component – it’s been tightly linked to the runtime of the streaming cluster.",[48,14296,14297],{},"The new approach: In a three-layer design, the metadata layer is a dedicated, standalone service or set of services that act as the source of truth for stream state. We can envision it as a Stream Catalog analogous to an Iceberg catalog. It would hold information like: a list of all streams (topics) and their configurations, schemas for each stream (if using schema registry integration), the mapping of stream partitions to data files or objects in the data layer, and consumer group offsets or positions (i.e., where each consumer has read up to). Essentially, any piece of information needed to coordinate producers and consumers lives here, rather than being implicitly known by a broker process.",[48,14299,14300],{},"Designing this layer brings questions: Do we implement it as a highly-available database? As a set of metadata files on the same object storage (like how Iceberg maintains a metadata JSON and manifest lists)? The answer could be either or a mix. One promising pattern is to leverage the same tech as the lakehouse: for instance, one could treat each stream as akin to an Iceberg table internally – with snapshots pointing to new data files, etc. In fact, if the data layer writes Parquet files for a stream, an Iceberg table’s metadata could naturally catalog those files. However, streaming has extra needs (like real-time consumer offsets and perhaps event time indexes) that might extend beyond a static table definition. It might be that the metadata layer uses a lightweight distributed consensus system (e.g., etcd or a Raft-based service) to manage monotonically increasing sequences for offsets and to manage subscribers. The key is that brokers (protocol servers) consult this metadata service rather than owning that knowledge exclusively.",[48,14302,14303],{},"Benefits: By isolating metadata, we gain clarity and consistency. 
Multiple protocol servers can all refer to one canonical source of truth, ensuring they behave consistently. It also improves governance and interoperability: a well-defined catalog of streams could be exposed via standard APIs. Imagine being able to query the streaming metadata layer to discover what streams exist and their schema, just like you’d query an Iceberg Catalog for available tables. This makes it easier to integrate streaming data with other systems – for example, an ETL job could use the catalog to find the latest snapshot of a stream, or an auditor could verify all streams comply with certain retention policies. Another boon is independent scaling and tuning: the metadata store can be optimized for high-write, low-latency operations (like committing a new event sequence or updating a consumer offset) and scaled out with consensus nodes, without touching the data path or client handling logic. If the volume of streams or consumer groups grows, we beef up the metadata service accordingly, without needing to muck with the storage layer.",[48,14305,14306],{},"Iceberg again provides a guiding analogy: Iceberg’s metadata layer (its catalogs and metadata files) enabled features like time travel, schema evolution, and concurrent writes in the batch world. A robust streaming metadata layer could similarly enable new stream features – think seal\u002Funseal of streams, consistent replay from points in time (since the metadata could mark snapshots of a stream at intervals), or even transactions across streams. It becomes the brain that coordinates complex behaviors that would be very hard to bolt onto a monolithic broker. As a bonus, if the metadata is stored in a standardized way (say, using Iceberg’s format for log segments), external systems might even read it to understand stream contents or hook streaming data into data lineage tools.",[40,14308,14310],{"id":14309},"_3-the-protocol-layer-pluggable-brokers-and-interfaces","3. The Protocol Layer: Pluggable Brokers and Interfaces",[48,14312,14313],{},"What it is: The protocol layer is what the outside world interacts with – it’s the API and the delivery mechanism for streaming data. In simple terms, these are the servers that clients connect to using some protocol (e.g., the Kafka binary protocol, REST, MQTT, etc.). Their job is to accept data from producers, serve data to consumers, and enforce the semantics of the streaming system (ordering guarantees, acknowledgments, subscription management). In a traditional architecture this is tightly bound to the storage – the broker both speaks the protocol and writes to disk. Here, we split it out: the protocol layer’s components speak the language of streaming but delegate actual data persistence and state to the other layers.",[48,14315,14316],{},"Today’s situation: Kafka’s protocol is quite complex but well-understood; clients talk to specific broker hosts which handle both the network IO and disk IO for partitions they lead. Scaling the throughput means scaling brokers since they do all the work. If you want a different interface (say an HTTP API for Kafka topics), you typically need a separate bridge that still ultimately talks to the Kafka brokers. Pulsar took a step here by supporting multiple protocols (via protocol handlers) – Pulsar brokers can natively understand not just the Pulsar protocol but also MQTT or Kafka (with a plugin), translating those calls to the Pulsar core. That hints at what could be possible if the protocol handling was more modular. 
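None of this prescribes a concrete API, but a hypothetical "stream catalog" interface helps make the metadata layer's responsibilities tangible. Everything below, including the names, method signatures, and types, is illustrative only; it sketches the kinds of lookups and commits that protocol servers would delegate to such a service instead of owning that state themselves.

```java
import java.util.List;

// Illustrative only: a hypothetical contract for the metadata layer described above.
public interface StreamCatalog {

    /** List all streams (topics) registered in the catalog. */
    List<String> listStreams();

    /** Return the current schema definition for a stream, e.g. an Avro or JSON schema string. */
    String getSchema(String stream);

    /** Resolve a range of offsets to the data-layer objects (files) that contain them. */
    List<String> locateData(String stream, int partition, long fromOffset, long toOffset);

    /** Atomically record that new data files extend the stream, creating a new snapshot. */
    void commitAppend(String stream, int partition, List<String> newFiles, long newEndOffset);

    /** Read a consumer group's committed position. */
    long getCommittedOffset(String stream, int partition, String consumerGroup);

    /** Advance a consumer group's committed position. */
    void commitOffset(String stream, int partition, String consumerGroup, long offset);
}
```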
But even in Pulsar, the broker is still where data lives (in memory until written to bookies) and where subscription cursors update, etc., so it’s not fully isolated.",[48,14318,14319],{},"The new approach: In the three-layer model, the protocol servers are stateless or near-stateless. They can be thought of as edge servers or API gateways to the streaming system. Their responsibilities would include: handling client connections, implementing the nuances of a protocol (e.g. Kafka’s fetch and produce requests, or Pulsar’s subscribe and flow control commands), orchestrating data flow between clients and the data layer, and making calls to the metadata layer for coordination (like finding out where the latest data for a topic is, or updating a consumer’s position). Crucially, these protocol nodes do not store the full stream data on local disk (except perhaps a cache); they don’t have exclusive ownership of a partition’s data. Instead, they might temporarily cache recent messages for speed, but the authoritative storage is the data layer. In case of a failure, any other protocol node can take over serving a given client, because all the state it needs (what data has been written, where to find it, what the consumer offset is) is in the data and metadata layers.",[48,14321,14322],{},"This design means you could have multiple different protocol services running in parallel. For example, you might run a set of “Kafka API servers” that let Kafka clients produce\u002Fconsume to\u002Ffrom the streams, and alongside them a set of “Pulsar API servers” for applications using Pulsar’s features – both accessing the same underlying streams. Because these servers are stateless, you can scale out each type as needed – if you have 1000 Kafka clients and only 10 Pulsar clients, you deploy more of the Kafka protocol instances. The streaming system thus speaks many languages without duplicating the data.",[48,14324,14325],{},"Benefits: The protocol layer being separate yields immediate flexibility. Adopting new protocols or client standards becomes much easier – you don’t need to overhaul how data is stored, you just stand up a new front-end. It’s similar to how in databases, you might add new query endpoints (like a REST API to an SQL database) without changing the storage engine. Another benefit is resilience and elasticity: since these nodes keep no critical data, you can auto-scale them based on traffic patterns (spin up more during peak ingest times, scale down in off hours), all without migrating any stored data. If one crashes or needs maintenance, you remove it from the load balancer and traffic seamlessly flows to others. No more worrying that a broker failure means data might be temporarily unavailable – as long as some protocol node is up, it can retrieve data from storage and serve it.",[48,14327,14328],{},"Ordering and consistency: One might wonder, how do we preserve ordering guarantees or consistency if any stateless server can serve data? The answer lies in smart coordination via the metadata layer. The system might still elect a “leader” for a partition but that leader’s role is just to coordinate writes (to ensure ordering) – it could be one of the protocol nodes assigned dynamically, or the data layer itself could enforce ordering (e.g., an object store might allow appends in order via a lock). There are multiple ways to implement it. 
The key is, even if a particular node is leader for a partition’s writes, that leadership can be quickly handed off if needed (since no long-lived data lives on the leader). This is, in fact, how Pulsar’s design is imagined: brokers handle ordering and act as a “cache & coordinator” while the data lives on an external log storage like Apache BookKeeper. So we still ensure that each partition’s events are delivered in order – the clients don’t directly all write to storage concurrently; they go through a protocol node which sequences them. But unlike the old model, that node doesn’t own the data forever – once written, the data is on durable storage and any node can read it.",[48,14330,14331],{},"To draw an analogy, consider a content delivery network (CDN) for websites: the CDN edge servers don’t store the master copy of the website content; they cache and serve it, while the origin server (storage) holds the source of truth. In our streaming case, the protocol layer are like those edge servers, and the data layer is the origin. It’s a pattern proven to work for scaling web content to millions of users – now we are applying it to event streams.",[40,14333,14335],{"id":14334},"from-lakehouse-to-lakestream-comparing-the-layers","From Lakehouse to Lakestream: Comparing the Layers",[48,14337,14338],{},"It’s worth explicitly comparing this three-layer streaming model to the lakehouse (Iceberg) model to cement the understanding:",[321,14340,14341,14344,14347],{},[324,14342,14343],{},"Data Layer: In a lakehouse, this is the object store with parquet\u002Forc files containing table data. In streaming, it’s a durable log store (ideally also an object store or distributed FS) holding event data. Both serve as the single source of truth for raw data. The difference is streams are continually appending, whereas tables see batches of appends\u002Fupdates – but conceptually, it’s analogous.",[324,14345,14346],{},"Metadata Layer: In the lakehouse, this is Iceberg’s table metadata (manifest files, snapshots, and the catalog service like Hive Metastore or Glue) which tracks where data files are and what the schema is. In streaming, the metadata layer tracks active topics\u002Fpartitions, where the latest offset is, who the leader is (if using leaders), and consumer read positions. Both provide transactional metadata that can be updated atomically (commit a new snapshot or commit an event sequence).",[324,14348,14349],{},"Protocol\u002FCompute Layer: In the lakehouse world, this is the query engine or processing engine – Spark, Trino, Flink, etc. They read data via the metadata layer and compute results. In streaming, the protocol servers are like a continuously running “query” that pulls data for consumers or ingests from producers. They are the compute layer that interfaces with clients. One could even view a streaming consumer as analogous to a continuous query over the data layer. The protocol layer ensures that the continuous queries (subscriptions) get the right data in the right order, just as a batch query engine ensures a SQL query reads the right snapshot of a table.",[48,14351,14352],{},"The separation of concerns is remarkably parallel. By adopting this layered approach, streaming systems become “lakestreams” – first-class real-time data streaming stores that maintain the openness and reliability of a lakehouse. 
We can use the term lakestream to denote a streaming system built with the same architectural ideals as a data lakehouse.",[48,14354,14355],{},"\"Lakestream\" is preferred over \"streamhouse\" or \"streaming lakehouse\" to denote an architecture that stores a single copy of data for both streams and tables, unlike many \"streaming lakehouse\" concepts, which maintain two separate copies.",[40,14357,14359],{"id":14358},"conclusion-embracing-the-modular-future-of-streaming","Conclusion: Embracing the Modular Future of Streaming",[48,14361,14362],{},"Decomposing streams into Data, Metadata, and Protocol layers is more than an academic exercise – it’s a blueprint for the next generation of streaming infrastructure. This approach addresses the core pain points we outlined in part 1: high costs drop when you utilize object storage and stateless scaling; slow evolution flips to rapid innovation when each layer can change independently; operational burdens lighten when state is centralized and immutable in storage, rather than spread across dozens of servers.",[48,14364,14365,14366,14369],{},"We’re already seeing early implementations of this vision. Apache Pulsar’s design validated the benefits of splitting storage from serving, and it’s evolving further to remove even more coupling (e.g., eliminating ZooKeeper, integrating with tiered storage). Newer platforms like ",[55,14367,14368],{"href":6647},"StreamNative’s Ursa"," are pushing the envelope by combining the Kafka API with a lakehouse storage foundation – essentially an embodiment of the three-layer idea: protocol flexibility, a unified stream\u002Ftable storage, and a separate metadata store. All these efforts, from open-source projects to cloud services, point in the same direction: streams are becoming cloud-native and open.",[48,14371,14372],{},"For CTOs and engineering leaders, the message is clear. To stay ahead in a world of real-time data and AI, it’s time to rethink your streaming architecture. Just as you wouldn’t build a data lake today without an open table format and a separation of compute\u002Fstorage, soon we’ll consider it equally antiquated to build streaming systems on a 2010s-style monolithic broker. The three-layer “Iceberg moment” for streams will mean your data infrastructure is more interoperable, future-proof, and cost-efficient. It will enable use cases like instant replays of years of event history, on-the-fly stream processing with SQL engines, and streaming analytics that seamlessly blend historical and real-time data. And crucially, this can be achieved in a vendor-neutral way – through open standards for stream storage and metadata, and widely adopted protocols.",[48,14374,14375],{},"In conclusion, the anatomy of a modern stream is one of independence and unity: independent layers each doing one job well, and a unified vision of data that transcends the old batch vs streaming divide. By embracing this architecture, we stand to unlock the full potential of streaming data, much as the lakehouse did for batch data. 
The iceberg has shown us only the tip of what’s possible – now it’s up to us to complete the picture for streaming.",{"title":18,"searchDepth":19,"depth":19,"links":14377},[14378,14379,14380,14381,14382],{"id":14268,"depth":19,"text":14269},{"id":14287,"depth":19,"text":14288},{"id":14309,"depth":19,"text":14310},{"id":14334,"depth":19,"text":14335},{"id":14358,"depth":19,"text":14359},"2025-07-03","Explore a three-layer architecture for modern streaming systems, separating data, metadata, and protocol to achieve flexibility, resilience, and cost-efficiency, inspired by the lakehouse model.","\u002Fimgs\u002Fblogs\u002F68667c08574e70fbb7600ea3_anatomy-of-a-stream.png",{},{"title":14222,"description":14384},"blog\u002Fanatomy-of-a-stream-data-vs-metadata-vs-protocol",[800,1332],"1HmJ0GRNHY4ssZYjgZOogLI-CUlIbNgHVCvsk4Jrp1E",{"id":14392,"title":14393,"authors":14394,"body":14395,"category":6415,"createdAt":290,"date":14517,"description":14518,"extension":8,"featured":294,"image":14519,"isDraft":294,"link":290,"meta":14520,"navigation":7,"order":296,"path":13838,"readingTime":4475,"relatedResources":290,"seo":14521,"stem":14522,"tags":14523,"__hash__":14524},"blogs\u002Fblog\u002Freliability-that-thinks-ahead-how-pulsar-helps-agents-stay-resilient.md"," Reliability That Thinks Ahead: How Pulsar Helps Agents Stay Resilient",[807],{"type":15,"value":14396,"toc":14512},[14397,14399,14401,14415,14417,14419,14422,14429,14433,14436,14444,14447,14458,14462,14465,14470,14481,14486,14489,14491,14505,14508],[48,14398,13820],{},[48,14400,13823],{},[321,14402,14403,14407,14411],{},[324,14404,13828,14405],{},[55,14406,13832],{"href":13831},[324,14408,13835,14409],{},[55,14410,13839],{"href":13838},[324,14412,13842,14413],{},[55,14414,13814],{"href":13845},[208,14416],{},[48,14418,3931],{},[48,14420,14421],{},"Real-world AI agents operate in unpredictable environments. An agent might encounter transient errors – a large language model (LLM) call that times out, a database that’s briefly down, or a sensor message that fails validation. How your messaging system handles these hiccups is critical. In this post, we focus on message acknowledgments, retries, and dead-letter queues – features that keep your agentic pipelines resilient. We’ll contrast Apache Pulsar’s reliability features with Apache Kafka’s more basic offset model, to show how Pulsar can help your agents recover gracefully from failures.",[48,14423,14424,14425,14428],{},"(If you missed it, in ",[55,14426,14427],{"href":13831},"Post 1: Streams vs Queues: Why Your Agents Need Both—and Why Pulsar Protocol Delivers, ","we covered how Pulsar supports both streaming and queueing patterns natively. Now we build on that foundation by examining what happens after a message is sent – does it get processed successfully? And what if it doesn’t?)",[40,14430,14432],{"id":14431},"the-challenge-of-failure-in-message-processing","The Challenge of Failure in Message Processing",[48,14434,14435],{},"Consider an AI workflow where each message triggers a sequence of actions. For example, a message might instruct an agent to invoke an external API or run an ML inference. If one of those actions fails for a particular message, we’d like to retry it or handle it specially, without losing the message or blocking the entire pipeline. We also want to avoid duplicating messages or processing them out of order unnecessarily. 
Traditional message queues (like JMS brokers) have long provided per-message acknowledgment and dead-letter queues (DLQs) to address this – ensuring no message is lost and problematic ones can be set aside. Let’s see how Pulsar and Kafka differ here:",[321,14437,14438,14441],{},[324,14439,14440],{},"Kafka’s offset model: In Kafka, a consumer’s progress is tracked by committing offsets. The consumer periodically records “I have processed up to message X in partition Y.” However, Kafka does not acknowledge individual messages. A commit always implies all prior messages in that partition are handled. This is often called a high-watermark commit model. The implication: if your consumer fails on message 100, it cannot tell Kafka “only message 100 failed” – it either doesn’t commit offset 100 (meaning it will reprocess it and any following messages on restart), or it skips it by committing offset 101 (thereby implicitly acknowledging 100 as well, even though it failed). There’s no built-in concept of NACK (negative acknowledgment) to say “retry this one and don’t advance the offset.” This all-or-nothing batch acknowledgment makes fine-grained error handling tricky. Developers end up implementing workarounds: one pattern is to process messages one-by-one per partition and commit immediately after each, to know exactly which message caused an issue. If a message fails, the consumer can stop without committing that offset, effectively pausing that partition. But this means other messages in that partition (even those already fetched) won’t be processed until the consumer is restarted and the message is either skipped or handled. Alternatively, teams implement custom logic to store offsets externally (e.g. in a database) so they can mark individual messages as processed or failed – but this is complex and outside Kafka’s native support.",[324,14442,14443],{},"Kafka and retries\u002FDLQ: Since Kafka doesn’t track individual message acknowledgment, it also doesn’t automatically redirect failed messages to a DLQ. Handling a poison message (one that consistently fails processing) is entirely up to the application. A common approach is: if processing fails, produce that message to a special “error topic” (the DLQ) for later analysis, and then commit the offset to skip it in the main topic. This approach works, but you have to code it manually and ensure atomicity (you don’t want to lose the message between failing and writing to the DLQ). There are Kafka libraries\u002Fpatterns to help with this, but again, nothing built-in prior to newer Kafka streams APIs or Kafka Connect error handling (and those are limited to those frameworks). Simply put, Kafka’s design assumes consumers manage their own retries. If a consumer dies, Kafka will allow another consumer in the group to take over the partition, but that new consumer will by default re-read from the last committed offset – meaning it may replay some messages (including the one that caused the crash if it wasn’t committed). This provides at-least-once delivery, but the burden is on you to handle duplicates and failures.",[48,14445,14446],{},"Now let’s see how Pulsar handles the same scenarios:",[321,14448,14449,14452,14455],{},[324,14450,14451],{},"Pulsar’s per-message ack & NACK: Pulsar consumers explicitly acknowledge each message (or a batch of messages) to the broker when processed. This acknowledgement is tracked per message, not just by offset. 
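For contrast, here is roughly what the hand-rolled Kafka pattern described in the bullets above looks like in practice: disable auto-commit, catch the failure, republish the record to an error topic, then commit past it so the partition is not blocked. Topic names and the process() handler are placeholders, and note that the republish-then-commit sequence is not atomic, which is exactly the fragility called out above.

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.kafka.common.serialization.StringSerializer;

public class KafkaHandRolledDlqSketch {
    public static void main(String[] args) {
        Properties consumerProps = new Properties();
        consumerProps.put("bootstrap.servers", "localhost:9092");
        consumerProps.put("group.id", "article-agents");
        consumerProps.put("enable.auto.commit", "false"); // commit only after each record is handled
        consumerProps.put("key.deserializer", StringDeserializer.class.getName());
        consumerProps.put("value.deserializer", StringDeserializer.class.getName());

        Properties producerProps = new Properties();
        producerProps.put("bootstrap.servers", "localhost:9092");
        producerProps.put("key.serializer", StringSerializer.class.getName());
        producerProps.put("value.serializer", StringSerializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(consumerProps);
             KafkaProducer<String, String> producer = new KafkaProducer<>(producerProps)) {
            consumer.subscribe(Collections.singletonList("articles"));
            while (true) {
                for (ConsumerRecord<String, String> record : consumer.poll(Duration.ofSeconds(1))) {
                    try {
                        process(record); // hypothetical handler: LLM call, indexing, etc.
                    } catch (Exception e) {
                        // Hand-rolled dead-lettering: republish the failed record, then fall
                        // through to the commit below so the partition does not stall.
                        producer.send(new ProducerRecord<>("articles-errors", record.key(), record.value()));
                        producer.flush();
                    }
                    // Commit past this record so it is never redelivered, success or not.
                    consumer.commitSync(Collections.singletonMap(
                            new TopicPartition(record.topic(), record.partition()),
                            new OffsetAndMetadata(record.offset() + 1)));
                }
            }
        }
    }

    private static void process(ConsumerRecord<String, String> record) {
        // placeholder for the real work
    }
}
```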
If a consumer fails to process a message, it can send a negative acknowledgment (NACK) for that single message. A NACK tells the Pulsar broker “I couldn’t process message X, please redeliver it later.” Crucially, this does not block the acknowledgement of other messages. For example, if messages 100 and 101 were fetched by a Pulsar consumer and 100 fails while 101 succeeds, the consumer can ack 101 and NACK 100. Message 100 will be redelivered (to the same consumer or another, depending on subscription mode) after a configurable delay, while message 101 is not reprocessed since it was acked. This fine-grained control means a slow or problematic message need not stall the pipeline – other messages keep flowing. Pulsar also has an acknowledgment timeout feature: if a consumer forgets to ack a message within a configured time (say the process died mid-task), the broker will automatically consider it failed and redeliver it. This answers the question “how do we detect if a processing instance died?” – the broker’s ack timeout handles it by ensuring unacked messages don’t disappear.",[324,14453,14454],{},"Retries and Dead-Letter Topics: Pulsar supports automatic retry and dead-lettering policies at the consumer level. You can configure a subscription such that if a message fails to be processed a certain number of times (i.e., it’s NACKed or times out repeatedly), Pulsar will route it to a Dead Letter Topic (DLQ) associated with your subscription. This is analogous to the “dead-letter queue” concept in traditional message queuing systems. The message is then out of the main flow, so your consumer group isn’t stuck on it, but it’s safely stored for inspection or special handling later. Pulsar’s DLQ feature is built-in and easy to enable, whereas with Kafka, you would have to create and manage the dead-letter topic manually. Additionally, Pulsar can use a retry letter topic alongside the DLQ. The idea is that Pulsar will requeue the message to a retry topic for a certain number of attempts (optionally with some delay between attempts), and only if it still fails after max retries will it go to the DLQ. The original consumer can be set up to automatically consume from the retry topic after a delay, implementing a backoff strategy – all configured declaratively. This kind of baked-in retry mechanism “thinks ahead” for you, simplifying what would otherwise be custom retry loop code.",[324,14456,14457],{},"No-block processing: Because of individual acking, Pulsar consumers don’t have to process strictly in sequence if they don’t want to. For example, with a Shared subscription (our queue scenario), one slow message doesn’t prevent other consumers from processing subsequent messages from the topic. Even a single consumer can use multiple threads to process messages in parallel (fetching a batch and acking each as done). In Kafka, parallelizing within a partition is dangerous because you can’t ack messages out of order – Pulsar doesn’t have that limitation. As an illustration, if our agent receives 100 tasks in a Pulsar queue, it could farm them out to multiple worker threads and acknowledge each as they finish. A Kafka consumer would either have to increase partitions (one thread per partition, effectively) or process sequentially within one partition to avoid offset issues. 
Pulsar’s design thus yields better utilization and throughput especially for heterogeneous workloads where some messages take longer than others.",[40,14459,14461],{"id":14460},"recovering-from-agent-failures-example","Recovering from Agent Failures: Example",[48,14463,14464],{},"Let’s say we have an AI agent that monitors news articles and, for each article event, the agent must call an LLM to summarize it and then index the summary in a database. Suppose one particular article causes the LLM to hang or produce an error (maybe it’s too long or has problematic content). Here’s how Kafka vs Pulsar would handle it:",[321,14466,14467],{},[324,14468,14469],{},"In Kafka: The agent consumes an event from the “articles” topic. If using auto-commit, it may have already marked prior messages as consumed and is now stuck on this bad one. If using manual commit, it withholds the commit. Either way, that partition’s processing is halted until this is resolved. You have a few choices:",[1666,14471,14472,14475,14478],{},[324,14473,14474],{},"Crash or stop the consumer, log the error, and restart later from the last commit (which will re-read the bad message, and likely fail again unless code changed or external state changed).",[324,14476,14477],{},"Skip the message: catch the exception, produce the event to an “article_errors” topic for later, then commit the offset past it so the main consumer can continue. But you must implement that production + commit carefully to not lose data. Also, you’ve now introduced a secondary flow (the error topic) which you need to monitor.",[324,14479,14480],{},"Move the logic that might fail (LLM call) out-of-band: for example, quickly commit the message, and process the LLM call asynchronously so the consumer isn’t holding up Kafka. But if that async fails, you’d still need to send that info to a separate channel because Kafka already marked it done. None of these is impossible, but they all put the responsibility on the developer to implement reliability.",[321,14482,14483],{},[324,14484,14485],{},"In Pulsar: The agent’s consumer receives the article event. If the LLM call fails, the consumer can simply call \"consumer.negativeAcknowledge(message)\" (in code) for that message. Pulsar will record that as a NACK. The consumer could even continue to process further messages in the meantime (depending on config). Pulsar will redeliver that message after a default delay (say 1 minute), giving the system time to recover or handle temporary issues. If the message keeps failing every time (e.g., the article is too large for the LLM consistently), after, say, 3 attempts, it will be routed to the dead-letter topic automatically. Your main consumer will never be stuck on it – it can move on to other messages. Meanwhile, your team can have a separate process or a monitoring dashboard consuming from the dead-letter topic “articles-DLQ” to inspect what went wrong with those problematic events. Perhaps the team finds that those DLQ’d articles were in an unsupported format and can take action, but importantly, the agent system as a whole kept chugging along despite the hiccup. No manual offset fiddling or urgent intervention was needed in the moment – Pulsar’s reliability features did the heavy lifting.",[48,14487,14488],{},"Another aspect of resilience is how the system behaves when scaling consumers up or down. 
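Before turning to scaling, the Pulsar-side handling just described can be sketched in a few lines of Java client code. The topic names, delays, and retry count below are illustrative choices, not prescriptions:

```java
import java.util.concurrent.TimeUnit;
import org.apache.pulsar.client.api.Consumer;
import org.apache.pulsar.client.api.DeadLetterPolicy;
import org.apache.pulsar.client.api.Message;
import org.apache.pulsar.client.api.PulsarClient;
import org.apache.pulsar.client.api.SubscriptionType;

public class ArticleSummarizer {
    public static void main(String[] args) throws Exception {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650")         // illustrative address
                .build();

        Consumer<byte[]> consumer = client.newConsumer()
                .topic("articles")
                .subscriptionName("summarizer")
                .subscriptionType(SubscriptionType.Shared)
                .ackTimeout(2, TimeUnit.MINUTES)               // redeliver if a worker dies mid-task
                .negativeAckRedeliveryDelay(1, TimeUnit.MINUTES)
                .deadLetterPolicy(DeadLetterPolicy.builder()
                        .maxRedeliverCount(3)                  // after 3 failed attempts...
                        .deadLetterTopic("articles-DLQ")       // ...park the message here
                        .build())
                .subscribe();

        while (true) {
            Message<byte[]> msg = consumer.receive();
            try {
                String summary = summarize(msg.getData());     // LLM call (may fail or hang)
                index(summary);                                // write to the database
                consumer.acknowledge(msg);
            } catch (Exception e) {
                consumer.negativeAcknowledge(msg);             // broker schedules redelivery
            }
        }
    }

    private static String summarize(byte[] article) { return ""; } // placeholder
    private static void index(String summary) {}                   // placeholder
}
```

With a policy like this in place, the consumer loop only has to acknowledge successes and negatively acknowledge failures; redelivery, retry counting, and dead-lettering all happen on the broker side.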
Kafka users are familiar with the rebalance process: if you add a new consumer to a group or one dies, Kafka will pause consumption briefly to redistribute partition ownership. During this rebalance, no messages are delivered to consumers of that group. In large Kafka deployments with many partitions, rebalances can take quite some time, meaning a scaling event or a single consumer failure causes a delay in processing for the whole group. Pulsar’s shared subscription has a smoother ride here – since any consumer can grab any message, adding or removing consumers doesn’t require an explicit rebalance pause. If a consumer goes away, its unacked messages simply become available for others immediately; if a new consumer joins, it starts receiving a share of messages without broker-side re-partitioning. There’s effectively no downtime when scaling Pulsar consumers in a shared subscription. This “graceful scaling” further boosts resilience for agent systems, which might need to dynamically adjust to load.",[40,14490,8924],{"id":8923},[321,14492,14493,14496,14499,14502],{},[324,14494,14495],{},"Per-message acking: Pulsar allows acknowledging individual messages, whereas Kafka can only acknowledge by advancing the offset watermark. This means Pulsar consumers can succeed or fail messages independently, preventing one bad message from holding up others.",[324,14497,14498],{},"Built-in retry and DLQ: Pulsar has native support for retrying messages and sending them to dead-letter topics after a max retry count. Kafka lacks built-in DLQ; implementing it requires custom logic and managing separate error topics. Pulsar’s approach simplifies error handling and improves reliability in complex pipelines.",[324,14500,14501],{},"Negative acknowledgments: Pulsar’s NACK feature lets consumers explicitly signal a failure, triggering message redelivery. Kafka consumers have no native NACK – they must either not commit (causing a rebalance or stall) or manually requeue the message elsewhere. Pulsar’s NACK + ackTimeout together ensure that crashed or slow consumers don’t result in lost or stuck messages.",[324,14503,14504],{},"Resilience in scaling: Pulsar’s no-stop consumer scaling (no rebalance needed for shared subscriptions) means the system adapts to consumer failures or additions without a processing halt. Kafka consumer group rebalances, in contrast, temporarily stop message processing during partition reassignments.",[48,14506,14507],{},"All these features add up to a messaging foundation that “thinks ahead” about reliability. For AI agents, which may run 24\u002F7 and deal with unpredictable inputs, having the messaging layer automatically handle retries and failures is a game-changer. 
Your agents can stay focused on what to do with data, while Pulsar ensures the delivery of that data is rock-solid even when things go wrong.",[48,14509,14510],{},[55,14511,13979],{"href":6392},{"title":18,"searchDepth":19,"depth":19,"links":14513},[14514,14515,14516],{"id":14431,"depth":19,"text":14432},{"id":14460,"depth":19,"text":14461},{"id":8923,"depth":19,"text":8924},"2025-06-27","Learn how Apache Pulsar's per-message acknowledgments, built-in retries, and dead-letter queues provide superior resilience for AI agents in unpredictable environments, contrasting its robust features with Apache Kafka's basic offset model.","\u002Fimgs\u002Fblogs\u002F685e6ca1aeab39b240ec760a_-Reliability-That-Thinks-Ahead.png",{},{"title":14393,"description":14518},"blog\u002Freliability-that-thinks-ahead-how-pulsar-helps-agents-stay-resilient",[3988,821,10054],"iPt4pzXdBU2bJDhw0vwZ68JgkS5R5HLwP9BuRg3SfH0",{"id":14526,"title":14527,"authors":14528,"body":14529,"category":1332,"createdAt":290,"date":14669,"description":14670,"extension":8,"featured":294,"image":14671,"isDraft":294,"link":290,"meta":14672,"navigation":7,"order":296,"path":4784,"readingTime":4475,"relatedResources":290,"seo":14673,"stem":14674,"tags":14675,"__hash__":14676},"blogs\u002Fblog\u002Fstreamnative-ursa-is-now-available-for-public-preview-on-microsoft-azure.md","StreamNative Ursa Is Now Available for Public Preview on Microsoft Azure",[311],{"type":15,"value":14530,"toc":14663},[14531,14540,14544,14547,14563,14566,14570,14574,14577,14606,14610,14614,14617,14621,14624,14628,14631,14635,14638,14640,14643,14660],[48,14532,14533,14534,4003,14537,14539],{},"We’re thrilled to announce the Public Preview of Ursa on Microsoft Azure, bringing our leaderless, lakehouse-native data streaming engine to the Azure ecosystem. Building on our momentum from ",[55,14535,14536],{"href":6864},"AWS",[55,14538,6872],{"href":4788},", this milestone makes it easier than ever for organizations to adopt a cost-efficient, cloud-native alternative to legacy streaming platforms on Azure.",[40,14541,14543],{"id":14542},"recap-ursas-breakthroughs-in-streaming-architecture","Recap: Ursa’s Breakthroughs in Streaming Architecture",[48,14545,14546],{},"Ursa is a next-generation streaming engine designed for the cloud and the lakehouse era. Since launching Ursa on AWS and GCP, organizations have used it to transform their streaming stack—simplifying operations, cutting costs, and modernizing analytics. Key highlights include:",[321,14548,14549,14557,14560],{},[324,14550,14551,14552,14556],{},"Leaderless Architecture: ",[55,14553,14555],{"href":14554},"\u002Fblog\u002Fleaderless-architecture-and-lakehouse-native-storage-for-reducing-kafka-cost","Ursa eliminates the operational burden of leader elections and cross-zone replication",". Its stateless brokers and distributed consensus model remove single points of failure, reducing both downtime and complexity.",[324,14558,14559],{},"Lakehouse-Native Design: Ursa uniquely supports open formats like Apache Iceberg and Delta Lake as its native storage layer—embedding schema, snapshots, and compaction directly in object storage. The result? Seamless data interoperability and 10x or greater storage savings.",[324,14561,14562],{},"Kafka-Compatible Ingestion: Maintain your existing Kafka APIs while replacing underlying Kafka infrastructure with Ursa’s more efficient engine—no code changes required.",[48,14564,14565],{},"These innovations have delivered strong results for customers on AWS and GCP. 
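The "no code changes required" point is easy to make concrete: an existing Kafka producer keeps its usual client code and simply points its bootstrap servers at the Ursa endpoint. The endpoint and topic below are placeholders; real connection and security settings come from your cluster configuration:

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class ExistingKafkaProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Placeholder endpoint; SASL/TLS settings would come from your cluster's connection details.
        props.put("bootstrap.servers", "your-ursa-endpoint.example.com:9093");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Same Kafka client, same API calls; only the endpoint changes.
            producer.send(new ProducerRecord<>("sensor-events", "device-42", "{\"temp\":21.5}"));
        }
    }
}
```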
Now, we’re bringing the same capabilities to Azure.",[40,14567,14569],{"id":14568},"introducing-ursa-on-azure-public-preview","Introducing Ursa on Azure Public Preview",[3933,14571,14573],{"id":14572},"why-azure","Why Azure?",[48,14575,14576],{},"Many enterprises rely on Microsoft Azure as a core part of their cloud strategy—and asked for a way to run Ursa natively on Azure infrastructure. With this Public Preview, customers can now:",[321,14578,14579,14588,14596,14599],{},[324,14580,14581,14582,14587],{},"Provision Ursa in Azure: ",[55,14583,14586],{"href":14584,"rel":14585},"https:\u002F\u002Fdocs.streamnative.io\u002Fdocs\u002Fbyoc-azure-access",[264],"Deploy Ursa clusters directly into your own Azure"," subscription, with full control over compute, networking, and storage.",[324,14589,14590,14591,190],{},"Launch via Azure Marketplace: Simplify procurement and compliance with one-click deployment through the",[55,14592,14595],{"href":14593,"rel":14594},"https:\u002F\u002Fazuremarketplace.microsoft.com",[264]," Azure Marketplace",[324,14597,14598],{},"Integrate with Databricks & Snowflake on Azure: Stream data directly into Delta Lake or Apache Iceberg tables governed by Databricks Unity Catalog or Snowflake Open Catalog, all natively hosted on Azure.",[324,14600,14601,14602,14605],{},"Migrate from Kafka with Universal Linking: Seamlessly shift workloads from Apache Kafka or compatible services (like Confluent Cloud or MSK) using ",[55,14603,14604],{"href":4863},"StreamNative’s Universal Linking",", now fully available on Azure.",[40,14607,14609],{"id":14608},"whats-in-this-public-preview","What’s in This Public Preview",[3933,14611,14613],{"id":14612},"ursa-on-azure-byoc-bring-your-own-cloud","✅ Ursa on Azure BYOC (Bring Your Own Cloud)",[48,14615,14616],{},"Deploy Ursa clusters into your own Azure subscription while preserving internal compliance, security, and budget controls. Ursa’s stateless broker model pairs naturally with Azure services like Azure Blob Storage and Azure Virtual Network (VNet) peering.",[3933,14618,14620],{"id":14619},"azure-marketplace-integration","✅ Azure Marketplace Integration",[48,14622,14623],{},"Ursa is now available in the Azure Marketplace, making it easy to launch and manage your streaming environment through your existing Azure billing and subscription agreements.",[3933,14625,14627],{"id":14626},"lakehouse-integrations-on-azure","✅ Lakehouse Integrations on Azure",[48,14629,14630],{},"With native support for Delta Lake (via Unity Catalog) and Apache Iceberg (via Snowflake or open catalogs), Ursa gives you a direct pipeline from real-time topics to structured lakehouse tables—ideal for data analytics, AI, and ML workloads on Azure.",[3933,14632,14634],{"id":14633},"kafka-workload-migration-via-universal-linking","✅ Kafka Workload Migration via Universal Linking",[48,14636,14637],{},"Universal Linking is now supported on Azure, allowing for zero-downtime migration from existing Kafka clusters. Whether you're consolidating on Azure or embracing multi-cloud, you can incrementally migrate producers and consumers with no service disruption.",[40,14639,2890],{"id":749},[48,14641,14642],{},"Interested in trying Ursa on Azure? 
Here's how to begin:",[1666,14644,14645,14651,14654,14657],{},[324,14646,14647,14648,190],{},"Sign Up for a Free Trial: Start your journey with StreamNative Cloud at",[55,14649,14650],{"href":10259}," streamnative.io",[324,14652,14653],{},"Deploy Ursa via Azure Marketplace: Launch your BYOC Ursa cluster directly from the Azure Marketplace and link it to your Azure subscription.",[324,14655,14656],{},"Integrate with Databricks or Snowflake: Set up Ursa to stream directly into Unity Catalog or Open Catalog on Azure for lakehouse-native analytics.",[324,14658,14659],{},"Migrate with Universal Linking: Follow our Kafka Migration Guide to incrementally offload workloads to Ursa, with full offset preservation and schema compatibility.",[48,14661,14662],{},"We’re excited to see how Azure customers harness Ursa to modernize their streaming stack, power real-time analytics, and drive innovation across industries.",{"title":18,"searchDepth":19,"depth":19,"links":14664},[14665,14666,14667,14668],{"id":14542,"depth":19,"text":14543},{"id":14568,"depth":19,"text":14569},{"id":14608,"depth":19,"text":14609},{"id":749,"depth":19,"text":2890},"2025-06-26","StreamNative Ursa is now in Public Preview on Microsoft Azure! Discover how this leaderless, lakehouse-native streaming engine offers a cost-efficient, cloud-native alternative to legacy platforms on Azure, with Kafka compatibility and seamless integration with Databricks and Snowflake.","\u002Fimgs\u002Fblogs\u002F685d34bf86e4cb10f57830a0_Ursa-PR-on-Azure.png",{},{"title":14527,"description":14670},"blog\u002Fstreamnative-ursa-is-now-available-for-public-preview-on-microsoft-azure",[10322,1332],"jQ9nwvOTzZoZTpuNG6jxIzdwO18clpilw9HRkUr7bDs",{"id":14678,"title":14261,"authors":14679,"body":14680,"category":1332,"createdAt":290,"date":14669,"description":14831,"extension":8,"featured":294,"image":14832,"isDraft":294,"link":290,"meta":14833,"navigation":7,"order":296,"path":13629,"readingTime":4475,"relatedResources":290,"seo":14834,"stem":14835,"tags":14836,"__hash__":14837},"blogs\u002Fblog\u002Fwhy-streams-need-their-iceberg-moment.md",[806],{"type":15,"value":14681,"toc":14826},[14682,14685,14687,14709,14711,14713,14716,14719,14723,14726,14737,14740,14744,14747,14750,14764,14767,14770,14773,14776,14780,14783,14794,14797,14811,14818,14821,14824],[3933,14683,13619],{"id":14684},"navigate-the-series-de-composing-streaming-systems",[48,14686,13622],{},[321,14688,14689,14693,14697,14701,14705],{},[324,14690,14691],{},[55,14692,13630],{"href":13629},[324,14694,14695],{},[55,14696,13636],{"href":13635},[324,14698,14699],{},[55,14700,13642],{"href":13641},[324,14702,14703],{},[55,14704,13648],{"href":13647},[324,14706,14707],{},[55,14708,13654],{"href":13653},[208,14710],{},[48,14712,3931],{},[48,14714,14715],{},"Apache Iceberg and similar lakehouse table formats have revolutionized the data analytics landscape. By completely decoupling the storage and computing layers, this approach has led to significant improvements in efficiency and flexibility. The swift embrace of open table storage formats, which are entirely separate from the analytical engine, underscores the importance of vendor neutrality.",[48,14717,14718],{},"The data streaming landscape has long been dominated by Apache Kafka, a platform that, while revolutionary in its time, is now showing its age. Its tightly-coupled design leads to escalating costs, operational complexity, and sluggish innovation. 
We believe it's time for a breakthrough in data streaming platforms, similar to advancements seen in the data analytics space – put simply, streaming needs its own Iceberg moment. In this post, we will present an architecture that separates concerns into a three-layer model – and how this vision can slash costs and accelerate evolution while staying vendor-neutral.",[40,14720,14722],{"id":14721},"the-pain-of-tightly-coupled-streaming","The Pain of Tightly-Coupled Streaming",[48,14724,14725],{},"Traditional streaming platforms like Kafka rely on brokers as their all-in-one workhorses. These monolithic server processes are responsible for managing data storage, metadata, and client protocols. While effective initially, this tight coupling now presents several challenges as these platforms scale:",[321,14727,14728,14731,14734],{},[324,14729,14730],{},"High Infrastructure Costs: Since these brokers store data on local disks, scaling the cluster to handle increased throughput necessitates adding more brokers. While replicating data across brokers is vital for data durability in the event of a broker failure, it comes with considerable disk and network overhead. This includes expensive cross-zone replication fees, particularly in cloud environments. Studies have shown that decoupling storage can trim streaming costs by up to 90% by leveraging cheaper object stores. In the current model, however, you’re paying for triple-replicated storage and idle capacity on every broker node.",[324,14732,14733],{},"Operational Complexity: Because tightly-coupled brokers are stateful, scaling or upgrading them is challenging. Adding a new broker initiates data rebalancing, a slow and risky process of reshuffling all of the existing data across the brokers and onto new partitions. Broker failures also lead to a heavyweight partition recovery process. These issues consume countless hours of team time on cluster maintenance (e.g., planning maintenance, manually reassigning partitions) instead of feature development.",[324,14735,14736],{},"Slow Feature Evolution: Kafka's monolithic architecture has historically impeded innovation. Implementing improvements, such as new replication mechanisms or consistency guarantees, requires extensive modifications to the core broker software, impacting the entire system. Efforts within the Kafka community, such as the multi-year initiative to eliminate ZooKeeper for metadata (KIP-500), highlight the challenges posed by tight broker integrations. Similarly, integrating tiered storage (transferring cold data to cloud storage) into Kafka was a significant undertaking, yet it remains an incomplete solution. The tightly coupled architecture means storage, metadata, and protocol are intertwined, making any evolution—like adopting a new storage engine or supporting a new client API—a slow and arduous process.",[48,14738,14739],{},"High costs, scaling bottlenecks, and stagnant feature velocity are the symptoms of this architectural debt. In short, today’s streaming systems carry the baggage of an earlier era – an era when coupling everything in one broker made sense for simplicity. But in the cloud-native, real-time AI world of 2025, that all-in-one model is creaking under the strain.",[40,14741,14743],{"id":14742},"lessons-from-the-lakehouse-revolution","Lessons from the Lakehouse Revolution",[48,14745,14746],{},"To find a solution, we can draw parallels with the recent transformation of data lakes. 
Just a few years ago, data lakes faced a similar predicament: data stored in affordable storage (such as HDFS or cloud blobs) proved challenging to manage and query effectively. The proliferation of engines and pipelines resulted in inconsistency, sluggish queries, and pipeline failures, while duplicate data and redundant work inflated costs. The underlying issue? A flawed architecture – doesn't that sound familiar?",[48,14748,14749],{},"The \"Iceberg moment\" marked a shift in data management. Apache Iceberg introduced the concept of an open table format that separated data storage from processing engines. It also incorporated a metadata layer for overseeing table states. This innovation, along with similar initiatives like Delta Lake and Apache Hudi, transformed a chaotic data lake into an organized lakehouse. Key aspects of this transformation include:",[321,14751,14752,14755,14758,14761],{},[324,14753,14754],{},"Separation of Concerns: Scalable object storage houses data files written in well defined formats such as Parquet and ORC, while a separate catalog manages table metadata, including schemas, partitions, and snapshots. Query engines like Spark, Trino, and Flink interact through a standardized table API, rather than relying on assumptions about data disk layout.",[324,14756,14757],{},"ACID and Governance: The metadata layer transforms a chaotic blob store into an organized system with transactional integrity (ACID commits) and schema evolution. This enables seamless coordination among multiple writers and readers, ensuring data consistency and reliability.",[324,14759,14760],{},"Multi-Engine Interoperability: Iceberg's open and standardized storage format and metadata enable diverse tools to share data seamlessly. This means a single Iceberg table can simultaneously handle streaming ingestion and batch SQL queries. Such unified access to both batch and streaming data facilitates real-time analytics, a capability previously difficult to achieve without intricate ETL pipelines.",[324,14762,14763],{},"Rapid Innovation: Independent evolution is now possible for each layer. A new query engine can be implemented by simply integrating the Iceberg API, eliminating the need to rewrite data storage methods. Improved compression or encodings in storage can be immediately leveraged by engines, provided the format aligns with the metadata specification. This modularity has spurred significant innovation within the data ecosystem, all built upon the foundation provided by Apache Iceberg.",[48,14765,14766],{},"Adopting a lakehouse approach has delivered dramatic results for companies, leading to significant improvements such as faster queries, reduced costs, and simplified architectures. Crucially, these benefits were achieved through the use of vendor-neutral, community-driven technology. Apache Iceberg exemplifies this by being an open standard, not confined to a single vendor's ecosystem, and widely adopted across the industry. Its broad acceptance has solidified its position as a de facto modern standard for analytic data.",[48,14768,14769],{},"Similar to the evolution of data lakes before Iceberg, current streaming platforms face comparable challenges. 
Fortunately, the core principles of decoupling, standardization, and opening up the architecture—which proved effective for data lakes—are equally applicable to streaming data.",[48,14771,14772],{},"Early indicators of this transformation are already evident: Apache Kafka's roadmap now features proposals for \"diskless\" topics that write directly to object storage, aiming to reduce costs. Additionally, there are plans to modernize metadata management by replacing ZooKeeper with an internal metadata quorum.",[48,14774,14775],{},"Apache Pulsar adopted a two-tier architecture, separating compute brokers from BookKeeper storage nodes. This design breaks away from traditional monolithic systems and provides a unified messaging model, which allows for independent consumption and storage of data. While these steps are positive, we can achieve more. We need to fundamentally re-envision streaming systems as three separate layers: data, metadata, and protocol. This approach mirrors how the lakehouse model disaggregated analytics and represents the \"Iceberg moment\" for streams: a streamlined, layer-centric architecture that frees us from prior compromises.",[40,14777,14779],{"id":14778},"a-three-layer-vision-for-streaming","A Three-Layer Vision for Streaming",[48,14781,14782],{},"Imagine a streaming data platform built from the ground up on three independent layers:",[1666,14784,14785,14788,14791],{},[324,14786,14787],{},"Data Layer – A scalable, durable storage substrate for the raw streaming data (the actual event log).",[324,14789,14790],{},"Metadata Layer – An authoritative repository for stream-related metadata, encompassing details such as existing streams (topics), their schemas, offsets, and retention policies.",[324,14792,14793],{},"Protocol Layer – Stateless services, designed to speak various streaming protocols (such as Kafka, Pulsar, and MQTT), manage client connections and orchestrate read\u002Fwrite operations. However, these services do not offer long-term data persistence.",[48,14795,14796],{},"In this model, the traditional \"broker\" — a single, monolithic server — is replaced. Brokers now function as stateless routers or protocol translators. The substantial state (data and metadata) is offloaded to specialized layers, allowing them to scale and evolve independently. Let’s briefly examine the benefits, which closely mirror those seen in the Iceberg\u002Fdata lakehouse world:",[321,14798,14799,14802,14805,14808],{},[324,14800,14801],{},"Cost Efficiency & Scalability: The data layer can reside on cheap, infinite storage like cloud object stores, rather than on tightly-managed broker disks. This means you only pay for storage once and grow it as needed, instead of over-provisioning every broker. Brokers no longer need large disks, reducing their footprint to mostly CPU and memory for processing. With brokers being stateless, you can scale out or in the computing layer on demand (spin up new protocol servers during traffic spikes, shut them down when not needed) without moving any data – no lengthy rebalances or replication storms. The compute and storage scales independently, just as in a decoupled lakehouse architecture.",[324,14803,14804],{},"Faster Evolution of Each Layer: Each component can progress on its own timeline. For example, the protocol layer could support new client features or even entirely new protocols (say an MQTT interface or a new streaming SQL interface) without changing how data is stored. 
The data layer could adopt better formats or storage engines (imagine switching from segment files to columnar storage, or integrating directly with Apache Iceberg\u002FDelta Lake tables) without affecting client applications – they still talk the same protocol. The metadata layer could introduce stronger consistency, new subscription types, or integration with governance tools, all without touching the other layers. This modularity accelerates innovation since changes are localized and replaceable behind stable interfaces.",[324,14806,14807],{},"Multi-Tool and Multi-Use-Case Support: A three-layer streaming system is inherently more open. For instance, if the data is stored in an open format (like Parquet files with Iceberg metadata, or any self-describing log format), then external tools can read streaming data directly for analytics or AI training. Your streaming archive effectively doubles as a live data lake – no more one-way ETL from Kafka into data lakes just to run batch queries. At the same time, the protocol layer could allow multiple protocols to access the same data. It’s conceivable to have one unified storage of events but serve them through Kafka APIs, Pulsar APIs, and other interfaces simultaneously, depending on application needs. This breaks the silos between different streaming technologies and avoids vendor lock-in.",[324,14809,14810],{},"Reliability and Simplified Operations: Decoupling improves fault tolerance. The durable data layer (especially if using cloud storage) can offer very high availability and durability – e.g., object stores like S3 automatically replicate data across zones with 11-nines durability. The metadata layer, if built with a proper consensus or using a robust external catalog, ensures the stream definitions and cursors are always preserved. Meanwhile, stateless protocol servers mean failures are far less dramatic: if one goes down, clients can reconnect to another with zero data loss (since no unique data was on the failed node). Upgrades and maintenance become easier – you could even roll out new protocol server versions one at a time (since they don’t hold unique state) or swap out the storage backend without app downtime. Overall, operations begin to look more like managing a stateless microservice plus a database, rather than herding a fragile cluster of pet brokers.",[48,14812,14813,14814,14817],{},"The three-layer vision for streaming offers cloud-native efficiency, flexibility, and openness. This approach applies the principles of the lakehouse to event streams, transforming the streaming pipeline from a closed broker into an extensible data infrastructure. Much like Iceberg elevated \"dumb storage\" to a smart data platform, this blueprint is vendor-neutral, allowing any project or vendor to implement these layers in their unique way, provided they adhere to open interfaces. We’re already seeing movement in this direction: for example, Apache Pulsar separates serving and storage layers (a step toward stateless brokers), and emerging projects like ",[55,14815,14816],{"href":6647},"StreamNative’s Ursa Engine"," build on this idea by writing streaming data directly to Iceberg\u002FDelta lakehouse tables in object storage (making streaming “lakehouse-native” storage) while providing a Kafka-compatible protocol on top. 
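To make the layering tangible, here is a purely illustrative sketch of how the three responsibilities might be expressed as independent interfaces. It is not any project's actual API; it only shows where the boundaries fall:

```java
import java.util.List;

/** Data layer: durable storage for the raw event log (e.g. backed by object storage). */
interface DataLayer {
    long append(String stream, byte[] record);                     // returns the record's position
    List<byte[]> read(String stream, long fromOffset, int maxRecords);
}

/** Metadata layer: authoritative catalog of streams, schemas, and consumer positions. */
interface MetadataLayer {
    void createStream(String name, String schema);
    long committedOffset(String stream, String subscription);
    void commitOffset(String stream, String subscription, long offset);
}

/** Protocol layer: stateless servers that speak Kafka, Pulsar, MQTT, etc. on top of the other two. */
interface ProtocolLayer {
    void handleProduce(String stream, byte[] record);              // delegates to the data layer
    List<byte[]> handleFetch(String stream, String subscription);  // consults metadata + data layers
}
```

A protocol server built against interfaces like these holds no durable state of its own, which is exactly what makes it cheap to scale out, upgrade, or replace.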
The industry as a whole is converging on the notion that streams deserve the same architectural reboot that batch data got.",[48,14819,14820],{},"This “Iceberg moment” for streams isn’t just about any single technology – it’s about a change in philosophy. By breaking apart the old broker, we can solve the pain points that have plagued streaming for a decade. The result will be streaming platforms that evolve faster, cost a fraction of today’s setups, and integrate seamlessly with the rest of the data ecosystem.",[48,14822,14823],{},"Up next: In part 2 of this series, we’ll take a deeper dive into each of these three layers – Data, Metadata, and Protocol – to understand their roles and how they compare to the analogous pieces in a lakehouse architecture. Stay tuned for a technical anatomy of a modern stream.",[48,14825,3931],{},{"title":18,"searchDepth":19,"depth":19,"links":14827},[14828,14829,14830],{"id":14721,"depth":19,"text":14722},{"id":14742,"depth":19,"text":14743},{"id":14778,"depth":19,"text":14779},"The data streaming landscape needs a revolution. Learn why tightly coupled systems like Kafka are costly and complex, and how a three-layer, decoupled architecture, inspired by Apache Iceberg, can slash costs, accelerate innovation, and deliver vendor-neutral, lakehouse-native streaming.","\u002Fimgs\u002Fblogs\u002F685d5b34cb5eff658726cb16_why-stream-need-its-iceberg-moment.png",{},{"title":14261,"description":14831},"blog\u002Fwhy-streams-need-their-iceberg-moment",[799,1332],"6SPyWUanICyXsl3424kaZXl6YPubHC62imyHJnOyP6I",{"id":14839,"title":14840,"authors":14841,"body":14842,"category":290,"createdAt":290,"date":15025,"description":15026,"extension":8,"featured":294,"image":15027,"isDraft":294,"link":290,"meta":15028,"navigation":7,"order":296,"path":6626,"readingTime":4475,"relatedResources":290,"seo":15029,"stem":15030,"tags":15031,"__hash__":15032},"blogs\u002Fblog\u002Fstreamnative-expands-unitycatalog-integration-with-iceberg-tables.md","StreamNative Expands Unity Catalog Integration with Managed Iceberg Tables",[311],{"type":15,"value":14843,"toc":15018},[14844,14852,14859,14863,14875,14878,14883,14886,14891,14894,14899,14902,14907,14910,14913,14918,14922,14925,14928,14939,14942,14946,14957,14961,14964,14978,14985,14987,14990,14993,14995,14997],[48,14845,14846,14847],{},"We’re excited to share that StreamNative is a launch partner for the ",[55,14848,14851],{"href":14849,"rel":14850},"https:\u002F\u002Fwww.databricks.com\u002Fblog\u002Fannouncing-full-apache-iceberg-support-databricks",[264],"Private Preview of Managed Apache Iceberg tables and the Iceberg REST Catalog in Databricks Unity Catalog!",[48,14853,14854,14855,14858],{},"This milestone marks the next step in simplifying the real-time data pipeline from Apache Pulsar and Apache Kafka to Lakehouse Storage, powered by StreamNative’s Ursa engine and Iceberg’s open table format—now fully integrated with Unity Catalog for unified governance and optimized performance. ",[55,14856,14857],{"href":4811},"StreamNative already natively integrates with Unity Catalog through Delta Lake support",", and we’re excited to add native support for Apache Iceberg as well.",[40,14860,14862],{"id":14861},"streaming-into-unity-catalog-with-ursa-and-iceberg-rest","Streaming into Unity Catalog with Ursa and Iceberg REST",[48,14864,14865,14866,14869,14870,3584],{},"At the heart of this integration is ",[55,14867,14868],{"href":6647},"StreamNative’s Ursa engine",", purpose-built to transform streaming data into optimized open table formats. 
With the recent preview launch of native Iceberg support in Unity Catalog, Ursa can now stream data directly into Iceberg tables using the ",[55,14871,14874],{"href":14872,"rel":14873},"https:\u002F\u002Feditor-next.swagger.io\u002F?url=https:\u002F\u002Fraw.githubusercontent.com\u002Fapache\u002Ficeberg\u002Fmain\u002Fopen-api\u002Frest-catalog-open-api.yaml",[264],"Iceberg REST Catalog",[48,14876,14877],{},"This RESTful interface acts as the bridge between Ursa’s real-time streaming output and Unity Catalog’s metadata governance layer. As Ursa writes new data files, it coordinates with the REST Catalog to register those files into the appropriate Iceberg table versions, maintaining full compliance with Iceberg’s transactional semantics.",[48,14879,14880],{},[384,14881],{"alt":18,"src":14882},"\u002Fimgs\u002Fblogs\u002F684ab40b246a9e0b54c1953b_AD_4nXcUmXTWnnIaZnD06xRxEDPvsVHghvwUodbB7xkX3_VwO8cqn3_vR__WrD1CG8XHAw-GdXQh-MEs52fmyCURAwEFWDlQHXEL16Dc7doRBWmYDGkYCi9n315GJsxPjzvjdlCyNvEbeA.png",[48,14884,14885],{},"StreamNative now offers native integration with Unity Catalog, enabling seamless streaming of topic data as Iceberg tables to object storage, with direct publication into Unity Catalog for unified governance.",[48,14887,14888],{},[384,14889],{"alt":18,"src":14890},"\u002Fimgs\u002Fblogs\u002F684ab40b246a9e0b54c19538_AD_4nXcxwEQ-LpRvjVodpD0DuMt5gmzUCtwNzrMnZ7Jho1C5faO391B4j8xAzC2VrsjtEtCbouyeESkqLobcR_rbVKHNeFaE-Q30QA2sxBujQYvQJ2PDc5ZBsTfTUmOHdWEBcgGXgasd.png",[48,14892,14893],{},"StreamNative’s Ursa engine compacts streaming topic data and stores it as optimized Parquet files in object storage. Alongside the data files, Iceberg maintains a metadata folder that captures table state using versioned snapshots, enabling efficient query planning, time travel, and schema evolution.",[48,14895,14896],{},[384,14897],{"alt":18,"src":14898},"\u002Fimgs\u002Fblogs\u002F684ab40b246a9e0b54c19535_AD_4nXep64_AZLv-4-7gP3BfwIz7czmM3s8pa-NXOyDQ8SKbXIV8YFBwq5yA4pbC8lbjrmS33Rjpo_flRyO9Qh4jWn73fsBaUDY_mT_awOcX3Y138XB5AwYYOtLSxOBeodkQdcubZsB3.png",[48,14900,14901],{},"The Iceberg tables, comprising data stored in Parquet files along with metadata such as snapshots and supporting files, are located in the directory highlighted below.",[48,14903,14904],{},[384,14905],{"alt":18,"src":14906},"\u002Fimgs\u002Fblogs\u002F684adc99dbbd3dc2a4ab2222_AD_4nXeoNnhNB9jCVSnmWLeUDSlBguaIobzhDGBaUHWAqNRq8rWksPFKfFr4PYtMJxB5aT6W2aDlCU9Y_4MmuKFjtqdP1rRuqxH9lWFfOA3E-WFlM5qDZDtpSaLyil1CTqVTvZvbhLsdhQ.png",[48,14908,14909],{},"Query Iceberg tables from Unity Catalog",[48,14911,14912],{},"Once topic data is ingested as Iceberg tables and published to Unity Catalog, it becomes accessible for querying through a variety of external tools. The example below demonstrates how users can query these Iceberg tables using Spark SQL.",[48,14914,14915],{},[384,14916],{"alt":18,"src":14917},"\u002Fimgs\u002Fblogs\u002F684ab40b246a9e0b54c1953f_AD_4nXe-ka1Uzzw9HfHtGFlPluYH7D3dcSFSd0ajHLD7JAaiDFIxcJwIUjHAI4Vxj_e6ZY6UAsoIbSDL2jwjnzck1pNY6dEc95Ea1c_qmqxTh5Dk9BvxXSUiCVYLrcdANCCHSv3yrgfLMg.png",[40,14919,14921],{"id":14920},"from-streaming-chunks-to-iceberg-snapshots","From Streaming Chunks to Iceberg Snapshots",[48,14923,14924],{},"As messages stream into Ursa, they are initially stored in a write-optimized internal format. 
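Returning to the Spark SQL example above for a moment, here is a hedged sketch of what such a query can look like from an external engine, including a time-travel read over the snapshots this section goes on to describe. The catalog name, REST endpoint, auth settings, and table names are placeholders, and the Apache Iceberg Spark runtime is assumed to be on the classpath:

```java
import org.apache.spark.sql.SparkSession;

public class QueryStreamingIcebergTable {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("query-streaming-iceberg")
                .master("local[*]")
                .config("spark.sql.catalog.unity", "org.apache.iceberg.spark.SparkCatalog")
                .config("spark.sql.catalog.unity.type", "rest")
                .config("spark.sql.catalog.unity.uri", "https://<your-iceberg-rest-endpoint>")
                // .config("spark.sql.catalog.unity.token", "<auth token>")  // auth is environment-specific
                .getOrCreate();

        // Read the latest snapshot of a topic-backed table.
        spark.sql("SELECT * FROM unity.analytics.articles LIMIT 10").show();

        // Time travel to an earlier snapshot written by a previous compaction cycle.
        spark.sql("SELECT count(*) FROM unity.analytics.articles "
                + "TIMESTAMP AS OF '2025-06-01 00:00:00'").show();
    }
}
```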
Ursa periodically compacts this data into columnar Apache Parquet files, which are then committed to cloud object storage (e.g., AWS S3, GCS, or Azure Blob Storage).",[48,14926,14927],{},"With each compaction cycle, Ursa creates a new Iceberg snapshot—a consistent version of the table at a point in time—by publishing a new manifest and metadata file through the Iceberg REST Catalog. These snapshots enable:",[321,14929,14930,14933,14936],{},[324,14931,14932],{},"Time travel and rollback to any prior version of the table",[324,14934,14935],{},"Incremental reads by downstream engines",[324,14937,14938],{},"Optimized compaction via Iceberg’s rewrite and maintenance APIs",[48,14940,14941],{},"This design ensures that real-time streaming data ingested via Pulsar is immediately queryable, governed, and fully versioned within the Unity Catalog-managed Iceberg table.",[40,14943,14945],{"id":14944},"unity-catalogs-iceberg-support-brings-the-following-features","Unity Catalog's Iceberg Support Brings the Following Features:",[321,14947,14948,14951,14954],{},[324,14949,14950],{},"Automated Table Optimization Predictive compaction and file management for long-term efficiency.",[324,14952,14953],{},"Smart Liquid Clustering Dynamically tunes table layouts for faster query performance.",[324,14955,14956],{},"Unified Read\u002FWrite Access from External Engines Enables broad analytics access—from BI to ML—on real-time data.",[40,14958,14960],{"id":14959},"why-it-matters","Why It Matters",[48,14962,14963],{},"This integration streamlines the path from real-time event streams to AI\u002FML-ready analytics using open standards and governed lakehouse infrastructure. Enterprises can now:",[321,14965,14966,14969,14972,14975],{},[324,14967,14968],{},"Ingest data with low-latency streaming from Pulsar.",[324,14970,14971],{},"Transform it into open-format Apache Iceberg tables.",[324,14973,14974],{},"Govern and optimize those tables with Unity Catalog.",[324,14976,14977],{},"Access the data with any external engine using Delta, Iceberg, or Spark-compatible tooling.",[48,14979,14980,190],{},[55,14981,14984],{"href":14982,"rel":14983},"https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=itdnYsNU7HQ",[264],"Take a peek at how Ursa streams data as Iceberg tables and seamlessly publishes them to Unity Catalog",[40,14986,13565],{"id":1727},[48,14988,14989],{},"StreamNative and Databricks are working together to deliver seamless real-time data pipelines that power the next generation of lakehouse analytics. 
With support for the Iceberg REST Catalog, you can now build streaming data applications with robust schema evolution, time travel, and unified governance—out of the box.",[48,14991,14992],{},"Stay tuned for technical walkthroughs, architecture deep dives, and joint use case spotlights.",[48,14994,4446],{},[48,14996,4135],{},[321,14998,14999,15006,15011],{},[324,15000,15001,15005],{},[55,15002,4690],{"href":15003,"rel":15004},"http:\u002F\u002Fconsole.streamnative.cloud",[264],"and get started for free",[324,15007,15008],{},[55,15009,15010],{"href":4811},"Learn more about StreamNative’s integration with Unity Catalog",[324,15012,15013],{},[55,15014,15017],{"href":15015,"rel":15016},"https:\u002F\u002Fwww.prnewswire.com\u002Fnews-releases\u002Fdatabricks-eliminates-table-format-lock-in-and-adds-capabilities-for-business-users-with-unity-catalog-advancements-302478796.html?utm_source=chatgpt.com",[264],"Databricks Eliminates Table Format Lock-in and Adds Capabilities for Business Users with Unity Catalog Advancements",{"title":18,"searchDepth":19,"depth":19,"links":15019},[15020,15021,15022,15023,15024],{"id":14861,"depth":19,"text":14862},{"id":14920,"depth":19,"text":14921},{"id":14944,"depth":19,"text":14945},{"id":14959,"depth":19,"text":14960},{"id":1727,"depth":19,"text":13565},"2025-06-12","StreamNative enhances Databricks Unity Catalog with native Apache Iceberg support, enabling seamless real-time data streaming from Pulsar\u002FKafka to governed lakehouse storage. Discover how Ursa engine and Iceberg REST Catalog simplify pipelines with unified metadata, time travel, and optimized query performance.","\u002Fimgs\u002Fblogs\u002F684ab726ad20a9327e4b907b_image-16.png",{},{"title":14840,"description":15026},"blog\u002Fstreamnative-expands-unitycatalog-integration-with-iceberg-tables",[800,2599,1332],"P_6KNdDdCHliJz6ZuuUsKldwQJYIRRjpw5WcyaequIo",{"id":15034,"title":13858,"authors":15035,"body":15036,"category":6415,"createdAt":290,"date":15143,"description":15144,"extension":8,"featured":294,"image":15145,"isDraft":294,"link":290,"meta":15146,"navigation":7,"order":296,"path":15147,"readingTime":4475,"relatedResources":290,"seo":15148,"stem":15149,"tags":15150,"__hash__":15151},"blogs\u002Fblog\u002Fstreams-vs-queues-why-your-agents-need-both--and-why-pulsar-protocol-delivers.md",[807],{"type":15,"value":15037,"toc":15138},[15038,15040,15042,15056,15058,15060,15063,15068,15072,15075,15078,15081,15085,15088,15091,15099,15103,15106,15114,15117,15119,15130,15133,15136],[48,15039,13820],{},[48,15041,13823],{},[321,15043,15044,15048,15052],{},[324,15045,13828,15046],{},[55,15047,13832],{"href":13831},[324,15049,13835,15050],{},[55,15051,13839],{"href":13838},[324,15053,13842,15054],{},[55,15055,13814],{"href":13845},[208,15057],{},[48,15059,3931],{},[48,15061,15062],{},"Developers building reasoning and reactive AI agents often grapple with two messaging patterns: streaming (event streams) and queuing (work\u002Ftask queues). It’s crucial to understand the difference because effective AI agents typically need both patterns in their architecture. In this first post, we’ll clarify stream vs. queue semantics and show how Apache Pulsar (protocol) uniquely delivers both out of the box, unlike Apache Kafka (protocol) which was designed around streams. 
We’ll use practical examples (imagine continuous sensor inputs and discrete task execution requests) to illustrate why agents demand both patterns and how Pulsar handles them natively.",[48,15064,15065],{},[384,15066],{"alt":18,"src":15067},"\u002Fimgs\u002Fblogs\u002F684829362b8d575a3d730acd_AD_4nXdLgRoJfi_Jv_uA9N_A7TskyTTrxYGJOt-Ec4kuOVheXHc8o1d2dfC9g8FpQVba7p8_9F7Dnc4is2B4NPLA85dM65VthE6Tly-rx62RdD4MdM2WVUM_GARgdnVj5xFdLgeIGZOZ6g.png",[40,15069,15071],{"id":15070},"stream-vs-queue-semantics-101","Stream vs. Queue Semantics 101",[48,15073,15074],{},"In streaming message systems, producers append data to an unbounded, ordered log (the stream). Consumers then read from this log in sequence, maintaining an offset (position) in the stream. Order is guaranteed per partition, and messages aren’t removed on consumption (they remain for a retention period). This is great when event order matters – for example, time-series sensor data or user click events should be processed in the exact order produced. Apache Kafka is the classic example of a streaming platform: it provides high throughput and strict ordering by partitions, which makes it ideal for ingesting ordered event streams.",[48,15076,15077],{},"In queuing message systems, producers send messages to a queue, and each message is processed by only one consumer (even if many consumers are listening). Consumers pull from the queue and acknowledge each message when done, upon which it’s removed from the queue. Queues excel at distributing tasks or jobs that can be done in parallel without a global ordering requirement. This pattern is common for background work: e.g. an agent that needs to perform independent tasks (send emails, execute API calls) can put those tasks on a queue, and a pool of workers will split them up. Systems like RabbitMQ or Amazon SQS embody queue semantics – focusing on one-message-per-consumer with robust features like message retries and dead-lettering.",[48,15079,15080],{},"Why do agents need both? Because AI agents operate in real-time environments and must perform reliable actions. For instance, consider a robotic agent: it ingests a stream of sensor readings (continuous, ordered data) while also handling discrete commands or tasks (which can be processed independently). A streaming pipeline ensures the robot’s perception of the world stays ordered (you don’t want to react to events out of sequence). A queue ensures the robot can execute tasks concurrently or retry failures without halting all other work. In practice, the most powerful systems leverage both patterns – streaming for live data feeds and queueing for task execution. Real-world examples include an IoT monitoring agent that uses streaming for sensor telemetry, plus queueing to distribute analysis jobs or alerts based on those sensor events.",[40,15082,15084],{"id":15083},"how-pulsar-and-kafka-handle-these-patterns","How Pulsar and Kafka Handle These Patterns",[48,15086,15087],{},"Apache Kafka (protocol) was originally built around the stream model. It provides high-performance ordered logs, but it doesn’t natively implement traditional queue semantics. You can use Kafka like a queue to some extent – for example, by creating a topic with multiple partitions and a consumer group so that each message goes to one consumer. However, because Kafka enforces per-partition order, this approach comes with caveats. 
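The offset-commit pattern those caveats refer to looks roughly like this. It is a minimal sketch, assuming the Java Kafka client, with illustrative topic and group names; note that even committing after every single record only ever says "everything up to this offset is done", never "this specific record is done":

```java
import java.time.Duration;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;

public class KafkaQueueWorker {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");          // illustrative address
        props.put("group.id", "task-workers");
        props.put("enable.auto.commit", "false");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("agent-tasks"));
            while (true) {
                for (ConsumerRecord<String, String> record : consumer.poll(Duration.ofSeconds(1))) {
                    handle(record.value());    // if this throws, the partition stalls here
                    // The commit acknowledges everything up to this offset,
                    // not just this one record.
                    consumer.commitSync(Map.of(
                            new TopicPartition(record.topic(), record.partition()),
                            new OffsetAndMetadata(record.offset() + 1)));
                }
            }
        }
    }

    private static void handle(String task) {
        // placeholder for the worker's task logic
    }
}
```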
If one message in a partition takes a long time to process or a consumer instance crashes, subsequent messages in that partition are blocked until the slow message is handled (since Kafka consumers can only mark progress by committing the offset up to the last processed message). In effect, a slow or stuck message can stall that “queue.” A common workaround in Kafka is to have the application catch a failed message and publish it to a separate topic (acting as a manual dead-letter queue or retry queue). But this introduces extra complexity – developers need to build custom logic for rerouting or reprocessing failed events, manage multiple topics for what conceptually is one queue, and potentially re-order results that arrive via retries.",[48,15089,15090],{},"Apache Pulsar (protocol) was designed to natively support both streaming and queueing paradigms. Pulsar topics are append-only logs like Kafka, but Pulsar’s consumer model is more flexible. Pulsar supports multiple subscription types on topics: for example, a Shared subscription lets multiple consumers fetch from the same topic in a round-robin fashion (each message goes to one consumer, like a work queue). This enables true distributed queuing on a single topic – you can have, say, 10 consumers all pulling tasks from one Pulsar topic and the broker will balance the load among them. Crucially, each consumer individually acknowledges messages in Pulsar, so the system knows exactly which messages were processed. If one consumer is slow or fails, it doesn’t hold up others – unacknowledged messages can be redelivered to another worker as needed (more on this in Post 2). The Exclusive\u002FFailover subscription modes, on the other hand, let only one consumer (or one primary with a hot standby) consume a topic, preserving total order like Kafka’s semantics. And Pulsar even has a Key_Shared mode where messages are distributed but ordering is maintained per key – effectively a hybrid that ensures all messages for a given entity go to the same consumer in order, while still load-balancing different keys across consumers.",[48,15092,15093,15094,15098],{},"What this means is that Pulsar delivers true queue and stream capabilities in one system. You can treat a Pulsar topic like a Kafka stream and\u002For like a distributed queue depending on the subscription. Under the hood, it’s the same topic, but the consumption pattern adapts to your needs. For example, a Pulsar topic with a Shared subscription is analogous to a RabbitMQ queue – multiple consumers each get a subset of messages – whereas the same topic could have another subscription that behaves like a Kafka stream (with a dedicated consumer reading the full ordered log). Indeed, Pulsar’s heritage at Yahoo was as a unified messaging platform intended to replace both their Kafka (stream) and RabbitMQ (queue) use cases. As one case study noted, Kafka was excellent for ordered event ingestion, but Yahoo’s team “",[55,15095,15097],{"href":15096},"\u002Fblog\u002Fhow-apache-pulsar-is-helping-iterable-scale-its-customer-engagement-platform#:~:text=When%20we%20started%20evaluating%20Pulsar%2C,challenging%20to%20find%20an%20alternative","used RabbitMQ for other use cases since Kafka lacked the necessary work-queue semantics","”. 
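In client code, that unification comes down to choosing a subscription type. Here is a minimal sketch, assuming the Java Pulsar client, of one topic serving an ordered stream reader, a shared work queue, and a key-ordered hybrid at the same time (topic and subscription names are illustrative):

```java
import org.apache.pulsar.client.api.Consumer;
import org.apache.pulsar.client.api.PulsarClient;
import org.apache.pulsar.client.api.SubscriptionType;

public class OneTopicTwoPatterns {
    public static void main(String[] args) throws Exception {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650")          // illustrative address
                .build();

        // Stream view: a single exclusive consumer reads the full log in order.
        Consumer<byte[]> streamReader = client.newConsumer()
                .topic("robot-events")
                .subscriptionName("ordered-reader")
                .subscriptionType(SubscriptionType.Exclusive)
                .subscribe();

        // Queue view on the same topic: each message on this subscription
        // goes to exactly one of the attached consumers, round-robin.
        Consumer<byte[]> worker = client.newConsumer()
                .topic("robot-events")
                .subscriptionName("task-workers")
                .subscriptionType(SubscriptionType.Shared)
                .subscribe();

        // Hybrid: load-balanced across consumers while preserving per-key ordering.
        Consumer<byte[]> keyedWorker = client.newConsumer()
                .topic("robot-events")
                .subscriptionName("per-device-workers")
                .subscriptionType(SubscriptionType.Key_Shared)
                .subscribe();
    }
}
```

Each subscription tracks its own position and acknowledgments independently, so the stream reader and the work-queue consumers never interfere with one another.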
Pulsar was adopted because it could cover all Kafka-like and RabbitMQ-like scenarios in a single, scalable system.",[40,15100,15102],{"id":15101},"practical-example-sensor-streams-task-queues","Practical Example: Sensor Streams + Task Queues",[48,15104,15105],{},"Let’s revisit the example of an AI-powered robot for a concrete scenario:",[321,15107,15108,15111],{},[324,15109,15110],{},"Sensor input (streaming): The robot’s vision or telemetry sensors publish a constant stream of events (images, lidar scans, etc.). These need to be processed in order and possibly replayed for debugging. Using Pulsar, the sensor topics could be consumed with an exclusive subscription (strict order) by a stream processing component. Kafka could also handle this part well, as it’s a straight event log.",[324,15112,15113],{},"Task execution (queuing): When the robot’s AI decides on actions (e.g. “pick up object” or “navigate to location”), those tasks are added to a work queue. Here Pulsar shines: the tasks can be sent to a Pulsar topic with a shared subscription, so multiple executor modules (consumers) will divide the tasks. Each task message goes to one executor, which acknowledges it upon completion. If a task fails, the executor can negative-acknowledge it (signal a failure) and another instance can retry (we’ll explain this mechanism later). In Kafka, implementing this queue would be clumsier – you might create a single-partition topic (to ensure one consumer at a time) or a partition per consumer, but then you lose parallelism or have to predefine partitions. And without per-message ack, error handling would require manual intervention (like writing failed tasks to a new “retry” topic).",[48,15115,15116],{},"By using Pulsar for both patterns, our hypothetical robot agents get the best of both worlds seamlessly. There’s no need to run separate systems (Kafka for streams and a RabbitMQ or SQS for queues) and then glue them together. Pulsar can ingest the high-rate sensor streams and dispatch tasks with queue semantics in one unified platform. This simplicity translates to a more cohesive architecture for AI agents, where every kind of message – whether an ordered event or a one-off task – can flow through the same Pulsar cluster. It reduces operational overhead and eliminates the impedance mismatch when bridging different messaging systems. As developers, we can focus on our agent logic rather than on plumbing data between Kafka topics and a separate queue service.",[48,15118,8417],{},[321,15120,15121,15124,15127],{},[324,15122,15123],{},"Streams vs Queues: Streaming systems preserve an ordered log of events for replay or sequential processing, while queues distribute individual messages to consumers for parallel task execution. AI agents commonly require both patterns (e.g. process sensor events in order, handle commands\u002Ftasks concurrently).",[324,15125,15126],{},"Kafka’s limitation: Kafka natively provides streams, not work-queues. You can simulate queues on Kafka but face complications due to strict ordering and offset-based acknowledgments. A slow or failed message can block a partition, and handling retries means extra topics and custom logic.",[324,15128,15129],{},"Pulsar’s advantage: Pulsar supports both messaging semantics natively. Its flexible subscription modes (exclusive, shared, failover, key_shared) let you pick the right tool for the job on the same platform. You get Kafka-like high-throughput streams and RabbitMQ-like distributed queues in one system. 
This means less system sprawl and easier integration between components – a big win for complex AI agent architectures.",[48,15131,15132],{},"In the next post, we’ll delve into reliability and resilience – exploring how Pulsar’s acknowledgment and retry mechanisms keep AI agent pipelines robust where Kafka’s model can struggle.",[48,15134,15135],{},"Try out Pulsar on StreamNative Cloud!",[48,15137,3931],{},{"title":18,"searchDepth":19,"depth":19,"links":15139},[15140,15141,15142],{"id":15070,"depth":19,"text":15071},{"id":15083,"depth":19,"text":15084},{"id":15101,"depth":19,"text":15102},"2025-06-10","This blog discusses the importance of understanding streaming and queuing messaging patterns for building effective AI agents, highlighting how Apache Pulsar uniquely supports both. Pulsar's flexible subscription modes allow it to handle both continuous data streams and discrete task queues within a single platform, unlike Apache Kafka, which is primarily designed for streams. This dual capability simplifies architecture for AI agents, reducing the need for separate systems and enhancing operational efficiency.","\u002Fimgs\u002Fblogs\u002F68482b07e4686003e6b6e82c_blog-streams-vs-queues.png",{},"\u002Fblog\u002Fstreams-vs-queues-why-your-agents-need-both-and-why-pulsar-protocol-delivers",{"title":13858,"description":15144},"blog\u002Fstreams-vs-queues-why-your-agents-need-both--and-why-pulsar-protocol-delivers",[3988,10054,821],"K1pNQX29yXWHkN31YHcudBkEvQtOFr7ualIEqtdq110",{"id":15153,"title":15154,"authors":15155,"body":15157,"category":7338,"createdAt":290,"date":15389,"description":15390,"extension":8,"featured":294,"image":15391,"isDraft":294,"link":290,"meta":15392,"navigation":7,"order":296,"path":5380,"readingTime":3556,"relatedResources":290,"seo":15393,"stem":15394,"tags":15395,"__hash__":15396},"blogs\u002Fblog\u002Fdata-streaming-summit-virtual-2025-recap.md","Data Streaming Summit Virtual 2025 Recap",[15156],"Emma Tian",{"type":15,"value":15158,"toc":15379},[15159,15163,15167,15170,15178,15181,15185,15188,15191,15194,15197,15200,15204,15207,15216,15224,15227,15230,15233,15236,15240,15243,15246,15249,15252,15255,15258,15261,15264,15268,15271,15274,15277,15280,15283,15286,15289,15293,15296,15307,15310,15314,15317,15337,15340,15344,15347,15350,15353,15376],[40,15160,15162],{"id":15161},"agentic-ai-the-new-paradigm-for-intelligent-systems","‍Agentic AI: The New Paradigm for Intelligent Systems",[225,15164,15166],{"id":15165},"we-are-entering-this-new-agentic-evolution","“We are entering this new agentic evolution.”",[48,15168,15169],{},"— Sijie Guo, Co-founder & CEO, StreamNative",[48,15171,15172,15173,15177],{},"As the digital world shifts away from static models towards systems of continuous adaptation, the ",[55,15174,5383],{"href":15175,"rel":15176},"https:\u002F\u002Fdeploy-preview-673--datastreaming-summit.netlify.app\u002Fevent\u002Fdata-streaming-virtual-2025",[264]," spotlighted a decisive transformation: the rise of Agentic AI. Over two days, with more than 36 sessions from top minds in data streaming, the summit mapped out how real-time technologies, open-source ecosystems, and unified architectures are driving the next wave of intelligent systems.",[48,15179,15180],{},"In this recap, we highlight the summit’s three central themes: Agentic AI, the open-source revolution in data infrastructure, and the convergence of stream and batch into the Streaming Lakehouse. 
We’ll also spotlight user stories and technical innovations shaping the road ahead.",[40,15182,15184],{"id":15183},"keynote-highlights","Keynote Highlights",[48,15186,15187],{},"The summit opened with a bold vision. Sijie Guo, CEO of StreamNative, framed the emergence of Agentic AI as “a new evolution,” forecasting a future where autonomous agents powered by real-time streams are not a luxury—but the norm. “Every enterprise will run real-time intelligent agents as a standard part of their operations,” he declared.",[48,15189,15190],{},"Matteo Merli, StreamNative CTO, shared the latest updates on the Ursa Engine: a Kafka-compatible, cloud-native streaming engine built on the success of Apache Pulsar. “Ursa Engine brings 95% cost savings for real-time workloads”, Merli shared, citing its separation of storage and compute, Lakehouse-native storage, and compatibility with Kafka protocol.",[48,15192,15193],{},"The release of Apache Flink 2.0 was another milestone, introduced by Xintong Song of Alibaba Cloud. “Flink 2.0 unlocks more AI use cases with lower costs”, he noted, referencing its disaggregated state management and AI-focused APIs that make streaming more accessible and scalable than ever.",[48,15195,15196],{},"​​Q6 Cyber offered a practitioner’s perspective on stream-first architecture. In their session, the team shared how they replaced a complex patchwork of cloud services and homegrown queues with Apache Pulsar at the center of their stack, streaming over 75 billion records into a Hudi lakehouse. They overcame serialization bottlenecks, multithreaded performance challenges, and scaled Pulsar Functions to meet the demands of large-scale security telemetry. This real-world story reinforced the summit’s themes of architectural simplification, open-source reliability, and scalability in mission-critical environments",[48,15198,15199],{},"These announcements reflected an industry-wide push for open, composable data architectures—laying the groundwork for intelligent, event-driven agentic systems at enterprise scale.",[40,15201,15203],{"id":15202},"agentic-ai-from-model-centric-to-agent-centric-systems","Agentic AI: From Model-Centric to Agent-Centric Systems",[48,15205,15206],{},"Agentic AI marks a pivot from passive models to autonomous, goal-driven agents that interact with and respond to real-time signals. This evolution was a central theme across the summit and set the tone for how AI systems will be built and operated in the future.",[48,15208,15209,15210,15215],{},"In a major keynote announcement, Neng Lu, Director of Platform Engineering at StreamNative, introduced the ",[55,15211,15214],{"href":15212,"rel":15213},"https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=Fp4vf-QXbb0&t=4618s",[264],"StreamNative Agent Engine",", a new runtime that enables autonomous AI agents to process real-time events, reason with context, and take intelligent actions. “The Agent Engine is designed for systems that think and act”, Lu explained. It supports both Kafka and Pulsar protocols, integrates with frameworks like LangChain, LlamaIndex, and Google ADK, and provides a unified registry to manage both deterministic streaming functions and statistical agents (workflows). The Agent Engine combines deterministic event processing with context-aware decision-making—blending the reliability of stream processing with the flexibility of LLMs.",[48,15217,15218,15219,15223],{},"The Agent Engine is built to power the next generation of applications—from real-time copilots to self-healing infrastructure. 
According to the ",[55,15220,15222],{"href":15221},"\u002Fblog\u002Fintroducing-the-streamnative-agent-engine","launch blog",", it consists of modular components: a function runtime for event handling, a context store for shared memory, and a decision loop that ties together agents and environments. It positions StreamNative as a pioneer in agent-native infrastructure, bringing AI and stream processing into a unified, programmable environment.",[48,15225,15226],{},"Other sessions expanded on this vision. Mary Grygleski, in her session on generative AI workflows, explained how event-driven systems facilitate asynchronous communication between agents: “Event-driven architectures allow agents to work independently and scale flexibly,” she noted—critical attributes for distributed, multi-agent systems.",[48,15228,15229],{},"Andrew Brooks (Contextual Software) linked real-time streaming directly to business value: “Speed to processing means speed to payment,” he emphasized, illustrating how streaming pipelines improve responsiveness and ROI.",[48,15231,15232],{},"Finally, Hubert Zhang (EloqData) provided an architectural view of how Apache Pulsar supports elastic AI-native pipelines. “Pulsar gives you great streaming. EloqDoc gives you a better document store. ConvertDB ties it all together,” he explained, showing how decoupling compute and storage improves both scalability and cost-efficiency.",[48,15234,15235],{},"Together, these sessions and product launches reflect the rise of agentic architectures as not just a technical shift, but a strategic imperative for building intelligent, autonomous systems. As real-time context becomes the fuel for AI agents, infrastructures like the Agent Engine will be foundational for the next wave of enterprise AI.",[40,15237,15239],{"id":15238},"open-source-ecosystem-innovation-through-community","Open Source Ecosystem: Innovation through Community",[48,15241,15242],{},"The open-source ethos was front and center at the Data Streaming Summit Virtual 2025, showcasing how diverse communities are collaboratively shaping the future of real-time data streaming. As we are entering this new agentic evolution, open source is the foundation that will carry us forward.",[48,15244,15245],{},"This year’s summit featured technical leaders from across the open-source ecosystem: Kafka, Pulsar, and Ursa in the streaming layer; Flink, Spark, and RisingWave in processing; and Iceberg and Hudi in the lakehouse tier. This diversity reflects a maturing community—one that recognizes no single project can address every use case, but together, they form a cohesive and interoperable stack.",[48,15247,15248],{},"Sessions exemplified this collaborative spirit. David Kjerrumgaard from StreamNative and Peter Corless from StarTree introduced StreamQoS, an open standard for defining performance and SLA expectations across Kafka, RabbitMQ, and Pulsar, inviting the community to participate: “Get involved. We want your feedback. That’s the whole point of this.”",[48,15250,15251],{},"Penghui Li demonstrated how Ursa, a Kafka-compatible platform built on open table formats like Iceberg and Delta Lake, dramatically cuts inter-zone traffic via an S3-native architecture. 
“We saved almost all the internet traffic,” he said, underscoring the cost efficiencies possible through shared infrastructure and community innovation.",[48,15253,15254],{},"Other talks emphasized cross-project interoperability: from validating streaming correctness at scale with tooling built for Kafka, Pulsar, and Ursa, to exploring metadata unification with Oxia, a Zookeeper alternative designed to serve multiple ecosystems. A standout session on the “Apache Kafka API: The Unofficial Standard” explored how compatibility across platforms is helping break down silos while preserving developer familiarity.",[48,15256,15257],{},"As Apurva Mehta of Responsive shared in “Why Stream Processors Must Evolve,” the call for modular, Kubernetes-native, and community-driven processors is clear: “The obvious solution is to unbundle the state and control plane layers of Kafka Streams.”",[48,15259,15260],{},"By bringing together contributors and users from varied technologies—rather than promoting a single project—the summit highlighted a rising movement: one of interoperability over lock-in, modularity over monoliths, and open participation over vendor control.",[48,15262,15263],{},"The Data Streaming Summit is not just a gathering of like-minded developers—it's a reflection of a global, collaborative momentum across the open-source data stack. Whether you’re building with Flink or Spark, Kafka or Pulsar, Iceberg or Hudi, the summit reinforced a unifying message: we’re stronger when we build together.",[40,15265,15267],{"id":15266},"streaming-lakehouse-converging-stream-and-batch","Streaming Lakehouse: Converging Stream and Batch",[48,15269,15270],{},"A key theme of the summit was the convergence of real-time and historical analytics in a unified architecture: the Streaming Lakehouse. This approach breaks down the silos between streaming ingestion and analytical querying, enabling faster, more cost-effective data pipelines.",[48,15272,15273],{},"In \"Fluss: Reinventing Kafka for the Real-Time Lakehouse,\" Jark Wu of Alibaba Cloud introduced a Kafka-compatible, lakehouse-native engine that supports real-time reads, writes, deletes, and key lookups—all using a columnar storage format. “Fluss supports real-time streaming reads and writes, just like Kafka, but also supports updates, deletes, and key lookups,” Wu explained, highlighting both performance and cost efficiency.",[48,15275,15276],{},"Motorq’s Anirudh TN showcased how streaming into Snowflake with StreamNative’s Kafka connectors and Snowpipe Streaming eliminated the need for intermediate storage. “Our data latency dropped to seconds—and our cost dropped 2.5x,” he noted, affirming the economic case for streaming-first architecture.",[48,15278,15279],{},"Ververica’s Abdul Rehman Zafar presented a blueprint for replacing traditional ETL using Apache Flink, Iceberg, and Paimon. \"Using Paimon, you can replace Kafka completely,” he said, positioning Paimon as a streaming-native catalog store that merges batch and stream semantics.",[48,15281,15282],{},"Lee Kear from AWS introduced Amazon S3 Tables, a new abstraction over S3 buckets optimized for high-throughput Iceberg ingestion. “With S3 Tables, you get up to 10 times the transactions per second, or TPS, out of the box,” Kear explained. 
The system supports real-time analytics with smart partitioning and compaction strategies while simplifying security and performance tuning for streaming data lakes.",[48,15284,15285],{},"Dipankar Mazumdar from Onehouse.ai explored the concurrency challenge in streaming pipelines in his session, \"High-Throughput Streaming in Lakehouse with Non-Blocking Concurrency Control (NBCC)\". He demonstrated how NBCC in Apache Hudi eliminates write conflicts by enabling simultaneous ingestion across multiple writers. “Non-blocking concurrency control delivers zero write conflicts and consistent reads—all while Flink keeps every writer at full speed,” he explained, marking a significant advancement over traditional Optimistic Concurrency Control models.",[48,15287,15288],{},"Together, these sessions highlight how the streaming lakehouse has matured from a theoretical goal to a production-ready design pattern. Whether optimizing for cost, throughput, or developer simplicity, the future of data platforms lies in unifying batch and stream—delivered through an ecosystem of interoperable, cloud-native technologies.",[40,15290,15292],{"id":15291},"user-spotlights-real-world-transformation","User Spotlights: Real-World Transformation",[48,15294,15295],{},"Real-world use cases at the summit demonstrated how data streaming transforms industries:",[321,15297,15298,15301,15304],{},[324,15299,15300],{},"Netflix processes over 14 trillion records daily via Kafka and Flink, using a Data Mesh architecture. “We handle up to 100 million events per second,” said Sujay Jain, enabling real-time recommendations and game analytics.",[324,15302,15303],{},"A European bank achieved 4x faster performance and 30% cost savings by tuning Flink SQL for lower state usage and optimized memory. “The checkpoint time dropped 60%,” shared Zafar.",[324,15305,15306],{},"Attentive, a messaging platform, overcame distributed locking challenges using Pulsar’s Key Shared subscription. “We sent 620 million messages on Black Friday—without issues,” said Staff Engineer Danish Rehman.",[48,15308,15309],{},"These stories validate the summit’s themes: scalability, elasticity, and real-time intelligence are not theoretical—they're achievable today.",[40,15311,15313],{"id":15312},"technical-innovations-shaping-the-future","Technical Innovations Shaping the Future",[48,15315,15316],{},"Several technical breakthroughs stood out:",[321,15318,15319,15322,15325,15328,15331,15334],{},[324,15320,15321],{},"Fluss eliminates Kafka’s need for compaction by using columnar storage and supporting updates and deletes—ideal for lakehouse-native streaming with real-time and historical query unification.",[324,15323,15324],{},"Snowpipe Streaming + Kafka Connect accelerates data pipelines with near-zero latency and lower cloud spend by removing intermediate storage and simplifying schema evolution.",[324,15326,15327],{},"StreamQoS introduces cross-protocol QoS negotiation for messaging systems like Kafka, Pulsar, and RabbitMQ, allowing SLAs to be enforced dynamically via open metadata standards.",[324,15329,15330],{},"Ursa implements Kafka topic compaction on S3, optimizing for durability and cost. By using minor and major compactions entirely on cloud object storage, consumers can reconstruct state efficiently without broker disks.",[324,15332,15333],{},"PuppyGraph + Ursa enables real-time graph analytics on data lakes, eliminating the need for dedicated graph databases. 
Streaming data can be queried using Gremlin or openCypher directly over Iceberg tables—ideal for cybersecurity and observability use cases.",[324,15335,15336],{},"Oxia provides a cloud-native alternative to Zookeeper, offering scalable metadata and index storage with a sharded architecture and stateless coordination. It supports real-time workloads while minimizing latency and operational overhead",[48,15338,15339],{},"These innovations signal a future where data infrastructure is modular, intelligent, and optimized for continuous learning.",[40,15341,15343],{"id":15342},"looking-ahead-shaping-the-intelligent-data-backbone","Looking Ahead: Shaping the Intelligent Data Backbone",[48,15345,15346],{},"The summit made one thing clear: the age of Agentic AI is here, and real-time data is its backbone. Organizations that embrace open-source innovation, unify their data processing with streaming lakehouses, and build for intelligent agents will lead the next decade.",[48,15348,15349],{},"As we stand at the edge of this transformation, the invitation is clear: join the movement. Build systems that are open, intelligent, and always in motion.",[48,15351,15352],{},"Explore more from the Data Streaming Summit:",[321,15354,15355,15361,15367],{},[324,15356,15357,15358],{},"📺 ",[55,15359,15360],{"href":6141},"Watch all session recordings",[324,15362,15363,15364],{},"🤖 ",[55,15365,15366],{"href":10293},"Check out our Agentic AI blog series",[324,15368,15369,15370,15375],{},"💡 ",[55,15371,15374],{"href":15372,"rel":15373},"https:\u002F\u002Fsessionize.com\u002Fdata-streaming-summit-sf-2025\u002F",[264],"Submit your talk"," to the upcoming Data Streaming Summit happening on Sept 29 - 30 in San Francisco.",[48,15377,15378],{},"The agentic evolution is underway – join us in building the intelligent, real-time future!",{"title":18,"searchDepth":19,"depth":19,"links":15380},[15381,15382,15383,15384,15385,15386,15387,15388],{"id":15161,"depth":19,"text":15162},{"id":15183,"depth":19,"text":15184},{"id":15202,"depth":19,"text":15203},{"id":15238,"depth":19,"text":15239},{"id":15266,"depth":19,"text":15267},{"id":15291,"depth":19,"text":15292},{"id":15312,"depth":19,"text":15313},{"id":15342,"depth":19,"text":15343},"2025-06-06","The Data Streaming Summit Virtual 2025 showcased the rise of Agentic AI, transforming systems into dynamic, real-time agents. Over two days, leaders explored Agentic AI, open-source innovations, and the Streaming Lakehouse. Keynotes highlighted advances like the Ursa Engine and Apache Flink 2.0, focusing on real-time intelligence and efficiency. Real-world stories demonstrated industry impacts, while sessions introduced breakthroughs in data streaming. 
The summit emphasized the importance of embracing open-source, real-time data, and intelligent agents for future success.","\u002Fimgs\u002Fblogs\u002F684119a631b33a39e3d44adc_DSSV25-social-media-v2.0-1.png",{},{"title":15154,"description":15390},"blog\u002Fdata-streaming-summit-virtual-2025-recap",[5376,3988],"wM2s6XXfqe6yy529KciOSugm7_Pyc9Eq1eZhgsnY2v8",{"id":15398,"title":15399,"authors":15400,"body":15401,"category":6415,"createdAt":290,"date":15592,"description":15593,"extension":8,"featured":294,"image":15594,"isDraft":294,"link":290,"meta":15595,"navigation":7,"order":296,"path":15221,"readingTime":5505,"relatedResources":290,"seo":15596,"stem":15597,"tags":15598,"__hash__":15599},"blogs\u002Fblog\u002Fintroducing-the-streamnative-agent-engine.md","Introducing the StreamNative Agent Engine (Early Access): Your Intelligent Event Backbone for Enterprise-Scale AI Agents",[810,6500,806],{"type":15,"value":15402,"toc":15584},[15403,15406,15410,15418,15421,15425,15428,15431,15436,15439,15456,15459,15463,15466,15474,15479,15500,15503,15507,15510,15513,15516,15519,15522,15525,15533,15536,15539,15543,15546,15551,15554,15557,15560,15564,15573,15581],[48,15404,15405],{},"Real-time AI agents have captured our imaginations – from autonomous customer support bots to supply chain optimizers that adapt on the fly. The promise is huge: AI systems that can observe, reason, and act continuously on live data, without human prompts at every step. Yet building these intelligent agents in production has been an uphill battle. Many teams experimenting with agent frameworks find themselves hitting walls when moving from demos to real-world systems. Why? The infrastructure just isn’t there – data is siloed, integrations are brittle, and operations get overwhelming. It’s a pain point and an opportunity: those who solve it will unlock the next generation of AI-driven applications.",[40,15407,15409],{"id":15408},"the-challenge-fragmented-data-fragile-pipelines-and-high-operational-cost","The Challenge: Fragmented Data, Fragile Pipelines, and High Operational Cost",[48,15411,15412,15413,15417],{},"Today’s AI agents are often confined to isolated pockets, lacking a unified source of truth or a reliable way to work together. Consider a typical enterprise setup: one agent might be a chatbot fine-tuned on support tickets, another a script making API calls for analytics – each is an island. This fragmentation means no shared memory or context. ",[55,15414,15416],{"href":15415},"\u002Fblog\u002Fai-agents-real-time-data-bridge#:~:text=bad%20old%20days%20of%20applications,without%20fixing%20these%20silos%2C%20it","Agents operate on stale snapshots of data or their own narrow knowledge base, leading to redundant efforts and missed insights",". To make matters worse, connecting agents to fresh data streams or third-party tools means complex custom integrations – glue code, custom connectors, CLI “babysitting” – which become fragile pipelines that break with any change. It’s not uncommon to spend more time managing these data plumbing and orchestration scripts than developing the agent’s logic.",[48,15419,15420],{},"The operational burden of agent systems today is high. Each agent (or chain of agents) often runs in its own siloed process, with its own scheduling and error handling. Observability is minimal – when something goes wrong or an agent makes an odd decision, tracing back the why is incredibly difficult. 
Every agent maintains its own opaque state, making it “painful to reproduce decisions, satisfy compliance reviews, or debug issues across the fleet”. Lack of auditing and centralized monitoring isn’t just inconvenient – it’s risky in enterprise environments. All these challenges result in slow rollouts for any organization trying to leverage advanced AI agents. In short, the vision of autonomous, real-time AI collides with the reality of brittle infrastructure and siloed intelligence.",[40,15422,15424],{"id":15423},"a-streaming-native-solution-streamnative-agent-engine","A Streaming-Native Solution: StreamNative Agent Engine",[48,15426,15427],{},"It’s clear that a new approach is needed – one that treats real-time data as a first-class citizen and provides robust infrastructure for always-on AI agents. Today, we’re excited to introduce StreamNative Agent Engine, an event-driven, streaming-native runtime for deploying, managing, and coordinating AI agents at scale. In a nutshell, StreamNative Agent Engine is the missing backbone that takes you from “toy agent in a notebook” to production-grade autonomous services.",[48,15429,15430],{},"What makes it different? For starters, the Agent Engine is built on the proven foundation of Apache Pulsar’s serverless compute framework - Pulsar Functions, but evolved specifically for AI agents in real-time environments. This means every agent deployed is effectively a lightweight function that can ingest and emit events on a shared bus. Under the hood, we’ve repurposed this battle-tested streaming engine to handle long-lived AI agent workloads. Use the agent SDK you already know—LangChain, LlamaIndex, CrewAI, or anything else—without rewriting a line of code. Just package the agent like a serverless function, deploy it, and it automatically joins the shared event bus and service registry. From the moment it goes live, the agent taps into streaming data, keeps its own state, and emits actions — all fully governed and observable by the platform.",[48,15432,15433],{},[384,15434],{"alt":18,"src":15435},"\u002Fimgs\u002Fblogs\u002F68366c40d71596f214d73cad_AD_4nXdAqya4MABC1eHyMGuPTaK4_FTY_okkgBCp-oRagXF8wV0z4rlT7cgW27LL6sUL6VlQ9NUBcEEOKHvwsALXOTexfBSrb47qPDd_WMmQzdmiB_RmEv2jlPGY8ZhOv4zgUBvCdeeO.png",[48,15437,15438],{},"Crucially, StreamNative Agent Engine was designed to address the very pain points that have hampered agent projects in the past:",[321,15440,15441,15444,15447,15450,15453],{},[324,15442,15443],{},"Unified Event Bus for Context: All agents connect to event streams rather than operating in silos. This event bus acts as a “nervous system” linking your agents. An agent no longer has to poll for updates or work with stale data dumps – it can react to events (sensor readings, user actions, database updates, etc.) the instant they occur. The event bus provides up-to-the-moment context to every agent and also serves as a medium for agents to communicate with each other in real time. This dramatically reduces fragmentation and duplicated efforts, as agents can share facts and state through events.",[324,15445,15446],{},"Streaming Memory and State: Each agent in the Engine can have its own persistent state (backed by Pulsar Functions’ distributed state), allowing it to maintain memory beyond a single prompt\u002Fresponse cycle. Because the state is distributed and streaming-native, an agent’s observations or intermediate conclusions can be logged as events and stored for later recall. 
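To make that externalized memory concrete, here is a minimal sketch using the Apache Pulsar Functions Python SDK's state and publish APIs. The topic name, state key, and payload handling are hypothetical illustrations, not the Agent Engine's actual interfaces.

```python
from pulsar import Function

class ObservingAgent(Function):
    """Toy agent that keeps a running observation count in function state
    and logs each observation as an event for later inspection or audit."""

    def process(self, input, context):
        # Persist a small piece of "memory" in Pulsar Functions' distributed state.
        seen_before = context.get_counter("observations")
        context.incr_counter("observations", 1)

        # Externalize the observation itself as an event other agents (or auditors) can read.
        context.publish(
            "persistent://public/default/agent-observations",  # hypothetical topic
            f"seen={seen_before + 1} payload={input}".encode("utf-8"),
        )
        return None
```

Because the counter and the published observations live outside the agent process, they can be inspected, audited, or replayed independently of the agent itself.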
No more opaque black boxes – an agent’s “memory” can be externalized and even inspected or audited when needed. This design tackles the observability issue: you get a traceable event log of agent decisions and the data that informed them.",[324,15448,15449],{},"Fault-Tolerant, Scalable Architecture: By leveraging existing data streaming infrastructure, the Agent Engine inherently supports horizontal scaling, load balancing, and fault tolerance. Agents are distributed across the cluster (no single choke point) and can be scaled out to handle higher event volumes or compute needs. If one instance fails, the system can restart it or shift work to others – preventing the “single point of failure” scenario where one crashed agent script brings down an entire workflow. The architecture is cloud-native and battle-tested, so you don’t have to reinvent reliability for your AI logic.",[324,15451,15452],{},"Dynamic Composition vs. Monoliths: Traditional agent frameworks often produce a monolithic chain-of-thought – one big Python “main” function that orchestrates all steps, making it hard to reuse or modify parts. In contrast, StreamNative Agent Engine encourages a decomposed, modular approach. Complex tasks can be broken into multiple smaller agent functions that publish and subscribe to events from each other. Execution flows become dynamic and determined at runtime by events and conditions, not a fixed hardcoded sequence. This not only improves flexibility (agents can decide to invoke different tools or sub-agents based on live data), but also means pieces of the workflow can evolve independently. You can add or update one agent service without touching the others, akin to microservices architecture – bringing software best-practices to AI orchestration.",[324,15454,15455],{},"Observability and Governance Built-In: Because all interactions happen via an event bus and standard protocols (Kafka or Pulsar), it’s far easier to monitor and govern agent behaviors. StreamNative Agent Engine provides hooks for logging, tracing, and monitoring agent events, so you can see which events triggered which actions, how long steps took, and where any hiccups occurred. The Agent Registry offers a bird’s-eye view of all your deployed agents (and even connectors and functions) in one place. Want to pause an agent, roll out an update, or check its audit log? It’s all centrally managed. This level of observability and control is critical for enterprises to trust autonomous agents in production.",[48,15457,15458],{},"In short, StreamNative Agent Engine addresses the key needs for operationalizing AI agents: a real-time data backbone, a robust execution environment, and management tooling for visibility and control. It turns the idea of “AI agents living in the stream” into a practical reality.",[40,15460,15462],{"id":15461},"key-features-and-highlights","Key Features and Highlights",[48,15464,15465],{},"Let’s break down some of the standout features of the Agent Engine Early Access release:",[321,15467,15468,15471],{},[324,15469,15470],{},"🚀 Streaming-Native Runtime: The engine treats stream data as the default I\u002FO. Agents subscribe to Pulsar or Kafka topics for their inputs and can publish outputs or intermediate results to topics. This event-driven model means agents are always on, processing events as they arrive, rather than only responding to direct calls. They can also trigger one another by emitting events. 
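As a rough sketch of that event-in, event-out loop, the following assumes the Pulsar Functions Python SDK; the topic names and the reasoning stub are placeholders, and a real agent would call into whatever framework (LangChain, LlamaIndex, etc.) you already use.

```python
import json
from pulsar import Function

def reason_about(event: dict) -> dict:
    """Stand-in for the agent's reasoning step (an LLM call, a LangChain chain, etc.)."""
    return {"action": "open_ticket", "severity": "high", "source": event.get("id")}

class TriagingAgent(Function):
    """Consumes events from its input topic, reasons about each one, and emits an
    action event that any downstream agent can subscribe to."""

    def process(self, input, context):
        event = json.loads(input)
        decision = reason_about(event)
        # Emitting to another topic is how this agent "triggers" other agents.
        context.publish(
            "persistent://public/default/agent-actions",  # hypothetical topic
            json.dumps(decision).encode("utf-8"),
        )
```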
The result is a highly reactive system of agents, perfect for scenarios where data never sleeps.",[324,15472,15473],{},"🗄 Agent & Function Registry: All your agents, along with any supporting components (like Kafka\u002FPulsar connectors or Pulsar functions), are registered in a unified registry. This means every agent is discoverable by name and type, and you can manage them collectively. The registry is essentially a directory of your AI services – the “brains” (agents), “tools” (functions\u002Fconnectors), and their metadata. Agents can look up other agents or tools via the registry, enabling dynamic coordination (for example, an “orchestrator” agent could find and invoke a specific expert agent for a task). For platform teams, the registry offers a single control plane to govern versions, dependencies, and access control for these AI components.",[48,15475,15476],{},[384,15477],{"alt":18,"src":15478},"\u002Fimgs\u002Fblogs\u002F68366c40259eed2a4272e94c_AD_4nXdus7T6ceLhzrL8Fa4BrVou9hTgZcaYIJLNhm1p3V3vpgN_3kTigZX3OvUeFazAx4FY683qiNUyz-baln6HwB1sNIhz_wuhJeQYVNs_-21dPiGP6VHiaqBa6nfC7x45uJGoNut9sA.png",[321,15480,15481,15484,15497],{},[324,15482,15483],{},"🏗 Integration with Any Python Agent Framework: We built the Agent Engine to be framework-agnostic. It’s not here to replace great libraries like LangChain or Haystack, nor does it force you into a proprietary SDK. Instead, bring your existing agent code – whether it’s written with LangChain, LlamaIndex, the Google Cloud Agent Toolkit (ADK), OpenAI’s Agent SDK, or just vanilla Python – and run it within the Engine. Your agents still use their familiar planning\u002Freasoning libraries; the Engine takes care of the deployment, scaling, and event plumbing. This “bring-your-own-framework” approach means you can invest in agent logic without worrying about how to operationalize it later. In fact, our runtime can orchestrate agents built on different frameworks side by side – giving you the freedom to choose the right tool for each job.",[324,15485,15486,15487,15491,15492,15496],{},"🛠 Functions & Tools via MCP: StreamNative Agent Engine embraces the Model Context Protocol (MCP) – an open standard (initially introduced by Anthropic) for ",[55,15488,15490],{"href":15489},"\u002Fblog\u002Fintroducing-the-streamnative-mcp-server-connecting-streaming-data-to-ai-agents#:~:text=In%20the%20last%20blog%2C%20we,in%20a%20universal%2C%20consistent%20way","connecting AI agents to external tools and data in a safe, uniform way",". In practice, this means an agent can use “tools” (like databases, web services, or even Cloud APIs) through a standardized interface, treating them almost like extensions of the model’s capabilities. With MCP support, our Engine allows agents to, for example, read from a live data stream, call a REST API, or even ",[55,15493,15495],{"href":15494},"\u002Fblog\u002Fintroducing-the-streamnative-mcp-server-connecting-streaming-data-to-ai-agents#:~:text=Today%2C%20we%E2%80%99re%20thrilled%20to%20unveil,without%20wrestling%20with%20complex%20commands","manage a Pulsar cluster via natural language commands"," – all through a common protocol. MCP essentially provides a universal adapter for tools, so you don’t have to custom-code each integration. It’s a key part of making agents operational in real environments, where they must safely interact with the outside world. We’ve integrated MCP compatibility into the Engine, so if your agent framework or client supports MCP (many are adopting it), it works out-of-the-box. 
This is one more example of how we’re not reinventing the wheel, but rather adopting open standards to accelerate the ecosystem.",[324,15498,15499],{},"☁️ BYOC Deployment: The Early Access release is available on a Bring-Your-Own-Cloud (BYOC) basis. This means you can run StreamNative Agent Engine in your own cloud environment (AWS, GCP, Azure, etc.) while StreamNative manages it for you. You get the benefits of cloud-native deployment – data locality, security controls, and integration with your existing cloud resources – without the headache of running the infrastructure yourself. The Engine runs on StreamNative Cloud’s managed data streaming service under the hood, delivered in your cloud account. This flexibility is ideal for enterprises with strict compliance or those who simply want to avoid data egress – your agents and data stay within your walls. BYOC also means you’re not tied to a single cloud or region; the same agent runtime can be deployed wherever your data streams live.",[48,15501,15502],{},"These features (and more) collectively turn the Agent Engine into a powerful platform for real-time AI. Importantly, none of this replaces your existing AI investments – it empowers them with real-time capabilities. You can think of StreamNative Agent Engine as the infrastructure layer that has been missing for agentic AI systems: akin to what Kubernetes did for microservice apps, we aim to do for AI agents. We handle the hard parts of running always-on, distributed, event-driven agents so you can focus on the logic and outcomes.",[40,15504,15506],{"id":15505},"data-streaming-ai-agents-in-action-the-fast-path-smart-path-pattern-for-fraud-detection","Data Streaming + AI Agents in Action: The Fast Path \u002F Smart Path Pattern for Fraud Detection",[48,15508,15509],{},"To demonstrate how StreamNative Agent Engine integrates deterministic data streaming with sophisticated agentic reasoning into a unified event-driven system, let's explore a real-time fraud detection scenario. By combining these two distinct workflows—deterministic, rule-based streaming (Fast Path) and advanced, statistical agentic analysis (Smart Path)—the Agent Engine efficiently balances speed with intelligent decision-making.",[48,15511,15512],{},"In the Fast Path, transactions undergo rapid, deterministic evaluation using streaming data and Pulsar Functions. Designed to swiftly manage straightforward, low-risk transactions, this path instantly approves or rejects transactions within milliseconds based on clear rules, such as transaction amount or geographic anomalies. For example, the RapidGuard agent processes incoming transaction data streams, quickly flagging suspicious transactions that clearly violate preset criteria or confidently approving safe ones.",[48,15514,15515],{},"In contrast, the Smart Path employs a statistical, lower-frequency approach to handle complex or ambiguous transactions. Leveraging advanced LLM-powered reasoning integrated through the Model Context Protocol (MCP), transactions escalated from the Fast Path receive deep, contextual analysis. The InsightDetect agent exemplifies this path, performing nuanced assessments by consulting enriched transaction histories, external fraud databases, and current fraud trends. 
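A minimal sketch of the Fast Path half of this pattern might look like the following; the thresholds, topic names, and field names are invented for illustration and are not part of the Agent Engine release.

```python
import json
from pulsar import Function

class RapidGuardFastPath(Function):
    """Illustrative fast path: clear-cut transactions get an immediate verdict,
    while ambiguous ones are escalated to a smart-path topic for deeper analysis."""

    AMOUNT_LIMIT = 10_000  # hypothetical rule threshold

    def process(self, input, context):
        txn = json.loads(input)
        if txn["amount"] < self.AMOUNT_LIMIT and txn.get("geo_risk", 0) < 0.2:
            verdict = {"txn_id": txn["id"], "decision": "approve", "path": "fast"}
            context.publish("persistent://public/default/fraud-decisions",
                            json.dumps(verdict).encode("utf-8"))
        else:
            # Escalate: the LLM-backed smart-path agent subscribes to this topic.
            context.publish("persistent://public/default/fraud-escalations",
                            json.dumps(txn).encode("utf-8"))
```

The Smart Path agent would subscribe to the escalation topic and publish its own verdicts back onto the decision stream.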
Following this comprehensive analysis, InsightDetect issues a well-informed decision back into the event stream.",[48,15517,15518],{},"Because both deterministic and statistical workflows operate seamlessly on the same unified event bus, RapidGuard and InsightDetect continuously exchange real-time insights and decisions. RapidGuard benefits from InsightDetect’s deeper contextual understanding, reducing false positives and ensuring legitimate high-value transactions aren't incorrectly flagged. InsightDetect, in turn, adapts its evaluation strategies based on immediate patterns identified by RapidGuard.",[48,15520,15521],{},"This integrated, autonomous interaction between streaming data and agentic reasoning ensures high-throughput, low-latency processing while maintaining sophisticated, context-aware fraud detection capabilities. Organizations leveraging this Fast Path \u002F Smart Path pattern achieve robust fraud prevention, enhanced customer experiences, and operational efficiency.",[48,15523,15524],{},"Importantly, this is just one example of combining deterministic data streaming with statistical agentic reasoning within an event-driven architecture. Numerous other patterns and scenarios exist, such as:",[321,15526,15527,15530],{},[324,15528,15529],{},"Content Moderation: Fast Path for rapid filtering, Smart Path for nuanced human-like assessments.",[324,15531,15532],{},"Industrial IoT: Fast Path for immediate equipment adjustments, Smart Path for predictive analytics and proactive maintenance.",[48,15534,15535],{},"With StreamNative Agent Engine orchestrating these complementary paths, organizations can seamlessly integrate fast, deterministic operations with deep, intelligent reasoning across diverse use cases.",[48,15537,15538],{},"During the keynote presentation at Data Streaming Summit Virtual 2025, we have also demoed how we implement autonomous incident handling using StreamNative Agent Engine. You can also check out this demo at StreamNative’s YouTube channel.",[40,15540,15542],{"id":15541},"from-single-agents-to-an-agentmesh-the-future-of-autonomous-systems","From Single Agents to an AgentMesh: The Future of Autonomous Systems",[48,15544,15545],{},"The early access of StreamNative Agent Engine is more than just a product launch – it’s a step toward a new paradigm of software architecture. We believe the future is event-driven and autonomous, where instead of monolithic agents or isolated AI agents, you have a network of intelligent agents working in concert. This network is what we call an AgentMesh: a distributed, discoverable, and governable mesh of agents spanning an organization.",[48,15547,15548],{},[384,15549],{"alt":18,"src":15550},"\u002Fimgs\u002Fblogs\u002F68366c40b4d6e7a6a91006d6_AD_4nXeEwl0pyAyiItrIz-S0uy8aCYaCvcdGyyou4WHgs3dSn4g4_7aX827A1hYQh7fXfP1FA53r0O5VPSY4cu2a9MRPOFIs-y8sbe5ChHjbVv7eyl7Bx3Ksz7MQqmWUrweCpcfRxEGU.png",[48,15552,15553],{},"What does an AgentMesh look like? Much like a service mesh in microservices, an AgentMesh provides a structured way for many independent agents (each with a specialized role or expertise) to communicate and collaborate. Thanks to the Agent Engine’s shared event bus and registry, every agent knows how to find others and how to talk to them (via events or tool calls), and every interaction can be managed and secured. You might have dozens or hundreds of agents – some focused on customer data, some on internal IT tasks, some on external market signals – all coordinating through the platform. 
New agents can join the mesh and start contributing immediately, and retired ones can be removed without disruption. The mesh is self-organizing to an extent, but it’s not a free-for-all: because it’s built on a solid infrastructure, you have central governance – you can enforce policies (like data access rules, rate limits, compliance checks) across all agents uniformly.",[48,15555,15556],{},"We’re already seeing the need for this as AI projects mature. A year ago, teams were building single chatbots or proof-of-concept agents. Today, it’s common to see multiple AI services interacting – a scheduling agent handing off to a pricing agent, an HR screening agent collaborating with a legal-check agent, etc. Without an AgentMesh approach, you end up with “agents in silos” again, or ad-hoc integrations that crumble at scale. StreamNative Agent Engine lays the foundation for an AgentMesh by providing the core runtime and communication layer for these agents. By deploying your agents on the Engine, you’re essentially future-proofing your architecture for that scale-out. It moves you from “one clever agent” to “an army of cooperative agents”.",[48,15558,15559],{},"Most excitingly, this opens the door to applications that were previously too complex to reliably implement. When agents can maintain long-lived context, respond instantly to new data, and coordinate actions, you get systems that are dynamic, collaborative, and intelligent by design. Imagine a disaster response system where dozens of AI agents – for weather, logistics, medical resources, communication – continuously exchange information and adjust their plans in real time. Or a financial portfolio management suite where specialized agents (one per asset class, for example) negotiate with each other to rebalance in milliseconds as markets move. These are the kinds of autonomous, event-driven applications the Agent Engine is built to enable. We’re only at the beginning, but the trajectory is clear: from standalone AI components to immersive, always-on agent ecosystems.",[40,15561,15563],{"id":15562},"join-the-early-access-program-build-with-us","Join the Early Access Program – Build with Us",[48,15565,15566,15567,15572],{},"We invite developers, architects, platform engineers, and technical leaders to join us in this journey by participating in the ",[55,15568,15571],{"href":15569,"rel":15570},"https:\u002F\u002Fhs.streamnative.io\u002Fearly-access-program-for-streamnative",[264],"StreamNative Agent Engine Early Access Program",". This is your chance to get hands-on with the technology and help shape its evolution. As an early access user, you’ll be able to deploy and experiment with the Agent Engine in your own environment, with direct support from our engineering team and a direct line to provide feedback. We’re looking to collaborate closely with our early users – your input will directly influence the product so it best meets your real-world needs.",[48,15574,15575,15576,15580],{},"How to get involved? Visit our ",[55,15577,15579],{"href":15569,"rel":15578},[264],"Early Access page"," and sign up – it’s free to apply, and we’ll onboard teams gradually to ensure everyone gets the attention and resources they need. Once you’re in, you’ll receive documentation and guidance to deploy your first agents on the platform. Our team will be available for questions, troubleshooting, and brainstorming on your specific use cases. 
You’ll also receive exclusive updates on new features and the product roadmap as we march toward general availability.",[48,15582,15583],{},"This is more than just trying out a new feature – it’s an opportunity to co-create the future of autonomous intelligent systems. We believe that the move from static data pipelines to streaming AI agents is a transformative shift, one that will redefine how software and services are built in the coming years. By joining the early access, you’ll be at the forefront of that shift. Help us refine the Agent Engine, explore novel use cases, and develop best practices for this emerging space. Together, we can accelerate the arrival of the AgentMesh era – where AI agents become as ubiquitous and interoperable as microservices are today.",{"title":18,"searchDepth":19,"depth":19,"links":15585},[15586,15587,15588,15589,15590,15591],{"id":15408,"depth":19,"text":15409},{"id":15423,"depth":19,"text":15424},{"id":15461,"depth":19,"text":15462},{"id":15505,"depth":19,"text":15506},{"id":15541,"depth":19,"text":15542},{"id":15562,"depth":19,"text":15563},"2025-05-28","Deploy, scale, and govern autonomous AI agents on a unified event bus. Discover how StreamNative Agent Engine brings real-time intelligence to enterprise workloads.","\u002Fimgs\u002Fblogs\u002F6837159975eea474670f2c03_AI-Agent_early-access_simple-2.png",{},{"title":15399,"description":15593},"blog\u002Fintroducing-the-streamnative-agent-engine",[3988,821],"aR0i6kGgbPIVEEV6mD2K-vDpb4CP-G-GQVnHCN-5DM4",{"id":15601,"title":15602,"authors":15603,"body":15604,"category":6415,"createdAt":290,"date":15768,"description":15769,"extension":8,"featured":294,"image":15770,"isDraft":294,"link":290,"meta":15771,"navigation":7,"order":296,"path":15772,"readingTime":298,"relatedResources":290,"seo":15773,"stem":15774,"tags":15775,"__hash__":15776},"blogs\u002Fblog\u002Fdata-streaming-to-agentic-ai.md","From Data Streaming to Agentic AI: The Evolution of Processing",[806],{"type":15,"value":15605,"toc":15761},[15606,15629,15633,15636,15656,15660,15663,15683,15686,15690,15693,15704,15709,15712,15716,15719,15736,15739,15741,15744,15747],[48,15607,2609,15608,15612,15613,15617,15618,15622,15623,15628],{},[55,15609,15611],{"href":15610},"\u002Fblog\u002Fai-agents-real-time-data-bridge","first post",", we highlighted how fragmented AI agents struggle to work together and argued that a real-time event bus is crucial for shared awareness and coordination. In the ",[55,15614,15616],{"href":15615},"\u002Fblog\u002Fopen-standards-real-time-ai-mcp","second post",", we explore the emerging ",[55,15619,3583],{"href":15620,"rel":15621},"https:\u002F\u002Fmodelcontextprotocol.io\u002Fintroduction",[264]," as a standard that gives AI agents structured access to tools and data. That leaves a missing piece in between – a runtime to execute these agents and orchestrate their logic in real time. In this 3rd post of the series, we'll journey through the evolution of data processing, from traditional batch jobs to real-time streaming and lightweight compute, and see how it culminates in AI Agents. Along the way, we'll outline what a platform for realtime AI agents needs (a shared event bus, an agent registry, and a runtime for agents) and how developers organically arrive at these requirements. 
Finally, we'll discuss why ",[55,15624,15627],{"href":15625,"rel":15626},"https:\u002F\u002Fpulsar.apache.org\u002Fdocs\u002F4.0.x\u002Ffunctions-overview\u002F",[264],"Pulsar Functions"," is a natural starting point for building an agent runtime on real-time infrastructure.",[40,15630,15632],{"id":15631},"from-batch-to-streaming-to-agents-a-data-processing-evolution","From Batch to Streaming to Agents: A Data Processing Evolution",[48,15634,15635],{},"To understand why we need a new kind of runtime for AI agents, it helps to look at how data processing paradigms have evolved over time. Each stage in this evolution addressed new requirements for timeliness and interactivity:",[321,15637,15638,15641,15644,15653],{},[324,15639,15640],{},"Batch Jobs: The earliest big data processing was done in batches. Systems would accumulate data over hours or days, then run heavy jobs (think Hadoop MapReduce or daily ETL scripts) to process the data. This model is high-throughput but high-latency – results arrive only after the batch completes. Batch frameworks like MapReduce and early Spark were great for large-scale offline analytics but too slow for reacting to events in real time. In the context of AI, batch processing means your model or logic only updates periodically, which introduces delays and stale data. Agentic AI systems that need to act on up-to-the-moment information can’t afford to wait hours for the next batch run, however they can use the results generated by these batch jobs.",[324,15642,15643],{},"Real-Time Stream Processors: To reduce latency, stream processing frameworks emerged. Apache Storm (circa 2010), Apache Flink, Apache Spark Streaming, and others enabled continuous processing of events as they arrive, often with sub-second or milliseconds latency. These systems run long-lived jobs that ingest event streams (from message brokers like Kafka or Pulsar) and update results continuously. Streaming processors brought near-real-time responsiveness and complex event processing capabilities. Developers could write code to, say, detect fraud or update metrics on the fly instead of waiting for a batch. This was a huge step for reactive systems. However, standing up a Flink or Spark streaming job is still a heavy-weight effort – you need to manage clusters, write your logic in a specific framework API, and deploy it as a separate service. The logic is typically fixed (compiled code or queries) and scaling may require careful tuning. Still, this era proved that processing data in-motion leads to timely insights and actions.",[324,15645,15646,15647,15652],{},"Lightweight Streaming Compute: In recent years, we’ve seen a push toward simplifying stream processing and embedding it within the messaging layer itself. ",[55,15648,15651],{"href":15649,"rel":15650},"https:\u002F\u002Fpulsar.apache.org\u002Fdocs\u002Ffunctions-overview\u002F",[264],"Apache Pulsar Functions"," is a prime example – instead of running a separate Flink cluster, you can write a small function that consumes from one or more Pulsar topics, processes each message, and publishes results to another topic. Pulsar Functions bring a serverless feel to streaming: you focus on the per-message logic, and the Pulsar cluster handles the rest (scaling, fault tolerance, routing). Similarly, technologies like Kafka Streams (a library for building stream processing in Java apps) and cloud serverless services (like AWS Lambda triggered by Kinesis or EventBridge events) make it easier to deploy event-driven microservices. 
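To give a feel for how small this per-message logic can be, here is a Pulsar Function in its simplest native Python form; the input and output topics are wired up at deployment time rather than in the code, and the tagging rule is a made-up example.

```python
# A complete Pulsar Function in its simplest native Python form: one callable
# that receives each message payload and returns the optional output message.
def process(input):
    # Made-up per-message logic: tag each event before passing it along.
    return f"[triaged] {input}"
```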
This lightweight approach drastically lowers the barrier to processing streams – no separate clusters or complex APIs, just write a simple function. The trade-off is that these functions typically handle more focused tasks (filtering, transformations, simple aggregations) and you might deploy many of them for different purposes. Still, they set the stage for dynamic logic attached directly to data streams.",[324,15654,15655],{},"Agentic Compute: Now we arrive at the cutting edge. Instead of pre-defining every step of logic, what if the “compute” on the stream could perceive, reason, and act based on events? This is the idea of agentic compute: embedding AI agents into the stream processing. An AI agent (for example, powered by an LLM or other AI models) can subscribe to events, maintain some understanding of context, and decide on actions to take – possibly generating new events or calling external tools. This represents a major shift from traditional if\u002Felse code to more autonomous, adaptive behavior. In essence, each agent is like a microservice with a brain: it doesn’t just execute a static function; it can interpret events, make decisions, and coordinate with other agents. Agentic compute opens the door to systems that are autonomous and goal-driven, not just reactive pipelines. However, it also introduces new challenges: these agents need fast, fresh data (which streaming provides), and they need a way to share state or knowledge with each other. It’s clear that simply running one agent in isolation won’t unlock the full potential – we will need a fleet of agents that work together, which brings new infrastructure requirements.",[40,15657,15659],{"id":15658},"developer-story-from-one-agent-to-an-ecosystem","Developer Story: From One Agent to an Ecosystem",[48,15661,15662],{},"To illustrate the need for an agentic platform, let’s walk through a scenario that a developer (let’s call her Alice) might experience:",[1666,15664,15665,15668,15671,15674,15677,15680],{},[324,15666,15667],{},"Building the First Agent: Alice creates an AI agent that monitors a stream of user activity events (clicks, page views, etc.) and looks for anomalies. The agent is powered by an LLM and some custom logic. It consumes events and, when it detects something unusual (say a sudden spike in errors), it can send an alert email. With just one agent, this is straightforward – she hardcodes it to listen to the event source and perform the action. It runs in a loop, reading events and reacting. Problem solved, right?",[324,15669,15670],{},"Adding Another Agent: Next, Alice builds a second agent that responds to these anomalies. This new agent’s job is to automatically create a Jira ticket whenever an anomaly alert is raised, including context about the issue. Now she has two agents: one detecting, one responding. But how should they communicate? Initially, she might be tempted to call Agent B (ticket creator) directly from Agent A’s code when an anomaly is found. That quickly becomes a tight coupling – Agent A needs to know about Agent B. Instead, Alice decides to use an event-driven approach: Agent A simply emits an “anomaly.alert” event when it finds an issue, and Agent B listens for that event and reacts by creating a ticket. This decouples the two agents. They don’t call each other directly; they communicate through an event bus (for example, a Pulsar or Kafka topic). 
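A minimal sketch of that decoupling with the pulsar-client Python library might look like this; the service URL, topic, and payload fields are placeholders.

```python
import json
import pulsar

client = pulsar.Client("pulsar://localhost:6650")  # placeholder service URL
topic = "persistent://public/default/anomaly.alert"

# Agent B (ticket creator) subscribes independently; it only cares about the event.
consumer = client.subscribe(topic, subscription_name="ticket-agent")

# Agent A (anomaly detector) simply announces what it found; it never calls Agent B.
producer = client.create_producer(topic)
producer.send(json.dumps({"metric": "error_rate", "spike": 4.2}).encode("utf-8"))

# Agent B reacts whenever such an event arrives.
msg = consumer.receive()
alert = json.loads(msg.data())
# ... open a Jira ticket using the fields in `alert` ...
consumer.acknowledge(msg)

client.close()
```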
This way, Agent A just announces what it found, and any interested agent can listen in.",[324,15672,15673],{},"Scaling Out with More Agents: Over time, Alice’s system grows. She introduces an Agent C that tries to diagnose the root cause of an anomaly by correlating it with recent deployments, an Agent D that notifies the on-call engineer via SMS, and perhaps an Agent E that attempts an automatic remediation if possible. All of these agents produce and consume events. For instance, Agent C might publish a “diagnosis” event that Agent E listens to before taking action. Very soon, the web of events becomes complex – in a good way (flexible), but also in a challenging way. Alice has essentially built a distributed system of AI agents, where events are the lingua franca connecting them. This loosely coupled design is exactly what we want for scalability and flexibility​. Each agent can do its job independently, and as long as they agree on event schemas and topics, new agents can join in without breaking others.",[324,15675,15676],{},"Discoverability and Coordination Challenges: With many agents in play, Alice encounters new questions. How does she keep track of what agents exist and what events they handle? If she adds a new agent into the system, how do others know about it or know that, say, two different agents are both handling the same type of event? This is where an Agent Registry starts to sound useful – a centralized (or distributed) directory where each agent can register itself (name, capabilities, event types it handles). Using a registry, Agent B could look up “Is there an agent that handles anomaly alerts? Ah yes, Agent A is the anomaly detector.” In practice the agents might not query the registry at runtime frequently, but the platform (or developers) uses it to manage the ecosystem. Alice also realizes that she needs to coordinate multi-step workflows: e.g., ensure Agent E (remediation) only runs after Agent C (diagnosis) has provided info. Rather than hard-coding those sequences inside the agents (which would reintroduce tight coupling), she wants the runtime to orchestrate these interactions. For example, one way is to have a workflow engine listening for events and invoking agents in order, but a more elegant way in an agentic system is to let the agents themselves carry state and conversations through events, which requires careful design of event protocols and perhaps a bit of higher-level orchestration logic.",[324,15678,15679],{},"Operational Considerations: As her agent system grows, Alice confronts operational issues: What if an agent crashes or runs slowly? We need monitoring and fault tolerance for agents similar to any microservice. What if an agent produces too many events and overwhelms others? We might need backpressure or rate limiting. Also, how to deploy and scale these agents? Running each as a separate process is an option, but could be heavy with dozens of agents. Alice wonders if they can be hosted in a common runtime that takes care of scaling (much like how serverless functions scale out automatically). All these concerns point toward the need for a more structured platform to manage agent execution.",[324,15681,15682],{},"Security and Auditability: As Alice’s fleet of agents grows and begins to touch customer data, production systems, and third-party tools, she quickly realizes that who did what, when, and with what permissions is no longer a nice-to-have—it’s critical. 
Each agent must authenticate to the event bus and external APIs with scoped, rotating credentials. Just as important, every action an agent takes (from reading an event to opening a Jira ticket or triggering a rollback) needs to be immutably logged with rich context: the exact input event, the LLM prompt\u002Fresponse pair (or model version), and the downstream side effects. These signed, tamper-evident logs become the basis for auditing, incident forensics, and compliance reporting. At scale, Alice wants the runtime to enforce least-privilege policies automatically (e.g., “only Agent E may modify deployment configs”) and to surface any deviations as security events on the same bus the agents use—closing the loop so other watchdog agents can respond. In short, an agentic platform must embed zero-trust principles and end-to-end observability from day one, or the very autonomy that makes agents powerful becomes a liability.",[48,15684,15685],{},"By the end of this journey, Alice has essentially re-discovered the requirements for an agentic platform. Her initial hacky solution grew into a complex network of intelligent components, and to keep it manageable she needs the same kind of support that past generations of compute had (like job schedulers for batch, resource managers for streaming jobs, etc.), adapted to AI agents.",[40,15687,15689],{"id":15688},"three-pillars-of-an-agentic-platform-runtime-event-bus-registry","Three Pillars of an Agentic Platform: Runtime, Event Bus, Registry",[48,15691,15692],{},"The story above highlights three essential components that a robust agentic platform should provide:",[321,15694,15695,15698,15701],{},[324,15696,15697],{},"Agent Execution Runtime: This is the engine that actually runs the agents’ code and orchestrates their execution. It’s analogous to a stream processing engine or an application server, but tailored for AI agents. The runtime should handle starting and stopping agents, scheduling them to handle incoming events, scaling them out (running multiple instances) if needed, and ensuring fault tolerance (if an agent instance crashes, restart it). The runtime is what keeps the whole system alive and responsive. It might also manage agent lifecycle concerns like state management (for agents that need to store context between events), version upgrades, and security isolation (running untrusted agent code safely). Orchestration is a key responsibility – not in the sense of a static pipeline, but ensuring that, for example, when an event comes in, the relevant agent(s) get invoked, possibly in parallel or in a certain order if there are dependencies. The runtime can also implement workflow logic if some agent interactions need to be coordinated beyond just pub\u002Fsub events. In summary, it’s the environment that hosts the agents and lets them do their jobs reliably.",[324,15699,15700],{},"Shared Event Bus: At the heart of the system is a high-throughput, low-latency event bus that all agents connect to. This is typically a publish\u002Fsubscribe messaging system (e.g., Apache Pulsar or Kafka topics) that delivers events to any agent that subscribes. The event bus decouples senders and receivers — agents produce events without knowing who will consume them, and agents consume events without tight coupling to producers. This loose coupling via events is what enables agents to coordinate and share context in real time​. 
The event bus should support persistent, replayable streams (for durability and to allow new agents to catch up on past events if needed), and it becomes the communication backbone for the agents.",[324,15702,15703],{},"Agent Registry: Just as microservice architectures often use service registries or API gateways, a multi-agent system benefits from an agent registry. This registry is a directory of all agents available in the system, along with metadata about each agent (its name\u002FID, what events or topics it listens to, what events it emits, maybe its purpose or health status). The agent registry allows both developers and the system itself to discover what agents exist. For example, a UI could query the registry to list all running agents. Or, if an agent wants to delegate a task, it could (programmatically) find if there's another agent capable of handling a certain event or query (though such dynamic lookup might be abstracted by the platform). The registry also helps avoid duplication and coordinate updates – if you deploy a new version of an agent, the registry gets updated. In Alice’s story, the registry was the missing piece to easily add new agents and have others be aware of them. Essentially, it provides shared knowledge of the agents.",[48,15705,15706],{},[384,15707],{"alt":18,"src":15708},"\u002Fimgs\u002Fblogs\u002F682e3f06ef712563fe6fda4d_AD_4nXdV8rPnMa39SFH_eos7sPBhfTxj0rN9uctlosA0TJzaEcJayATn1E7Wp5SirsGgq4oIeAms7v0f3yBgJJKbifONX9BR16N4JekOZZbSnbKVne3ysUgKAJlAzltAAQ7fgp3XoXHhTA.png",[48,15710,15711],{},"With these pieces in place, we solve the problems Alice encountered. Agents remain autonomous in their logic, but the platform provides connective tissue and governance. It’s worth noting that such an architecture aligns with broader trends in software: event-driven microservices, serverless computing, and now event-driven agents. In fact, others in the industry are converging on this idea. For example, recent discussions of “agent meshes” and event-driven agent systems echo the need for shared context and communication. In practice, an agentic platform could be built by stitching together existing tools – or, as we’ll discuss next, by extending an existing streaming system to natively support it.",[40,15713,15715],{"id":15714},"pulsar-functions-a-launchpad-for-agentic-runtime","Pulsar Functions – A Launchpad for Agentic Runtime",[48,15717,15718],{},"Now, how can we implement an agentic platform in reality? We have a strong hint from the evolution above: Apache Pulsar already offers two of the three components out-of-the-box (the event bus and a lightweight compute runtime). Pulsar’s messaging model provides the event bus, and Pulsar Functions serve as a built-in stream compute framework. By leveraging and extending Pulsar Functions, we can kickstart an Agent Engine and Agent Registry with relatively little new infrastructure. Let’s break down why Pulsar is a natural fit:",[321,15720,15721,15724,15727,15730,15733],{},[324,15722,15723],{},"Shared Event Bus: Pulsar is a cloud-native event streaming platform with a pub\u002Fsub message model. Pulsar topics can act as the channels through which agents communicate. In our architecture diagram, the gray bus could be implemented as a set of Pulsar topics (e.g., a topic per event type or a few topics for different categories of events). Pulsar’s design of decoupling producers and consumers, and allowing multiple subscriptions on a topic, fits perfectly for agents listening to the same stream. 
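For example, with the pulsar-client Python library each agent can attach its own subscription to the same topic; the service URL, topic, and subscription names below are placeholders, and the Earliest position shows how a newly added or restarted agent can catch up on retained events.

```python
import pulsar

client = pulsar.Client("pulsar://localhost:6650")  # placeholder service URL
topic = "persistent://public/default/agent-events"

# Each agent attaches its own subscription, so both independently receive every
# event on the shared topic (fan-out), rather than competing for messages.
detector = client.subscribe(topic, subscription_name="anomaly-detector")
diagnoser = client.subscribe(
    topic,
    subscription_name="root-cause-diagnoser",
    initial_position=pulsar.InitialPosition.Earliest,  # catch up on retained events
)
```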
It also supports message retention and replay, which can be useful if an agent goes down and needs to catch up. In Part 1 of this series, we argued for exactly this: a real-time event stream as the backbone for AI agents. Pulsar gives us a proven, scalable backbone. StreamNative Cloud (including both the Classic and Ursa engines) also natively supports the Kafka protocol, which can be used for this backbone.",[324,15725,15726],{},"Lightweight Compute Runtime: Pulsar Functions are essentially functions-as-a-service running inside the Pulsar ecosystem. You can deploy a snippet of code (Java, Python, Go) that subscribes to a topic, processes incoming messages, and publishes results to another topic. Under the hood, Pulsar’s Function Worker processes execute these functions and manage their lifecycle. We can view each AI agent as a more sophisticated Pulsar Function: instead of a simple transformation, its “processing logic” could involve prompting an LLM, doing some reasoning, and then emitting new events. The great thing is that the scaffolding needed to run an agent is very similar to running a Pulsar Function – message in, do work, message out. Pulsar Functions already handle scaling (you can configure parallelism, and they’ll run on multiple nodes\u002Fthreads as needed) and fault tolerance (failed function instances can be restarted, etc.). By using Pulsar Functions as the basis, an Agent Engine can inherit these capabilities rather than starting from scratch. The function runtime would need to be enhanced with AI-specific context handling, but the core event-driven execution model is there.",[324,15728,15729],{},"Toward an Agent Registry: While Pulsar Functions today mainly focus on running code, the Pulsar Functions Worker service maintains metadata on all deployed functions (name, namespace, etc.). By augmenting it with agent‑specific attributes (capabilities, descriptions) and exposing it through MCP, the Functions Worker service can evolve into a full Agent Registry. Because Pulsar’s management API already treats connectors and functions as registry objects, every newly deployed function\u002Fagent could auto‑register, instantly appearing in a searchable directory of MCP tools. Other agents can then discover and invoke these tools at runtime, transforming the cluster into a dynamic, self‑describing agent ecosystem.",[324,15731,15732],{},"Integrating External Tools via MCP: Another advantage of building on Pulsar is the ease of integrating with external systems. In Part 2 of this series, we introduced the Model Context Protocol (MCP) – an open standard that allows AI agents to access tools and data through a uniform interface. If our Agent Engine is running on Pulsar, each agent can include an MCP client or server as needed, and we can expose agents or other Pulsar Functions as MCP endpoints. In other words, the Pulsar Function could serve as a bridge between the event world and the tool APIs. MCP essentially standardizes how an agent might, for example, call a vector database or fetch information from a SaaS app. By supporting MCP within the runtime, agents can use tools or act as tools themselves in a structured way. 
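To ground the “message in, do work, message out” scaffolding described above, here is a minimal sketch of an agent written as a Python Pulsar Function; the topic name and the call_llm helper are illustrative placeholders, and the reasoning step is where an LLM prompt or MCP tool call would go:

```python
from pulsar import Function

def call_llm(prompt: str) -> str:
    """Placeholder for the agent's reasoning step (e.g. an LLM or MCP tool call)."""
    return f"summary of: {prompt}"

class TriageAgent(Function):
    # Message in -> reason -> message out: the same contract as any Pulsar Function.
    def process(self, input, context):
        context.get_logger().info("received event: %s", input)
        decision = call_llm(input)
        # Emit the agent's decision as a new event for downstream agents.
        context.publish("persistent://public/default/triage-decisions", decision)
        return None
```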
Pulsar’s plugin architecture and function APIs could let us plug in this capability (for instance, giving functions access to an MCP context object).",[324,15734,15735],{},"Unified Platform for Deterministic Workflows and Statistical Agents: Perhaps the most compelling reason to use Pulsar as the foundation is that many organizations already use Pulsar (or Kafka) as their central event bus for microservices and data streams. By extending that same platform to also host AI agent logic, we remove the need for a separate specialized agent orchestration system. Your AI agents become just another part of your real-time data infrastructure. They can tap into the same streams that feed your analytics and react immediately. This convergence of data and agents in one platform aligns with the idea of real-time AI. It also means ops teams have fewer systems to manage – the existing Pulsar ops (monitoring topics, throughput, etc.) now also covers agent execution metrics.",[48,15737,15738],{},"In short, we have outlined a solution for how enterprises can repurpose a battle-tested lightweight streaming compute framework (like Pulsar Function) to serve a new role in the age of AI. The agentic runtime will allow developers to deploy AI agents as easily as they deploy serverless functions, and have those agents automatically join a shared event bus and a registry of services. Each agent can then perceive events, reason (with the help of models and context), and act by emitting new events or invoking tools, all governed by the platform.",[40,15740,9609],{"id":9608},[48,15742,15743],{},"The progression from batch jobs to real-time streams to AI agents is a story of increasing immediacy and intelligence in our data systems. We started with periodic processing of static data, moved to continuous processing of streaming data, and now we’re enabling continuous reasoning and decision-making on streaming data. Building agentic AI systems on a real-time infrastructure requires rethinking the runtime environment – it’s not just about running code faster, it’s about hosting autonomous services that learn and interact. By providing a shared event bus, standard protocols like MCP for tool access, and an agent-oriented runtime, we can unlock a new class of applications that are dynamic, collaborative, and intelligent by design.",[48,15745,15746],{},"StreamNative is actively working on making this vision a reality. In the future posts, we will share more updates about how we do it to avoid reinventing the wheel. If you’re excited about the idea of AI agents seamlessly integrated with streaming data, stay tuned.",[48,15748,15749,15750,15755,15756,15760],{},"Call to Action: To learn more about this emerging technology and see it in action, ",[55,15751,15754],{"href":15752,"rel":15753},"https:\u002F\u002Fevents.zoom.us\u002Fev\u002FArZEA9V8FhVrzMLieLTMnL4oohWqqWcpxt7WFLBlU-dsVcDyERIt~AvnlZ_jyh3pjdqG0FIi3vw9JMBWfKgFXX2C9XMuuOeiv_8rg6_kecDPddg",[264],"join us"," at the ",[55,15757,5383],{"href":15758,"rel":15759},"https:\u002F\u002Fdatastreaming-summit.org\u002Fevent\u002Fdata-streaming-virtual-2025",[264]," on May 28 - 29, 2025. It’s a chance to dive into the technical details, ask questions, and envision how your engineering team can build the next generation of real-time AI systems. 
Register for the summit, and be part of the conversation on the future of data streaming and agentic AI!",{"title":18,"searchDepth":19,"depth":19,"links":15762},[15763,15764,15765,15766,15767],{"id":15631,"depth":19,"text":15632},{"id":15658,"depth":19,"text":15659},{"id":15688,"depth":19,"text":15689},{"id":15714,"depth":19,"text":15715},{"id":9608,"depth":19,"text":9609},"2025-05-21","Trace the journey from batch and data streaming to Agentic AI, and learn why Pulsar Functions + MCP power a real-time AI agent runtime.","\u002Fimgs\u002Fblogs\u002F682e6c14914b13dac8731b77_image-125.png",{},"\u002Fblog\u002Fdata-streaming-to-agentic-ai",{"title":15602,"description":15769},"blog\u002Fdata-streaming-to-agentic-ai",[3988,10054,821,1331],"yHn6CRNMqg0mRKJzT3Y_PqhxgjoR8ow-ONxJn0AxCMw",{"id":15778,"title":15779,"authors":15780,"body":15781,"category":6415,"createdAt":290,"date":16191,"description":16192,"extension":8,"featured":294,"image":16193,"isDraft":294,"link":290,"meta":16194,"navigation":7,"order":296,"path":16195,"readingTime":16196,"relatedResources":290,"seo":16197,"stem":16198,"tags":16199,"__hash__":16200},"blogs\u002Fblog\u002Fannouncing-one-cli.md","Introducing snctl 1.0: Your One-Stop CLI for All StreamNative Interactions",[810,6500],{"type":15,"value":15782,"toc":16172},[15783,15786,15790,15793,15807,15810,15814,15817,15828,15831,15835,15844,15848,15856,15860,15867,15870,15875,15878,15883,15886,15889,15893,15896,15951,15954,15958,15961,15966,15971,15976,15981,15985,15995,15999,16013,16017,16020,16040,16044,16047,16052,16059,16070,16074,16077,16088,16090,16093,16107,16120,16123,16125,16128,16132,16135,16166,16169],[48,15784,15785],{},"We are thrilled to announce the v1.0 version of snctl, a unified command-line interface (CLI) designed to simplify and streamline your interactions with Apache Pulsar, Apache Kafka, and the entire StreamNative ecosystem. Whether you’re working with Pulsar protocol, Kafka protocol (including Ursa clusters), or the comprehensive range of StreamNative Cloud resources, snctl consolidates it all into one convenient place.",[40,15787,15789],{"id":15788},"the-journey-from-fragmented-tools-to-a-unified-experience","The Journey: From Fragmented Tools to a Unified Experience",[48,15791,15792],{},"Historically, users of StreamNative have relied on multiple tools to manage different parts of their data infrastructure:",[321,15794,15795,15798,15801,15804],{},[324,15796,15797],{},"pulsarctl or pulsar-admin to interact with Pulsar clusters",[324,15799,15800],{},"Kafka CLI to manage Kafka-enabled clusters (including Ursa-engine clusters)",[324,15802,15803],{},"kcctl to work with Universal Connect (Kafka Connect connectors)",[324,15805,15806],{},"snctl for managing StreamNative Cloud resources (instances\u002Fclusters, infrastructure pools, service accounts, users, etc.)",[48,15808,15809],{},"While each individual tool served its purpose, juggling multiple CLIs meant juggling multiple configuration files, authentication flows, and usage patterns. 
Recognizing this fragmentation, we launched an initiative to consolidate these disparate tools into a unified experience – one that also aligns with our broader initiative for consistent workflows across CLI, Infrastructure-as-Code, and Kubernetes operators.",[40,15811,15813],{"id":15812},"one-cli-to-rule-them-all-introducing-the-new-snctl","One CLI to Rule Them All: Introducing the new snctl",[48,15815,15816],{},"Snctl v1.0 is the product of that consolidation effort, bringing all the functionalities of the separate CLIs under one command:",[321,15818,15819,15822,15825],{},[324,15820,15821],{},"Pulsar Admin\u002FClient Operations: Full support for Apache Pulsar management and data operations, including:- Complete pulsarctl command set for comprehensive cluster administration by snctl pulsar admin sub-commands - Native Pulsar client capabilities for producing and consuming messages- Seamless management of topics, subscriptions, schemas, and all Pulsar resources",[324,15823,15824],{},"Kafka Admin\u002FClient Operations: Comprehensive Kafka protocol support on StreamNative Cloud：- Complete administration for topics, partitions, and consumer groups- Integrated management of Schema Registry and Kafka Connect- Built-in client capabilities for producing and consuming messages - Unified experience across all Kafka-compatible endpoints",[324,15826,15827],{},"StreamNative Cloud Resource Management: Create and manage instances\u002Fclusters, configure infrastructure pools, handle service accounts and users, and more.",[48,15829,15830],{},"By consolidating these capabilities into a single CLI, snctl eliminates the headache of constantly switching contexts and tools. One configuration, one workflow, and one command-line tool to rule them all.",[40,15832,15834],{"id":15833},"see-it-in-action","See It in Action",[48,15836,15837,15838,15843],{},"Curious to learn more about how this consolidated CLI simplifies operations? ",[55,15839,15842],{"href":15840,"rel":15841},"https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=d0LqwroVWt0",[264],"Watch our demo video"," to see snctl in action. You’ll get a walkthrough of its core commands, see how easy it is to manage different resources, and discover advanced features that help you automate complex tasks.",[40,15845,15847],{"id":15846},"getting-started-with-snctl","Getting Started with snctl",[48,15849,15850,15851,190],{},"Below is a quick start guide to downloading and installing snctl, followed by examples for both StreamNative Cloud and self-managed environments. For full details, check out ",[55,15852,15855],{"href":15853,"rel":15854},"https:\u002F\u002Fdocs.streamnative.io\u002Fstreamnative-cli\u002Fstreamnative-cli-overview",[264],"our documentation",[32,15857,15859],{"id":15858},"_1-download-and-install","1. Download and Install",[48,15861,15862,15863,190],{},"You can use the curl command or Homebrew to install snctl on a Mac. 
For installing it on Linux or Windows, please refer to our ",[55,15864,7120],{"href":15865,"rel":15866},"https:\u002F\u002Fdocs.streamnative.io\u002Fstreamnative-cli\u002Fsnctl-overview",[264],[48,15868,15869],{},"Using curl command:",[48,15871,15872],{},[384,15873],{"alt":5878,"src":15874},"\u002Fimgs\u002Fblogs\u002F68231a1821620f8321dd5940_iShot_2025-05-13_18.08.10.png",[48,15876,15877],{},"Using Homebrew:",[48,15879,15880],{},[384,15881],{"alt":5878,"src":15882},"\u002Fimgs\u002Fblogs\u002F68231a6f921593b4ab7362a4_iShot_2025-05-13_18.09.41.png",[48,15884,15885],{},"Upgrading to v1.x:",[48,15887,15888],{},"When upgrading from snctl v0.x to v1.x, please run snctl config init again to ensure all newly introduced configuration settings are applied to your local configuration file.",[32,15890,15892],{"id":15891},"_2-use-snctl-with-streamnative-cloud","2. Use snctl with StreamNative Cloud",[48,15894,15895],{},"Getting started with StreamNative Cloud is now easier than ever with snctl. Follow these steps to connect and start managing your resources:",[1666,15897,15898,15914,15928,15945,15948],{},[324,15899,15900,15901,15904,15905,15908,15909,15913],{},"Authentication - Two flexible options to secure your connection:- User Authentication: ",[4926,15902,15903],{},"snctl auth login","This interactive flow opens your browser for a secure login experience.‍- Service Account Authentication: ",[4926,15906,15907],{},"snctl auth activate-service-account --key-file \u002Fpath\u002Fto\u002Fcredentials.json","Perfect for CI\u002FCD pipelines and automated workflows. Please visit ",[55,15910,15911],{"href":15911,"rel":15912},"https:\u002F\u002Fdocs.streamnative.io\u002Fdocs\u002Fservice-accounts",[264]," for details about the Service Account.‍",[324,15915,15916,15917,15920,15921,15923,15924],{},"Organization Selection - Define your working context:",[15918,15919],"br",{},"- Set a default organization: snctl config set --organization \u003Corganization_id>",[15918,15922],{},"- Or specify per command: snctl -O \u003Corganization_id> ",[15925,15926,15927],"command",{},"‍‍",[324,15929,15930,15931,15933,15934,15936,15937],{},"Cluster Context - Seamlessly switch between your Pulsar clusters:",[15918,15932],{},"‍- Interactive selection: snctl context useBrings up a user-friendly menu to browse and select your available clusters.",[15918,15935],{},"- Direct specification: snctl context use --pulsar-instance ",[15938,15939,15940,15941],"instance",{}," --pulsar-cluster ",[15942,15943,15944],"cluster",{},"‍Ideal when you know exactly which cluster you need.‍",[324,15946,15947],{},"Pulsar & Kafka Operations - Unified syntax for all your messaging tasks:‍- Pulsar data operations: snctl pulsar client produce --topic my-tenant\u002Fmy-namespace\u002Fmy-topic --message \"Hello, StreamNative!\"‍- Pulsar admin tasks: snctl pulsar admin tenants list- Kafka data operations: snctl kafka client consume --topic my-topic --from-beginning- Kafka administration: snctl kafka admin topics list‍‍",[324,15949,15950],{},"Service Account Impersonation - Run commands with different permissions:- Specify a service account directly: snctl pulsar admin namespaces list --as-service-account my-function-sa- Interactive selection:snctl kafka admin consumer-groups list --use-service-accountThis feature is especially valuable when managing connectors and functions that require specific access controls.",[48,15952,15953],{},"With these straightforward steps, snctl empowers you to manage your entire StreamNative Cloud ecosystem from a single, 
consistent interface.",[32,15955,15957],{"id":15956},"_3-use-snctl-with-self-managed-pulsar-or-kafka","3. Use snctl with Self-Managed Pulsar or Kafka",[48,15959,15960],{},"If you're running your own Pulsar or Kafka clusters, snctl can still serve as your unified CLI:",[1666,15962,15963],{},[324,15964,15965],{},"Configure external contexts for Self-Managed Pulsar Cluster:",[48,15967,15968],{},[384,15969],{"alt":5878,"src":15970},"\u002Fimgs\u002Fblogs\u002F68231d388ddc5a189f35041d_iShot_2025-05-13_18.21.14.png",[1666,15972,15973],{},[324,15974,15975],{},"Configure external contexts for Self-Managed Kafka Cluster:",[48,15977,15978],{},[384,15979],{"alt":5878,"src":15980},"\u002Fimgs\u002Fblogs\u002F68231d89de8e361b083a0aaa_iShot_2025-05-13_18.22.56.png",[40,15982,15984],{"id":15983},"one-more-thing-run-streamnative-mcp-server-with-snctl","One more thing – Run StreamNative MCP server with snctl",[48,15986,15987,15988,15990,15991,15994],{},"Building on the unified CLI experience of ",[4926,15989,7040],{},", we're excited to introduce native integration with the ",[55,15992,15993],{"href":3576},"StreamNative MCP Server"," (Note: the MCP server requires snctl v1.1.0 or later). This powerful addition connects your messaging infrastructure directly with the world of AI agents, providing a bridge between your streaming data and the latest generation of AI tools.",[32,15996,15998],{"id":15997},"what-is-the-streamnative-mcp-server","What is the StreamNative MCP Server?",[48,16000,16001,16002,16007,16008,16012],{},"The StreamNative MCP Server (",[55,16003,16006],{"href":16004,"rel":16005},"https:\u002F\u002Fgithub.com\u002Fstreamnative\u002Fstreamnative-mcp-server",[264],"streamnative-mcp-server",") implements the ",[55,16009,16011],{"href":15620,"rel":16010},[264],"Model Context Protocol"," specifically for data streaming platforms. 
This lightweight bridge enables people to instruct AI agents interacting with any Pulsar cluster, any Kafka cluster, or any StreamNative Cloud environment through natural language.",[32,16014,16016],{"id":16015},"the-power-of-snctl-mcp-integration","The Power of snctl MCP Integration",[48,16018,16019],{},"With the new snctl mcp command, you can now:",[1666,16021,16022,16025,16028,16034],{},[324,16023,16024],{},"Start a StreamNative MCP Server directly from your CLI: Launch a fully configured MCP server that connects to your current snctl service context with a single command.",[324,16026,16027],{},"Enable AI agent interactions: Give AI assistants like Claude or other MCP-compatible agents the ability to read from, write to, and administer your streaming resources through natural language.",[324,16029,16030,16031,16033],{},"Maintain consistent authentication: The MCP server inherits authentication from your current ",[4926,16032,7040],{}," context, ensuring secure access to your resources.",[324,16035,16036,16037,16039],{},"Leverage your existing configuration: The MCP server automatically configures itself based on your current ",[4926,16038,7040],{}," context, whether it's a StreamNative Cloud instance or an external Pulsar\u002FKafka cluster.",[32,16041,16043],{"id":16042},"using-snctl-mcp","Using snctl MCP",[48,16045,16046],{},"Starting an MCP server with snctl is straightforward:",[48,16048,16049],{},[384,16050],{"alt":5878,"src":16051},"\u002Fimgs\u002Fblogs\u002F68231ee26c01316cfd6b89fa_iShot_2025-05-13_18.28.31.png",[48,16053,16054,16055,16058],{},"The MCP server requires a service account for authentication, which can be specified using either the ",[4926,16056,16057],{},"--as-service-account"," flag with a specific service account name.",[48,16060,16061,16062,16065,16066,16069],{},"You can also control which features are enabled through the ",[4926,16063,16064],{},"--features"," flag and restrict operations to read-only mode with the ",[4926,16067,16068],{},"--read-only"," flag for enhanced security.",[32,16071,16073],{"id":16072},"security-considerations","Security Considerations",[48,16075,16076],{},"The MCP Server integration is designed with security in mind:",[1666,16078,16079,16082,16085],{},[324,16080,16081],{},"Service Account Authorization: The MCP server operates with the permissions of the specified service account, ensuring proper access control.",[324,16083,16084],{},"Read-Only Mode: For sensitive environments, you can enable read-only mode to prevent any modifications to your streaming resources.",[324,16086,16087],{},"Feature Restrictions: You can selectively enable only the features you need, limiting the scope of operations available to AI agents.",[32,16089,2890],{"id":749},[48,16091,16092],{},"To get started with the MCP Server integration:",[1666,16094,16095,16098,16101,16104],{},[324,16096,16097],{},"Ensure you're using snctl v1.1.0 or later",[324,16099,16100],{},"Set up your context using snctl context use or external context with snctl context use-external",[324,16102,16103],{},"Start the MCP server with snctl mcp stdio --as-service-account $SERVICE_ACCOUNT_NAME",[324,16105,16106],{},"Connect your MCP-compatible AI client (such as Claude Desktop, Cursor, or any MCP client) to the running server",[48,16108,16109,16110,16114,16115,190],{},"With this powerful integration, you're now ready to bring the intelligence of AI agents to your streaming platform, enabling natural language interactions with your Pulsar and Kafka resources. 
Learn more in our ",[55,16111,16113],{"href":16112},"\u002Fblog\u002Fintroducing-the-streamnative-mcp-server-connecting-streaming-data-to-ai-agents#interacting-with-pulsar-and-kafka-via-streamnative-mcp-server","MCP Server announcement blog"," or catch the details in our ",[55,16116,16119],{"href":16117,"rel":16118},"https:\u002F\u002Fwww.youtube.com\u002Fplaylist?list=PL7-BmxsE3q4WO8mu8gzsbVkjoXb-PFpQX",[264],"MCP video series",[48,16121,16122],{},"The MCP Server integration represents another step in our mission to provide a unified, consistent experience across all StreamNative interactions, making your data streaming infrastructure more accessible and easier to manage than ever before.",[40,16124,13565],{"id":1727},[48,16126,16127],{},"With the first milestone achieved – unifying all CLI functionality – our team is already working to bring the same consistent user experience to Terraform and Kubernetes operators. Ultimately, our goal is for you to have a single set of mental models and workflows across your entire infrastructure stack, from command-line tools to Infrastructure-as-Code and beyond.",[40,16129,16131],{"id":16130},"join-the-unified-revolution","Join the Unified Revolution",[48,16133,16134],{},"No more toggling between separate CLIs or maintaining multiple sets of credentials. Whether you’re managing Pulsar clusters, Kafka-compatible deployments, or StreamNative Cloud resources, snctl is your single command line for them all. We invite you to:",[1666,16136,16137,16144,16151],{},[324,16138,16139,16140,16143],{},"Download the latest version of ",[55,16141,7040],{"href":15865,"rel":16142},[264]," from our GitHub release page.",[324,16145,16146,16150],{},[55,16147,16149],{"href":15840,"rel":16148},[264],"Watch the demo video"," to view examples of commands.",[324,16152,16153,16154,16159,16160,16165],{},"Share your feedback on our ",[55,16155,16158],{"href":16156,"rel":16157},"https:\u002F\u002Fstreamnativecommunity.slack.com\u002F",[264],"community channels"," or our ",[55,16161,16164],{"href":16162,"rel":16163},"https:\u002F\u002Fsupport.streamnative.io\u002F",[264],"help center."," Let us know what works well and what you’d like to see next!",[48,16167,16168],{},"Embrace the unified CLI era, and say goodbye to fragmented tooling. With snctl, managing your data streaming workflows has never been simpler, faster, and more consistent.",[48,16170,16171],{},"Happy streaming with snctl!",{"title":18,"searchDepth":19,"depth":19,"links":16173},[16174,16175,16176,16177,16182,16189,16190],{"id":15788,"depth":19,"text":15789},{"id":15812,"depth":19,"text":15813},{"id":15833,"depth":19,"text":15834},{"id":15846,"depth":19,"text":15847,"children":16178},[16179,16180,16181],{"id":15858,"depth":279,"text":15859},{"id":15891,"depth":279,"text":15892},{"id":15956,"depth":279,"text":15957},{"id":15983,"depth":19,"text":15984,"children":16183},[16184,16185,16186,16187,16188],{"id":15997,"depth":279,"text":15998},{"id":16015,"depth":279,"text":16016},{"id":16042,"depth":279,"text":16043},{"id":16072,"depth":279,"text":16073},{"id":749,"depth":279,"text":2890},{"id":1727,"depth":19,"text":13565},{"id":16130,"depth":19,"text":16131},"2025-05-13","The StreamNative CLI (snctl) v1.0 is a unified command-line interface designed to streamline interactions with Apache Pulsar, Apache Kafka, and StreamNative Cloud resources by consolidating various tools into a single, efficient platform. 
This integration simplifies user workflows, enhances security, and supports seamless management of streaming data operations, making it easier for users to manage their data infrastructure from one place.","\u002Fimgs\u002Fblogs\u002F682310d54ff0303db56a9104_snctl-v1.0.png",{},"\u002Fblog\u002Fannouncing-one-cli","8 min",{"title":15779,"description":16192},"blog\u002Fannouncing-one-cli",[799,821,1332,3989],"Hc11QAHbUZ--mPavoY7gsWy6t8IsMyV-v67RRop3dCA",{"id":16202,"title":16203,"authors":16204,"body":16205,"category":7338,"createdAt":290,"date":16191,"description":16559,"extension":8,"featured":294,"image":16560,"isDraft":294,"link":290,"meta":16561,"navigation":7,"order":296,"path":3576,"readingTime":16562,"relatedResources":290,"seo":16563,"stem":16564,"tags":16565,"__hash__":16566},"blogs\u002Fblog\u002Fintroducing-the-streamnative-mcp-server-connecting-streaming-data-to-ai-agents.md","Introducing the StreamNative MCP Server: Connecting Streaming Data to AI Agents",[810,6500],{"type":15,"value":16206,"toc":16535},[16207,16210,16222,16230,16238,16241,16243,16246,16249,16252,16256,16259,16270,16273,16276,16280,16283,16286,16289,16292,16296,16299,16303,16306,16313,16316,16320,16323,16327,16330,16334,16337,16341,16344,16348,16351,16355,16363,16367,16375,16392,16396,16402,16421,16425,16431,16444,16448,16451,16465,16468,16472,16477,16481,16484,16488,16491,16495,16503,16512,16515,16526,16529,16532],[48,16208,16209],{},"Over the past few weeks in our Agentic AI blog series, our CEO has explored the immense potential of integrating AI agents with real-time data streams:",[321,16211,16212,16217],{},[324,16213,16214],{},[55,16215,16216],{"href":15610},"AI Agents Meet Real‑Time Data – Bridging the Gap",[324,16218,16219],{},[55,16220,16221],{"href":15615},"Open Standards for Real-Time AI Integration – A Look at MCP",[48,16223,16224,16225,16229],{},"In the last blog, we examined ",[55,16226,16228],{"href":15620,"rel":16227},[264],"the Model Context Protocol (MCP)",", an open protocol introduced by Anthropic. It’s designed to enable seamless, secure, and standardized connections between AI models – especially large language models (LLMs) – and a wide range of external data sources, tools, and environments. With the protocol, AI agents can access and interact with external data sources in a universal, consistent way.",[48,16231,16232,16233,16237],{},"Today, we’re thrilled to unveil the ",[55,16234,16236],{"href":16004,"rel":16235},[264],"StreamNative MCP Server"," and share it with all the streaming enthusiasts as an open-source project. It seamlessly connects any Kafka\u002FPulsar service to AI agents using the MCP protocol, regardless of whether it's on StreamNative Cloud or not. 
With the MCP Server, users can instruct agents to access fresh, real-time Kafka\u002FPulsar data and manage the cluster resources through natural language, performing tasks such as configuring topics, publishing\u002Fconsuming data, or even writing and submitting Pulsar Functions without wrestling with complex commands.",[48,16239,16240],{},"In the following sections, we’ll introduce the StreamNative MCP Server, explain how it works, and show how it connects Apache Pulsar, Apache Kafka, and StreamNative Cloud streams to AI in a unified, developer-friendly way.",[40,16242,15998],{"id":15997},[48,16244,16245],{},"The StreamNative MCP Server (aka “streamnative-mcp-server” or “snmcp”) is an open-source implementation of the Model Context Protocol designed specifically for bringing real-time streaming platforms – Apache Kafka and Apache Pulsar closer to LLMs and AI agents. By running the MCP Server, you can securely expose a Pulsar or Kafka deployment –  whether it’s on-premises, in StreamNative cloud, or in other Streaming Service Vendors’ cloud –  to any MCP-compatible AI client. This enables an LLM-based agent to read from, write to, and administer streams through a single standardized interface without any custom integration code. It significantly lowers the barrier to adopting streaming platforms and helps truly democratize streaming technology.",[48,16247,16248],{},"The server speaks MCP on one side and native Pulsar\u002FKafka protocols on the other. Because it adheres to the open MCP spec, it works out-of-the-box with any compliant client. Developers don’t need to reinvent protocols or worry about the underlying cluster details – the server abstracts those away using the familiar “tools,” “resources,” and “prompts” vocabulary that AI agents understand.",[48,16250,16251],{},"We're excited to open source the MCP server under the Apache 2.0 license, making it freely available for everyone to use, inspect, deploy, and extend without restriction. We believe this is a key step in unlocking real-time streaming for AI and helping accelerate the innovation between the streaming and AI landscape.",[40,16253,16255],{"id":16254},"how-it-works-tools-resources-and-prompts","How It Works: Tools, Resources, and Prompts",[48,16257,16258],{},"To understand how the MCP Server enables AI-to-streaming integration, let’s briefly review the core MCP concepts it implements. In MCP, servers don’t simply expose raw data – they offer structured capabilities that the AI can utilize. The three primary capability types are:",[1666,16260,16261,16264,16267],{},[324,16262,16263],{},"Resources – Read-only data that the server makes available to clients and LLMs. Resources include files or data snippets that an AI agent can pull in as context. These resources provide structured data without additional computation needed.",[324,16265,16266],{},"Prompts – Predefined prompt templates or workflows that the server provides. Prompts serve as shortcuts for common interactions or tasks. Think of them as stored queries or conversation templates that the AI can invoke.",[324,16268,16269],{},"Tools – Tools are executable actions that the MCP Server provides to AI agents, representing the most powerful capability of the platform. Through tools, the MCP Server empowers AI agents to perform operations on streaming platforms and related systems with appropriate permissions and oversight. 
Each tool is essentially a function that an AI can invoke via the MCP protocol.",[48,16271,16272],{},"Under the hood, the MCP Server implements these concepts according to the MCP specification. When an AI agent connects, it can query the server for available tools, resources, and prompts (using standard MCP requests like tools\u002Flist and resources\u002Flist). The server advertises everything it can do in a discoverable way. Then, during an AI dialogue, the agent may choose to invoke a tool or retrieve a resource to fulfill the user’s request. The MCP Server receives those requests (formatted as JSON-RPC messages over the MCP connection) and translates them into actions on the Pulsar or Kafka protocol.",[48,16274,16275],{},"For example, if a user asks the AI agent, “How many events per second are flowing through Pulsar topic X right now?”, the agent (via its MCP client) might collect the required info to call a pulsar-admin-topics tool on the MCP Server to get topic stats. The server, in turn, uses Pulsar’s admin API to fetch the metrics for topic X, then returns that data to the AI agent, which incorporates it into a natural language answer. All of this happens through the standardized MCP interface – the agent never needs to know Pulsar protocol. It simply requests a tool by name and description from the MCP server. This model aligns perfectly with modern AI agent frameworks like ReAct (Reason+Act): the agent focuses on the reasoning and determining what tool action is needed (e.g., call pulsar-admin-topics), while the MCP Server handles the execution details (how) of interacting with the streaming backend, returning the observation (the topic stats).",[40,16277,16279],{"id":16278},"how-it-connects-agents-to-streams-with-safety","How it Connects: Agents to Streams with Safety",[48,16281,16282],{},"The Model Context Protocol, implemented by the StreamNative MCP Server, provides the essential building blocks – Tools, Resources, and Prompts – that fundamentally expand what AI Agents can achieve when interacting with streaming data platforms. By leveraging these MCP primitives, agents gain two critical advantages: the ability to perceive and react to the world in real-time (connecting to streams), and the capacity to act within a framework of unified, secure administration (with safety).",[48,16284,16285],{},"First, MCP Resources and Tools directly address the limitation of static LLM knowledge by granting agents access to live data streams. Agents can utilize specific tools to query current states, consume messages, or even subscribe to continuous data feeds. This closes the gap between the agent's knowledge cutoff and the \"here-and-now\" reality reflected in platforms like Kafka and Pulsar, enabling truly context-aware agents to make timely decisions based on the latest events. This unlocks possibilities, allowing the agents to perform real-time monitoring or provide interactive diagnostics based on current system and platform states.",[48,16287,16288],{},"Second, the structured nature of MCP Tools and the ability to define accessible Resources provide the necessary foundation for governed agent actions. Administrators gain fine-grained control by selectively exposing specific tools and data resources to different agents. This allows AI agents to perform meaningful actions – like managing topics or understanding the real-time platform status using the authorised tools – while ensuring they operate within secure, predefined boundaries aligned with organisational policies. 
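For illustration, the JSON-RPC 2.0 request an MCP client might send for that topic-stats lookup could look like the sketch below; tools/call is the standard MCP method, while the tool name and argument fields are assumptions rather than the exact streamnative-mcp-server schema:

```python
import json

# JSON-RPC 2.0 request an MCP client might send to the MCP Server.
# "tools/call" is the standard MCP method; the tool name and arguments
# below are illustrative assumptions, not the server's exact schema.
request = {
    "jsonrpc": "2.0",
    "id": 42,
    "method": "tools/call",
    "params": {
        "name": "pulsar-admin-topics",
        "arguments": {
            "operation": "stats",
            "topic": "persistent://public/default/topic-x",
        },
    },
}
print(json.dumps(request, indent=2))

# The server translates this into a Pulsar admin API call and returns the
# topic stats as the tool result, which the agent folds into its answer.
```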
This capability is crucial for confidently deploying agents in enterprise environments, expanding their roles from passive information retrievers to active, yet controlled, participants in managing and interacting with streaming systems.",[48,16290,16291],{},"Therefore, the StreamNative MCP Server translates the potential of the Model Context Protocol into practice for your Kafka and Pulsar clusters. By providing controlled access to streaming capabilities and data, our server significantly enhances agent scope and reliability, enabling trustworthy, real-time AI applications. The next section details the specific features and capabilities built into the StreamNative MCP Server to deliver this value.",[40,16293,16295],{"id":16294},"key-features-and-capabilities","Key Features and Capabilities",[48,16297,16298],{},"Let’s drill into some of the technical highlights of the StreamNative MCP Server and what makes it developer-friendly:",[32,16300,16302],{"id":16301},"_30-built-in-tools-and-actions","🚀 30+ Built-In Tools and Actions",[48,16304,16305],{},"The MCP Server includes an elegantly designed toolkit of over 30 powerful tools that comprehensively cover the capabilities of modern streaming platforms.",[48,16307,16308,16309,16312],{},"Instead of building hundreds of single-purpose tools, we adopt an efficient approach by using 'Resource' and 'Operation' parameters within each tool, enabling one tool to handle multiple related functions. For example, the single ",[4926,16310,16311],{},"pulsar_admin_brokers"," tool can list active brokers, check health status, and manage configurations through different parameter combinations. The toolkit supports a broad range of functionalities, including data operations (e.g., publishing or consuming messages), administrative tasks (e.g., creating topics, managing subscriptions, and monitoring broker statistics), and StreamNative Cloud resources management capabilities.",[48,16314,16315],{},"With this powerful library, AI agents can conveniently perform a wide range of tasks on the data streaming platform. It can \"create a new topic for user logs,\" \"increase the retention of topic Y to 7 days,\" or \"write and run a Pulsar Function to process data\" – and it knows the exact tools to execute these user requests. Each tool accepts input parameters (defined by JSON schemas) and returns results, with actions subject to host application approval.",[32,16317,16319],{"id":16318},"secure-by-design","🔒 Secure by Design",[48,16321,16322],{},"Security is a fundamental consideration in the StreamNative MCP Server design. It employs a defense-in-depth approach to ensure safe and governed agent interactions with your system. The server integrates with your cluster's existing authorization model via specified service accounts for granular access control. A strict read-only mode (--read-only) can also be enabled for added protection in sensitive environments. Administrators also have fine-grained control through selective feature enablement (--features) to limit the agent's operational scope based on least privilege. Complementing these controls, the server's built-in prompts often incorporate their own restrictions, adding another layer of guidance to keep AI agents interacting within intended boundaries. 
This multi-layered security supports strict policies and minimizes the risk of unauthorized access or data manipulation.",[32,16324,16326],{"id":16325},"connector-integration","🔌 Connector Integration",[48,16328,16329],{},"The StreamNative MCP Server is designed to work with the Universal Connect (UniConn) framework, which means AI agents can leverage the rich ecosystem of Pulsar IO and Kafka Connect connectors through MCP. If your cluster is already ingesting or sinking data via connectors (e.g., from databases, cloud storage, etc.), the MCP Server can expose those as tools or resources as well. For instance, the MCP Server can spin up a Debezium MySQL → Pulsar pipeline on demand and then let the AI agent tap that stream to pull the latest change event or an entire batch of recent transactions. UniConn provides a unified interface for connectors on Pulsar and Kafka, and those connectors effectively become extensions of the AI’s reach. This opens up a world of external systems (SQL, NoSQL, SaaS APIs, etc.) to the AI agent through the same MCP Server. The agent could ask something like “What’s the latest record in our analytics DB?” and, via a connector tool, fetch that in real time. No custom code is needed to integrate these external sources – if there’s a connector, the MCP Server can likely expose it.",[32,16331,16333],{"id":16332},"️-dynamic-topic-management","🗄️ Dynamic Topic Management",[48,16335,16336],{},"Beyond simply reading or writing data, the MCP Server lets AI agents create, configure, and manage topics and subscriptions on the fly. An agent can spin up a brand-new stream (“Create a topic for sensor-XYZ data”), which maps to a pulsar-admin-topics call, or tweak retention, partition counts, and subscription properties using the same toolset. All changes respect cluster governance – quotas, ACLs, and policies still apply – but the agent can carry them out from natural-language requests instead of a CLI.",[32,16338,16340],{"id":16339},"serverless-function-management","🧩 Serverless Function Management",[48,16342,16343],{},"Moreover, we integrated Pulsar Functions support, enabling the agent to deploy serverless functions or connectors by submitting function code or connector configs via a tool. Imagine telling your AI agent, “Deploy a function that scans for sensitive data, e.g., SSN, and masks it”, and the agent uses an MCP tool to submit the Pulsar Function to the cluster. This drastically lowers the barrier to deploying stream processing logic, as the AI can act as your DevOps helper for streaming jobs. All changes remain subject to your cluster’s governance – the AI won’t bypass quotas or authorization – but it provides a natural-language interface to tasks previously handled via CLI or GUI.",[32,16345,16347],{"id":16346},"streaming-data-as-first-class-context","📊 Streaming Data as First-Class Context",[48,16349,16350],{},"The StreamNative MCP Server supports streaming outputs using MCP’s event streaming features (based on JSON-RPC and will soon be on Server-Sent Events). This means that when an AI agent subscribes to a topic via a tool, the server can feed data continuously to the client in a streaming fashion, rather than sending only one-off responses. The MCP protocol supports sending incremental results, so an agent could effectively “listen” to a topic. This real-time push of data is crucial for truly live agentic behavior – your agent could, for example, monitor a stream of user transactions and proactively flag anomalies during a conversation. 
Under MCP, the client-side (agent host) can choose to display or use streaming responses as they come. The key takeaway: real-time data isn’t just a one-shot query – it’s a continuous feed, and our MCP Server fully supports that mode.",[40,16352,16354],{"id":16353},"interacting-with-pulsar-kafka-via-streamnative-mcp-server","Interacting with Pulsar & Kafka via StreamNative MCP Server",[48,16356,16357,16358,16362],{},"Here are a few examples that showcase the StreamNative MCP Server’s capabilities; you can find additional demos in the ",[55,16359,16361],{"href":16117,"rel":16360},[264],"StreamNative MCP Server playlist"," on YouTube.",[32,16364,16366],{"id":16365},"produce-and-consume-kafka-messages-with-avro-schema-in-ursa","Produce and Consume Kafka Messages with AVRO Schema in URSA",[48,16368,16369,16370],{},"📺",[55,16371,16374],{"href":16372,"rel":16373},"https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=UhOzBLjYLP8&list=PL7-BmxsE3q4WO8mu8gzsbVkjoXb-PFpQX&index=2",[264]," Watch here",[321,16376,16377,16380,16383,16386,16389],{},[324,16378,16379],{},"Create Kafka topic",[324,16381,16382],{},"Produce Kafka messages with AVRO schema",[324,16384,16385],{},"Consume Kafka messages",[324,16387,16388],{},"Examine the message in Databricks",[324,16390,16391],{},"Delete resources",[32,16393,16395],{"id":16394},"managing-pulsar-tenants-namespaces-and-topics","Managing Pulsar Tenants, Namespaces, and Topics",[48,16397,16369,16398],{},[55,16399,16374],{"href":16400,"rel":16401},"https:\u002F\u002Fyoutu.be\u002FPSXdhdmunZg",[264],[321,16403,16404,16407,16410,16413,16416,16419],{},[324,16405,16406],{},"Create tenant",[324,16408,16409],{},"Create namespace",[324,16411,16412],{},"Create partitioned topic",[324,16414,16415],{},"Test topic",[324,16417,16418],{},"Set namespace TTL",[324,16420,16391],{},[32,16422,16424],{"id":16423},"create-deploy-and-test-python-pulsar-function","Create, Deploy, and Test Python Pulsar Function",[48,16426,16369,16427],{},[55,16428,16374],{"href":16429,"rel":16430},"https:\u002F\u002Fyoutu.be\u002F9JDHL-WaCXs",[264],[321,16432,16433,16436,16439,16442],{},[324,16434,16435],{},"Create Python Pulsar Function with vibe coding",[324,16437,16438],{},"Deploy with MCP Server",[324,16440,16441],{},"Test with MCP Server",[324,16443,16391],{},[40,16445,16447],{"id":16446},"laying-the-foundation-for-real-time-enterprise-ai-agents","Laying the Foundation for Real-Time Enterprise AI Agents",[48,16449,16450],{},"The release and open-source of the StreamNative MCP Server marks a significant milestone: it provides the foundation for what we envision as Real-Time Enterprise AI Agents—a complete environment for running AI agents natively with streaming data. 
With the MCP Server in place, AI agents can now connect to streaming systems to:",[321,16452,16453,16456,16459,16462],{},[324,16454,16455],{},"Retrieve up-to-the-second data for more accurate decision-making",[324,16457,16458],{},"Trigger transformations and pipelines via Pulsar Functions, ensuring the ability to enrich data on the fly",[324,16460,16461],{},"Tap into existing connectors to instantly access 200+ data sources without writing new integration code",[324,16463,16464],{},"Automate resource management and provisioning through natural language, reducing operational overhead and simplifying DevOps workflows",[48,16466,16467],{},"Future enhancements to the StreamNative MCP Server will unlock even more capabilities for fast, intelligent AI agents across diverse data landscapes.",[32,16469,16471],{"id":16470},"going-further-with-ursa","Going Further with Ursa",[48,16473,16474,16476],{},[55,16475,1332],{"href":6647},", StreamNative’s next-generation, lakehouse-native data streaming engine, brings together real-time streaming data and lakehouse tables. Through MCP, AI agents gain unified access to both historical datasets (in Apache Iceberg or Delta Tables) and ongoing event streams – all from a single interface. This means no more relying on stale snapshots – agents can respond to live data, correlate it with archived knowledge, and deliver timely, context-rich insights.",[32,16478,16480],{"id":16479},"leveraging-pulsar-functions","Leveraging Pulsar Functions",[48,16482,16483],{},"Many users already rely on Pulsar Functions for real-time data processing and transformation. These business logic functions can now be directly utilized – or even dynamically created and updated – by AI agents through MCP. As a result, agents can perform in-flight analytics or adapt data pipelines based on changing requirements, making your event-driven architecture more intelligent and responsive.",[32,16485,16487],{"id":16486},"harnessing-connectors","Harnessing Connectors",[48,16489,16490],{},"StreamNative’s robust connector ecosystem, which covers everything from enterprise systems to SaaS platforms and databases, ensures that AI agents can connect to virtually any data source without custom coding. By removing the need for specialized integrations, developers save time and can focus on enhancing their AI-driven workflows.",[40,16492,16494],{"id":16493},"get-involved-try-it-out-today","Get Involved – Try it Out Today",[48,16496,16497,16498,16502],{},"The StreamNative MCP Server is available now ",[55,16499,16501],{"href":16004,"rel":16500},[264],"on GitHub"," (under the StreamNative organization). We invite all streaming enthusiasts, data engineers, and curious tinkerers to download the code, read the docs, and play with it. We’ve provided the instructions that show how to connect an AI client – such as the Claude Desktop app – to your own MCP Server and start issuing tool commands to a local Pulsar or Kafka topic.",[48,16504,16505,16506,16511],{},"Because this is an early release, we’re actively seeking feedback and contributions from the community. ",[55,16507,16510],{"href":16508,"rel":16509},"https:\u002F\u002Fgithub.com\u002Fstreamnative\u002Fstreamnative-mcp-server\u002Fdiscussions",[264],"Join the conversation"," on GitHub to ask questions, share use cases, and get help from our engineers and fellow early adopters.",[48,16513,16514],{},"This launch is an invitation to explore the cutting edge of real-time AI integration. 
Whether you want to build:",[321,16516,16517,16520,16523],{},[324,16518,16519],{},"An AI ops assistant that manages your streaming platform",[324,16521,16522],{},"An intelligent monitoring agent that watches your event data",[324,16524,16525],{},"Or a new breed of data-driven chatbot that can act on the information it retrieves",[48,16527,16528],{},"…the tools are now in your hands.",[48,16530,16531],{},"We believe Agentic AI – AI agents empowered with real-time context – will unlock a new class of applications. With the StreamNative MCP Server, connecting streaming data to AI is no longer theoretical – it’s something you can implement today.",[48,16533,16534],{},"Feel free to explore the repo, launch the StreamNative MCP Server, and unleash your AI agents on live data. We can’t wait to see what you create, and we look forward to building the future of real-time AI together with the community.",{"title":18,"searchDepth":19,"depth":19,"links":16536},[16537,16538,16539,16540,16548,16553,16558],{"id":15997,"depth":19,"text":15998},{"id":16254,"depth":19,"text":16255},{"id":16278,"depth":19,"text":16279},{"id":16294,"depth":19,"text":16295,"children":16541},[16542,16543,16544,16545,16546,16547],{"id":16301,"depth":279,"text":16302},{"id":16318,"depth":279,"text":16319},{"id":16325,"depth":279,"text":16326},{"id":16332,"depth":279,"text":16333},{"id":16339,"depth":279,"text":16340},{"id":16346,"depth":279,"text":16347},{"id":16353,"depth":19,"text":16354,"children":16549},[16550,16551,16552],{"id":16365,"depth":279,"text":16366},{"id":16394,"depth":279,"text":16395},{"id":16423,"depth":279,"text":16424},{"id":16446,"depth":19,"text":16447,"children":16554},[16555,16556,16557],{"id":16470,"depth":279,"text":16471},{"id":16479,"depth":279,"text":16480},{"id":16486,"depth":279,"text":16487},{"id":16493,"depth":19,"text":16494},"The StreamNative MCP Server is an open-source project that connects streaming data platforms like Kafka and Pulsar to AI agents using the Model Context Protocol (MCP). 
It enables seamless, real-time AI integration by allowing agents to access, manage, and interact with live data streams through natural language, enhancing the capabilities and security of AI-driven workflows.","\u002Fimgs\u002Fblogs\u002F6822faa477df140288f03c3a_mcp-server.png",{},"12 min",{"title":16203,"description":16559},"blog\u002Fintroducing-the-streamnative-mcp-server-connecting-streaming-data-to-ai-agents",[3988,10054,3989,799,821,1332],"idMeeYwS2sy_hYieHJTL8uFfI3h8tfAO2iQq4aIq69I",{"id":16568,"title":16569,"authors":16570,"body":16571,"category":1332,"createdAt":290,"date":16849,"description":16850,"extension":8,"featured":294,"image":16851,"isDraft":294,"link":290,"meta":16852,"navigation":7,"order":296,"path":16853,"readingTime":16854,"relatedResources":290,"seo":16855,"stem":16856,"tags":16857,"__hash__":16858},"blogs\u002Fblog\u002Fdiskless-stateless-leaderless---a-comic-guide-to-modern-data-streaming.md","Diskless, Stateless, Leaderless – A Comic Guide to Modern Data Streaming",[806],{"type":15,"value":16572,"toc":16828},[16573,16576,16580,16583,16594,16597,16601,16606,16610,16613,16617,16628,16632,16643,16646,16660,16663,16667,16672,16675,16678,16681,16692,16695,16703,16705,16719,16722,16726,16731,16734,16737,16740,16748,16751,16759,16761,16775,16778,16781,16786,16790,16801,16804,16808,16823,16826],[48,16574,16575],{},"‍Goal of this comic‑blog: Explain three buzz‑worthy architectures in plain English so any dev, PM, or VP can walk away nodding, “Got it!”",[40,16577,16579],{"id":16578},"meet-the-three-amigos","Meet the Three Amigos 🧑‍🚀🧑‍🔧🧑‍🎨",[48,16581,16582],{},"Imagine a squad of message‑brokers who keep your data flowing 24\u002F7. Each broker can adopt one (or more) of these personality quirks:",[1666,16584,16585,16588,16591],{},[324,16586,16587],{},"Diskless – “No backpack full of disks for me. I stash my stuff in the cloud!”",[324,16589,16590],{},"Stateless – “Goldfish memory. I deliver messages then forget them.”",[324,16592,16593],{},"Leaderless – “Nobody here is the boss. We pass the ball around like pickup basketball.”",[48,16595,16596],{},"Why should you care? Because the mix you choose decides how cheap, fast, and fault‑tolerant your pipeline can be.",[40,16598,16600],{"id":16599},"_1-diskless-no-hard-drives-no-heavy-lifting-️","1. Diskless – No Hard Drives, No Heavy Lifting ☁️",[48,16602,16603],{},[384,16604],{"alt":18,"src":16605},"\u002Fimgs\u002Fblogs\u002F6811ca92126da159260dc291_AD_4nXcKUPV28OjymYhcXjO9z7nk25mQCZgur59wqb5FpCEsLZ0p5wQ9uqAdH-qiLHExnB--JGuEvG5KsbROKidCW1rb9g7GNbgBoJrnPJoE-Veh0VS_W1yZdoabhWoVEhH7IgZ1lQkUpQ.png",[32,16607,16609],{"id":16608},"what-it-means","What it means",[48,16611,16612],{},"Brokers write straight to cloud\u002Fobject storage. 
Their own disks disappear completely.",[32,16614,16616],{"id":16615},"why-its-cool","Why it’s cool",[321,16618,16619,16622,16625],{},[324,16620,16621],{},"Elastic scale – Spin up new brokers in seconds; no terabytes to copy.",[324,16623,16624],{},"Cloud pricing – Pay pennies per GB instead of gold‑plated SSDs.",[324,16626,16627],{},"Unlimited bandwidth – Saturate links with no caps or throttling, so throughput grows with your needs.",[32,16629,16631],{"id":16630},"gotchas","Gotchas",[321,16633,16634,16637,16640],{},[324,16635,16636],{},"Adds a few hundred extra milliseconds – Each write waits for the cloud to say “stored!”.",[324,16638,16639],{},"Backpressure on the producers – Messages must be kept on the client until they are persisted in the cloud.",[324,16641,16642],{},"Your cloud bucket is now the single source of truth – keep it durable and monitored.",[48,16644,16645],{},"Who's doing it?",[321,16647,16648,16651,16654,16657],{},[324,16649,16650],{},"Kafka – Mostly disk‑full today; Diskless Topics (KIP‑1150) are still experimental.",[324,16652,16653],{},"Redpanda – Mostly disk‑full; shadow‑indexing can tier cold data to S3, but brokers still rely on local disks.",[324,16655,16656],{},"Pulsar – Brokers are diskless, but BookKeeper storage nodes keep spinning disks.",[324,16658,16659],{},"Ursa – 100 % diskless; everything lands straight in S3 + Iceberg.",[48,16661,16662],{},"👉 Use diskless if you love cloud economics or need limitless retention. Stick with disks if every millisecond counts or you run on‑prem without object storage.",[40,16664,16666],{"id":16665},"_2-stateless-memory-of-a-goldfish","2. Stateless – Memory of a Goldfish 🐠",[48,16668,16669],{},[384,16670],{"alt":18,"src":16671},"\u002Fimgs\u002Fblogs\u002F6811ca91da647ba0f5cc2cff_AD_4nXedwfXjaoWu7-_B1Fw1dwa0lg42xyofVOpvv9aAP1If6KG_1-46o_oPshM9aKeJQmh7d427W1_quOzg3c4GTBtQMlLU5o95R2Zvi93nFdAu1VvsOBnIIxVypr2XQQxuAXMIGFqr.png",[32,16673,16609],{"id":16674},"what-it-means-1",[48,16676,16677],{},"Brokers keep no durable state. They push every message to an external store (and maybe cache a few in RAM). If they crash, a twin broker just resumes the job.",[32,16679,16616],{"id":16680},"why-its-cool-1",[321,16682,16683,16686,16689],{},[324,16684,16685],{},"Replaceable pods – Perfect fit for Kubernetes auto‑scaling.",[324,16687,16688],{},"Ops bliss – Rolling upgrades? Kill and redeploy without data shuffles.",[324,16690,16691],{},"Elastic scalability - instantly add or remove brokers to your cluster without having to copy data",[32,16693,16631],{"id":16694},"gotchas-1",[321,16696,16697,16700],{},[324,16698,16699],{},"More moving parts – You must run external storage (BookKeeper, S3, etc.) and a metadata service.",[324,16701,16702],{},"Extra hop – Reads may fetch from storage instead of a local disk, adding a tiny overhead.",[48,16704,16645],{},[321,16706,16707,16710,16713,16716],{},[324,16708,16709],{},"Kafka – Classic mode is stateful (data lives on broker disks).",[324,16711,16712],{},"Redpanda – Also stateful; each node stores logs locally just like Kafka.",[324,16714,16715],{},"Pulsar – Brokers are proudly stateless; BookKeeper holds the data.",[324,16717,16718],{},"Ursa – Stateless ++; any broker can serve any partition (thanks to Leaderless, next section).",[48,16720,16721],{},"👉 Use stateless when you crave effortless scaling or multi‑tenant isolation. Choose stateful if you prefer one simple box that “just works.”",[40,16723,16725],{"id":16724},"_3-leaderless-no-single-boss-here","3. 
Leaderless – No Single Boss Here 🤝",[48,16727,16728],{},[384,16729],{"alt":18,"src":16730},"\u002Fimgs\u002Fblogs\u002F6811cb24f1ba54327a3ec4cc_AD_4nXfyHjnkSyx0CYg5I_lJkSRt67vOkRg45K5U9OFoHOJO4-huDskfL1rsq42BwCa_kkkjY3j3JmriW7l2fSPftg4p3anqReCYj_g0t2EmkN_cymzH6HHTxSEwbrutp2nvtSLN38164g.png",[32,16732,16609],{"id":16733},"what-it-means-2",[48,16735,16736],{},"There’s no “captain” broker for a partition. Any broker can accept writes; ordering is managed by a shared “scoreboard” (a fast metadata\u002Findex service).",[32,16738,16616],{"id":16739},"why-its-cool-2",[321,16741,16742,16745],{},[324,16743,16744],{},"Failover magic – A broker dies? Clients just talk to another—no election delay.",[324,16746,16747],{},"Spread the load – Hot partitions aren’t locked to one über‑busy leader.",[32,16749,16631],{"id":16750},"gotchas-2",[321,16752,16753,16756],{},[324,16754,16755],{},"New brain to babysit – That metadata service (Oxia, etcd, Spanner) is now mission‑critical. Keep it HA and low‑latency.",[324,16757,16758],{},"Two‑hop writes – Broker ➜ metadata ➜ storage adds a smidge of latency.",[48,16760,16645],{},[321,16762,16763,16766,16769,16772],{},[324,16764,16765],{},"Kafka – Leader‑based (for now).",[324,16767,16768],{},"Redpanda – Same leader‑follower pattern as Kafka.",[324,16770,16771],{},"Pulsar – One broker owns a topic at any moment ⇒ still leader‑based.",[324,16773,16774],{},"Ursa – Fully leaderless; brokers & storage coordinate via Oxia.",[48,16776,16777],{},"👉 Use leaderless if you need five‑nines uptime, global clusters, or you’re tired of hot‑leader bottlenecks. Stick with leaders when ultra‑low latency or simple ops trump everything.",[40,16779,16780],{"id":9825},"Putting It All Together 🏆",[48,16782,16783],{},[384,16784],{"alt":5878,"src":16785},"\u002Fimgs\u002Fblogs\u002F681281dc64384c1b68568838_Screenshot-2025-04-30-at-1.02.18-PM.png",[40,16787,16789],{"id":16788},"takeaways","Takeaways",[321,16791,16792,16795,16798],{},[324,16793,16794],{},"Diskless slashes storage costs and boosts elasticity.",[324,16796,16797],{},"Stateless makes brokers cattle, not pets.",[324,16799,16800],{},"Leaderless removes single points of failure (but needs a rock‑solid metadata brain).",[48,16802,16803],{},"Mix & match based on your pain point—storage cost, scaling headaches, or availability.",[40,16805,16807],{"id":16806},"ready-to-play-with-the-three-amigos","Ready to play with the Three Amigos? 🎲",[1666,16809,16810,16813,16816],{},[324,16811,16812],{},"Spin up Pulsar to feel the stateless joy of instant topic reassignment.",[324,16814,16815],{},"Give Ursa a whirl to taste the full diskless + stateless + leaderless combo.",[324,16817,16818,16819,16822],{},"Join our free two‑day ",[55,16820,5383],{"href":15752,"rel":16821},[264]," (May 28‑29) for live demos, real‑world war stories, and Q&A with the folks building these systems.",[48,16824,16825],{},"See you on the stream! 
🚀",[48,16827,3931],{},{"title":18,"searchDepth":19,"depth":19,"links":16829},[16830,16831,16836,16841,16846,16847,16848],{"id":16578,"depth":19,"text":16579},{"id":16599,"depth":19,"text":16600,"children":16832},[16833,16834,16835],{"id":16608,"depth":279,"text":16609},{"id":16615,"depth":279,"text":16616},{"id":16630,"depth":279,"text":16631},{"id":16665,"depth":19,"text":16666,"children":16837},[16838,16839,16840],{"id":16674,"depth":279,"text":16609},{"id":16680,"depth":279,"text":16616},{"id":16694,"depth":279,"text":16631},{"id":16724,"depth":19,"text":16725,"children":16842},[16843,16844,16845],{"id":16733,"depth":279,"text":16609},{"id":16739,"depth":279,"text":16616},{"id":16750,"depth":279,"text":16631},{"id":9825,"depth":19,"text":16780},{"id":16788,"depth":19,"text":16789},{"id":16806,"depth":19,"text":16807},"2025-04-30","Learn how diskless, stateless & leaderless architectures transform modern data streaming architectures—cutting infrastructure costs, easing scale, boosting uptime.","\u002Fimgs\u002Fblogs\u002F6814230cb271e5a961040ae1_comic-cost-less.jpg",{},"\u002Fblog\u002Fdiskless-stateless-leaderless-a-comic-guide-to-modern-data-streaming","3 mins",{"title":16569,"description":16850},"blog\u002Fdiskless-stateless-leaderless---a-comic-guide-to-modern-data-streaming",[799,821,1332],"7CZALyw2xgaIABqIniUSjbBX2TEQh-nimjC-yC0nqM0",{"id":16860,"title":16861,"authors":16862,"body":16863,"category":5376,"createdAt":290,"date":16849,"description":16977,"extension":8,"featured":294,"image":16978,"isDraft":294,"link":290,"meta":16979,"navigation":7,"order":296,"path":16980,"readingTime":16981,"relatedResources":290,"seo":16982,"stem":16983,"tags":16984,"__hash__":16986},"blogs\u002Fblog\u002Fthe-data-streaming-summit-virtual-2025-schedule-is-now-live.md","The Data Streaming Summit Virtual 2025 Schedule is Now Live!",[6127],{"type":15,"value":16864,"toc":16966},[16865,16868,16871,16874,16877,16880,16884,16904,16908,16911,16915,16918,16922,16925,16929,16932,16936,16939,16943,16946,16950,16953,16961,16963],[48,16866,16867],{},"Theme: Unlocking Real-Time AI\n📅 May 28–29, 2025 | 💻 Virtual Event",[48,16869,16870],{},"We’re excited to announce that the full schedule for Data Streaming Summit Virtual 2025 is now available!",[48,16872,16873],{},"After receiving an overwhelming number of high-quality submissions, we've expanded this year's event to two full days, featuring:",[48,16875,16876],{},"✅ 6 breakout tracks across two days\n✅ 37+ breakout sessions\n✅ Visionary keynotes\n✅ Live speaker Q&A",[48,16878,16879],{},"This year's theme, \"Unlocking Real-Time AI,\" explores how real-time data streaming is becoming the foundation for AI-native systems, analytics platforms, and intelligent architectures.",[40,16881,16883],{"id":16882},"keynotes-you-cant-miss","🌟 Keynotes You Can't Miss",[321,16885,16886,16889,16892,16895,16898,16901],{},[324,16887,16888],{},"Data Streaming for the Agentic EraThe StreamNative team shapes the vision for streaming-powered intelligent and agentic systems.",[324,16890,16891],{},"A Deep Dive into Apache Flink 2.0Xintong Song (Alibaba), the release manager for Flink 2.0, discusses the future of Flink and unified stream processing.",[324,16893,16894],{},"Pulsar as the Center of the StackJeff Bolle and Daniel Shaver (Q6 Cyber) share how Pulsar tamed 85+ billion cyberthreat records for real-time fraud prevention.",[324,16896,16897],{},"Fluss: Reinventing Kafka for the Real-Time LakehouseJark Wu (Alibaba), introduces Fluss, reimagining Kafka for the streaming lakehouse 
world",[324,16899,16900],{},"Building the Next Generation of Real-Time Data Pipelines at NetflixGuil Pires and Sujay Jain (Netflix), show how Netflix combines Data Mesh and Streaming SQL for next-gen pipelines.",[324,16902,16903],{},"Flink Safe Deployment at UberYusheng Chen (Uber) shares how Uber achieves safe, large-scale Flink deployments.",[40,16905,16907],{"id":16906},"_6-curated-tracks-across-two-days","🧠 6 Curated Tracks Across Two Days",[48,16909,16910],{},"We’ve carefully organized the summit into six focused breakout tracks to cover the full spectrum of real-time data innovation:",[32,16912,16914],{"id":16913},"_1-2-tech-deep-dives-pulsar-kafka-ursa-flink","1. & 2. Tech Deep Dives: Pulsar, Kafka, Ursa, Flink",[48,16916,16917],{},"This year, we’re running two \"Tech Deep Dives\" tracks to accommodate the volume of highly technical sessions. You'll find deep architectural insights, operational best practices, and innovations across Apache Pulsar, Apache Kafka, StreamNative Ursa, Apache Flink, Apache Spark, and more.",[32,16919,16921],{"id":16920},"_3-use-cases-real-time-analytics","3. Use Cases: Real-Time Analytics",[48,16923,16924],{},"Real-world implementations of real-time data systems powering customer-facing analytics, threat intelligence, and other dynamic applications by leveraging data streaming.",[32,16926,16928],{"id":16927},"_4-use-cases-streaming-lakehouse","4. Use Cases: Streaming Lakehouse",[48,16930,16931],{},"Sessions dedicated to integrating streaming with modern lakehouse architectures, with a special focus on Apache Iceberg and streaming ingestion patterns.",[32,16933,16935],{"id":16934},"_5-ai-track","5. AI Track",[48,16937,16938],{},"Exploring how streaming architectures power Agentic AI systems - from event-driven agent orchestration to retrieval-augmented generation (RAG) pipelines.",[32,16940,16942],{"id":16941},"_6-stream-processing-track","6. 
Stream Processing Track",[48,16944,16945],{},"Focused on the future of stream processing frameworks like KStreams, KSQL, Flink, Spark, including unified streaming-batch models, state management innovations, and streaming SQL.",[40,16947,16949],{"id":16948},"register-now","🚀 Register Now!",[48,16951,16952],{},"Whether you’re scaling real-time analytics, building agentic AI systems, or designing modern streaming architectures - Data Streaming Summit Virtual 2025 is the event you can’t miss.",[48,16954,16955,16956],{},"👉",[55,16957,16960],{"href":16958,"rel":16959},"https:\u002F\u002Fdatastreaming-summit.org\u002Fevent\u002Fdata-streaming-virtual-2025\u002Fschedule",[264]," Explore the full schedule and register today!",[48,16962,3931],{},[48,16964,16965],{},"Let's shape the future of real-time data streaming together!",{"title":18,"searchDepth":19,"depth":19,"links":16967},[16968,16969,16976],{"id":16882,"depth":19,"text":16883},{"id":16906,"depth":19,"text":16907,"children":16970},[16971,16972,16973,16974,16975],{"id":16913,"depth":279,"text":16914},{"id":16920,"depth":279,"text":16921},{"id":16927,"depth":279,"text":16928},{"id":16934,"depth":279,"text":16935},{"id":16941,"depth":279,"text":16942},{"id":16948,"depth":19,"text":16949},"The Data Streaming Summit Virtual 2025 (May 28–29) has unveiled its expanded two-day schedule, featuring 6 tracks, 37+ sessions, and visionary keynotes from Alibaba, Netflix, Uber, and more, all centered on the theme “Unlocking Real-Time AI” to explore how data streaming powers AI-native systems, intelligent architectures, and next-gen analytics.","\u002Fimgs\u002Fblogs\u002F6809c3902af8c62478bbba98_DSSV25-social-media-XL-v2.0.png",{},"\u002Fblog\u002Fthe-data-streaming-summit-virtual-2025-schedule-is-now-live","2 min",{"title":16861,"description":16977},"blog\u002Fthe-data-streaming-summit-virtual-2025-schedule-is-now-live",[5376,16985,303],"Kubernetes","6q77gt2Q39jxLOCxr76vOaxm4CF4fsJDjtiCLDnKk_4",{"id":16988,"title":16989,"authors":16990,"body":16991,"category":3550,"createdAt":290,"date":17157,"description":17158,"extension":8,"featured":294,"image":17159,"isDraft":294,"link":290,"meta":17160,"navigation":7,"order":296,"path":4788,"readingTime":17161,"relatedResources":290,"seo":17162,"stem":17163,"tags":17164,"__hash__":17165},"blogs\u002Fblog\u002Fannouncing-ursa-engine-preview-on-gcp.md","StreamNative Ursa Expands to Google Cloud with Public Preview Release",[806,311],{"type":15,"value":16992,"toc":17148},[16993,16996,17000,17007,17025,17028,17032,17036,17039,17050,17052,17063,17065,17068,17127,17131,17134,17145],[48,16994,16995],{},"We’re thrilled to share that Ursa, our leaderless, lakehouse-native data streaming engine, is now available in Public Preview on Google Cloud Platform (GCP)—building on the strong momentum of our General Availability launch on AWS. This exciting expansion delivers the same cost-efficient, high-performance streaming capabilities to organizations looking to modernize their data infrastructure on GCP.",[40,16997,16999],{"id":16998},"recap-ursas-general-availability-on-aws","Recap: Ursa’s General Availability on AWS",[48,17001,17002,17003,17006],{},"Since ",[55,17004,17005],{"href":6864},"launching Ursa on AWS",", we’ve helped organizations tackle complex data streaming challenges while reducing both infrastructure costs and operational overhead. 
Here are a few highlights:",[321,17008,17009,17016,17019,17022],{},[324,17010,17011,17012,17015],{},"Significant Cost Savings: Customers have reported up to a 10x reduction in total costs compared to legacy leader-based architectures such as Kafka and Redpanda. ",[55,17013,17014],{"href":10357},"Our cost benchmark report"," details how Ursa sustains a 5GB\u002Fs Kafka workload at just 5% of the cost of traditional streaming engines.",[324,17017,17018],{},"Leaderless Architecture: By removing expensive inter-zone data transfers and eliminating single points of failure, Ursa’s design simplifies scaling, reduces networking expenses, and dramatically lowers operational headaches.",[324,17020,17021],{},"Lakehouse-Native Storage: Ursa is the first and only data streaming solution with a storage engine built on open lakehouse formats (Iceberg and Delta Lake). By embedding data schemas directly in the storage layer and taking advantage of columnar compression, Ursa delivers 10x or more storage reductions.",[324,17023,17024],{},"Customer Success & Growth: Numerous enterprise deployments demonstrate Ursa’s real-world gains in throughput, reliability, and cost efficiency.",[48,17026,17027],{},"These successes on AWS have laid the groundwork for our new venture on GCP, where we’re excited to extend the same benefits to GCP users.",[40,17029,17031],{"id":17030},"introducing-ursa-on-gcp-public-preview","Introducing Ursa on GCP Public Preview",[32,17033,17035],{"id":17034},"why-gcp","Why GCP?",[48,17037,17038],{},"Many of our customers—alongside new prospects—expressed interest in running Ursa within Google Cloud. Whether you’re already on GCP or pursuing a multi-cloud strategy, Ursa on GCP enables you to:",[321,17040,17041,17044,17047],{},[324,17042,17043],{},"Leverage Native Services: Integrate seamlessly with BigQuery, Pub\u002FSub, and Cloud Storage.",[324,17045,17046],{},"Scale Confidently: Achieve high throughput and cost-efficient streaming for latency-relaxed workloads—without the bottlenecks of leader-based systems.",[324,17048,17049],{},"Reduce TCO: Avoid the high costs and complexities of leader-based solutions, thanks to Ursa’s leaderless & stateless architecture.",[32,17051,14609],{"id":14608},[1666,17053,17054,17057,17060],{},[324,17055,17056],{},"Ursa on GCP BYOC (Bring Your Own Cloud)Deploy Ursa into your own GCP environment, maintaining full control over your infrastructure and data security. Enjoy the elasticity of GCP compute and networking services while adhering to internal compliance requirements.",[324,17058,17059],{},"Integration with Databricks Unity Catalog & Snowflake Open CatalogConnect Ursa streams directly to your lakehouse governance tools. Centralize data policies, lineage, and security for both streaming and batch workflows—without duplicating schemas or permissions across multiple systems.",[324,17061,17062],{},"Universal Linking on GCPSeamlessly migrate from Apache Kafka to Ursa with zero downtime and minimal risk. Universal Linking allows incremental migration of Kafka workloads so that you can transition critical streaming applications without service disruptions.",[40,17064,2890],{"id":749},[48,17066,17067],{},"Ready to try Ursa on GCP? 
Follow these steps to kick off your journey:",[1666,17069,17070,17078,17098],{},[324,17071,17072,17073],{},"Free Trial: To get started with StreamNative Cloud, ",[55,17074,17077],{"href":17075,"rel":17076},"https:\u002F\u002Fconsole.streamnative.cloud\u002F",[264],"sign up for a free trial account.",[324,17079,17080,17081,17086,17087,17092,17093],{},"Deploy Ursa in GCP: ",[55,17082,17085],{"href":17083,"rel":17084},"https:\u002F\u002Fdocs.streamnative.io\u002Fdocs\u002Fstreamnative-cluster-overview",[264],"Learn about StreamNative Clusters",", and follow ",[55,17088,17091],{"href":17089,"rel":17090},"https:\u002F\u002Fdocs.streamnative.io\u002Fdocs\u002Fmanage-byoc-clusters",[264],"steps to create a BYOC cluster based on Ursa engine in Google Cloud Platform",". Watch the following videos for additional information:",[55,17094,17097],{"href":17095,"rel":17096},"https:\u002F\u002Fwww.youtube.com\u002Fplaylist?list=PL7-BmxsE3q4W5QnrusLyYt9_HbX4R7vEN",[264],"Set up the BYOC environment",[324,17099,17100,17105,17106,1154,17110,17115,17116,17121,17122,17126],{},[55,17101,17104],{"href":17102,"rel":17103},"https:\u002F\u002Fwww.youtube.com\u002Fplaylist?list=PL7-BmxsE3q4VUjNa5z6e8SwpkTxa7MjZc",[264],"Provision a BYOC (Bring Your Own Cloud) Ursa cluster","\nExplore Key Integrations: Configure Ursa to work with ",[55,17107,1185],{"href":17108,"rel":17109},"https:\u002F\u002Fdocs.streamnative.io\u002Fdocs\u002Fintegrate-with-databricks-unitycatalog",[264],[55,17111,17114],{"href":17112,"rel":17113},"https:\u002F\u002Fdocs.streamnative.io\u002Fdocs\u002Fintegrate-with-snowflake-open-catalog",[264],"Snowflake Open Catalog",", and see how streamlined governance can be.\nLeverage Universal Linking: Seamlessly ",[55,17117,17120],{"href":17118,"rel":17119},"https:\u002F\u002Fdocs.streamnative.io\u002Fdocs\u002Funilink-overview",[264],"replicate or migrate your Kafka workloads"," to Ursa with minimal disruption. For comprehensive guidance on migrating Kafka workloads, download the ",[55,17123,17125],{"href":17124},"\u002Fwhitepapers\u002Fkafka-migration-guide","Kafka Migration Guide",", which offers detailed insights on transitioning from existing Kafka services to StreamNative.",[40,17128,17130],{"id":17129},"looking-ahead","Looking Ahead",[48,17132,17133],{},"The Public Preview of Ursa on GCP marks an important step in our commitment to delivering a multi-cloud, leaderless, and lakehouse-native data streaming platform. 
As we progress toward General Availability on GCP, here’s what you can expect:",[321,17135,17136,17139,17142],{},[324,17137,17138],{},"Deeper Integrations with Google Cloud: Expanded support for governance, monitoring, and machine learning workflows.",[324,17140,17141],{},"Enhanced Cost Optimization: New features aligning with GCP’s billing model to further reduce the total cost of ownership.",[324,17143,17144],{},"Ongoing Improvements: Enhancements to resiliency, performance, and user experience—ensuring a frictionless cloud-native and lakehouse-native solution.",[48,17146,17147],{},"We’re excited to see how you’ll harness Ursa’s capabilities on GCP to power real-time insights and supercharge your data-driven initiatives.",{"title":18,"searchDepth":19,"depth":19,"links":17149},[17150,17151,17155,17156],{"id":16998,"depth":19,"text":16999},{"id":17030,"depth":19,"text":17031,"children":17152},[17153,17154],{"id":17034,"depth":279,"text":17035},{"id":14608,"depth":279,"text":14609},{"id":749,"depth":19,"text":2890},{"id":17129,"depth":19,"text":17130},"2025-04-29","Ursa Engine—a Lakehouse-Native data streaming engine with 95% Kafka cost savings—is now available in Public Preview on GCP. Following the success of our General Availability launch on AWS, this exciting expansion brings the same cost-effective, high-performance streaming capabilities to organizations eager to modernize their data infrastructure on GCP.","\u002Fimgs\u002Fblogs\u002F68119987d4a226be50e397a8_Ursa-on-GCP_Public-Review.png",{},"6 min",{"title":16989,"description":17158},"blog\u002Fannouncing-ursa-engine-preview-on-gcp",[1332,800,5954],"N9O4Dt1hLs38ypNu9YZaK4T_hT91cIggYlT0BT2nTNU",{"id":17167,"title":17168,"authors":17169,"body":17170,"category":5376,"createdAt":290,"date":17257,"description":17258,"extension":8,"featured":294,"image":16978,"isDraft":294,"link":290,"meta":17259,"navigation":7,"order":296,"path":17260,"readingTime":16981,"relatedResources":290,"seo":17261,"stem":17262,"tags":17263,"__hash__":17264},"blogs\u002Fblog\u002Fdata-streaming-summit-virtual-2025-is-now-a-two-day-event---may-28-29.md","Data Streaming Summit Virtual 2025 Is Now a Two‑Day Event – May 28‑29",[6127],{"type":15,"value":17171,"toc":17251},[17172,17175,17178,17182,17193,17197,17200,17217,17221,17232,17236,17239,17246,17249],[48,17173,17174],{},"When we opened the Call for Papers for Data Streaming Summit Virtual 2025, we hoped to surface the community’s best ideas. What we received blew us away. In just a few short weeks, a wave of deeply technical, production‑tested, and forward‑looking proposals poured in—from the people building Kafka clusters that move petabytes a day, to the engineers powering Pulsar‑backed real‑time analytics, to data scientists marrying data streaming with AI.",[48,17176,17177],{},"The verdict was clear: one day simply isn’t enough. We’re extending the summit from a single‑day program on May 29 to a full two‑day experience on May 28‑29.",[40,17179,17181],{"id":17180},"why-the-expansion","Why the Expansion?",[321,17183,17184,17187,17190],{},[324,17185,17186],{},"High‑volume, high‑quality submissions. Reviewers agreed the majority of proposals deserved a stage. Doubling the runtime lets us showcase more of them.",[324,17188,17189],{},"Broader technology coverage. Topics now span Kafka, Pulsar, Ursa, Flink, Iceberg, and cutting‑edge AI pipelines—reflecting how today’s streaming stacks are converging with lakehouse and AI ecosystems.",[324,17191,17192],{},"Deeper, more focused tracks. 
With an extra day we can separate advanced deep dives from practical implementation talks so you never have to choose between them.",[40,17194,17196],{"id":17195},"what-to-expect","What to Expect?",[48,17198,17199],{},"Three breakout tracks, two packed days",[321,17201,17202,17205,17208,17211,17214],{},[324,17203,17204],{},"Tech Deep Dives – Engine internals, performance tuning, and architecture walk‑throughs for Pulsar, Kafka, Ursa, and Flink\nUse Cases –",[324,17206,17207],{},"May 28 – Customer‑Facing Analytics: Real‑time pipelines that deliver instant insights to end users.",[324,17209,17210],{},"May 29 – Streaming‑Augmented Lakehouse: Building lakehouses that merge streaming data with Iceberg and other open table formats.\nAI + Stream Processing –",[324,17212,17213],{},"May 28 – AI: Event‑driven, multi‑agent, real‑time pipelines that operationalize generative & agentic AI at scale.",[324,17215,17216],{},"May 29 – Stream Processing: Next‑gen engines and stateful frameworks powering low‑latency, lakehouse‑ready pipelines.",[40,17218,17220],{"id":17219},"key-milestones","Key Milestones",[321,17222,17223,17226,17229],{},[324,17224,17225],{},"Speaker confirmations: in progress",[324,17227,17228],{},"Full schedule: publishes in one week – keep an eye on your inbox and our social channels",[324,17230,17231],{},"Registration: open now – lock in your free virtual pass today",[40,17233,17235],{"id":17234},"secure-your-spot","Secure Your Spot",[48,17237,17238],{},"Join thousands of developers, architects, and data leaders online for two immersive days of real‑time data innovation. Whether you’re battling Kafka cloud costs, scaling a Pulsar deployment, or exploring AI‑ready lakehouse patterns, you’ll find peers who’ve been there—and lessons you can apply immediately.",[48,17240,14089,17241,17245],{},[55,17242,17244],{"href":15752,"rel":17243},[264],"Sign up now"," and be the first to receive the detailed agenda when it drops next week.",[48,17247,17248],{},"We can’t wait to see you (twice as long!) on May 28‑29. Stay tuned and keep streaming.",[48,17250,3931],{},{"title":18,"searchDepth":19,"depth":19,"links":17252},[17253,17254,17255,17256],{"id":17180,"depth":19,"text":17181},{"id":17195,"depth":19,"text":17196},{"id":17219,"depth":19,"text":17220},{"id":17234,"depth":19,"text":17235},"2025-04-24","Get ready for an exciting transformation! The Data Streaming Summit Virtual 2025 is now a two-day event on May 28-29, thanks to an incredible response to our Call for Papers. Tailored for developers, architects, and data leaders, this summit is your gateway to actionable insights that can revolutionize your projects. Dive deep into optimizing Kafka and Pulsar deployments, scaling cutting-edge AI-driven pipelines, and seamlessly integrating streaming with lakehouse architectures. Connect with peers and uncover real-world solutions to pressing challenges, from slashing cloud costs to harnessing the power of generative AI. 
Don’t miss this opportunity to elevate your skills and network with industry leaders!",{},"\u002Fblog\u002Fdata-streaming-summit-virtual-2025-is-now-a-two-day-event-may-28-29",{"title":17168,"description":17258},"blog\u002Fdata-streaming-summit-virtual-2025-is-now-a-two-day-event---may-28-29",[5376,799,821],"NUdRmXjlDhes_MvZuB_FYaEsn3ZO6dfjBJE-3jJbJD8",{"id":17266,"title":16221,"authors":17267,"body":17268,"category":290,"createdAt":290,"date":17257,"description":17436,"extension":8,"featured":294,"image":17437,"isDraft":294,"link":290,"meta":17438,"navigation":7,"order":296,"path":15615,"readingTime":17439,"relatedResources":290,"seo":17440,"stem":17441,"tags":17442,"__hash__":17443},"blogs\u002Fblog\u002Fopen-standards-real-time-ai-mcp.md",[806],{"type":15,"value":17269,"toc":17429},[17270,17274,17281,17284,17288,17291,17294,17297,17300,17304,17317,17320,17323,17326,17329,17349,17352,17355,17358,17361,17364,17369,17372,17375,17379,17382,17390,17393,17398,17401,17404,17407,17410,17413,17416,17420,17423,17426],[40,17271,17273],{"id":17272},"recap-why-ai-agents-need-a-real-time-event-bus","Recap: Why AI Agents Need a Real-Time Event Bus",[48,17275,17276,17277,17280],{},"In our previous post “",[55,17278,17279],{"href":15610},"AI Agents Meet Real-Time Data – Bridging the Gap","”, we highlighted how today’s AI agents often operate in fragmented silos, each with limited awareness of what others know or what’s happening in the world around them. Even the smartest models are typically isolated from fresh data, relying only on their built-in training knowledge​. This isolation means agents struggle to coordinate or to incorporate new information on the fly. We introduced the idea of a shared real-time event bus as a remedy. By using a streaming event bus, multiple AI agents (and data sources) can publish and subscribe to live information in a common channel. This architecture lets agents share context (events, facts, signals) in real time, giving them a sort of collective memory and enabling dynamic coordination and situational awareness. The takeaway was that a data streaming backbone can serve as the “meeting place” for AI agents – a place where they continuously exchange knowledge and triggers, rather than remaining isolated.",[48,17282,17283],{},"However, sharing data among agents is only part of the challenge. Equally important is how agents connect to the outside world – to the tools, databases, and services where context resides. In the last post, we hinted that beyond a real-time event bus for agent-to-agent communication, we need a straightforward way for agents to tap into external systems in real time. In this follow-up, we’ll tackle that next piece of the puzzle. Before diving in, let’s look at a common struggle developers face when integrating AI agents with external tools using ad-hoc methods.",[40,17285,17287],{"id":17286},"a-developers-dilemma-one-off-integrations-everywhere","A Developer’s Dilemma: One-off Integrations Everywhere",[48,17289,17290],{},"Consider a developer named Alex, who is building an AI assistant for customer support. Alex wants this assistant to answer user questions by pulling data from various sources – customer profiles in a database, ticket histories from a helpdesk system, even real-time sales stats from a dashboard API. 
Excited to get started, Alex begins wiring these data sources into the AI agent one by one.",[48,17292,17293],{},"At first, the approach seems straightforward: write a script to call the customer database’s REST API, embed that in the agent’s code, then do something similar for the helpdesk API. But very soon, Alex hits a wall of integration complexity. Each tool requires a different approach – different authentication, different query language, different response formats. There’s no consistency. For every new capability, Alex ends up writing bespoke glue code (or custom prompts) to bridge the AI with that system​. One week it’s a Salesforce CRM, the next it’s a legacy SQL database – each time a completely new one-off connector.",[48,17295,17296],{},"Alex tries using some agent frameworks hoping to simplify the work. Frameworks like LangChain provide abstractions, but still rely on individual connectors for each data source. That means hunting down (or writing) a plugin for every service. After integrating a handful of systems, the code has turned into a fragile patchwork of adapters. Maintaining these custom integrations is difficult and time-consuming​. When an API changes or a new data source is added, it’s back to square one. The lack of a common interface for tools is causing a real headache.",[48,17298,17299],{},"This scenario is all too common. Proprietary integration solutions exist (for example, some LLM vendors offer plugin frameworks or closed APIs to connect data), but they often come with limitations. They might tie the solution to a single AI provider or support only a narrow range of services. Alex realizes that relying on proprietary, siloed integrations is not a sustainable strategy. It’s like building a new custom adapter for every single peripheral on your computer – imagine having to write a new driver for your mouse, keyboard, and printer separately! It’s clear there must be a better way – a more unified and open approach to connect AI agents with the rich ecosystem of tools and data out there.",[40,17301,17303],{"id":17302},"enter-mcp-an-open-standard-for-ai-tool-integration","Enter MCP: An Open Standard for AI-Tool Integration",[48,17305,17306,17307,17310,17311,17316],{},"The good news is the industry has recognized this integration problem, and an answer has emerged: ",[55,17308,3583],{"href":15620,"rel":17309},[264],". MCP is an open standard, ",[55,17312,17315],{"href":17313,"rel":17314},"https:\u002F\u002Fwww.anthropic.com\u002Fnews\u002Fmodel-context-protocol",[264],"introduced by Anthropic in late 2024",", that provides a consistent way for AI agents to interface with external systems​. In essence, MCP defines a common language that lets any AI agent talk to any tool or data source that speaks that language. Instead of building a dozen one-off integrations, a developer like Alex can use MCP as a single, universal connector.",[48,17318,17319],{},"Think of MCP as the “universal port” or “USB-C for AI” – a standardized interface that replaces all those bespoke adapters​. Just as USB-C plugs allow many types of devices to connect through one port, MCP lets an AI agent plug into many different services through one protocol. Some have even called MCP the OpenAPI for AI agents, drawing an analogy to how OpenAPI standardized web service definitions​. The core idea is the same: rather than every AI-tool integration being custom, we define a common protocol so that tools and AI agents can interoperate easily.",[48,17321,17322],{},"So how does MCP work? 
At a high level, it uses a client–server architecture to mediate between an AI and external resources​. The AI agent (or the application backing it) includes an MCP client component, and for each external tool or data source you want to integrate, there is an MCP server component. The MCP server is essentially an adapter or wrapper around that tool – it exposes the tool’s functions and data in a standard way. The AI agent’s MCP client connects to any number of such servers, and because communication follows the MCP standard (built on JSON-RPC 2.0 messaging), the agent can invoke operations or fetch data without needing to know the low-level details of the tool’s API.",[48,17324,17325],{},"What does this look like in practice? When Alex uses MCP, he would run an MCP server for each system (one for the customer database, one for the helpdesk, etc.). Each server defines a set of “tools” and “resources” that it offers to the AI. For example, a Database MCP server might offer a tool called “queryCustomers” that takes a customer ID and returns details, or a Helpdesk MCP server might have a tool “findTickets” for retrieving support ticket histories. When the AI agent needs some info or action, it doesn’t call the database or helpdesk API directly – it asks the MCP client to invoke the appropriate tool on the respective MCP server. The server then translates that request into the actual query or API call to the underlying system, and returns the result in a normalized format the AI can understand.",[48,17327,17328],{},"This setup brings several benefits:",[321,17330,17331,17334,17337,17340,17343,17346],{},[324,17332,17333],{},"No more one-off glue code: As long as a tool has an MCP server, any AI agent can use it via the standard protocol. Developers don’t have to reinvent the integration for each new agent or project​. As Anthropic’s announcement put it, MCP replaces fragmented integrations with a single universal protocol, making it much easier to give AI access to the data it needs​.",[324,17335,17336],{},"Discoverability: MCP is designed so that an AI agent can discover what capabilities (tools\u002Fresources) are available on a server at runtime. The agent can list the tools and resources an MCP server provides, along with how to call them (expected parameters, etc.). This means the agent isn’t hardcoded for specific tools – new tools can be added to the server and the AI will know via discovery.",[324,17338,17339],{},"Rich, structured interactions: Tools exposed via MCP can do a lot. They might allow the AI to query data, retrieve documents, or execute actions in external systems. For instance, an AI agent could: Pull records from a customer database (e.g. “get all orders from last week”)​",[324,17341,17342],{},"Retrieve documents from a knowledge base or cloud storage (e.g. “open the design spec file from SharePoint”)",[324,17344,17345],{},"Call external APIs or services (e.g. invoke a weather API, send an email via an SMTP service)",[324,17347,17348],{},"Perform system actions like writing to a file or kicking off a script (if allowed by an MCP server bridging to an OS or DevOps tool)\nAll such interactions follow a consistent request\u002Fresponse pattern defined by MCP, using JSON structures. The AI receives results in a structured format (JSON objects, lists, etc.) 
that it can easily parse and incorporate into its reasoning.",[48,17350,17351],{},"Two-way, real-time communication: MCP isn’t just for the AI to pull data – it also allows the AI to push or take actions (with proper authorization). It establishes a secure, two-way channel. An AI agent can thus perform tasks like creating a new support ticket or updating a record in real time​. Because the protocol is designed to be efficient, these interactions can happen within an ongoing conversation or agent loop without noticeable lag.",[48,17353,17354],{},"Security and governance: As an open standard, MCP has built-in hooks for encryption and access control. Each MCP server can enforce authentication, permissions, and even user approval for certain actions​. This is crucial when giving AI agents access to sensitive tools – you can ensure the AI only does what it’s permitted to. Since MCP interactions are structured, it’s also easier to audit what the AI requested and what was returned, compared to parsing arbitrary natural language commands.",[48,17356,17357],{},"Model-agnostic and flexible: Perhaps one of the biggest advantages of MCP being open is that it’s model-agnostic. It’s not tied to Claude or GPT or any single AI system. Any AI client that implements the MCP protocol can talk to any MCP server​. This means if Alex builds his tools with MCP, he could use them with different AI platforms – today maybe with Anthropic’s Claude, tomorrow with an open-source LLM or another vendor’s agent that supports MCP. The tools and the agents are decoupled by the standard. It also encourages a community ecosystem: indeed, since MCP’s launch, an open-source community has sprung up building MCP servers for many common services (Google Drive, Slack, GitHub, databases, etc.)​. Alex might not even need to write his own servers for common tools – he could find pre-built ones and just plug them in.",[48,17359,17360],{},"In short, MCP provides the unified integration layer that Alex was missing. Instead of his AI agent having five different integration mechanisms for five tools, it has one mechanism (MCP) to talk to all of them. This dramatically reduces the complexity of his system. As one summary aptly put it: MCP turns an N×M integration problem into an N+M problem​. In other words, if you have N agents and M tools, traditionally you might worry about wiring every agent to every tool (N*M integrations); with MCP, you just ensure each agent speaks MCP and each tool has an MCP interface, and they can mix-and-match freely.",[48,17362,17363],{},"To visualize how MCP facilitates an interaction, consider a simple example of a single AI assistant answering a question using a database via MCP. The sequence might look like this:",[48,17365,17366],{},[384,17367],{"alt":18,"src":17368},"\u002Fimgs\u002Fblogs\u002F680a94bbbb67063281b5cd30_AD_4nXdCcv0LWyrqzhkmnGSc9VhIapbTqsqSTjsz_U1ojADDC3WgVV7a4R8xKQ-OWTFg3-p_E6B10lFHvXchcIx6ObtGpzZHem3OY8YO7pEbn6Xzja-LjZEov3XxQEk3Q27ZTB1uEId1.png",[48,17370,17371],{},"In the diagram above, notice how the AI assistant app didn’t query the database directly – it went through the MCP server. The server handled the details of executing the SQL query and simply returned the data in a standardized way. From the AI’s perspective, it just called a “querySales” tool and got data. 
This standardized, decoupled interaction is what makes MCP so powerful.",[48,17373,17374],{},"Now that we’ve seen what MCP is and how it works in isolation, let’s tie it back to the bigger picture introduced in the first blog post – combining MCP with a real-time event bus for a truly robust AI agent architecture.",[40,17376,17378],{"id":17377},"marrying-mcp-with-real-time-data-streams","Marrying MCP with Real-Time Data Streams",[48,17380,17381],{},"How does Model Context Protocol fit into the vision of a real-time event bus for AI agents? In many ways, MCP and a streaming data bus complement each other perfectly, each handling a different aspect of the agent ecosystem:",[321,17383,17384,17387],{},[324,17385,17386],{},"Real-time event bus = agents coordinating with each other (and with streaming data). This is the context highway. Agents publish events (observations, intermediate results, alerts) and subscribe to events from others or from external event producers. For example, one agent can publish “user just asked about order #12345” as an event, which another agent (or the same agent in a different mode) could listen for and use as a trigger to act. The bus ensures every agent has access to the latest facts and can react in a timely fashion. It’s excellent for decoupled communication, broad distribution of information, and logging a timeline of what’s happening.",[324,17388,17389],{},"MCP = agents accessing tools and services on demand. This is the action toolkit. When an agent needs to actually do something with an external system (read or write data, invoke a service), it uses MCP to make that happen in a standardized way. MCP is not about broadcasting to multiple listeners; it’s about a direct, secure exchange between an agent and a tool. It shines in enabling the agent to fetch specific context (like looking up a value) or perform a specific operation (like creating a calendar event) at the exact moment it’s needed.",[48,17391,17392],{},"In a unified architecture, an AI agent will leverage both. Let’s illustrate with a hypothetical scenario:",[48,17394,17395],{},[384,17396],{"alt":18,"src":17397},"\u002Fimgs\u002Fblogs\u002F680a94bb0544c7b1e8e70a19_AD_4nXfBKXTgXv4MC1k_VmE2MzwPutf2gECX1EPEVnYQyZIAt2P2NjfrenAFLWOjf76QjETYDrTz8eN-BqnZYPVkTFPxl32tCKArzXv3zxKHhb1BMa0OgHilHCCuJ5D97snvnaRQdtSg.png",[48,17399,17400],{},"Scenario: Automated Incident Response. Imagine a system with multiple agents: one monitors server logs, one analyzes issues, and one communicates to DevOps tools. They use a real-time event bus (say Apache Pulsar or Apache Kafka topics) to share information. When the monitoring agent detects an error in the logs, it publishes an “ErrorDetected” event onto the bus. The analysis agent subscribes to these events, and upon receiving it, needs more info to diagnose the issue. Here’s where MCP comes in: the analysis agent uses an MCP server for the logging system to retrieve the last 100 lines of logs around the error, or perhaps an MCP server for the metrics database to get the recent CPU usage. With those details (fetched via MCP in seconds), the agent figures out it’s a database connection issue. It then publishes an “IncidentAnalysis” event with findings. The third agent (DevOps agent) picks that up and decides to create a ticket in Jira and restart a service. It uses an MCP server for Jira to file a ticket and an MCP server for the cloud orchestration to restart the service. 
Finally, it emits a “ResolutionDeployed” event on the bus.",[48,17402,17403],{},"In that workflow, the event bus was the glue that held the multi-agent workflow together – it orchestrated the when and which agent does something. The MCP integrations provided the how each agent performed its part (gathering logs, creating a ticket, etc.). Real-time streaming made sure every agent had up-to-the-moment information, and MCP let agents turn decisions into actions on real-world systems, all in real time.",[48,17405,17406],{},"MCP doesn’t sit apart from the event bus—it can publish to and consume from it. Imagine an MCP server that wraps a temperature sensor: it answers a direct readTemp request, yet it also streams every new reading onto a sensors.temperature topic so every agent stays in the loop. Likewise, an agent can lift any message it gets from the bus and feed it straight into an MCP call—turning a raw event into an external action.",[48,17408,17409],{},"The two systems aren’t overlapping; they divide the workload. The event bus delivers high-fan-out, time-ordered updates so all agents share the same situational awareness, while MCP offers a secure, uniform interface for side-effecting operations. One keeps the brains in sync; the other gives them the muscles to act.",[48,17411,17412],{},"From a developer’s perspective, combining these open architectures yields a highly decoupled, observable, and scalable system—every interaction is traceable, so engineers can reason about what is happening (or has happened) across the agents. You can add new agents to the bus without breaking the others, and you can add new MCP-integrated tools without altering the agent logic – the agent will discover the new tools and can start using them as needed. It’s a plug-and-play ecosystem. StreamNative is bringing this vision to life by implementing an MCP server for our data streaming platform, so AI agents can subscribe to live data and invoke services in place. Agents always act on the freshest information, no custom pipelines required. This synergy between streaming and MCP defines the future we’re building at StreamNative.",[48,17414,17415],{},"Put simply, the AI agents are the brain, while the real-time event bus acts as the central nervous system that lets the brain communicate with the hands—MCP-integrated tools. Through this nervous system, agents can issue commands (“grab that pot on the stove”) and immediately receive feedback (“it’s very hot”). By listening and talking over the bus and acting through MCP, agents continuously bridge their reasoning with real-world context, making every decision timely and effective.",[40,17417,17419],{"id":17418},"looking-ahead-streamnative-mcp-open-integration-for-ai-agents","Looking Ahead: StreamNative + MCP = Open Integration for AI Agents",[48,17421,17422],{},"By aligning a real-time event bus architecture with open standards like MCP, we pave the way for the next generation of AI applications: ones that are context-rich, action-capable, and truly real-time. Developers will be able to build complex agent ecosystems without getting bogged down in integration plumbing – the infrastructure (streaming platform + MCP interfaces) will handle that, letting you focus on the higher-level logic and user experience.",[48,17424,17425],{},"At StreamNative, we’re excited about this vision. Our commitment to open source and open standards runs deep, and MCP fits right in with that ethos. 
In fact, we are actively working on an MCP server implementation to bring Model Context Protocol support to both Apache Kafka and Apache Pulsar—whether those clusters run on StreamNative Cloud or anywhere else. This will allow developers to easily connect their Pulsar or Kafka topics with MCP-enabled AI agents and tools, achieving seamless real-time coordination and tool access in one unified stack.",[48,17427,17428],{},"Stay tuned for more details on our upcoming MCP integration (we’ll be announcing it soon!). We believe it will greatly simplify building real-time AI solutions across environments. Imagine agents on Pulsar or Kafka topics that can, via MCP, query databases, call APIs, or update dashboards, all in a secure and standardized way – that’s what we’re building towards. We invite you to join us on this journey into open, real-time AI integration. The combination of a shared event bus and open tool protocol is poised to unlock a new level of capability for AI agents. We’re excited to see what you’ll build with it when everything comes together – truly autonomous, collaborative agents that are connected to both data and action in real time. Stay tuned!",{"title":18,"searchDepth":19,"depth":19,"links":17430},[17431,17432,17433,17434,17435],{"id":17272,"depth":19,"text":17273},{"id":17286,"depth":19,"text":17287},{"id":17302,"depth":19,"text":17303},{"id":17377,"depth":19,"text":17378},{"id":17418,"depth":19,"text":17419},"Discover how Model Context Protocol (MCP) enables real-time AI agent integration with tools and data using open standards and streaming architectures.","\u002Fimgs\u002Fblogs\u002F680b1ae78b1fbc104d8a4dad_agents-mcp.png",{},"6 minutes",{"title":16221,"description":17436},"blog\u002Fopen-standards-real-time-ai-mcp",[3988,3989],"q02-oQU9cUxKtLdzkpkM7Q_yw3vUCCnPIigTp18YdCM",{"id":17445,"title":16216,"authors":17446,"body":17447,"category":6415,"createdAt":290,"date":17602,"description":17603,"extension":8,"featured":294,"image":17604,"isDraft":294,"link":290,"meta":17605,"navigation":7,"order":296,"path":15610,"readingTime":17606,"relatedResources":290,"seo":17607,"stem":17608,"tags":17609,"__hash__":17610},"blogs\u002Fblog\u002Fai-agents-real-time-data-bridge.md",[806],{"type":15,"value":17448,"toc":17594},[17449,17452,17455,17459,17462,17473,17476,17479,17483,17486,17489,17506,17515,17520,17523,17527,17530,17533,17536,17540,17543,17548,17551,17554,17558,17561,17569,17572,17576,17579,17589,17592],[48,17450,17451],{},"Meet Riya, a software engineer at a growing tech startup. A few months ago, Riya built an AI support chatbot to handle customer queries. This AI agent quickly became a hero of the support team – it answered FAQs, helped troubleshoot issues, and learned from each interaction. Encouraged by its success, her team created more AI agents for other tasks: one to triage bug reports, another to analyze user feedback, and even an agent to monitor system logs for anomalies. Each agent was impressive on its own, perceiving, reasoning, and acting within its niche. Everything seemed great… at first.",[48,17453,17454],{},"But as these agents rolled out, Riya noticed something troubling. Each agent was operating in a silo. The support bot had no clue what the log monitoring agent discovered, and the feedback analyzer worked in a vacuum separate from the bug triage agent. The team had built powerful AI assistants, yet they weren’t talking to each other. Important information was getting stuck within one agent and never reaching the others. 
In short, the company’s AI landscape was becoming a patchwork of isolated intelligences. Riya realized they were facing a growing problem: their AI agents lacked a shared context and any way to coordinate.",[40,17456,17458],{"id":17457},"islands-of-automation-when-one-agent-isnt-enough","Islands of Automation: When One Agent Isn’t Enough",[48,17460,17461],{},"Riya’s experience is increasingly common. It started with one helpful agent, but soon there was a fleet of specialized agents each handling a slice of the workload. The pattern played out across her organization:",[321,17463,17464,17467,17470],{},[324,17465,17466],{},"The Customer Support Agent handled tickets but didn’t share data with anything else.",[324,17468,17469],{},"The Sales Assistant Agent tracked leads in the CRM, unaware of insights from the analytics tools.",[324,17471,17472],{},"The DevOps Agent watching system metrics operated completely independently.",[48,17474,17475],{},"At first, this agent per task approach seemed logical – each AI was tuned for a specific job. However, the more agents they deployed, the more apparent their fragmentation became. Riya jokingly referred to them as “islands of automation.” Each agent had its own database of knowledge and none had a boat to reach the others. The result was duplicated efforts and missed opportunities. For example, the sales team’s agent diligently summarized a customer’s past interactions, but it never knew that the data analysis agent had identified a new trend about that customer’s behavior. Valuable insights slipped through the cracks because the agents had no way to share information​.",[48,17477,17478],{},"Agent fragmentation soon began to hurt productivity. Riya’s team found themselves manually stitching together outputs: copying answers from one agent’s report into another’s input. It felt ironic – they introduced AI to automate tasks, yet now spent more time playing the intermediary between AIs. This fragmented setup led to obvious inefficiencies: two agents would sometimes research the same customer data separately, and the development team had to update each agent with context one at a time. Clearly, having multiple smart agents wasn’t very smart when they were deaf to each other.",[40,17480,17482],{"id":17481},"when-agents-dont-talk-the-cost-of-isolation","When Agents Don’t Talk: The Cost of Isolation",[48,17484,17485],{},"As weeks went by, the lack of coordination between AI agents started causing real problems. One morning, Riya discovered that the customer support chatbot had recommended an outdated workaround to a user – something the internal system-monitoring agent already knew was no longer needed after a recent fix. Since the support agent and the monitoring agent never exchanged notes, the chatbot was operating with stale context. This contextual isolation meant each agent was making decisions with incomplete information. In another case, two different agents ended up contacting the same customer from two angles – one about an upsell opportunity and another about a support issue – creating a confusing user experience. 
If only the sales AI and support AI had been aware of each other’s activities, they could have coordinated a single, informed response instead of two conflicting ones.",[48,17487,17488],{},"Riya’s team identified several symptoms of their AI silos:",[321,17490,17491,17494,17497,17500,17503],{},[324,17492,17493],{},"Redundant work: Agents often fetched or computed similar data separately, duplicating effort.",[324,17495,17496],{},"Missing insights: An insight generated by one agent stayed with that agent – others never knew about it, leading to decisions made without the full picture.",[324,17498,17499],{},"Inconsistent actions: Without a common understanding, agents sometimes gave contradictory answers or took misaligned actions for the same customer.",[324,17501,17502],{},"Hard to trace or audit at scale: Each agent maintained its own opaque history, making it painful to reproduce decisions, satisfy compliance reviews, or debug issues across the fleet.",[324,17504,17505],{},"Rigid growth path: On‑boarding a new agent—or even upgrading an existing one—required point‑to‑point integrations, slowing innovation and limiting the team’s ability to experiment.",[48,17507,17508,17509,17514],{},"It became clear that even the most sophisticated AI models ",[55,17510,17513],{"href":17511,"rel":17512},"https:\u002F\u002Fwww.anthropic.com\u002Fnews\u002Fmodel-context-protocol#:~:text=As%20AI%20assistants%20gain%20mainstream,connected%20systems%20difficult%20to%20scale",[264],"are constrained by their isolation from data and from each other​",". Every new AI capability they added required custom wiring to connect it with the rest, and those ad-hoc integrations were becoming unmanageable. Riya likened it to the bad old days of applications not talking to each other – the dreaded data silos – except now the silos were intelligent agents. Instead of a connected AI ecosystem, they had a fragmented one, and it was difficult to scale or maintain. The promise of AI was to streamline work, but without fixing these silos, it was introducing new friction.",[48,17516,17517],{},[384,17518],{"alt":18,"src":17519},"\u002Fimgs\u002Fblogs\u002F6807d2b89278837f19fb5514_AD_4nXcOeA19Uye-bGT7EBkEgzt7bk4E3KigG1CLB4XlfIipsNIwxY0mOwdWXEqXLb7Ou80EloNu7ZHQlESkHwZgV-AXlzoU7mkFzRetGzPOV8vNQxfmPIKGNHT0OGE-zuCzExAqLOlPDg.png",[48,17521,17522],{},"Riya knew something had to change. The team briefly considered consolidating into a single uber-agent, but that wasn’t practical – each agent existed for a reason, with different expertise and tools. They also tried quick fixes, like having one agent call another agent’s API. This helped a bit (for example, the support agent would ping the analytics agent for recent stats), but it was a brittle solution. Hard-coding one-to-one interactions between agents felt like playing whack-a-mole; every time they added a new connection, two more needs for integration popped up. With three or four agents, writing custom API calls between each pair was just about manageable. But as the number of agents grew, the point-to-point wiring became a tangle (as the illustration above shows). They needed a more elegant, scalable way for all these agents to stay in sync.",[40,17524,17526],{"id":17525},"a-page-from-microservices-one-event-bus-to-rule-them-all","A Page from Microservices: One Event Bus to Rule Them All",[48,17528,17529],{},"One afternoon, during a brainstorming session, Riya had an epiphany. 
Years ago, she had helped transition part of the company’s software architecture from a monolithic design to microservices. In doing so, they faced a similar challenge of isolated services needing to communicate. The solution back then was to introduce a shared event bus – a real-time data stream where services publish events and subscribe to those they care about. That design pattern broke down silos in the software, allowing loosely coupled components to react to each other without hard-coded integrations. Riya realized the same principle could apply to their AI agents. Why not give the agents a “communication bus” to talk to each other?",[48,17531,17532],{},"The idea was simple: an event-driven pipeline where each AI agent could post updates about what it learns or does, and listen for relevant updates from others. Instead of direct agent-to-agent calls, they would all speak a common language of streaming events. For example, the support agent could emit an event like “UserIssueResolved” or “FeatureRequestReceived” onto the bus. The product feedback agent, subscribed to such events, would see the feature request and update its analysis. Meanwhile, the sales agent subscribed to “FeatureRequestReceived” might flag that user as a potential beta tester for a new feature. In return, the sales agent might publish an event “CustomerUpsellOpportunity,” which the support agent could pick up to know that this user is being targeted for an upsell – thus the next support interaction can be tailored. All of this coordination could happen asynchronously, in real-time, through a shared data streaming platform rather than explicit APIs.",[48,17534,17535],{},"Riya’s team was excited about this concept. Essentially, they were planning to give their AI agents a shared brain — not by merging them into one AI, but by creating a shared memory space and message pipeline. The agents would remain independent entities (each with its own role and model), but they would no longer be context-isolated. The event bus would act as the universal translator and meeting place for their AI ecosystem. It’s like building bridges between islands so information can flow instantly rather than  waiting for a ship to carry a message across the water.",[40,17537,17539],{"id":17538},"building-a-shared-brain-real-time-data-as-common-context","Building a Shared Brain: Real-Time Data as Common Context",[48,17541,17542],{},"Once the team decided on a shared event bus approach, implementation began. They set up a real-time data streaming system (Apache Pulsar, Apache Kafka, or StreamNative Ursa) to serve as the central highway for events. Each agent was instrumented to do two things: publish events about any noteworthy action or finding it had, and listen for events from others that it might care about. This design immediately paid off. Now, when the data analytics agent detects a trend or anomaly, it publishes an event (TrendDetected) to the bus. The sales and support agents subscribe to such events and automatically enrich their own responses with that info – no more blind spots. Likewise, the support bot emits a MajorIssueOpened event when a high-priority ticket is created; the other agents all get the memo in real time. 
Suddenly, the isolated islands became an archipelago with bridges between them.",[48,17544,17545],{},[384,17546],{"alt":18,"src":17547},"\u002Fimgs\u002Fblogs\u002F6807d2b85cae40d5506cce14_AD_4nXfj63VnZGlciH2O4r1jA-O8l9l39M3qgWQDaft4HMzh0mQG2U_QgSFeGkKT2wT2lN__WA71EXXHQuwQpz88ytukAqIzulmlNUgvTM84up2xVHtUD-gEoay8x9vA02IBFbcTUiifmA.png",[48,17549,17550],{},"Crucially, this shared data layer serves as a single source of truth and context for all agents. It’s as if all the agents now speak a common language. The nasty tangle of point-to-point integrations was replaced by each agent integrating just once – to the event stream. Riya no longer had to write special-case code for the support agent to query the analytics agent or vice versa; the publish\u002Fsubscribe model handled it gracefully. This not only made the system simpler, but also more extensible. When they decided to add a new AI agent (for example, a Knowledge Base Agent to automatically draft documentation from support tickets), they simply plugged it into the event bus. Immediately, it could start consuming events like UserQuestionAsked or BugResolved and contribute by posting its own events (ArticleDraftCreated). No complex integration work was required to connect this new agent with every other service – one connection unlocked all the others.",[48,17552,17553],{},"The transformation was remarkable. The AI agents, once fragmented, were now coordinating like a well-trained team. They exchanged facts and findings in real time, leading to outcomes that none of the individual agents could achieve alone. The support agent’s answers became more context-aware (since it “knew” about latest product updates and user activity), the sales agent picked much better timing to reach out to customers, and the engineering team gained a collective view of what all the AIs were learning. Riya’s narrative had shifted from isolated intelligence to integrated intelligence. By introducing a shared real-time event bus, the team had effectively bridged the gap between their AI agents and created a unified, collaborative system.",[40,17555,17557],{"id":17556},"from-isolation-to-intelligence-the-path-forward","From Isolation to Intelligence: The Path Forward",[48,17559,17560],{},"Riya’s journey illustrates a key insight in the era of AI: to unlock the full potential of AI agents, we must connect them through shared data and context. It’s not enough to deploy dozens of isolated agents, no matter how advanced each one is. Without a shared “language” or data fabric, we end up with AI silos and lost opportunities. The solution is to treat real-time data as a common substrate that all agents can draw from and contribute to. In essence, the organization needs a nervous system for AI – a way for each agent (the “neurons”) to fire signals that the others can sense and respond to.",[48,17562,17563,17564,17568],{},"Forward-looking teams are already embracing this approach. Some are implementing open standards for agent communication so that different AI systems can interoperate on a common bus. (For instance, ",[55,17565,17567],{"href":17313,"rel":17566},[264],"Anthropic’s introduction of a Model Context Protocol proposes a unified way for AI tools to exchange context, highlighting industry recognition of this need","​.) The exact technology can vary – whether it’s built on message queues, data streaming platforms, or specialized agent coordination frameworks – but the core idea is consistent: real-time data sharing to break down AI silos. 
Just as APIs once allowed disparate software services to work together, this real-time event bus gives AI agents a medium to collaborate.",[48,17570,17571],{},"Instead of each new agent requiring a custom integration to understand the world around it, teams can “plug and play” their agents into a central stream of events and facts. This dramatically lowers the effort to add or upgrade AI capabilities. It also adds robustness – if one agent goes down or is replaced, others continue to communicate via the bus with minimal disruption. The overall system becomes more adaptive and intelligent as it can recombine the skills of multiple agents on the fly. In short, a shared real-time event bus turns a collection of smart but isolated agents into a coordinated, collectively smarter whole.",[40,17573,17575],{"id":17574},"bridging-the-gap-your-next-steps","Bridging the Gap – Your Next Steps 🚀",[48,17577,17578],{},"Riya’s story is becoming the new reality for many developers integrating AI into their apps. We, as developers and tech leaders, have a chance to shape this emerging paradigm of connected AI agents. The vision is clear: bring real-time data streams and AI agents together to create an agile, intelligent ecosystem. It’s time to move beyond deploying AI agents in isolation and towards building systems where they can truly work in concert.",[48,17580,17581,17582,15755,17585,17588],{},"This post is only the starting point. Over the next few weeks, we’ll publish additional posts in our new Agentic AI blog series, exploring how real‑time data, open standards, and modern runtime design come together to enable your AI agents to “talk” to your data and to each other in real time.  If you’d like a first look at what’s next, ",[55,17583,15754],{"href":15752,"rel":17584},[264],[55,17586,6796],{"href":15758,"rel":17587},[264]," on May 28 - 29, where we'll showcase these ideas in action..",[48,17590,17591],{},"Don’t let your AI agents live on islands. By connecting them with a real-time data backbone, you empower them to collaborate, adapt, and achieve far more together than they ever could alone. 
The era of Agentic AI is just beginning—let’s build a fleet of data‑driven agents together.",[48,17593,3931],{},{"title":18,"searchDepth":19,"depth":19,"links":17595},[17596,17597,17598,17599,17600,17601],{"id":17457,"depth":19,"text":17458},{"id":17481,"depth":19,"text":17482},{"id":17525,"depth":19,"text":17526},{"id":17538,"depth":19,"text":17539},{"id":17556,"depth":19,"text":17557},{"id":17574,"depth":19,"text":17575},"2025-04-22","See how event‑driven streams connect siloed AI agents, unifying chatbots with a real‑time data backbone to boost insight and efficiency.","\u002Fimgs\u002Fblogs\u002F6807d53137a7eb2a36b33f9f_agents-meet-streams.jpg",{},"8 mins read",{"title":16216,"description":17603},"blog\u002Fai-agents-real-time-data-bridge",[3988],"9D1AgWquq0OZnETYfPhdqARmd0wG1EuxeloUYMefeDk",{"id":17612,"title":17613,"authors":17614,"body":17615,"category":290,"createdAt":290,"date":17929,"description":17930,"extension":8,"featured":294,"image":17931,"isDraft":294,"link":290,"meta":17932,"navigation":7,"order":296,"path":17933,"readingTime":17934,"relatedResources":290,"seo":17935,"stem":17936,"tags":17937,"__hash__":17938},"blogs\u002Fblog\u002Fstreamnative-perspective-connecting-real-time-streaming-with-data-catalogs-for-ai.md","StreamNative Perspective: Connecting Real-Time Streaming with Data Catalogs for AI",[311],{"type":15,"value":17616,"toc":17919},[17617,17620,17623,17626,17630,17633,17642,17646,17649,17662,17666,17677,17680,17689,17693,17696,17699,17702,17731,17735,17738,17764,17767,17771,17785,17790,17796,17801,17811,17827,17830,17833,17835,17838,17849,17852,17855,17912],[48,17618,17619],{},"In the era of real-time AI, bridging data streaming with data lakehouses has become essential to ensure AI models and applications are continuously fed with high-quality, trustworthy data. Enterprises rely on AI-driven insights, but traditional architectures often fail to deliver governed, real-time data efficiently.",[48,17621,17622],{},"AI models are only as good as the data they learn from, making data acquisition essential for accurate predictions and insights. High-quality, diverse, and real-time data enables AI to adapt, detect patterns, and make informed decisions. Without a continuous flow of reliable data, AI models risk becoming outdated, biased, or ineffective in real-world applications.",[48,17624,17625],{},"While acquiring high-quality data is crucial for AI success, setting up a scalable, real-time data platform presents challenges in integration, governance, and ensuring seamless data flow across streaming and analytical systems.",[40,17627,17629],{"id":17628},"the-rise-of-ai-increases-data-acquisition-costs","The Rise of AI Increases Data Acquisition Costs",[48,17631,17632],{},"As AI-driven applications scale, the demand for real-time data acquisition grows, leading to higher infrastructure costs. When using data streaming technology for data acquisition, traditional leader-based architectures such as kafka exacerbate this by introducing bottlenecks, hotspots, and complex failover scenarios. 
Additionally, producers and writers face expensive cross-AZ traffic and replication costs, making data acquisition increasingly inefficient and costly.",[48,17634,17635,17636,17638,17639],{},"By adopting a leaderless architecture with a lakehouse-native storage approach, ",[55,17637,1332],{"href":6647}," eliminates inter-zone network costs—one of the biggest expenses in leader-based deployments like Kafka and Redpanda—while also reducing storage costs through cloud-native object storage and efficient columnar formats. This approach further enables real-time and batch analytics without the need for costly ETL transformations, streamlining data processing and reducing infrastructure expenses. ",[55,17640,17641],{"href":14554},"Learn more on this blog.",[40,17643,17645],{"id":17644},"ai-requires-unified-data-governance","AI Requires Unified Data Governance",[48,17647,17648],{},"Once data is ingested, in AI-driven analytics, maintaining unified data governance is critical, but connector-based pipelines often introduce gaps by bypassing centralized catalogs. This fragmentation leads to inconsistent access controls, lack of visibility, and compliance risks, making it difficult to ensure data integrity and security across the enterprise.",[48,17650,17651,17652,4003,17656,17661],{},"This blog post explores StreamNative’s vision for seamless integration with leading Data Catalogs, including ",[55,17653,2864],{"href":17654,"rel":17655},"https:\u002F\u002Fwww.unitycatalog.io\u002F",[264],[55,17657,17660],{"href":17658,"rel":17659},"https:\u002F\u002Ficeberg.apache.org\u002Fconcepts\u002Fcatalog\u002F",[264],"Iceberg Catalogs",", enabling enterprises to unlock AI-driven use cases by providing a unified, real-time data foundation for machine learning and analytics.",[40,17663,17665],{"id":17664},"how-data-catalogs-simplify-data-governance-discovery-and-security","How Data Catalogs Simplify Data Governance, Discovery, and Security",[48,17667,17668,17669,4003,17673,17676],{},"The Lakehouse architecture combines the best features of data lakes and data warehouses, offering a unified platform for both analytical and operational workloads. At the core of this architecture are Data catalogs, like ",[55,17670,17672],{"href":17658,"rel":17671},[264],"Apache Iceberg catalogs",[55,17674,2864],{"href":17654,"rel":17675},[264],", which provide centralized metadata management to streamline data discovery, governance, and security. These catalogs enable fine-grained access controls, automate data lineage tracking, and enforce consistent security policies across the organization. By offering built-in governance features and simplifying regulatory compliance, Data catalogs help enterprises maintain secure, well-managed data environments, making it easier to discover, trust, and leverage data for decision-making.",[48,17678,17679],{},"As enterprises manage multiple teams consuming data, Data Catalogs play a critical role in data governance. Whether leveraging Unity Catalog or Iceberg-based catalogs, these solutions empower governance teams to secure data assets, monitor access, and ensure regulatory compliance, offering a centralized and auditable framework for effective data management.",[48,17681,17682,17683,17688],{},"StreamNative’s vision is to simplify the complexities of building a scalable, real-time data platform by bridging data streaming and lakehouse storage with a unified approach. 
We aim to address challenges in integration, governance, and performance, enabling enterprises to harness real-time data for AI and analytics seamlessly. While we lead in this space, ",[55,17684,17687],{"href":17685,"rel":17686},"https:\u002F\u002Fwww.confluent.io\u002Fblog\u002Fconfluent-and-databricks\u002F",[264],"other industry peers are also working toward similar goals,"," collectively driving innovation to make real-time data infrastructure more accessible and efficient.",[40,17690,17692],{"id":17691},"empowering-ai-with-catalogs-simplifying-data-discovery-and-accessibility","Empowering AI with Catalogs: Simplifying Data Discovery and Accessibility",[48,17694,17695],{},"A data catalog is a centralized collection of data assets, accompanied by details about those assets. It provides resources to assist users in locating reliable data, comprehending its purpose, and utilizing it correctly. Serving as a metadata repository, it delivers the context required for effective data utilization.",[48,17697,17698],{},"Catalogs play a critical role in the success and usability of open lakehouse architectures. They act as the metadata backbone, enabling seamless data discovery, governance, and operations across open table formats like Apache Iceberg, and Delta Lake.",[48,17700,17701],{},"A catalog, in the context of data management and analytics, is a centralized metadata repository that stores and organizes information about data assets. It helps users discover, understand, and manage data efficiently.\nHere's the type of information typically stored in a catalog:",[1666,17703,17704,17707,17710,17713,17716,17719,17722,17725,17728],{},[324,17705,17706],{},"Metadata about datasets",[324,17708,17709],{},"Data location",[324,17711,17712],{},"Data quality and profiling",[324,17714,17715],{},"Versioning and lineage",[324,17717,17718],{},"Access and governance",[324,17720,17721],{},"Enrichment and annotations",[324,17723,17724],{},"Operational details",[324,17726,17727],{},"Integration with other systems",[324,17729,17730],{},"Custom attributes",[32,17732,17734],{"id":17733},"benefits-of-data-catalogs","Benefits of Data Catalogs",[48,17736,17737],{},"Data Catalogs provide a range of benefits, some of which are outlined below.",[321,17739,17740,17743,17746,17749,17752,17755,17758,17761],{},[324,17741,17742],{},"Centralized Metadata Management Store and centralize metadata, ensuring consistent data access and interpretation across tools.\nTransactional Consistency",[324,17744,17745],{},"Enable ACID transactions, ensuring safe concurrent reads and writes for accurate data.\nData Discovery and Lineage",[324,17747,17748],{},"Make datasets searchable and track data lineage for governance and compliance.\nInteroperability Across Tools",[324,17750,17751],{},"Unify metadata, enabling tools like Spark, Trino, and Flink to work seamlessly together.\nSchema and Version Control",[324,17753,17754],{},"Support schema evolution and time travel, enabling reproducible analytics workflows.\nData Governance and Security",[324,17756,17757],{},"Enforce fine-grained access control, protecting sensitive data while ensuring accessibility.\nOrchestration of Real-Time and Batch Data",[324,17759,17760],{},"Manage hybrid workflows, enabling real-time ingestion and batch queries.\nFacilitating AI\u002FML Workflows",[324,17762,17763],{},"Streamline metadata management for training, monitoring, and detecting data drift.",[48,17765,17766],{},"There are numerous Data Catalogs available, but this post will specifically focus on 
StreamNative’s integration with Unity Catalog ,Snowflake Open Catalog and AWS S3 Tables.",[40,17768,17770],{"id":17769},"streamnative-cloud-bridging-the-gap-with-data-catalog-integrations","StreamNative Cloud: Bridging the Gap with Data Catalog Integrations",[48,17772,17773,17774,17777,17778,4003,17781,190],{},"StreamNative Cloud aims to deliver a fully managed, out-of-the-box service that enables seamless data ingestion into various Data Catalogs within seconds. Users can effortlessly activate catalog integration and select their preferred catalog from supported vendors. ",[55,17775,17776],{"href":4811},"StreamNative includes support for Delta Lake through Unity Catalog"," and Apache Iceberg via multiple catalog implementations, such as ",[55,17779,17780],{"href":4825},"Snowflake’s Open Catalog",[55,17782,17784],{"href":2872,"rel":17783},[264],"Amazon S3 Tables",[48,17786,17787],{},[384,17788],{"alt":18,"src":17789},"\u002Fimgs\u002Fblogs\u002F67e47a1495bc648017c9a476_AD_4nXebf04uDpuMDcaoCKwxHZgaJXz9UIk01Y7VSeXsGe3TZOjiAf-l4WDO32TyFviP0l35a0ddhepkrgKMO7vNor_K9-jMIwlOrnfdw42I130uemyaLFejH8_io_rtqXjOL_XFel5KGA.png",[48,17791,17792,17793,17795],{},"During the creation of a StreamNative ",[55,17794,1332],{"href":6647}," Cluster, users have the option to enable Data Catalog Integration, select a preferred catalog provider, configure the necessary settings, and proceed with cluster deployment.",[48,17797,17798],{},[384,17799],{"alt":18,"src":17800},"\u002Fimgs\u002Fblogs\u002F67e47a15d8b083ce4af1ad28_AD_4nXfkp2KPa8-AO6wHUEUaGcorIFRlkU9asSc5fssATDIYxUGMx_GcAM04K0nmKZ48BsoqdDqg40Jda24o6-skYUFeuj_I1cyvl08O0MlkFOyAtp07UjLTewYE4_8kVr0fWdiXeMxjFw.png",[48,17802,17803,17807,17808,190],{},[55,17804,1185],{"href":17805,"rel":17806},"https:\u002F\u002Fwww.databricks.com\u002Fproduct\u002Funity-catalog",[264],": A unified governance solution from Databricks, offering fine-grained access control, data lineage, and metadata management for Delta Lake and other open data formats. ",[55,17809,17810],{"href":4811},"Learn more about StreamNative’s integration with Databricks Unity Catalog",[321,17812,17813,17820],{},[324,17814,17815,17819],{},[55,17816,17114],{"href":17817,"rel":17818},"https:\u002F\u002Fother-docs.snowflake.com\u002Fen\u002Fopencatalog\u002Foverview",[264],": Part of Snowflake's advanced capabilities, it supports hybrid data management across structured and semi-structured data, empowering unified analytics and governance. A blog on this topic will be published soon.",[324,17821,17822,17826],{},[55,17823,17784],{"href":17824,"rel":17825},"https:\u002F\u002Faws.amazon.com\u002Fs3\u002Ffeatures\u002Ftables\u002F",[264],": An AWS S3 Table stores tabular data in S3 for efficient querying with Athena or Redshift Spectrum, using formats like Parquet and managed via AWS Glue.A blog on this topic will be published soon.",[48,17828,17829],{},"With this comprehensive support, StreamNative Cloud ensures organizations can leverage the best-in-class capabilities of these catalogs to simplify governance, enhance interoperability, and accelerate data-driven innovation.",[48,17831,17832],{},"Once a catalog is enabled, StreamNative begins writing cluster data to the designated storage location, with the data seamlessly published in the catalog. Users can effortlessly discover and query data directly from the catalog.",[40,17834,2125],{"id":2122},[48,17836,17837],{},"As AI becomes mainstream, enterprises need a cost-effective, scalable way to ingest and govern real-time data. 
Traditional ETL pipelines are slow and expensive, while legacy streaming architectures introduce inefficiencies that drive up costs. StreamNative’s Ursa Engine, built on a leaderless architecture, eliminates these challenges—cutting real-time data ingestion costs by 90% and seamlessly integrating with leading Data Catalogs to unify governance across both streaming and batch data.",[321,17839,17840,17843,17846],{},[324,17841,17842],{},"Seamless Data Streaming & Metadata Management: StreamNative Cloud integrates open Data Catalogs, enabling real-time data streaming with robust metadata management.",[324,17844,17845],{},"Native Integration Advantage: Eliminates the need for connector-based pipelines that do not publish data to catalogs, ensuring a more efficient approach.",[324,17847,17848],{},"Broad Catalog Support: Supports Unity Catalog, Snowflake Open Catalog, and S3 Tables for simplified data governance and interoperability.",[48,17850,17851],{},"By bridging real-time streaming with lakehouse storage and governance, StreamNative enables enterprises to maximize the value of their AI and analytics investments—eliminating data silos, reducing infrastructure costs, and ensuring AI models operate on fresh, high-quality, and trusted data.",[48,17853,17854],{},"Here are a few resources for you to explore:",[321,17856,17857,17866,17873,17880,17888,17895,17903],{},[324,17858,17859,17860,17865],{},"Watch our workshop: ",[55,17861,17864],{"href":17862,"rel":17863},"https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=N0-P6TllSEc",[264],"Augment Your Lakehouse with Streaming Capabilities for Real-Time AI"," to get an end-to-end overview of StreamNative’s integration with Databricks Unity Catalog.",[324,17867,17868,17869],{},"Documentation for Unity Catalog Integration : ",[55,17870,17872],{"href":17108,"rel":17871},[264],"Follow these steps to integrate StreamNative Cloud with Databricks Unity Catalog.",[324,17874,17875,17876],{},"Documentation for Snowflake Open Integration : ",[55,17877,17879],{"href":17112,"rel":17878},[264],"Follow these steps to integrate StreamNative Cloud with Snowflake Open Catalog",[324,17881,17882,17883],{},"Documentation for Amazon S3 Tables : ",[55,17884,17887],{"href":17885,"rel":17886},"https:\u002F\u002Fdocs.streamnative.io\u002Fdocs\u002Fintegrate-with-s3-tables",[264],"Follow these steps to integrate StreamNative Cloud with Amazon S3 Tables",[324,17889,17890,17891,17894],{},"Check out StreamNative's recent benchmark about Ursa Engine: See how ",[55,17892,17893],{"href":10357},"Ursa sustains a 5GB\u002Fs Kafka workload at just 5% of the cost"," of traditional streaming engines like Kafka and Redpanda.",[324,17896,17897,17898,190],{},"Read the detailed architectural blog post of Ursa Engine: Learn how ",[55,17899,17902],{"href":17900,"rel":17901},"http:\u002F\u002Fstreamnative.io\u002Fblog\u002Fleaderless-architecture-and-lakehouse-native-storage-for-reducing-kafka-cost",[264],"leaderless architecture and lakehouse storage reduce 95% of Kafka cost",[324,17904,17905,17906,17911],{},"Watch our recent webinar with Databricks: Watch the ",[55,17907,17910],{"href":17908,"rel":17909},"https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=MUs5z45Ndgo",[264],"Databricks and StreamNative webinar",", where we discussed about the native integration with Unity Catalog.",[48,17913,17914,17915,17918],{},"Try it yourself: ",[55,17916,7137],{"href":17075,"rel":17917},[264]," to explore StreamNative's Ursa Engine and experience the power of real-time data in 
action.",{"title":18,"searchDepth":19,"depth":19,"links":17920},[17921,17922,17923,17924,17927,17928],{"id":17628,"depth":19,"text":17629},{"id":17644,"depth":19,"text":17645},{"id":17664,"depth":19,"text":17665},{"id":17691,"depth":19,"text":17692,"children":17925},[17926],{"id":17733,"depth":279,"text":17734},{"id":17769,"depth":19,"text":17770},{"id":2122,"depth":19,"text":2125},"2025-03-26","Discover how StreamNative’s Ursa Engine bridges real-time data streaming with leading Data Catalogs like Unity Catalog and Snowflake Open Catalog. Learn how a leaderless architecture reduces costs, simplifies governance, and accelerates AI-driven insights.","\u002Fimgs\u002Fblogs\u002F67e479e1d376bce952ae7560_image-57.png",{},"\u002Fblog\u002Fstreamnative-perspective-connecting-real-time-streaming-with-data-catalogs-for-ai","15 min",{"title":17613,"description":17930},"blog\u002Fstreamnative-perspective-connecting-real-time-streaming-with-data-catalogs-for-ai",[800,2599,10054,302,1332],"QKGkjZwQ3N_Fml6ZcWgi9wmfQ0o5yWqhS_QUACjTIDg",{"id":17940,"title":17941,"authors":17942,"body":17943,"category":290,"createdAt":290,"date":18644,"description":18645,"extension":8,"featured":294,"image":18646,"isDraft":294,"link":290,"meta":18647,"navigation":7,"order":296,"path":18648,"readingTime":18649,"relatedResources":290,"seo":18650,"stem":18651,"tags":18652,"__hash__":18654},"blogs\u002Fblog\u002Fdefinitive-guide-for-streaming-data-into-snowflake---part-2-lakehouse-native-data-streaming-with-apache-iceberg-and-snowflake-open-catalog.md","Definitive Guide for Streaming Data into Snowflake – Part 2: Lakehouse-Native Data Streaming with Apache Iceberg and Snowflake Open Catalog",[810],{"type":15,"value":17944,"toc":18618},[17945,17953,17969,17973,17976,17979,17988,17991,17995,17998,18018,18020,18023,18037,18042,18046,18055,18057,18060,18071,18075,18078,18081,18089,18092,18101,18104,18108,18111,18114,18117,18119,18126,18132,18135,18142,18145,18147,18150,18155,18157,18160,18163,18189,18192,18195,18202,18205,18209,18212,18215,18221,18223,18232,18235,18239,18242,18267,18270,18273,18276,18280,18283,18290,18294,18302,18305,18314,18318,18321,18332,18335,18341,18344,18347,18352,18355,18358,18363,18367,18370,18373,18393,18397,18400,18408,18412,18415,18418,18432,18435,18439,18444,18448,18451,18455,18469,18473,18481,18485,18488,18493,18497,18500,18504,18508,18511,18519,18523,18531,18535,18549,18552,18556,18563,18567,18570,18573,18576,18581,18584,18586,18589,18603,18607,18610,18613,18615],[48,17946,17947,17948,17952],{},"Welcome back to our three-part blog series, The Definitive Guide to Streaming Data into Snowflake. In",[55,17949,17951],{"href":17950},"\u002Fblog\u002Fdefinitive-guide-for-streaming-data-into-snowflake-part-1---with-connectors"," the first part of this series",", we explored how to stream data into Snowflake with connector-based approaches. While connectors work well for many scenarios, they can become expensive and complex to manage at large scale.",[48,17954,17955,17956,17959,17960,4003,17964,17968],{},"In this second blog post, we’ll introduce a modern alternative - a zero-copy data streaming approach, which uses the ",[55,17957,17958],{"href":6647},"StreamNative Ursa engine"," to stream data directly into Snowflake via ",[55,17961,1153],{"href":17962,"rel":17963},"https:\u002F\u002Ficeberg.apache.org\u002F",[264],[55,17965,17114],{"href":17966,"rel":17967},"http:\u002F\u002Fe.com\u002Fen\u002Fproduct\u002Ffeatures\u002Fopen-catalog\u002F",[264],". 
This approach eliminates the need for connectors, simplifies data streaming architecture, and enables real-time AI and analytics at scale. By the end of this post, you will have a clear understanding of how to build a modern, real-time data streaming solution on Snowflake.",[40,17970,17972],{"id":17971},"introduction-to-apache-iceberg-snowflake-open-catalog-and-streamnative-ursa","Introduction to Apache Iceberg, Snowflake Open Catalog, and StreamNative Ursa",[48,17974,17975],{},"The zero-copy data streaming approach involves 3 major components: Apache Iceberg, Snowflake Open Catalog, and StreamNative Ursa.",[48,17977,17978],{},"Apache Iceberg is a high-performance table format designed for large analytic datasets. It provides consistency and ACID guarantees to data lakes, making it possible to handle petabyte-scale datasets efficiently. By treating data in distributed storage (e.g., S3 or other cloud object stores) as a table with columnar layouts, Iceberg simplifies schema evolution and accelerates queries.",[48,17980,17981,17982,17987],{},"Snowflake Open Catalog is a fully managed service for ",[55,17983,17986],{"href":17984,"rel":17985},"https:\u002F\u002Fpolaris.apache.org\u002F",[264],"Apache Polaris",", which implements Iceberg’s REST catalog API and provides centralized, secure read and write access to Iceberg tables across different REST-compatible query engines. It allows Snowflake to read directly from external Apache Iceberg tables, providing a unified approach to managing and accessing large analytic datasets without copying data or using additional connectors. This simplifies data ingestion workflows, allowing external Iceberg tables to be treated as native Snowflake tables.",[48,17989,17990],{},"StreamNative Ursa is a Kafka-compatible data streaming engine built for the Lakehouse architecture, storing data in object storage as Apache Iceberg format. With Ursa, there is no need to deploy additional connectors; you can produce and consume data using the Kafka protocol or reconfigure existing Kafka applications to a StreamNative Ursa cluster. Data produced into a Kafka topic is continuously stored in an Iceberg table in real-time. Kafka topic schemas are automatically mapped to Lakehouse Table schemas, and data is written to Lakehouse Tables using open standards like Apache Iceberg. The engine will also commit metadata into Snowflake Open Catalog so that the catalog then enables querying of those tables without duplicating data.",[40,17992,17994],{"id":17993},"why-choose-this-approach","Why Choose This Approach?",[48,17996,17997],{},"By leveraging Ursa and Snowflake Open Catalog together, this approach creates a reliable and scalable zero-copy data streaming architecture. 
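Because Ursa speaks the Kafka protocol, "reconfiguring existing Kafka applications" usually amounts to repointing the client at the Ursa cluster. Below is a hedged sketch using kafka-python; the endpoint, credentials, SASL mechanism, and topic are placeholders, and the actual values come from your own StreamNative cluster settings.

```python
# Hedged sketch: an existing Kafka producer repointed at an Ursa cluster.
# Endpoint, credentials, and topic below are placeholders only.
import json
from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers="your-ursa-cluster.example:9093",  # placeholder endpoint
    security_protocol="SASL_SSL",        # verify the protocol your cluster expects
    sasl_mechanism="PLAIN",              # placeholder mechanism
    sasl_plain_username="placeholder-user",
    sasl_plain_password="placeholder-token",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

# Messages produced here are continuously compacted into Iceberg tables in
# the configured object-storage bucket and surfaced through the catalog.
producer.send("orders", {"order_id": 42, "amount": 19.99})
producer.flush()
```

Consumers are repointed the same way, and no connector deployment is involved.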
It provides the following benefits:",[321,17999,18000,18003,18006,18009,18012,18015],{},[324,18001,18002],{},"Lakehouse-Native Architecture: Ursa stores streaming data directly as Iceberg tables, which Snowflake can discover via Open Catalog and query without duplicating data.",[324,18004,18005],{},"Optimized for Data Streaming: Iceberg’s structured data management, combined with Ursa’s native data streaming capabilities, ensures the Iceberg data lakehouse remains up to date with minimal operational overhead.",[324,18007,18008],{},"Scalability: Using Apache Iceberg and object storage enables handling growing data volumes more efficiently than connector-based ingestion.",[324,18010,18011],{},"Cost Efficiency: Data is directly written to Iceberg tables for optimized reads in Snowflake, eliminating redundant storage and excessive data transfer.",[324,18013,18014],{},"Consistency and ACID Guarantees: Iceberg ensures atomic commits, snapshot isolation, and schema evolution, eliminating many data consistency headaches.",[324,18016,18017],{},"Open Ecosystem: Avoid vendor lock-in by utilizing open table formats and object storage.",[40,18019,2697],{"id":2696},[48,18021,18022],{},"Below is an overview of how the components interact:",[1666,18024,18025,18028,18031,18034],{},[324,18026,18027],{},"Data Streams: Data is published to Kafka topics in a StreamNative Ursa cluster.",[324,18029,18030],{},"StreamNative Ursa: Ursa continuously transforms the streaming data and writes it to Iceberg tables in object storage.",[324,18032,18033],{},"Snowflake Open Catalog: Iceberg tables are registered in Snowflake Open Catalog, allowing Snowflake to access them directly.",[324,18035,18036],{},"Query in Snowflake:Data practitioners can write SQL queries against these Iceberg tables as if they were native to Snowflake.",[48,18038,18039],{},[384,18040],{"alt":18,"src":18041},"\u002Fimgs\u002Fblogs\u002F67db771c30f7ea02204f000d_AD_4nXcYLnesoTcwW0eChgWv032NFyF2u8iVpHbKWdbT9v-n6oKYxmhfmKlorG9HCdzyRX_2GXpaYsmS2NyNPtfxd2aZH9KLBo3pr0fGFQIFef5aQPuBHtGpDqN55qBRzLvyVBCe_XNPSQ.png",[40,18043,18045],{"id":18044},"step-by-step-guide","Step-by-Step Guide",[48,18047,18048,18049,18054],{},"Follow the step-by-step guide below to set up a modern approach for streaming data into Snowflake using StreamNative Ursa and Snowflake Open Catalog. You can watch ",[55,18050,18053],{"href":18051,"rel":18052},"https:\u002F\u002Fwww.youtube.com\u002Fplaylist?list=PL7-BmxsE3q4WpxiG20X6AuxUxp4TelDjz",[264],"this playlist"," for more details.",[32,18056,10104],{"id":10103},[48,18058,18059],{},"Before you get started, ensure you have the following three resources:",[321,18061,18062,18065,18068],{},[324,18063,18064],{},"AWS Account: An AWS account to create an S3 storage bucket for storing Iceberg tables.",[324,18066,18067],{},"Snowflake Account: A Snowflake account to create a Snowflake Open Catalog and run Snowflake queries.",[324,18069,18070],{},"StreamNative Cloud Account: A StreamNative Cloud account to install and run Ursa clusters.",[32,18072,18074],{"id":18073},"step-0-prepare-a-cloud-storage-bucket","Step 0: Prepare a Cloud Storage Bucket",[48,18076,18077],{},"Before setting up this modern approach, you need to create a cloud storage bucket, which will store Iceberg tables. 
This S3 bucket must be accessible by both StreamNative and Snowflake.",[48,18079,18080],{},"Important:",[321,18082,18083,18086],{},[324,18084,18085],{},"The Snowflake Open Catalog, S3 bucket, and StreamNative Ursa cluster must be in the same AWS region to avoid excessive cross-region traffic.",[324,18087,18088],{},"Snowflake Open Catalog does not support cross-region buckets.",[48,18090,18091],{},"Assuming you use the following bucket and path:",[48,18093,18094,18095],{},"s3:\u002F\u002F",[18096,18097,10259,18098],"your-bucket-name",{},[18099,18100],"your-bucket-path",{},[48,18102,18103],{},"First, you must grant StreamNative access to this storage bucket, allowing StreamNative’s Ursa cluster to access this bucket.",[32,18105,18107],{"id":18106},"grant-streamnative-access-to-the-storage-bucket","Grant StreamNative access to the storage bucket",[48,18109,18110],{},"StreamNative provides a Terraform module to allow users to grant the storage bucket access to its control plane for setting up Ursa clusters. Use the following Terraform script to grant access:",[48,18112,18113],{},"module \"sn_managed_cloud\" {",[48,18115,18116],{},"source = \"github.com\u002Fstreamnative\u002Fterraform-managed-cloud\u002F\u002Fmodules\u002Faws\u002Fvolume-access?ref=v3.18.0\"",[48,18118,3931],{},[48,18120,18121,18122],{},"external_id = \"",[18123,18124,18125],"your-organization-name",{},"\"",[48,18127,18128,18129],{},"role = \"",[18130,18131,18125],"your-role-name",{},[48,18133,18134],{},"buckets = [",[48,18136,18125,18137],{},[18096,18138,10259,18139],{},[18099,18140,18141],{},"\",",[48,18143,18144],{},"]",[48,18146,3931],{},[48,18148,18149],{},"account_ids = [",[48,18151,18125,18152],{},[18153,18154,18125],"your-aws-account-id",{},[48,18156,18144],{},[48,18158,18159],{},"}",[48,18161,18162],{},"Replace the placeholders with your actual values:",[321,18164,18165,18171,18177,18183],{},[324,18166,18167,18170],{},[4926,18168,18169],{},"\u003Cyour-organization-name>",": Your StreamNative Cloud organization ID.",[324,18172,18173,18176],{},[4926,18174,18175],{},"\u003Cyour-bucket-name>\u002F\u003Cyour-bucket-path>",": Your AWS S3 storage bucket.",[324,18178,18179,18182],{},[4926,18180,18181],{},"\u003Cyour-aws-account-id>",": Your AWS account ID hosting the storage bucket.",[324,18184,18185,18188],{},[4926,18186,18187],{},"\u003Cyour-role-name>",": The IAM role name that will be created for storage bucket access.",[48,18190,18191],{},"Once you execute the Terraform script, it will grant StreamNative’s control plane access to the storage bucket. This allows the StreamNative Ursa cluster to write data to the storage bucket.",[48,18193,18194],{},"As Ursa continuously writes produced data to the storage bucket, it automatically compacts the data into Iceberg tables. 
These tables will be rewritten in the following path:",[48,18196,18094,18197],{},[18096,18198,10259,18199],{},[18099,18200,18201],{},"\u002Fcompaction",[48,18203,18204],{},"In the next step, you will need to configure Snowflake Open Catalog to grant access to this path.",[32,18206,18208],{"id":18207},"step-1-configure-snowflake-open-catalog","Step 1: Configure Snowflake Open Catalog",[48,18210,18211],{},"Before setting up a StreamNative Ursa cluster, you must grant Snowflake Open Catalog access to the storage bucket with the following IAM policy:",[48,18213,18214],{},"{",[8325,18216,18219],{"className":18217,"code":18218,"language":8330},[8328],"\"Version\": \"2012-10-17\",\n\n\"Statement\": [\n\n    {\n\n        \"Effect\": \"Allow\",\n\n        \"Action\": [\n\n            \"s3:PutObject\",\n\n            \"s3:GetObject\",\n\n            \"s3:GetObjectVersion\",\n\n            \"s3:DeleteObject\",\n\n            \"s3:DeleteObjectVersion\"\n\n        ],\n\n        \"Resource\": \"arn:aws:s3:::\u003Cyour-bucket-name>\u002F\u003Cyour-bucket-path>\u002F*\"\n\n    },\n\n    {\n\n        \"Effect\": \"Allow\",\n\n        \"Action\": [\n\n            \"s3:ListBucket\",\n\n            \"s3:GetBucketLocation\"\n\n        ],\n\n        \"Resource\": \"arn:aws:s3:::\u003Cyour-bucket-name>\u002F\u003Cyour-bucket-path>\",\n\n        \"Condition\": {\n\n            \"StringLike\": {\n\n                \"s3:prefix\": [\n\n                    \"*\"\n\n                ]\n\n            }\n\n        }\n\n    }\n\n]\n",[4926,18220,18218],{"__ignoreMap":18},[48,18222,18159],{},[48,18224,18225,18226,18231],{},"Follow ",[55,18227,18230],{"href":18228,"rel":18229},"https:\u002F\u002Fdocs.streamnative.io\u002Fdocs\u002Fintegrate-with-snowflake-open-catalog#step-4-configure-aws-account-for-snowflake-open-catalog-access",[264],"this documentation"," to set up the IAM policy and role for accessing the storage bucket.",[48,18233,18234],{},"Note: This IAM policy and role will be used for creating Snowflake Open Catalogs.",[3933,18236,18238],{"id":18237},"create-a-snowflake-open-catalog","Create a Snowflake Open Catalog",[48,18240,18241],{},"Use the following settings when creating a Snowflake Open Catalog:",[321,18243,18244,18247,18250,18253,18261,18264],{},[324,18245,18246],{},"Name: The name of the Open Catalog.",[324,18248,18249],{},"External: Keep this disabled.",[324,18251,18252],{},"Storage Provider: Select \"S3\".",[324,18254,18255,18256],{},"Default Base Location: Use the storage bucket path created in Step 0, which should look like: s3:\u002F\u002F",[18096,18257,10259,18258],{},[18099,18259,18260],{},"\u002Fcompaction. Here, the compaction folder stores all the compacted lakehouse tables.",[324,18262,18263],{},"S3 Role ARN: The ARN of the IAM role created above.",[324,18265,18266],{},"External ID: The External ID used in the IAM policy setup.",[48,18268,18269],{},"After creating the Snowflake Open Catalog, retrieve its IAM user ARN from the Open Catalog details page. 
Next, update your IAM policy to grant the Open Catalog access to your S3 bucket by adding this IAM user ARN to the Principal:AWS field.",[48,18271,18272],{},"At this point:",[48,18274,18275],{},"✔️ Your storage bucket is accessible by StreamNative and Snowflake Open Catalog.\n✔️ A Snowflake Open Catalog is ready to use.",[3933,18277,18279],{"id":18278},"create-a-service-connection-in-snowflake-open-catalog","Create a Service Connection in Snowflake Open Catalog",[48,18281,18282],{},"To allow StreamNative access to Open Catalog, create a Service Connection in Snowflake Open Catalog.",[48,18284,18225,18285,18289],{},[55,18286,18230],{"href":18287,"rel":18288},"https:\u002F\u002Fdocs.streamnative.io\u002Fdocs\u002Fintegrate-with-snowflake-open-catalog#step-5-create-snowflake-open-catalog",[264]," to complete this step and record the Client ID and Client Secret for configuring the StreamNative Ursa cluster in the next step.",[32,18291,18293],{"id":18292},"step-2-setup-streamnative-ursa-cluster","Step 2: Setup StreamNative Ursa Cluster",[48,18295,18296,18297,18301],{},"Once StreamNative has permission to access the storage bucket and you have the Client ID and Secret for Snowflake Open Catalog, you can proceed to create a StreamNative Ursa cluster. Refer to ",[55,18298,18230],{"href":18299,"rel":18300},"https:\u002F\u002Fdocs.streamnative.io\u002Fdocs\u002Fintegrate-with-snowflake-open-catalog#create-stream-native-byoc-ursa-cluster",[264]," for detailed step-by-step instructions.",[48,18303,18304],{},"Once the Ursa cluster is up and running, it exposes the Kafka API via its Kafka endpoints. You can configure your application to produce Kafka messages to a topic.",[48,18306,18307,18308,18313],{},"StreamNative provides ",[55,18309,18312],{"href":18310,"rel":18311},"https:\u002F\u002Fdocs.streamnative.io\u002Fkafka-clients\u002Fkafka-clients-overview",[264],"tutorials"," on using Kafka clients to interact with StreamNative Cloud.",[3933,18315,18317],{"id":18316},"data-storage-structure","Data Storage Structure",[48,18319,18320],{},"When messages are produced to Kafka topics:",[321,18322,18323,18326,18329],{},[324,18324,18325],{},"They are immediately written to the storage bucket configured for the cluster.",[324,18327,18328],{},"The compaction folder stores all compacted Iceberg tables.",[324,18330,18331],{},"The storage folder stores write-ahead logs for raw data.",[48,18333,18334],{},"Example file structure. 
(See the screenshot below)",[48,18336,18094,18337],{},[18096,18338,10259,18339],{},[18099,18340,10259],{},[48,18342,18343],{},"├── compaction\u002F  # Stores compacted Iceberg tables",[48,18345,18346],{},"├── storage\u002F     # Stores write-ahead logs",[48,18348,18349],{},[384,18350],{"alt":18,"src":18351},"\u002Fimgs\u002Fblogs\u002F67db771c451d766ea27ad3d4_AD_4nXcoeDOtJlM-PLQU3KOfZt1eKkYtcaodP-2rwSaL27oi7o787y6QAxWJmh9TQGG6q5iRA3kwLjY_OU6gUjdo4-sExjLoOwjr9bN5LXf2F8mAdzANYf5NgroXOZKsSbWR6JJtRcIwAA.png",[48,18353,18354],{},"All Iceberg tables are automatically registered in Snowflake Open Catalog.",[48,18356,18357],{},"You can navigate to the Snowflake Open Catalog console to view tables and schemas.",[48,18359,18360],{},[384,18361],{"alt":18,"src":18362},"\u002Fimgs\u002Fblogs\u002F67db771baab338b81f2a030d_AD_4nXdY6xQ-6xG7AkJIKlKWvpyaFGnjIlAK5iW92fHK3fIuf5V_jzDvaSC3kw52tNhmWypWYWcxfI-vV9zqCRqYXNl-VCMYxChr77Q8eeVX2U2EOZ7Y5L-4TRs7LPGhf7wPuf96tMOTCw.png",[32,18364,18366],{"id":18365},"step-3-query-iceberg-tables-in-snowflake-ai-data-cloud","Step 3: Query Iceberg Tables in Snowflake AI Data Cloud",[48,18368,18369],{},"Once the tables are available in Snowflake Open Catalog, you can use Snowflake AI Data Cloud to query them.",[48,18371,18372],{},"For more details:",[321,18374,18375,18384],{},[324,18376,18377,18378,18383],{},"Refer to ",[55,18379,18382],{"href":18380,"rel":18381},"https:\u002F\u002Fdocs.snowflake.com\u002Fen\u002Fuser-guide\u002Ftables-iceberg-open-catalog-query",[264],"Snowflake documentation"," (add a link if available).",[324,18385,18386,18387,18392],{},"Watch ",[55,18388,18391],{"href":18389,"rel":18390},"https:\u002F\u002Fyoutu.be\u002F658ZV78lyew",[264],"this video",", which provides a detailed query walkthrough.",[40,18394,18396],{"id":18395},"choose-the-right-lakehouse-table-mode","Choose the Right Lakehouse Table Mode",[48,18398,18399],{},"When setting up the Ursa Engine, it supports two different storage modes for writing streaming data into Iceberg tables. Currently, this setting is configured at a per-cluster level and will soon be supported at a per-topic level. These two storage modes are:",[321,18401,18402,18405],{},[324,18403,18404],{},"SBT Mode (Stream Backed by Table): Also known as Ursa Managed Table, where Ursa Engine manages, compacts, and preserves metadata such as offsets.",[324,18406,18407],{},"SDT Mode (Stream Delivered to Table): Also known as Ursa External Table, where Ursa Engine does not manage the table’s lifecycle but instead appends or upserts records to it. The Iceberg Catalog provider manages the table’s lifecycle.",[32,18409,18411],{"id":18410},"sbt-mode-stream-backed-by-table","SBT Mode: Stream Backed by Table",[48,18413,18414],{},"‍Ursa's default lakehouse-native storage mode follows the \"stream backed by table\" concept. 
This approach, as described earlier, compacts all streaming data into columnar Parquet files, organizing them into Iceberg tables.",[48,18416,18417],{},"With this mode:",[321,18419,18420,18423,18426,18429],{},[324,18421,18422],{},"Only one copy of the streaming data is stored.",[324,18424,18425],{},"All streaming-related metadata (such as offsets and ordering) is preserved.",[324,18427,18428],{},"You can replay the entire stream by reading the Parquet files from the backed table.",[324,18430,18431],{},"You achieve \"stream-table duality\" while maintaining a single copy of data governed by a catalog service.",[48,18433,18434],{},"This mode is also known as “Ursa Managed Table” because Ursa manages the entire data lifecycle based on retention policies and automatically registers the table in a data catalog for easy discovery.",[3933,18436,18438],{"id":18437},"best-use-cases-for-sbt-mode","Best Use Cases for SBT Mode",[321,18440,18441],{},[324,18442,18443],{},"Storing raw data in a Medallion Architecture as bronze tables, which retain all historical data for replay and auditing purposes.",[32,18445,18447],{"id":18446},"sdt-mode-stream-delivered-to-table","SDT Mode: Stream Delivered to Table",[48,18449,18450],{},"‍In contrast, SDT tables do not preserve all streaming-related metadata (such as offsets). Instead, the data is delivered to an Iceberg table through append or upsert operations. However, this table is managed externally, outside of Ursa Engine.",[3933,18452,18454],{"id":18453},"key-differences-in-sdt-mode","Key Differences in SDT Mode",[321,18456,18457,18460,18463,18466],{},[324,18458,18459],{},"SDT tables do not back up the stream, meaning streaming reads via the Kafka protocol are not feasible.",[324,18461,18462],{},"Since the stream and table lifecycles are decoupled, this mode is better suited for storing compacted data using upsert operations.",[324,18464,18465],{},"Ursa can either append or upsert changes into the external table, offering flexibility in partitioning strategies.",[324,18467,18468],{},"SDT tables are often referred to as \"External Tables\" because Ursa does not manage the table’s lifecycle. Instead, it is typically managed by a data catalog service provider, which may also optimize tables through maintenance services.",[3933,18470,18472],{"id":18471},"best-use-cases-for-sdt-mode","Best Use Cases for SDT Mode",[321,18474,18475,18478],{},[324,18476,18477],{},"Storing compacted, curated, and transformed data—such as silver and gold tables in a Medallion Architecture.",[324,18479,18480],{},"Aggregated data optimized for production analytics.",[32,18482,18484],{"id":18483},"sbt-mode-vs-sdt-mode-decide-which-mode-fits-your-use-case","SBT Mode vs. SDT Mode: Decide Which Mode Fits Your Use Case",[48,18486,18487],{},"Following is a table summarizing the differences between SBT mode and SDT mode. 
You can use it as a guide to determine which mode fits your use cases better.",[48,18489,18490],{},[384,18491],{"alt":18,"src":18492},"\u002Fimgs\u002Fblogs\u002F67db7821669e50189550e075_AD_4nXfAjZ4VYk-T5wWXk5hQRJQcCE-Cuq4zc502zSkx_BKlRMBQF2gvpEWS1Ds-cQCVYkh3fELh1Y3EC7m7mCYs06BryuZ5WpMMFvgnF1vpndyZInx1t_YLrllQj0RNkALuLgSSnFRm0w.png",[225,18494,18496],{"id":18495},"choosing-the-right-mode","Choosing the Right Mode",[48,18498,18499],{},"✔ If you need a single data copy that can be replayed using Kafka protocol at any time → Choose SBT Mode.\n✔ If you prefer a decoupled approach where the streaming engine just delivers changes (e.g., upserts) to a lakehouse table → Opt for SDT Mode.",[32,18501,18503],{"id":18502},"other-best-practices","Other Best Practices",[3933,18505,18507],{"id":18506},"optimize-sdt-table-for-snowflake-queries","Optimize SDT table for Snowflake Queries",[48,18509,18510],{},"If you are using SDT tables, ensure you align Iceberg partitioning with your primary Snowflake query patterns. This will help:",[321,18512,18513,18516],{},[324,18514,18515],{},"Reduce query latency",[324,18517,18518],{},"Minimize data-scanning costs",[3933,18520,18522],{"id":18521},"retention-and-lifecycle-policies","Retention and Lifecycle Policies",[321,18524,18525,18528],{},[324,18526,18527],{},"In SBT Mode, configure Ursa’s data retention settings to automatically remove or compact older data based on compliance and cost constraints.",[324,18529,18530],{},"In SDT Mode, schedule periodic compactions or optimizations via your data lakehouse service (e.g., file merges, vacuuming, or upserts).",[3933,18532,18534],{"id":18533},"monitor-stream-lag-and-table-snapshots","Monitor Stream Lag and Table Snapshots",[321,18536,18537,18540,18543,18546],{},[324,18538,18539],{},"Keep track of how frequently new data commits land in Iceberg.",[324,18541,18542],{},"Balance commit frequency with ingestion throughput to:",[324,18544,18545],{},"Avoid tiny files that slow down queries.",[324,18547,18548],{},"Prevent stale data caused by infrequent commits.",[48,18550,18551],{},"By understanding the nuances of SBT vs. SDT tables, you can architect your data streaming architecture to meet specific business and analytical needs.",[40,18553,18555],{"id":18554},"use-ursa-with-your-existing-kafka-clusters","Use Ursa with Your Existing Kafka Clusters",[48,18557,18558,18559,18562],{},"If you already have an existing Kafka cluster, you can transition to this modern architecture without major operational changes by using ",[55,18560,18561],{"href":6305},"Universal Linking (UniLink)",". UniLink allows you to link any Kafka cluster—whether it is MSK, Confluent, RedPanda, or a self-managed Apache Kafka deployment—to StreamNative Ursa.",[32,18564,18566],{"id":18565},"what-is-universal-linking","What Is Universal Linking?",[48,18568,18569],{},"Universal Linking is a cost-effective solution for Kafka data replication and migration. With UniLink, you can seamlessly mirror data from any Kafka-compatible source cluster into Ursa while preserving offsets, consumer groups, schemas, ACLs, and configurations.",[48,18571,18572],{},"Unlike traditional topic mirroring mechanisms, UniLink does not replicate data over a network between brokers. Instead, it utilizes object storage as both the networking and replication layer, eliminating expensive inter-AZ network transfers and reducing infrastructure overhead. 
This approach significantly lowers costs while maintaining high data fidelity across multiple environments.",[48,18574,18575],{},"Once UniLink is configured, data from your source Kafka cluster is seamlessly written to the storage bucket and compacted into Iceberg tables, making it immediately available for querying in Snowflake.",[48,18577,18578],{},[384,18579],{"alt":18,"src":18580},"\u002Fimgs\u002Fblogs\u002F67db771b31f45f01307bdb44_AD_4nXc2NUvvNIGuLQmMdYff8IUP8yHKPK_NnYShSIxqN59YsXzUo6GS-1pS78HXg9B3tJufTjtOLJNf1I0ZVIhk659SOJyW7wZWHI6Icv8WeDpoM9TQAABtAyqrffpDoD7jq65rGbk1.png",[48,18582,18583],{},"By leveraging UniLink, you can bridge your existing Kafka clusters with Iceberg tables efficiently, enabling a cost-effective solution to migrate and stream data into Snowflake without modifying your existing Kafka setup.",[40,18585,319],{"id":316},[48,18587,18588],{},"By combining StreamNative’s Ursa, Apache Iceberg, and Snowflake Open Catalog, you can build a scalable, zero-copy data streaming solution for Snowflake. This approach offers several benefits:",[321,18590,18591,18594,18597,18600],{},[324,18592,18593],{},"Avoid duplicating data storage and transfer",[324,18595,18596],{},"Simplified architecture",[324,18598,18599],{},"Direct access to fresher data in Snowflake",[324,18601,18602],{},"Centralized data governance",[32,18604,18606],{"id":18605},"key-advantages","Key Advantages",[48,18608,18609],{},"✔ Apache Iceberg + Snowflake Open Catalog eliminates the need for a dedicated connector cluster, simplifying the overall architecture.\n✔ StreamNative Ursa automatically writes streaming data to Iceberg tables, ensuring your data is always fresh.\n✔ Snowflake queries Iceberg tables in near real-time, delivering the best of both worlds—flexible data lake storage and powerful data warehouse analytics.",[48,18611,18612],{},"Additionally, Universal Linking allows you to connect any existing Kafka clusters with Ursa, enabling you to enjoy the same architectural benefits without re-programming your applications.",[40,18614,13565],{"id":1727},[48,18616,18617],{},"In our third and final blog post, we’ll compare the connector-based approach from the first blog post with the Zero-Copy (Iceberg\u002FOpen Catalog) method covered in this post. We’ll explore the trade-offs, performance considerations, cost implications, and operational complexity of each approach to help you determine which best fits your organization’s needs. Stay tuned for Part 3—coming soon! 
🚀",{"title":18,"searchDepth":19,"depth":19,"links":18619},[18620,18621,18622,18623,18631,18637,18640,18643],{"id":17971,"depth":19,"text":17972},{"id":17993,"depth":19,"text":17994},{"id":2696,"depth":19,"text":2697},{"id":18044,"depth":19,"text":18045,"children":18624},[18625,18626,18627,18628,18629,18630],{"id":10103,"depth":279,"text":10104},{"id":18073,"depth":279,"text":18074},{"id":18106,"depth":279,"text":18107},{"id":18207,"depth":279,"text":18208},{"id":18292,"depth":279,"text":18293},{"id":18365,"depth":279,"text":18366},{"id":18395,"depth":19,"text":18396,"children":18632},[18633,18634,18635,18636],{"id":18410,"depth":279,"text":18411},{"id":18446,"depth":279,"text":18447},{"id":18483,"depth":279,"text":18484},{"id":18502,"depth":279,"text":18503},{"id":18554,"depth":19,"text":18555,"children":18638},[18639],{"id":18565,"depth":279,"text":18566},{"id":316,"depth":19,"text":319,"children":18641},[18642],{"id":18605,"depth":279,"text":18606},{"id":1727,"depth":19,"text":13565},"2025-03-21","Discover how to stream data into Snowflake without connectors using Apache Iceberg and Snowflake Open Catalog. Learn how StreamNative Ursa enables real-time, zero-copy data streaming for AI and analytics at scale.","\u002Fimgs\u002Fblogs\u002F67db7825761ad1c763709bb2_image-52.png",{},"\u002Fblog\u002Fdefinitive-guide-for-streaming-data-into-snowflake-part-2-lakehouse-native-data-streaming-with-apache-iceberg-and-snowflake-open-catalog","20 min",{"title":17941,"description":18645},"blog\u002Fdefinitive-guide-for-streaming-data-into-snowflake---part-2-lakehouse-native-data-streaming-with-apache-iceberg-and-snowflake-open-catalog",[800,18653,1332],"Snowflake","2Guz7PAwBq_N4gsl1OmrcApbqML0ZtPk7BAc-9RzR8s",{"id":18656,"title":18657,"authors":18658,"body":18659,"category":290,"createdAt":290,"date":18875,"description":18876,"extension":8,"featured":294,"image":18877,"isDraft":294,"link":290,"meta":18878,"navigation":7,"order":296,"path":18879,"readingTime":17934,"relatedResources":290,"seo":18880,"stem":18881,"tags":18882,"__hash__":18883},"blogs\u002Fblog\u002Funilink-your-universal-tableflow-for-kafka--at-your-fingertips.md","UniLink: Your Universal “Tableflow” for Kafka—At Your Fingertips",[806],{"type":15,"value":18660,"toc":18861},[18661,18664,18667,18670,18672,18675,18678,18689,18693,18696,18700,18703,18707,18710,18714,18717,18721,18724,18728,18731,18734,18748,18752,18755,18760,18768,18771,18781,18784,18798,18802,18805,18809,18812,18815,18818,18822,18828,18831,18858],[48,18662,18663],{},"Confluent’s recent announcement of Tableflow’s general availability has sparked renewed enthusiasm around bridging Apache Kafka® with popular data lakehouses in real time. And for good reason: this release underscores how critical it is for organizations to have direct, seamless pipelines between streaming data and analytics platforms.",[48,18665,18666],{},"However, there’s a catch: Tableflow only works with Confluent Cloud. If you’re outside the Confluent ecosystem—or simply don’t want to be locked into it—where can you turn? Enter StreamNative’s Universal Linking (UniLink). 
Think of it as a universal “Tableflow,” enabling you to connect any Kafka cluster to any data lakehouse in real time.",[48,18668,18669],{},"Below, we’ll walk through what UniLink is, how it works, and how you can easily set it up to link data from Kafka topics to your favorite lakehouse—no Confluent lock-in required.",[40,18671,18566],{"id":18565},[48,18673,18674],{},"UniLink is a platform-agnostic solution designed to move data between different Kafka clusters and modern data lakehouses (powered by Iceberg or Delta Lake) in real time. It unifies your data flow without forcing you into a specific cloud provider or proprietary environment.",[48,18676,18677],{},"Key Capabilities:",[321,18679,18680,18683,18686],{},[324,18681,18682],{},"Full-Fidelity ReplicationUniLink captures every element—topics, offsets, consumer groups, schemas, and configurations—to create an exact copy of your Kafka topics. By preserving data integrity down to the byte, we eliminate replication drift, ensuring each environment behaves exactly the same.",[324,18684,18685],{},"Cost-Effective ReplicationUniLink replicates data from your Kafka clusters into a lakehouse efficiently by leveraging a powerful stream format, built as part of the Ursa Engine for object storage. This approach cuts streaming costs with smart zone-aware reads and direct object storage integration. By streaming data directly to cost-efficient cloud storage, you bypass broker bottlenecks, reduce cross-AZ transfer fees, and lower infrastructure overhead.",[324,18687,18688],{},"Universal InteroperabilityConnect any Kafka cluster (Confluent, MSK, Apache Kafka, Redpanda, etc.) to any lakehouse powered by Iceberg or Delta Lake. Whether you’re on-prem, in a multi-cloud environment, or both, UniLink simplifies your data architecture without tying you to a single vendor.",[40,18690,18692],{"id":18691},"where-unilink-excels","Where UniLink Excels",[48,18694,18695],{},"Below is a quick look at how UniLink compares with Confluent’s Tableflow on some of its most prominent features.",[32,18697,18699],{"id":18698},"effortless-real-time-data-movement","Effortless Real-Time Data Movement",[48,18701,18702],{},"UniLink allows you to effortlessly stream data from your Kafka topics to modern data lakehouses powered by Delta Lake or Apache Iceberg without being confined to a single cloud provider. Whether you’re on-prem, in a different major cloud, or in a hybrid environment, UniLink works seamlessly—truly universal.",[32,18704,18706],{"id":18705},"eliminating-data-silos","Eliminating Data Silos",[48,18708,18709],{},"UniLink is designed to unify data pipelines, ensuring teams can access real-time insights without complex workflows. But unlike Tableflow, you can unify data across any Kafka cluster—Confluent, MSK, Redpanda, or self-managed Kafka—into any data lakehouses powered by Delta Lake or Apache Iceberg. Eliminating vendor lock-in and future-proofing your data streaming platform.",[32,18711,18713],{"id":18712},"achieving-real-time-insights-at-scale","Achieving Real-Time Insights at Scale",[48,18715,18716],{},"UniLink provides high-throughput, low-latency data replication at a fraction of the cost of Tableflow. Under the hood, it leverages StreamNative’s Ursa engine to handle massive data volumes with robust performance guarantees at 10x lower cost. 
Scale up or down as your business grows without worrying about infrastructure costs.",[32,18718,18720],{"id":18719},"simplifying-pipelines-for-faster-outcomes","Simplifying Pipelines for Faster Outcomes",[48,18722,18723],{},"UniLink eliminates complexities in Kafka-to-analytics pipelines, making them easier to build and maintain without being locked into Confluent’s proprietary environment. You can keep your existing Kafka deployments, DevOps tools, and data platforms—no re-architecture required.",[40,18725,18727],{"id":18726},"why-lock-yourself-into-a-single-vendor","Why Lock Yourself Into a Single Vendor?",[48,18729,18730],{},"If you want all the benefits of Tabeflow without being locked into a single vendor, then UniLink is for you. With UniLink, you have the freedom to use any Kafka vendor you want. In an era where the Kafka landscape is evolving rapidly, keeping your options open makes sense.",[48,18732,18733],{},"With UniLink, you can:",[321,18735,18736,18739,18742,18745],{},[324,18737,18738],{},"Connect any Kafka distribution.",[324,18740,18741],{},"Send data to on-prem or cloud-based analytics platforms or lakehouses.",[324,18743,18744],{},"Avoid the heavy lifting of migrating everything to Confluent Cloud.",[324,18746,18747],{},"Simplify operations by managing fewer specialized tools.",[40,18749,18751],{"id":18750},"whats-under-the-hood-ursa-stream-storage","What’s Under the Hood: Ursa Stream Storage",[48,18753,18754],{},"UniLink’s “secret sauce” is Ursa Stream Storage—a headless, multi-modal storage layer built on object storage and open table formats (Apache Iceberg or Delta Lake). Internally, it stores data in Parquet files and can present those files as either continuous streams or as well-organized, compacted tables.",[48,18756,18757],{},[384,18758],{"alt":18,"src":18759},"\u002Fimgs\u002Fblogs\u002F67db741c138e5b1eb23954c8_AD_4nXeajL1bzN0MZeIrzO0SFV2MdzcRcRyEoTqEDdCd34vgzBzLjt11Oaifq9IJ1MepeXq63DmO8dJX4ek-xtyPOlENptL95oLsQou636J4dS7NSPXR9U15UNOzvrrKKE7dhSaU_LZd.png",[48,18761,18762,18763,18767],{},"Curious to learn more? Check out ",[55,18764,18766],{"href":18765},"\u002Fblog\u002Fthe-evolution-of-log-storage-in-modern-data-streaming-platforms","The Evolution of Log Storage in Modern Data Streaming Platforms"," to learn more about Ursa, and how its efficient use of infrastructure makes it the lowest cost Kafka solution on the market today.",[48,18769,18770],{},"Unified Governance with Unity Catalog, Snowflake Open Catalog & AWS S3 Tables",[48,18772,18773,18774,1186,18776,1186,18778,190],{},"UniLink isn’t just about moving data between Kafka and lakehouses. It also integrates natively with popular data catalogs that support Iceberg and\u002For Delta Lake, uniting real-time streaming and analytical data under a single governance model. 
Specifically, UniLink works with ",[55,18775,1185],{"href":4811},[55,18777,17114],{"href":4825},[55,18779,2876],{"href":2872,"rel":18780},[264],[48,18782,18783],{},"By leveraging UniLink to replicate your Kafka topics as lakehouse tables in these catalogs, you achieve:",[321,18785,18786,18789,18792,18795],{},[324,18787,18788],{},"Centralized Policies & Access ControlDefine and apply consistent security, lineage, and compliance rules once, instead of duplicating them across multiple systems.",[324,18790,18791],{},"Schema & Metadata DiscoveryA single “source of truth” for data definitions in both real-time streaming and batch environments, boosting data reliability and usability.",[324,18793,18794],{},"Reduced Data SilosBreak down barriers between streaming and analytics teams; everyone has a unified view of the data, enabling faster insights and easier collaboration.",[324,18796,18797],{},"Open Standard FormatsSince Ursa Engine writes data in Iceberg or Delta Lake by default, any compatible downstream engine—Databricks, Snowflake, AWS Athena, and more—can instantly query your latest streaming data.",[40,18799,18801],{"id":18800},"why-now-is-the-perfect-time-to-go-universal","Why Now Is the Perfect Time to Go Universal",[48,18803,18804],{},"Confluent’s announcement has spotlighted the importance of bridging Kafka and analytics seamlessly. If you’ve been evaluating solutions for real-time data pipelines, there’s no better moment to consider UniLink. Keep your options open by choosing a truly universal solution that fits your existing environment and future plans.",[40,18806,18808],{"id":18807},"in-a-nutshell","In a Nutshell",[48,18810,18811],{},"Tableflow: A solid step for Confluent Cloud users who want direct pipelines from Kafka to data warehouses and lakehouses.",[48,18813,18814],{},"UniLink: Everything Tableflow aims to do—plus support for any Kafka cluster, with no forced move to Confluent Cloud.",[48,18816,18817],{},"If you need real-time data replication, analytics, and streaming at scale, but want to avoid the cost and complexity of a single-vendor ecosystem, UniLink is your ready-to-roll universal alternative.",[40,18819,18821],{"id":18820},"take-the-next-step","Take the Next Step",[48,18823,18824,18825,18827],{},"Ready to ride this data-streaming wave on your terms? 
Check out ",[55,18826,1249],{"href":6305}," and discover how it can unlock the full potential of your existing Kafka infrastructure—without forcing you to Confluent.",[48,18829,18830],{},"Learn More About Universal Linking:",[321,18832,18833,18840,18847,18852],{},[324,18834,18835,18839],{},[55,18836,18838],{"href":18837},"\u002Fwebinars\u002Fdata-streaming-launch---march-2025","Data Streaming Launch"," - March 2025",[324,18841,18842],{},[55,18843,18846],{"href":18844,"rel":18845},"https:\u002F\u002Fyoutu.be\u002FMCtis-AQhIg?si=ik9q9z51L9eI8KZa&t=3947",[264],"Data Streaming Summit 2024 Keynote",[324,18848,18849],{},[55,18850,18851],{"href":4863},"Effortless Kafka Migration & Real-Time Data Replication with StreamNative UniLink",[324,18853,18854],{},[55,18855,18857],{"href":18856},"\u002Fblog\u002Fintroducing-universal-linking-revolutionizing-data-replication-and-interoperability-across-data-streaming-systems","Introducing Universal Linking: Revolutionizing Data Replication and Interoperability Across Data Streaming Systems",[48,18859,18860],{},"Make the most of this exciting moment in data streaming, and harness the freedom, flexibility, and universal interoperability your business deserves!",{"title":18,"searchDepth":19,"depth":19,"links":18862},[18863,18864,18870,18871,18872,18873,18874],{"id":18565,"depth":19,"text":18566},{"id":18691,"depth":19,"text":18692,"children":18865},[18866,18867,18868,18869],{"id":18698,"depth":279,"text":18699},{"id":18705,"depth":279,"text":18706},{"id":18712,"depth":279,"text":18713},{"id":18719,"depth":279,"text":18720},{"id":18726,"depth":19,"text":18727},{"id":18750,"depth":19,"text":18751},{"id":18800,"depth":19,"text":18801},{"id":18807,"depth":19,"text":18808},{"id":18820,"depth":19,"text":18821},"2025-03-20","StreamNative’s UniLink is your universal alternative to Tableflow—connect any Kafka cluster to any data lakehouse in real time. No Confluent Cloud required. Unlock seamless, cost-effective real-time data movement today!","\u002Fimgs\u002Fblogs\u002F67d836fab4d736f3226719e2_image-49.png",{},"\u002Fblog\u002Funilink-your-universal-tableflow-for-kafka-at-your-fingertips",{"title":18657,"description":18876},"blog\u002Funilink-your-universal-tableflow-for-kafka--at-your-fingertips",[800,11899,4152,1332],"rKvAzqjb7Z25u3d6RfSbbQbywcCQHEnDq3Y5SHRmqDM",{"id":18885,"title":18886,"authors":18887,"body":18888,"category":290,"createdAt":290,"date":19139,"description":19140,"extension":8,"featured":294,"image":19141,"isDraft":294,"link":290,"meta":19142,"navigation":7,"order":296,"path":6864,"readingTime":17934,"relatedResources":290,"seo":19143,"stem":19144,"tags":19145,"__hash__":19146},"blogs\u002Fblog\u002Fannouncing-ursa-engine-ga-on-aws-leaderless-lakehouse-native-data-streaming-that-slashes-kafka-costs-by-95.md","Announcing Ursa Engine GA on AWS: Leaderless, Lakehouse-Native Data Streaming That Slashes Kafka Costs by 95%",[806],{"type":15,"value":18889,"toc":19131},[18890,18901,18916,18920,18923,18937,18940,18944,18952,18986,18991,18994,18998,19006,19009,19012,19015,19023,19026,19029,19047,19050,19060,19063,19067,19074,19085,19089,19092,19100,19109,19116,19125,19128],[48,18891,18892,18893,18895,18896,18900],{},"We’re excited to announce a major milestone in the evolution of cloud-native data streaming: ",[55,18894,4725],{"href":6647}," is now Generally Available on StreamNative BYOC for AWS! 
Built to fulfill the promise of the ",[55,18897,18899],{"href":18898},"\u002Fblog\u002Fintroducing-streaming-augmented-lakehouse-sal-for-the-data-foundation-of-real-time-gen-ai","Streaming Augmented Lakehouse",", Ursa Engine is the first and only Kafka-compatible data streaming engine purpose-built for AI-Ready data lakehouses in cloud-native environments. It streamlines data streaming into your lakehouse, augmenting it with real-time streaming capabilities and slashing infrastructure costs by up to 95% compared to traditional Kafka deployments.",[48,18902,18903,18904,18907,18908,1186,18911,18915],{},"In tandem with our GA release, we’re proud to share that Ursa Engine now natively integrates data across various locations such as tables stored in ",[55,18905,17784],{"href":17824,"rel":18906},[264]," and tables registered in ",[55,18909,1185],{"href":17805,"rel":18910},[264],[55,18912,17114],{"href":18913,"rel":18914},"https:\u002F\u002Fwww.snowflake.com\u002Fen\u002Fproduct\u002Ffeatures\u002Fopen-catalog\u002F",[264],", and — providing organizations with end-to-end data governance for both streaming and batch workloads.",[40,18917,18919],{"id":18918},"the-streaming-augmented-lakehouse-why-it-matters","The Streaming Augmented Lakehouse: Why It Matters",[48,18921,18922],{},"Traditional data ecosystems often require multiple, separate infrastructures: one for real-time data streaming (e.g., Kafka or Pulsar) and another for batch processing via data lakehouses (e.g., Delta Lake, Iceberg). This split environment not only complicates governance, schema management, and data discovery—it also introduces expensive infrastructure costs resulting from repeated data transfers and storage, complex ETL processes, and error-prone, duplicated schema mapping. Specifically, organizations face:",[321,18924,18925,18928,18931,18934],{},[324,18926,18927],{},"Costly Data Transfers: Frequent cross-system data movement drives up infrastructure expenses.",[324,18929,18930],{},"Fragmented Governance: Duplicating access policies, security settings, and lineage tracking across multiple platforms leads to inconsistencies.",[324,18932,18933],{},"Operational Complexity: Running two or more separate systems for data streaming and lakehouses is labor-intensive.",[324,18935,18936],{},"Data Silos: Maintaining consistent data sets across streaming, warehouse, and lakehouse environments is resource-heavy and prone to errors.",[48,18938,18939],{},"Ursa Engine solves these challenges by augmenting the lakehouse with Kafka-compatible data streaming capabilities, leveraging open storage formats like Delta Lake and Iceberg, and unifying governance through catalog integrations. The result: real-time AI and analytics without the overhead of siloed data pipelines or expensive multi-system architectures.",[40,18941,18943],{"id":18942},"general-availability-on-streamnative-byoc-for-aws","General Availability on StreamNative BYOC for AWS",[48,18945,18946,18947,18951],{},"Ursa Engine is now officially GA on ",[55,18948,18950],{"href":18949},"\u002Fdeployment\u002Fbyoc","StreamNative BYOC (Bring Your Own Cloud)"," for AWS, giving organizations the freedom to deploy Ursa in their own cloud environment—while offering a fully integrated approach to streaming data into lakehouses. 
Key benefits include:",[1666,18953,18954,18961,18964,18972,18975,18978],{},[324,18955,18956,18957,18960],{},"10x Infrastructure Cost Reduction (Up to 95% Savings)Ursa’s leaderless architecture eliminates inter-AZ data transfer overhead and leverages lakehouse-native storage, driving down costs significantly. Read ",[55,18958,18959],{"href":10357},"our cost benchmark report"," to see how Ursa sustains a 5GB\u002Fs Kafka workload at just 5% of the cost of traditional streaming engines like Kafka and Redpanda.",[324,18962,18963],{},"Kafka Protocol CompatibilityRetain your existing Kafka clients and applications without rewriting code.",[324,18965,18966,18967,18971],{},"Latency-Relaxed WorkloadsStrike the ideal balance between ",[55,18968,18970],{"href":18969},"\u002Fblog\u002Fcap-theorem-for-data-streaming","throughput, performance, and cost-effectiveness",", especially for AI & analytics scenarios that don’t require single-digit millisecond latencies.",[324,18973,18974],{},"Instant Lakehouse AvailabilityMake data instantly accessible in open-standard formats (e.g., Iceberg, Delta) by leveraging native lakehouse integration, removing extra ETL processes and data movement.",[324,18976,18977],{},"Unified GovernanceMaintain consistent security, lineage, and access policies, along with seamless discovery through native integration with Unity Catalog and Iceberg REST Catalog—unifying data access across both real-time and batch domains.",[324,18979,18980,18981,18985],{},"Usage-Based PricingLeverage ",[55,18982,11224],{"href":18983,"rel":18984},"https:\u002F\u002Fdocs.streamnative.io\u002Fdocs\u002Fbilling-overview#elastic-throughput-unit-etu",[264]," to pay only for throughput, significantly lowering total cost of ownership compared to traditional streaming platforms.",[916,18987,18988],{},[48,18989,18990],{},"\"As a longtime StreamNative customer, I couldn’t be more excited about the new Ursa Engine GA. Our evaluation shows it to be 10x more cost-efficient than other Kafka solutions. Everything is seamlessly written to object storage, automatically compacted into Iceberg tables, and made immediately available for our data teams using Snowflake.\" —Christos A, Enterprise Architect at a Fortune 500 company",[48,18992,18993],{},"By adopting Ursa Engine on StreamNative BYOC, customers can consolidate their data infrastructure—reducing both costs and complexity—while unifying streaming and batch processing into one cohesive ecosystem.",[40,18995,18997],{"id":18996},"reduce-infrastructure-costs-by-10x-with-leaderless-architecture-and-lakehouse-native-storage","Reduce Infrastructure Costs by 10x with Leaderless Architecture and Lakehouse-Native Storage",[48,18999,19000,19001,19005],{},"A key differentiator of Ursa Engine is its leaderless architecture, which leverages the lakehouse as shared storage and Oxia as a scalable index\u002Fmetadata manager. This approach eliminates expensive inter-AZ traffic and significantly reduces inter-AZ data replication overhead. In a ",[55,19002,19004],{"href":19003},"\u002Fblog\u002Fhow-we-run-a-5-gb-s-kafka-workload-for-just-50-per-hour#key-benchmark-findings","recent benchmark",", Ursa consistently handled 5GB\u002Fs of Kafka workload for just $54 per hour—95% cheaper than vanilla Kafka and Redpanda.",[48,19007,19008],{},"In addition, Ursa Engine is the first and ONLY data streaming solution that natively implements its storage engine using open lakehouse formats, supporting both Iceberg and Delta Lake. 
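The "Kafka Protocol Compatibility" point above means existing Kafka client code can simply point at the Ursa endpoint. The sketch below illustrates that idea with a stock confluent-kafka producer; the bootstrap URL, credentials, and topic are placeholders, and only the connection properties change, not the producer logic.

```python
# Minimal sketch of the "no code changes" claim: an ordinary Kafka producer pointed at a
# Kafka-compatible StreamNative/Ursa endpoint. URL, credentials, and topic are placeholders.
from confluent_kafka import Producer

producer = Producer({
    "bootstrap.servers": "your-cluster.streamnative.example:9093",  # hypothetical endpoint
    "security.protocol": "SASL_SSL",
    "sasl.mechanism": "PLAIN",
    "sasl.username": "service-account",   # placeholder credential
    "sasl.password": "api-key",           # placeholder credential
})


def on_delivery(err, msg):
    # Called once per message with the broker's ack (or an error).
    if err is not None:
        print(f"delivery failed: {err}")
    else:
        print(f"delivered to {msg.topic()}[{msg.partition()}]@{msg.offset()}")


producer.produce("orders.received", key="order-123",
                 value=b'{"status":"received"}', on_delivery=on_delivery)
producer.flush()
```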
By embedding data schemas directly into the storage layer, Ursa takes advantage of columnar compression, enabling potential 10x or more storage reduction.",[48,19010,19011],{},"Unlike other “Iceberg integrations” (e.g., RedPanda Iceberg topics), where two copies of data are maintained—one in proprietary storage and another in the lakehouse—Ursa stores data just once, cutting complexity and eliminating inconsistencies.",[48,19013,19014],{},"By embracing open lakehouse formats and avoiding leader-based inter-zone data replication, Ursa delivers up to a 10x reduction in infrastructure costs compared to traditional streaming solutions.",[48,19016,19017,19018,19022],{},"Interested in how we achieved these savings? Check out our blog post on ”",[55,19019,19021],{"href":17900,"rel":19020},[264],"Why Leaderless Architecture and Lakehouse-Native Storage for Reducing Kafka Cost","”.",[40,19024,18770],{"id":19025},"unified-governance-with-unity-catalog-snowflake-open-catalog-aws-s3-tables",[48,19027,19028],{},"Another major differentiator is that Ursa Engine natively integrates with popular data catalogs that support Iceberg and\u002For Delta Lake, bringing real-time streaming and batch data together under a single governance model through native catalog support. Specifically, Ursa Engine connects with:",[1666,19030,19031,19036,19041],{},[324,19032,19033,19035],{},[55,19034,1185],{"href":4811}," – Delivering uniform access controls and lineage across streaming and batch data, eliminating the need to maintain multiple parallel security configurations.",[324,19037,19038,19040],{},[55,19039,17114],{"href":4825}," – Allowing organizations to discover and govern real-time data—stored in open table formats like Iceberg—alongside Snowflake’s analytical workloads.",[324,19042,19043,19046],{},[55,19044,2876],{"href":2872,"rel":19045},[264]," – Ursa Engine can stream data directly into Amazon S3 Tables, leveraging Iceberg’s REST catalog to ensure centralized metadata, efficient storage optimization, and seamless querying via AWS analytics services.",[48,19048,19049],{},"By registering your Kafka topics as managed or external tables in these catalogs, you achieve:",[321,19051,19052,19054,19056,19058],{},[324,19053,18788],{},[324,19055,18791],{},[324,19057,18794],{},[324,19059,18797],{},[48,19061,19062],{},"With this native data catalog integration, Ursa achieves storing a single copy of your data in your own bucket -fully discoverable and shareable—then seamlessly provide access across Databricks, Snowflake, AWS Athena, and more. No more juggling siloed data copies or ballooning transport costs. It turns “separate worlds” of streaming and batch data into a single ecosystem, minimizing complexity while maximizing governance, security, and discoverability.",[40,19064,19066],{"id":19065},"etu-pricing-model-pay-for-throughput-not-storage","ETU Pricing Model: Pay for Throughput, Not Storage",[48,19068,19069,19070,19073],{},"Lastly, while traditional streaming platforms often bundle storage and throughput costs, Ursa Engine introduces ",[55,19071,11224],{"href":18983,"rel":19072},[264],"—a usage-based pricing model that charges only for throughput, with no storage fees.",[321,19075,19076,19079],{},[324,19077,19078],{},"Transparent & Predictable: Scale your workload as needed without hidden storage charges.",[324,19080,19081,19082,190],{},"50% Lower Cost than Confluent WarpStream: Lower your total cost of ownership (TCO) while maintaining robust performance and reliability. 
Check out the pricing difference in ",[55,19083,6677],{"href":19084},"\u002Fblog\u002Fhow-we-run-a-5-gb-s-kafka-workload-for-just-50-per-hour#comparing-total-cost-of-ownership",[40,19086,19088],{"id":19087},"getting-started-with-ursa-engine","Getting Started with Ursa Engine",[48,19090,19091],{},"Ready to take your data architecture into the era of real-time AI? Here’s how you can get started:",[48,19093,19094,19095,19099],{},"🚀 [",[55,19096,19098],{"href":3907,"rel":19097},[264],"Sign Up for Ursa Engine on StreamNative BYOC","]\nDeploy in your preferred cloud environment, configure latency-relaxed Kafka workloads, and streamline data ingestion into your lakehouse.",[48,19101,19102,19103,19108],{},"📖 [",[55,19104,19107],{"href":19105,"rel":19106},"https:\u002F\u002Fdocs.streamnative.io",[264],"Explore Our Documentation","]\nLearn how to configure Ursa Engine with Databricks Unity Catalog, Snowflake Open Catalog, and\u002For AWS S3 Tables to maintain a single governance model from ingestion to analytics.",[48,19110,19111,19112,19115],{},"📞 [",[55,19113,19114],{"href":6392},"Contact Us for a Demo","]\nSee how Ursa Engine optimizes Kafka workloads and simplifies lakehouse integration—reducing complexity and operational overhead.",[48,19117,19118,19119,19122,17865],{},"🎥 ",[2628,19120,19121],{},"Watch our on-demand workshop",[55,19123,17864],{"href":17862,"rel":19124},[264],[48,19126,19127],{},"Thank you for joining us on this journey to redefine real-time data streaming standards. With the General Availability of Ursa Engine on BYOC for AWS, complete the integrations with Unity Catalog, Snowflake Open Catalog and AWS S3 Tables, you can unify governance, cut costs, and streamline your data ingestion—all in one place.",[48,19129,19130],{},"We look forward to seeing the innovative applications and solutions you’ll build with Ursa Engine!",{"title":18,"searchDepth":19,"depth":19,"links":19132},[19133,19134,19135,19136,19137,19138],{"id":18918,"depth":19,"text":18919},{"id":18942,"depth":19,"text":18943},{"id":18996,"depth":19,"text":18997},{"id":19025,"depth":19,"text":18770},{"id":19065,"depth":19,"text":19066},{"id":19087,"depth":19,"text":19088},"2025-03-17","StreamNative’s Ursa Engine is now generally available on AWS, offering leaderless, lakehouse-native data streaming that cuts Kafka costs by 95%. 
Achieve seamless data streaming into lakehouses with unified governance across popular data catalogs like Databricks Unity, Snowflake Open Catalog, and AWS S3 Tables.","\u002Fimgs\u002Fblogs\u002F67d856ece8dc80cbd5e87920_image-51.png",{},{"title":18886,"description":19140},"blog\u002Fannouncing-ursa-engine-ga-on-aws-leaderless-lakehouse-native-data-streaming-that-slashes-kafka-costs-by-95",[1332,799,10322,800,303,5954],"2vGuTc-eZSnWWK0JvM7jh52Xv8IljfYpiV3FNi8HCT8",{"id":19148,"title":19149,"authors":19150,"body":19151,"category":290,"createdAt":290,"date":19139,"description":19703,"extension":8,"featured":294,"image":18877,"isDraft":294,"link":290,"meta":19704,"navigation":7,"order":296,"path":4863,"readingTime":18649,"relatedResources":290,"seo":19705,"stem":19706,"tags":19707,"__hash__":19708},"blogs\u002Fblog\u002Feffortless-kafka-migration-real-time-data-replication-with-streamnative-universal-linking.md","Effortless Kafka Migration & Real-Time Data Replication With StreamNative Universal Linking",[311],{"type":15,"value":19152,"toc":19681},[19153,19157,19160,19167,19170,19173,19187,19191,19194,19208,19212,19215,19217,19228,19233,19236,19240,19243,19255,19257,19367,19371,19374,19377,19380,19385,19388,19393,19396,19401,19404,19407,19410,19413,19418,19421,19426,19429,19434,19437,19442,19445,19448,19451,19456,19459,19464,19467,19472,19475,19480,19483,19488,19491,19496,19499,19503,19506,19509,19514,19529,19537,19541,19544,19547,19550,19555,19563,19566,19569,19572,19576,19579,19583,19591,19595,19606,19610,19618,19622,19630,19634,19642,19646,19654,19658,19669,19671,19674],[40,19154,19156],{"id":19155},"background","Background",[48,19158,19159],{},"As enterprises scale their data infrastructure, seamless data replication and migration become critical for operational efficiency. Organizations managing real-time data pipelines often face challenges in replicating data across clusters, migrating from legacy Kafka deployments, and ensuring business continuity during these transitions.",[48,19161,19162,19163,19166],{},"At StreamNative, we understand the importance of a frictionless and cost-efficient approach to data replication and migration. That's why we announced the availability of ",[55,19164,19165],{"href":18856},"Universal Linking in Private Preview"," at the Data Streaming Summit in October,2024. 
We are now introducing StreamNative Universal Linking Public Preview, which offers a powerful experience designed to make data movement between environments effortless.",[40,19168,5417],{"id":19169},"use-cases",[48,19171,19172],{},"StreamNative Universal Linking is designed to support a wide range of use cases; however, this post will primarily highlight the following use cases.",[1666,19174,19175,19178,19181,19184],{},[324,19176,19177],{},"Seamless Kafka Migration to StreamNative Enable a zero-downtime, low-risk migration strategy for organizations moving from Apache Kafka to StreamNative (Apache Pulsar).",[324,19179,19180],{},"Facilitate incremental migration of Kafka workloads with Universal Linking, ensuring minimal disruption during application transitions.",[324,19182,19183],{},"Real-Time Data Lakehouse Bridge Kafka’s real-time streaming data with modern lakehouse architectures.",[324,19185,19186],{},"Continuously replicates Kafka topics into Delta Lake or Iceberg tables to fuel AI\u002FML pipelines, hybrid analytics (HTAP), and real-time decision-making.",[40,19188,19190],{"id":19189},"challenges","Challenges",[48,19192,19193],{},"Organizations undergoing data replication and migration typically face the following challenges.",[321,19195,19196,19199,19202,19205],{},[324,19197,19198],{},"Offset Management – Maintaining consistent offsets across clusters is complex, especially for active consumers, to prevent duplicate processing or message loss.",[324,19200,19201],{},"Schema Replication – Managing schema compatibility, evolution, registry sync, cross-cluster lookups,",[324,19203,19204],{},"Operation Complexity – Maintaining, upgrading, and managing Kafka clusters may seek a managed alternative to offload operational burdens.",[324,19206,19207],{},"High Networking Costs – High networking costs from traditional wire-protocol-based data replication methods which also suffer from protocol-specific limitations.",[40,19209,19211],{"id":19210},"introducing-universal-linking-public-preview","Introducing Universal Linking Public Preview",[48,19213,19214],{},"We are excited to introduce Universal Linking in Public Preview , a tool built to simplify and accelerate Kafka migration and build a Real-Time Data Lakehouse.",[32,19216,18677],{"id":2742},[321,19218,19219,19222,19225],{},[324,19220,19221],{},"Full-Fidelity ReplicationPreserve topics, offsets, schemas, and ACLs for an exact Kafka replica with full offset preservation.",[324,19223,19224],{},"Cost-Effective by Design Cut storage costs by 20x with direct-to-lakehouse streaming, eliminating bottlenecks and cross-AZ fees.",[324,19226,19227],{},"Seamless Kafka IntegrationWorks with Confluent, AWS MSK, Redpanda, and self-hosted Kafka for seamless replication, migration, and scaling—no code changes needed.",[48,19229,19230],{},[384,19231],{"alt":18,"src":19232},"\u002Fimgs\u002Fblogs\u002F67d8376faffe2f105236cd7a_AD_4nXexVJBOtC__7J0JOsvur59Xt6YX2wyldxbxhEeKAOB-93mLpP_jQaC67rieWarAuMeVmpuFIHrqZMd03K9R1tK7_yDoYY4OVTBzf1ZBorbgqW_hQqQWls2VouK8O6T4_HlmpxHU.png",[48,19234,19235],{},"With Universal Linking, enterprises can seamlessly build a Real-Time Data Lakehouse, replicating data from a source Kafka cluster to object storage, integrating it with a data catalog, and enabling effortless querying. 
When ready to transition from the source Kafka cluster, organizations can seamlessly migrate their applications to the destination cluster on StreamNative Cloud.",[40,19237,19239],{"id":19238},"walkthrough","Walkthrough",[48,19241,19242],{},"This walkthrough provides details on utilizing Universal Linking (UniLink) to build a Real-Time Data Lakehouse and facilitate Kafka migration. The process involves configuring source and destination clusters, streaming data to an S3 table bucket, and querying data from Amazon Athena. After establishing a pipeline to stream data to a Real-Time Data Lakehouse, we will explore the steps required to transition producers and consumers to a new StreamNative cluster, enabling a seamless migration from the source Kafka cluster.",[48,19244,19245,19246,1186,19248,5422,19250,19254],{},"While StreamNative’s platform supports multiple catalogs—including ",[55,19247,1185],{"href":4811},[55,19249,17114],{"href":4825},[55,19251,17784],{"href":19252,"rel":19253},"https:\u002F\u002Fdocs.streamnative.io\u002Fdocs\u002Fintegrate-with-s3-tables#introduction",[264],"—this post specifically focuses on S3 Tables Integration.",[48,19256,10104],{},[1666,19258,19259,19308,19345],{},[324,19260,19261,19262,19267,19268,19270,19271,19276,19278,19279,3931,19284,19286,19287,19292,19294,19295,19300,19302,19303],{},"An ",[55,19263,19266],{"href":19264,"rel":19265},"https:\u002F\u002Fportal.aws.amazon.com\u002Fbilling\u002Fsignup\u002Fiam?nc2=h_ct&redirect_url=https:\u002F\u002Faws.amazon.com\u002Fregistration-confirmation&src=header_signup#\u002Fsupport",[264],"AWS account"," with access to the following AWS services:",[15918,19269],{},"1.1 ",[55,19272,19275],{"href":19273,"rel":19274},"https:\u002F\u002Faws.amazon.com\u002Fs3\u002F",[264],"Amazon S3",[15918,19277],{},"1.2 ",[55,19280,19283],{"href":19281,"rel":19282},"https:\u002F\u002Faws.amazon.com\u002Fiam\u002F",[264],"AWS Identity and Access Management (IAM)",[15918,19285],{},"1.3 ",[55,19288,19291],{"href":19289,"rel":19290},"https:\u002F\u002Faws.amazon.com\u002Fathena\u002F",[264],"Amazon Athena",[15918,19293],{},"1.4 ",[55,19296,19299],{"href":19297,"rel":19298},"https:\u002F\u002Faws.amazon.com\u002Fglue\u002F",[264],"AWS Glue",[15918,19301],{},"1.5 ",[55,19304,19307],{"href":19305,"rel":19306},"https:\u002F\u002Faws.amazon.com\u002Flake-formation\u002F",[264],"AWS Lake Formation",[324,19309,19310,19311,19313,19314,19319,19320,19322,19323,19328,19329,19331,19332,19337,19338,19340,19341],{},"Create a BYOC infrastructure pool by taking the following steps.",[15918,19312],{},"2.1 ",[55,19315,19318],{"href":19316,"rel":19317},"https:\u002F\u002Fdocs.streamnative.io\u002Fdocs\u002Fbyoc-aws-access",[264],"Grant StreamNative vendor access"," to manage clusters in your AWS account.",[15918,19321],{},"2.2 ",[55,19324,19327],{"href":19325,"rel":19326},"https:\u002F\u002Fdocs.streamnative.io\u002Fdocs\u002Fcreate-cloud-connection",[264],"Create a cloud connection"," for StreamNative to provision your environment.",[15918,19330],{},"2.3 ",[55,19333,19336],{"href":19334,"rel":19335},"https:\u002F\u002Fdocs.streamnative.io\u002Fdocs\u002Fcreate-cloud-environment",[264],"Create Cloud Environment"," to provision Pool Members in your designated cloud region.",[15918,19339],{},"2.4 ",[55,19342,19344],{"href":17095,"rel":19343},[264],"Watch this video for a detailed explanation.",[324,19346,19347,19348,19350,19351,19353,19354,3931,19359,19361,19362],{},"Create StreamNative BYOC ",[55,19349,1332],{"href":6647}," Cluster where data will be 
replicated",[15918,19352],{},"3.1 ",[55,19355,19358],{"href":19356,"rel":19357},"http:\u002F\u002Fstreamnative.io",[264],"Follow these steps to create a StreamNative Ursa cluster.",[15918,19360],{},"3.2 ",[55,19363,19366],{"href":19364,"rel":19365},"https:\u002F\u002Fdocs.streamnative.io\u002Fdocs\u002Fintegrate-with-s3-tables#step-1-create-a-stream-native-byoc-ursa-cluster-in-stream-native-cloud-console",[264],"Enable S3 Tables Integration while deploying Ursa cluster ",[32,19368,19370],{"id":19369},"step-1-create-unilink-to-replicate-data-from-source-to-streamnative-cluster","Step 1: Create UniLink to replicate data from source to StreamNative Cluster",[48,19372,19373],{},"We will be creating two UniLinks. One for Data Migration and one for Schema migration.",[48,19375,19376],{},"Data Migration",[48,19378,19379],{},"On the StreamNative Cloud homepage, navigate to the left-hand menu and select UniLink. Then, click Create and choose Data Migration, as illustrated in the image below.",[48,19381,19382],{},[384,19383],{"alt":18,"src":19384},"\u002Fimgs\u002Fblogs\u002F67d8376fd886b757638f7222_AD_4nXerEraFJjPRfYUjFfpPAlhcoA2SNi34-x7QwXcGb6WCuWy-b8RfFnvU-CTv0WrP0bnZkNFr8vA9hswbXfcKDk4TiSYTDqa1V5mxTwLkVUXWujPNNLceW-nR6A_zSmBv8nAFKP6CrQ.png",[48,19386,19387],{},"Enter a name for the Data Migration UniLink, then click on Source and Destination Details.",[48,19389,19390],{},[384,19391],{"alt":18,"src":19392},"\u002Fimgs\u002Fblogs\u002F67d8376fe14856435249159e_AD_4nXewVgs9QRFr-LPQdI1GnxPJGWvpKBPuWUOcSqqlo_jUJK3Vwep-lcCkzTN7mtZ_ymNQ6hKcr_g0IDNT8gGcPuh9Xlfgf3oLH0QXaasxYSFZodONF-ca_Qub3IskK25fGhdJYAyO.png",[48,19394,19395],{},"Enter the Source Cluster Broker URL, Destination Instance, and Ursa Cluster in StreamNative Cloud. Then, provide the Key\u002FSecret for source Kafka cluster for authentication, and click on ‘Connect & Deploy.’",[48,19397,19398],{},[384,19399],{"alt":18,"src":19400},"\u002Fimgs\u002Fblogs\u002F67d8376f1b46b4d2ffe99fe8_AD_4nXcs27W8Kd3ZmcGrQ8GAM2GyXI8XAg69X_MjWS2CikuX5Rr2a-dnCG2-bvIt_LkoQQaqQvyVFqP9axp1hCeUmDLdvs1WnoQRT8EXmFIBBMn1duc6iPRX4sF483q5MNRG2h0snsr3.png",[48,19402,19403],{},"After clicking Connect and Deploy, the UniLink runtime verifies the connection and proceeds upon successful authentication. If the connection fails, the user is prompted to return to the previous step to correct the source cluster details.",[48,19405,19406],{},"In the next step, configure the topics to be migrated from the source to the destination cluster. You can either select All Topics Replication to replicate all topics or specify topics to migrate using a regex expression. In this guide, we will select All Topics Replication, as shown in the image below.",[48,19408,19409],{},"Additionally, you can specify the tenant and namespace for the destination StreamNative Ursa cluster. 
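The topic-selection step above lets you replicate everything or narrow the scope with a regex. Before deploying the link, it can help to preview which source-cluster topics a given pattern would match; a small hedged sketch follows, assuming the confluent-kafka AdminClient, with the bootstrap address and pattern as placeholders.

```python
# Minimal sketch: preview which source-cluster topics a regex would select before
# configuring topic replication. Bootstrap address and pattern are placeholders.
import re
from confluent_kafka.admin import AdminClient

admin = AdminClient({"bootstrap.servers": "source-kafka:9092"})
pattern = re.compile(r"^orders\..*")   # hypothetical selection rule

all_topics = admin.list_topics(timeout=10).topics.keys()
selected = sorted(
    t for t in all_topics
    if pattern.match(t) and not t.startswith("__")   # skip internal topics
)
print(f"{len(selected)} topics would be replicated:")
for topic in selected:
    print(" -", topic)
```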
In this post, we will keep the default settings, allowing the Data Migration UniLink to migrate topics and data to the default tenant (public) and default namespace (default).",[48,19411,19412],{},"After entering all details, click Deploy.",[48,19414,19415],{},[384,19416],{"alt":18,"src":19417},"\u002Fimgs\u002Fblogs\u002F67d8376f3445d3c37beab959_AD_4nXeaR3a5NOOQwctrjPT7demhH2jOl38AjIMUSVNg1aYEcFxrLiL-3GM8-poX01txxqo0VW7TcTa9YLK2XP6OGOXe4K_MTPJMmfjFYzk3xeVkN4Flmirtzz8AS8zBGUBs1WKnzcXdsQ.png",[48,19419,19420],{},"Preview the topics from your source cluster and click Continue.",[48,19422,19423],{},[384,19424],{"alt":18,"src":19425},"\u002Fimgs\u002Fblogs\u002F67d8376fed5f8e24ef426f03_AD_4nXdZ9ygXiSLEW5tlbqcMHistdm1xHi5yjcbx9FUjBzYUtff9l1Pa2R04OPBeinswXR4nR1rFHshs_JAJib8zTOvW6HZvyksHx1p_D0Y0emWpMius_k2KzjwBfB-kmD1giFuZoOX9.png",[48,19427,19428],{},"In the next optional step, configure Consumer Groups by either selecting All Consumer Groups or specifying specific consumer groups using a regex expression. In this post, we will choose All Consumer Groups and click Deploy.",[48,19430,19431],{},[384,19432],{"alt":18,"src":19433},"\u002Fimgs\u002Fblogs\u002F67d8376f4765bf4df8c02fb4_AD_4nXfUqmELpgfOrAvtt30Y9DLADR-BjwjfwNa426B6nAxHwihqqzctrdim95mqkk7164YR8WYuud3gvmG304at_mEBf_Q6fMwoeh5wz5FI5mJNh9K6nnBk_uaRw4Bt6kSiNOHz8qtasQ.png",[48,19435,19436],{},"Once the Data Migration UniLink is deployed you can find it in the list view as highlighted in the picture below.",[48,19438,19439],{},[384,19440],{"alt":18,"src":19441},"\u002Fimgs\u002Fblogs\u002F67d8376fb4d736f3226793d1_AD_4nXdM2Y6tDtSic2fiueeajmgfw93Q27fRosVM5K9UgBOkDIBptHgHXvBYxegtW7XTQ2ceVDwY92zOFScx0X4ToqRc9wfCX-yqYaMt2eY8ecgmL8T1uLXWi-YKlv67bmUtu2QhSwfwtA.png",[48,19443,19444],{},"In the next section, we will review the topics and data replicated by this Data Migration link. However, before proceeding, let's create a link to replicate schemas.",[48,19446,19447],{},"Schema Migration",[48,19449,19450],{},"On the StreamNative Cloud homepage, navigate to the left-hand menu and select UniLink. Then, click Create and choose Schema Migration, as illustrated in the image below.",[48,19452,19453],{},[384,19454],{"alt":18,"src":19455},"\u002Fimgs\u002Fblogs\u002F67d8376fd886b757638f723e_AD_4nXdzjPoSxjQFBsMinJMnZju5tvMbWVp2tJnbrWPQiAqzMiNvWqvSp9ugBEAbkHWzT4KmYYgRUqD66XNV-zwOzBgmfKcK-cXsWRVsnjoMTKxCm4BDGCVOky3nAhQSRHfTOf6Duekp-w.png",[48,19457,19458],{},"Enter a name for the Schema Migration UniLink, then click on Source and Destination Details.",[48,19460,19461],{},[384,19462],{"alt":18,"src":19463},"\u002Fimgs\u002Fblogs\u002F67d8376f16464a7b56b0e880_AD_4nXf7T6Mcx2W2NMyq0IyNG2xmC00jB--6sIKNWXdRl41eu9vByTplS9C0KnxDRJr4I7ZToBn4fManmCJmsn0dekO1XSbr8bwiV-CUHhZmca8rw3SZJXWsbwV3x-O1CQaCcrC7Cphw-Q.png",[48,19465,19466],{},"Enter the Source Cluster Broker URL, Destination Instance, and Ursa Cluster in StreamNative Cloud. Then, provide the Secret for source Schema registry cluster for authentication, and click on Subjects Configuration’",[48,19468,19469],{},[384,19470],{"alt":18,"src":19471},"\u002Fimgs\u002Fblogs\u002F67d8376faffe2f105236cd70_AD_4nXeICPW25wpFBj-8dYRZg1pK1LEhz8B5z2a9_EZoPp8QXthU3AXhbuO9wPuyuFWzVhwfQAj0DiAZiKJY7-8COJWwlwoE0X3J1WbSuJxzNocElH74cwcJgSUEzEmO5ltIOcESMpeV.png",[48,19473,19474],{},"In the next step, configure the subjects to be migrated from the source to the destination cluster. You can either select All Subjects to replicate all schemas and configurations or specify subjects to migrate using a regex expression. 
In this post, we will select All Subjects Replication, as shown in the image below, and click Deploy.",[48,19476,19477],{},[384,19478],{"alt":18,"src":19479},"\u002Fimgs\u002Fblogs\u002F67d8376f8e9947af078af06c_AD_4nXdd67oXXCQAMrDMsrYhcmaBHxks1-HI9XhQZ85IRE5IjHGgEuqQp2ycghKqVu1jjw24RoOIBfSqCubFrA2ZGVMIGExgC0GS3hdwpk0nqAl99DJ3tBhgFcpeW2HJYe-UeZ1jzXbW.png",[48,19481,19482],{},"In the next step preview the schemas from source schema registry, and click on Continue.",[48,19484,19485],{},[384,19486],{"alt":18,"src":19487},"\u002Fimgs\u002Fblogs\u002F67d8376f5f462be20658e2ae_AD_4nXd4IJDzCwIALR11uY9rs52HPtAkmo-C9-ners2BidwrnorLAzChCCcnHbj0Oa6LikUKMZpSrwhPgyNcyobdDb53ATZhuSK0F9Uht-YvfPlbASAw6_s8_EKfkXezcqj2InF8-DHxUA.png",[48,19489,19490],{},"Once the Schema Migration UniLink is deployed you can find it in the list view as highlighted in the picture below.",[48,19492,19493],{},[384,19494],{"alt":18,"src":19495},"\u002Fimgs\u002Fblogs\u002F67d8376fbf4c5a6f018503fc_AD_4nXcgSeVsYRg-fIlRk-qNQzE8gdnwL5uxrZhS4XZAQrYInto_xBb713USbrNJMHlx6wlVGLYcaGGmU2WxgRJouXLx1z0UX04WkA4IUDPUeAvJXyCZ-3LISZuhjKKMzjgGjJfepi4Cew.png",[48,19497,19498],{},"At this stage, both Data Replication and Schema Replication UniLinks are actively running, replicating topic data, schemas, and other configurations. In the next sections, we will review the migrated topics and schemas in StreamNative Cloud and the Amazon S3 Table bucket.",[32,19500,19502],{"id":19501},"step-2-review-replicated-data-and-schema-in-streamnative-ursa-cluster","Step 2: Review replicated data and schema in StreamNative Ursa Cluster",[48,19504,19505],{},"As described in the post above, the topics and data are expected to be replicated into the default tenant (public) and default namespace (default) within StreamNative Cloud.",[48,19507,19508],{},"To verify this, navigate to the StreamNative Cloud homepage, select the Ursa cluster, and choose the default tenant and namespace, and find the migrated topics as shown in the image below.",[48,19510,19511],{},[384,19512],{"alt":18,"src":19513},"\u002Fimgs\u002Fblogs\u002F67d888ae390ff6947b7f5681_AD_4nXfxGi6bGy9TyHG_6U_RQ0bK8uXoQm3qo7S49_kieUFN92G6av1e4LFgmuen5T_abaLAeRretBYAaLN_uqx4cNoDdm8KpHzpFon2UJSkRGSlO2WAGGVCWB7QUHFSjm6J8rhaQ-oL.png",[48,19515,19516,19517,19522,19523,19528],{},"To verify the replicated schemas, invoke the Kafka Schema Registry REST APIs to view the migrated subjects. ",[55,19518,19521],{"href":19519,"rel":19520},"https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=5z0G7P_BbWg&list=PL7-BmxsE3q4Vcs4_i1zTZ4nWL6Y39N91x&index=7",[264],"Watch this video to learn how to set up and access"," Kafka Schema Registry With StreamNative Cloud .",[55,19524,19527],{"href":19525,"rel":19526},"https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=ehWeN8NS7o4&list=PL7-BmxsE3q4Vcs4_i1zTZ4nWL6Y39N91x&index=8",[264],"Watch this video ","to learn more about Querying Kafka Schema Registry.",[48,19530,19531,19532,190],{},"To verify the replicated consumer groups, watch this video to ",[55,19533,19536],{"href":19534,"rel":19535},"https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=jJpIqN59SOo",[264],"use Kafka CLI to Query Consumer Group Backlog",[32,19538,19540],{"id":19539},"step-3-review-and-query-data-published-in-s3-tables-bucket","Step 3: Review and query data published in S3 Tables Bucket",[48,19542,19543],{},"Instead of relying on direct networking over streaming protocols like Pulsar or Kafka, Universal Linking leverages object storage (such as S3) for both networking and storage. 
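For the schema-verification step described a little earlier (invoking the Kafka Schema Registry REST APIs to view migrated subjects), a minimal sketch is shown below using the standard Schema Registry endpoints GET /subjects and GET /subjects/{subject}/versions/latest. The registry URL and credentials are placeholders for the endpoint exposed by your StreamNative cluster.

```python
# Minimal sketch: list replicated subjects and fetch the latest schema version for each
# via standard Schema Registry REST endpoints. URL and credentials are placeholders.
import requests

REGISTRY_URL = "https://schema-registry.example.com"   # placeholder endpoint
AUTH = ("user", "token")                               # placeholder credentials

subjects = requests.get(f"{REGISTRY_URL}/subjects", auth=AUTH, timeout=10).json()
print(f"{len(subjects)} subjects found")

for subject in subjects:
    latest = requests.get(
        f"{REGISTRY_URL}/subjects/{subject}/versions/latest", auth=AUTH, timeout=10
    ).json()
    print(f"{subject}: version {latest['version']}, schema id {latest['id']}")
```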
This architecture enables cost-effective, robust, flexible, and scalable replication across heterogeneous environments.",[48,19545,19546],{},"In this guide, as outlined in the Prerequisites section, we configured the destination cluster with S3 Tables Integration, enabling UniLink to replicate data directly to Amazon S3 Tables, which is an object store with built-in Apache Iceberg support and streamline storing tabular data at scale.",[48,19548,19549],{},"The three topics (orders.fulfilled, orders.received, and payments.processed) are created as Apache Iceberg tables in the S3 Table bucket, which is configured in the destination Ursa cluster as shown in the picture below.",[48,19551,19552],{},[384,19553],{"alt":18,"src":19554},"\u002Fimgs\u002Fblogs\u002F67d888aeee2b444d167ee225_AD_4nXfBYDt-H7HOvKVL1mt2iv96GOaIYqLRBbT2b2F1VTTGbThxbRVnns-XII3LZXv32eaaW0vm_rYcPi_UXUidZ3wwNw7lx1C8ffWTzH2fKmM-ivrhR7_I0qu0jeHpOCC-gVWLYeCpzw.png",[48,19556,19557,19558],{},"To query the Iceberg tables in Amazon Athena, ",[55,19559,19562],{"href":19560,"rel":19561},"https:\u002F\u002Fdocs.streamnative.io\u002Fdocs\u002Fintegrate-with-s3-tables#configure-lake-formation-to-access-s3-table-bucket",[264],"follow steps listed on this doc.",[48,19564,19565],{},"Steps 1 to 3 listed above outline the process of building a Real-Time Data Lakehouse, where UniLink facilitates live data and schema replication in an open lakehouse format, such as Apache Iceberg. This allows Data Science, Data Analytics, and other teams to seamlessly query the replicated data.",[48,19567,19568],{},"The UniLink jobs deployed for a Real-Time Data Lakehouse are long-running and always active, continuously replicating data and schemas. This setup is particularly beneficial for organizations that want to continue using their existing Kafka cluster while leveraging StreamNative Ursa’s capabilities to build a Real-Time Data Lakehouse.",[48,19570,19571],{},"For organizations ready to fully transition from their source cluster, the next step provides best practices for switching producer and consumer applications to connect directly to the StreamNative Ursa cluster.",[32,19573,19575],{"id":19574},"step-4-switch-over-producers-consumers-to-streamnative-ursa-cluster","Step 4: Switch over producers & consumers to StreamNative Ursa Cluster",[48,19577,19578],{},"This is an option step for organizations who no longer want to keep their source Kafka clusters and switch over to StreamNative Ursa clusters. Steps 1 to 3 already replicated topics data, schema and other related configurations. In this section we will look at some of the important steps an organization needs to review before switching over completely to StreamNative Ursa Cluster.",[32,19580,19582],{"id":19581},"_1-validate-topic-and-partition-configurations","1. Validate Topic and Partition Configurations",[321,19584,19585,19588],{},[324,19586,19587],{},"Verify that all topics, partitions, and replication factors match the original cluster.",[324,19589,19590],{},"Ensure topic-level configurations (e.g., retention policies, cleanup policies, compression settings) are correctly applied.",[32,19592,19594],{"id":19593},"_2-validate-data-integrity","2. 
Validate Data Integrity",[321,19596,19597,19600,19603],{},[324,19598,19599],{},"Compare data consistency between the old and new clusters by running consumers on both clusters and comparing their outputs.",[324,19601,19602],{},"Check for data loss or corruption with checksum verification.",[324,19604,19605],{},"Do not produce anything to the destination topic until the consumer group has migrated from source to destination.",[32,19607,19609],{"id":19608},"_3-performance-and-latency-benchmarking","3. Performance and Latency Benchmarking",[321,19611,19612,19615],{},[324,19613,19614],{},"Measure producer and consumer performance on the new cluster.",[324,19616,19617],{},"Test end-to-end latency and throughput to ensure the new cluster meets SLAs.",[32,19619,19621],{"id":19620},"_4-validate-security-access-controls","4. Validate Security & Access Controls",[321,19623,19624,19627],{},[324,19625,19626],{},"Ensure ACLs, authentication mechanisms (SASL, TLS, etc.), and RBAC roles match between clusters.",[324,19628,19629],{},"Verify that all users and service accounts have the correct permissions.",[32,19631,19633],{"id":19632},"_5-perform-a-controlled-traffic-cutover","5. Perform a Controlled Traffic Cutover",[321,19635,19636,19639],{},[324,19637,19638],{},"Gradually redirect producer and consumer traffic in a phased manner: Blue-Green Deployment: Run some applications on the old cluster and some on the new cluster.",[324,19640,19641],{},"Canary Testing: Route a small percentage of traffic first before the full cutover.",[32,19643,19645],{"id":19644},"_6-update-client-configurations","6. Update Client Configurations",[321,19647,19648,19651],{},[324,19649,19650],{},"Modify producer and consumer configs: Update bootstrap.servers to point to the new cluster.",[324,19652,19653],{},"Adjust acks, linger.ms, and batch.size based on new cluster performance.",[32,19655,19657],{"id":19656},"_7-final-validation-decommission-old-cluster","7. Final Validation & Decommission Old Cluster",[321,19659,19660,19663,19666],{},[324,19661,19662],{},"Run end-to-end tests before decommissioning the old cluster.",[324,19664,19665],{},"Ensure all applications are stable and confirm there is no data loss or data inconsistency.",[324,19667,19668],{},"Safely shut down the old cluster once all traffic has been migrated.",[40,19670,2125],{"id":2122},[48,19672,19673],{},"StreamNative Universal Linking streamlines Kafka migration and real-time data replication, enabling seamless transitions to a modern Data Lakehouse. With full-fidelity replication, reduced operational complexity, and cost-efficient direct-to-lakehouse streaming, it ensures minimal disruption and maximum flexibility.",[48,19675,19676,19677,190],{},"Now in Public Preview, Universal Linking helps enterprises bridge streaming and analytics, unlocking real-time insights with ease. Start leveraging it today to future-proof your data infrastructure. 
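To make step 6 of the cutover checklist above (Update Client Configurations) concrete, here is a hedged before/after sketch of the client properties that typically change at cutover: the bootstrap endpoint, credentials, and the tuning knobs the checklist calls out. All values are placeholders, and application logic is assumed to stay unchanged.

```python
# Minimal sketch of the client-side cutover described in step 6 above: only connection and
# tuning properties change; producer/consumer code stays the same. All values are placeholders.
OLD_KAFKA = {
    "bootstrap.servers": "legacy-kafka:9092",
}

NEW_URSA = {
    "bootstrap.servers": "your-cluster.streamnative.example:9093",  # new cluster endpoint
    "security.protocol": "SASL_SSL",
    "sasl.mechanism": "PLAIN",
    "sasl.username": "service-account",   # placeholder credential
    "sasl.password": "api-key",           # placeholder credential
    # Re-tune after benchmarking against the new cluster (step 3 above):
    "acks": "all",
    "linger.ms": 10,
    "batch.size": 262144,
}


def producer_config(cutover_complete: bool) -> dict:
    """Feature-flag style switch so traffic can be redirected gradually (blue-green/canary)."""
    return NEW_URSA if cutover_complete else OLD_KAFKA
```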
Get started by signing up for a ",[55,19678,19680],{"href":17075,"rel":19679},[264],"trial of StreamNative Cloud",{"title":18,"searchDepth":19,"depth":19,"links":19682},[19683,19684,19685,19686,19689,19702],{"id":19155,"depth":19,"text":19156},{"id":19169,"depth":19,"text":5417},{"id":19189,"depth":19,"text":19190},{"id":19210,"depth":19,"text":19211,"children":19687},[19688],{"id":2742,"depth":279,"text":18677},{"id":19238,"depth":19,"text":19239,"children":19690},[19691,19692,19693,19694,19695,19696,19697,19698,19699,19700,19701],{"id":19369,"depth":279,"text":19370},{"id":19501,"depth":279,"text":19502},{"id":19539,"depth":279,"text":19540},{"id":19574,"depth":279,"text":19575},{"id":19581,"depth":279,"text":19582},{"id":19593,"depth":279,"text":19594},{"id":19608,"depth":279,"text":19609},{"id":19620,"depth":279,"text":19621},{"id":19632,"depth":279,"text":19633},{"id":19644,"depth":279,"text":19645},{"id":19656,"depth":279,"text":19657},{"id":2122,"depth":19,"text":2125},"StreamNative Universal Linking simplifies Kafka migration and real-time data replication, enabling seamless transitions to a modern Data Lakehouse. Reduce costs, minimize complexity, and ensure business continuity with full-fidelity replication. Now available in Public Preview!",{},{"title":19149,"description":19703},"blog\u002Feffortless-kafka-migration-real-time-data-replication-with-streamnative-universal-linking",[1332,11899,4152],"2O3eQaiYQ17VWYQ-SZnvENwLc9TzfntW9CNODC6ootk",{"id":19710,"title":19711,"authors":19712,"body":19714,"category":290,"createdAt":290,"date":20004,"description":20005,"extension":8,"featured":294,"image":20006,"isDraft":294,"link":290,"meta":20007,"navigation":7,"order":296,"path":4825,"readingTime":18649,"relatedResources":290,"seo":20008,"stem":20009,"tags":20010,"__hash__":20011},"blogs\u002Fblog\u002Fstreamnative-enables-seamless-streaming-into-apache-iceberg-tm-snowflake-open-catalog.md","StreamNative Enables Seamless Streaming into Apache Iceberg™ with Snowflake Open Catalog",[311,19713],"Ashwin Kamath",{"type":15,"value":19715,"toc":19986},[19716,19723,19726,19739,19743,19746,19750,19753,19757,19760,19764,19768,19775,19786,19790,19793,19796,19799,19802,19807,19811,19814,19819,19822,19827,19830,19847,19850,19853,19857,19860,19871,19877,19882,19886,19889,19893,19896,19901,19904,19908,19911,19914,19919,19922,19931,19935,19943,19948,19950,19953],[48,19717,19718,19719,19722],{},"As Generative AI continues to transform industries, the demand for real-time data is growing exponentially. However, ingesting and managing real-time data at scale remains a costly challenge. 
StreamNative’s vision is to simplify and optimize real-time data ingestion, providing a ",[55,19720,19721],{"href":14554},"cost-effective solution"," that enables organizations to harness real-time data without excessive costs, making it accessible to everyone.",[48,19724,19725],{},"StreamNative enables seamless data ingestion into open lakehouse formats like Apache Iceberg and Delta Lake, supporting various catalogs.",[48,19727,19728,19729,19733,19734,19738],{},"StreamNative is excited to help customers ingest topic data to Apache Iceberg™ cost-effectively by partnering with Snowflake to develop a native integration with ",[55,19730,19732],{"href":17817,"rel":19731},[264],"Snowflake's Open Catalog",", a fully managed service for ",[55,19735,19737],{"href":17984,"rel":19736},[264],"Apache Polaris™ (incubating)"," which is an open-source catalog enabling secure, centralized access to Iceberg tables across REST-compatible query engines.",[40,19740,19742],{"id":19741},"challenges-of-streaming-data-to-a-lakehouse-architecture","Challenges of streaming data to a lakehouse architecture",[48,19744,19745],{},"While StreamNative tackles various data ingestion challenges, this blog highlights two key areas.",[32,19747,19749],{"id":19748},"elevated-costs-associated-with-connector-based-pipelines","Elevated costs associated with connector-based pipelines",[48,19751,19752],{},"Connectors that stream data to a Lakehouse offer a fast and declarative approach to building data pipelines without requiring custom code. However, their reliance on compute resources can significantly drive up costs, depending on the processing capacity required for data workloads. Also,connector-based pipelines often introduce maintenance complexity due to operational overhead and dependency management.",[32,19754,19756],{"id":19755},"lack-of-unified-governance-with-an-interoperable-catalog","Lack of unified governance with an interoperable catalog",[48,19758,19759],{},"Another key challenge with connector-based pipelines is their inability to publish data to centralized catalogs, resulting in fragmentation.This fragmentation results in inconsistent access controls, reduced visibility, and compliance risks, complicating data integrity and security across the enterprise.",[40,19761,19763],{"id":19762},"cost-efficient-data-streaming-with-ursa-engine","Cost-Efficient Data Streaming with Ursa Engine",[32,19765,19767],{"id":19766},"ursas-leaderless-architecture-offers-cost-effective-scalable-data-streaming","Ursa’s Leaderless Architecture Offers Cost-Effective, Scalable Data Streaming",[48,19769,19770,19771,19774],{},"By ",[55,19772,19773],{"href":14554},"shifting from leader-based architectures to a leaderless design"," with a lakehouse-native storage approach, Ursa delivers key advantages:",[321,19776,19777,19780,19783],{},[324,19778,19779],{},"Elimination of inter-zone network costs, including client and data replication traffic, which are among the largest expenses in leader-based deployments like Kafka and Redpanda.",[324,19781,19782],{},"Lower storage costs through the use of cloud-native object storage and optimized columnar formats.",[324,19784,19785],{},"Seamless real-time and batch analytics without the need for costly ETL transformations.",[32,19787,19789],{"id":19788},"ensuring-governance-with-a-unified-interoperable-catalog","Ensuring governance with a unified, interoperable catalog",[48,19791,19792],{},"StreamNative Ursa’s lakehouse-native storage follows a \"stream backed by table\" approach, compacting streaming data 
into Parquet files within Iceberg or Delta Lake. This ensures a single, catalog-governed data copy while preserving streaming metadata for replay. Ursa Managed Table automates data lifecycle management and table registration for seamless discovery.",[48,19794,19795],{},"By compacting topic data into a single copy in Apache Parquet™ format, Ursa Engine ensures consistency, efficiency, and accessibility across teams. For example, Data Engineering & Infrastructure teams benefit from simplified management, lower storage costs, and stronger governance; Data Science & AI teams gain real-time, ETL-free access for AI\u002FML; and Analytics & BI teams accelerate insights with up-to-date, queryable data.",[48,19797,19798],{},"StreamNative's integration with Snowflake Open Catalog leverages Ursa’s lakehouse-native storage, enabling seamless data streaming directly into Snowflake’s Open Catalog for easy discovery and consumption.",[48,19800,19801],{},"Here’s a quote from Chris Child, VP of Product Management at Snowflake who underscores this vision.",[916,19803,19804],{},[48,19805,19806],{},"\"We are thrilled to partner with StreamNative to bring seamless, cost-effective streaming data ingestion into Apache Iceberg through Snowflake Open Catalog. This integration will help customers with interoperability needs make real-time data AI-ready while ensuring governance across their data ecosystem. Together, we’re enabling organizations to apply open standards and unlock new levels of efficiency and value from their streaming data with Snowflake's data and AI platform\". - Chris Child, VP of Product Management, Snowflake",[40,19808,19810],{"id":19809},"seamless-streaming-from-streamnative-to-iceberg-via-snowflake-open-catalog","Seamless Streaming from StreamNative to Iceberg via Snowflake Open Catalog",[48,19812,19813],{},"StreamNative Cloud serves as a powerful streaming layer for Iceberg tables and Snowflake Open Catalog, enabling real-time data to be universally governed and easily integrated with the Snowflake AI Data Cloud. 
StreamNative Cloud allows enterprises to ingest\u002Fstream, process, and manage high-velocity data streams across diverse sources while maintaining schema consistency and lineage through Snowflake Open Catalog.",[48,19815,19816],{},[384,19817],{"alt":18,"src":19818},"\u002Fimgs\u002Fblogs\u002F67bd07ba8c6b1235e301ab0d_AD_4nXcxVjhRCuHCTl4ecQTrwC9J64iNEUw8jl10S1eR581mKflriAxrSD4zG4s7rGQorxVLA3SrLkKEUlchs4xY_L7mYtJSftT7o9WiFXV3Hg5uGjg6B2efie4JtHsaOlpr2UpXVgiVQw.png",[48,19820,19821],{},"This streamlined integration not only simplifies data management but also accelerates data accessibility for downstream analytics and AI workloads, empowering organizations to unlock actionable insights from fresh, AI-ready data at scale.",[48,19823,19824],{},[384,19825],{"alt":18,"src":19826},"\u002Fimgs\u002Fblogs\u002F67bd07ba688aa4290445d1c6_AD_4nXdnqHjZg8nVEB2B1QpIIgxMv7DoVskwq2VOcRDMOOrhDQ7aFRAkBjhfRSeZDG3M_-yCcyw6IBl9JzdjYu4pGFxjyfNeApWWegkS3gx8-uRlt5hmkDVJEVhuh92EdOzCWWX0pD5j.png",[48,19828,19829],{},"The integration of StreamNative Cloud with Snowflake Open Catalog leverages Iceberg libraries to ingest data as Iceberg tables and register them within Snowflake’s Open Catalog.",[1666,19831,19832,19835,19838,19841,19844],{},[324,19833,19834],{},"Create and register an iceberg table – StreamNative Cloud utilizes Apache Iceberg libraries to authenticate with Snowflake’s Open Catalog service and execute REST APIs for table creation and registration.",[324,19836,19837],{},"Write topics data to Iceberg table – Topic data is written to Parquet files, with a corresponding Iceberg table created for each topic.",[324,19839,19840],{},"Generate snapshot – StreamNative Cloud runtime creates a new snapshot. This process occurs with each update to the Iceberg table, capturing all associated data and manifest files. Snapshots enable time-travel queries and support rollback operations.",[324,19842,19843],{},"Commit snapshot – Snapshot created in the previous step is committed in this step. Committing a snapshot is the process of atomically applying changes to an Iceberg table through the REST Catalog API. This ensures consistency and correctness in a distributed environment.",[324,19845,19846],{},"Query and Analyze Iceberg Data in Snowflake AI Data Cloud – Users can access and analyze the ingested data with the Snowflake AI Data Cloud and a variety of tools.",[48,19848,19849],{},"This native integration enables users to effortlessly configure a cluster for streaming data directly into Iceberg with just a few clicks, allowing them to quickly gain insights from their data.",[48,19851,19852],{},"StreamNative's integration with Snowflake Open Catalog provides unified governance, enabling visibility and access controls across streaming and non-streaming data as it moves from ingestion to processing, storage, and consumption. 
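The numbered flow above (create and register the table, write data files, then commit a snapshot through the REST Catalog API) can be approximated from the client side with PyIceberg against an Iceberg REST catalog such as Snowflake Open Catalog. The sketch below is an illustration of those REST-catalog operations, not StreamNative's internal implementation; the catalog URI, credential, warehouse, and table names are placeholders.

```python
# Hedged sketch of the create-table / append / commit flow against an Iceberg REST catalog
# (Snowflake Open Catalog exposes this API). This is NOT Ursa's internal code — just an
# illustration of the same catalog operations. URI, credential, and names are placeholders.
import pyarrow as pa
from pyiceberg.catalog import load_catalog
from pyiceberg.schema import Schema
from pyiceberg.types import NestedField, LongType, StringType

catalog = load_catalog(
    "open_catalog",
    **{
        "type": "rest",
        "uri": "https://<account>.example.com/polaris/api/catalog",  # placeholder
        "credential": "<client_id>:<client_secret>",                 # placeholder
        "warehouse": "my_open_catalog",                              # placeholder
    },
)

schema = Schema(
    NestedField(1, "order_id", LongType(), required=False),
    NestedField(2, "status", StringType(), required=False),
)

# 1. Create and register the table in the catalog (skip if it already exists).
table = catalog.create_table("streaming.orders_received", schema=schema)

# 2-4. Write data files and commit them as a new snapshot; append() handles the Parquet
#      write, manifest creation, and the atomic REST commit in one call.
batch = pa.table({"order_id": [1, 2, 3], "status": ["received", "received", "fulfilled"]})
table.append(batch)

# 5. Any REST-compatible engine can now query the committed snapshot.
print(table.scan().to_arrow().num_rows)
```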
This integration also enables interoperability with a vendor-neutral, open source foundation in Apache Iceberg and Apache Polaris, giving organizations flexibility to read and write with a variety of engines.",[40,19854,19856],{"id":19855},"integration-setup","Integration Setup",[48,19858,19859],{},"To establish an integration between StreamNative and Snowflake Open Catalog, three key steps must be followed:",[1666,19861,19862,19865,19868],{},[324,19863,19864],{},"Configuring Snowflake Open Catalog – Begin by setting up Snowflake Open Catalog to enable seamless integration with StreamNative Cloud for data streaming.",[324,19866,19867],{},"Deploying and Enabling Integration – Create a cluster and activate the Snowflake Open Catalog integration within StreamNative Cloud.",[324,19869,19870],{},"Connecting to Snowflake AI Data Cloud – Configure Snowflake AI Data Cloud to access and query data published in Snowflake Open Catalog, ensuring seamless interoperability.",[48,19872,19873],{},[55,19874,19876],{"href":17112,"rel":19875},[264],"Learn more about the configuration process and step-by-step implementation.",[48,19878,19879],{},[384,19880],{"alt":18,"src":19881},"\u002Fimgs\u002Fblogs\u002F67bd07b9765f441a21416e16_AD_4nXfK251JTVYuMFadXCetKl5mOFDWYUaQKynGoNe-lGb5PesrdlTdjtDGj5PGmcgivfi_uLFov_BKQHeo2bzF2SPd4Ol7MtQMdjrn4tPV_CSuob3IZVCedi9jX10yVHW8oTFviRJtyQ.png",[40,19883,19885],{"id":19884},"enabling-open-catalog-integration","Enabling Open Catalog Integration",[48,19887,19888],{},"When creating a cluster in StreamNative Cloud, users can enable a Catalog Integration, configure Snowflake Open Catalog, and deploy the cluster seamlessly.",[32,19890,19892],{"id":19891},"setup-streamnative-cluster","Setup StreamNative Cluster",[48,19894,19895],{},"To create a new instance, enter the instance name, configure the Cloud Connection, select the URSA Engine, and then specify the Cluster Location.",[48,19897,19898],{},[384,19899],{"alt":18,"src":19900},"\u002Fimgs\u002Fblogs\u002F679ff1bec86e2a1d5d0448e1_AD_4nXeEO8JuSAamLeddrBzBX9Axotz-QGJrhr1KulTEyTwGZKW3iKA9AEYK52OD_1iNl7kArZkl3ANmcCKGmNxO2DroAPgZCVMJJYS0QufsbI3hzkx2SQ31r0Bs1bgxCNruexkqo-sxJw.png",[48,19902,19903],{},"To create a cluster, provide the cluster name, select the Cloud Environment and Availability Zone, and proceed to configure the Lakehouse Storage settings.",[32,19905,19907],{"id":19906},"enable-configure-open-catalog-integration","Enable & Configure Open Catalog Integration",[48,19909,19910],{},"There are two options for selecting a storage location: you can either specify your own storage bucket or utilize a pre-created bucket provided by the BYOC environment. In this example, we will use the pre-created bucket.",[48,19912,19913],{},"To configure Snowflake Open Catalog, select Snowflake Open Catalog as the catalog provider and complete the remaining catalog configuration details. 
Click Deploy to finish catalog configuration.",[48,19915,19916],{},[384,19917],{"alt":18,"src":19918},"\u002Fimgs\u002Fblogs\u002F67bd07ba0dcd3c952183d030_AD_4nXdGK2swsdK4vejaALFkqcugChGfqqczpWy1foIvo73JaDO9PSdicCm97V_jGzMkFB1i2K0KP7_tuZ1zBR56b0chqhRYu_mnyUD6-PzJaV7a7iQAYbEFzHhmqf86PdsgL8TmB36MHQ.png",[48,19920,19921],{},"Click Deploy to complete cluster creation.",[48,19923,19924,19925,19930],{},"Once the cluster is created, populate the StreamNative cluster to stream data directly into storage by ",[55,19926,19929],{"href":19927,"rel":19928},"https:\u002F\u002Fdocs.streamnative.io\u002Fdocs\u002Fquickstart-console#create-a-producer-consumer",[264],"creating and running a producer",", where it is stored as Iceberg tables and published to Snowflake Open Catalog for discovery and analysis.",[32,19932,19934],{"id":19933},"query-data-from-snowflake-open-catalog","Query Data from Snowflake Open Catalog",[48,19936,19937,19938,19942],{},"To query data from Snowflake Open Catalog in Snowflake AI Data Cloud, users must ",[55,19939,19941],{"href":18380,"rel":19940},[264],"complete the necessary steps"," to establish an integration. Once integrated, the data becomes accessible to query from Snowflake using SQL, Python, Notebooks, LLM functions, Cortex Analyst and more",[48,19944,19945],{},[384,19946],{"alt":18,"src":19947},"\u002Fimgs\u002Fblogs\u002F67bd07b95f351894ffe5af6f_AD_4nXfuWCfEPI2P_r4TKqL3qYgiyvHEjhMJ-ZAAAK6RFbm9TaUz9_VLFeRuzpLrkqjdGFjrPGPHIrJasYtWUFI9sfEZYm5T4w1D0YLGVEILuH_sV3DFlqUu4AjakeYMRBtlyfBth4HOtg.png",[40,19949,2125],{"id":2122},[48,19951,19952],{},"The Public Preview release of Catalog integration within StreamNative Cloud represents a transformative step in connecting real-time data streaming pipelines to Lakehouse Storage, particularly through Snowflake Open Catalog. Built on vendor-neutral open standards like Apache Kafka, Apache Iceberg and Apache Polaris, this integration ensures greater interoperability than alternatives while providing cross-engine data governance, seamless schema evolution, and efficient metadata management. Organizations can ingest data directly into Iceberg tables, enable effortless discovery in Snowflake Open Catalog, and streamline analytics and machine learning workflows, making it easier to extract actionable insights and predictions from streaming data. With unified access controls, organizations gain full control of data movement, transformations, and consumption across many engines in their stack. Explore how this open, standards-based integration can revolutionize your data-driven strategy. 
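To illustrate the query step, the following minimal sketch uses the snowflake-connector-python package to read one of the cataloged tables from Snowflake; the account, credentials, and object names are hypothetical placeholders, and the externally managed Iceberg table is assumed to have already been linked by following the steps above.

```python
# Minimal sketch of querying a cataloged Iceberg table from Snowflake with Python.
# Account, credentials, and object names are hypothetical placeholders.
import snowflake.connector

conn = snowflake.connector.connect(
    account="xy12345",           # placeholder account identifier
    user="ANALYST",              # placeholder user
    password="<password>",       # key-pair or SSO auth is preferable in production
    warehouse="ANALYTICS_WH",
    database="STREAMING_DB",
    schema="PUBLIC",
)

try:
    cur = conn.cursor()
    # Once linked, the externally managed Iceberg table is queried like any other table.
    cur.execute("SELECT event_id, payload FROM EVENTS ORDER BY event_id LIMIT 10")
    for row in cur.fetchall():
        print(row)
finally:
    conn.close()
```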
Here are a few resources for you to explore:",[321,19954,19955,19963,19970],{},[324,19956,19957,19958],{},"Learn more about StreamNative’s Integration With Snowflake Open Catalog ",[55,19959,19962],{"href":19960,"rel":19961},"https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=R0uPsIIVmO8&list=PL7-BmxsE3q4WpxiG20X6AuxUxp4TelDjz&index=2",[264],"Part1 : Preparing Snowflake Account",[324,19964,19965],{},[55,19966,19969],{"href":19967,"rel":19968},"https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=UQoyYSSSaDc&list=PL7-BmxsE3q4WpxiG20X6AuxUxp4TelDjz&index=3&t=4s",[264],"Part2 : Deploy StreamNative BYOC Ursa Cluster",[324,19971,19972,19977,19978,19982,19983,17918],{},[55,19973,19976],{"href":19974,"rel":19975},"https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=658ZV78lyew&list=PL7-BmxsE3q4WpxiG20X6AuxUxp4TelDjz&index=3",[264],"Part3 : Query Data in Snowflake","\nDocumentation for Snowflake Open Catalog Integration : ",[55,19979,19981],{"href":17112,"rel":19980},[264],"Follow these steps to integrate StreamNative Cloud with Snowflake Open Catalog."," Try it yourself: ",[55,19984,7137],{"href":3907,"rel":19985},[264],{"title":18,"searchDepth":19,"depth":19,"links":19987},[19988,19992,19996,19997,19998,20003],{"id":19741,"depth":19,"text":19742,"children":19989},[19990,19991],{"id":19748,"depth":279,"text":19749},{"id":19755,"depth":279,"text":19756},{"id":19762,"depth":19,"text":19763,"children":19993},[19994,19995],{"id":19766,"depth":279,"text":19767},{"id":19788,"depth":279,"text":19789},{"id":19809,"depth":19,"text":19810},{"id":19855,"depth":19,"text":19856},{"id":19884,"depth":19,"text":19885,"children":19999},[20000,20001,20002],{"id":19891,"depth":279,"text":19892},{"id":19906,"depth":279,"text":19907},{"id":19933,"depth":279,"text":19934},{"id":2122,"depth":19,"text":2125},"2025-02-26","Discover how StreamNative enables cost-effective, real-time data streaming into Apache Iceberg™ with Snowflake Open Catalog. Learn how to simplify data ingestion, unify governance, and accelerate AI-driven analytics using an open lakehouse architecture.","\u002Fimgs\u002Fblogs\u002F67be0a917db61d503f7dc5b4_image-42.png",{},{"title":19711,"description":20005},"blog\u002Fstreamnative-enables-seamless-streaming-into-apache-iceberg-tm-snowflake-open-catalog",[800,18653,1332],"K2DpKe8EJvR5HwaDUue0yVrmQ3Ay8ljzulMESTuNZFQ",{"id":20013,"title":20014,"authors":20015,"body":20016,"category":5376,"createdAt":290,"date":20139,"description":20140,"extension":8,"featured":294,"image":20141,"isDraft":294,"link":290,"meta":20142,"navigation":7,"order":296,"path":20143,"readingTime":20144,"relatedResources":290,"seo":20145,"stem":20146,"tags":20147,"__hash__":20148},"blogs\u002Fblog\u002Fdata-streaming-summit-virtual-2025.md","Announcing Data Streaming Summit Virtual 2025",[6127],{"type":15,"value":20017,"toc":20132},[20018,20021,20024,20027,20031,20034,20048,20051,20055,20058,20061,20064,20068,20077,20080,20100,20104,20112,20116,20124,20127,20130],[48,20019,20020],{},"Data streaming is rapidly evolving, transforming how businesses harness real-time information. 
The fusion of data streaming and AI is unlocking new possibilities, making this an exciting era for developers, architects, and technical leaders in the data ecosystem.",[48,20022,20023],{},"We’re excited to announce the first Data Streaming Summit of 2025—Data Streaming Summit Virtual 2025—taking place on May 29, 2025.",[48,20025,20026],{},"Additionally, we’re planning a second, in-person event later in the year in October —stay tuned for more details!",[40,20028,20030],{"id":20029},"data-streaming-summit-virtual-2025-may-29-2025","Data Streaming Summit Virtual 2025 - May 29, 2025",[48,20032,20033],{},"Join us online to explore the latest in data streaming technology, trends, and best practices. This virtual edition removes geographical barriers, bringing together a global community of data streaming enthusiasts. Whether you’re new to the field or a seasoned practitioner, you’ll find value in:",[321,20035,20036,20039,20042,20045],{},[324,20037,20038],{},"Engaging Talks & Technical Deep Dives",[324,20040,20041],{},"Keynotes & Expert Panels",[324,20043,20044],{},"Real-World Industry Use Cases",[324,20046,20047],{},"Interactive Workshops & Networking Opportunities",[48,20049,20050],{},"Technologies covered will include messaging and data streaming platforms such as Pulsar, Kafka, Ursa, and other Kafka-compatible solutions, as well as stream processing technologies like Flink, Spark, RisingWave, and more. Learn from the brightest minds in data streaming—right from the comfort of your home or office.",[40,20052,20054],{"id":20053},"theme-unlocking-real-time-ai","Theme: Unlocking Real-Time AI",[48,20056,20057],{},"The need to unlock real-time AI emerges as a defining theme in 2025, spotlighting the powerful convergence of real-time data and artificial intelligence. As organizations deepen their investments in AI, a clear truth emerges—AI is only as robust as the data feeding it. Yet, siloed and fragmented data often stands in the way of unlocking AI’s full potential.",[48,20059,20060],{},"The key to operationalizing AI effectively lies in real-time, governed data products—trusted, reusable assets that power AI and analytics regardless of their source. Achieving this level of data readiness requires seamless integration between data streaming platforms and modern lakehouses, ensuring data flows continuously between operational and analytical systems while maintaining accuracy, context, and usability.",[48,20062,20063],{},"This year’s summits will focus on how streaming technologies intersect with lakehouses to support real-time AI applications ranging from anomaly detection to AI-driven analytics and intelligent automation.",[40,20065,20067],{"id":20066},"call-for-speakers-is-now-open","Call for Speakers is Now Open!",[48,20069,20070,20071,20076],{},"We invite speakers to submit proposals for our first virtual event. Whether you’re a data streaming expert, an AI enthusiast, or an industry practitioner, we encourage you to share your knowledge. If you’re interested in presenting your story, ",[55,20072,20075],{"href":20073,"rel":20074},"https:\u002F\u002Fsessionize.com\u002Fdata-streaming-summit-virtual-2025\u002F",[264],"submit your talk proposal here","!",[48,20078,20079],{},"Topics of interest include:",[1666,20081,20082,20085,20088,20091,20094,20097],{},[324,20083,20084],{},"Data Streaming + AI – How are organizations building intelligent, real-time systems? Share experiences powering AI-driven applications, real-time machine learning pipelines, or intelligent automation. 
Explore best practices for ensuring trustworthy data and integrating streaming data into AI models.",[324,20086,20087],{},"Learning Data Streaming (Pulsar, Kafka, Ursa, and More) – If you’re passionate about educating others, submit an entry-level talk introducing Pulsar, Kafka, or Ursa. Cover key concepts, architectures, and best practices to help newcomers kick-start their data streaming journey.",[324,20089,20090],{},"Deep Dive into Data Streaming – For technical experts, we welcome advanced discussions on Pulsar, Kafka, Ursa Engine, and cutting-edge architectures (e.g., leaderless streaming, lakehouse-native storage, real-time data processing optimizations). How do you design scalable, high-performance streaming infrastructures, and what challenges have you solved at scale?",[324,20092,20093],{},"Real-Time Ingestion into the Data Lakehouse – Streaming is the backbone of modern lakehouse ingestion, enabling AI and analytics teams to work with up-to-date, trustworthy data. We invite talks on best practices, challenges, and architectures for integrating real-time streaming with lakehouses like Delta Lake and Iceberg.",[324,20095,20096],{},"Unifying Governance and Access with Data Catalogs – As real-time and AI-driven data ecosystems grow, so does the importance of governance and unified access. We welcome insights on bridging real-time streaming with data catalogs and metadata management to ensure data consistency, security, and AI readiness.",[324,20098,20099],{},"Stream Processing and Ecosystem – Have you built tools or integrations that enhance Flink, Spark, RisingWave, or other stream processing frameworks? Share your experiences optimizing event-driven architectures, real-time analytics, or complex streaming pipelines, and discuss how you balance latency, scalability, and cost-efficiency.",[40,20101,20103],{"id":20102},"registration-coming-soon","Registration Coming Soon",[48,20105,20106,20107,20111],{},"We will open registration for Data Streaming Summit Virtual 2025 in the coming weeks. Stay tuned for more details, including early-bird pricing, schedule highlights, and speaker announcements. In the meantime, be sure to",[55,20108,20110],{"href":5372,"rel":20109},[264]," subscribe to the Data Streaming newsletter"," to receive the latest news and updates directly in your inbox. We look forward to having you join us!",[40,20113,20115],{"id":20114},"sponsorship-opportunities","Sponsorship Opportunities",[48,20117,20118,20119,20123],{},"We invite vendors and organizations to sponsor these premier data streaming events, including the virtual summit in May and the in-person event later this year. This sponsorship offers a unique opportunity to showcase your brand, connect with industry leaders, and engage with a highly targeted audience of data streaming professionals. Whether you seek to increase brand visibility, generate leads, or demonstrate thought leadership, we have sponsorship packages to fit your needs. For more information on sponsorship packages, please contact us at ",[55,20120,20122],{"href":20121},"mailto:organizers@datastreaming-summit.org","organizers@datastreaming-summit.org",". Join us in shaping the future of data streaming and AI by becoming a sponsor today!",[48,20125,20126],{},"Submit your talk proposal and be part of a dynamic community shaping the future of data streaming and AI. 
We look forward to seeing you at both events—virtually in May and in person in October, 2025!",[48,20128,20129],{},"Let’s unlock real-time AI—together, across the globe.",[48,20131,3931],{},{"title":18,"searchDepth":19,"depth":19,"links":20133},[20134,20135,20136,20137,20138],{"id":20029,"depth":19,"text":20030},{"id":20053,"depth":19,"text":20054},{"id":20066,"depth":19,"text":20067},{"id":20102,"depth":19,"text":20103},{"id":20114,"depth":19,"text":20115},"2025-02-20","Join us May 29 for Data Streaming Summit Virtual 2025. Explore Pulsar, Kafka, and StreamNative Ursa, and learn how to leverage data streaming with data lakehouses to unlock real-time AI. Submit your talk proposal!","\u002Fimgs\u002Fblogs\u002F67d95c010e153c859cf3fa12_iShot_2025-03-18_19.40.11.png",{},"\u002Fblog\u002Fdata-streaming-summit-virtual-2025","2 min read",{"title":20014,"description":20140},"blog\u002Fdata-streaming-summit-virtual-2025",[5376,303],"y2C5DUHAn26GcXKpwKeMrY-bkZ78USTPASbk2o_YyY8",{"id":20150,"title":20151,"authors":20152,"body":20153,"category":290,"createdAt":290,"date":20336,"description":20337,"extension":8,"featured":294,"image":20338,"isDraft":294,"link":290,"meta":20339,"navigation":7,"order":296,"path":20340,"readingTime":11180,"relatedResources":290,"seo":20341,"stem":20342,"tags":20343,"__hash__":20344},"blogs\u002Fblog\u002Fyou-dont-need-to-shift-everything-left-lakehouse-first-thinking-is-all-you-need.md","You Don’t Need to Shift Everything Left; Lakehouse-First Thinking is all you need",[28,806],{"type":15,"value":20154,"toc":20327},[20155,20166,20170,20173,20176,20179,20182,20185,20189,20196,20199,20203,20212,20215,20226,20229,20231,20236,20239,20243,20246,20249,20252,20255,20258,20262,20269,20272,20288,20291,20309,20312,20317,20321,20324],[48,20156,20157,20158,20161,20162,20165],{},"We were excited to announce ",[55,20159,20160],{"href":4811},"the release of catalog integration with Databricks Unity Catalog",", along with a",[55,20163,20164],{"href":10357}," cost benchmark report ","comparing Ursa with other leader-based data streaming engines in the past few weeks. These efforts underscore our commitment to our vision of the Stream-Augmented Lakehouse.",[40,20167,20169],{"id":20168},"what-is-shift-left","What is Shift-Left?",[48,20171,20172],{},"Following its acquisition of Flink provider Immerok, Confluent introduced the concept of “Shift-Left” for data integration. Inspired by the software testing approach with the same name, this approach called for both data processing and governance to be performed closer to the source of the data. According to them, “Data lakes have turned into data swamps.”, and the shift-left approach is a more efficient and cost-effective solution to this problem.",[48,20174,20175],{},"Shift Left in data integration is a concept derived from software engineering principles, specifically Shift Left Testing. In this context, testing is performed earlier in the development lifecycle to improve software quality, accelerate time-to-market, and identify issues earlier.",[48,20177,20178],{},"Similarly, Shift Left in data integration involves performing data processing and governance closer to the source of data generation. 
By cleaning and processing data earlier in the data lifecycle, organizations can ensure that downstream consumers, such as cloud data warehouses, data lakes, and lakehouses, receive a single source of well-defined and well-formatted data.",[48,20180,20181],{},"Among other things, the “shift-left” embraces the concept of medallion tiers for data based on data quality, as originally envisioned in the Lakehouse architecture. However, it shifts the bronze and silver datasets along with the necessary data cleansing, processing and governance to the data streaming platform rather than the Lakehouse, which hosts only the gold datasets.",[48,20183,20184],{},"While bringing schema enforcement, validation, and governance closer to data sources promises numerous benefits, such as improved data quality and reduced costs, it's important to weigh these benefits against the costs of adopting a streaming-first approach when it comes to your data before fully re-architecting your analytical platform to incorporate a streaming platform.",[40,20186,20188],{"id":20187},"lakehouse-first-streaming-architecture","Lakehouse-First Streaming Architecture",[48,20190,20191,20192,20195],{},"While we agree that Shift-Left is an important data strategy for moving toward real-time streaming architectures, we have a different viewpoint on it. In our opinion, the foundation for real-time Gen AI applications must combine data streaming and lakehouses. This synergy is what we refer to as ",[55,20193,20194],{"href":18898},"Streaming-Augmented Lakehouse"," (SAL).",[48,20197,20198],{},"Unlike Shift-Left, which aims to move entirely from batch and lakehouse architectures to real-time streaming, SAL acknowledges that lakehouses serve as the foundation where all data lands but emphasizes augmenting lakehouses with real-time data streams for greater flexibility, low-latency insights, and adaptability.",[32,20200,20202],{"id":20201},"the-traditional-lakehouse-architecture","The Traditional Lakehouse Architecture",[48,20204,20205,20206,20211],{},"In 2021, the Databrick’s founders, in conjunction with colleagues from Stanford and UC Berkeley ",[55,20207,20210],{"href":20208,"rel":20209},"https:\u002F\u002Fwww.cidrdb.org\u002Fcidr2021\u002Fpapers\u002Fcidr2021_paper17.pdf",[264],"introduced the Lakehouse architectural pattern"," to address some of the shortcomings of the Data Lake model. It allows organizations to use low-cost storage for all types of data, while still providing data management features including transactions, and low-latency query performance.",[48,20213,20214],{},"At its core, the Lakehouse architecture relies on a metadata layer like Delta Lake, which integrates transactional capabilities, versioning, and additional data structures into files in an open format such as Apache Parquet. This enables seamless querying of the system through a range of APIs and engines, as evidenced by the sheer number of vendors who offer Lakehouse products and services. In our view, the Lakehouse paradigm elegantly distills the modern data stack into three core pillars, each of which plays a crucial role in enabling efficient and accessible data management:",[321,20216,20217,20220,20223],{},[324,20218,20219],{},"Format: This pillar establishes a standardized specification and protocol for data access. Essentially, it acts as an API for your data, built upon open standards. 
This standardization ensures that data can be consistently interpreted and accessed by various tools and systems within the Lakehouse ecosystem.",[324,20221,20222],{},"Catalog: This pillar provides a unified governance mechanism for managing table metadata and data access. It serves as an API for metadata and data governance, ensuring that data is properly organized, described, and secured. The catalog allows users to discover and understand the data available within the Lakehouse, as well as control who can access and modify it.",[324,20224,20225],{},"Engines: This pillar encompasses the various engines that interact directly with data and metadata. These engines leverage the open lakehouse formats and data catalogs to query and access data efficiently. They may include query engines, machine learning libraries, and other data processing tools that operate directly on the data stored within the Lakehouse.",[48,20227,20228],{},"By adhering to these three pillars, the Lakehouse architecture provides a robust and scalable foundation for data-driven applications and analytics. The combination of standardized data formats, unified metadata management, and flexible data access engines allows organizations to effectively store, manage, and analyze large volumes of data from diverse sources.",[48,20230,3931],{},[48,20232,20233],{},[384,20234],{"alt":18,"src":20235},"\u002Fimgs\u002Fblogs\u002F67ab8a449a396af1f6ca38dd_AD_4nXcVOrDlZDNrWWbfcQa4o6_RH1nZOuZEfmfHZ_QIaNCOEo_C1GR6IrFdFixRScuY6rOvdQLUng-QwwUYyZkE2IXSq5hFijM4UZM4TEQNu_mzLMsoy6wLO8__yC3wa5N-PrjyT-AkAw.png",[48,20237,20238],{},"This Format-Catalog-Engine (FCE) framework’s open format ensures that all components within the Data Lakehouse are pluggable, eliminating vendor lock-in. We feel that standardizing on open data formats with metadata and governance via data catalogs is the proper approach for architecting modern data platforms, including streaming platforms.",[32,20240,20242],{"id":20241},"why-data-streaming-needs-a-lakehouse-first-architecture","Why Data Streaming Needs a Lakehouse-First Architecture",[48,20244,20245],{},"The data streaming ecosystem has historically been siloed, with applications and services interacting with data streaming engines via wire protocols like Apache Kafka and Apache Pulsar. This creates a separation between streaming and lakehouse environments. Bridging this gap has necessitated custom connectors and integrations, leading to challenges like costly data transfers, wasted resources, complex data transformations, and increased risk of errors.",[48,20247,20248],{},"However, with the rising adoption of open lakehouse formats for data storage and data catalogs for metadata governance, it is essential that data streaming aligns with this trend.",[48,20250,20251],{},"In an age of data explosion and AI models\u002Fapplications demanding ever-more data, the ingestion engine will become increasingly critical. This is why the industry needs a solution of collecting data that is both cost-effective and aligned with open data formats supported by the Lakehouse architecture.",[48,20253,20254],{},"Traditionally, data streaming architectures have been built around leader-based paradigms. However, as data scales, it is evident that streaming needs to align with the same architectural principles that power modern Lakehouses. 
This means we need to rethink data streaming architectures to fit into the Format-Catalog-Engine (FCE) framework.",[48,20256,20257],{},"From a practical perspective, streaming data platforms should adopt a lakehouse-first approach by storing and accessing data in open formats such as Iceberg or Delta Lake. Data streams should be managed or integrated with catalogs like Databricks Unity Catalog or Iceberg REST Catalog to ensure proper organization and governance. This approach eliminates reliance on proprietary storage layers, enabling greater flexibility and interoperability.",[40,20259,20261],{"id":20260},"ursa-a-lakehouse-first-data-streaming-engine","Ursa: a Lakehouse-First Data Streaming Engine",[48,20263,20264,20265,20268],{},"We built Ursa using the Format-Catalog-Engine (FCE) framework as its architectural principle. With Lakehouse-First Thinking, Ursa supports data-intensive streaming and ingestion workloads in a highly cost-efficient way while providing ",[55,20266,20267],{"href":4811},"native data catalog integration"," for unified metadata and governance.",[48,20270,20271],{},"We have shared additional architectural insights in our recent blog post—be sure to check it out for more details.",[321,20273,20274,20279,20283],{},[324,20275,20276],{},[55,20277,20278],{"href":14554},"Cut Kafka Costs by 95%: The Power of Leaderless Architecture and Lakehouse Storage",[324,20280,20281],{},[55,20282,18766],{"href":18765},[324,20284,20285],{},[55,20286,20287],{"href":10389},"Ursa: Reimagine Apache Kafka for the Cost-Conscious Data Streaming",[48,20289,20290],{},"The outcome of adopting Lakehouse-First Architecture in Ursa is clear:",[321,20292,20293,20296,20299,20306],{},[324,20294,20295],{},"Unification: Stream and batch data processing converge seamlessly.",[324,20297,20298],{},"Interoperability: Any engine can interact with streaming data using the same open formats and catalogs.",[324,20300,20301,20302,20305],{},"Cost-Efficiency: By leveraging object storage and lakehouse-native storage, we cut streaming infrastructure costs by 95% compared to traditional streaming engines. (Check out ",[55,20303,20304],{"href":10357},"our benchmark blog post","!)",[324,20307,20308],{},"Future-Proofing: As more data systems adopt the FCE principle, Ursa ensures long-term compatibility with open data ecosystems.",[48,20310,20311],{},"Using a Lakehouse-First data streaming engine like Ursa Engine, you can augment your existing lakehouse to create a Streaming-Augmented Lakehouse—bringing real-time capabilities to your data platform to create a uniform data foundation for your organization.",[48,20313,20314],{},[384,20315],{"alt":18,"src":20316},"\u002Fimgs\u002Fblogs\u002F67ab8a442d813fbe1ddc6341_AD_4nXf2cBaWZCg9WG0ZJb35GHIT9qKlryE29GnOUiR14xfnlnzUJrEd5v-brpYZGsGdNXPwje-ul0tyjLri2PtYTrGg8V89nNN4M_n-dWUJzCxanvC9BCuytl1f9tyGpn0jNt4VrVeqbw.png",[40,20318,20320],{"id":20319},"the-future-of-data-systems-is-lakehouse-first","The Future of Data Systems is Lakehouse-First",[48,20322,20323],{},"Looking ahead, I expect more and more data systems (“engines”) to be built using the FCE framework with Lakehouse-First Thinking. This shift is not just about data lakes or warehouses—it’s about rearchitecting the entire data stack to be truly open, interoperable, and cost-efficient in the age of AI.",[48,20325,20326],{},"Data streaming is evolving, and Ursa is at the forefront of this transformation. 
Join us as we redefine real-time data infrastructure with Lakehouse-First Thinking.",{"title":18,"searchDepth":19,"depth":19,"links":20328},[20329,20330,20334,20335],{"id":20168,"depth":19,"text":20169},{"id":20187,"depth":19,"text":20188,"children":20331},[20332,20333],{"id":20201,"depth":279,"text":20202},{"id":20241,"depth":279,"text":20242},{"id":20260,"depth":19,"text":20261},{"id":20319,"depth":19,"text":20320},"2025-02-11","This blog post explores the “Shift-Left” paradigm in data streaming and introduces Lakehouse-First Thinking—an approach that embraces the Streaming-Augmented Lakehouse. We delve into the evolution of data architectures, the growing adoption of Lakehouse Architectures, and the significance of the Format-Catalog-Engine (FCE) framework in modern data platforms. With Ursa Engine, we demonstrate how streaming can seamlessly integrate with lakehouse architectures, unlocking cost efficiency, interoperability, and real-time analytics at scale.","\u002Fimgs\u002Fblogs\u002F67ab955476e915d375e54c34_image-78.png",{},"\u002Fblog\u002Fyou-dont-need-to-shift-everything-left-lakehouse-first-thinking-is-all-you-need",{"title":20151,"description":20337},"blog\u002Fyou-dont-need-to-shift-everything-left-lakehouse-first-thinking-is-all-you-need",[800,799,1331,1332],"R4q0RuKQNYWyx3DgrdPPEJRX5_E5Z-jMo7garN1oAfQ",{"id":20346,"title":20347,"authors":20348,"body":20349,"category":3550,"createdAt":290,"date":21134,"description":21135,"extension":8,"featured":294,"image":21136,"isDraft":294,"link":290,"meta":21137,"navigation":7,"order":296,"path":21138,"readingTime":18649,"relatedResources":290,"seo":21139,"stem":21140,"tags":21141,"__hash__":21142},"blogs\u002Fblog\u002Fdefinitive-guide-for-streaming-data-into-snowflake-part-1---with-connectors.md","Definitive Guide for Streaming Data into Snowflake: Part 1 - with Connectors",[810],{"type":15,"value":20350,"toc":21103},[20351,20354,20357,20360,20363,20367,20370,20374,20381,20385,20392,20396,20403,20408,20412,20415,20418,20422,20430,20434,20445,20449,20460,20464,20472,20476,20487,20491,20505,20508,20512,20517,20520,20523,20527,20530,20534,20537,20540,20543,20551,20555,20558,20562,20578,20582,20590,20594,20605,20609,20620,20624,20627,20647,20651,20671,20676,20679,20682,20704,20708,20722,20726,20733,20738,20741,20744,20747,20750,20773,20777,20785,20789,20797,20804,20808,20811,20816,20819,20822,20825,20828,20831,20846,20849,20860,20873,20877,20879,20884,20892,20897,20905,20910,20918,20924,20932,20938,20946,20952,20957,20961,20964,20969,20974,20979,20984,20991,20995,21010,21018,21022,21027,21030,21036,21038,21041,21045,21048,21059,21062,21073,21077,21080,21091,21094,21097,21100],[48,20352,20353],{},"Modern data architectures often rely on distributed messaging and data streaming systems like Apache Kafka and Apache Pulsar to handle real-time data streams. These systems excel at ingesting, processing, and delivering high volumes of streaming data with low latency (milliseconds to sub-second). However, to derive value from this data, it often needs to be integrated into data warehouses like Snowflake or data lakehouses for storage, analysis, and further processing.",[48,20355,20356],{},"Snowflake is a cloud-based data platform designed for data warehousing, data lakes, and data engineering. It provides a fully managed, scalable, and secure environment for storing and analyzing structured and semi-structured data. 
Snowflake's architecture separates compute and storage, allowing users to scale resources independently and pay only for what they use. Its support for real-time data ingestion, combined with powerful analytics capabilities, makes it an ideal destination for streaming data from systems like Kafka and Pulsar.",[48,20358,20359],{},"Organizations increasingly demand real-time data ingestion into Snowflake for analytics, AI, and other data-intensive applications. However, efficiently streaming data into Snowflake remains a key challenge due to the numerous ingestion methods available. To address this, we’ve created a definitive guide, Streaming Data into Snowflake, to explore various approaches for ingesting and streaming data from data streaming engines into Snowflake.",[48,20361,20362],{},"This is the first post in a three-part blog series. In this first blog post, we will explore the connector-based approach, using different connectors to send data from Kafka, Pulsar, or their respective fully managed services to Snowflake, enabling seamless and efficient data integration. We’ll dive into the capabilities of these connectors, their setup processes, and how they ensure reliable and scalable data transfer. By the end, you’ll have a clear understanding of how to leverage these tools to unlock the full potential of your real-time data in Snowflake.",[40,20364,20366],{"id":20365},"connectors-framework","Connectors Framework",[48,20368,20369],{},"Before diving into the specifics of streaming data from Apache Pulsar or Apache Kafka to Snowflake, it’s important to understand the foundational frameworks that make these integrations possible: Kafka Connect and Pulsar IO. These frameworks are designed to simplify data movement between distributed streaming platforms and external data systems such as Snowflake, databases, and cloud storage.",[32,20371,20373],{"id":20372},"kafka-connect","Kafka Connect",[48,20375,20376,20380],{},[55,20377,20373],{"href":20378,"rel":20379},"https:\u002F\u002Fkafka.apache.org\u002Fdocumentation\u002F#connectapi",[264]," is a scalable, fault-tolerant framework designed to simplify data integration between Kafka-compatible systems and external data sources. It provides a standardized method for ingesting and exporting data, reducing the need for custom development. With a rich ecosystem of pre-built connectors, Kafka Connect enables seamless integration with databases, cloud storage, and data warehouses like Snowflake. Optimized for high-throughput, real-time data streams, it is ideal for organizations that require efficient and reliable data movement.",[32,20382,20384],{"id":20383},"pulsar-io","Pulsar IO",[48,20386,20387,20391],{},[55,20388,20384],{"href":20389,"rel":20390},"https:\u002F\u002Fpulsar.apache.org\u002Fdocs\u002F4.0.x\u002Fio-overview\u002F",[264]," is Apache Pulsar’s equivalent of Kafka Connect, natively built for the Pulsar ecosystem to enable data engineers to build and manage data pipelines between Pulsar and external systems. Designed for scalability and flexibility, Pulsar IO simplifies data ingestion and export through a collection of pre-built connectors for databases, cloud services, and data warehouses like Snowflake. By leveraging Pulsar’s distributed architecture, it efficiently handles high-throughput, real-time data streams. Pulsar IO also supports extensibility, allowing users to develop custom connectors for specialized use cases. 
Additionally, its deep integration with Pulsar’s messaging capabilities ensures reliable and efficient data movement.",[32,20393,20395],{"id":20394},"streamnative-cloud-for-kafka-connect-and-pulsar-io","StreamNative Cloud for Kafka Connect and Pulsar IO",[48,20397,20398,20399,20402],{},"StreamNative Cloud provides cloud-native support for both Kafka Connect and Pulsar IO through ",[55,20400,20401],{"href":11363},"Universal Connect",", enabling seamless integration with external systems like Snowflake, databases, and cloud storage. This unified approach eliminates the complexity of setup and maintenance, allowing organizations to leverage both connector frameworks effortlessly.",[48,20404,20405],{},[384,20406],{"alt":18,"src":20407},"\u002Fimgs\u002Fblogs\u002F67aa30b6f96ec6fbef779939_AD_4nXfk7sNbFaJS_tnVUP1ZlC-3rGFOaiUfyLlHcjA_jeW7MRhN2T1Nk-NgmMWuNsEa1rfdPw0q4S96titQkllYkJoOyBy-a5uTsEXeOhyTjw1e8BERqNopwUxbm3JmdyT2dYMMjyFgmg.png",[40,20409,20411],{"id":20410},"snowflake-apis","Snowflake APIs",[48,20413,20414],{},"Now that we’ve explored the popular connector frameworks—Kafka Connect and Pulsar IO—let’s examine the APIs Snowflake provides for data ingestion.",[48,20416,20417],{},"Snowflake offers two powerful mechanisms for streaming data into its platform: Snowpipe and Snowpipe Streaming. Each serves a different role in ingesting data into Snowflake, catering to varying performance, cost, and complexity requirements.",[32,20419,20421],{"id":20420},"snowpipe-micro-batch-file-based-ingestion","Snowpipe: Micro-Batch File-Based Ingestion",[48,20423,20424,20429],{},[55,20425,20428],{"href":20426,"rel":20427},"https:\u002F\u002Fdocs.snowflake.com\u002Fen\u002Fuser-guide\u002Fdata-load-snowpipe-intro",[264],"Snowpipe"," is a continuous, micro-batch ingestion service that loads staged files into Snowflake automatically and asynchronously. It is best suited for near real-time ingestion where data arrives frequently but does not require immediate processing.",[3933,20431,20433],{"id":20432},"how-snowpipe-works","How Snowpipe Works:",[321,20435,20436,20439,20442],{},[324,20437,20438],{},"Data is staged in an external cloud storage location (Amazon S3, Google Cloud Storage, or Azure Blob Storage).",[324,20440,20441],{},"Snowpipe detects new files in the storage location using event notifications or polling.",[324,20443,20444],{},"Data is ingested into Snowflake via an automated COPY command, making it available for querying.",[3933,20446,20448],{"id":20447},"key-characteristics-of-snowpipe","Key Characteristics of Snowpipe:",[321,20450,20451,20454,20457],{},[324,20452,20453],{},"Latency: Typically minutes, as it depends on file arrival and ingestion scheduling.",[324,20455,20456],{},"Storage Cost: Requires an external storage layer (S3\u002FGCS\u002FAzure), which may add cost and complexity.",[324,20458,20459],{},"Best Use Case: Suitable for high-throughput workloads where ultra-low latency is not a strict requirement.",[32,20461,20463],{"id":20462},"snowpipe-streaming-record-level-streaming-ingestion","Snowpipe Streaming: Record-level Streaming Ingestion",[48,20465,20466,20471],{},[55,20467,20470],{"href":20468,"rel":20469},"https:\u002F\u002Fdocs.snowflake.com\u002Fen\u002Fuser-guide\u002Fdata-load-snowpipe-streaming-overview",[264],"Snowpipe Streaming"," is Snowflake’s native real-time ingestion API, allowing for continuous, record-by-record ingestion without requiring intermediate file staging. 
It is designed for low-latency, high-frequency data ingestion, making it ideal for real-time analytics.",[3933,20473,20475],{"id":20474},"how-snowpipe-streaming-works","How Snowpipe Streaming Works:",[321,20477,20478,20481,20484],{},[324,20479,20480],{},"Applications or streaming frameworks send individual records directly to Snowflake via the Snowpipe Streaming API.",[324,20482,20483],{},"Data is ingested in near real-time, bypassing file staging.",[324,20485,20486],{},"Records become queryable in sub-second latency, enabling immediate analytics.",[3933,20488,20490],{"id":20489},"key-characteristics-of-snowpipe-streaming","Key Characteristics of Snowpipe Streaming:",[321,20492,20493,20496,20499,20502],{},[324,20494,20495],{},"Latency: Sub-second, ideal for real-time applications.",[324,20497,20498],{},"Storage Cost: No need for external file staging, reducing overall cost.",[324,20500,20501],{},"Complexity: Simpler integration since data flows directly into Snowflake.",[324,20503,20504],{},"Best Use Case: Ideal for low-latency, event-driven use cases, such as real-time dashboards, anomaly detection, or fraud detection.",[48,20506,20507],{},"Here’s a detailed comparison between the two APIs to help you understand their differences and best use cases:",[32,20509,20511],{"id":20510},"snowpipe-vs-snowpipe-streaming","Snowpipe vs. Snowpipe Streaming",[48,20513,20514],{},[384,20515],{"alt":18,"src":20516},"\u002Fimgs\u002Fblogs\u002F67aa30f49bef44be22f4254e_AD_4nXfzRQ_QtrocU2h2jVbM5rGxanmTflH4Qm40GM-eqFnOnaVpKKJS8CuoXMa05QCxvBmtg9VkQBgXI1rYmlNQ8V6-OZe7OtmrojrMc-oqZcQ2rboAm7sGTykikxkDhJB-t4uaj5Hiig.png",[48,20518,20519],{},"Together, Snowpipe and Snowpipe Streaming provide flexible and efficient ingestion options for integrating streaming data from Apache Kafka, Apache Pulsar, or other data sources into Snowflake. This enables organizations to unlock the full potential of real-time data for analytics, AI\u002FML pipelines, and operational intelligence.",[48,20521,20522],{},"Now that we have explored Kafka Connect, Pulsar IO, and Snowflake’s data ingestion APIs, we are ready to dive into the available connectors that enable seamless, reliable, and efficient data ingestion into Snowflake.",[40,20524,20526],{"id":20525},"streaming-data-into-snowflake-using-kafka-connect","Streaming Data into Snowflake using Kafka Connect",[48,20528,20529],{},"For organizations that already use Apache Kafka or Kafka-compatible data streaming platforms, the Kafka Connect Snowflake Sink Connector is the most efficient way to stream data into Snowflake. Developed and maintained by Snowflake, this connector provides a seamless integration between Kafka topics and Snowflake tables, leveraging both Snowpipe and Snowpipe Streaming for ingestion.",[32,20531,20533],{"id":20532},"how-the-kafka-connect-snowflake-sink-connector-works","How the Kafka Connect Snowflake Sink Connector Works",[48,20535,20536],{},"The Kafka Connect Snowflake Sink Connector enables reliable, scalable, and automated data transfer from Kafka into Snowflake. It uses Kafka offsets to track message delivery and handle failure recovery, ensuring exactly-once or at-least-once semantics depending on the configuration.",[48,20538,20539],{},"Messages are ingested into Snowflake in JSON, Avro, or Protobuf formats. 
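For a Kafka Connect worker, a representative connector definition can be registered through the Connect REST API; the sketch below uses hypothetical hostnames, credentials, and topic names, and the individual settings it references are explained in the configuration sections that follow.

```python
# Minimal sketch of registering the Snowflake sink through the Kafka Connect REST API.
# Hostnames, credentials, and topic/table names are hypothetical placeholders.
import requests

connector = {
    "name": "snowflake-sink",
    "config": {
        "connector.class": "com.snowflake.kafka.connector.SnowflakeSinkConnector",
        "tasks.max": "2",
        "topics": "events",
        "snowflake.url.name": "https://xy12345.snowflakecomputing.com",
        "snowflake.user.name": "KAFKA_CONNECTOR",
        "snowflake.private.key": "<private-key>",            # key pair auth (recommended)
        "snowflake.database.name": "STREAMING_DB",
        "snowflake.schema.name": "PUBLIC",
        "snowflake.ingestion.method": "SNOWPIPE_STREAMING",  # or SNOWPIPE for micro-batching
        "value.converter": "org.apache.kafka.connect.json.JsonConverter",
        "value.converter.schemas.enable": "false",
        "buffer.count.records": "10000",
        "buffer.flush.time": "10",
        "buffer.size.bytes": "5000000",
    },
}

# POST the definition to a Connect worker; the URL below is a placeholder.
resp = requests.post("http://localhost:8083/connectors", json=connector, timeout=30)
resp.raise_for_status()
print(resp.json()["name"], "created")
```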
If using Protobuf, a Protobuf converter must be configured for proper schema handling.",[48,20541,20542],{},"For authentication and security, users can choose between:",[321,20544,20545,20548],{},[324,20546,20547],{},"Key Pair Authentication (recommended for production deployments)",[324,20549,20550],{},"External OAuth Authentication (required when using Snowpipe Streaming APIs)",[32,20552,20554],{"id":20553},"key-configuration-settings","Key Configuration Settings",[48,20556,20557],{},"Setting up the Kafka Connect Snowflake Sink Connector requires several essential parameters:",[3933,20559,20561],{"id":20560},"basic-connection-settings","Basic Connection Settings",[321,20563,20564,20572,20575],{},[324,20565,20566,20567,20571],{},"snowflake.url.name – The Snowflake account URL (e.g., ",[55,20568,20569],{"href":20569,"rel":20570},"https:\u002F\u002Fxyz.snowflakecomputing.com",[264],").",[324,20573,20574],{},"snowflake.user.name – The Snowflake username used for authentication.",[324,20576,20577],{},"snowflake.private.key \u002F snowflake.oauth.token – The authentication method, which could be key pair authentication or OAuth token-based authentication.",[3933,20579,20581],{"id":20580},"data-format-and-conversion","Data Format and Conversion",[321,20583,20584,20587],{},[324,20585,20586],{},"value.converter – Specifies the format of Kafka messages (JSON, Avro, or Protobuf).",[324,20588,20589],{},"value.converter.schema.registry.url – If using Avro or Protobuf, this setting points to the Schema Registry for schema validation.",[3933,20591,20593],{"id":20592},"buffering-and-performance-tuning","Buffering and Performance Tuning",[321,20595,20596,20599,20602],{},[324,20597,20598],{},"buffer.count.records – Controls the number of records buffered before ingestion (default: 10000).",[324,20600,20601],{},"buffer.flush.time – Defines the maximum time (in seconds) before buffered records are sent to Snowflake.",[324,20603,20604],{},"buffer.size.bytes – Specifies the maximum buffer size in bytes before flushing.",[3933,20606,20608],{"id":20607},"failure-handling-and-retention","Failure Handling and Retention",[321,20610,20611,20614,20617],{},[324,20612,20613],{},"behavior.on.error – Determines how errors are handled (FAIL, LOG, IGNORE).",[324,20615,20616],{},"snowflake.ingestion.method – Defines the ingestion mode (Snowpipe or Snowpipe Streaming).",[324,20618,20619],{},"snowflake.schema.evolution – Enables or disables automatic schema evolution.",[32,20621,20623],{"id":20622},"best-practices-for-deploying-the-kafka-connect-snowflake-sink-connector","Best Practices for Deploying the Kafka Connect Snowflake Sink Connector",[48,20625,20626],{},"To ensure a robust, efficient, and scalable deployment, consider the following best practices:",[1666,20628,20629,20632,20635,20638,20641,20644],{},[324,20630,20631],{},"Use Snowpipe Streaming for Low-Latency Needs If your workload demands real-time ingestion, configure the connector to use Snowpipe Streaming instead of traditional Snowpipe.",[324,20633,20634],{},"Optimize Buffer Settings for Performance Tuning buffer.count.records and buffer.flush.time helps balance latency and throughput.",[324,20636,20637],{},"Secure Authentication Use Key Pair Authentication in production for stronger security.",[324,20639,20640],{},"For OAuth authentication, ensure proper permissions are assigned in Snowflake.",[324,20642,20643],{},"Enable Schema Evolution (if using Avro\u002FProtobuf) Activate schema evolution to handle data structure changes 
automatically.",[324,20645,20646],{},"Monitor and Scale the Connector Set up monitoring for connector health and tune parallelism based on ingestion needs.",[32,20648,20650],{"id":20649},"deploying-the-connector-in-streamnative-cloud","Deploying the Connector in StreamNative Cloud",[48,20652,20653,20654,20659,20660,2869,20665,20670],{},"StreamNative Cloud supports the Kafka Connect Snowflake Sink Connector as a fully managed connector. Once the Kafka protocol support and the Kafka Connect feature are turned on, users can ",[55,20655,20658],{"href":20656,"rel":20657},"https:\u002F\u002Fdocs.streamnative.io\u002Fdocs\u002Fkafka-connect-create",[264],"deploy"," it via the UI, API, CLI, ",[55,20661,20664],{"href":20662,"rel":20663},"https:\u002F\u002Fgithub.com\u002Fstreamnative\u002Fterraform-provider-pulsar",[264],"Terraform",[55,20666,20669],{"href":20667,"rel":20668},"https:\u002F\u002Fgithub.com\u002Fstreamnative\u002Fpulsar-resources-operator",[264],"Kubernetes CRDs",", making it easy to integrate into existing workflows. Below is a diagram showing the UI configuration of a Kafka Connect Snowflake Sink Connector.",[48,20672,20673],{},[384,20674],{"alt":18,"src":20675},"\u002Fimgs\u002Fblogs\u002F67aa30b66702d33dff40a2d5_AD_4nXc_DAISn_AtiSmxyqficabRHL15jG4Ek7gdeMMGHvA6OJvOgW6WYaPLOFphrsIobymxFD2XXjRqSwRoo_SmXveCOofP7k4c3etmflTfFMO3C6eRYjmc9mdsnJ2W13K74EpunI4O.png",[48,20677,20678],{},"Once configured, the Kafka Connect Snowflake Sink Connector can be deployed to StreamNative Cloud to start streaming data into Snowflake over the Kafka protocol.",[48,20680,20681],{},"For detailed instructions on deploying and managing Kafka Connect connectors in StreamNative Cloud, refer to:",[321,20683,20684,20691,20698],{},[324,20685,20686],{},[55,20687,20690],{"href":20688,"rel":20689},"https:\u002F\u002Fdocs.snowflake.com\u002Fen\u002Fuser-guide\u002Fkafka-connector",[264],"Snowflake’s Official Kafka Connector Documentation",[324,20692,20693],{},[55,20694,20697],{"href":20695,"rel":20696},"https:\u002F\u002Fdocs.streamnative.io\u002F",[264],"StreamNative Hub - Kafka Connect Snowflake Sink Connector Guide",[324,20699,20700],{},[55,20701,20703],{"href":20656,"rel":20702},[264],"How to Submit a Kafka Connect Connector to StreamNative Cloud",[40,20705,20707],{"id":20706},"streaming-data-into-snowflake-using-pulsar-io","Streaming Data into Snowflake using Pulsar I\u002FO",[48,20709,20710,20711,20716,20717,20721],{},"For Apache Pulsar users, you can use Pulsar IO ",[55,20712,20715],{"href":20713,"rel":20714},"https:\u002F\u002Fdocs.streamnative.io\u002Fhub\u002Fconnector-snowflake-streaming-v1.1",[264],"Snowflake Streaming Sink Connector"," or Pulsar IO ",[55,20718,11417],{"href":20719,"rel":20720},"https:\u002F\u002Fdocs.streamnative.io\u002Fhub\u002Fconnector-snowflake-sink-v4.0",[264]," to ingest data from Pulsar to Snowflake. StreamNative Cloud natively supports both connectors, making it easy to deploy and manage them.",[32,20723,20725],{"id":20724},"how-snowflake-streaming-sink-connector-works","How Snowflake Streaming Sink Connector works",[48,20727,20728,20729,20732],{},"The Snowflake streaming sink connector was ",[55,20730,6367],{"href":20731},"\u002Fblog\u002Fintroducing-snowpipe-streaming-support-in-streamnatives-snowflake-streaming-sink-connector"," recently. It pulls data from Pulsar topics and persists data to Snowflake based on the Snowpipe Streaming API with sub-second latency. The connector supports exactly-once semantics to ensure data is processed without duplication. 
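A typical configuration for this connector is a small map of settings supplied when the sink is created; the sketch below assembles one with hypothetical placeholder values, and each parameter is described in the settings sections that follow.

```python
# Sketch of a config map for the Snowflake Streaming (Snowpipe Streaming) sink.
# All values are hypothetical placeholders; the parameters are described below.
import json

streaming_sink_config = {
    "url": "https://xy12345.snowflakecomputing.com",
    "user": "PULSAR_CONNECTOR",
    "privateKey": "<private-key>",        # key pair authentication
    "database": "STREAMING_DB",
    "schema": "PUBLIC",
    "role": "INGEST_ROLE",
    "icebergEnabled": False,              # set True to write Iceberg tables
    "enableSchematization": True,         # detect and evolve table schemas
    "maxClientLag": 1,                    # Ingest SDK flush interval, in seconds
    "checkCommittedMessageIntervalMs": 1000,
}

# Persist the map so it can be attached to the sink definition via the Cloud UI,
# CLI, or an infrastructure-as-code workflow.
with open("snowflake-streaming-sink.json", "w") as f:
    json.dump(streaming_sink_config, f, indent=2)
```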
If there are any errors during the messages processing, the connector will simply get restarted and reprocess the messages from the last committed messages.",[48,20734,20735],{},[384,20736],{"alt":18,"src":20737},"\u002Fimgs\u002Fblogs\u002F67aa30b66a929aa988c5fed9_AD_4nXeJI7XcfFmyy9XzwfuMPMvcYJ-Ovo9YJv8iaI_CzR2q9ghwBwr5EKkfOFWyW9RNj9u5OR_9E5JsXyGD8qWnDnpXgSYSaZz0RsnapS29Wb1lo9aZ7uBGaSfaZzD9GEXe9UjH3f_v.png",[48,20739,20740],{},"Messages are ingested into Snowflake in JSON, Avro, or Primitive formats. The connector also supports sinking data into Iceberg tables with the proper configuration. For authentication and security, this connector supports Key Pair Authentication (recommended for production deployments).",[32,20742,20554],{"id":20743},"key-configuration-settings-1",[48,20745,20746],{},"Setting up the Pulsar IO Snowflake Streaming Sink Connector requires several essential parameters:",[3933,20748,20561],{"id":20749},"basic-connection-settings-1",[321,20751,20752,20758,20761,20764,20767,20770],{},[324,20753,20754,20755,20571],{},"url – The Snowflake account URL (e.g., ",[55,20756,20569],{"href":20569,"rel":20757},[264],[324,20759,20760],{},"user – The Snowflake username used for authentication.",[324,20762,20763],{},"privateKey – The private key of the user. This is sensitive information used for authentication.",[324,20765,20766],{},"database –  The database in Snowflake where the connector will sink data.",[324,20768,20769],{},"schema – The schema in Snowflake where the connector will sink data.",[324,20771,20772],{},"role – Access control role to use when inserting rows into the table.",[3933,20774,20776],{"id":20775},"table-format-and-schema","Table Format and Schema",[321,20778,20779,20782],{},[324,20780,20781],{},"icebergEnabled – \tEnable the Iceberg table format. Defaults to false.",[324,20783,20784],{},"enableSchematization – Enable schema detection and evolution. Defaults to true.",[3933,20786,20788],{"id":20787},"performance-tuning","Performance Tuning",[321,20790,20791,20794],{},[324,20792,20793],{},"maxClientLag – Specifies how often Snowflake Ingest SDK flushes data to Snowflake, in seconds.",[324,20795,20796],{},"checkCommittedMessageIntervalMs – Specifies how often the connector checks for committed messages, in milliseconds.",[48,20798,20799,20800,190],{},"For more details on the connector's design, setup steps, and configuration options, visit the StreamNative Hub documentation: ",[55,20801,20803],{"href":20713,"rel":20802},[264],"Pulsar IO Snowflake Snowpipe Streaming Connector Guide",[32,20805,20807],{"id":20806},"how-snowflake-sink-connector-works","How Snowflake Sink Connector works",[48,20809,20810],{},"The snowflake sink connector receives messages from input topics and converts them into JSON format. These data are buffered in memory until the threshold is reached and then are written to temporal files in the internal stage. And Snowpipes will be created to ingest staged files on a partition basis. Once the ingestion is succeeded, temporal files will be deleted; otherwise it will move files into table stage and produce error messages. This connector supports effectively-once delivery semantics.",[48,20812,20813],{},[384,20814],{"alt":18,"src":20815},"\u002Fimgs\u002Fblogs\u002F67aa30b6c274cf546991f507_AD_4nXdYlWR1jI2PnlEKuvtAklbLVD_FX4Kv494TaSXvZj_CuTifw0VCatsBNKcek1lNqKnv-eQaUlPYKIG5iAc0nojNnbDll-RMyo2jYL-bgqd2AATVPOsWC0WalHVlocTdy2Y6Cbsg.png",[48,20817,20818],{},"Messages are ingested into Snowflake in JSON, Avro, or Primitive formats. 
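For instance, a producer publishing JSON records to the connector's input topic might look like the sketch below; the service URL, token, and topic name are hypothetical placeholders, and any Pulsar client library would work equally well.

```python
# Sketch of a Pulsar producer feeding JSON records to a Snowflake sink's input topic.
# Service URL, token, and topic name are hypothetical placeholders.
import json
import pulsar

client = pulsar.Client(
    "pulsar+ssl://<cluster>.streamnative.cloud:6651",
    authentication=pulsar.AuthenticationToken("<token>"),
)
producer = client.create_producer("persistent://public/default/events")

for i in range(100):
    record = {"event_id": i, "payload": f"event-{i}"}
    # JSON payloads map onto table columns once schematization is enabled.
    producer.send(json.dumps(record).encode("utf-8"))

producer.flush()
client.close()
```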
This connector currently supports Key Pair Authentication. Users need to generate a key pair,  then set the private key to the snowflake sink connector configuration and assign the public key to a user account in the snowflake.",[48,20820,20821],{},"This connector was developed earlier due to the lack of Snowpipe Streaming API; Due to its batch loading method, the ingestion latency would be higher. It’s always recommended to use the new Snowpipe Streaming Sink Connector whenever possible.",[32,20823,20554],{"id":20824},"key-configuration-settings-2",[48,20826,20827],{},"Setting up the Pulsar IO Snowflake Sink Connector requires several essential parameters:",[3933,20829,20561],{"id":20830},"basic-connection-settings-2",[321,20832,20833,20836,20838,20840,20842,20844],{},[324,20834,20835],{},"host – The host URL of the snowflake service.",[324,20837,20760],{},[324,20839,20763],{},[324,20841,20766],{},[324,20843,20769],{},[324,20845,20772],{},[3933,20847,20593],{"id":20848},"buffering-and-performance-tuning-1",[321,20850,20851,20854,20857],{},[324,20852,20853],{},"bufferCountRecords –The number of records that are buffered in the memory before they are ingested to Snowflake. By default, it is set to 10_000.",[324,20855,20856],{},"bufferSizeBytes – The cumulative size (in units of bytes) of the records that are buffered in the memory before they are ingested in Snowflake as data files. By default, it is set to 5_000_000 (5 MB).",[324,20858,20859],{},"bufferFlushTimeInSeconds – The number of seconds between buffer flushes, where the flush is from the Pulsar’s memory cache to the internal stage. By default, it is set to 60 seconds.",[48,20861,20799,20862,20866,20867,20872],{},[55,20863,20865],{"href":20719,"rel":20864},[264],"Pulsar IO Snowflake Sink Connector",". 
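For the key pair authentication both connectors rely on, one way to generate the pair with the cryptography package is sketched below; the Snowflake user name is a placeholder, and the exact key formatting expected by each connector is covered in the documentation linked above.

```python
# Sketch: generating an RSA key pair for Snowflake key pair authentication.
# The Snowflake user name below is a hypothetical placeholder.
from cryptography.hazmat.primitives import serialization
from cryptography.hazmat.primitives.asymmetric import rsa

private_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)

# PKCS8 private key: typically supplied to the connector's privateKey setting
# with the PEM header/footer and line breaks stripped.
private_pem = private_key.private_bytes(
    encoding=serialization.Encoding.PEM,
    format=serialization.PrivateFormat.PKCS8,
    encryption_algorithm=serialization.NoEncryption(),
).decode()

# Public key: assigned to the Snowflake user, e.g.
#   ALTER USER PULSAR_CONNECTOR SET RSA_PUBLIC_KEY='<key body without header/footer>';
public_pem = private_key.public_key().public_bytes(
    encoding=serialization.Encoding.PEM,
    format=serialization.PublicFormat.SubjectPublicKeyInfo,
).decode()

print(private_pem)
print(public_pem)
```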
StreamNative Academy also provides a ",[55,20868,20871],{"href":20869,"rel":20870},"https:\u002F\u002Fyoutu.be\u002FoYY7HZTfMmE?si=gUCAh8DQvw-htBK2",[264],"tutorial video"," on how to submit it with the pulsarctl cli tool.",[32,20874,20876],{"id":20875},"best-practices-for-deploying-the-pulsar-snowflake-sink-connectors","Best Practices for Deploying the Pulsar Snowflake Sink Connectors",[48,20878,20626],{},[1666,20880,20881],{},[324,20882,20883],{},"Choose the Right Connector Based on Latency Needs",[321,20885,20886,20889],{},[324,20887,20888],{},"Use the Snowpipe Streaming Sink Connector for real-time ingestion with sub-second latency.",[324,20890,20891],{},"Use the Snowflake Sink Connector if batch processing with higher latency is acceptable.",[1666,20893,20894],{"start":19},[324,20895,20896],{},"Optimize Buffering for Performance",[321,20898,20899,20902],{},[324,20900,20901],{},"Low-latency use cases: Reduce bufferCountRecords and bufferFlushTimeInSeconds to speed up data ingestion.",[324,20903,20904],{},"High-throughput use cases: Increase these settings to optimize efficiency and reduce processing overhead.",[1666,20906,20907],{"start":279},[324,20908,20909],{},"Secure Authentication and Access Control",[321,20911,20912,20915],{},[324,20913,20914],{},"Use Key Pair Authentication instead of username-password authentication, especially in production environments.",[324,20916,20917],{},"Restrict permissions to only the necessary databases, schemas, and roles.",[1666,20919,20921],{"start":20920},4,[324,20922,20923],{},"Monitor and Scale the Connector",[321,20925,20926,20929],{},[324,20927,20928],{},"Enable Pulsar and Snowflake monitoring to track connector health and ingestion performance.",[324,20930,20931],{},"Scale horizontally by increasing the parallelism factor of the connector for high-ingestion workloads.",[1666,20933,20935],{"start":20934},5,[324,20936,20937],{},"Handle Errors and Recovery Gracefully",[321,20939,20940,20943],{},[324,20941,20942],{},"Configure error-handling policies to retry failed messages instead of dropping them.",[324,20944,20945],{},"Use dead-letter topics (DLTs) for better troubleshooting and recovery.",[1666,20947,20949],{"start":20948},6,[324,20950,20951],{},"Deploy Using Infrastructure-as-Code (IaC)",[321,20953,20954],{},[324,20955,20956],{},"Use Terraform, Kubernetes CRDs, or StreamNative API to deploy and manage connectors consistently across environments.",[32,20958,20960],{"id":20959},"deploying-the-pulsar-io-connectors-in-streamnative-cloud","Deploying the Pulsar IO Connectors in StreamNative Cloud",[48,20962,20963],{},"StreamNative Cloud supports both Pulsar IO Snowflake connectors in a fully managed way. Users can deploy them via the UI, API, CLI, Terraform, or Kubernetes CRDs, making it easy to integrate into existing workflows. 
Below are diagrams showing the UI configuration of Snowflake Sink Connectors.",[321,20965,20966],{},[324,20967,20968],{},"Submit Snowpipe Streaming Sink Connector via StreamNative Cloud UI",[48,20970,20971],{},[384,20972],{"alt":18,"src":20973},"\u002Fimgs\u002Fblogs\u002F67aa30b76ad57fb76e5a421e_AD_4nXeVwC2euVGHvsbPvlI3UTkXiyto1QNf6yobUTtloyjlvqhh23K-U1u-WDZSoDRcOQS5vGpYxlYr3x15zl_7EDDY45RoJA8WZHLeNNUencZK4xcxayYMG9EXYNBNoNKRzd9CTWj3.png",[321,20975,20976],{},[324,20977,20978],{},"Submit SnowpipeSink Connector via StreamNative Cloud UI",[48,20980,20981],{},[384,20982],{"alt":18,"src":20983},"\u002Fimgs\u002Fblogs\u002F67aa30b640efd201d4ebf79f_AD_4nXfEyenKF9xO4rSvabAyDYEbEEpPHAW2Dqxssr7SiGBFHMb79ktYGqrBa9ONuCD4wet4ABXZTbSmwZ4bWjYA-uUjmZe8RMJs3nsiU-qasfmD3tTeXzmzuwxOKLKTL2hAlHq0THM6tg.png",[48,20985,20986,20987,20990],{},"Once the connector is configured, users can click the ",[4926,20988,20989],{},"Submit"," button and start sending data into Snowflake from Pulsar.",[40,20992,20994],{"id":20993},"load-data-into-snowflake-with-s3-storage-connector","Load Data into Snowflake with S3 Storage Connector",[48,20996,20997,20998,21003,21004,21009],{},"In addition to the above end-to-end connector approach to send data into Snowflake directly, we also observed some of our customers utilizing the ",[55,20999,21002],{"href":21000,"rel":21001},"https:\u002F\u002Fdocs.streamnative.io\u002Fhub\u002Fconnector-aws-s3-sink-v4.0",[264],"cloud storage s3 sink connector"," for loading data into Snowflake. You can find the ",[55,21005,21008],{"href":21006,"rel":21007},"https:\u002F\u002Fyoutu.be\u002F4DffaS0N_nw?si=IWWYvQm_f2EQdWqu&t=407",[264],"talk recording"," from the 2024 Data Streaming Summit.",[48,21011,21012,21013],{},"Users will first use the S3 Sink connector to sink data from Pulsar topics into S3 buckets, and then ",[55,21014,21017],{"href":21015,"rel":21016},"https:\u002F\u002Fdocs.snowflake.com\u002Fen\u002Fuser-guide\u002Fdata-load-s3",[264],"utilize the COPY INTO ",[55,21019,21021],{"href":21015,"rel":21020},[264]," command",[48,21023,21024],{},[384,21025],{"alt":18,"src":21026},"\u002Fimgs\u002Fblogs\u002F67aa30b63b82b3cdcafbbc24_AD_4nXdZckgjdpjpxe95QLoMJKXQoSgZ-Ef8BA_CUrg6SdDxHy3vvRBogyZE_wwe_rGcC0Noh0QFSBqi2wlLC44kx_Fpc8i_bD7ct5fulE_2PIdF0pUxfKrc_59_w9sb_7KwpiCvCKtKQw.png",[48,21028,21029],{},"It allows users to batch a big chunk of data into a single file and then load it in a later separate step. This approach is desirable for large organizations where different teams rely on the same set of data but have different processing needs. Each team can load data of its interest into its own tables for different processing. The latency can be at a minute level given the buffering used for generating large-size files.",[48,21031,21032,21033],{},"More detailed information regarding the S3 Sink connector can be found ",[55,21034,267],{"href":21000,"rel":21035},[264],[40,21037,2125],{"id":2122},[48,21039,21040],{},"The connector-based approach is a powerful way to stream real-time data into Snowflake, leveraging Kafka Connect and Pulsar IO for seamless integration. 
Whether using the Kafka or Pulsar protocol, and sending data via Snowpipe or the Snowpipe Streaming API, each approach offers different trade-offs in terms of latency, scalability, and operational complexity.",[32,21042,21044],{"id":21043},"when-to-use-the-connector-approach","When to Use the Connector Approach",[48,21046,21047],{},"The connector approach is particularly well-suited when:",[321,21049,21050,21053,21056],{},[324,21051,21052],{},"You are already using Kafka or Pulsar for data streaming.",[324,21054,21055],{},"You need a managed solution that abstracts ingestion complexity.",[324,21057,21058],{},"Your use case involves structured data ingestion into Snowflake.",[48,21060,21061],{},"StreamNative Cloud supports both Kafka Connect and Pulsar IO connectors via Universal Connect, simplifying real-time data movement between Pulsar, Kafka, and Snowflake. The choice of connector should be guided by business requirements, data velocity, and system constraints, balancing ease of use, cost, latency, throughput, and manageability.",[321,21063,21064,21067,21070],{},[324,21065,21066],{},"If your workload demands low-latency ingestion, use the Snowpipe Streaming Sink Connector for sub-second ingestion.",[324,21068,21069],{},"If batch processing and cost efficiency are the priority, the Snowflake Sink Connector is a viable option.",[324,21071,21072],{},"If you have a large number of topics and tables, the connector approach may introduce management overhead, requiring careful scaling, monitoring, and cost optimization.",[32,21074,21076],{"id":21075},"beyond-connectors-exploring-alternative-approaches","Beyond Connectors: Exploring Alternative Approaches",[48,21078,21079],{},"While connectors provide an easy-to-use and managed solution, they may introduce challenges at scale, particularly for organizations managing a high number of topics and tables. These challenges include:",[321,21081,21082,21085,21088],{},[324,21083,21084],{},"Scaling ingestion infrastructure as data volume grows.",[324,21086,21087],{},"Managing multiple connectors across different streaming pipelines.",[324,21089,21090],{},"Handling potential storage costs associated with Snowpipe-based ingestion.",[48,21092,21093],{},"For organizations looking for greater flexibility and deeper integration with Snowflake, an alternative approach involves using Iceberg tables and Open Catalog integration for direct data streaming.",[48,21095,21096],{},"In the next post, we’ll explore how Iceberg and Open Catalog can be leveraged to stream data into Snowflake without connectors, offering a more scalable and efficient ingestion strategy.",[48,21098,21099],{},"By understanding these different ingestion methods, businesses can make informed decisions to optimize their data streaming architecture based on their unique needs. 
Stay tuned for Part 2!",[21101,21102],"table",{},{"title":18,"searchDepth":19,"depth":19,"links":21104},[21105,21110,21115,21121,21129,21130],{"id":20365,"depth":19,"text":20366,"children":21106},[21107,21108,21109],{"id":20372,"depth":279,"text":20373},{"id":20383,"depth":279,"text":20384},{"id":20394,"depth":279,"text":20395},{"id":20410,"depth":19,"text":20411,"children":21111},[21112,21113,21114],{"id":20420,"depth":279,"text":20421},{"id":20462,"depth":279,"text":20463},{"id":20510,"depth":279,"text":20511},{"id":20525,"depth":19,"text":20526,"children":21116},[21117,21118,21119,21120],{"id":20532,"depth":279,"text":20533},{"id":20553,"depth":279,"text":20554},{"id":20622,"depth":279,"text":20623},{"id":20649,"depth":279,"text":20650},{"id":20706,"depth":19,"text":20707,"children":21122},[21123,21124,21125,21126,21127,21128],{"id":20724,"depth":279,"text":20725},{"id":20743,"depth":279,"text":20554},{"id":20806,"depth":279,"text":20807},{"id":20824,"depth":279,"text":20554},{"id":20875,"depth":279,"text":20876},{"id":20959,"depth":279,"text":20960},{"id":20993,"depth":19,"text":20994},{"id":2122,"depth":19,"text":2125,"children":21131},[21132,21133],{"id":21043,"depth":279,"text":21044},{"id":21075,"depth":279,"text":21076},"2025-02-10","Learn how to seamlessly stream real-time data into Snowflake using Apache Kafka, Apache Pulsar, and their connector frameworks. Explore Snowpipe, Snowpipe Streaming, and best practices for efficient data ingestion and analytics.","\u002Fimgs\u002Fblogs\u002F67aa30993ed5a7db18b46e02_image-39.png",{},"\u002Fblog\u002Fdefinitive-guide-for-streaming-data-into-snowflake-part-1-with-connectors",{"title":20347,"description":21135},"blog\u002Fdefinitive-guide-for-streaming-data-into-snowflake-part-1---with-connectors",[800,18653],"aghleYC_9Wg-r-wfeNmwyuIXd8Rq7jf6ly7G6GfGMfE",{"id":21144,"title":21145,"authors":21146,"body":21147,"category":3550,"createdAt":290,"date":21379,"description":21380,"extension":8,"featured":294,"image":290,"isDraft":294,"link":290,"meta":21381,"navigation":7,"order":296,"path":21382,"readingTime":17934,"relatedResources":290,"seo":21383,"stem":21384,"tags":21385,"__hash__":21386},"blogs\u002Fblog\u002Fannouncing-ursa-engine-ga-lakehouse-first-kafka-streaming-with-unity-iceberg-catalog-integration.md","Announcing Ursa Engine GA: Lakehouse-Native Kafka Streaming with Unity & Iceberg REST Catalog Integration",[806],{"type":15,"value":21148,"toc":21371},[21149,21156,21165,21167,21170,21180,21183,21186,21191,21215,21217,21221,21227,21230,21232,21235,21240,21244,21247,21250,21261,21264,21267,21278,21281,21284,21298,21310,21313,21319,21327,21329,21332,21337,21343,21347,21354,21366,21369],[48,21150,18892,21151,18895,21153,21155],{},[55,21152,4725],{"href":6647},[55,21154,18899],{"href":18898},", Ursa Engine is the first and only Kafka-compatible data streaming engine purpose-built for cloud-native environments and lakehouses. 
It streamlines data ingestion into your lakehouse and augments it with real-time streaming capabilities.",[48,21157,21158,21159,21164],{},"In tandem with our GA release, we’re proud to share that ",[55,21160,21163],{"href":21161,"rel":21162},"http:\u002F\u002Fstreamnative.io\u002Fblog\u002Fseamless-streaming-to-lakehouse-unveiling-streamnative-clouds-integration-with-databricks-unity-catalog",[264],"Ursa Engine now integrates seamlessly with Unity Catalog and Iceberg REST Catalog",", enabling instant streaming data discovery and truly uniform governance—from data streaming to data analytics.",[40,21166,18919],{"id":18918},[48,21168,21169],{},"Traditional data ecosystems often require separate infrastructures for real-time data streaming (e.g., Kafka or Pulsar) and batch processing via data lakehouses (e.g., Delta Lake, Iceberg). This split environment not only complicates governance, schema management, and data discovery—it also introduces expensive infrastructure costs resulting from repeated data transfers and storage, complex ETL processes, and error-prone, duplicated schema mapping. Specifically, organizations face:",[321,21171,21172,21174,21176,21178],{},[324,21173,18927],{},[324,21175,18930],{},[324,21177,18933],{},[324,21179,18936],{},[48,21181,21182],{},"Ursa Engine solves these challenges by augmenting the lakehouse with Kafka-compatible data streaming capabilities, leveraging open storage formats like Delta Lake and Iceberg, and unifying governance through catalog integrations.",[40,21184,21185],{"id":18942},"‍General Availability on StreamNative BYOC for AWS",[48,21187,18946,21188,21190],{},[55,21189,18950],{"href":18949}," for AWS, giving organizations the freedom to deploy Ursa in their preferred cloud environment—while also offering a fully integrated approach to streaming data into lakehouses. Key benefits include:",[1666,21192,21193,21198,21200,21204,21206,21209],{},[324,21194,21195,21196,18960],{},"10x Infrastructure Cost ReductionAchieve dramatic savings with a leaderless architecture that eliminates inter-AZ traffic and open-format lakehouse storage that significantly lowers costs. Read ",[55,21197,18959],{"href":10357},[324,21199,18963],{},[324,21201,18966,21202,190],{},[55,21203,18970],{"href":18969},[324,21205,18974],{},[324,21207,21208],{},"Unified GovernanceEnsure consistent data policies, security, and seamless discovery through Data Catalog —unifying data access across both real-time and batch domains.",[324,21210,18980,21211,21214],{},[55,21212,11224],{"href":18983,"rel":21213},[264]," to pay only for throughput, significantly reducing total cost of ownership compared to traditional streaming platforms.",[48,21216,18993],{},[40,21218,21220],{"id":21219},"reduce-infrastructure-costs-by-10x-with-leaderless-architecture-and-open-lakehouse-storage","‍Reduce Infrastructure Costs by 10x with Leaderless Architecture and Open Lakehouse Storage",[48,21222,21223,21224,21226],{},"‍A key differentiator of Ursa Engine is its leaderless architecture, which leverages the lakehouse as shared storage and Oxia as a scalable index\u002Fmetadata manager. This approach eliminates expensive inter-AZ traffic and significantly reduces inter-AZ data replication overhead. 
In a ",[55,21225,19004],{"href":19003},", Ursa consistently handled 5GB\u002Fs of Kafka workload for just $54 per hour—94% cheaper than vanilla Kafka and RedPanda.",[48,21228,21229],{},"In addition, Ursa Engine is the first and ONLY data streaming solution that natively implements its storage engine using open lakehouse formats, supporting both Iceberg and DeltaLake. By embedding data schemas directly into the storage layer, Ursa takes advantage of columnar compression, enabling potential more than 10x storage reduction.",[48,21231,19011],{},[48,21233,21234],{},"By embracing open lakehouse formats and avoiding leader-based interzone data replication, Ursa delivers up to a 10x reduction in infrastructure costs compared to traditional streaming solutions.",[48,21236,19017,21237,19022],{},[55,21238,19021],{"href":17900,"rel":21239},[264],[40,21241,21243],{"id":21242},"unified-governance-with-unity-catalog-iceberg-rest-catalog","‍Unified Governance with Unity Catalog & Iceberg REST Catalog",[48,21245,21246],{},"‍Ursa Engine seamlessly integrates with Iceberg and Delta Lake, supporting two table modes for real-time and batch analytics:",[48,21248,21249],{},"‍1. Stream Backed by Table (Ursa Managed Table)Ursa persistently stores streaming data in an append-only lakehouse table, ensuring a single data copy while preserving offsets and ordering.",[321,21251,21252,21255,21258],{},[324,21253,21254],{},"Enables full stream replay and real-time queries.",[324,21256,21257],{},"Ursa manages data lifecycle and retention.",[324,21259,21260],{},"Tables auto-register in Unity or Iceberg Catalog for governance.",[48,21262,21263],{},"✅ Best for: Bronze tables—historical data retention, auditing, and replayability.",[48,21265,21266],{},"‍2. Stream Delivered to Table (Ursa External Table)Ursa publishes data to external Iceberg tables via append or upsert, without managing their lifecycle.",[321,21268,21269,21272,21275],{},[324,21270,21271],{},"Two data copies: row-based for streaming, columnar for analytics.",[324,21273,21274],{},"Ideal for compacted storage with flexible partitioning.",[324,21276,21277],{},"Lifecycle managed by external data catalog services.",[48,21279,21280],{},"✅ Best for: Silver & Gold tables—curated, transformed, and optimized for analytics.",[48,21282,21283],{},"Ursa’s Unity\u002FIceberg Catalog integration ensures:",[321,21285,21286,21289,21292,21295],{},[324,21287,21288],{},"Centralized Policies: Unified access control & lineage tracking.",[324,21290,21291],{},"Schema Discovery: Single metadata layer across streaming & batch.",[324,21293,21294],{},"Data Discoverability: Query real-time & batch data without duplication.",[324,21296,21297],{},"Efficiency: Simplified architecture, reduced complexity, and better scalability.",[48,21299,21300,21301,21305,21306,190],{},"You can dive deeper into our ",[55,21302,21304],{"href":17900,"rel":21303},[264],"Lakehouse-native storage blog post"," to learn how we leverage Iceberg or Delta Lake as storage formats. 
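To illustrate what the catalog integration means in practice, here is a small sketch of reading an Ursa-managed table that has been auto-registered in an Iceberg REST catalog, using the pyiceberg client. The catalog endpoint, credentials, and namespace/table names are assumptions for illustration; the same records remain replayable as a stream through the Kafka protocol, which is the stream-table duality described above.

```python
# Sketch: query a stream's backing Iceberg table through an Iceberg REST catalog.
# Endpoint, credentials, and table identifiers are illustrative placeholders.
from pyiceberg.catalog import load_catalog

catalog = load_catalog(
    "ursa",
    **{
        "type": "rest",
        "uri": "https://<catalog-endpoint>/iceberg",   # assumed REST catalog endpoint
        "credential": "<client-id>:<client-secret>",   # assumed OAuth client credentials
    },
)

# A topic's backing table auto-registered by Ursa; namespace and name are placeholders.
table = catalog.load_table("streaming.orders")

# Scan the table into Arrow for ad-hoc analysis; a Kafka consumer can still replay
# the same data as an ordered stream.
arrow_table = table.scan().to_arrow()
print(arrow_table.schema)
print(arrow_table.num_rows)
```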
Additionally, check out our announcement blog post on ",[55,21307,21309],{"href":21161,"rel":21308},[264],"how Ursa Engine integrates with Unity Catalog for ingesting data into Databricks",[40,21311,21312],{"id":19065},"‍ETU Pricing Model: Pay for Throughput, Not Storage",[48,21314,21315,21316,19073],{},"‍Lastly, while traditional streaming platforms often bundle storage and throughput costs, Ursa Engine introduces ",[55,21317,11224],{"href":18983,"rel":21318},[264],[321,21320,21321,21323],{},[324,21322,19078],{},[324,21324,19081,21325,190],{},[55,21326,6677],{"href":19084},[40,21328,19088],{"id":19087},[48,21330,21331],{},"‍Ready to take your data architecture into the era of real-time AI? Here’s how you can get started:",[48,21333,19094,21334,19099],{},[55,21335,19098],{"href":3907,"rel":21336},[264],[48,21338,19102,21339,21342],{},[55,21340,19107],{"href":19105,"rel":21341},[264],"]\nLearn how to configure Ursa Engine with Unity or Iceberg Catalog to maintain a single governance model from ingestion to analytics.",[48,21344,19111,21345,19115],{},[55,21346,19114],{"href":6392},[48,21348,19118,21349,21351,17865],{},[2628,21350,19121],{},[55,21352,17864],{"href":17862,"rel":21353},[264],[48,21355,21356,21357,21360,21365],{},"📅 ",[2628,21358,21359],{},"Sign up for our upcoming webinar",[55,21361,21364],{"href":21362,"rel":21363},"https:\u002F\u002Fhs.streamnative.io\u002Fdatabricks-unitycatalog",[264],"Join StreamNative & Databricks"," as we dive deeper into Ursa Engine and Unity Catalog integration.",[48,21367,21368],{},"Thank you for joining us on this journey to redefine real-time data streaming standards. With the General Availability of Ursa Engine on BYOC for AWS, complete with Unity Catalog and Iceberg REST Catalog integration, you can unify governance, cut costs, and streamline your data ingestion—all in one place.",[48,21370,19130],{},{"title":18,"searchDepth":19,"depth":19,"links":21372},[21373,21374,21375,21376,21377,21378],{"id":18918,"depth":19,"text":18919},{"id":18942,"depth":19,"text":21185},{"id":21219,"depth":19,"text":21220},{"id":21242,"depth":19,"text":21243},{"id":19065,"depth":19,"text":21312},{"id":19087,"depth":19,"text":19088},"2025-02-03","Ursa Engine—the first Kafka-compatible data streaming engine purpose-built to augment your lakehouse—is now Generally Available (GA) on StreamNative BYOC, featuring deep integration with Unity Catalog and Iceberg Catalog for seamless governance across real-time and batch 
data.",{},"\u002Fblog\u002Fannouncing-ursa-engine-ga-lakehouse-first-kafka-streaming-with-unity-iceberg-catalog-integration",{"title":21145,"description":21380},"blog\u002Fannouncing-ursa-engine-ga-lakehouse-first-kafka-streaming-with-unity-iceberg-catalog-integration",[1332,799,1330,10322,800,303],"UYjF_uRrTmqghIa4MW_hVM8S9q_pGgNGpnfcH2ULkAA",{"id":21388,"title":20278,"authors":21389,"body":21390,"category":3550,"createdAt":290,"date":21379,"description":21785,"extension":8,"featured":294,"image":21786,"isDraft":294,"link":290,"meta":21787,"navigation":7,"order":296,"path":14554,"readingTime":21788,"relatedResources":290,"seo":21789,"stem":21790,"tags":21791,"__hash__":21792},"blogs\u002Fblog\u002Fleaderless-architecture-and-lakehouse-native-storage-for-reducing-kafka-cost.md",[806],{"type":15,"value":21391,"toc":21765},[21392,21405,21408,21416,21419,21423,21435,21438,21449,21453,21461,21466,21469,21473,21487,21495,21499,21502,21507,21509,21519,21522,21542,21547,21550,21553,21556,21559,21562,21565,21569,21573,21576,21579,21583,21586,21591,21594,21597,21602,21605,21608,21611,21614,21617,21621,21632,21635,21641,21645,21648,21651,21659,21664,21668,21671,21674,21677,21680,21684,21687,21698,21701,21704,21707,21710,21715,21719,21722,21727,21730,21733,21736,21739,21742,21745,21755,21758],[48,21393,21394,21395,21400,21401,21404],{},"Last week, I had the opportunity to present \"",[55,21396,21399],{"href":21397,"rel":21398},"https:\u002F\u002Fwww.slideshare.net\u002Fslideshow\u002Fapache-iceberg-meetup-01-30-25-ursa-augmenting-iceberg-with-kafka-compatible-data-streaming-capabilities\u002F275271267",[264],"Ursa: Augment Iceberg with Kafka Data Streaming Capabilities","\" at the Apache Iceberg Bay Area Meetup. After that event, our team also released a blog post benchmarking the cost comparison between Ursa and other data streaming engines. We demonstrated that ",[55,21402,21403],{"href":19084},"Ursa can run a 5 GB\u002Fs Kafka workload at just 5% of the infra cost of traditional leader-based data streaming solutions",". This has sparked increasing interest in how we drastically cut infrastructure costs.",[48,21406,21407],{},"While we have more technical papers coming soon, I’d like to use this blog post to share insights about two key innovations in Ursa Engine that enable these cost reductions:",[1666,21409,21410,21413],{},[324,21411,21412],{},"Leaderless architecture: Eliminating inter-zone client traffic costs via a leaderless design.",[324,21414,21415],{},"Lakehouse-native storage: No inter-zone data replication via direct writes to cloud object storage and leveraging open table formats.",[48,21417,21418],{},"This blog post will break down how these architectural choices lead to massive cost reductions while maintaining the performance and durability required for data streaming workloads.",[40,21420,21422],{"id":21421},"challenges-of-leader-based-architectures","‍Challenges of Leader-Based Architectures",[48,21424,21425,21426,21428,21429,21434],{},"‍As discussed in our previous blog post, “",[55,21427,18766],{"href":18765},"”, and the keynote presentation “",[55,21430,21433],{"href":21431,"rel":21432},"https:\u002F\u002Fyoutu.be\u002Fiknjqr0gWEY?list=PLqRma1oIkcWgN9agdJ0DQhX2gPf8K2ynk&t=3100",[264],"Data Streaming: Past, Present, and Future","” in Data Streaming Summit 2024, the evolution of storage engines for data streaming has moved from leader-based to leaderless architectures. 
But what does this really mean?",[48,21436,21437],{},"Most traditional data streaming engines, including Apache Kafka, Apache Pulsar, and Redpanda, deploy a leader-based data replication model. In this model:",[321,21439,21440,21443,21446],{},[324,21441,21442],{},"Each topic partition has a designated leader broker responsible for handling incoming data and serving consumers.",[324,21444,21445],{},"Data is replicated from the leader to multiple followers across availability zones (AZs) to ensure durability and fault tolerance.",[324,21447,21448],{},"Different replication algorithms are used: Kafka employs ISR replication, Redpanda relies on Raft, and Pulsar utilizes a Paxos-variant via BookKeeper.",[32,21450,21452],{"id":21451},"the-hidden-costs-of-leader-based-replication","The Hidden Costs of Leader-Based Replication",[48,21454,21455,21456,21460],{},"‍Leader-based replication is essential for achieving ultra-low latency (typically single-digit to sub-100 milliseconds). Our previous ",[55,21457,21459],{"href":21458},"\u002Fwhitepapers\u002Fapache-pulsar-vs-apache-kafka-2022-benchmark","Pulsar benchmark report"," demonstrated that Pulsar consistently achieves latencies below 5 milliseconds. This architecture is useful for workloads demanding extreme low-latency guarantees.",[48,21462,21463],{},[384,21464],{"alt":18,"src":21465},"\u002Fimgs\u002Fblogs\u002F67a05f06dade6090369a08f9_AD_4nXffNm4QNhIq23hFUdsjYK5qZxGI7gRE5Igdh_Ho8VEgUvrsNeZP0SVm1yUPKnD4hn1X8VBeQel9_BRm7LEQoaVsOoRg_3HBIt-cwGb7KJEyR8sn2cPR5ap5e3Q7Cx7gbUfgjPoi.png",[48,21467,21468],{},"However, the majority of data streaming workloads don’t require ultra-low latency, nor should they have to pay a premium for an over-killed architecture. For example, ingesting data into data lakehouses does not necessitate single-digit millisecond latency. Instead, the focus is on efficiently handling large volumes of data, which calls for a highly cost-effective data streaming engine to feed data into Iceberg or Lakehouse. 
This need has become even more urgent with the rise of DeepSeek disrupting AI infrastructure, pushing companies to confront soaring costs—not only for training and deploying AI models but also for data acquisition and ingestion.",[32,21470,21472],{"id":21471},"cost-challenges-in-leader-based-data-streaming","‍Cost Challenges in Leader-based Data Streaming:",[1666,21474,21475,21478,21481,21484],{},[324,21476,21477],{},"Inter-zone client traffic: Since each partition has a leader broker, producers often have to cross AZ boundaries to send data, increasing network costs.",[324,21479,21480],{},"Inter-zone data replication costs: Every write operation triggers costly cross-AZ replication.",[324,21482,21483],{},"Broker hotspots: Leaders often become bottlenecks, leading to uneven workload distribution.",[324,21485,21486],{},"Failover complexity: When a leader fails, a new leader election occurs, triggering additional replication, cross-AZ traffic, and operational delays.",[48,21488,21489,21490,21494],{},"While solutions like Apache Pulsar decouple storage from compute to ",[55,21491,21493],{"href":21492},"\u002Fblog\u002Fno-data-rebalance-needed-kafka-and-pulsar","minimize rebalancing overhead",", the fundamental cost problem remains unsolved in leader-based systems.",[40,21496,21498],{"id":21497},"leaderless-architecture-breaking-free-from-leader-based-constraints","‍Leaderless Architecture: Breaking Free from Leader-based Constraints",[48,21500,21501],{},"‍Ursa Engine eliminates leader-based replication entirely, adopting a leaderless architecture—an approach now also being explored by innovations like Confluent’s Freight Clusters and WarpStream.",[48,21503,21504],{},[384,21505],{"alt":18,"src":21506},"\u002Fimgs\u002Fblogs\u002F67a05f066944dc5b3612271f_AD_4nXcVmbCodi3jptKoJPg_VSSEcT31rEovqFhNnAFLct97n4xhl1JwYDq59p0W7ckS7VjZcURjz4D8AryGUbzLUiWsp-lKrqn1JfGtDQYwItE7ZlBzjocnJ-DkKQHjC7_Hw3LxgpTYPQ.png",[32,21508,2697],{"id":2696},[48,21510,21511,21512,21515,21516,190],{},"‍In our ",[55,21513,21514],{"href":18765},"\"Evolution of Log Storage\""," blog post, we introduced the concept of index\u002Fdata split (originally pioneered in Pulsar). This approach stores log segment indexes in a centralized metadata store, keeping log segments remote and decoupled from brokers. 
This avoids ",[55,21517,21518],{"href":21492},"data rebalancing overhead",[48,21520,21521],{},"Ursa’s leaderless architecture takes this concept further:",[321,21523,21524,21530,21536,21539],{},[324,21525,21526,21527,190],{},"Offset tracking and sequencing coordination are moved to a centralized metadata\u002Findex service - ",[55,21528,5599],{"href":21529},"\u002Fblog\u002Fintroducing-oxia-scalable-metadata-and-coordination",[324,21531,21532,21533,21535],{},"Brokers no longer handle sequencing, indexing, or offset tracking—instead, this is managed by ",[55,21534,5599],{"href":21529},", a scalable metadata\u002Findex service developed by StreamNative.",[324,21537,21538],{},"This fully decouples metadata operations from data operations, enabling: Shared storage for durability (e.g., AWS S3).",[324,21540,21541],{},"Shared metadata\u002Findex service for sequencing, indexing, and offset tracking.",[48,21543,21544],{},[384,21545],{"alt":18,"src":21546},"\u002Fimgs\u002Fblogs\u002F67a05f06eff52272ff8d72ca_AD_4nXf2eOx6DK-7DhwQlnb41SiZdybZ-wKPc-6V-PoAKrz-4Tsz6aSnLsrWDRM-l7Erqot_pUVJT2g1BO-ty0k9fep7Z6aASNyMN8sSC4i2MwNMYpNfQCvL4dZ_SsTZezFJtXV2tHVzfA.png",[48,21548,21549],{},"The diagram above illustrates how a leaderless system operates.",[48,21551,21552],{},"‍Equal brokers: Every broker is equal—any node can accept writes and independently store them in a shared storage service, such as AWS S3. This eliminates the need for clients to traverse availability zones to locate a broker leader, thereby removing inter-zone client traffic.",[48,21554,21555],{},"‍Direct writes to object storage: Data is persisted directly in object storage for durability, eliminating the need for brokers to replicate data across availability zones. In Ursa’s case, data is written directly to cloud object storage (more on this in the next section).",[48,21557,21558],{},"‍Reduced network costs: By removing both inter-zone client traffic and inter-zone data replication, this approach significantly reduces cloud network costs—one of the biggest hidden expenses in traditional leader-based deployments like Kafka and Redpanda.",[48,21560,21561],{},"After brokers independently write data to object storage, they commit metadata updates to Oxia. While this generates some cross-AZ metadata traffic, it is significantly smaller than the actual data movement, making it negligible.",[48,21563,21564],{},"This leaderless approach enables linear scalability while dramatically reducing network costs, making high-throughput workloads far more efficient than traditional leader-based systems.",[40,21566,21568],{"id":21567},"lakehouse-native-storage-eliminating-inter-zone-replication","‍Lakehouse-Native Storage: Eliminating Inter-Zone Replication",[32,21570,21572],{"id":21571},"the-hidden-cost-of-leader-based-data-replication","The Hidden Cost of Leader-based Data Replication",[48,21574,21575],{},"‍Traditional leader-based streaming engines store data on local disks and replicate it across AZs for durability. 
This means that each byte is replicated at least three times, traversing inter-AZ boundaries twice, driving up storage and network costs.",[48,21577,21578],{},"Additionally, these engines store data in row-based formats (e.g., AVRO, JSON, Protobuf), which are inefficient for analytical workloads, requiring costly ETL processes before use in data lakehouses.",[32,21580,21582],{"id":21581},"ursas-approach-direct-writes-to-lakehouse","‍Ursa’s Approach: Direct Writes to Lakehouse",[48,21584,21585],{},"‍Ursa Engine eliminates these inefficiencies by adopting a lakehouse-native storage model:",[48,21587,21588],{},[384,21589],{"alt":18,"src":21590},"\u002Fimgs\u002Fblogs\u002F67a05f06cd1ec3b61be59e53_AD_4nXereH47sTHW4RdZ_sesuuW7grj2hOOz7lsVmFIraRecl9koIBOYLsOTn_8pDTD-BYX2igYSvZP-xXrH6tJjKi7B4T5Eb_i_Q3OFF4ahheBQ4p-XH_aNKXq3sOD0jpWqGiutLzyd.png",[48,21592,21593],{},"The entire storage engine is designed around the concept of augmenting the lakehouse tables with a write-ahead log (WAL) for data streaming, which achieves the “stream-table” duality by offering both streaming and table accesses over the same data sets.",[48,21595,21596],{},"The Write-Ahead Log (WAL) component stores the original records produced via Kafka or Pulsar protocols. Data from different topics is multiplexed into a broker-level write buffer and flushed out as a WAL file object. The records are stored in row format, and writes go directly to cloud storage (S3, GCS, or Azure Blob Store), eliminating the need for brokers to replicate data across availability zones.",[48,21598,21599],{},[384,21600],{"alt":18,"src":21601},"\u002Fimgs\u002Fblogs\u002F67a05f06f82b2bd064a847d4_AD_4nXea4pg63EgcqTT_91Eoqea4RdOWgneOeTWhz6Wbj8vlSj3Ujk34HYFWsfnZjXChm1qAtGoKnWM1DYq3epJMnvbX11Ql5pUifiwHO7CjNIFYwqlez1vz3C4Rl8bkLO1hKXGWiKMWhg.png",[48,21603,21604],{},"With this multiplexed write-ahead log approach, we minimize write latency and reduce the number of write requests to object storage. It also maximizes the size of compacted data, enabling more efficient reads from object storage when compacting data into columnar formats.",[48,21606,21607],{},"Once data is persisted in the write-ahead log, the produce request is completed and returned to the client. The data remains buffered in the broker for tailing reads, ensuring no delay in dispatching messages.",[48,21609,21610],{},"If data retention is short, WAL data can expire quickly. However, for workloads requiring longer retention, Ursa runs a background compaction service that processes WAL files. Instead of compacting WAL files into another row-based format, we leverage schema information to infer record structures and convert data into columnar formats, saving them as Parquet files. These Parquet files are then organized into open lakehouse formats like Apache Iceberg and Delta Lake, allowing long-term data to be directly stored in columnar format and accessed via open lakehouse standards. This enables seamless integration with the lakehouse ecosystem.",[48,21612,21613],{},"Despite data being compacted into Parquet files, you can still use the Kafka protocol to replay or catch up on historical data. 
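The WAL-to-columnar compaction step described above can be pictured with a small, self-contained sketch: a batch of row-oriented records, as they might sit in a WAL object, is pivoted into a single compressed Parquet file using pyarrow. This is a conceptual illustration under assumed record fields, not Ursa's actual compaction service, which infers the schema from the topic's registered schema.

```python
# Sketch: turn a batch of row-oriented WAL-style records into a columnar Parquet file.
# Record fields are assumed for illustration only.
import pyarrow as pa
import pyarrow.parquet as pq

# A few row-format records, as they might appear in a write-ahead-log object.
wal_records = [
    {"offset": 0, "key": "user-1", "event_type": "click", "ts_ms": 1735689600000},
    {"offset": 1, "key": "user-2", "event_type": "view",  "ts_ms": 1735689600123},
    {"offset": 2, "key": "user-1", "event_type": "click", "ts_ms": 1735689600456},
]

# Pivot rows into columns; the columnar layout is what enables the compression
# gains discussed above.
table = pa.Table.from_pylist(wal_records)

# Write a compressed Parquet file, the unit that would then be committed into an
# Iceberg or Delta Lake table.
pq.write_table(table, "segment-000001.parquet", compression="zstd")
print(pq.read_metadata("segment-000001.parquet"))
```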
Leveraging schema information and columnar storage reduces costs through efficient compression while optimizing queries for downstream analytics and Kafka-based replays.",[48,21615,21616],{},"This approach represents Ursa’s key innovation in bridging data streaming and the lakehouse ecosystem with a \"Stream-Table\" dual format—storing a single copy of data while enabling access via either a table format or a stream format. We will publish a detailed technical blog post outlining the specifications of this stream format soon.",[32,21618,21620],{"id":21619},"the-benefits-of-lakehouse-native-storage","‍The Benefits of Lakehouse-Native Storage",[1666,21622,21623,21626,21629],{},[324,21624,21625],{},"Zero inter-zone replication costs: Since data is written directly to a write-ahead log powered by cloud object storage, there is no need to replicate data across availability zones.",[324,21627,21628],{},"Massive storage cost reduction: Storing long-term data in Parquet rather than raw log segments reduces storage footprint and improves compression efficiency.",[324,21630,21631],{},"Seamless analytics integration: Data can be queried directly using Spark, Trino, Databricks, or Snowflake without the need for expensive ETL pipelines.",[48,21633,21634],{},"By eliminating local broker storage and persisting data directly to cloud storage using open lakehouse formats, Ursa reduces both network and storage costs while enabling real-time and batch analytics on the same dataset.",[48,21636,21637,21638,190],{},"If you're interested in learning more about the concept of \"Stream-Table Duality,\" check out our previous blog post:",[55,21639,21640],{"href":10453}," Stream-Table Duality and the Vision of Enabling Data Sharing",[40,21642,21644],{"id":21643},"a-note-on-iceberg-integration","‍A Note on Iceberg Integration",[48,21646,21647],{},"‍Integrating data streaming engines with Iceberg—and open lakehouse formats in general—is an increasingly popular trend. Solutions like Confluent’s Tableflow, Redpanda’s Iceberg topics, and other data streaming vendors have introduced similar concepts for incorporating Iceberg into their systems. However, not all Iceberg integrations are created equal. Ursa takes a more comprehensive approach compared to the solutions mentioned above.",[48,21649,21650],{},"In Ursa, Iceberg—and lakehouse storage in general—is implemented in two modes:",[1666,21652,21653,21656],{},[324,21654,21655],{},"Stream Backed by Table (aka Ursa Managed Table)",[324,21657,21658],{},"Stream Delivered to Table (aka Ursa External Table)",[48,21660,21661],{},[384,21662],{"alt":18,"src":21663},"\u002Fimgs\u002Fblogs\u002F67a05f0601a2d1392a2d66f6_AD_4nXettMnraO7ApHrmjn8Bqu-b1vd2t-z5w19QhHrR4Uacd9qJQl_uHhMGV1NggrXCmJzL2uGZYqYKMMQPGT_r3wil-uelZ2tzLBaVB8ujJ7EzaFd6Sgs429-IR6LTzBEh3BELg8YbVA.png",[32,21665,21667],{"id":21666},"stream-backed-by-table-ursa-managed-table","Stream Backed by Table – Ursa Managed Table",[48,21669,21670],{},"‍Ursa’s default lakehouse-native storage follows the \"stream backed by table\" concept. This approach, as described earlier, compacts all streaming data into columnar Parquet files, organizing them into Iceberg or Delta Lake table formats. As a result, only one copy of the streaming data is stored, and all streaming-related metadata—such as offsets and ordering—is preserved.",[48,21672,21673],{},"This means you can replay the entire stream by reading the Parquet files from the backed table. 
This is how we achieve “stream-table duality” while maintaining a single copy of data governed by a catalog service.",[48,21675,21676],{},"We call this “Ursa Managed Table” because Ursa manages the entire lifecycle of the data based on retention requirements and registers the table in a data catalog for easy discovery.",[48,21678,21679],{},"✅ Best for: Storing bronze tables in the lakehouse, which retain all historical data for replay and auditing purposes.",[32,21681,21683],{"id":21682},"stream-delivered-to-table-ursa-external-table","‍Stream Delivered to Table – Ursa External Table",[48,21685,21686],{},"‍By contrast, Stream Delivered to Table is the Iceberg integration that most data streaming engines have implemented. The idea is to move connector-based data streaming and lakehouse integration into the streaming engine as a native feature.In this model, data streaming engines only deliver data to an external lakehouse table—they do not manage its lifecycle. This typically means:",[1666,21688,21689,21692,21695],{},[324,21690,21691],{},"Two copies of data are stored: Log segment data (row-based format) for streaming read\u002Fwrite.",[324,21693,21694],{},"Lakehouse-formatted data for analytics.",[324,21696,21697],{},"Streaming reads via the Kafka protocol are not possible from the lakehouse table.",[48,21699,21700],{},"Since the lifecycle of Stream and Table is decoupled, this mode is better suited for storing compacted data using upsert operations. Streaming engines can either append or upsert changes into the external table, providing more flexibility in organizing data via different partitioning strategies.",[48,21702,21703],{},"We call this “External Table” because Ursa does not manage the table’s lifecycle—instead, it is typically managed by a data catalog service provider, which may also offer table maintenance services to optimize tables.",[48,21705,21706],{},"✅ Best for: Storing compacted, curated, and transformed data—such as silver and gold tables in the lakehouse.",[48,21708,21709],{},"Ursa takes a more holistic approach in defining “Stream-Table Duality”, seamlessly integrating data streaming with lakehouses to deliver a well-integrated, end-to-end data solution.We hope this note clarifies the different Iceberg\u002Flakehouse integration approaches and helps distinguish the unique advantages of Ursa's design.",[48,21711,21712],{},[384,21713],{"alt":18,"src":21714},"\u002Fimgs\u002Fblogs\u002F67a05f0628ce3d7d9310564a_AD_4nXePHBf7iJqU8SxPtQ4QKlsPSyD4oLF9sS028ym7iEN3_oCY7hdKPHeVDaOZxce_7dh8Iv9FIvlgCoi9ldMfaElLvesIzV8ta7k7b3OrYX6_WIjk0ZJsyQM8_g5M-Zh-8tXsSYsOkQ.png",[40,21716,21718],{"id":21717},"cost-breakdown-kafka-redpanda-vs-ursa","Cost Breakdown: Kafka \u002F Redpanda vs. Ursa",[48,21720,21721],{},"‍Now that we’ve explored Ursa’s innovations, let’s compare the costs of running data streaming workloads in a cloud environment.",[48,21723,21724],{},[384,21725],{"alt":18,"src":21726},"\u002Fimgs\u002Fblogs\u002F67a05f06758f36f22f1c6f87_AD_4nXeRnQytQb5FfLIF5GWh13XcOUyaEdHNMp4AGW4kzoqM1s_SfXcmKM8OZ2owN4saEyxpLIinKZS5e1HtJWBAWGaRW6GkstW5JYYBT6cKyT_rTKXR2U0hCm0xxACd6CDt2RT_P7aR7A.png",[48,21728,21729],{},"With Ursa, the combination of a leaderless architecture and lakehouse-native storage results in an order-of-magnitude cost reduction (up to 10x). 
This makes high-throughput data streaming and data ingestion into a lakehouse economically viable at scale.",[40,21731,21732],{"id":2122},"‍Conclusion",[48,21734,21735],{},"‍We are at an exciting moment in the convergence of streaming, lakehouse, and AI. Innovations in data and AI infrastructure are reshaping the landscape, and Ursa represents a fundamental shift in how real-time data streaming is architected for the AI and lakehouse era.",[48,21737,21738],{},"By moving away from leader-based architectures and embracing a leaderless architecture with lakehouse-native storage approach, Ursa has:",[48,21740,21741],{},"✅ Eliminated inter-zone network costs (both client and data replication traffic), one of the largest expenses in leader-based deployments like Kafka and Redpanda.\n✅ Reduced storage costs by leveraging cloud-native object storage and efficient columnar formats.\n✅ Enabled real-time + batch analytics without the need for expensive ETL transformations.",[48,21743,21744],{},"‍The result?",[48,21746,3931,21747,21750,21751,190],{},[55,21748,21749],{"href":10357},"A 5 GB\u002Fs Kafka-compatible workload running for just $50 per hour","—a fraction of the cost of traditional leader-based architectures.Ursa isn’t just an incremental improvement—it’s a revolutionary rethinking of data streaming for lakehouses in the AI era. If you're looking to cut costs while scaling your data streaming workloads or ingesting data into lakehouses, ",[55,21752,21754],{"href":17075,"rel":21753},[264],"it’s time to give Ursa a spin",[48,21756,21757],{},"‍Want to learn more?",[48,21759,21760,21761,21764],{},"‍📩 ",[55,21762,21763],{"href":6392},"Get in touch"," with us to see Ursa in action!\n🔔 Stay tuned—more Ursa updates are coming next week!",{"title":18,"searchDepth":19,"depth":19,"links":21766},[21767,21771,21774,21779,21783,21784],{"id":21421,"depth":19,"text":21422,"children":21768},[21769,21770],{"id":21451,"depth":279,"text":21452},{"id":21471,"depth":279,"text":21472},{"id":21497,"depth":19,"text":21498,"children":21772},[21773],{"id":2696,"depth":279,"text":2697},{"id":21567,"depth":19,"text":21568,"children":21775},[21776,21777,21778],{"id":21571,"depth":279,"text":21572},{"id":21581,"depth":279,"text":21582},{"id":21619,"depth":279,"text":21620},{"id":21643,"depth":19,"text":21644,"children":21780},[21781,21782],{"id":21666,"depth":279,"text":21667},{"id":21682,"depth":279,"text":21683},{"id":21717,"depth":19,"text":21718},{"id":2122,"depth":19,"text":21732},"Discover how Ursa’s leaderless architecture and lakehouse-native storage eliminate inter-zone network costs and slash Kafka infrastructure expenses by 95%. 
Learn how it works.","\u002Fimgs\u002Fblogs\u002F68da799184c4fa1fa1228d34_image-31.png",{},"30 min",{"title":20278,"description":21785},"blog\u002Fleaderless-architecture-and-lakehouse-native-storage-for-reducing-kafka-cost",[800,799,10054,1332],"T-i3XvTiy1a2UpeR7A5RxZeT8PpK4ACJt0OOt2TycB0",{"id":21794,"title":21795,"authors":21796,"body":21797,"category":3550,"createdAt":290,"date":21379,"description":22022,"extension":8,"featured":294,"image":22023,"isDraft":294,"link":290,"meta":22024,"navigation":7,"order":296,"path":4811,"readingTime":17934,"relatedResources":290,"seo":22025,"stem":22026,"tags":22027,"__hash__":22028},"blogs\u002Fblog\u002Fseamless-streaming-to-lakehouse-unveiling-streamnative-clouds-integration-with-databricks-unity-catalog.md","Seamless Streaming to Lakehouse: Unveiling StreamNative Cloud's Integration with Databricks Unity Catalog",[311,2206],{"type":15,"value":21798,"toc":22011},[21799,21802,21805,21808,21811,21816,21820,21823,21834,21837,21842,21846,21849,21854,21857,21874,21877,21881,21887,21891,21898,21900,21902,21906,21908,21913,21916,21921,21923,21928,21934,21938,21941,21944,21949,21952,21957,21961,21964,21969,21971,21974,21976,22006],[48,21800,21801],{},"We recently announced the Public Preview release of Catalog integration within StreamNative Cloud, marking a significant milestone in enabling seamless data management and analytics in the streaming ecosystem. This integration allows organizations to effortlessly connect their streaming data pipelines to popular Lakehouse catalogs, such as Unity Catalog, Iceberg REST Catalog, and others, unlocking advanced capabilities for real-time data management and analytics. This blog will primarily highlight the integration of Databricks Unity Catalog with StreamNative Cloud. Among the various catalog solutions available, Unity Catalog is the first and only unified catalog for data and AI that is production grade and has garnered significant market interest and adoption. As a result, developing a native integration with Unity Catalog was a strategic priority for StreamNative.",[48,21803,21804],{},"In today’s data-driven world, Lakehouse catalogs play a pivotal role in bridging the gap between streaming and warehousing, offering a unified approach to managing structured and unstructured data at scale. They empower enterprises to streamline data governance, enable schema evolution, and optimize query performance, making them indispensable in modern data architectures.",[48,21806,21807],{},"Although the catalog integration supports multiple options, Lakehouse Storage with Databricks Unity Catalog offers a modern approach to data management, blending the strengths of data lakes and data warehouses. By enabling unified governance and seamless data access, Databricks Unity Catalog provides a centralized solution for managing metadata, permissions, and lineage across all data assets within the Lakehouse architecture. This ensures consistency, scalability, and efficient query performance while empowering organizations to handle diverse workloads, from real-time data streaming to batch analytics. 
Unity Catalog is a critical component in simplifying data governance and unlocking the full potential of the Lakehouse paradigm for modern data-driven enterprises.",[48,21809,21810],{},"StreamNative Cloud provides seamless, out-of-the-box integration with Databricks Unity Catalog, enabling users to stream data directly into the Databricks Data Intelligence Platform within seconds.",[48,21812,21813],{},[384,21814],{"alt":18,"src":21815},"\u002Fimgs\u002Fblogs\u002F679ff1bedb1725ac5428a485_AD_4nXcB9lRVMAIKh5XuD7u9LR5UlODjNnlEJA6Jz1Bj-qsnFRcYvnnGpFVK9bsQcmLGfvfkCJ3ZZf1LBCqgzSZaR73jlZC-J_kbVghdQGEXPwQ56xSbPmfu7x18bGoNdQTR8SlmSBZv.png",[40,21817,21819],{"id":21818},"integration-built-on-open-standards","Integration Built On Open Standards",[48,21821,21822],{},"StreamNative's integration with Databricks Unity Catalog is built on open standards,",[321,21824,21825,21828,21831],{},[324,21826,21827],{},"Kafka protocol – Enables scalable, real-time data streaming.",[324,21829,21830],{},"Delta Lake – Provides reliable, ACID-compliant storage and transaction management. Delta Lake is optimized for high-throughput ingestion, appending new data efficiently without frequent updates or deletes, making it ideal for streaming and batch ingestion from Apache Kafka and Pulsar.",[324,21832,21833],{},"Unity Catalog – Ensures unified data governance and access control for data and AI assets across the Lakehouse.",[48,21835,21836],{},"The importance of a simplified experience to ingest data directly into a Lakehouse, ready to be consumed by AI applications, cannot be overstated—it streamlines access, enhances governance, and accelerates real-time analytics and AI outcomes. Here’s a quote from Reynold Xin, Co-Founder and Chief Architect at Databricks who underscores this vision.‍",[916,21838,21839],{},[48,21840,21841],{},"We're extremely excited about StreamNative's integration with Unity Catalog. Analytics on real-time data is one of the core use cases of Databricks, and the rapid growth in AI has increased the need for real-time data exponentially. With this release, our customers gain an efficient and simple way to get access to the data that their mission-critical analytics and AI applications require and have it immediately benefit from the unified governance that Unity Catalog provides.\" - ‍Reynold Xin, Co-Founder and Chief Architect at Databricks",[40,21843,21845],{"id":21844},"streamnative-cloud-the-ideal-ingestion-layer-for-databricks-unity-catalog","‍StreamNative Cloud: The Ideal Ingestion Layer for Databricks Unity Catalog",[48,21847,21848],{},"StreamNative Cloud serves as a powerful ingestion layer for Databricks Unity Catalog, enabling seamless, real-time data streaming directly into Databricks Data Intelligence Platform. StreamNative Cloud allows enterprises to ingest, process, and manage high-velocity data streams across diverse sources while maintaining schema consistency and lineage through Unity Catalog. 
This streamlined integration not only simplifies data management but also accelerates data accessibility for downstream analytics and AI workloads, empowering organizations to unlock actionable insights from fresh, AI-ready data at scale.",[48,21850,21851],{},[384,21852],{"alt":18,"src":21853},"\u002Fimgs\u002Fblogs\u002F679ff1be6fc515e0da1e02eb_AD_4nXdXx1hSt6ycJ591TY98UM7ZyxrilVsPKNQMpmYpK-Zou5OpdYNAtJYUCRZElWzVu8rKVuX-FfKmnltU5NZC5rhDOtsjxu3Mbrr_Cu5fXzI_twC5SlwQMdHpY6xsWEDlOQEl7fjfuw.png",[48,21855,21856],{},"The integration of StreamNative Cloud with Databricks Unity Catalog leverages Unity Catalog APIs , Delta Lake SDK, and Databricks SDK to enable seamless connectivity between real-time data streaming pipelines and the Lakehouse ecosystem.",[1666,21858,21859,21862,21865,21868,21871],{},[324,21860,21861],{},"Ingest and Store Topic Data in Parquet Format – Incoming data from various sources is written to a cost-efficient storage service such as AWS S3, Google Cloud Storage, or Azure Blob Storage in Parquet format.",[324,21863,21864],{},"Create Delta Tables – StreamNative Cloud utilizes the Delta SDK to create a Delta Table, which includes a dedicated _delta_logs folder where all transactions are logged in JSON format.",[324,21866,21867],{},"Register Unity Table – The Unity Catalog API is invoked to create a Unity Table, defining the schema and specifying the storage location of the corresponding Delta Table.",[324,21869,21870],{},"Commit Parquet Files to Delta Table – The Parquet files generated in Step 1 are added to the Delta Table as transactions, which are recorded in JSON log files within the Delta Table structure.",[324,21872,21873],{},"Query and Analyze Ingested Data in Databricks – Users can access and analyze the ingested data within the Databricks workspace by querying the registered catalog.",[48,21875,21876],{},"This native integration enables users to effortlessly configure a cluster for streaming data directly into Databricks with just a few clicks, allowing them to quickly gain insights from their data.StreamNative's integration with Unity Catalog provides end-to-end lineage, enabling full visibility into data as it moves from ingestion to processing, storage, and consumption. This ensures better governance, compliance, and debugging, allowing organizations to track data flows seamlessly across their streaming and analytics pipelines.",[40,21878,21880],{"id":21879},"streaming-data-to-lakehouse-storage-a-walkthrough-of-direct-ingestion-and-cost-effective-offloading","Streaming Data to Lakehouse Storage: A Walkthrough of Direct Ingestion and Cost-Effective Offloading",[48,21882,21883,21884,17865],{},"StreamNative Cloud provides a seamless, out-of-the-box solution for streaming data directly into Lakehouse Storage while enabling its publication in Databricks Unity Catalog for efficient discovery and processing. 
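As a rough, standalone illustration of steps 1–4 above (land Parquet data at an object-storage location and commit it as a transaction to a Delta table that Unity Catalog points at), the sketch below appends an Arrow batch to a Delta table with the deltalake (delta-rs) Python bindings. The bucket path and column names are placeholders, and in the actual integration StreamNative Cloud performs the equivalent commits and Unity Catalog registration automatically.

```python
# Sketch: append streaming records to a Delta table at an object-storage location.
# Path and columns are placeholders; StreamNative Cloud performs the equivalent
# commits automatically and registers the resulting table in Unity Catalog.
import pyarrow as pa
from deltalake import write_deltalake

batch = pa.table(
    {
        "event_id": ["e-1", "e-2", "e-3"],
        "payload": ['{"a":1}', '{"a":2}', '{"a":3}'],
        "ingest_ts": [1735689600000, 1735689600123, 1735689600456],
    }
)

# Each append becomes a transaction recorded under the table's _delta_log folder,
# which is what the registered Unity Catalog table points at. Cloud credentials
# would typically be supplied via environment variables or the storage_options argument.
write_deltalake(
    "s3://my-lakehouse-bucket/streaming/events",  # placeholder storage location
    batch,
    mode="append",
)
```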
Watch the workshop ",[55,21885,17864],{"href":17862,"rel":21886},[264],[32,21888,21890],{"id":21889},"setup-databricks-environment","Setup Databricks environment",[48,21892,21893,21894,4031],{},"Before initiating the integration of Databricks with StreamNative Cloud, ",[55,21895,21897],{"href":17108,"rel":21896},[264],"please ensure the following prerequisites are fulfilled",[32,21899,19892],{"id":19891},[48,21901,19895],{},[48,21903,21904],{},[384,21905],{"alt":18,"src":19900},[48,21907,19903],{},[48,21909,21910],{},[384,21911],{"alt":18,"src":21912},"\u002Fimgs\u002Fblogs\u002F679ff1bedade60903645a993_AD_4nXdd0o0vWB5Qv6i5JXwZEGfIrE0ijFyteuJRZU2fGbN5TTtqY4yo2dL_N_NfbbnWRNb4I8D53XE0y71uoRUmgzoCtetXXkqHGj1t-LYCpg9GB96Q4KZc7QfPLWEwDCAER3bOxth7FA.png",[48,21914,21915],{},"There are two options for selecting a storage location: you can either specify your own storage bucket or utilize a pre-created bucket provided by the BYOC environment. In this example, we will use the pre-created bucket.To configure Databricks Unity Catalog, select Unity Catalog as the catalog provider and complete the remaining catalog configuration details. Click Deploy to finish catalog configuration.",[48,21917,21918],{},[384,21919],{"alt":18,"src":21920},"\u002Fimgs\u002Fblogs\u002F679ff1bef966329346b732a7_AD_4nXfS8fn6c3yUcPwsRh_JMXy04pprfaoJn5cJFqg9uypBjrQ8-MfTY9yNL7POBp2eLzF9uwejfvd3F8rDAOlD5-MDFAKuwI2HfIkVXTQuV8q9d2N390N6DWuhagqAEMQ2u1PKmk-I.png",[48,21922,19921],{},[48,21924,21925],{},[384,21926],{"alt":18,"src":21927},"\u002Fimgs\u002Fblogs\u002F679ff1be2f31ca906489b9e0_AD_4nXc29Wnrs0UwjS4q8zZllgbR7bV5QVGfkmLjwFtk4ODeGYuBpwM3djbm0t8H0OTh-lhTov_gv4iQvNohfU2XRZuEwx8Hq0qYf4eMEjWZb52Bs8vVw3EXkXZZ0NTRqoBp_7rVrApcCw.png",[48,21929,19924,21930,21933],{},[55,21931,19929],{"href":19927,"rel":21932},[264],", where it is stored as Delta tables and published to Databricks Unity Catalog for discovery and analysis.",[32,21935,21937],{"id":21936},"authentication-types-supported-with-databricks-unity-catalog","Authentication Types Supported With Databricks Unity Catalog",[48,21939,21940],{},"StreamNative supports two types of authentication with Databricks Unity Catalog.",[48,21942,21943],{},"‍Personal Access Token (PAT) : A Databricks Personal Access Token is a secure authentication method that allows users to access Databricks REST APIs and command-line interfaces without sharing their login credentials.",[48,21945,21946],{},[384,21947],{"alt":18,"src":21948},"\u002Fimgs\u002Fblogs\u002F679ff1be023621e9b73c5d06_AD_4nXc2SjNj5uWMJjTHKp0VGHN2sbGZZyW1JLKBTvsjQNYypawLxl8F5Dy2HQk37lXenXHck4QcR05um1Jf-OGZtH5ZuMd_GcPxA7cqVGCAhz4nkE4bmScL4t3nZ1mt69R6XzVbLBZ0pA.png",[48,21950,21951],{},"‍OAuth 2 (M2M) : The OAuth2 Machine-to-Machine authentication flow enables secure, automated communication between servers or applications by using client credentials to obtain access tokens for API authorization without user involvement.",[48,21953,21954],{},[384,21955],{"alt":18,"src":21956},"\u002Fimgs\u002Fblogs\u002F679ff1be248ec7370032c045_AD_4nXfW78mx22qBMedH6MPNaoVNQ3R8PlYT6knKDSNk6yU_cACH50E0xAij9d8y_9HVS4mUl_9F9wWJBH5XSvia0yfoNBbrgKZraz5MXlPPBQ9TrSq-Dcc3zUYVZ_9g2jUHRJghfYFX.png",[32,21958,21960],{"id":21959},"review-ingested-data-in-databricks-unity-catalog","Review Ingested Data In Databricks Unity Catalog",[48,21962,21963],{},"Once ingested, the data becomes discoverable within the catalog and can be efficiently queried for analysis. 
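Once the data has landed and the table is registered, querying it from a Databricks notebook is a one-liner. The catalog, schema, and table names below are placeholders for whatever was configured in the steps above, and `spark` is the session the Databricks runtime provides in notebooks and jobs.

```python
# Sketch: query the ingested topic data from a Databricks notebook or job.
# Catalog/schema/table names are placeholders; `spark` is provided by the runtime.
df = spark.sql(
    """
    SELECT *
    FROM streamnative_catalog.ingest.events
    ORDER BY ingest_ts DESC
    LIMIT 100
    """
)
df.show(truncate=False)
```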
The Delta tables generated by StreamNative Cloud are seamlessly integrated and visible within the designated schema in the catalog.",[48,21965,21966],{},[384,21967],{"alt":18,"src":21968},"\u002Fimgs\u002Fblogs\u002F679ff1beb209a3b771b40ed5_AD_4nXeJJQHwsHX6EkihsFLjHlU9PAsv81jDvz2eUbYYdyjPB0b8-IU5MOUhpviynDL3Fg4hobHcr2o8A4qSlAuE2erpXNdbNjtDnLPRHvkmpy3DJHnQoJNEUUXuCvreLvr5aObHSGd_og.png",[40,21970,2125],{"id":2122},[48,21972,21973],{},"The Public Preview release of Catalog integration within StreamNative Cloud represents a transformative step in connecting real-time data streaming pipelines to Lakehouse Storage, particularly through Databricks Unity Catalog. Built on open standards like Apache Kafka, Delta Lake, and Unity Catalog, this integration ensures interoperability while providing robust data governance, seamless schema evolution, and efficient metadata management. Organizations can ingest data directly into Delta tables, enable effortless discovery in Unity Catalog, and streamline analytics workflows, making it easier to extract actionable insights from streaming data. With end-to-end data lineage, organizations gain full visibility into data movement, transformations, and consumption, ensuring better governance, compliance, and debugging. Explore how this open, standards-based integration can revolutionize your data-driven strategy.",[48,21975,17854],{},[321,21977,21978,21983,21988,21993,21998],{},[324,21979,21980,21981,17894],{},"Checkout StreamNative's recent benchmark about Ursa Engine: See how ",[55,21982,17893],{"href":10357},[324,21984,17897,21985,190],{},[55,21986,17902],{"href":17900,"rel":21987},[264],[324,21989,17859,21990,17865],{},[55,21991,17864],{"href":17862,"rel":21992},[264],[324,21994,17868,21995],{},[55,21996,17872],{"href":17108,"rel":21997},[264],[324,21999,22000,22001,22005],{},"Join our joint webinar: Don’t miss the ",[55,22002,22004],{"href":21362,"rel":22003},[264],"Databricks and StreamNative webinar on February 20th",", where we’ll explore cutting-edge integrations.",[48,22007,17914,22008,17918],{},[55,22009,7137],{"href":3907,"rel":22010},[264],{"title":18,"searchDepth":19,"depth":19,"links":22012},[22013,22014,22015,22021],{"id":21818,"depth":19,"text":21819},{"id":21844,"depth":19,"text":21845},{"id":21879,"depth":19,"text":21880,"children":22016},[22017,22018,22019,22020],{"id":21889,"depth":279,"text":21890},{"id":19891,"depth":279,"text":19892},{"id":21936,"depth":279,"text":21937},{"id":21959,"depth":279,"text":21960},{"id":2122,"depth":19,"text":2125},"Discover how StreamNative Cloud’s native integration with Databricks Unity Catalog enables seamless, real-time data ingestion into the Lakehouse. Built on open standards like Kafka and Delta Lake, this integration simplifies data governance, enhances analytics, and accelerates AI workloads. 
Learn more!","\u002Fimgs\u002Fblogs\u002F67a0e2b2e93a5d7a75c08828_image-37.png",{},{"title":21795,"description":22022},"blog\u002Fseamless-streaming-to-lakehouse-unveiling-streamnative-clouds-integration-with-databricks-unity-catalog",[800,1332,2599],"DLTGazOteSCvaROFGARfeE_YFpKmRmg2psq597TJSi8",{"id":22030,"title":22031,"authors":22032,"body":22033,"category":3550,"createdAt":290,"date":22777,"description":22778,"extension":8,"featured":7,"image":22779,"isDraft":294,"link":290,"meta":22780,"navigation":7,"order":296,"path":10357,"readingTime":21788,"relatedResources":290,"seo":22781,"stem":22782,"tags":22783,"__hash__":22784},"blogs\u002Fblog\u002Fhow-we-run-a-5-gb-s-kafka-workload-for-just-50-per-hour.md","How We Run a 5 GB\u002Fs Kafka Workload for Just $50 per Hour",[6785,810,809,808],{"type":15,"value":22034,"toc":22747},[22035,22038,22041,22044,22047,22050,22054,22057,22065,22070,22073,22081,22086,22090,22095,22098,22101,22109,22113,22116,22121,22125,22128,22131,22134,22137,22145,22149,22152,22163,22166,22170,22173,22176,22187,22190,22194,22198,22206,22209,22213,22221,22250,22254,22257,22262,22266,22269,22273,22276,22279,22284,22293,22298,22301,22304,22315,22319,22322,22333,22337,22340,22343,22348,22351,22380,22384,22386,22392,22395,22400,22405,22408,22412,22426,22430,22441,22445,22460,22469,22480,22483,22486,22490,22493,22496,22507,22510,22513,22516,22521,22526,22530,22533,22550,22554,22568,22573,22577,22588,22591,22607,22611,22622,22627,22632,22640,22642,22645,22649,22655,22659,22662,22670,22674,22680,22686,22695,22704,22713,22722,22731,22739],[48,22036,22037],{},"The rise of DeepSeek has shaken the AI infrastructure market, forcing companies to confront the escalating costs of training and deploying AI models. But the real pressure point isn’t just compute—it’s data acquisition and ingestion costs.",[48,22039,22040],{},"As businesses rethink their AI cost-containment strategies, real-time data streaming is emerging as a critical enabler. The growing adoption of Kafka as a standard protocol has expanded cost-efficient options, allowing companies to optimize streaming analytics while keeping expenses in check.",[48,22042,22043],{},"Ursa, the data streaming engine powering StreamNative’s managed Kafka service, is built for this new reality. With its leaderless architecture and native lakehouse storage integration, Ursa eliminates costly inter-zone network traffic for data replication and client-to-broker communication while ensuring high availability at minimal operational cost.",[48,22045,22046],{},"In this blog post, we benchmarked the infrastructure cost and total cost of ownership (TCO) for running a 5GB\u002Fs Kafka workload across different Kafka vendors, including Redpanda, Confluent WarpStream, and AWS MSK. Our benchmark results show that Ursa can sustain 5GB\u002Fs Kafka workloads at just 5% of the cost of traditional streaming engines like Redpanda—making it the ideal solution for high-performance, cost-efficient ingestion and data streaming for data lakehouses and AI workloads.",[48,22048,22049],{},"Note: We also evaluated vanilla Kafka in our benchmark; however, for simplicity, we have focused our cost comparison on vendor solutions rather than self-managed deployments. That said, it is important to highlight that both Redpanda and vanilla Kafka use a leader-based data replication approach. 
In a data-intensive, network-bound workload like 5GB\u002Fs streaming, with the same machine type and replication factor, Redpanda and vanilla Kafka produced nearly identical cost profiles.",[40,22051,22053],{"id":22052},"key-benchmark-findings","Key Benchmark Findings",[48,22055,22056],{},"Ursa delivered 5 GB\u002Fs of sustained throughput at an infrastructure cost of just $54 per hour. For comparison:",[321,22058,22059,22062],{},[324,22060,22061],{},"MSK: $303 per hour → 5.6x more expensive compared to Ursa",[324,22063,22064],{},"Redpanda: $988 per hour → 18x more expensive compared to Ursa",[48,22066,22067],{},[384,22068],{"alt":18,"src":22069},"\u002Fimgs\u002Fblogs\u002F679c71b67d9046f26edc7977_AD_4nXfvTqyBNUBu2lObdkKAx-5UNkpNP8UYULLZyOcixE6z99VMZUUEsUqWjzexI7vjyNGRNSAUoM9smYvdTP55ctAhIbrs5lmQgcSVMWdaoigbWouCl95DVSQsxooY-qqfGcYqS4g4zA.png",[48,22071,22072],{},"Beyond infrastructure costs, when factoring in both storage pricing, vendor pricing and operational expenses, Ursa’s total cost of ownership (TCO) for a 5GB\u002Fs workload with a 7-day retention period is:",[321,22074,22075,22078],{},[324,22076,22077],{},"50% cheaper than Confluent WarpStream",[324,22079,22080],{},"85% cheaper than MSK and Redpanda",[48,22082,22083],{},[384,22084],{"alt":18,"src":22085},"\u002Fimgs\u002Fblogs\u002F679c602d77e9c706de5343b8_AD_4nXeDv8rrv_C1CTCCiqYo1zpvlGYbdBk1r0VEqovAPu22iFMQZgh54Hfw9PBMLzM7jDFxKwAFDxbdG0np4XVk_tGsWhEKMloLRcmmea7lvueCx-0cFsyaE3Mya4Mxc1Dox95A6JEc.png",[40,22087,22089],{"id":22088},"ursa-highly-cost-efficient-data-streaming-at-scale","Ursa: Highly Cost-Efficient Data Streaming at Scale",[48,22091,22092,22094],{},[55,22093,1332],{"href":10389}," is a next-generation data streaming engine designed to deliver high performance at a fraction of the cost of traditional disk-based solutions. It is fully compatible with Apache Kafka and Apache Pulsar APIs, while leveraging a leaderless, lakehouse-native architecture to maximize scalability, efficiency, and cost savings.",[48,22096,22097],{},"Ursa’s key innovation is separating storage from compute and decoupling metadata\u002Findex operations from data operations by utilizing cloud object storage (e.g., AWS S3) instead of costly inter-zone disk-based replication. It also employs open lakehouse formats (Iceberg and Delta Lake), enabling columnar compression to significantly reduce storage costs while maintaining durability and availability.",[48,22099,22100],{},"In contrast, traditional streaming systems—like Kafka and Redpanda—depend on leader-based architectures, which drive up inter-zone traffic costs due to replication and client communication. Ursa mitigates these costs by:",[321,22102,22103,22106],{},[324,22104,22105],{},"Eliminating inter-zone traffic costs via a leaderless architecture.",[324,22107,22108],{},"Replacing costly inter-zone replication with direct writes to cloud storage using open lakehouse formats.",[40,22110,22112],{"id":22111},"how-ursa-eliminates-inter-zone-traffic","How Ursa Eliminates Inter-Zone Traffic",[48,22114,22115],{},"Ursa minimizes inter-zone traffic by leveraging a leaderless architecture, which eliminates inter-zone communication between clients and brokers, and lakehouse-native storage, which removes the need for inter-zone data replication. 
This approach ensures high availability and scalability while avoiding unnecessary cross-zone data movement.",[48,22117,22118],{},[384,22119],{"alt":18,"src":22120},"\u002Fimgs\u002Fblogs\u002F679c602e21b3571bb7117dca_AD_4nXd7Oahc77NjRLNvA9clLt0tsyU6MrIqVibFYv5pW5giTIcCHPr3EA_yTGzfVEUIVO3VXK56qWK8zmBCp5lY0E_4nmlWIPFrHjtHylA5NhwELjn-UB0fLG2h_kbrxrc7Cs_edvveNA.png",[32,22122,22124],{"id":22123},"leaderless-architecture","Leaderless architecture",[48,22126,22127],{},"Traditional streaming engines such as Kafka, Pulsar, or RedPanda rely on a leader-based model, where each partition is assigned to a single leader broker that handles all writes and reads.",[48,22129,22130],{},"Pros of Leader-Based Architectures:\n✔ Maintains message ordering via local sequence IDs\n✔ Delivers low latency and high performance through message caching",[48,22132,22133],{},"Cons of Leader-Based Architectures:\n✖ Throughput bottlenecked by a single broker per partition\n✖ Inter-zone traffic required for high availability in multi-AZ deployments",[48,22135,22136],{},"While Kafka and Pulsar offer partial solutions (e.g., reading from followers, shadow topics) to reduce read-related inter-zone traffic, producers still send data to a single leader.",[48,22138,22139,22140,22144],{},"Ursa removes the concept of topic ownership, allowing any broker in the cluster to handle reads or writes for any partition. The primary challenge—ensuring message ordering—is solved with ",[55,22141,5599],{"href":22142,"rel":22143},"https:\u002F\u002Fgithub.com\u002Fstreamnative\u002Foxia",[264],", a scalable metadata and index service created by StreamNative in 2022.",[32,22146,22148],{"id":22147},"oxia-the-metadata-layer-enabling-leaderless-architecture","Oxia: The Metadata Layer Enabling Leaderless Architecture",[48,22150,22151],{},"Ensuring message ordering in a leaderless architecture is complex, but Ursa solves this with Oxia:",[321,22153,22154,22157,22160],{},[324,22155,22156],{},"Handles millions of metadata\u002Findex operations per second",[324,22158,22159],{},"Generates sequential IDs to maintain strict message ordering",[324,22161,22162],{},"Optimized for Kubernetes with horizontal scalability",[48,22164,22165],{},"Producers and consumers can connect to any broker within their local AZ, eliminating inter-zone traffic costs while maintaining performance through localized caching.",[32,22167,22169],{"id":22168},"zero-interzone-data-replication","Zero interzone data replication",[48,22171,22172],{},"In most distributed systems, data replication from a leader (primary) to followers (replicas) is crucial for fault tolerance and availability. 
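The ordering guarantee described here — any broker may accept a write while a shared metadata/index service hands out strictly increasing sequence IDs per partition — can be illustrated with a small conceptual sketch. This is a toy model only, not Oxia's actual API; every class and method name below is hypothetical.

```python
import itertools
import threading

class ToySequencer:
    """Toy stand-in for a metadata/index service such as Oxia.

    Hands out strictly increasing sequence IDs per partition so that
    any broker can append records without owning the partition.
    Names and structure are illustrative, not Oxia's real interface.
    """
    def __init__(self):
        self._lock = threading.Lock()
        self._counters = {}  # partition -> itertools.count

    def next_sequence_id(self, partition: str) -> int:
        with self._lock:
            counter = self._counters.setdefault(partition, itertools.count())
            return next(counter)

class ToyBroker:
    """Any broker in any AZ can accept a write for any partition."""
    def __init__(self, name: str, sequencer: ToySequencer):
        self.name = name
        self.sequencer = sequencer

    def append(self, partition: str, payload: bytes) -> int:
        seq = self.sequencer.next_sequence_id(partition)
        # In Ursa the record batch would then be written to object storage;
        # here we only show the globally assigned order.
        print(f"{self.name} appended to {partition} at seq {seq}")
        return seq

sequencer = ToySequencer()
brokers = [ToyBroker(f"broker-az{i}", sequencer) for i in range(3)]
for i, msg in enumerate([b"a", b"b", b"c", b"d"]):
    brokers[i % 3].append("orders-0", msg)  # ordering preserved across brokers
```

The point of the sketch is only that ordering comes from the sequencer, not from a partition leader, which is why producers and consumers can stay inside their own availability zone.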
However, replication across zones can inflate infrastructure expenses substantially.",[48,22174,22175],{},"Ursa avoids these costs by writing data directly to cloud storage (e.g., AWS S3, Google GCS):",[321,22177,22178,22181,22184],{},[324,22179,22180],{},"Built-In Resilience: Cloud storage inherently offers high availability and fault tolerance without inter-zone traffic fees.",[324,22182,22183],{},"Tradeoff: Slightly higher latency (sub-second, with p99 at 500 milliseconds) compared to local disk\u002FEBS (single-digit to sub-100 milliseconds), in exchange for significantly lower costs (up to 10x lower).",[324,22185,22186],{},"Flexible Modes: Ursa is an addition to the classic BookKeeper-based engine, providing users with the flexibility to optimize for either cost or low latency based on their workload requirements.",[48,22188,22189],{},"By foregoing conventional replication, Ursa slashes inter-zone traffic costs and associated complexities—making it a compelling option for organizations seeking to balance high-performance data streaming with strict budget constraints.",[40,22191,22193],{"id":22192},"how-we-ran-a-5-gbs-test-with-ursa","How We Ran a 5 GB\u002Fs Test with Ursa",[32,22195,22197],{"id":22196},"ursa-cluster-deployment","Ursa Cluster Deployment",[321,22199,22200,22203],{},[324,22201,22202],{},"9 brokers across 3 availability zones, each on m6i.8xlarge (Fixed 12.5 Gbps bandwidth, 32 vCPU cores, 128 GB memory).",[324,22204,22205],{},"Oxia cluster (metadata store) with 3 nodes of m6i.8xlarge, distributed across three availability zones (AZs).",[48,22207,22208],{},"During peak throughput (5 GB\u002Fs), each broker’s network usage was about 10 Gbps.",[32,22210,22212],{"id":22211},"openmessaging-benchmark-workers-configuration","OpenMessaging Benchmark Workers & Configuration",[48,22214,22215,22216,22220],{},"The OpenMessaging Benchmark(OMB) Framework is a suite of tools that make it easy to benchmark distributed messaging systems in the cloud. Please check ",[55,22217,22218],{"href":22218,"rel":22219},"https:\u002F\u002Fopenmessaging.cloud\u002Fdocs\u002Fbenchmarks\u002F",[264]," for details.",[321,22222,22223,22238,22247],{},[324,22224,22225,22226,22231,22232,22237],{},"12 OMB workers: 6 for ",[55,22227,22230],{"href":22228,"rel":22229},"https:\u002F\u002Fgist.github.com\u002Fcodelipenghui\u002Fd1094122270775e4f1580947f80c5055",[264],"producers",", 6 for ",[55,22233,22236],{"href":22234,"rel":22235},"https:\u002F\u002Fgist.github.com\u002Fcodelipenghui\u002F06bada89381fb77a7862e1b4c1d8963d",[264],"consumers"," across 3 availability zones, on m6i.8xlarge instances. 
Each worker is configured with 12 CPU cores and 48 GB memory.",[324,22239,22240,22241,22246],{},"Sample YAML ",[55,22242,22245],{"href":22243,"rel":22244},"https:\u002F\u002Fgist.github.com\u002Fcodelipenghui\u002F204c1f26c4d44a218ae235bf2de99904",[264],"scripts"," provided for Kafka-compatible configuration and rate limits.",[324,22248,22249],{},"Achieved consistent 5 GB\u002Fs publish\u002Fsubscribe throughput.",[40,22251,22253],{"id":22252},"ursa-benchmark-tests-results","Ursa Benchmark Tests & Results",[48,22255,22256],{},"The following diagram demonstrates that Ursa can consistently handle 5 GB\u002Fs of traffic, fully saturating the network across all broker nodes.",[48,22258,22259],{},[384,22260],{"alt":18,"src":22261},"\u002Fimgs\u002Fblogs\u002F679c602d7b261bac1113f7d6_AD_4nXdDPsRc3koXICiFF0bqSmGWbJt_RlUy4FE3ruuWOfbCfpcqZ1dejjqGbkaCJv2hQFL1nirRouBVRW2l5uMWBvY9naMqGB_wHcLI14dBM0f85TXhmdm3UxEv1yGX9Y4hf5FttSkZew.png",[40,22263,22265],{"id":22264},"comparing-infrastructure-cost","Comparing Infrastructure Cost",[48,22267,22268],{},"This benchmark first evaluates infrastructure costs of running a 5 GB\u002Fs streaming workload (1:1 producer-to-consumer ratio) across different data streaming engines, including Ursa, Redpanda, and AWS MSK, with a focus on multi-AZ deployments to ensure a fair comparison.",[32,22270,22272],{"id":22271},"test-setup-key-assumptions","Test Setup & Key Assumptions",[48,22274,22275],{},"All tests use multi-AZ configurations, with clusters and clients distributed across three AWS availability zones (AZs). Cluster size scales proportionally to the number of AZs, and rack-awareness is enabled for all engines to evenly distribute topic partitions and leaders.",[48,22277,22278],{},"To ensure a fair comparison, we selected the same machine type capable of fully utilizing both network and storage bandwidth for Ursa and Redpanda in this 5GB\u002Fs test:",[321,22280,22281],{},[324,22282,22283],{},"9 × m6i.8xlarge instances",[48,22285,22286,22287,22292],{},"However, MSK's storage bandwidth limits vary depending on the selected instance type, with the highest allowed limit capped at 1000 MiB\u002Fs per broker, according to",[55,22288,22291],{"href":22289,"rel":22290},"https:\u002F\u002Fdocs.aws.amazon.com\u002Fmsk\u002Flatest\u002Fdeveloperguide\u002Fmsk-provision-throughput-management.html#throughput-bottlenecks",[264]," AWS documentation",". 
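A quick back-of-the-envelope check of the "about 10 Gbps per broker" observation, using the setup described above (9 brokers, 5 GB/s produced and 5 GB/s consumed at a 1:1 ratio). This is just the arithmetic implied by the text; it deliberately ignores S3 upload overhead, metadata traffic, and protocol framing.

```python
# Rough per-broker network estimate for the 5 GB/s test (assumption-laden sketch).
brokers = 9
produce_gb_per_sec = 5.0   # data entering the cluster from producers
consume_gb_per_sec = 5.0   # data leaving the cluster to consumers (1:1 ratio)

total_client_traffic = produce_gb_per_sec + consume_gb_per_sec  # GB/s of client-facing traffic
per_broker_gb_per_sec = total_client_traffic / brokers
per_broker_gbps = per_broker_gb_per_sec * 8                     # GB/s -> Gbit/s

print(f"~{per_broker_gbps:.1f} Gbps of client traffic per broker")
# -> ~8.9 Gbps, consistent with the observed ~10 Gbps once object-storage
#    writes and metadata traffic are added on top.
```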
Given this constraint, achieving 5 GB\u002Fs throughput with a replication factor of 3 required the following setup:",[321,22294,22295],{},[324,22296,22297],{},"15 × kafka.m7g.8xlarge (32 vCPUs, 128 GB memory, 15 Gbps network, 4000 GiB EBS).",[48,22299,22300],{},"This configuration was necessary to work around MSK's storage bandwidth limitations, ensuring a comparable cost basis to other evaluated streaming engines.",[48,22302,22303],{},"Additional key assumptions include:",[321,22305,22306,22309,22312],{},[324,22307,22308],{},"Inter-AZ producer traffic: For leader-based engines, two-thirds of producer-to-broker traffic crosses AZs due to leader distribution.",[324,22310,22311],{},"Consumer optimizations: Follower fetch is enabled across all tests, eliminating inter-AZ consumer traffic.",[324,22313,22314],{},"Storage cost exclusions: This benchmark only evaluates streaming costs, assuming no long-term data retention.",[32,22316,22318],{"id":22317},"inter-broker-replication-costs","Inter-Broker Replication Costs",[48,22320,22321],{},"Inter-broker (cross-AZ) replication is a major cost driver for data streaming engines:",[321,22323,22324,22327,22330],{},[324,22325,22326],{},"RedPanda: Inter-broker replication is not free, leading to substantial costs when data must be copied across multiple availability zones.",[324,22328,22329],{},"AWS MSK: Inter-broker replication is free, but MSK instance pricing is significantly higher (e.g., $3.264 per hour for kafka.m7g.8xlarge vs $1.306 per hour for an on-demand m7g.8xlarge). The storage price of MSK is $0.10 per GB-month which is significantly higher than st1, which costs $0.045 per GB-month. Even though replication is free, client-to-broker traffic still incurs inter-AZ charges.",[324,22331,22332],{},"Ursa: No inter-broker replication costs due to its leaderless architecture, eliminating inter-zone replication costs entirely.",[32,22334,22336],{"id":22335},"zone-affinity-reducing-inter-az-costs","Zone Affinity: Reducing Inter-AZ Costs",[48,22338,22339],{},"We evaluated zone affinity mechanisms to further reduce inter-AZ data transfer costs.",[48,22341,22342],{},"Consumers:",[321,22344,22345],{},[324,22346,22347],{},"Follower fetch is enabled across all tests, ensuring consumers fetch data from replicas in their local AZ—eliminating inter-zone consumer traffic except for metadata lookups",[48,22349,22350],{},"Producers:",[321,22352,22353,22362,22371],{},[324,22354,22355,22356,22361],{},"Kafka protocol lacks an easy way to enforce producer AZ affinity (though ",[55,22357,22360],{"href":22358,"rel":22359},"https:\u002F\u002Fcwiki.apache.org\u002Fconfluence\u002Fdisplay\u002FKAFKA\u002FKIP-1123:+Rack-aware+partitioning+for+Kafka+Producer",[264],"KIP-1123"," aims to address this). And it only works with the default partitioner (i.e., when no record partition or record key is specified).",[324,22363,22364,22365,22370],{},"Redpanda recently introduced ",[55,22366,22369],{"href":22367,"rel":22368},"https:\u002F\u002Fdocs.redpanda.com\u002Fredpanda-cloud\u002Fdevelop\u002Fproduce-data\u002Fleader-pinning\u002F",[264],"leader pinning",", but this only benefits setups where producers are confined to a single AZ—not applicable to our multi-AZ benchmark.",[324,22372,22373,22374,22379],{},"Ursa is the only system in this test with ",[55,22375,22378],{"href":22376,"rel":22377},"https:\u002F\u002Fdocs.streamnative.io\u002Fdocs\u002Fconfig-kafka-client#eliminate-cross-az-networking-traffic",[264],"built-in zone affinity for both producers and consumers",". 
It achieves this by embedding producer AZ information in client.id, allowing metadata lookups to route clients to local-AZ brokers, eliminating inter-AZ producer traffic.",[32,22381,22383],{"id":22382},"cost-comparison-results","Cost Comparison Results",[48,22385,22056],{},[321,22387,22388,22390],{},[324,22389,22061],{},[324,22391,22064],{},[48,22393,22394],{},"Ursa’s leaderless architecture, zone affinity, and native cloud storage integration deliver unparalleled cost efficiency, making it the most cost-effective choice for high-throughput data streaming workloads.",[48,22396,22397],{},[384,22398],{"alt":18,"src":22399},"\u002Fimgs\u002Fblogs\u002F679c72208198ca36a352f228_AD_4nXeeZuM8T-xBlD4Vf3j67K618n08qh8wIDLLtiLJG0ssA1Wj1V26u7wIDTX9sqLrtw8mB2c299dwzarGen62CG0Vh7nWstn5qbPGFcBaKJYEepTsLr5fHWv1U8uqbg8Y0UOK6fJ7.png",[48,22401,22402],{},[384,22403],{"alt":18,"src":22404},"\u002Fimgs\u002Fblogs\u002F679c625978031f40229de484_AD_4nXdLkLLJ30KKr-_A_rN1j8akVwBYacAWIPzWHoOReJF421890kfByZoQQxkLczihVSmiw5Q9J51-V9I2SEKITbwsYnANDDTlAVL5nQ_jfaHNTe9VEWhSoa7DZooCnilDYL6l6msmJg.png",[48,22406,22407],{},"The detailed infrastructure cost calculations for each data streaming engine are listed below:",[32,22409,22411],{"id":22410},"streamnative-ursa","StreamNative - Ursa",[321,22413,22414,22417,22420,22423],{},[324,22415,22416],{},"Server EC2 costs: 9 * $1.536\u002Fhr = $14",[324,22418,22419],{},"Client EC2 costs: 9 * $1.536\u002Fhr =$14",[324,22421,22422],{},"S3 write requests costs: 1350 r\u002Fs * $0.005\u002F1000r * 3600s = $24",[324,22424,22425],{},"S3 read requests costs: 1350 r\u002Fs * $0.0004\u002F1000r * 3600s = $2",[32,22427,22429],{"id":22428},"aws-msk","AWS MSK",[321,22431,22432,22435,22438],{},[324,22433,22434],{},"Server EC2 costs: 15 * $3.264\u002Fhr = $49",[324,22436,22437],{},"Client side EC2 costs: 9 * $1.536\u002Fhr =$14",[324,22439,22440],{},"Interzone traffic - producer to broker: 5GB\u002Fs * ⅔ * $0.02\u002FG(in+out) * 3600 = $240",[32,22442,22444],{"id":22443},"redpanda","RedPanda",[321,22446,22447,22449,22451,22454,22457],{},[324,22448,22416],{},[324,22450,22419],{},[324,22452,22453],{},"Interzone traffic - producer to broker: 5GB\u002Fs * ⅔ * $0.02\u002FGB(in+out) * 3600 = $240",[324,22455,22456],{},"Interzone traffic - replication: 10GB\u002Fs * $0.02\u002FGB(in+out) * 3600 = $720",[324,22458,22459],{},"Interzone traffic - broker to consumer: $0 (fetch from local zone)",[48,22461,22462,22463,22468],{},"Please note that we were unable to test ",[55,22464,22467],{"href":22465,"rel":22466},"https:\u002F\u002Fwww.redpanda.com\u002Fblog\u002Fcloud-topics-streaming-data-object-storage",[264],"Redpanda with Cloud Topics",", as it remains an announced but unreleased feature and is not yet available for evaluation. Based on the limited information available, while Cloud Topics may help optimize inter-zone data replication costs, producers still need to traverse inter-availability zones to connect to the topic partition owners and incur inter-zone traffic costs of up to $240 per hour.",[321,22470,22471,22477],{},[324,22472,22473,22476],{},[55,22474,22360],{"href":22358,"rel":22475},[264]," (when implemented) will help mitigate producer-to-broker inter-zone traffic, but it is not yet available. And it only works with the default partitioner (no record partition or key is specified).",[324,22478,22479],{},"Redpanda’s leader pinning helps only when all producers for the pinned topic are confined to a single AZ. 
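The zone-affinity mechanism described above relies on the producer advertising its availability zone through `client.id`, so that metadata lookups can route it to brokers in the same AZ. A minimal sketch of what such a producer configuration could look like follows; the exact `client.id` convention expected by StreamNative/Ursa is defined in their configuration docs, so the format and all endpoint values here are placeholders, not the documented syntax.

```python
# Illustrative only: the precise client.id convention and endpoints come from
# the StreamNative/Ursa Kafka client docs, not from this sketch.
from confluent_kafka import Producer

availability_zone = "us-east-1a"  # hypothetical; normally discovered from cloud instance metadata

producer = Producer({
    "bootstrap.servers": "broker.example.streamnative.cloud:9093",  # placeholder endpoint
    "client.id": f"orders-service,az={availability_zone}",          # assumed AZ-hint format
    "compression.type": "zstd",  # client-side compression also lowers usage-based billing
})

producer.produce("orders", key=b"order-123", value=b'{"amount": 42}')
producer.flush()
```

Note that client-side compression is doubly relevant here: it reduces network usage and, under Ursa's compressed-throughput pricing discussed below, directly reduces the bill.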
In multi-AZ environments (like our benchmark), inter-zone producer traffic remains unavoidable.",[48,22481,22482],{},"Additionally, Redpanda’s Cloud Topics architecture is not documented publicly. Their blog mentions \"leader placement rules to optimize produce latency and ingress cost,\" but it is unclear whether this represents a shift away from a leader-based architecture or if it uses techniques similar to Ursa’s zone-aware approach.",[48,22484,22485],{},"We may revisit this comparison as more details become available.",[40,22487,22489],{"id":22488},"comparing-total-cost-of-ownership","Comparing Total Cost of Ownership",[48,22491,22492],{},"As highlighted earlier, with a BYOC Ursa setup, you can achieve 5 GB\u002Fs throughput at just 5% of the infrastructure cost of a traditional leader-based data streaming engine, such as Kafka or RedPanda, while managing the infrastructure yourself. This significant cost reduction is enabled by Ursa’s leaderless architecture and lakehouse-native storage design, which eliminate overhead costs such as inter-zone traffic and leader-based data replication. By leveraging a lakehouse-native, leaderless architecture, Ursa reduces resource requirements, enabling you to handle high data throughput efficiently and at a fraction of the cost of RedPanda.",[48,22494,22495],{},"Now, let’s examine the total cost comparison, evaluating Ursa alongside other vendors, including those that have adopted a leaderless architecture (e.g., Confluent WarpStream). This comparison is based on a 5GB\u002Fs workload with a 7-day retention period, factoring in both storage cost and vendor costs Here are the key findings:",[321,22497,22498,22501,22504],{},[324,22499,22500],{},"Ursa ($164,353\u002Fmonth) is: 50% cheaper than Confluent WarpStream ($337,068\u002Fmonth)",[324,22502,22503],{},"85% cheaper than AWS MSK ($1,115,251\u002Fmonth)",[324,22505,22506],{},"86% cheaper than Redpanda ($1,202,853\u002Fmonth)",[48,22508,22509],{},"In addition to Ursa’s architectural advantages—eliminating most inter-AZ traffic and leveraging lakehouse storage for cost-effective data retention—it also adopts a more fair and cost-efficient pricing model: Elastic Throughput-based pricing. This approach aligns costs with actual usage, avoiding unnecessary overhead.",[48,22511,22512],{},"Unlike WarpStream, which charges for both storage and throughput, Ursa ensures that customers only pay for the throughput they actively use. Ursa’s pricing is based on compressed data sent by clients, meaning the more data compressed on the client side, the lower the cost. 
In contrast, WarpStream prices are based on uncompressed data, unfairly inflating expenses and failing to incentivize customers to optimize their client applications.",[48,22514,22515],{},"This distinction is crucial, as compressed data reduces both storage and network costs, making Ursa’s pricing model not only more cost-effective but also more transparent and predictable.",[48,22517,22518],{},[384,22519],{"alt":18,"src":22520},"\u002Fimgs\u002Fblogs\u002F679c602d194800c9206d9d58_AD_4nXcFlf755xgyz7htxhMhBV5fGrsxy642mQNodt61DTok_z1dwkw5A6lkO5hatXVneCaB0anbZPAyvLI3MlIMuQEYLEACHHvQMOr5UfaB37dfzkdqewDEvcT-20VGd_zzvJsuA00zGA.png",[48,22522,22523],{},[384,22524],{"alt":18,"src":22525},"\u002Fimgs\u002Fblogs\u002F679c62594e9c2e629fae73aa_AD_4nXeU6cOgItnjLsEZCOf13TEvMY_SHWWIxYP2OYUj-B1GUPyWO78OG08K_v03hwYSVcg06f9dqDiGmdwy76vynjmiDGL5bluZ5_XF4nSU_r59oOZdfViXndXt6s11vVOY7qwfZN8v.png",[32,22527,22529],{"id":22528},"cost-breakdown","Cost Breakdown",[3933,22531,22532],{"id":22410},"StreamNative – Ursa",[321,22534,22535,22538,22541,22544,22547],{},[324,22536,22537],{},"EC2 (Server): 9 × $1.536\u002Fhr × 24 hr × 30 days = $9,953.28",[324,22539,22540],{},"S3 Write Requests: 1,350 r\u002Fs × $0.005\u002F1,000 r × 3,600 s × 24 hr × 30 days = $17,496",[324,22542,22543],{},"S3 Read Requests: 1,350 r\u002Fs × $0.0004\u002F1,000 r × 3,600 s × 24 hr × 30 days = $1,400",[324,22545,22546],{},"S3 Storage Costs: 5 GB\u002Fs × $0.021\u002FGB × 3,600 s × 24 hr × 7 days = $63,504",[324,22548,22549],{},"Vendor Cost: 200 ETU × $0.50\u002Fhr × 24 hr × 30 days = $72,000",[3933,22551,22553],{"id":22552},"warpstream","WarpStream",[321,22555,22556,22559],{},[324,22557,22558],{},"Based on WarpStream’s pricing calculator (as of January 29, 2025), we assume a 4:1 client data compression ratio, meaning 20 GB\u002Fs of uncompressed data translates to 5 GB\u002Fs of compressed data.",[324,22560,22561,22562,22567],{},"It's important to note that WarpStream’s pricing structure has fluctuated frequently throughout January. We observed the cost reported by their calculator changing from $409,644 per month to $337,068 per month. This variability has been previously highlighted in the blog post “",[55,22563,22566],{"href":22564,"rel":22565},"https:\u002F\u002Fbigdata.2minutestreaming.com\u002Fp\u002Fthe-brutal-truth-about-apache-kafka-cost-calculators",[264],"The Brutal Truth About Kafka Cost Calculators","”. 
To ensure transparency, we have documented the pricing as of January 29, 2025.",[48,22569,22570],{},[384,22571],{"alt":18,"src":22572},"\u002Fimgs\u002Fblogs\u002F679c602e42713e0028e9af5e_AD_4nXcu5_VWTLu9jRYs6zX1MBAOtLQEo5gyfNSWPcbpnQHXTa8qNCFAXezRR2E8daygzYTTwd4dhJjaLaLM8C6y_3OGbu2NS7pdvEv3a8-ptNKOg7AeKnYqPQCAYvQ5EuxzuI3JYIvY.png",[3933,22574,22576],{"id":22575},"msk","MSK",[321,22578,22579,22582,22585],{},[324,22580,22581],{},"EC2 (Server): 15 * $3.264\u002Fhr × 24 hr × 30 days = $35,251",[324,22583,22584],{},"Interzone Traffic (Client-Server): 5 GB\u002Fs × ⅔ × $0.02\u002FGB (in+out) × 3,600 s × 24 hr × 30 days = $172,800",[324,22586,22587],{},"Storage: 5 GB\u002Fs × $0.1\u002FGB-month × 3,600 s × 24 hr × 7 days * 3 replicas = $907,200",[3933,22589,22444],{"id":22590},"redpanda-1",[321,22592,22593,22596,22598,22601,22604],{},[324,22594,22595],{},"EC2 (Server): 9 × $1.536\u002Fhr × 24 hr × 30 days = $9953",[324,22597,22584],{},[324,22599,22600],{},"Interzone Traffic (Replication): 5 GB\u002Fs × 2 × $0.02\u002FGB (in+out) × 3,600 s × 24 hr × 30 days = $518,400",[324,22602,22603],{},"Storage: 5 GB\u002Fs × $0.045\u002FGB-month(st1) × 3,600 s × 24 hr × 7 days * 3 replicas = $408,240",[324,22605,22606],{},"Vendor Cost: $93,333 per month (based on limited information. See additional notes below).",[3933,22608,22610],{"id":22609},"additional-notes","Additional Notes",[321,22612,22613],{},[324,22614,22615,22616,22621],{},"Redpanda does not publicly disclose its BYOC pricing, making it difficult to accurately assess its total costs. We refer to information from the whitepaper “",[55,22617,22620],{"href":22618,"rel":22619},"https:\u002F\u002Fwww.redpanda.com\u002Fresources\u002Fredpanda-vs-confluent-performance-tco-benchmark-report#form",[264],"Redpanda vs. Confluent: A Performance and TCO Benchmark Report by McKnight Consulting Group.","” for estimation purposes. Based on the Tier-8 pricing model in the whitepaper,  the estimated cost to support a 5GB\u002Fs workload would be $1.12 million per year ($93,333 per month). However, since this calculation is based on an estimation, we will revisit and refine the cost assessment once Redpanda publishes its BYOC pricing.",[48,22623,22624],{},[384,22625],{"alt":18,"src":22626},"\u002Fimgs\u002Fblogs\u002F679c602dc8a9859eed89a0ef_AD_4nXdbcO8vsNNPy4GtkNLlmNKf22fjxRvzLzH7CtOna1L08sTbvnZx3HhufeFqc1w4K2gEF7lxO2IR5supotxebAiGnA07Qa8Yr3Rd1pVK2LYKK4WurlJGwgdwwucZIFoF-N_2oBjY.png",[48,22628,22629],{},[384,22630],{"alt":18,"src":22631},"\u002Fimgs\u002Fblogs\u002F679c602d6bc1c2287e012540_AD_4nXfcHZnLfjbjIr3ZAgoQXT9dwP3aQCOQPmGZZJUtpNZSwE6qY6M3yehIaBxCwxEIeu5PVdUPY0zhyjnow26YfgjdYgSG4GnV9ibxu0YWTIpwng6z_F6FUGJMpERMKtpsFESzXSN_Sw.png",[321,22633,22634,22637],{},[324,22635,22636],{},"When estimating the storage costs for Kafka and Redpanda, we assume the use of HDD storage at $0.045\u002FGB, based on the premise that both systems can fully utilize disk bandwidth without incurring the higher costs associated with GP2 or GP3 volumes. However, in practice, many users opt for GP2 or GP3, significantly increasing the total storage cost for Kafka and Redpanda.",[324,22638,22639],{},"Unlike disk-based solutions, S3 storage does not require capacity preallocation—Ursa only incurs costs for the actual data stored. This contrasts with Kafka and Redpanda, where preallocating storage can drive up expenses. 
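For readers who want to reproduce the monthly figures, the arithmetic is straightforward. The sketch below recomputes the Ursa line items exactly as listed (9 servers at $1.536/hr, the stated S3 request rates, 7-day retention at $0.021/GB, and 200 ETUs at $0.50/hr); it is a transcription of the assumptions already stated, not an independent price quote.

```python
# Recompute the Ursa monthly TCO line items as stated in the post.
SECONDS_PER_MONTH = 3600 * 24 * 30
SECONDS_7_DAYS = 3600 * 24 * 7

ec2_servers       = 9 * 1.536 * 24 * 30                            # $9,953.28
s3_write_requests = 1350 * (0.005 / 1000) * SECONDS_PER_MONTH      # $17,496
s3_read_requests  = 1350 * (0.0004 / 1000) * SECONDS_PER_MONTH     # ~$1,400
s3_storage        = 5 * 0.021 * SECONDS_7_DAYS                     # $63,504 (7-day retention)
vendor            = 200 * 0.50 * 24 * 30                           # $72,000 (200 ETUs)

total = ec2_servers + s3_write_requests + s3_read_requests + s3_storage + vendor
print(f"Ursa monthly TCO: ${total:,.0f}")  # ~$164,353
```

The same pattern (instance-hours, cross-AZ GB transferred, GB-months retained) reproduces the MSK and Redpanda totals from their respective line items.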
As a result, the real-world storage costs for Kafka and Redpanda are often 50% higher than the estimates above.",[40,22641,2125],{"id":2122},[48,22643,22644],{},"Ursa represents a transformative shift in streaming data infrastructure, offering cost efficiency, scalability, and flexibility without compromising durability or reliability. By leveraging a leaderless architecture and eliminating inter-zone data replication, Ursa reduces total cost of ownership by over 90% compared to traditional leader-based streaming engines like Kafka and Redpanda. Its direct integration with cloud storage and scalable metadata & index management via Oxia ensure high availability and simplified infrastructure management.",[32,22646,22648],{"id":22647},"balancing-latency-and-cost","Balancing Latency and Cost",[48,22650,22651,22654],{},[55,22652,22653],{"href":18969},"Ursa trades off slightly higher latency for ultra low cost",", making it an ideal choice for the majority of streaming workloads, especially those that prioritize throughput and cost savings over ultra-low latency. Meanwhile, StreamNative’s BookKeeper-based engine remains the preferred solution for real-time, latency-sensitive applications. By combining these two approaches, StreamNative empowers customers with the flexibility to choose the right engine for their specific needs—whether it's maximizing cost savings or achieving ultra low-latency real-time performance.",[32,22656,22658],{"id":22657},"the-future-of-streaming-infrastructure","The Future of Streaming Infrastructure",[48,22660,22661],{},"In an era where data fuels AI, analytics, and real-time decision-making, managing infrastructure costs is critical to sustaining innovation. Ursa is not just a cost-cutting alternative—it is a forward-thinking, lakehouse-native platform that redefines how modern data streaming infrastructure should be built and operated.",[48,22663,22664,22665,22669],{},"Whether your priority is reducing costs, improving flexibility, or ingesting massive data into lakehouses, Ursa delivers a future-proof solution for the evolving demands of real-time data streaming. 
",[55,22666,22668],{"href":17075,"rel":22667},[264],"Get started"," with StreamNative Ursa today!",[8300,22671,22673],{"id":22672},"references","References",[48,22675,22676,758,22678],{},[2628,22677,5599],{},[55,22679,21529],{"href":21529},[48,22681,22682,758,22684],{},[2628,22683,1332],{},[55,22685,10389],{"href":10389},[48,22687,22688,758,22691],{},[2628,22689,22690],{},"StreamNative pricing",[55,22692,22693],{"href":22693,"rel":22694},"https:\u002F\u002Fdocs.streamnative.io\u002Fdocs\u002Fbilling-overview",[264],[48,22696,22697,758,22700],{},[2628,22698,22699],{},"WarpStream pricing",[55,22701,22702],{"href":22702,"rel":22703},"https:\u002F\u002Fwww.warpstream.com\u002Fpricing#pricingfaqs",[264],[48,22705,22706,758,22709],{},[2628,22707,22708],{},"AWS S3 pricing",[55,22710,22711],{"href":22711,"rel":22712},"https:\u002F\u002Faws.amazon.com\u002Fs3\u002Fpricing\u002F",[264],[48,22714,22715,758,22718],{},[2628,22716,22717],{},"AWS EBS pricing",[55,22719,22720],{"href":22720,"rel":22721},"https:\u002F\u002Faws.amazon.com\u002Febs\u002Fpricing\u002F",[264],[48,22723,22724,758,22727],{},[2628,22725,22726],{},"AWS MSK pricing",[55,22728,22729],{"href":22729,"rel":22730},"https:\u002F\u002Faws.amazon.com\u002Fmsk\u002Fpricing\u002F",[264],[48,22732,22733,758,22736],{},[2628,22734,22735],{},"The Brutal Truth about Kafka Cost Calculators",[55,22737,22564],{"href":22564,"rel":22738},[264],[48,22740,22741,758,22744],{},[2628,22742,22743],{},"Redpanda vs. Confluent: A Performance and TCO Benchmark Report by McKnight Consulting Group",[55,22745,22618],{"href":22618,"rel":22746},[264],{"title":18,"searchDepth":19,"depth":19,"links":22748},[22749,22750,22751,22756,22760,22761,22770,22773],{"id":22052,"depth":19,"text":22053},{"id":22088,"depth":19,"text":22089},{"id":22111,"depth":19,"text":22112,"children":22752},[22753,22754,22755],{"id":22123,"depth":279,"text":22124},{"id":22147,"depth":279,"text":22148},{"id":22168,"depth":279,"text":22169},{"id":22192,"depth":19,"text":22193,"children":22757},[22758,22759],{"id":22196,"depth":279,"text":22197},{"id":22211,"depth":279,"text":22212},{"id":22252,"depth":19,"text":22253},{"id":22264,"depth":19,"text":22265,"children":22762},[22763,22764,22765,22766,22767,22768,22769],{"id":22271,"depth":279,"text":22272},{"id":22317,"depth":279,"text":22318},{"id":22335,"depth":279,"text":22336},{"id":22382,"depth":279,"text":22383},{"id":22410,"depth":279,"text":22411},{"id":22428,"depth":279,"text":22429},{"id":22443,"depth":279,"text":22444},{"id":22488,"depth":19,"text":22489,"children":22771},[22772],{"id":22528,"depth":279,"text":22529},{"id":2122,"depth":19,"text":2125,"children":22774},[22775,22776],{"id":22647,"depth":279,"text":22648},{"id":22657,"depth":279,"text":22658},"2025-01-31","Discover how Ursa achieves 5GB\u002Fs Kafka workloads at just 5% of the cost of traditional streaming engines like Redpanda and AWS MSK. 
See our benchmark results comparing infrastructure costs, total cost of ownership (TCO), and performance across leading Kafka vendors.","\u002Fimgs\u002Fblogs\u002F679c6593d25099b1cdcec4ca_image-31.png",{},{"title":22031,"description":22778},"blog\u002Fhow-we-run-a-5-gb-s-kafka-workload-for-just-50-per-hour",[5954,799,303],"A0o_2xdJiLI6rf6xj4RKsxJNo_A6QN2fYzCp6gaLrFw",{"id":22786,"title":22787,"authors":22788,"body":22789,"category":3550,"createdAt":290,"date":22777,"description":22987,"extension":8,"featured":294,"image":290,"isDraft":294,"link":290,"meta":22988,"navigation":7,"order":296,"path":5209,"readingTime":22989,"relatedResources":290,"seo":22990,"stem":22991,"tags":22992,"__hash__":22993},"blogs\u002Fblog\u002Fjanuary-data-streaming-launch-organization-profile-ursa-engine-on-azure-enhancements-for-streamnative-cloud-and-more.md","January Data Streaming Launch: Organization Profile, Ursa Engine on Azure, Enhancements for StreamNative Cloud, and More",[806],{"type":15,"value":22790,"toc":22966},[22791,22794,22798,22801,22805,22816,22819,22827,22836,22840,22844,22847,22850,22854,22865,22869,22872,22880,22884,22887,22893,22896,22907,22911,22914,22918,22929,22932,22936,22939,22942,22953,22955,22963],[48,22792,22793],{},"At StreamNative, we are committed to redefining data streaming with cutting-edge solutions that provide flexibility, scalability, and ease of use. Our mission is to democratize data streaming by enabling organizations to build real-time applications with confidence and efficiency. In this month's data streaming launch, we are introducing significant updates to StreamNative Cloud, Ursa Engine, and Snowflake integration, along with enhancements to security and containerization.",[40,22795,22797],{"id":22796},"new-organization-profile-page","New Organization Profile Page",[48,22799,22800],{},"We’ve introduced a new Organization Profile Page in the StreamNative Console, making it easier for users to manage their organization’s details and receive important updates.",[32,22802,22804],{"id":22803},"what-you-can-update","What You Can Update:",[321,22806,22807,22810,22813],{},[324,22808,22809],{},"Organization Name – Keep your organization’s name up to date.",[324,22811,22812],{},"Billing Contact Email – Ensure you receive billing notifications without disruption.",[324,22814,22815],{},"Technical Contact Email – Stay informed about maintenance and technical updates.",[32,22817,22818],{"id":14959},"Why It Matters:",[321,22820,22821,22824],{},[324,22822,22823],{},"Improved communication – Ensures your organization receives critical notifications regarding billing, maintenance, and system updates.",[324,22825,22826],{},"Better account management – Provides a centralized place to manage essential organization details.",[48,22828,22829,22830,22835],{},"To update your ",[55,22831,22834],{"href":22832,"rel":22833},"https:\u002F\u002Fconsole.streamnative.cloud\u002Forganization-profile",[264],"Organization Profile",", visit the StreamNative Console today.",[40,22837,22839],{"id":22838},"enhanced-cluster-naming-flexibility","Enhanced Cluster Naming Flexibility",[32,22841,22843],{"id":22842},"improving-usability-with-display-name-support","Improving Usability with Display Name Support",[48,22845,22846],{},"Currently, the StreamNative Cloud console uses the user-provided cluster name as the object name for the cluster Custom Resource Definition (CRD), which imposes a strict 10-character limit. 
This restriction has posed challenges for users in creating meaningful and descriptive cluster names.",[48,22848,22849],{},"To improve usability, we are introducing a change that allows the user-provided cluster name to serve as the display name rather than the object name for the cluster CRD. This update removes the 10-character limit, giving users greater flexibility in naming their clusters.",[32,22851,22853],{"id":22852},"key-benefits","Key Benefits:",[321,22855,22856,22859,22862],{},[324,22857,22858],{},"Enhanced readability: Users can now create more descriptive cluster names, improving organization and management.",[324,22860,22861],{},"Better user experience: By decoupling the display name from the object name, users are no longer constrained by the character limit.",[324,22863,22864],{},"Seamless transition: Existing clusters will retain their functionality while new clusters benefit from this enhancement.",[40,22866,22868],{"id":22867},"ursa-engine-available-on-azure-for-private-preview","Ursa Engine Available on Azure for Private Preview",[48,22870,22871],{},"Ursa Engine, our next-generation data streaming engine designed to augment lakehouses with real-time capabilities, is now available for private preview on Microsoft Azure. This marks a significant milestone in expanding Ursa’s footprint across cloud providers, enabling Azure users to leverage its powerful features for high-throughput, low-latency data processing.",[48,22873,22874,22875,22879],{},"Organizations interested in participating in the ",[55,22876,22878],{"href":15569,"rel":22877},[264],"private preview"," can contact us to gain early access and provide feedback on optimizing Ursa Engine for Azure environments.",[40,22881,22883],{"id":22882},"snowpipe-streaming-support-in-snowflake-sink-connector","Snowpipe Streaming Support in Snowflake Sink Connector",[48,22885,22886],{},"We are excited to announce Snowpipe Streaming support in the StreamNative Snowflake Streaming Sink Connector. This enhancement enables more efficient and real-time data ingestion into Snowflake, reducing latency and improving cost efficiency.",[48,22888,22889,22890,190],{},"For more details on this integration, check out our",[55,22891,22892],{"href":20731}," blog post",[32,22894,22895],{"id":18605},"Key Advantages:",[321,22897,22898,22901,22904],{},[324,22899,22900],{},"Lower latency: Data streams into Snowflake in near real-time without the need for staging files.",[324,22902,22903],{},"Reduced costs: Eliminates the need for intermediate storage and reduces data pipeline complexity.",[324,22905,22906],{},"Simplified architecture: Direct streaming improves operational efficiency and reliability.",[40,22908,22910],{"id":22909},"kafka-schema-registry-rbac-support-available-for-private-preview","Kafka Schema Registry RBAC Support Available for Private Preview",[48,22912,22913],{},"Role-Based Access Control (RBAC) for the Kafka Schema Registry is now available for private preview. 
This feature enhances security by providing fine-grained access control over schema definitions, ensuring that only authorized users can manage and modify schema resources.",[32,22915,22917],{"id":22916},"benefits-of-rbac-support","Benefits of RBAC Support:",[321,22919,22920,22923,22926],{},[324,22921,22922],{},"Improved security: Protect schema definitions from unauthorized access.",[324,22924,22925],{},"Granular permissions: Assign user-specific access roles to enforce governance.",[324,22927,22928],{},"Enterprise-grade compliance: Aligns with best practices for managing data governance in streaming environments.",[48,22930,22931],{},"Organizations interested in previewing this feature can sign up to get early access and provide feedback.",[40,22933,22935],{"id":22934},"new-slim-image-using-bom-for-dependency-management","New Slim Image Using BOM for Dependency Management",[48,22937,22938],{},"We have introduced a new slim image that uses a Bill of Materials (BOM) to manage dependencies, reducing the overall image size to approximately 1GB. This improvement enhances deployment efficiency and security by minimizing the attack surface and optimizing resource usage.",[32,22940,22818],{"id":22941},"why-it-matters-1",[321,22943,22944,22947,22950],{},[324,22945,22946],{},"Smaller footprint: Reduces storage and transfer costs.",[324,22948,22949],{},"Faster deployment: Speeds up container startup times.",[324,22951,22952],{},"Better dependency management: Ensures consistency across different environments.",[40,22954,3880],{"id":3877},[48,22956,22957,22958,22962],{},"These enhancements are part of our continued effort to improve the StreamNative ecosystem and provide the best experience for our users. If you are interested in joining our ",[55,22959,22961],{"href":15569,"rel":22960},[264],"private preview programs"," for Ursa Engine on Azure or Kafka Schema Registry RBAC support, reach out to us today.",[48,22964,22965],{},"Stay tuned for more updates as we continue to innovate and push the boundaries of real-time data streaming!",{"title":18,"searchDepth":19,"depth":19,"links":22967},[22968,22972,22976,22977,22980,22983,22986],{"id":22796,"depth":19,"text":22797,"children":22969},[22970,22971],{"id":22803,"depth":279,"text":22804},{"id":14959,"depth":279,"text":22818},{"id":22838,"depth":19,"text":22839,"children":22973},[22974,22975],{"id":22842,"depth":279,"text":22843},{"id":22852,"depth":279,"text":22853},{"id":22867,"depth":19,"text":22868},{"id":22882,"depth":19,"text":22883,"children":22978},[22979],{"id":18605,"depth":279,"text":22895},{"id":22909,"depth":19,"text":22910,"children":22981},[22982],{"id":22916,"depth":279,"text":22917},{"id":22934,"depth":19,"text":22935,"children":22984},[22985],{"id":22941,"depth":279,"text":22818},{"id":3877,"depth":19,"text":3880},"Discover the latest updates from StreamNative, including Ursa Engine’s private preview on Azure, Snowpipe Streaming support for Snowflake, enhanced cluster naming, and Kafka Schema Registry RBAC. 
Learn more!",{},"10 min",{"title":22787,"description":22987},"blog\u002Fjanuary-data-streaming-launch-organization-profile-ursa-engine-on-azure-enhancements-for-streamnative-cloud-and-more",[302,3550,1332,18653,4301,303],"rEb5acO28-1xHeuTvKm7QchjzaaMyRtuDAfYrgb4_5E",{"id":22995,"title":22996,"authors":22997,"body":22999,"category":3550,"createdAt":290,"date":23087,"description":23088,"extension":8,"featured":294,"image":23089,"isDraft":294,"link":290,"meta":23090,"navigation":7,"order":296,"path":23091,"readingTime":23092,"relatedResources":290,"seo":23093,"stem":23094,"tags":23095,"__hash__":23096},"blogs\u002Fblog\u002Fconnecting-the-dots-real-time-data-streaming-for-a-smarter-data-lake.md","Connecting the Dots: Real-Time Data Streaming for a Smarter Data Lake",[22998],"Amy Krishnamohan",{"type":15,"value":23000,"toc":23082},[23001,23004,23007,23010,23014,23017,23020,23023,23034,23037,23041,23049,23060,23063,23067,23073,23080],[48,23002,23003],{},"In recent years, one of the hottest topics in the data world has been the emergence of lakehouse data formats. Names like Iceberg, Delta Lake, and Hudi have taken center stage, sparking a level of interest and debate reminiscent of the SQL vs. NoSQL discussions of the past. But why has the lakehouse format become such a game-changer?",[48,23005,23006],{},"The answer lies in the complexities and costs of data management. Managing data efficiently has always been challenging, and with organizations increasingly adopting open-source solutions, the allure of standardization and the availability of skilled talent make lakehouse formats a compelling choice.",[48,23008,23009],{},"So, let’s say you’ve chosen a lakehouse solution—what’s next? You now face a critical question: what kind of data will you fill your data lake with?",[40,23011,23013],{"id":23012},"the-challenge-of-real-time-data","The Challenge of Real-Time Data",[48,23015,23016],{},"Most organizations start by populating their data lakes with traditional sources: CRM data, web analytics, financial transactions, and other historical datasets. These are essential for conducting retrospective analyses and generating actionable insights.",[48,23018,23019],{},"But what about real-time data?",[48,23021,23022],{},"A typical suggestion might be, \"We already have Kafka for real-time data streaming!\" While Kafka is a powerful tool, it introduces several challenges:",[321,23024,23025,23028,23031],{},[324,23026,23027],{},"Kafka’s built-in storage layer requires data to be transformed into formats like Iceberg for integration with a lakehouse.",[324,23029,23030],{},"Transferring data from Kafka to your data lake involves significant network costs and added complexity.",[324,23032,23033],{},"The process is inefficient, often requiring expensive transformations and reformatting, which can hinder the real-time data pipeline.",[48,23035,23036],{},"Real-time data should be treated with the same care as historical data. It should be ready for consumption without the costly overhead of network transfers or reformatting.",[40,23038,23040],{"id":23039},"enter-ursa-a-smarter-solution","Enter Ursa: A Smarter Solution",[48,23042,23043,23044,23048],{},"This is where ",[55,23045,1332],{"href":23046,"rel":23047},"http:\u002F\u002Fstreamnative.io\u002Fursa",[264]," comes in. Ursa is a revolutionary engine designed for real-time data streaming, offering seamless integration with lakehouse architectures. 
Here’s why it stands out:",[321,23050,23051,23054,23057],{},[324,23052,23053],{},"Kafka API Compatibility: Ursa supports the Kafka API, allowing teams to continue leveraging their existing expertise while streamlining processes.",[324,23055,23056],{},"Object Storage Integration: By utilizing object storage solutions like S3, Ursa ensures cost-effective scalability and data durability.",[324,23058,23059],{},"Native Data Lake Integration: Ursa is natively integrated with data lake catalogs, eliminating the need for costly transformations or reformatting.",[48,23061,23062],{},"With Ursa, real-time data flows directly into your lakehouse, as effortlessly as historical data, reducing complexity and costs while enhancing usability.",[40,23064,23066],{"id":23065},"building-the-future-of-data","Building the Future of Data",[48,23068,23069,23070,23072],{},"The lakehouse paradigm has firmly established itself as a cornerstone of modern data management. The next step in its evolution is bridging the gap between real-time and historical data - enter ",[55,23071,18899],{"href":18898},". Tools like Ursa enable organizations to simplify their data pipelines, minimize overhead, and focus on unlocking insights that drive innovation.",[48,23074,23075,23076,23079],{},"The future of data is real-time, and with Ursa, your lakehouse can become a truly intelligent data ecosystem. ",[55,23077,22668],{"href":17075,"rel":23078},[264]," with StreamNative today with $200 free credit.",[48,23081,3931],{},{"title":18,"searchDepth":19,"depth":19,"links":23083},[23084,23085,23086],{"id":23012,"depth":19,"text":23013},{"id":23039,"depth":19,"text":23040},{"id":23065,"depth":19,"text":23066},"2025-01-29","Discover how Ursa revolutionizes real-time data streaming for lakehouses. Seamlessly integrate with Iceberg, Delta Lake, and Hudi while reducing costs and complexity. Learn more about Streaming Augmented Lakehouse with Ursa today!","\u002Fimgs\u002Fblogs\u002F679a606106eb048b00177519_image-29.png",{},"\u002Fblog\u002Fconnecting-the-dots-real-time-data-streaming-for-a-smarter-data-lake","5 min",{"title":22996,"description":23088},"blog\u002Fconnecting-the-dots-real-time-data-streaming-for-a-smarter-data-lake",[799,800,1331,1332],"58hDxDIqRhStXNMunWwuvIN4GI1N2E7d1RHh2Mat92o",{"id":23098,"title":23099,"authors":23100,"body":23101,"category":3550,"createdAt":290,"date":23272,"description":23273,"extension":8,"featured":294,"image":23274,"isDraft":294,"link":290,"meta":23275,"navigation":7,"order":296,"path":20731,"readingTime":22989,"relatedResources":290,"seo":23276,"stem":23277,"tags":23278,"__hash__":23279},"blogs\u002Fblog\u002Fintroducing-snowpipe-streaming-support-in-streamnatives-snowflake-streaming-sink-connector.md","Introducing Snowpipe Streaming Support in StreamNative's Snowflake Streaming Sink Connector",[311],{"type":15,"value":23102,"toc":23265},[23103,23111,23114,23118,23131,23135,23138,23164,23167,23171,23174,23177,23184,23187,23192,23194,23197,23204,23207,23210,23215,23218,23221,23224,23228,23235,23240,23245,23250,23255,23260,23262],[48,23104,23105,23106,23110],{},"As the landscape of data streaming and analytics continues to advance, the need for real-time, efficient data ingestion into powerful platforms like Snowflake has never been greater. StreamNative is proud to introduce the ",[55,23107,20715],{"href":23108,"rel":23109},"https:\u002F\u002Fdocs.streamnative.io\u002Fhub\u002Fconnector-snowflake-streaming-v1.0",[264],", a fully managed connector available in StreamNative Cloud. 
This connector leverages Snowpipe Streaming ingestion, enabling enterprises to achieve sub-second data ingestion. With this cutting-edge solution, organizations can ensure seamless data flow to power real-time analytics and operational intelligence with unparalleled efficiency.",[48,23112,23113],{},"Furthermore, StreamNative is honored to be the first data ingestion partner to support Apache Iceberg with Snowflake’s Snowpipe Streaming capability. This achievement highlights our commitment to providing innovative integration capabilities, enabling Snowflake and StreamNative customers to unlock advanced data solutions.",[40,23115,23117],{"id":23116},"streamnatives-integration-with-snowflake","StreamNative’s Integration with Snowflake",[48,23119,23120,23121,23125,23126,23130],{},"StreamNative has always prioritized robust data integration for customers needing quick and reliable data flow between systems. Our Snowflake Sink Connectors offer fully managed, low-latency data ingestion into Snowflake, empowering businesses to ingest real-time data streams into Snowflake within seconds. The ",[55,23122,23124],{"href":20719,"rel":23123},[264],"Snowflake Sink connector"," supported only Snowpipe ingestion, which allowed for quick loading of data batches. However, with the ",[55,23127,23129],{"href":23108,"rel":23128},[264],"Snowflake Streaming Sink connector",", users can now ingest data with sub-second data ingestion and an ideal choice for real-time analytics use cases. This enhanced support aligns with StreamNative’s commitment to empowering businesses with flexible and resilient data streaming solutions.",[40,23132,23134],{"id":23133},"snowflakes-ingestion-methods-snowpipe-vs-snowpipe-streaming","Snowflake's Ingestion Methods: Snowpipe vs. Snowpipe Streaming",[48,23136,23137],{},"Snowflake provides two primary methods for data ingestion:",[1666,23139,23140,23143,23146,23149,23152,23155,23158,23161],{},[324,23141,23142],{},"Snowpipe: This method leverages event-driven ingestion and is ideal for loading small batches of data frequently. Using REST API calls, Snowpipe automates the process of loading data into Snowflake as it arrives, generally within seconds. This is well-suited for scenarios where near-real-time data is sufficient. Functionality: Facilitates continuous, automated loading of data files from cloud storage (e.g., Amazon S3, Google Cloud Storage) into Snowflake tables.",[324,23144,23145],{},"Process: Monitors specified cloud storage locations for new files. Upon detecting new data, it automatically triggers the loading process into the target tables.",[324,23147,23148],{},"Use Cases: Ideal for scenarios where data arrives in batches or micro-batches, such as periodic uploads of log files or transactional data.",[324,23150,23151],{},"Latency: Typically offers near real-time loading with latencies ranging from a few seconds to minutes, depending on file arrival and processing times.",[324,23153,23154],{},"Snowpipe Streaming: Snowflake’s newer streaming ingestion feature, Snowpipe Streaming, enables ultra-low-latency data ingestion, achieving sub-second intervals for data availability. It is designed for applications needing immediate access to data as soon as it arrives, enhancing Snowflake’s support for real-time analytics and operational dashboards. 
Functionality: Enables low-latency ingestion of streaming data directly into Snowflake tables without the need for intermediary cloud storage.",[324,23156,23157],{},"Process: Utilizes the Snowflake Ingest SDK to allow applications to write data rows directly into Snowflake over HTTPS. This method supports real-time data ingestion from sources like IoT devices, application logs, or Kafka topics.",[324,23159,23160],{},"Use Cases: Suited for applications requiring real-time analytics and immediate data availability, such as monitoring systems, real-time dashboards, or event-driven architectures.",[324,23162,23163],{},"Latency: Achieves lower latencies, often within seconds, due to the direct ingestion approach.",[48,23165,23166],{},"The key difference is latency; while Snowpipe is near real-time, Snowpipe Streaming is designed to handle real-time demands, ensuring that data is available in Snowflake almost instantaneously.",[40,23168,23170],{"id":23169},"support-for-snowflake-ingestion-methods-in-streamnatives-snowflake-sink-connectors","Support for Snowflake Ingestion Methods in StreamNative’s Snowflake Sink Connectors",[48,23172,23173],{},"With StreamNative’s Snowflake Sink Connector, users can now choose between Snowpipe and Snowpipe Streaming ingestion modes based on their specific latency requirements and use case demands. Here’s a closer look at both options:",[48,23175,23176],{},"Efficient SNOWPIPE Integration With Snowflake Sink Connector",[48,23178,23179,23180,190],{},"To facilitate data ingestion from StreamNative to Snowflake using the SNOWPIPE method, users can leverage ",[55,23181,23183],{"href":20719,"rel":23182},[264],"StreamNative's Snowflake Sink Connector",[48,23185,23186],{},"This connector allows users to configure Snowflake’s traditional Snowpipe ingestion method, ideal for users looking to frequently load batches of data with a few seconds of latency. 
This is a fully managed connector available in StreamNative Cloud and it allows enterprises to stream data from Apache Pulsar topics stored in StreamNative Cloud to Snowflake AI Platform.",[48,23188,23189],{},[384,23190],{"alt":18,"src":23191},"\u002Fimgs\u002Fblogs\u002F679919440f374cc4137365d0_AD_4nXc5ybz5lTebAVUECfqy1CZZJdqATGZm7oLrL_I08N7yb2k6ZObsFl-ZTRnUqFCL-El-CNPbio_cZxFWGM2BmcfVIz5VOrHg-F-7IVGB9z5KZ644uzmYfk6D57ItWlYLNz7x58XZEu1nlCYpfsZ6wpBXcFo.png",[48,23193,3931],{},[48,23195,23196],{},"Efficient SNOWPIPE STREAMING Integration With Snowflake Streaming Sink Connector",[48,23198,23199,23200,190],{},"To facilitate data ingestion from StreamNative to Snowflake using the SNOWPIPE STREAMING method, users can leverage ",[55,23201,23203],{"href":20719,"rel":23202},[264],"StreamNative's Snowflake Streaming Sink Connector",[48,23205,23206],{},"This connector allows users to configure Snowflake’s Snowpipe Streaming ingestion method, ideal for ultra-low-latency data ingestion, achieving sub-second intervals for data availability.",[48,23208,23209],{},"For real-time use cases, the Snowpipe Streaming ingestion method provides sub-second ingestion into Snowflake.",[48,23211,23212],{},[384,23213],{"alt":18,"src":23214},"\u002Fimgs\u002Fblogs\u002F679926426b5261363e1dda27_AD_4nXdI_OGmU7U4rExq5_fYsp5PlF7wLkVhCGD9Bv_uGS9t0bvxbtLBO-zn1Boogp8FrH7MXHKhHGcxsn4evaukwlGG0X-KeECdDdMUyqYv9Gn5BuMjIkzLcAzDptckh3x5k5QsMyRwzw.png",[48,23216,23217],{},"Support for Apache Iceberg Format",[48,23219,23220],{},"The Snowflake Streaming Sink connector now supports the Apache Iceberg format, an open table format that simplifies data ingestion, transformation, and analytics. The Apache Iceberg format is essential for organizations that require schema evolution, version control, and partitioning—features that enhance data processing efficiency in real-time environments.",[48,23222,23223],{},"By default the Apache Iceberg format is disabled in the connector. Users can enable it by setting the icebergEnabled config to true.",[40,23225,23227],{"id":23226},"walkthrough-of-the-snowflake-sink-connector-in-streamnative-cloud","Walkthrough of the Snowflake Sink Connector in StreamNative Cloud",[48,23229,23230,23234],{},[55,23231,23233],{"href":20719,"rel":23232},[264],"Setting up the Snowflake Sink Connector in StreamNative Cloud"," is straightforward. Here’s a quick overview of how to get started with both Snowpipe and Snowpipe Streaming modes and enable Apache Iceberg format support:",[1666,23236,23237],{},[324,23238,23239],{},"Configuring the Connector: From the StreamNative Cloud Console, users can access the Snowflake Streaming Sink Connector setup and configure all the required details like url, user, database, schema, role, warehouse etc.",[48,23241,23242],{},[384,23243],{"alt":18,"src":23244},"\u002Fimgs\u002Fblogs\u002F67991944817ef4575ede01b2_AD_4nXdM0eHZ_ly-3o40s33AlcfJuK-y1c6Wq1UV7-J7m2WQDRhngY6BMYXfBL81esS5hOeCxiPFtQPdlQ1WFJdGXkIglMGd1H19za7_d9bSLF0Ioj2MIFO7Efw-zqaISWmBUaS84ZAR.png",[1666,23246,23247],{},[324,23248,23249],{},"Enabling Apache Iceberg Format Support: Apache Iceberg support can be enabled within the connector configuration. 
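The walkthrough lists the connector settings surfaced in the Console (url, user, database, schema, role, warehouse) plus the `icebergEnabled` flag. Expressed as a plain configuration map, a setup might look like the sketch below; only the keys named in the post are taken from the text, and every value (plus the idea of submitting it as a map rather than through the Console form) is a placeholder to be checked against the connector reference.

```python
# Hypothetical configuration map for the Snowflake Streaming Sink connector.
# Keys mirror the settings named in the walkthrough; all values are placeholders.
snowflake_streaming_sink_config = {
    "url": "https://myaccount.snowflakecomputing.com",  # placeholder account URL
    "user": "STREAMNATIVE_INGEST",                      # placeholder user
    "role": "INGEST_ROLE",
    "warehouse": "INGEST_WH",
    "database": "REALTIME_DB",
    "schema": "PUBLIC",
    "icebergEnabled": True,  # disabled by default; enables writing Iceberg tables
}

# The connector itself is created from the StreamNative Cloud Console (or CLI),
# so this dict only documents the shape of the settings discussed above.
print(snowflake_streaming_sink_config)
```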
This capability allows users to write data in the Iceberg format, providing advanced schema management and partitioning benefits.",[48,23251,23252],{},[384,23253],{"alt":18,"src":23254},"\u002Fimgs\u002Fblogs\u002F67991944b32a1f95396d99d8_AD_4nXdccCLROJObO0JcyFOLIRqfmriUVakSi4gBaKtGLNubEGUaAbgwpSciJPhQNUWxz1CPWcbNYMaI9SuZ6RLSpkaJsYPY5vQATS-5LEII6xafhoTcMplmRRy9MAZal9l1d71D78V_.png",[1666,23256,23257],{},[324,23258,23259],{},"Monitoring and Managing the Connector: StreamNative Cloud provides a monitoring dashboard where users can track data ingestion progress, latency, and throughput. This visibility ensures users can manage their data flow and optimize performance effectively.",[40,23261,2125],{"id":2122},[48,23263,23264],{},"With this update, StreamNative’s Snowflake Sink Connectors become an even more powerful tool for enterprises looking to unlock real-time data analytics. Depending on the use case, enterprises can select the most suitable connector to leverage either Snowpipe or Snowpipe Streaming ingestion methods. Additionally, with support for the Apache Iceberg format, StreamNative provides a comprehensive and flexible data streaming solution designed to meet the diverse needs of modern data-driven organizations. Whether for real-time operational intelligence or large-scale data warehousing, the Snowflake Sink Connectors in StreamNative Cloud are purpose-built to support the next generation of data analytics.",{"title":18,"searchDepth":19,"depth":19,"links":23266},[23267,23268,23269,23270,23271],{"id":23116,"depth":19,"text":23117},{"id":23133,"depth":19,"text":23134},{"id":23169,"depth":19,"text":23170},{"id":23226,"depth":19,"text":23227},{"id":2122,"depth":19,"text":2125},"2025-01-28","Discover StreamNative's Snowflake Streaming Sink Connector for sub-second real-time data ingestion with Snowpipe Streaming. Supports Apache Iceberg for advanced schema management and analytics. Optimize your data flow with StreamNative Cloud.","\u002Fimgs\u002Fblogs\u002F67996ca2af3f7d5b20e09127_image-30.png",{},{"title":23099,"description":23273},"blog\u002Fintroducing-snowpipe-streaming-support-in-streamnatives-snowflake-streaming-sink-connector",[800,1332,18653],"n35kVb--sw8-fdCv6URTC3RQYLvHQHAvJ1ymYJ88nlU",{"id":23281,"title":23282,"authors":23283,"body":23284,"category":3550,"createdAt":290,"date":23478,"description":23479,"extension":8,"featured":294,"image":23480,"isDraft":294,"link":290,"meta":23481,"navigation":7,"order":296,"path":23482,"readingTime":290,"relatedResources":290,"seo":23483,"stem":23484,"tags":23485,"__hash__":23486},"blogs\u002Fblog\u002Ffrom-pulsar-to-ursa-reflecting-on-3-years-of-streamnative-cloud-evolution.md","From Pulsar to Ursa: Reflecting on 3 Years of StreamNative Cloud Evolution",[806],{"type":15,"value":23285,"toc":23465},[23286,23289,23296,23300,23303,23306,23309,23312,23316,23319,23323,23326,23337,23340,23344,23347,23361,23364,23368,23371,23374,23378,23381,23398,23401,23405,23408,23412,23415,23450,23454,23457,23460,23463],[48,23287,23288],{},"When we introduced StreamNative Cloud four years ago, it began as a “Managed Cloud” service—essentially what we now refer to as Bring Your Own Cloud (BYOC). At the time, we pioneered a novel approach: a control plane that deploys a portable data plane in each customer’s cloud environment. Although we didn’t call it BYOC initially, that’s precisely what it was—a new paradigm for flexible, portable cloud deployment. 
Today, this original concept has evolved further into what we’re calling BYOC² (Bring Your Own Cloud and Compute), reflecting our continued commitment to delivering a seamless, portable data plane.",[48,23290,23291,23292,23295],{},"Our journey hasn’t stopped there. Over the years, StreamNative Cloud has matured into a multi-protocol data streaming cloud service, now powered by our Ursa Engine. We’ve moved from a single-protocol offering (centered on Apache Pulsar) to a multi-protocol service that also supports Kafka and MQTT workloads as well. The innovations we built in the Ursa Engine and StreamNative Cloud are designed to address the challenges of modern cloud infrastructure – challenges we refer to as the ",[55,23293,23294],{"href":18969},"new CAP theorem",": balancing Cost, Availability, and Performance across multi-cloud and hybrid environments.",[40,23297,23299],{"id":23298},"ursa-engine-leaderless-architecture-and-lakehouse-first-storage","Ursa Engine: Leaderless Architecture and Lakehouse-First Storage",[48,23301,23302],{},"At the heart of our transformation is the Ursa Engine, which combines a leaderless architecture with a lakehouse-centric storage model to optimize performance, scalability, and cost.",[32,23304,23305],{"id":22123},"Leaderless Architecture",[48,23307,23308],{},"A leaderless architecture removes single-topic leadership and leverages object storage as shared storage for high-throughput, latency-relaxed use cases. As data streaming architectures evolve from leader-based to leaderless models, producers and consumers no longer need to traverse Availability Zones to write or read data from remote brokers. Simultaneously, shared storage eliminates much of the replication overhead.",[48,23310,23311],{},"This shift simplifies cluster management and operations and reduces overall networking costs, thereby lowering total infrastructure expenses. It also enables cost-effective real-time data processing without the networking overhead often seen in leader-based architectures such as vanilla Pulsar, Kafka, and RedPanda.",[32,23313,23315],{"id":23314},"lakehouse-first-storage","Lakehouse-First Storage",[48,23317,23318],{},"Ursa’s storage layer directly employs lakehouse formats (e.g., Delta Lake, Iceberg) to persist data in object storage, eliminating additional ETL steps when moving from streaming data to table data. By aligning real-time streaming with lakehouse storage, organizations can significantly reduce data movement overhead and often achieve substantial cost savings. This architecture also simplifies downstream analytics because data remains in lakehouse-compatible formats from the outset.",[32,23320,23322],{"id":23321},"one-system-for-stream-table-duality","One System for Stream-Table Duality",[48,23324,23325],{},"In most data ecosystems, engineering teams juggle multiple systems for ingestion, processing, and storage. Ursa’s design aims to reduce this complexity by using lakehouse storage as the central engine. 
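Because Ursa persists topic data as lakehouse tables (Delta Lake or Iceberg) in object storage, downstream engines can query the stream's table representation in place, without a separate ETL hop. A hedged sketch using PySpark with Delta Lake follows; the bucket path, column names, and Spark/Delta session setup are assumptions for illustration, not a StreamNative-specific API.

```python
# Illustrative only: bucket path, table layout, and column names are hypothetical.
# Requires pyspark plus the delta-spark package/jars on the classpath.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("read-ursa-topic-as-table")
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
    .getOrCreate()
)

# The same events producers wrote via the Kafka protocol are available as a
# lakehouse table in object storage, so they can be queried where they land.
orders = spark.read.format("delta").load("s3a://my-lakehouse-bucket/ursa/orders")
orders.groupBy("status").count().show()
```

This is the "stream-table duality" point in practice: one copy of the data serves both the streaming consumers and the analytical query above.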
This tight integration means:",[321,23327,23328,23331,23334],{},[324,23329,23330],{},"Fewer ETL pipelines to maintain",[324,23332,23333],{},"Less data duplication",[324,23335,23336],{},"Lower latency between ingestion and query",[48,23338,23339],{},"Ultimately, this connected, end-to-end approach better serves AI and machine learning workloads that require continuous, real-time data flows.",[40,23341,23343],{"id":23342},"flexible-deployment-options-serverless-byoc-and-beyond","Flexible Deployment Options: Serverless, BYOC, and Beyond",[48,23345,23346],{},"Different use cases demand different deployment strategies. StreamNative Cloud now provides four main options, each refined by our portable data plane:",[321,23348,23349,23352,23355,23358],{},[324,23350,23351],{},"ServerlessSpin up and scale real-time applications without manual infrastructure management.",[324,23353,23354],{},"DedicatedReserve dedicated resources in the public cloud for predictable, high-performance workloads.",[324,23356,23357],{},"BYOC (Bring Your Own Cloud)Deploy our portable data plane directly in your own cloud account, with StreamNative providing expert management of your Pulsar and Kafka-compatible clusters.",[324,23359,23360],{},"Private CloudInstall StreamNative Cloud on-premises or in a private environment to maintain complete control and meet stringent security mandates.",[48,23362,23363],{},"This evolution—from a standalone “Managed Pulsar” service to a fully portable, multi-protocol streaming solution—reflects our commitment to offering engineering teams maximum flexibility, while retaining visibility through robust tooling, observability, and enterprise-grade support.",[40,23365,23367],{"id":23366},"streamnative-cloud-on-all-major-clouds","StreamNative Cloud on All Major Clouds",[48,23369,23370],{},"Beyond these flexible deployment models, StreamNative Cloud is also available across the three leading public cloud providers—AWS, GCP, and Azure. Even better, you can procure StreamNative Cloud directly through their respective marketplaces, simplifying your purchasing process and making it easy to start with just one click.",[48,23372,23373],{},"Whether you need data streaming for short-term projects or enterprise-grade, long-term deployments, you can quickly spin up StreamNative Cloud wherever your data resides. This seamless availability helps businesses accelerate time to value by combining real-time data ingestion with a streamlined procurement process.",[40,23375,23377],{"id":23376},"security-compliance","Security & Compliance",[48,23379,23380],{},"Security and compliance are foundational to everything we build at StreamNative. Our platform has evolved to provide a suite of enterprise-grade features that ensure your data remains protected and compliant across a range of regulatory requirements and industry best practices:",[321,23382,23383,23386,23389,23392,23395],{},[324,23384,23385],{},"Single Sign-On (SSO) & Role-Based Access Control (RBAC)StreamNative Cloud integrates with your identity provider for seamless SSO and offers granular RBAC to control user permissions. This guarantees that only authorized personnel can access your data and services.",[324,23387,23388],{},"Encryption at Rest and in TransitWe support encryption at rest with bring your own key (BYOK) integrations and secure transmission via TLS. 
This ensures that data remains encrypted on disk and during network communication, safeguarding against unauthorized access.",[324,23390,23391],{},"End-to-End EncryptionFor additional privacy, you can enable end-to-end encryption, meaning only your applications hold the keys needed to decrypt data. This ensures that even StreamNative Cloud cannot access the contents of your messages.",[324,23393,23394],{},"Data Sovereignty with BYOCBy allowing you to deploy the data plane in your own cloud environment, StreamNative Cloud ensures your data never leaves your chosen infrastructure. This is crucial for meeting data sovereignty requirements and maintaining complete control over where your data resides.",[324,23396,23397],{},"Global Compliance StandardsStreamNative Cloud meets a wide array of compliance requirements, including SOC 2, ISO 20007, PCI, HIPAA, and GDPR. With these certifications, you can trust that our platform aligns with the highest standards in security and data protection.",[48,23399,23400],{},"Whether you’re operating in a highly regulated industry or simply value robust security controls, StreamNative Cloud delivers the confidence and compliance you need to accelerate innovation—without compromising on data protection.",[40,23402,23404],{"id":23403},"strategic-partnerships","Strategic Partnerships",[48,23406,23407],{},"StreamNative partners with industry leaders such as Ververica, Snowflake, and Google, among others, to enhance its data streaming and processing capabilities while providing robust support for open lakehouse formats like Delta Lake and Apache Iceberg. Additionally, StreamNative partners with esteemed System Integrators, including CalSoft, Nexaminds, and Nuaav, to deliver comprehensive support for customers across their entire data platform journey.",[40,23409,23411],{"id":23410},"success-stories-powering-mission-critical-apps","Success Stories: Powering Mission-Critical Apps",[48,23413,23414],{},"Our customers span a variety of industries, but they all share one thing in common: the need for a platform that handles mission-critical and transactional workloads in real time. 
A few highlights from our success stories and case studies",[321,23416,23417,23423,23429,23436,23443],{},[324,23418,23419,23422],{},[55,23420,92],{"href":23421},"\u002Fsuccess-stories\u002Fcisco",": Manages lifecycle operations for IoT devices on a massive scale (35,000 enterprises, 245 million+ connected devices), relying on StreamNative Cloud for high-volume, ultra-reliable telemetry.",[324,23424,23425,23428],{},[55,23426,96],{"href":23427},"\u002Fsuccess-stories\u002Fhow-apache-pulsar-helping-iterable-scale-its-customer-engagement-platform",": Processes billions of daily events to facilitate hyper-personalized real-time marketing and customer engagement.",[324,23430,23431,23435],{},[55,23432,100],{"href":23433,"rel":23434},"https:\u002F\u002Fyoutu.be\u002FcATfl6ih-6o?si=eYHkUBot5-kWlpzo",[264],": Builds scalable, secure data pipelines for fraud detection and real-time observability, supported by StreamNative’s robust architecture.",[324,23437,23438,23442],{},[55,23439,23441],{"href":23440},"\u002Fsuccess-stories\u002Finnerspace","Innerspace",": Uses Pulsar to analyze sensor data, improving workplace safety and enabling data-driven optimization.",[324,23444,23445,23449],{},[55,23446,23448],{"href":23447},"\u002Fsuccess-stories\u002Fteg","TEG",": Modernized its data infrastructure to better match shippers and carriers, reducing latency and vastly improving the end-customer experience.",[40,23451,23453],{"id":23452},"streamnative-cloud-your-data-streaming-partner-in-the-age-of-ai","StreamNative Cloud: Your Data Streaming Partner in the age of AI",[48,23455,23456],{},"We’ve delivered significant innovations over the past few years, but where are StreamNative and data streaming headed in the AI era? The key lies in building end-to-end AI pipelines that integrate ingestion, analytics, and machine learning workflows under one platform. Through our Ursa Engine, StreamNative Cloud will continue to drive the convergence of data streaming and lakehouse systems. This includes native compatibility with leading lakehouse formats (like Iceberg and Delta Lake) and automatic, Kafka-compatible ingestion pipelines, significantly reducing redundant ETL tasks and data movement. As a result, engineering teams can focus on insights instead of infrastructure.",[48,23458,23459],{},"Additionally, StreamNative Cloud is designed to serve both real-time and AI-driven workloads seamlessly, ensuring low-latency data delivery for AI applications that demand quick, accurate decisions. Ultimately, StreamNative Cloud is more than just a Pulsar or Kafka platform—it’s the critical link connecting real-time data ingestion with advanced analytics and machine learning. 
By uniting Apache Pulsar’s robust messaging model with Ursa’s Kafka compatibility, lakehouse-first architecture, and flexible deployment through portable data planes, we empower businesses to simplify data streaming, accelerate real-time analytics, and power next-generation AI solutions.",[48,23461,23462],{},"Let’s build the future—together.",[48,23464,3931],{},{"title":18,"searchDepth":19,"depth":19,"links":23466},[23467,23472,23473,23474,23475,23476,23477],{"id":23298,"depth":19,"text":23299,"children":23468},[23469,23470,23471],{"id":22123,"depth":279,"text":23305},{"id":23314,"depth":279,"text":23315},{"id":23321,"depth":279,"text":23322},{"id":23342,"depth":19,"text":23343},{"id":23366,"depth":19,"text":23367},{"id":23376,"depth":19,"text":23377},{"id":23403,"depth":19,"text":23404},{"id":23410,"depth":19,"text":23411},{"id":23452,"depth":19,"text":23453},"2025-01-22","Discover how StreamNative Cloud, powered by the Ursa Engine, combines leaderless architecture, lakehouse storage, and flexible deployment to simplify data streaming and empower AI-driven workloads.","\u002Fimgs\u002Fblogs\u002F67913ea8403541b3d2b00432_image-27.png",{},"\u002Fblog\u002Ffrom-pulsar-to-ursa-reflecting-on-3-years-of-streamnative-cloud-evolution",{"title":23282,"description":23479},"blog\u002Ffrom-pulsar-to-ursa-reflecting-on-3-years-of-streamnative-cloud-evolution",[799,10322,302,1331,1332],"Dy7AtbTnHL1qJqb-79hOC0oujb16O18TA887xpisEBQ",{"id":23488,"title":23489,"authors":23490,"body":23492,"category":3550,"createdAt":290,"date":23863,"description":23864,"extension":8,"featured":294,"image":23865,"isDraft":294,"link":290,"meta":23866,"navigation":7,"order":296,"path":23867,"readingTime":18649,"relatedResources":290,"seo":23868,"stem":23869,"tags":23870,"__hash__":23871},"blogs\u002Fblog\u002Fintegrating-streamnatives-ursa-engine-with-puppygraph-for-real-time-graph-analysis.md","Integrating StreamNative's Ursa Engine with PuppyGraph for Real-Time Graph Analysis",[23491,810],"Danfung Xu",{"type":15,"value":23493,"toc":23850},[23494,23505,23508,23514,23517,23521,23534,23536,23539,23561,23566,23570,23573,23576,23579,23582,23586,23595,23598,23603,23607,23610,23624,23627,23633,23640,23647,23656,23660,23669,23672,23686,23689,23715,23718,23723,23725,23730,23734,23749,23752,23754,23757,23759,23762,23764,23769,23771,23774,23778,23786,23791,23793,23796,23801,23803,23807,23816,23821,23823,23826,23831,23833,23836,23848],[48,23495,23496,23497,23499,23500,23504],{},"At the Pulsar Virtual Summit EMEA 2024, StreamNative unveiled the ",[55,23498,4725],{"href":6647},"—a transformative advancement in data streaming architecture. The launch received overwhelmingly positive feedback from customers and prospects alike. Recently, StreamNative announced the public preview of the ",[55,23501,23503],{"href":23502},"\u002Fblog\u002Fannouncing-the-ursa-engine-public-preview-for-streamnative-byoc-clusters","Ursa Engine for its StreamNative AWS BYOC clusters",", marking a significant milestone in its development.",[48,23506,23507],{},"With the public preview, users gain access to core Ursa Engine features, including Oxia-based metadata management and S3-backed Write-Ahead Logs (WAL). These features provide enhanced flexibility, scalability, and cost-efficiency, making it easier than ever to manage and analyze streaming data.",[48,23509,23510,23513],{},[55,23511,674],{"href":682,"rel":23512},[264]," is proud to be the first graph compute engine to integrate with StreamNative's Ursa Engine. 
This partnership marks a shift toward democratizing access to streaming data and graph analytics—delivering cost-effective solutions without requiring a dedicated graph database. Combined with Ursa Engine, PuppyGraph enables users to query streaming data using graph query languages like Gremlin and openCypher, along with built-in visualization tools, providing a seamless experience for graph-based analytics.",[48,23515,23516],{},"In this blog, we will introduce StreamNative's Ursa Engine and explore how combining its capabilities with PuppyGraph’s zero-ETL graph query engine unlocks the potential of real-time graph analytics. First, let’s dive deeper into the features and benefits of StreamNative's Ursa Engine.",[40,23518,23520],{"id":23519},"what-is-streamnatives-ursa-engine","What is StreamNative's Ursa Engine",[48,23522,23523,23524,23529,23530,23533],{},"StreamNative's Ursa Engine represents a next-generation data streaming engine that builds upon and extends ",[55,23525,23528],{"href":23526,"rel":23527},"https:\u002F\u002Fpulsar.apache.org\u002F",[264],"Apache Pulsar's"," capabilities. At its foundation, Ursa Engine leverages two key components: ",[55,23531,5599],{"href":22142,"rel":23532},[264]," for metadata storage and Object Storage (S3, GCS, Azure Blob Storage) for data persistence. This architectural decision makes the traditional BookKeeper storage optional, reserving it exclusively for scenarios demanding ultra-low latency.",[32,23535,16295],{"id":16294},[48,23537,23538],{},"The engine stands on several essential pillars that make it particularly powerful:",[321,23540,23541,23544,23547,23550,23558],{},[324,23542,23543],{},"Complete Kafka API Compatibility: Organizations can seamlessly migrate their existing Kafka-based applications to Pulsar, as Ursa Engine provides full compatibility with the Kafka API.",[324,23545,23546],{},"Lakehouse Storage Integration: The engine incorporates lakehouse storage principles, ensuring long-term durability and adherence to open standards.",[324,23548,23549],{},"Advanced Metadata Management: By utilizing Oxia, Ursa Engine achieves highly scalable and durable metadata storage.",[324,23551,23552,23553,23557],{},"Flexible Storage Options: Users can choose between two storage tiers based on their specific needs:some textLatency-Optimized Storage: Powered by ",[55,23554,862],{"href":23555,"rel":23556},"https:\u002F\u002Fbookkeeper.apache.org\u002F",[264],", this option caters to high-throughput, latency-sensitive workloads requiring immediate message delivery.",[324,23559,23560],{},"Cost-Optimized Storage: Built on Object Storage services, this tier offers a more economical solution for workloads that can accommodate sub-second latencies.",[48,23562,23563],{},[384,23564],{"alt":18,"src":23565},"\u002Fimgs\u002Fblogs\u002F678007c950accd48e7a80b3d_AD_4nXe2Mjf6m0yQuCtY8ihKLstiGntsFIU1C_zaBh4LJgcYOf8hnckUGoxz1ShipaDilWEt9scubMtlwkUS_-oylC_u3yV2wOlP5oCFiCooNSlCoN2Agvf_-fBtKaqHg0_9oqjh8gTB0Q.png",[32,23567,23569],{"id":23568},"lakehouse-storage","Lakehouse Storage",[48,23571,23572],{},"Ursa streamlines the integration of streaming data into lakehouse environments. It allows users to store their Pulsar and Kafka topics, along with associated schemas, directly into lakehouse tables. 
Ursa's objective is to simplify feeding streaming data into lakehouses, making data instantly available for analytics and other use cases.",[48,23574,23575],{},"Building on Apache Pulsar’s pioneering tiered storage model, which offloads sealed log segments to commodity object stores like S3, GCS, and Azure Blob Store, Lakehouse Storage takes a leap forward by enabling data to be stored directly in lakehouse-ready formats. Traditionally, Pulsar relies on Apache BookKeeper to persist data in a write-ahead log, which consolidates entries across topics using a distributed index for fast lookups. Afterward, the data is compacted into Pulsar’s proprietary format for efficient scans and storage. Lakehouse Storage enhances this process by compacting data directly into open standard formats such as Apache Iceberg, and Delta Lake. This shift eliminates the need for complex integrations between streaming platforms and lakehouses, making data immediately accessible for lakehouse analytics.",[48,23577,23578],{},"The Ursa engine leverages the schema registry during compaction, automating schema mapping, evolution, and type conversion. This ensures schema enforcement as part of the data stream contract, catching incompatible data early and maintaining high data quality. Additionally, Ursa continuously compacts small Parquet files into larger ones, optimizing read performance for analytics and query workloads.",[48,23580,23581],{},"By seamlessly integrating with lakehouse solutions like Databricks and OneHouse, Lakehouse Storage further simplifies data workflows and enhances performance. This innovative approach removes the barriers between streaming and lakehouse ecosystems, offering a unified, high-performance solution for real-time and batch data analytics. With Lakehouse Storage, Ursa sets a new standard for efficient, schema-aware, and lakehouse-compatible data streaming.",[40,23583,23585],{"id":23584},"ursa-engine-puppygraph-architecture","Ursa Engine + PuppyGraph Architecture",[48,23587,23588,23589,23594],{},"Ursa Engine specializes in Lakehouse Storage capabilities, efficiently compacting data into open standard formats like Apache Iceberg, and Delta Lake. This efficient data storage foundation is enhanced by PuppyGraph, a graph query engine that works directly with existing data, eliminating the need for time-consuming ETL processes to a separate graph database. PuppyGraph ",[55,23590,23593],{"href":23591,"rel":23592},"https:\u002F\u002Fdocs.puppygraph.com\u002Fconnecting\u002F",[264],"connects to various data sources"," to build comprehensive graph models, leveraging these same standard formats. Combining these technologies results in a high-performance, cost-optimized solution for real-time graph analysis.",[48,23596,23597],{},"The solution delivers exceptional performance in graph querying through scalable and performant zero-ETL. PuppyGraph achieves this by leveraging the column-based data file format coupled with massively parallel processing and vectorized evaluation technology built into its engine. 
This distributed compute engine design ensures fast query execution even without efficient indexing and caching, delivering a performant and efficient graph querying and analytics experience without the hassles of the traditional graph infrastructure.",[48,23599,23600],{},[384,23601],{"alt":18,"src":23602},"\u002Fimgs\u002Fblogs\u002F678007c94b5d9832fd994807_AD_4nXdvdLkEuZawiesatktlF4DhoAIoCLZwn8JovBnt3EjzkmaGqsqKpK1DxN9fnH1EnCmovJCASx8JvMUCk3ZqO66WRX9RVFklNqwYi4vnQICkn74hNNdJaTNZyTh-6KfCOVp-veFfsg.png",[8300,23604,23606],{"id":23605},"integrate-ursa-engine-with-puppygraph","Integrate Ursa Engine with PuppyGraph",[48,23608,23609],{},"Integrating Ursa Engine with PuppyGraph is a straightforward process that involves four key steps:",[1666,23611,23612,23615,23618,23621],{},[324,23613,23614],{},"Deploy Ursa Engine: Set up and deploy the Ursa Engine.",[324,23616,23617],{},"Deploy PuppyGraph: Set up and deploy PuppyGraph.",[324,23619,23620],{},"Connect to Compacted Data: Establish a connection between PuppyGraph and the compacted data in Lakehouse storage generated by Ursa Engine.",[324,23622,23623],{},"Query Your Data as a Graph: Query data with Gremlin and openCypher in PuppyGraph. Visualize results with the graph visualization tool.",[48,23625,23626],{},"Both Lakehouse storage and PuppyGraph integrate seamlessly with lakehouse solutions like Databricks. For more information, see the following document and blogs:",[48,23628,23629],{},[55,23630,23632],{"href":23631},"\u002Fblog\u002Funlocking-lakehouse-storage-potential-seamless-data-ingestion-from-streamnative-to-databricks","StreamNative+Databricks",[48,23634,23635],{},[55,23636,23639],{"href":23637,"rel":23638},"https:\u002F\u002Fwww.unitycatalog.io\u002Fblogs\u002Fintegrating-unity-catalog-with-puppygraph-for-real-time-graph-analysis",[264],"PuppyGraph+UnityCatalog",[48,23641,23642],{},[55,23643,23646],{"href":23644,"rel":23645},"https:\u002F\u002Fdocs.puppygraph.com\u002Fconnecting\u002Fconnecting-to-delta-lake\u002F",[264],"PuppyGraph Connecting Document(delta lake)",[48,23648,23649,23650,23655],{},"We also have a detailed demo to help you ",[55,23651,23654],{"href":23652,"rel":23653},"https:\u002F\u002Fgithub.com\u002Fpuppygraph\u002Fpuppygraph-getting-started\u002Ftree\u002Fmain\u002Fintegration-demos\u002Fstreamnative-demo",[264],"get started",". Try it out and contact us if you have any questions!",[32,23657,23659],{"id":23658},"deploy-ursa-engine","Deploy Ursa Engine",[48,23661,23662,23663,23668],{},"For a quick start guide to the Ursa-Engine BYOC Cluster, refer to the ",[55,23664,23667],{"href":23665,"rel":23666},"https:\u002F\u002Fdocs.streamnative.io\u002Fdocs\u002Fquickstart-ursa",[264],"quickstart document of Ursa Engine",". The prerequisites in that document involve Java and Maven, but you can choose from many available Kafka clients.",[48,23670,23671],{},"To deploy your Ursa-Engine BYOC cluster, you need the following prerequisites:",[321,23673,23674,23680,23683],{},[324,23675,23676,23677],{},"Access to ",[55,23678,3550],{"href":17075,"rel":23679},[264],[324,23681,23682],{},"Internet connectivity",[324,23684,23685],{},"Access to your AWS account for provisioning the BYOC infrastructure",[48,23687,23688],{},"Deployment involves the following steps:",[1666,23690,23691,23694,23697,23700,23703,23706,23709,23712],{},[324,23692,23693],{},"Sign up for StreamNative Cloud.",[324,23695,23696],{},"Grant StreamNative vendor access. 
Before deploying a StreamNative cluster within your cloud account, you must grant StreamNative vendor access.",[324,23698,23699],{},"Create a cloud connection. This establishes a connection to your AWS account.",[324,23701,23702],{},"Create a cloud environment. After creating a cloud connection, you can create a cloud environment and provision a BYOC instance.",[324,23704,23705],{},"Create a StreamNative instance and cluster. Your cluster will be deployed in AWS.",[324,23707,23708],{},"Create a service account. To interact with your cluster (by producing and consuming messages), you need to set up authentication and authorization. A service account serves as an identity for this purpose. It provides the necessary credentials for your applications to securely connect and operate on the Pulsar cluster.",[324,23710,23711],{},"Create a tenant and namespace, and authorize the service account. After creating the service account and obtaining the API key, authorize it to grant the necessary permissions to interact with your StreamNative Cloud cluster.",[324,23713,23714],{},"Grant permission to access the Kafka Schema Registry. You need to grant the service account access to the Kafka Schema Registry.",[48,23716,23717],{},"Now you can produce data to your topics via Kafka clients. Depending on the compaction configuration, after some time, you will see the compaction in your S3 storage according to your cloud environment. Compaction uses the Delta Lake format by default. You can read and manipulate your compaction data with Databricks. In the Databricks console, you can add a catalog for your data and then use the SQL Editor to create tables from the compacted data.",[48,23719,23720],{},[384,23721],{"alt":18,"src":23722},"\u002Fimgs\u002Fblogs\u002F678007c90ec215e2c7241bd8_AD_4nXfso6aTQOZzwz-m-kIemED07juZw0lDJuOho0nmehmzx3kabwkLccD6PwTHaT-POvnzXY1ngolOyn-WGpx7SD7KVMqNBenhmUpdJt3dK4j36HRGQ8bElCl5RAu1XRmLmo2DyESyeQ.png",[48,23724,3931],{},[48,23726,23727],{},[384,23728],{"alt":18,"src":23729},"\u002Fimgs\u002Fblogs\u002F678007c94b5d9832fd994815_AD_4nXfxULWDSCeHfz1qU0HKlVmZboZM1WzWWQErYiwSBynCy9q1zsN6URaNOaVXf3PA5QCDxaU0J8TAykYCpMqR-jzcMBgzVUz7FhTh2ji_gVvU3x05WpHrZE7P1MSXXpQFNw6bbqOt.png",[32,23731,23733],{"id":23732},"deploy-puppygraph","Deploy PuppyGraph",[48,23735,23736,23737,23742,23743,23748],{},"It is easy to deploy PuppyGraph, and can currently be done through ",[55,23738,23741],{"href":23739,"rel":23740},"https:\u002F\u002Fdocs.puppygraph.com\u002Fgetting-started\u002F",[264],"Docker, an AWS AMI"," through AWS Marketplace, or ",[55,23744,23747],{"href":23745,"rel":23746},"https:\u002F\u002Fconsole.cloud.google.com\u002Fmarketplace\u002Fproduct\u002Fpuppygraph-public\u002Fpuppygraph-professional",[264],"GCP Marketplace",". The AMI approach deploys your instance on your chosen infrastructure with just a few clicks. Below, we will focus on what it takes to launch a PuppyGraph instance on Docker.",[48,23750,23751],{},"With Docker installed, you can run the following command to launch the container in your terminal.  Note that the environment variable DATAACCESS_DATA_CACHE_STRATEGY is set as adaptive.",[48,23753,3931],{},[48,23755,23756],{},"docker run -p 8081:8081 -p 8182:8182 -p 7687:7687 -e DATAACCESS_DATA_CACHE_STRATEGY=adaptive -d --name puppy --rm --pull=always puppygraph\u002Fpuppygraph:stable",[48,23758,3931],{},[48,23760,23761],{},"Launch a PuppyGraph instance locally, in the cloud, or on a server with the command above. 
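As a concrete illustration of the earlier step of producing data to your topics via Kafka clients: the quickstart document uses Java and Maven, but any Kafka client works. The snippet below is a minimal sketch using the confluent-kafka Python client; the bootstrap endpoint, topic name, and API-key credentials are placeholders you would take from your own StreamNative Cloud cluster, and the exact SASL settings depend on how that cluster is configured.

# Minimal sketch, assuming confluent-kafka is installed (pip install confluent-kafka).
# The endpoint, topic, and credentials below are placeholders, not real values.
from confluent_kafka import Producer

conf = {
    "bootstrap.servers": "<your-kafka-endpoint>:9093",  # taken from the StreamNative Cloud console
    "security.protocol": "SASL_SSL",                     # auth details depend on your cluster setup
    "sasl.mechanism": "PLAIN",
    "sasl.username": "<tenant>/<namespace>",             # placeholder service-account identity
    "sasl.password": "token:<your-api-key>",
}
producer = Producer(conf)

def on_delivery(err, msg):
    # Per-message delivery report, useful for confirming data is arriving before compaction.
    if err is not None:
        print(f"Delivery failed: {err}")
    else:
        print(f"Delivered to {msg.topic()} [{msg.partition()}] at offset {msg.offset()}")

# Produce a few sample JSON events to the topic that Ursa will later compact into lakehouse storage.
for i in range(10):
    producer.produce("<your-topic>", key=str(i), value=f'{{"event_id": {i}}}', callback=on_delivery)

producer.flush()  # block until all outstanding messages are acknowledged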
Then, open your browser and navigate to localhost:8081 (or your instance's URL) to access the PuppyGraph login screen.",[48,23763,3931],{},[48,23765,23766],{},[384,23767],{"alt":18,"src":23768},"\u002Fimgs\u002Fblogs\u002F678007c9cf2a83769a13d0d4_AD_4nXdsFtGxgALUQ98wWn7cjvuIc6izpxrs4XiwbftfcaDwBOLvl6Ck-gxF7T8lYBnmffqcJ3tnI2AybiH9rhNvXksCACszoV6qiySvXQ9RXkbBf0h9btmSIbxUGGo4MxKRwR7SxL-nSQ.png",[48,23770,3931],{},[48,23772,23773],{},"After logging in with the default credentials (username: “puppygraph” and default password: “puppygraph123”) you’ll enter the application itself. At this point, our instance is ready to go and we can proceed with connecting to the compacted data.",[32,23775,23777],{"id":23776},"connect-to-the-compacted-data","Connect to the Compacted Data",[48,23779,23780,23781,23785],{},"To connect PuppyGraph to the compacted data, you need to define the graph schema. You can add the vertices and edges manually through the interface, or compose the JSON schema file and upload it. You need to configure the data source and specify the vertices and edges. In the demo, we provide a JSON schema file template for you and you just need to fill the configuration of the data source there. You can refer to the ",[55,23782,23784],{"href":23644,"rel":23783},[264],"connecting document"," for the details of those fields.",[48,23787,23788],{},[384,23789],{"alt":18,"src":23790},"\u002Fimgs\u002Fblogs\u002F678007c9832b5155234687c8_AD_4nXfy6wQqL-rXK4-_E6rvNu8-Q-99YJqLzJz_wwO6RD4DWn6m9AuvzVByfgR_AkSlvR-b4bya-HlKJY3BtyI7f9eOfQMbi8CVMbbE5ObtdkLuyVgfoPfDlKfL15hrTDjlQrfqsOGhGA.png",[48,23792,3931],{},[48,23794,23795],{},"After submitting the schema, you will see the schema graph.",[48,23797,23798],{},[384,23799],{"alt":18,"src":23800},"\u002Fimgs\u002Fblogs\u002F678007c9a2a90900d1c551ba_AD_4nXfWSK2qNnj6Okuhs_WebwVLvl1OynkIxWag5mADB6PYXSfIzAnUim6RFHOba6sYySj6Jd9RFBKjgKmSzcvBk851AIVITlg6EotoE_ZosF9gzHPVt13ATT8O6yg1jhydr7vF70vt.png",[48,23802,3931],{},[32,23804,23806],{"id":23805},"query-your-graph-via-puppygraph","Query Your Graph via PuppyGraph",[48,23808,23809,23810,23815],{},"Now you can ",[55,23811,23814],{"href":23812,"rel":23813},"https:\u002F\u002Fdocs.puppygraph.com\u002Fquerying\u002F",[264],"query the graph"," using Gremlin or openCypher and visualize the results with the built-in graph visualization tool.",[48,23817,23818],{},[384,23819],{"alt":18,"src":23820},"\u002Fimgs\u002Fblogs\u002F678007c9420bde6b6f539d06_AD_4nXc8q9oAKtgan4GyhAoPSJdRw3ClqsUpSxDQdcjPuZPS-nvBmg-HNIFQ1xyKsppkL2IZLV5GpEu-OEh5hbek9Rju44XF_TntGRuWHeKL0jLouIoNmB4gdU8vXj8gl1FO3R-_8mhYgw.png",[48,23822,3931],{},[48,23824,23825],{},"As new data is produced and added to the Ursa Cluster, your query results in PuppyGraph will update regularly.",[48,23827,23828],{},[384,23829],{"alt":18,"src":23830},"\u002Fimgs\u002Fblogs\u002F678007c94b5d9832fd994804_AD_4nXezHAcIlaHVHrKOuyYbopHmTj6by_nYeTA4xQPxQEmztMNLAfvsodIyme6MhKSKFtKLl37Ccb4nBkGlw17SLMtPkD0_1swRX7aT4mTgBf9rF7vz3NnXmg6EfC-fds-tXUURF9eQ.png",[40,23832,2125],{"id":2122},[48,23834,23835],{},"In this blog, we delved into integrating StreamNative's Ursa Engine Lakehouse Storage with PuppyGraph's zero-ETL graph query engine to achieve real-time graph analysis. This seamless process simplifies workflows and delivers high-speed graph queries, eliminating the complexities associated with traditional graph technologies.",[48,23837,23838,23839,4003,23844,23847],{},"Ready to take control and build a future-proof, graph-enabled real-time system? Try PuppyGraph and StreamNative today. 
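The graph-querying step above can also be driven programmatically rather than through the built-in UI. Below is a minimal sketch using the gremlinpython driver against the Gremlin port (8182) mapped by the earlier docker run command; the credentials are the defaults mentioned in this post, and the 'device' label and its properties are hypothetical stand-ins for whatever vertices your uploaded schema actually defines.

# Minimal sketch, assuming gremlinpython is installed (pip install gremlinpython)
# and a PuppyGraph container is reachable on localhost:8182 as shown earlier.
from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection
from gremlin_python.process.anonymous_traversal import traversal

# Connect to PuppyGraph's Gremlin endpoint; credentials are the defaults from this post.
conn = DriverRemoteConnection(
    "ws://localhost:8182/gremlin", "g",
    username="puppygraph", password="puppygraph123",
)
g = traversal().withRemote(conn)

# Pull a handful of vertices with their properties to confirm the schema mapping works.
# The 'device' label is a hypothetical example; use a label from your own schema.
sample = g.V().hasLabel("device").limit(5).valueMap(True).toList()
for vertex in sample:
    print(vertex)

conn.close()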
Visit ",[55,23840,23843],{"href":23841,"rel":23842},"https:\u002F\u002Fwww.puppygraph.com\u002Fdownload-confirmation",[264],"PuppyGraph (forever free developer edition)",[55,23845,4496],{"href":15003,"rel":23846},[264]," to get started!",[48,23849,3931],{},{"title":18,"searchDepth":19,"depth":19,"links":23851},[23852,23856,23862],{"id":23519,"depth":19,"text":23520,"children":23853},[23854,23855],{"id":16294,"depth":279,"text":16295},{"id":23568,"depth":279,"text":23569},{"id":23584,"depth":19,"text":23585,"children":23857},[23858,23859,23860,23861],{"id":23658,"depth":279,"text":23659},{"id":23732,"depth":279,"text":23733},{"id":23776,"depth":279,"text":23777},{"id":23805,"depth":279,"text":23806},{"id":2122,"depth":19,"text":2125},"2025-01-09","Discover how StreamNative's Ursa Engine and PuppyGraph enable real-time graph analysis without ETL processes. Learn to integrate these powerful tools for seamless, cost-effective data streaming and analytics.","\u002Fimgs\u002Fblogs\u002F678007afcdb9d392c190438d_image-26.png",{},"\u002Fblog\u002Fintegrating-streamnatives-ursa-engine-with-puppygraph-for-real-time-graph-analysis",{"title":23489,"description":23864},"blog\u002Fintegrating-streamnatives-ursa-engine-with-puppygraph-for-real-time-graph-analysis",[821,1332],"_r_bWQS_mnqGOL0hiSf32Bf5OUHk79n3DJ6vEKaTCZA",{"id":23873,"title":23874,"authors":23875,"body":23876,"category":3550,"createdAt":290,"date":24012,"description":24013,"extension":8,"featured":294,"image":290,"isDraft":294,"link":290,"meta":24014,"navigation":7,"order":296,"path":24015,"readingTime":22989,"relatedResources":290,"seo":24016,"stem":24017,"tags":24018,"__hash__":24019},"blogs\u002Fblog\u002Frbac-is-now-available-for-public-preview-with-predefined-roles.md","RBAC is now available for Public Preview with predefined roles",[806],{"type":15,"value":23877,"toc":24004},[23878,23881,23885,23888,23891,23902,23905,23908,23911,23915,23918,23938,23941,23945,23948,23959,23962,23966,23969,23977,23981,23984,23995,24002],[48,23879,23880],{},"We’re thrilled to announce that Role-Based Access Control (RBAC) is now available for Public Preview on StreamNative Cloud! This marks a significant step forward in our commitment to providing secure, streamlined, and enterprise-ready solutions for managing data streaming environments.",[40,23882,23884],{"id":23883},"why-rbac-matters-in-data-streaming","Why RBAC Matters in Data Streaming",[48,23886,23887],{},"In the world of enterprise software, ensuring the right people have the right level of access to critical resources is a cornerstone of security. RBAC has long been a trusted framework for achieving this, particularly when safeguarding sensitive data is paramount. But why is RBAC so essential in the context of data streaming?",[48,23889,23890],{},"In the early days of data streaming, the workflow was simple: someone requested a data topic, and it landed in a designated destination like a database or data warehouse. 
Fast forward to today, and the landscape is far more intricate:",[321,23892,23893,23896,23899],{},[324,23894,23895],{},"Real-time data has become a business necessity.",[324,23897,23898],{},"Microservices have introduced new architectures and workflows.",[324,23900,23901],{},"Data topics are requested by multiple departments, often destined for diverse systems.",[48,23903,23904],{},"This explosion in real-time data demand has led to challenges like manual tracking systems (“secret master spreadsheets”), inefficiencies, and fragmented solutions.",[48,23906,23907],{},"With the rise of Apache Kafka, many organizations deployed isolated Kafka instances across teams, resulting in data silos. Recognizing these limitations, Apache Pulsar was designed with multi-tenancy at its core, enabling shared environments while maintaining security and organization.",[48,23909,23910],{},"Now, StreamNative is taking it a step further. With RBAC, we’re providing a secure, unified environment for managing access across multi-tenant deployments in Pulsar—ensuring your data streaming infrastructure is efficient and well-protected.",[40,23912,23914],{"id":23913},"streamnative-rbac-predefined-roles-for-simplified-access-management","StreamNative RBAC: Predefined Roles for Simplified Access Management",[48,23916,23917],{},"To make adoption seamless, StreamNative offers predefined roles tailored to common use cases. These roles provide granular access controls, empowering teams to securely manage resources:",[321,23919,23920,23923,23926,23929,23932,23935],{},[324,23921,23922],{},"Org Admin (org-admin): Full administrative privileges for managing the entire organization.",[324,23924,23925],{},"Org Read Only (org-readonly): Read-only access for monitoring and auditing purposes.",[324,23927,23928],{},"Tenant Admin (tenant-admin): Full control over a specific tenant.",[324,23930,23931],{},"Tenant Read Only (tenant-readonly): Read-only access to tenant-level resources.",[324,23933,23934],{},"Topic Producer (topic-producer): Permissions to produce data to specified topics.",[324,23936,23937],{},"Topic Consumer (topic-consumer): Permissions to consume data from specified topics.",[48,23939,23940],{},"These predefined roles are designed to simplify setup while offering the flexibility to fine-tune permissions as needed.",[40,23942,23944],{"id":23943},"rbac-vs-acls-enhanced-flexibility","RBAC vs. ACLs: Enhanced Flexibility",[48,23946,23947],{},"RBAC role bindings work seamlessly with Pulsar ACLs (Access Control Lists) to provide comprehensive access control. Permissions can be granted through ACLs, RBAC role bindings, or both. The system evaluates all granted permissions to determine access, offering:",[321,23949,23950,23953,23956],{},[324,23951,23952],{},"Explicit Permissions: Users no longer have implicit Super Admin (Super User) access; they only have the permissions explicitly granted to them.",[324,23954,23955],{},"Granular Access: Apply ACLs or RBAC role bindings to principals (users or service accounts) for fine-grained control.",[324,23957,23958],{},"Combined Use: Use ACLs and RBAC role bindings together to meet complex access requirements.",[48,23960,23961],{},"For a deeper dive, explore our RBAC and ACL Documentation.",[40,23963,23965],{"id":23964},"how-to-enable-rbac","How to Enable RBAC",[48,23967,23968],{},"Ready to take control of your data streaming environment with RBAC? Enabling this feature is simple. Reach out to your account manager or our support team for assistance. 
We’re here to help you implement RBAC smoothly and effectively.",[48,23970,23971,23972,190],{},"For additional details, check out the ",[55,23973,23976],{"href":23974,"rel":23975},"https:\u002F\u002Fdocs.streamnative.io\u002Fdocs\u002Fcloud-rel-note",[264],"release notes",[32,23978,23980],{"id":23979},"secure-your-data-streaming-environment-today","Secure Your Data Streaming Environment Today",[48,23982,23983],{},"With RBAC on StreamNative Cloud, you can:",[321,23985,23986,23989,23992],{},[324,23987,23988],{},"Enhance security by ensuring precise access control.",[324,23990,23991],{},"Streamline management of multi-tenant environments.",[324,23993,23994],{},"Improve operational efficiency with predefined roles and granular permissions.",[48,23996,23997,24001],{},[55,23998,24000],{"href":15003,"rel":23999},[264],"Join the Public Preview"," and experience how RBAC can transform your data streaming operations. Get started today and empower your teams with the tools they need to succeed!",[48,24003,3931],{},{"title":18,"searchDepth":19,"depth":19,"links":24005},[24006,24007,24008,24009],{"id":23883,"depth":19,"text":23884},{"id":23913,"depth":19,"text":23914},{"id":23943,"depth":19,"text":23944},{"id":23964,"depth":19,"text":23965,"children":24010},[24011],{"id":23979,"depth":279,"text":23980},"2025-01-06","Discover the power of Role-Based Access Control (RBAC) on StreamNative Cloud. Now in Public Preview, RBAC enhances security, simplifies access management, and streamlines multi-tenant data streaming.",{},"\u002Fblog\u002Frbac-is-now-available-for-public-preview-with-predefined-roles",{"title":23874,"description":24013},"blog\u002Frbac-is-now-available-for-public-preview-with-predefined-roles",[4301,3550],"VsyKajQ7GZlKffDIeEzFR9h_hGuyvLw5yMVSVNmGE50",{"id":24021,"title":24022,"authors":24023,"body":24024,"category":3550,"createdAt":290,"date":24242,"description":24243,"extension":8,"featured":294,"image":24244,"isDraft":294,"link":290,"meta":24245,"navigation":7,"order":296,"path":24246,"readingTime":18649,"relatedResources":290,"seo":24247,"stem":24248,"tags":24249,"__hash__":24250},"blogs\u002Fblog\u002Fstreamnatives-2024-year-in-review.md","StreamNative’s 2024 Year in Review",[806],{"type":15,"value":24025,"toc":24230},[24026,24029,24034,24038,24044,24048,24055,24059,24066,24070,24078,24092,24095,24099,24107,24111,24114,24139,24143,24146,24161,24165,24168,24205,24209,24216,24218,24225,24228],[48,24027,24028],{},"As 2024 comes to a close, we’re thrilled to reflect on a year of remarkable achievements and innovations. This year has been pivotal for StreamNative as we continue to advance the field of data streaming and push the boundaries of what’s possible. Here’s a look at some of the most exciting developments that defined 2024.",[48,24030,24031],{},[384,24032],{"alt":18,"src":24033},"\u002Fimgs\u002Fblogs\u002F676317bc2e077cd16d1a971f_AD_4nXfuQ3xcwllwa4SUmzgNGqrsADjAiGQvuNHYRk-ugTvUA9Jn-IuRwv67LW2vmrXxhF8wqa3sH5UIZqY22ql-3vixf6VJWm8SdHSRp0s3ZfV49BFYtybavbCy8qNr0JwITjCIoHFC.png",[40,24035,24037],{"id":24036},"introducing-ursa-engine-augmenting-lakehouse-with-data-streaming-capabilities","Introducing Ursa Engine: Augmenting Lakehouse with data streaming capabilities",[48,24039,24040,24041,24043],{},"This year, we unveiled ",[55,24042,4725],{"href":6647},", a groundbreaking addition to our product portfolio. Ursa Engine delivers a 100% Kafka-compatible API paired with cost-effective storage designed for seamless lakehouse integration. 
By combining efficiency and compatibility, Ursa Engine empowers businesses to harness the power of a unified streaming and storage platform for their data lakehouse needs.",[40,24045,24047],{"id":24046},"uniconn-simplifying-data-pipelines","UniConn: Simplifying Data Pipelines",[48,24049,24050,24051,24054],{},"2024 saw the launch of ",[55,24052,24053],{"href":5039},"UniConn (Universal Connectivity)",", a revolutionary solution aimed at simplifying and enhancing data streaming. UniConn offers a consistent, declarative approach to, connecting, processing, debugging and monitoring. Whether leveraging Kafka Connect or Pulsar IO frameworks, UniConn delivers a streamlined experience that reduces complexity and accelerates pipeline development.",[40,24056,24058],{"id":24057},"unilink-cross-cluster-interoperability-made-easy","UniLink: Cross-Cluster Interoperability Made Easy",[48,24060,24061,24065],{},[55,24062,24064],{"href":24063},"\u002Fproducts\u002Funiversal-linking-lp","UniLink (Universal Linking)"," is another highlight of our 2024 journey. Designed for seamless data replication and interoperability, UniLink works across self-managed or fully-managed Kafka and Pulsar clusters. By utilizing object storage (e.g., S3) for networking and storage, UniLink provides a cost-effective and efficient solution while significantly reducing operational complexity.",[40,24067,24069],{"id":24068},"apache-pulsar-40-the-next-generation-of-data-streaming","Apache Pulsar 4.0: The Next Generation of Data Streaming",[48,24071,24072,24073,24077],{},"A major milestone this year was the release of ",[55,24074,24076],{"href":24075},"\u002Fblog\u002Fannouncing-apache-pulsar-tm-4-0-towards-an-open-data-streaming-architecture","Apache Pulsar 4.0",", the second Long-Term Support (LTS) version following the success of Pulsar 3.0. Pulsar 4.0 is a significant leap forward in our mission to make data streaming more accessible, affordable, and scalable. Key enhancements include:",[321,24079,24080,24083,24086,24089],{},[324,24081,24082],{},"Modularity: Enabling flexible, composable deployments.",[324,24084,24085],{},"Observability: Improved tools for monitoring and debugging.",[324,24087,24088],{},"Scalability: Enhanced for large-scale enterprise applications.",[324,24090,24091],{},"Security: Strengthened protections for sensitive data.",[48,24093,24094],{},"With advanced Quality of Service (QoS) controls and a focus on simplicity and flexibility, Pulsar 4.0 drives the vision of an Open Data Streaming Architecture closer to reality.",[40,24096,24098],{"id":24097},"fully-managed-flink-services","Fully Managed Flink Services",[48,24100,24101,24102,24106],{},"StreamNative introduced ",[55,24103,24105],{"href":24104},"\u002Fproducts\u002Fflink","Flink as a fully managed service",", providing an enterprise-grade stream processing platform built on Apache Flink. 
This service simplifies real-time data processing, offering a highly scalable, resilient, and efficient solution for developing and running stream processing applications.",[40,24108,24110],{"id":24109},"streamnative-cloud-feature-enhancements","StreamNative Cloud Feature Enhancements",[48,24112,24113],{},"This year, StreamNative Cloud introduced new UI and capabilities to further improve flexibility, reliability, and operational simplicity for Pulsar instances:",[321,24115,24116,24124,24132],{},[324,24117,24118,24123],{},[55,24119,24122],{"href":24120,"rel":24121},"https:\u002F\u002Fdocs.streamnative.io\u002Fdocs\u002Fcloud-release-channel",[264],"Release Channel",": Provides users with the ability to choose stable or experimental software releases for their Pulsar environments.",[324,24125,24126,24131],{},[55,24127,24130],{"href":24128,"rel":24129},"https:\u002F\u002Fdocs.streamnative.io\u002Fdocs\u002Fcloud-maintenance-window",[264],"Custom Maintenance Window",": Allows organizations to schedule maintenance periods that align with their operational requirements, reducing disruption to critical workflows.",[324,24133,24134,24138],{},[55,24135,24137],{"href":24136},"\u002Fblog\u002Fautomated-geo-replication-set-up-in-streamnative-cloud-pulsar-instances","Automated Geo-Replication Setup",": StreamNative Cloud simplifies setting up geo-replication for Pulsar instances, enabling seamless and reliable data replication across regions to improve resilience and availability.",[40,24140,24142],{"id":24141},"expanding-deployment-options","Expanding Deployment Options",[48,24144,24145],{},"In 2024, we expanded our deployment options to meet diverse customer needs:",[321,24147,24148,24155],{},[324,24149,24150,24154],{},[55,24151,24153],{"href":24152},"\u002Fblog\u002Fstreamnative-introduces-self-service-experience-for-byoc-infrastructure-setup","Self-service BYOC (Bring Your Own Cloud)",": Empowering users with greater control over their infrastructure.",[324,24156,24157,24160],{},[55,24158,24159],{"href":11196},"Serverless Deployments",": Reducing operational overhead with fully managed solutions.",[40,24162,24164],{"id":24163},"partnerships-and-marketplace-expansion","Partnerships and Marketplace Expansion",[48,24166,24167],{},"Collaboration has always been at the heart of our mission. This year, we strengthened our ecosystem with exciting new partnerships and marketplace integrations:",[321,24169,24170,24179,24195,24202],{},[324,24171,24172,24173,24178],{},"Pulsar-Spark Connector with Databricks: In collaboration with Databricks, StreamNative introduced the ",[55,24174,24177],{"href":24175,"rel":24176},"https:\u002F\u002Fwww.databricks.com\u002Fblog\u002Fstreamnative-and-databricks-unite-power-real-time-data-processing-pulsar-spark-connector",[264],"Pulsar-Spark Connector",", enabling seamless integration of Apache Pulsar and Apache Spark. 
This connector allows organizations to process real-time streaming data within the Databricks ecosystem, combining the strengths of Pulsar’s flexible messaging platform and Spark’s powerful analytics capabilities.",[324,24180,24181,24182,1186,24186,5422,24190,24194],{},"Marketplace Presence: StreamNative is now available on ",[55,24183,14536],{"href":24184,"rel":24185},"https:\u002F\u002Faws.amazon.com\u002Fmarketplace\u002Fseller-profile?id=567ea745-17ba-4c66-b3ec-a1350352cae7",[264],[55,24187,6869],{"href":24188,"rel":24189},"https:\u002F\u002Fazuremarketplace.microsoft.com\u002Fen-us\u002Fmarketplace\u002Fapps\u002Fstreamnative.apache-pulsar-by-streamnative-azure?tab=overview",[264],[55,24191,4789],{"href":24192,"rel":24193},"https:\u002F\u002Fconsole.cloud.google.com\u002Fmarketplace\u002Fproduct\u002Fstreamnative-public\u002Fapache-pulsar-managed-by-streamnative",[264]," marketplaces.",[324,24196,24197,24201],{},[55,24198,24200],{"href":24199},"\u002Fpartners","Partner Program",": We welcomed innovative partners such as RisingWave, Zilliz, Calsoft Nexaminds, Timeplus, Pinecone, PuppyGraph, TiDB, Ververica, Snowflake, and Nuaav.",[324,24203,24204],{},"Google Cloud BigQuery Ready Program: Achieving BigQuery Ready status underscores our commitment to seamless cloud integration and analytics.",[40,24206,24208],{"id":24207},"data-streaming-summit-2024","Data Streaming Summit 2024",[48,24210,24211,24212,190],{},"Finally, we wrapped up the year with the highly anticipated Data Streaming Summit 2024. The event brought together thought leaders, industry experts, and developers from around the world to share insights and explore the future of data streaming. It was a celebration of innovation and collaboration, setting the stage for an even brighter 2025. Missed a session? You can watch the on-demand sessions ",[55,24213,267],{"href":24214,"rel":24215},"https:\u002F\u002Fwww.youtube.com\u002Fplaylist?list=PLqRma1oIkcWgN9agdJ0DQhX2gPf8K2ynk",[264],[40,24217,17130],{"id":17129},[48,24219,24220,24221,24224],{},"As we reflect on these accomplishments, we’re excited about the road ahead. StreamNative remains dedicated to delivering cutting-edge solutions, fostering a vibrant ecosystem, and helping businesses unlock the full potential of data streaming. Experience the power of StreamNative today with a $200 credit to ",[55,24222,23654],{"href":17075,"rel":24223},[264],". Explore our innovative solutions and see how they can transform your data streaming needs.",[48,24226,24227],{},"Here’s to continued innovation and success in 2025 and beyond!",[48,24229,3931],{},{"title":18,"searchDepth":19,"depth":19,"links":24231},[24232,24233,24234,24235,24236,24237,24238,24239,24240,24241],{"id":24036,"depth":19,"text":24037},{"id":24046,"depth":19,"text":24047},{"id":24057,"depth":19,"text":24058},{"id":24068,"depth":19,"text":24069},{"id":24097,"depth":19,"text":24098},{"id":24109,"depth":19,"text":24110},{"id":24141,"depth":19,"text":24142},{"id":24163,"depth":19,"text":24164},{"id":24207,"depth":19,"text":24208},{"id":17129,"depth":19,"text":17130},"2024-12-18","Discover StreamNative's 2024 milestones: introducing Ursa Engine, Apache Pulsar 4.0, UniConn, UniLink, and more. 
Explore groundbreaking advancements in data streaming and cloud integration.","\u002Fimgs\u002Fblogs\u002F676317d916bb0b5d1a2a4865_image-23.png",{},"\u002Fblog\u002Fstreamnatives-2024-year-in-review",{"title":24022,"description":24243},"blog\u002Fstreamnatives-2024-year-in-review",[800,10322,799,10054,302],"KFFbIt22wpea5dmI6gFML1142uZPNagI5NhsD0-C4J0",{"id":24252,"title":24253,"authors":24254,"body":24255,"category":3550,"createdAt":290,"date":24391,"description":24392,"extension":8,"featured":294,"image":24393,"isDraft":294,"link":290,"meta":24394,"navigation":7,"order":296,"path":24395,"readingTime":17934,"relatedResources":290,"seo":24396,"stem":24397,"tags":24398,"__hash__":24399},"blogs\u002Fblog\u002Fdata-platforms-optimized-for-the-cloud---the-streamnative-and-snowflake-partnership.md","Data Platforms Optimized for the Cloud – the StreamNative and Snowflake Partnership",[311],{"type":15,"value":24256,"toc":24383},[24257,24260,24274,24278,24281,24285,24288,24291,24295,24298,24301,24305,24311,24314,24317,24326,24333,24337,24340,24343,24346,24366,24369,24381],[48,24258,24259],{},"While AI is supposed to make our lives easier, that might not be true if you are working behind the scenes to build and deploy AI systems. With so much potential for a variety of moving parts, it’s important for you to ensure that your data architecture doesn't become overly complex and costly. Keeping a check on IT complexity is a key requirement to help you gain the advantages of AI without the challenges that arise down the road.",[48,24261,24262,24263,24268,24269,24273],{},"At StreamNative we are committed to helping our customers become successful with their data streaming initiatives. As a result, we are excited to announce we have achieved ",[55,24264,24267],{"href":24265,"rel":24266},"https:\u002F\u002Fwww.snowflake.com\u002Fen\u002Fwhy-snowflake\u002Fpartners\u002Fall-partners\u002Fstreamnative\u002F",[264],"Technology Select Tier partner status"," from ",[55,24270,18653],{"href":24271,"rel":24272},"https:\u002F\u002Fc212.net\u002Fc\u002Flink\u002F?t=0&l=en&o=4203468-1&h=3960528063&u=https%3A%2F%2Fwww.snowflake.com%2F&a=Snowflake",[264],", the AI Data Cloud company. This partnership enables joint customers to easily leverage streaming data to deploy real-time analytics with greater performance\u002Fscale, faster elasticity, and lower TCO (more on these later).",[40,24275,24277],{"id":24276},"two-companies-one-vision","Two Companies, One Vision",[48,24279,24280],{},"Our vision is to enable customers to get the most value from data. “SteamNative’s commitment to helping Snowflake mobilize the world’s data can be seen through their tremendous and fast growth as a Snowflake partner with us,\" said Tarik Dwiek, Head of Technology Alliances, Snowflake. \"We look forward to driving deeper value for Snowflake’s AI Data Cloud ecosystem by partnering with StreamNative to allow access to a fast, simplified, and cost-effective real-time analytics architecture through Snowflake’s single, integrated platform.\"",[40,24282,24284],{"id":24283},"apache-iceberg-open-source-table-format","Apache Iceberg™ Open-Source Table Format",[48,24286,24287],{},"Both companies recognize how customers can gain value from having Iceberg as an option for writing analytical data. This is especially important for businesses that have large-scale data sets that are less frequently accessed, in which storage cost is a main concern. 
Writing Iceberg tables into cloud object storage is a cost-effective way for storing and analyzing these data sets.",[48,24289,24290],{},"With our support for Snowpipe Streaming (more information coming soon in a blog), we are the first Snowflake partner to support Iceberg tables in our joint architecture. Writing to Iceberg tables automatically supports Snowflake Catalog to ensure this capability fits seamlessly in your data architecture. The Iceberg support gives our joint customers advantages around schema evolution, version control, and advanced partition handling while leveraging cost-effective object storage.",[40,24292,24294],{"id":24293},"real-time-plus-ai","Real-Time Plus AI",[48,24296,24297],{},"Real-time analytics has been a popular and important data pattern for many years, and the need to further accelerate data processing pipelines while simplifying the overall architecture has been an ongoing aspiration. In addition, businesses are extending their traditional real-time analytics architecture to augment their AI deployments, so systems are always up to date with the latest information. This is especially important in company-specific AI systems that rely on time-sensitive data, often sourced by recent customer interactions. And as always, keeping cloud infrastructure costs under control is a priority, especially when cloud bills are far higher than expected. So, a heavier focus on cloud FinOps is also a necessary part of your data strategies.",[48,24299,24300],{},"If you are using or are exploring the use of Snowflake as a component of your real-time analytics and AI architectures, then a data streaming platform is likely part of your strategy. Many would reasonably choose Apache Kafka, but might also realize that it is the Kafka API and its ecosystem we care most about. The different underlying engines that support the Kafka API\u002Fecosystem then become the points of comparison among the popular Kafka-compatible options in the market. StreamNative provides the Kafka API\u002Fecosystem compatibility to lower the learning curve for your data streaming initiatives, and also provides the performance, simplicity, and cost-effectiveness to get the ROI you seek. Before we go into more details, let’s briefly discuss two example joint customers.",[40,24302,24304],{"id":24303},"joint-customers","Joint Customers",[48,24306,24307,24310],{},[55,24308,24309],{"href":23440},"Our success story on InnerSpace"," provides a great example of how these technologies work together. InnerSpace is a location analytics business that captures insightful data about people’s behavior with the goal of improving indoor experiences. They help customers with operational optimization, in which the “operations” pertain to how people use office space. Many businesses invest in real estate and want to know how efficiently their space is used. InnerSpace gives their customers insights on metrics such as how many people are in the office, how many come in early, how many leave late, where are the underused areas, etc.",[48,24312,24313],{},"InnerSpace chose StreamNative to handle their speed, low latency, scale, and cost challenges while also trying to simplify their infrastructure to reduce the dependency on a large DevOps team. They ingest raw location data from standard hooks in networking equipment, and then use Pulsar Functions, a function-as-a-service engine in StreamNative Cloud, to process that data before it can be used by analysts. 
One of the two main computations they run on the raw data is to anonymize the MAC addresses of the various devices so that employee privacy is retained. Another is to run their location algorithm which takes the Wi-Fi signal strengths and calculates locations of the devices at a very accurate and granular level.",[48,24315,24316],{},"The integration of StreamNative Cloud and Snowflake then makes it easy to deliver that data into an analytics-ready format. Analysts can then use popular tools to analyze office space usage and make better use of their space.",[48,24318,24319,24320,24325],{},"Another ",[55,24321,24324],{"href":24322,"rel":24323},"https:\u002F\u002Fwww.iterable.com",[264],"great example is Iterable",", the AI-powered customer communication platform that helps brands like Redfin, Priceline, Calm, and Box to activate customers with joyful interactions at scale. With Iterable, organizations drive high growth with individualized, harmonized and dynamic communications that engage customers throughout the entire lifecycle at the right time. Iterable continuously processes a high volume of data in real time, and constantly pushes out messages via various channels to their customers’ customers. With billions of daily events to capture and process, and about a billion messages sent per day, Iterable needs a system that provides high throughput, low latency, and can scale.",[48,24327,24328,24329,24332],{},"‍\n",[384,24330],{"alt":18,"src":24331},"\u002Fimgs\u002Fblogs\u002F675b0dcef265f2d1d6bf0c7d_AD_4nXdmjDphqlP1IIfqj3GQWyVVulGAiD4k7gByIRVpd9SQPSCZ8zaPZSKvMm4D3UDi7Sns0l4G4EHnovNeB8pTrnQN8FraIBnbWzHPkI1laDsAyQxsFpZUTHBUO1vDTk9fT5A4W7kae2XmTWarr3hlMfcejTu077gr3UsmFgzsnUGuI1FcxywPxg.png","\nThey process the raw event data into an intermediary topic, which is then transformed into an analyzable format, and then load the transformed data into Snowflake for downstream processing and analytics. The StreamNative and Snowflake components work together to run the entire data pipelines from end to end.",[40,24334,24336],{"id":24335},"considerations-for-your-choice-of-data-streaming-platform","Considerations for Your Choice of Data Streaming Platform",[48,24338,24339],{},"What are the important considerations for a suitable data streaming platform to go with your Snowflake implementation?",[48,24341,24342],{},"As a start, you need a certified integration with Snowflake, and this partnership addresses that requirement. Second, the right cloud deployment option is critical, whether you need a fully managed service, a self-hosted deployment, or even a bring-your-own-cloud option for those of you who have stringent data security and sovereignty requirements. You need a technology option that supports your deployment needs. Another consideration is whether you can leverage existing expertise. If your developers and tools are focused on the Kafka API, then it makes sense to go with a Kafka API-compatible technology.",[48,24344,24345],{},"Those are the items that you probably have considered, and here are some other issues you should plan for:",[1666,24347,24348,24351,24358],{},[324,24349,24350],{},"Central platform engineering team. A common organizational model today is having a platform engineering team that is responsible for the foundational technologies, while separate dev teams build out systems for specific use cases. 
To support such an organizational structure, you need to have a platform that has the multi-tenancy capabilities to either isolate or share specific data sets to more efficiently support a broad audience. StreamNative Cloud provides the multi-tenancy capabilities to support a central platform team while also providing the simplicity of consolidating many use cases into a single cluster.",[324,24352,24353,24354,24357],{},"Faster elasticity. Elasticity is a basic characteristic among cloud technologies, but the speed and efficiency of elastic deployments can vary greatly. This is because distributed systems often must rebalance data across the many nodes, and this process can be time consuming. So, if you add or remove nodes, the rebalancing work kicks in, and that can be heavyweight and cause disruptions. StreamNative Cloud leverages an architecture that ",[55,24355,24356],{"href":21492},"eliminates the costly data rebalancing work",", so that you can quickly scale up or down without the housekeeping disruptions.",[324,24359,24360,24361,24365],{},"Downstream infrastructure costs. You have initial estimates on what your cloud bill will be, but as many businesses are finding, ",[55,24362,24364],{"href":24363},"\u002Fblog\u002Fa-guide-to-evaluating-the-infrastructure-costs-of-apache-pulsar-and-apache-kafka","there are costs that are not always obvious,"," and therefore harder to predict. One source of such costs is the networking costs associated with housekeeping tasks like data rebalancing, as mentioned above. With the no-data-rebalancing architecture of StreamNative, you can not only avoid disruptions when scaling, but you can also reduce networking costs to keep your cloud infrastructure bills under control.",[40,24367,24368],{"id":3532},"Learn More",[48,24370,24371,24372,24376,24377,24380],{},"This is just a brief overview of why this StreamNative partnership with Snowflake is beneficial. If real-time analytics and AI are part of your data initiatives, StreamNative and Snowflake make a great combination. There’s so much to explore here, ",[55,24373,24375],{"href":17075,"rel":24374},[264],"try StreamNative"," with free $200 credit or ",[55,24378,24379],{"href":6392},"contact us"," and we’d be happy to discuss this technology integration with you in more detail.",[48,24382,3931],{},{"title":18,"searchDepth":19,"depth":19,"links":24384},[24385,24386,24387,24388,24389,24390],{"id":24276,"depth":19,"text":24277},{"id":24283,"depth":19,"text":24284},{"id":24293,"depth":19,"text":24294},{"id":24303,"depth":19,"text":24304},{"id":24335,"depth":19,"text":24336},{"id":3532,"depth":19,"text":24368},"2024-12-12","Discover how StreamNative and Snowflake’s partnership helps businesses optimize real-time analytics and AI deployments with scalable, cost-effective data streaming solutions. 
Learn more!","\u002Fimgs\u002Fblogs\u002F675b0e105a23323c815bd928_image-19.png",{},"\u002Fblog\u002Fdata-platforms-optimized-for-the-cloud-the-streamnative-and-snowflake-partnership",{"title":24253,"description":24392},"blog\u002Fdata-platforms-optimized-for-the-cloud---the-streamnative-and-snowflake-partnership",[800,18653],"tj9544onUMXJK80cYmh3JU5WHZbCgYHzZ8Oe54Lk5tc",{"id":24401,"title":24402,"authors":24403,"body":24404,"category":3550,"createdAt":290,"date":24657,"description":24658,"extension":8,"featured":294,"image":24659,"isDraft":294,"link":290,"meta":24660,"navigation":7,"order":296,"path":24661,"readingTime":17934,"relatedResources":290,"seo":24662,"stem":24663,"tags":24664,"__hash__":24665},"blogs\u002Fblog\u002Fdecember-data-streaming-launch-azure-marketplace-serverless-on-aws-and-azure-automated-geo-replication-and-more.md","December Data Streaming Launch: Azure Marketplace, Serverless on AWS and Azure, Automated Geo-replication, and more",[806],{"type":15,"value":24405,"toc":24648},[24406,24415,24418,24429,24432,24436,24439,24455,24463,24467,24470,24473,24480,24484,24487,24490,24501,24514,24522,24526,24534,24548,24551,24559,24563,24566,24572,24576,24579,24604,24607,24611,24628,24639,24646],[48,24407,24408,24409,4003,24412,24414],{},"At StreamNative, our vision has always been to democratize data streaming by making it affordable, accessible, and scalable for everyone. During the recent Data Streaming Summit, we announced exciting new features such as the ",[55,24410,24411],{"href":23502},"Ursa Engine Public Preview",[55,24413,1249],{"href":24063},", aimed at making data streaming more affordable and scalable.",[48,24416,24417],{},"Over the past few months, we’ve focused on accessibility improvements, culminating in this December Data Streaming Launch. These enhancements include:",[321,24419,24420,24423,24426],{},[324,24421,24422],{},"Azure Marketplace Launch: StreamNative Cloud is now available across all three major cloud marketplaces.",[324,24424,24425],{},"Expanded Deployment Options: Serverless on AWS and Azure, alongside self-service Bring Your Own Cloud (BYOC) on Azure, making Serverless, Dedicated, and BYOC deployment options accessible across all major cloud platforms.",[324,24427,24428],{},"Fully Automated Geo-Replication: Seamlessly set up global data streaming platforms across any region or cloud provider.",[48,24430,24431],{},"Let’s dive into the details of this exciting launch!",[32,24433,24435],{"id":24434},"streamnative-serverless-now-available-across-three-major-cloud-providers","StreamNative Serverless Now Available Across Three Major Cloud Providers",[48,24437,24438],{},"StreamNative Serverless has rapidly become one of our most popular offerings. Since its initial launch, we’ve received requests from users and customers for availability across more regions and cloud providers. 
Now, we’re thrilled to announce StreamNative Serverless is in Public Preview on AWS and Azure, joining GCP.",[48,24440,24441,24442,1186,24447,2869,24451,190],{},"You can now create Serverless clusters on any of the three major cloud providers using ",[55,24443,24446],{"href":24444,"rel":24445},"https:\u002F\u002Fdocs.streamnative.io\u002Fdocs\u002Fcloud-console",[264],"our console",[55,24448,24450],{"href":15853,"rel":24449},[264],"CLI",[55,24452,7046],{"href":24453,"rel":24454},"https:\u002F\u002Fdocs.streamnative.io\u002Fterraform-provider\u002Fterraform-provider-overview",[264],[48,24456,24457,24458,190],{},"Learn how to ",[55,24459,24462],{"href":24460,"rel":24461},"https:\u002F\u002Fdocs.streamnative.io\u002Fdocs\u002Fquickstart-console",[264],"get started with Serverless",[32,24464,24466],{"id":24465},"fully-automated-geo-replication-in-streamnative-cloud","Fully Automated Geo-Replication in StreamNative Cloud",[48,24468,24469],{},"Geo-replication, a cornerstone feature of Apache Pulsar, ensures seamless data replication across regions for enhanced reliability and availability. Previously, configuring geo-replication required significant effort, including global configuration setup, network connectivity, and credential management.",[48,24471,24472],{},"To address those challenges, we introduced Automated Geo-Replication Setup in StreamNative Cloud. This feature simplifies the setup process, enabling enterprise-grade multi-cluster functionality for all users.",[48,24474,24475,24476,24479],{},"Discover how to use ",[55,24477,24478],{"href":24136},"Automated Geo-Replication"," on StreamNative Cloud.",[32,24481,24483],{"id":24482},"enhanced-self-service-byoc-azure-bring-your-own-network-byon-dns-and-more","Enhanced Self-service BYOC: Azure, Bring Your Own Network (BYON), DNS, and More",[48,24485,24486],{},"Bring Your Own Cloud (BYOC) is one of our most cost-effective and secure solutions, tailored for customers needing sovereignty, security, and compliance. Customizations are critical for BYOC customers, especially in areas like networking and DNS.",[48,24488,24489],{},"With this launch, we’re introducing enhanced self-service support for:",[321,24491,24492,24495,24498],{},[324,24493,24494],{},"Self-service BYOC on Azure: Expanding on our support for Serverless on Azure, we now offer self-service BYOC on the Azure Cloud. 
Customers can provision BYOC clusters on Azure with the same ease as on GCP and AWS.",[324,24496,24497],{},"Bring Your Own Network (BYON): Provision and tag your VPC and subnets with Vendor=StreamNative to integrate seamlessly.",[324,24499,24500],{},"Bring Your Own DNS (BYOD): Use your existing public hosted DNS zones by specifying their ID and DNS names.",[48,24502,24503,24504,1186,24508,2869,24511,190],{},"These features are available on AWS and GCP through our ",[55,24505,24507],{"href":24444,"rel":24506},[264],"console",[55,24509,24450],{"href":15853,"rel":24510},[264],[55,24512,7046],{"href":24453,"rel":24513},[264],[48,24515,24516,24517,190],{},"Learn ",[55,24518,24521],{"href":24519,"rel":24520},"https:\u002F\u002Fdocs.streamnative.io\u002Fdocs\u002Fbyoc-overview",[264],"how to provision BYOC clusters yourself",[32,24523,24525],{"id":24524},"streamnative-cloud-on-azure-marketplace","StreamNative Cloud on Azure Marketplace",[48,24527,24528,24529,24533],{},"In addition to Serverless and BYOC support on Azure, StreamNative Cloud is now available on ",[55,24530,24532],{"href":24188,"rel":24531},[264],"Azure Marketplace",", supporting Serverless, Dedicated, and BYOC deployment options through annual or pay-as-you-go subscriptions. Azure customers can now benefit from:",[321,24535,24536,24539,24542,24545],{},[324,24537,24538],{},"Seamless Integration: Connect StreamNative Cloud to Azure services effortlessly.",[324,24540,24541],{},"Simplified Procurement: Add StreamNative Cloud to your Azure account with just a few clicks.",[324,24543,24544],{},"Enhanced Security: Enjoy the combined security of Azure and StreamNative Cloud.",[324,24546,24547],{},"Scalability on Demand: Scale your infrastructure flexibly to meet business needs.",[48,24549,24550],{},"With this launch, StreamNative Cloud is now available on all three major cloud marketplaces.",[48,24552,24553,24554,190],{},"Learn more about our ",[55,24555,24558],{"href":24556,"rel":24557},"https:\u002F\u002Fdocs.streamnative.io\u002Fdocs\u002Fbilling-azure",[264],"Azure Marketplace launch",[32,24560,24562],{"id":24561},"custom-maintenance-windows","Custom Maintenance Windows",[48,24564,24565],{},"To minimize disruptions, StreamNative Cloud performs upgrades and maintenance in a rolling fashion. Now, you can align maintenance schedules with your business needs by setting custom maintenance windows via the cluster settings menu. This feature is available for production support customers across all cluster types.",[48,24567,10256,24568,190],{},[55,24569,24571],{"href":24128,"rel":24570},[264],"customizing maintenance windows",[32,24573,24575],{"id":24574},"improved-observability-rbac-for-metrics-api-and-built-in-grafana-dashboards","Improved Observability: RBAC for Metrics API, and Built-In Grafana Dashboards",[48,24577,24578],{},"Observability is critical for managing your data streaming clusters effectively. 
We’ve made significant improvements:",[321,24580,24581,24590],{},[324,24582,24583,24584,24589],{},"RBAC for Metrics API: Use a ",[55,24585,24588],{"href":24586,"rel":24587},"https:\u002F\u002Fdocs.streamnative.io\u002Fdocs\u002Fmanage-rbac-roles#metrics-viewer",[264],"MetricsViewer"," role to scrape metrics with reduced privileges.",[324,24591,24592,24593,24598,24599,190],{},"Built-In Grafana Dashboard Templates: Quickly set up ",[55,24594,24597],{"href":24595,"rel":24596},"https:\u002F\u002Fdocs.streamnative.io\u002Fdocs\u002Fgrafana-dashboards",[264],"dashboards and alert rules"," using our ",[55,24600,24603],{"href":24601,"rel":24602},"https:\u002F\u002Fgithub.com\u002Fstreamnative\u002Fstreamnative-cloud-dashboard",[264],"templates",[48,24605,24606],{},"Explore Grafana Dashboards and Alerting.",[32,24608,24610],{"id":24609},"streaming-ahead","Streaming Ahead",[48,24612,24613,24614,1186,24618,2869,24623,24627],{},"This December Data Streaming Launch reinforces our mission to deliver cost-effective, accessible data streaming solutions for organizations of all sizes. Whether you choose ",[55,24615,4839],{"href":24616,"rel":24617},"https:\u002F\u002Fdocs.streamnative.io\u002Fdocs\u002Fcluster-types#serverless-clusters",[264],[55,24619,24622],{"href":24620,"rel":24621},"https:\u002F\u002Fdocs.streamnative.io\u002Fdocs\u002Fcluster-types#dedicated-clusters",[264],"Dedicated",[55,24624,10322],{"href":24625,"rel":24626},"https:\u002F\u002Fdocs.streamnative.io\u002Fdocs\u002Fcluster-types#byoc-clusters",[264],", you now have the flexibility to deploy StreamNative Cloud on your preferred cloud provider with ease.",[48,24629,24630,24634,24635,24638],{},[55,24631,24633],{"href":24460,"rel":24632},[264],"Get started with StreamNative Serverless"," on GCP, AWS, and Azure or explore self-service ",[55,24636,10322],{"href":24519,"rel":24637},[264]," on Azure.",[48,24640,24641,24642,20076],{},"Let’s ",[55,24643,24645],{"href":17075,"rel":24644},[264],"keep streaming",[48,24647,3931],{},{"title":18,"searchDepth":19,"depth":19,"links":24649},[24650,24651,24652,24653,24654,24655,24656],{"id":24434,"depth":279,"text":24435},{"id":24465,"depth":279,"text":24466},{"id":24482,"depth":279,"text":24483},{"id":24524,"depth":279,"text":24525},{"id":24561,"depth":279,"text":24562},{"id":24574,"depth":279,"text":24575},{"id":24609,"depth":279,"text":24610},"2024-12-10","Discover StreamNative’s December Data Streaming Launch! Explore exciting updates like Azure Marketplace availability, Serverless on AWS and Azure, automated geo-replication, enhanced BYOC options, and more. 
Revolutionize your data streaming today!","\u002Fimgs\u002Fblogs\u002F6758d45ca25fbb814863da6e_image-18.png",{},"\u002Fblog\u002Fdecember-data-streaming-launch-azure-marketplace-serverless-on-aws-and-azure-automated-geo-replication-and-more",{"title":24402,"description":24658},"blog\u002Fdecember-data-streaming-launch-azure-marketplace-serverless-on-aws-and-azure-automated-geo-replication-and-more",[302,10322],"n3wpO2g5WvysQdyhucv-9ww-72x1BZvRUfWBOHN_iIY",{"id":24667,"title":24668,"authors":24669,"body":24670,"category":3550,"createdAt":290,"date":24763,"description":24764,"extension":8,"featured":294,"image":24765,"isDraft":294,"link":290,"meta":24766,"navigation":7,"order":296,"path":24767,"readingTime":22989,"relatedResources":290,"seo":24768,"stem":24769,"tags":24770,"__hash__":24771},"blogs\u002Fblog\u002Fstreamnative-cloud-now-available-on-azure-marketplace.md","StreamNative Cloud Now Available on Azure Marketplace",[311],{"type":15,"value":24671,"toc":24755},[24672,24679,24684,24688,24691,24695,24698,24702,24716,24720,24723,24728,24732,24735,24738,24740,24746,24753],[48,24673,24674,24675,24678],{},"We're excited to announce that StreamNative Cloud is now available on ",[55,24676,24532],{"href":24188,"rel":24677},[264],"! This marks a significant milestone in our journey to make real-time messaging and streaming analytics accessible to organizations of all sizes, directly within their preferred cloud ecosystem.",[48,24680,24681],{},[384,24682],{"alt":18,"src":24683},"\u002Fimgs\u002Fblogs\u002F675732e18cffbc689352dc3f_AD_4nXdT2nIdNKf17PUqzAgoWJt6ZZ836BUKwSuA0IBWNSCbKpmpHeRso2wPSZy-RC7wMuvmgOCWyynGs5Vv2fFjFzJeqU1dNbANICuVCz0k9dhca2kbNu7dXrXjRrAXiBVuCEbq85zpU6LfrUDFXQSE29RExU2P.png",[40,24685,24687],{"id":24686},"streamnative-cloud-powering-real-time-data-solutions","StreamNative Cloud: Powering Real-Time Data Solutions",[48,24689,24690],{},"StreamNative Cloud offers a fully managed cloud service for Apache Pulsar designed to provide scalable, reliable, and secure messaging and event-streaming capabilities. Whether you're building real-time applications, data pipelines, or microservices architectures, StreamNative Cloud provides the foundation you need to ingest, process, and analyze streaming data in real time.",[40,24692,24694],{"id":24693},"why-azure-marketplace","Why Azure Marketplace?",[48,24696,24697],{},"Azure Marketplace is a trusted platform that connects companies with solutions tested and optimized to run on Azure. By launching StreamNative Cloud on Azure Marketplace, we're making it easier for Azure customers to access our streaming data services directly within their existing cloud environment. 
This integration not only simplifies procurement and management but also ensures seamless compatibility and performance.",[40,24699,24701],{"id":24700},"benefits-for-azure-customers","Benefits for Azure Customers",[321,24703,24704,24707,24710,24713],{},[324,24705,24706],{},"Seamless Integration: Easily connect StreamNative Cloud with your Azure services, leveraging Azure's global infrastructure for a unified cloud experience.",[324,24708,24709],{},"Simplified Procurement: Add StreamNative Cloud to your Azure account with a few clicks, streamlining billing and subscription management.",[324,24711,24712],{},"Trusted Security: Benefit from the combined security features of Azure and StreamNative Cloud, ensuring your data is protected according to the highest standards.",[324,24714,24715],{},"Scalability: Scale your streaming data infrastructure on demand, thanks to the elasticity of the cloud, to meet your evolving business needs.",[40,24717,24719],{"id":24718},"getting-started-is-easy","Getting Started Is Easy",[48,24721,24722],{},"Finding and deploying StreamNative Cloud on Azure Marketplace is straightforward. Search for StreamNative Cloud in the Marketplace or visit our listing directly. From there, you can see detailed information about our offering and get started with just a few clicks.",[48,24724,24725],{},[384,24726],{"alt":18,"src":24727},"\u002Fimgs\u002Fblogs\u002F675732e13e9cb86b64eab30f_AD_4nXdc29crvNtfbBdPRYXbBE5AXjxBtiQsH2EzbUoW3qkliiFBJ-ERpv7M_cIyUg8msYG7fFH0lqCTm-oiK4x5a0izeME1rMaV98-a8LQvNh0RPfU7R6rVao-Os4vwSqi3ohTmwtC6.png",[40,24729,24731],{"id":24730},"join-us-on-this-exciting-journey","Join Us on This Exciting Journey",[48,24733,24734],{},"We believe that real-time data is key to unlocking valuable insights and driving business success. With StreamNative Cloud now available on Azure Marketplace, it's easier than ever to embark on this journey. We're committed to providing our customers with the tools they need to harness the power of real-time data, and we're excited to see what you'll build with StreamNative Cloud on Azure.",[48,24736,24737],{},"For more information about StreamNative Cloud and how to get started, visit our listing on Azure Marketplace today. Welcome to the future of data streaming!",[40,24739,4135],{"id":4132},[48,24741,24742],{},[55,24743,24745],{"href":24556,"rel":24744},[264],"Azure Marketplace - Pay as you go",[48,24747,24748],{},[55,24749,24752],{"href":24750,"rel":24751},"https:\u002F\u002Fdocs.streamnative.io\u002Fdocs\u002Fbilling-azure-commitments",[264],"Azure Marketplace with Commitments",[48,24754,3931],{},{"title":18,"searchDepth":19,"depth":19,"links":24756},[24757,24758,24759,24760,24761,24762],{"id":24686,"depth":19,"text":24687},{"id":24693,"depth":19,"text":24694},{"id":24700,"depth":19,"text":24701},{"id":24718,"depth":19,"text":24719},{"id":24730,"depth":19,"text":24731},{"id":4132,"depth":19,"text":4135},"2024-12-09","StreamNative Cloud is now available on Azure Marketplace! Unlock scalable, secure, and real-time messaging and streaming analytics directly within your Azure ecosystem. 
Streamline integration, procurement, and performance today.","\u002Fimgs\u002Fblogs\u002F675733258a691826c4ec43e0_AzurePartnership.png",{},"\u002Fblog\u002Fstreamnative-cloud-now-available-on-azure-marketplace",{"title":24668,"description":24764},"blog\u002Fstreamnative-cloud-now-available-on-azure-marketplace",[302,3550,821,303],"SZJ4Mz1xnGtxmrYncUWPVD-_ofAd-O9MMBcaM4jJTJs",{"id":24773,"title":24774,"authors":24775,"body":24777,"category":821,"createdAt":290,"date":24863,"description":24864,"extension":8,"featured":294,"image":24865,"isDraft":294,"link":290,"meta":24866,"navigation":7,"order":296,"path":24136,"readingTime":22989,"relatedResources":290,"seo":24867,"stem":24868,"tags":24869,"__hash__":24870},"blogs\u002Fblog\u002Fautomated-geo-replication-set-up-in-streamnative-cloud-pulsar-instances.md","Automated Geo-Replication set up in StreamNative Cloud Pulsar instances",[24776],"Eric Shen",{"type":15,"value":24778,"toc":24856},[24779,24782,24786,24791,24793,24796,24800,24803,24807,24812,24815,24819,24822,24827,24830,24835,24838,24845,24847,24854],[48,24780,24781],{},"Geo-replication has always been a standout feature at the core of Apache Pulsar, enabling seamless data replication across geographically distributed regions for enhanced reliability and availability. However, setting up Geo-replication has traditionally required additional effort, including configuring a global configuration store, ensuring network connectivity, and managing complex credentials. At StreamNative, we’ve worked hard to address these challenges and simplify the process for our customers. Today, we’re excited to introduce Automated Geo-Replication setup in StreamNative Cloud Pulsar instance, a new feature available across all StreamNative product lines that streamlines the setup of Geo-replication and brings enterprise-grade multi-cluster functionality to all our users.",[40,24783,24785],{"id":24784},"what-is-apache-pulsar-geo-replication","What is Apache Pulsar Geo-Replication?",[48,24787,24788],{},[384,24789],{"alt":18,"src":24790},"\u002Fimgs\u002Fblogs\u002F6751e2fe1dca74e89eb5527b_AD_4nXe7vJJWIX8noeDBBm9bNBltjb2SvtKYIisqqVSb4MSiHT5u3ZG7J8YrSKbTj41Bzs3pyxSAFjDY9E4lLBaityQHygADETXrQ6dpnCu87QHlsRdhcMt8gAl3VNcxckQou6lChVOcLg.png",[48,24792,3931],{},[48,24794,24795],{},"Apache Pulsar's geo-replication mechanism is typically used for disaster recovery, enabling the replication of persistently stored message data across multiple data centers. For example, your application is publishing data in one region and you would like to process it for consumption in other regions. With Pulsar's geo-replication mechanism, messages can be produced and consumed in different geo-locations.",[40,24797,24799],{"id":24798},"the-challenge-of-traditional-geo-replication-setup","The Challenge of Traditional Geo-Replication Setup",[48,24801,24802],{},"Setting up the geo-replication in the Apache Pulsar requires some expertise and manual steps like selecting the geo-replication solution between global configuration store or separate configuration store, network infrastructure connectivity, cluster authentication credentials etc. For existing StreamNative Clusters, enabling replication between instances required additional administrative overhead. 
Customers opting for our \"Pro\" tier could rely on the StreamNative team to configure Geo-replication during provisioning, but this approach wasn’t available to all customers or products.",[40,24804,24806],{"id":24805},"a-simpler-solution-automatedgeo-replication-setup-in-streamnative-cloud","A Simpler Solution: AutomatedGeo-Replication setup in StreamNative Cloud",[48,24808,24809],{},[384,24810],{"alt":18,"src":24811},"\u002Fimgs\u002Fblogs\u002F6751e2fe391c16068439be46_AD_4nXf1zxqexRmcypNCseYEq7W2eQwQRGkMSO9toMDu52oUGyGbmd-m3Hs4mX6GRiXKBfPVSL5ijLe3K0s_7jf0y9nlJttbxqCipW7Ygc7fV1gGdNuwCgFUtjx3vuy--jyCVdUdd698hg.png",[48,24813,24814],{},"To make Geo-replication accessible to all our customers and eliminate the need for complex setup, We introduced the automated multi-cluster creation in a StreamNative Cloud Pulsar instance. The Pulsar instance on StreamNative Cloud is a group of Pulsar clusters that function together as a unified entity. This enhancement allows users to create multiple Pulsar clusters within a single instance effortlessly, ensuring geo-replication is configured with just a few clicks, without the manual configuration traditionally required.",[40,24816,24818],{"id":24817},"managing-multi-clusters-on-streamnative-cloud","Managing multi-clusters on StreamNative Cloud",[48,24820,24821],{},"StreamNative Cloud Console now supports creating multiple Pulsar clusters under the same Pulsar Instance, and you can create the second or following Pulsar clusters on the Pulsar Instance page.",[48,24823,24824],{},[384,24825],{"alt":18,"src":24826},"\u002Fimgs\u002Fblogs\u002F6751e2feb33bf6e8fa07c769_AD_4nXc5_sHxxP4OFG8zJG6x5wLzArBpjZYos1_GpaBOUEQ7lFvQb0L8Eq7IvMlozxxDI4NxUs9gJbfqXUfNoesZBQs1p96Z8rQnIcAnkMYqLlP3oyz3HVtbihFVsudM5KF-poz361XwsQ.png",[48,24828,24829],{},"You can manage the message replication on the Pulsar tenants and namespaces with flexibility. On the namespace configuration, you can select the replication clusters from the dropdown and  any topics that producers or consumers create within that namespace are replicated across clusters.",[48,24831,24832],{},[384,24833],{"alt":18,"src":24834},"\u002Fimgs\u002Fblogs\u002F6751e2fe98cec0369d6fd68e_AD_4nXfOXxCv4YqzfDHok6ctUlH8V1GZItM9-QZdUYSJkhM7XeqmLjR_WZ_wA2HDHNIElsCeTlyBZDGUrlgyGpbCMtFz2OPJeRylCIwco4AZ-WHzkNT9pkPSRzZPI_dTXlcG7MS1-oYgwg.png",[48,24836,24837],{},"Replicated subscription is useful within geo-replication, you can in case of failover, a consumer can restart consuming from the failure point in a different cluster:",[48,24839,24840,24841],{},"Consumer",[24842,24843,24844],"string",{}," consumer = client.newConsumer(Schema.STRING)\n.topic(\"my-topic\")\n.subscriptionName(\"my-subscription\")\n.replicateSubscriptionState(true)\n.subscribe();",[40,24846,2125],{"id":2122},[48,24848,24849,24850,190],{},"StreamNative Cloud's support for Pulsar instance geo-replication is a powerful feature that simplifies the data resilience and availability across global regions. 
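Alongside the console workflow described above, namespace replication can also be configured programmatically with the standard Pulsar admin API. The snippet below is a minimal sketch only; the admin endpoint, tenant/namespace, and cluster names are placeholders, and authentication configuration is omitted:

```java
import java.util.HashSet;
import java.util.Set;
import org.apache.pulsar.client.admin.PulsarAdmin;

public class EnableNamespaceReplication {
    public static void main(String[] args) throws Exception {
        // Placeholder admin endpoint; real clusters also need authentication settings.
        try (PulsarAdmin admin = PulsarAdmin.builder()
                .serviceHttpUrl("https://<your-cluster-admin-endpoint>")
                .build()) {

            // Replicate every topic in this namespace across both clusters
            // of the Pulsar instance (cluster names are illustrative).
            Set<String> clusters = new HashSet<>();
            clusters.add("cluster-us-east");
            clusters.add("cluster-eu-west");
            admin.namespaces()
                 .setNamespaceReplicationClusters("my-tenant/my-namespace", clusters);
        }
    }
}
```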
For more information on Automated Geo-Replication setup, please visit the ",[55,24851,7120],{"href":24852,"rel":24853},"https:\u002F\u002Fdocs.streamnative.io\u002Fdocs\u002Fcloud-geo-replication",[264],[48,24855,3931],{},{"title":18,"searchDepth":19,"depth":19,"links":24857},[24858,24859,24860,24861,24862],{"id":24784,"depth":19,"text":24785},{"id":24798,"depth":19,"text":24799},{"id":24805,"depth":19,"text":24806},{"id":24817,"depth":19,"text":24818},{"id":2122,"depth":19,"text":2125},"2024-12-05","Learn how StreamNative Cloud's Automated Geo-Replication simplifies the process of replicating data across regions. Enhance reliability, availability, and disaster recovery with streamlined multi-cluster functionality.","\u002Fimgs\u002Fblogs\u002F6751e3a898cec0369d709763_image-16.png",{},{"title":24774,"description":24864},"blog\u002Fautomated-geo-replication-set-up-in-streamnative-cloud-pulsar-instances",[11899,3550,821],"_QSSsBTNKwvasDUktZpqj8ouhgKdWtu2424d8QAYeC8",{"id":24872,"title":24873,"authors":24874,"body":24875,"category":3550,"createdAt":290,"date":24979,"description":24980,"extension":8,"featured":294,"image":24981,"isDraft":294,"link":290,"meta":24982,"navigation":7,"order":296,"path":24983,"readingTime":22989,"relatedResources":290,"seo":24984,"stem":24985,"tags":24986,"__hash__":24987},"blogs\u002Fblog\u002Fstreamnatives-bigquery-integration-achieves-google-cloud-ready---bigquery-designation.md","StreamNative's BigQuery Integration Achieves Google Cloud Ready - BigQuery Designation",[311],{"type":15,"value":24876,"toc":24976},[24877,24895,24898,24901,24906,24915,24918,24941,24943,24947,24954,24959,24962,24965,24968,24974],[48,24878,24879,24880,24885,24886,24890,24891,24894],{},"StreamNative is excited to announce it is now a part of the ",[55,24881,24884],{"href":24882,"rel":24883},"https:\u002F\u002Fcloud.google.com\u002Fbigquery\u002Fdocs\u002Fbigquery-ready-overview",[264],"Google Cloud Ready - BigQuery"," initiative, a unique initiative that validates integrations into ",[55,24887,2143],{"href":24888,"rel":24889},"https:\u002F\u002Fcloud.google.com\u002Fbigquery",[264],". By earning this designation, StreamNative has proven its ",[55,24892,5579],{"href":24893},"\u002Fursa"," meets a core set of functionality and interoperability requirements when integrating with BigQuery.",[48,24896,24897],{},"Today’s  businesses are increasingly relying on real-time analytics to make data-driven decisions and maintain a competitive edge. Real-time analytics enables organizations to process, analyze, and act on data as it is generated, uncovering actionable insights almost instantly. This capability is essential for use cases such as personalized customer experiences, fraud detection, operational monitoring, and dynamic pricing. Achieving this level of agility requires a seamless integration of robust data streaming platforms and advanced analytics tools.",[48,24899,24900],{},"StreamNative’s data streaming platform, built on Apache Pulsar, and Apache Kafka ensures reliable ingestion and processing of high-throughput event streams, while AI-ready data analytics platforms like Google BigQuery provide the scalability and intelligence needed for complex analysis and machine learning. 
Together, these technologies create an end-to-end solution that transforms raw data into actionable insights, empowering organizations to stay ahead in the era of real-time decision-making.",[48,24902,24903],{},[384,24904],{"alt":18,"src":24905},"\u002Fimgs\u002Fblogs\u002F674ded36ef191be0d235724b_AD_4nXdT66nfrR-dvFiW9O5xnHxlV1OogT6spHelxfsDkgs7jQcHVtFbP8qVJ6ZHwUUu3byEUjULzxUmvFv05x99Bi7KG51CSqijsGjG2OvBfuAnV10j-DktC8yApOxJjlq_pmvPuxA2.png",[48,24907,24908,24909,24914],{},"StreamNative's integration with Google BigQuery through the ",[55,24910,24913],{"href":24911,"rel":24912},"https:\u002F\u002Fdocs.streamnative.io\u002Fhub\u002Fconnector-google-bigquery-sink-v4.0",[264],"BigQuery Sink Connector"," offers a seamless way to stream data directly into BigQuery for real-time analytics and AI-ready insights. Leveraging Apache Pulsar's multi-tenancy and scalability, the connector ensures efficient ingestion of high-throughput event streams into BigQuery’s powerful analytics engine. This integration enables businesses to effortlessly bridge their streaming data with advanced analytics capabilities, unlocking opportunities for real-time decision-making, predictive modeling, and machine learning workflows. Validated under the Google Cloud Ready BigQuery initiative, StreamNative provides a robust, enterprise-grade solution for end-to-end data streaming and analytics.",[48,24916,24917],{},"The StreamNative BigQuery Sink connector provides a comprehensive set of robust features, as outlined below.",[321,24919,24920,24923,24926,24929,24932,24935,24938],{},[324,24921,24922],{},"Real-Time Data Streaming: Enables seamless ingestion of high-throughput event streams into Google BigQuery for instant analytics.",[324,24924,24925],{},"Scalability: Built on Apache Pulsar, the connector handles massive data volumes with ease.",[324,24927,24928],{},"Multi-Tenancy Support: Supports multi-tenant architectures for streamlined data processing across teams or applications.",[324,24930,24931],{},"Schema Management: Automatically manages schemas for structured data integration.",[324,24933,24934],{},"Ease of Use: Simplifies setup and configuration for fast and efficient pipeline deployment.",[324,24936,24937],{},"Fully managed in StreamNative Cloud: Natively connects data from StreamNative Cloud to BigQuery for unified analytics workflows.",[324,24939,24940],{},"AI-Ready Data: Prepares data in BigQuery for advanced analytics, machine learning, and predictive modeling.",[48,24942,3931],{},[40,24944,24946],{"id":24945},"google-cloud-ready-bigquery-designation","Google Cloud Ready - BigQuery designation",[48,24948,24949,24953],{},[55,24950,24952],{"href":24882,"rel":24951},[264],"Google's Cloud Ready Initiative for BigQuery"," is designed to validate and certify partner integrations that meet Google Cloud’s high standards for functionality, reliability, and performance. This designation helps customers identify solutions that seamlessly integrate with BigQuery, ensuring optimized data flows and advanced analytics capabilities. Validated solutions undergo rigorous testing to ensure compatibility and deliver consistent, enterprise-grade performance. 
By participating in this initiative, StreamNative demonstrates their commitment to providing reliable, scalable, and secure integrations that help businesses harness the full power of BigQuery for real-time analytics and AI-driven insights.",[48,24955,24956],{},[384,24957],{"alt":18,"src":24958},"\u002Fimgs\u002Fblogs\u002F674ded368a6718b59c9ae684_AD_4nXef9_9kP_glvLA8cAdJJK3XZ1c5eqpjtEr36GFZt7kUxbHcSP1TdcXxbDUjo-rTRCja-9TBRHBslmpfdgGoV4wtod5O92Y3bcn0O6c7QgB7bDGagia4-heHZwo05AiUeuSzvozgRg.png",[48,24960,24961],{},"As part of this initiative, Google engineering teams validate partner integrations into BigQuery in a three-phase process: Evaluate - run a series of data integration tests and compare results against benchmarks, Enhance - work closely with partners to fill any gaps, and Enable - refine documentation for our mutual customers. Being part of the initiative, StreamNative collaborates closely with Google partner engineering and BigQuery teams to develop joint roadmaps.",[48,24963,24964],{},"StreamNative's validation under the Google Cloud Ready initiative for BigQuery offers customers enhanced confidence, backed by Google's recognition of its robust integration and performance. As a featured partner in this initiative, StreamNative demonstrates its commitment to delivering reliable, enterprise-grade solutions for real-time data streaming and analytics. StreamNative’s alignment with Google’s vision, roadmap, and regular validation of its BigQuery integration ensures customers benefit from a forward-looking, reliable solution that evolves with their needs. This collaboration delivers seamless, cutting-edge capabilities for real-time streaming and analytics, backed by continuous innovation and the assurance of Google’s rigorous standards.",[48,24966,24967],{},"“Data-driven organizations require frictionless integration between their data streaming platforms and analytics solutions to derive real-time insights at scale. StreamNative's validation through the BigQuery Ready Initiative demonstrates our commitment to providing enterprise customers with reliable, tested integrations that accelerate their data analytics initiatives\" said Naveen Punjabi, Global Partnership Lead for Data Analytics at Google Cloud. “Together with StreamNative, we're enabling customers to confidently build modern data pipelines that combine the power of real-time streaming with BigQuery's advanced analytics capabilities.\"",[48,24969,24970,24973],{},[55,24971,7137],{"href":17075,"rel":24972},[264]," to quickly build a seamless data ingestion pipeline between StreamNative Cloud and Google BigQuery.",[48,24975,3931],{},{"title":18,"searchDepth":19,"depth":19,"links":24977},[24978],{"id":24945,"depth":19,"text":24946},"2024-12-02","StreamNative achieves Google Cloud Ready - BigQuery designation, validating seamless integration for real-time data streaming and advanced analytics. 
Empower your business with reliable pipelines, AI-ready insights, and scalable event stream processing using Apache Pulsar and Google BigQuery.","\u002Fimgs\u002Fblogs\u002F674dedc71d8b6c97378ab487_image-15.png",{},"\u002Fblog\u002Fstreamnatives-bigquery-integration-achieves-google-cloud-ready-bigquery-designation",{"title":24873,"description":24980},"blog\u002Fstreamnatives-bigquery-integration-achieves-google-cloud-ready---bigquery-designation",[1332,10054,800,302],"nikmSVGLlHcfurJ3fBafG74FvSGgq6d8Qw42ViNBP8U",{"id":24989,"title":24990,"authors":24991,"body":24992,"category":3550,"createdAt":290,"date":25353,"description":25354,"extension":8,"featured":294,"image":25355,"isDraft":294,"link":290,"meta":25356,"navigation":7,"order":296,"path":25357,"readingTime":25358,"relatedResources":290,"seo":25359,"stem":25360,"tags":25361,"__hash__":25362},"blogs\u002Fblog\u002Fmy-data-arrived-1ms-late.md","My data arrived 1ms late!",[22998],{"type":15,"value":24993,"toc":25343},[24994,24997,25000,25003,25007,25018,25022,25030,25034,25042,25046,25054,25058,25066,25070,25078,25082,25090,25093,25119,25122,25125,25130,25135,25140,25145,25159,25164,25178,25183,25197,25202,25213,25218,25229,25234,25245,25250,25261,25266,25277,25282,25296,25301,25315,25317,25321,25324,25330,25333,25341],[48,24995,24996],{},"John Doe, a platform engineer, was watching his dashboard closely when he noticed a troubling lag. His data wasn’t arriving exactly when it happened. The dashboard wasn’t reflecting the real-time data he needed—critical data that a stock trading analyst relied on to make split-second decisions. He’d poured his time and energy into designing this system, carefully picking the best data streaming technology and the perfect tech stack to make it as real-time as possible. And now, with a 1ms delay, the trading team might miss crucial opportunities. So what should John do?",[48,24998,24999],{},"When we talk about \"real-time,\" most people imagine high-stakes trading floors where milliseconds can mean the difference between profit and loss. And yes, for some data, arriving even 1ms late can indeed be a game-changer. But is every use case worth the intense effort and high cost of achieving near-zero latency? After all, 100ms is still less than a second. The answer depends on the time value of data—how valuable data is when it arrives instantaneously versus a few milliseconds (or more) later.",[48,25001,25002],{},"So how do we decide between ultra-low latency and more relaxed latency? 
Let’s break down the factors that determine if the chase for low latency is worth it.",[40,25004,25006],{"id":25005},"assess-the-real-time-needs-of-the-use-case","Assess the Real-Time Needs of the Use Case",[321,25008,25009,25012,25015],{},[324,25010,25011],{},"Immediate Impact on User Experience: For applications where delays are noticeable to users or degrade the experience (e.g., gaming, VR, or real-time communication), low latency (\u003C100ms) is essential.",[324,25013,25014],{},"Safety and Criticality: Applications related to safety or critical decision-making, like autonomous vehicles or emergency healthcare systems, require low latency to ensure timely actions.",[324,25016,25017],{},"Tolerance for Delays: If a use case can tolerate slight delays without impacting quality (e.g., streaming analytics dashboards, social media feeds), higher latency is often acceptable.",[40,25019,25021],{"id":25020},"determine-the-interaction-frequency-and-responsiveness-requirements","Determine the Interaction Frequency and Responsiveness Requirements",[321,25023,25024,25027],{},[324,25025,25026],{},"High-Frequency Interactions: Workloads that involve continuous, high-frequency interactions, like high-speed trading or multiplayer gaming, benefit from low latency to keep interactions smooth and synchronized.",[324,25028,25029],{},"Asynchronous or Periodic Updates: If updates can be batched or processed at intervals without affecting performance or insights (e.g., predictive maintenance or IoT sensor aggregation), higher latency is typically acceptable.",[40,25031,25033],{"id":25032},"evaluate-technical-requirements-and-constraints","Evaluate Technical Requirements and Constraints",[321,25035,25036,25039],{},[324,25037,25038],{},"Data Volume and Processing Complexity: Low latency requires fast processing and transmission, which can be costly or complex with large data volumes or intricate computations. If processing time or network transmission could introduce delays, high latency may be preferable.",[324,25040,25041],{},"Network and Bandwidth Constraints: Low latency often demands stable, high-bandwidth network infrastructure. For workloads running over unreliable or variable networks, higher latency might be more realistic and cost-effective.",[40,25043,25045],{"id":25044},"consider-cost-implications","Consider Cost Implications",[321,25047,25048,25051],{},[324,25049,25050],{},"Resource Cost: Low-latency systems require more computing power, optimized algorithms, and sometimes specialized hardware, which can be expensive. For non-critical or cost-sensitive applications, higher latency may be a viable compromise.",[324,25052,25053],{},"Scalability Needs: Low-latency infrastructure can become costly at scale (e.g., in distributed systems or global applications). If the application is intended for a large user base or frequent interactions, a balanced latency approach may help reduce costs.",[40,25055,25057],{"id":25056},"understand-user-expectations-and-perceptions","Understand User Expectations and Perceptions",[321,25059,25060,25063],{},[324,25061,25062],{},"Perceived Delay Tolerance: For some applications (e.g., retail or content delivery), user perception is more flexible regarding delays, and latencies above 100ms may not be noticeable or frustrating.",[324,25064,25065],{},"Competitive Benchmarking: Analyze latency expectations within the industry. 
For instance, finance, gaming, and customer service sectors often have low-latency benchmarks to remain competitive, while others, like analytics, can handle more relaxed latency standards.",[40,25067,25069],{"id":25068},"identify-regulatory-or-compliance-requirements","Identify Regulatory or Compliance Requirements",[321,25071,25072,25075],{},[324,25073,25074],{},"Compliance Requirements: Some industries have strict regulations around response times for data processing (e.g., financial trading), where compliance necessitates low latency.",[324,25076,25077],{},"Data Privacy and Localization: Regulations that require data to be processed within specific regions may impact latency, as data cannot be geo-replicated globally for speed. If regulations are flexible, higher latency may be acceptable if it lowers complexity.",[40,25079,25081],{"id":25080},"weigh-the-risk-of-latency-on-business-outcomes","Weigh the Risk of Latency on Business Outcomes",[321,25083,25084,25087],{},[324,25085,25086],{},"Business Impact of Delays: If delays could result in missed opportunities (e.g., in trading) or dissatisfied users (e.g., in customer support), prioritize low latency. However, if slight delays don’t compromise business outcomes, opt for a latency level that balances performance with efficiency.",[324,25088,25089],{},"Error Tolerance and Retries: High-latency systems often allow more error tolerance, as they can process data in batches or at intervals. If error tolerance is a priority, higher latency may be acceptable, whereas low latency typically requires high accuracy and minimal retries.",[48,25091,25092],{},"Here is the summary guidance for choosing latency optimized vs. latency relaxed \u002F cost optimized:",[321,25094,25095,25098,25101,25104,25107,25110,25113,25116],{},[324,25096,25097],{},"Choose Latency Optimized (\u003C100ms)Real-time interaction and immediate responsiveness are essential.",[324,25099,25100],{},"User experience will suffer noticeably from delays.",[324,25102,25103],{},"Safety or critical business decisions are impacted by response time.",[324,25105,25106],{},"The use case demands precise synchronization (e.g., multiplayer gaming, telemedicine).\nChoose Latency Relaxed and Cost Optimized (>100ms)",[324,25108,25109],{},"Slight delays are tolerable without impacting functionality or user experience.",[324,25111,25112],{},"Batching or periodic processing is feasible, reducing the cost of constant responsiveness.",[324,25114,25115],{},"Network infrastructure constraints make ultra-low latency impractical.",[324,25117,25118],{},"Costs, scalability, or compliance concerns outweigh the benefits of low latency.",[48,25120,25121],{},"Here are the use cases we've been collected from our customers and community members.",[48,25123,25124],{}," td {\n   padding: 0 15px;\n }",[48,25126,25127],{},[44,25128,25129],{},"Industry",[48,25131,25132],{},[44,25133,25134],{},"Use Cases Requiring \u003C100ms Latency",[48,25136,25137],{},[44,25138,25139],{},"Use Cases Tolerating >100ms Latency",[48,25141,25142],{},[44,25143,25144],{},"Gaming",[321,25146,25147,25150,25153,25156],{},[324,25148,25149],{},"Multiplayer online games",[324,25151,25152],{},"Real-time interactive VR\u002FAR gaming",[324,25154,25155],{},"In-game recommendations (e.g., suggested purchases)",[324,25157,25158],{},"Game data analytics",[48,25160,25161],{},[44,25162,25163],{},"Finance",[321,25165,25166,25169,25172,25175],{},[324,25167,25168],{},"High-frequency trading",[324,25170,25171],{},"Algorithmic 
trading",[324,25173,25174],{},"Fraud detection",[324,25176,25177],{},"Portfolio monitoring and risk management",[48,25179,25180],{},[44,25181,25182],{},"Healthcare",[321,25184,25185,25188,25191,25194],{},[324,25186,25187],{},"Remote robotic-assisted surgery",[324,25189,25190],{},"Real-time critical patient monitoring",[324,25192,25193],{},"Non-critical remote patient monitoring",[324,25195,25196],{},"Health & wellness tracking apps",[48,25198,25199],{},[44,25200,25201],{},"Telecommunications",[321,25203,25204,25207,25210],{},[324,25205,25206],{},"VoIP and video conferencing (e.g., Zoom, Teams)",[324,25208,25209],{},"Network performance monitoring",[324,25211,25212],{},"Call quality analysis",[48,25214,25215],{},[44,25216,25217],{},"Automotive",[321,25219,25220,25223,25226],{},[324,25221,25222],{},"Autonomous vehicle navigation and obstacle detection",[324,25224,25225],{},"Fleet tracking for logistics",[324,25227,25228],{},"Vehicle diagnostics and maintenance",[48,25230,25231],{},[44,25232,25233],{},"Retail\u002FE-commerce",[321,25235,25236,25239,25242],{},[324,25237,25238],{},"Augmented reality (AR) virtual try-ons",[324,25240,25241],{},"Personalized product recommendations",[324,25243,25244],{},"Inventory tracking",[48,25246,25247],{},[44,25248,25249],{},"Manufacturing",[321,25251,25252,25255,25258],{},[324,25253,25254],{},"Robotics and machine control on production lines",[324,25256,25257],{},"Predictive maintenance analytics",[324,25259,25260],{},"Inventory and supply chain updates",[48,25262,25263],{},[44,25264,25265],{},"Media & Streaming",[321,25267,25268,25271,25274],{},[324,25269,25270],{},"Cloud gaming services (e.g., Stadia, GeForce NOW)",[324,25272,25273],{},"Social media feed updates",[324,25275,25276],{},"Ad insertion in streaming video",[48,25278,25279],{},[44,25280,25281],{},"Smart Cities\u002FIoT",[321,25283,25284,25287,25290,25293],{},[324,25285,25286],{},"Traffic light control systems",[324,25288,25289],{},"Emergency response systems",[324,25291,25292],{},"Environmental monitoring (e.g., air quality sensors)",[324,25294,25295],{},"Smart metering",[48,25297,25298],{},[44,25299,25300],{},"Customer Support",[321,25302,25303,25306,25309,25312],{},[324,25304,25305],{},"Real-time chat support",[324,25307,25308],{},"Voice-enabled assistants",[324,25310,25311],{},"Ticket status updates",[324,25313,25314],{},"Customer feedback analytics",[48,25316,3931],{},[40,25318,25320],{"id":25319},"the-bottom-line-understanding-the-value-of-every-millisecond","The Bottom Line: Understanding the Value of Every Millisecond",[48,25322,25323],{},"Ultimately, choosing between low and high latency comes down to the time value of your data. For some applications, every millisecond matters—ultra-low latency can make or break the experience or profitability. But for other use cases, a slight delay won’t impact performance, and allowing some flexibility can free up resources for other critical areas without sacrificing functionality.",[48,25325,25326,25327,25329],{},"At StreamNative, we provide the flexibility to help you choose the right solution for your specific needs. For ultra-low latency requirements, our Classic Engine is optimized to deliver immediate data processing, perfect for high-stakes applications like trading and real-time monitoring. 
For workloads that can tolerate a bit more time—where near-real-time data is enough—our ",[55,25328,4725],{"href":24893}," is designed to handle relaxed latency with optimal efficiency.",[48,25331,25332],{},"With StreamNative, you can strike the right balance between performance and resource efficiency, making sure your data is there when it counts.",[48,25334,25335,25336,25340],{},"Ready to experience real-time data streaming tailored to your needs? ",[55,25337,25339],{"href":15003,"rel":25338},[264],"Sign up today"," and get $200 in free credits to explore both the Classic Engine for ultra-low latency and the Ursa Engine for flexible, high-efficiency workloads.",[48,25342,3931],{},{"title":18,"searchDepth":19,"depth":19,"links":25344},[25345,25346,25347,25348,25349,25350,25351,25352],{"id":25005,"depth":19,"text":25006},{"id":25020,"depth":19,"text":25021},{"id":25032,"depth":19,"text":25033},{"id":25044,"depth":19,"text":25045},{"id":25056,"depth":19,"text":25057},{"id":25068,"depth":19,"text":25069},{"id":25080,"depth":19,"text":25081},{"id":25319,"depth":19,"text":25320},"2024-11-20","When every millisecond counts, like in trading or gaming, ultra-low latency is critical. But is it always worth the cost? Discover how to decide between latency-optimized and cost-efficient solutions for your use case.","\u002Fimgs\u002Fblogs\u002F673e271286ac0368df40fbc3_image-12.png",{},"\u002Fblog\u002Fmy-data-arrived-1ms-late","15min",{"title":24990,"description":25354},"blog\u002Fmy-data-arrived-1ms-late",[1331,303],"QO8qtjJ0zfopQJfThnMwo3rrra-fDBkG3PHqNUfPhmQ",{"id":25364,"title":25365,"authors":25366,"body":25367,"category":3550,"createdAt":290,"date":25664,"description":25665,"extension":8,"featured":294,"image":25666,"isDraft":294,"link":290,"meta":25667,"navigation":7,"order":296,"path":23502,"readingTime":22989,"relatedResources":290,"seo":25668,"stem":25669,"tags":25670,"__hash__":25671},"blogs\u002Fblog\u002Fannouncing-the-ursa-engine-public-preview-for-streamnative-byoc-clusters.md","Announcing the Ursa Engine Public Preview for StreamNative BYOC Clusters",[806],{"type":15,"value":25368,"toc":25649},[25369,25389,25392,25396,25399,25402,25416,25419,25427,25431,25434,25439,25441,25444,25448,25451,25454,25468,25472,25475,25478,25481,25486,25489,25493,25496,25499,25502,25505,25509,25512,25517,25520,25534,25538,25541,25549,25554,25565,25569,25581,25585,25599,25603,25617,25621,25624,25635,25638,25642],[48,25370,25371,25372,25374,25375,25379,25380,25384,25385,25388],{},"At the Pulsar Virtual Summit EMEA 2024, we introduced the ",[55,25373,4725],{"href":24893},"—a groundbreaking leap forward in data streaming architecture. The feedback has been overwhelmingly positive from both customers and prospects. Today, we are excited to announce the Ursa Engine Public Preview for StreamNative AWS BYOC clusters, unlocking new possibilities with ",[55,25376,25378],{"href":25377},"\u002Fblog\u002Fursa-reimagine-apache-kafka-for-the-cost-conscious-data-streaming#no-keepers","No Keepers architecture"," (",[55,25381,25383],{"href":25382},"\u002Fblog\u002Fursa-reimagine-apache-kafka-for-the-cost-conscious-data-streaming#our-ambition-unify-data-streaming-and-data-lakes","Phase 3 of Ursa","), leveraging ",[55,25386,5599],{"href":22142,"rel":25387},[264]," for scalable metadata storage and S3 for Write-Ahead Log storage. 
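Because Ursa is fully Kafka API-compatible (and the Public Preview exposes the Kafka protocol), existing Kafka clients can typically be pointed at an Ursa-powered cluster without code changes. The following producer is a rough sketch only; the bootstrap address and topic are placeholders, and the authentication settings a real cluster requires are omitted:

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class UrsaKafkaProducerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Placeholder bootstrap address for a Kafka-compatible endpoint;
        // security settings (omitted here) depend on your cluster configuration.
        props.put("bootstrap.servers", "<your-cluster-kafka-endpoint>:9093");
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // A plain Kafka produce call; no Ursa-specific client code is needed.
            producer.send(new ProducerRecord<>("events", "device-42", "{\"reading\":17}"));
            producer.flush();
        }
    }
}
```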
With this release, Phase 3 of the Ursa rollout enters Public Preview, advancing the engine into the next stage—providing users with greater flexibility, scalability, and cost-efficiency.",[48,25390,25391],{},"In this blog post, we’ll explore what the Ursa Engine offers, how it differs from the Classic Engine based on open-source Pulsar, and what the Public Preview unlocks for BYOC customers.",[40,25393,25395],{"id":25394},"what-is-the-ursa-engine","What is the Ursa Engine?",[48,25397,25398],{},"The Ursa Engine is a next-generation data streaming engine, designed to address the need for cost-effective, future-proof streaming platforms. Building upon Pulsar’s cloud-native architecture, it introduces innovations to reduce costs while maintaining flexibility.",[48,25400,25401],{},"Key pillars of the Ursa Engine include:",[321,25403,25404,25407,25410,25413],{},[324,25405,25406],{},"Fully Kafka API-compatible",[324,25408,25409],{},"Lakehouse storage for long-term durability and open standards",[324,25411,25412],{},"Oxia as a scalable metadata store",[324,25414,25415],{},"Support for both BookKeeper-based and S3-based WAL (Write-Ahead Log)",[48,25417,25418],{},"The engine supports two WAL implementations:",[1666,25420,25421,25424],{},[324,25422,25423],{},"Latency-Optimized WAL (BookKeeper-based) for transactional, low-latency workloads",[324,25425,25426],{},"Cost-Optimized WAL (Object storage-based) for workloads with relaxed latency requirements",[40,25428,25430],{"id":25429},"how-is-the-ursa-engine-different-from-the-classic-engine","How is the Ursa Engine Different from the Classic Engine?",[48,25432,25433],{},"Ursa targets cost-sensitive, latency-relaxed workloads. Ursa shifts from the ZooKeeper-based Classic Engine toward a headless stream storage architecture, using Oxia as a metadata store, making BookKeeper optional and relying on S3 object storage for durability. Below is a feature comparison between the Classic and Ursa Engines:",[48,25435,25436],{},[384,25437],{"alt":18,"src":25438},"\u002Fimgs\u002Fblogs\u002F67214a321b338bf375ef8a9a_AD_4nXdYz32zR0wxXn9neXJIingh-goy94qtKcRf1XQAy7bzEMcWR6D404x-h3DlWANqudu7bCFKvRaz_YRrXv1xMEdlRDIq6QhvaVPlsMVMPI8YOz8C2HyP6Y3bPbGA6cQZeTAFTRmCD4bxe-0VHTcFpZ9cpVI.png",[48,25440,3931],{},[48,25442,25443],{},"The Kafka API compatibility and Lakehouse Storage are available as add-ons for Classic Engine users. 
However, Oxia and S3-based WAL are integral to the Ursa Engine and cannot be retrofitted onto existing clusters.",[40,25445,25447],{"id":25446},"what-is-included-in-the-public-preview","What is included in the Public Preview?",[48,25449,25450],{},"With the Public Preview, users can access core Ursa Engine Features such as Oxia-based metadata management and S3-based WAL,.",[48,25452,25453],{},"In order to to focus on gathering feedback and optimizing the experience for broader production use, we are limiting the feature set of this Public Preview::",[321,25455,25456,25459,25462,25465],{},[324,25457,25458],{},"Only Kafka protocol is enabled during Public Preview",[324,25460,25461],{},"Transactions and topic compaction are not yet supported.",[324,25463,25464],{},"Only S3-based WAL is available, making it ideal for latency-relaxed use cases.",[324,25466,25467],{},"For ultra-low-latency applications, we recommend continuing with the Classic Engine until the Ursa Engine reaches full GA.",[40,25469,25471],{"id":25470},"shaping-ursas-stream-storage","Shaping Ursa’s Stream Storage",[48,25473,25474],{},"As part of the Public Preview for the Ursa Engine, we are defining its storage architecture with Oxia as the metadata store, S3 as a primary storage option, and Lakehouse table formats as the open standard for long-term storage. Together, we refer to this architecture as Ursa Stream Storage—a headless, multi-modal data storage system built on lakehouse formats. We will publish a follow-up blog post exploring the details of Ursa Stream Storage and its format, which we believe extends and enhances existing open table standards.",[48,25476,25477],{},"In the meantime, here’s a quick sneak peek at the core of Ursa Stream Storage.",[48,25479,25480],{},"At the heart of Ursa Stream Storage is a WAL (Write-Ahead Log) implementation based on S3. This design writes records directly to object storage services like S3, bypassing BookKeeper and eliminating the need for replication between brokers. As a result, Ursa Engine-powered clusters replace expensive inter-AZ replication with cost-efficient, direct-to-object-storage writes. This trade-off introduces a slight increase in latency (from 200ms to 500ms) but results in significantly lower network costs—on average 10x cheaper.",[48,25482,25483],{},[384,25484],{"alt":18,"src":25485},"\u002Fimgs\u002Fblogs\u002F672148fa26168cff7ad2e04f_AD_4nXfz9LnW7cFnMOn4AzXZGxa0H6vYlw8O3s9zXcCvtbRQ9PuwDI5wNNY6X5tIpkYbtoMkxtN8vVt2--IM8REgRxq-x2SstBnxqcIxbxBf2EYNGv2yNFScB_IjJ3HQnHuBP0OfrfrwSElgUjRH5neC7vkAKlIE.png",[48,25487,25488],{},"Figure 1. Object-storage based WAL eliminates inter-AZ replication traffic",[40,25490,25492],{"id":25491},"how-does-the-ursa-stream-storage-achieve-these-savings","How Does the Ursa Stream Storage Achieve These Savings?",[48,25494,25495],{},"In the S3-based WAL implementation, brokers create batches of produce requests and write them directly to object storage before acknowledging the client. These brokers are stateless and leaderless, meaning any broker can handle produce or fetch requests for any partition. 
For improved batch and fetch performance, however, specific partitions may still be routed to designated brokers.",[48,25497,25498],{},"This architecture eliminates inter-AZ replication traffic between brokers, while maintaining—and even improving—the durability and availability that customers expect from StreamNative.",[48,25500,25501],{},"As with any engineering trade-off, these savings come at a cost: Produce requests must now wait for acknowledgments from object storage, introducing some additional latency. However, this trade-off can result in up to 90% cost savings, making it a compelling choice for cost-sensitive workloads.",[48,25503,25504],{},"There’s much more to explore about the technology behind Ursa Stream Storage—stay tuned for a more detailed technical blog post coming soon!",[40,25506,25508],{"id":25507},"auto-scaling-clusters-offer-even-more-cost-savings","Auto-scaling clusters offer even more cost savings",[48,25510,25511],{},"Sizing and capacity planning are among the most challenging (and expensive) aspects of running and managing data streaming platforms like Apache Kafka. To handle peak workloads, users often have to over-provision resources, which results in paying for underutilized capacity most of the time. Optimizing utilization independently can also introduce significant operational overhead and complexity.",[48,25513,25514],{},[384,25515],{"alt":18,"src":25516},"\u002Fimgs\u002Fblogs\u002F672148fb8a39c46336f25fa8_AD_4nXfLugD33LzXsTqBYNb9RzSa4lw4SExOkWAXDb9H6Mg0mWJldm03DTFSVm6CLhjmb8ZhPU3FL0MjtnrCNUiw7khFIbjsJcRKu0tlLOLUL9S_LMvYBT_udSf9Fz2V-c1N7rsZqnUtb-t1NdBAJHgyDwkT7rk.png",[48,25518,25519],{},"Figure 2. Overprovisioned resources result in excessive spending on underutilized resources",[48,25521,25522,25523,25527,25528,25533],{},"Ursa Engine-powered clusters utilize the same ",[55,25524,25526],{"href":22693,"rel":25525},[264],"Elastic Throughput Unit (ETU)"," model as Serverless clusters, decoupling billing from the underlying resources in use to provide more value-based pricing. This means you only pay for the capacity you actually consume, avoiding the costs associated with over-provisioning. With ",[55,25529,25532],{"href":25530,"rel":25531},"https:\u002F\u002Fdocs.streamnative.io\u002Fdocs\u002Fcloud-autoscaling",[264],"broker auto-scaling",", Ursa clusters automatically adjust to match your workload—without user intervention—helping you save on infrastructure costs. 
BYOC customers pay StreamNative based on actual throughput and pay their cloud provider only for the underlying resources when they are needed.",[40,25535,25537],{"id":25536},"how-to-get-started-with-ursa-engine","How to Get Started with Ursa Engine",[48,25539,25540],{},"Getting started with the Ursa Engine is simple:",[1666,25542,25543,25546],{},[324,25544,25545],{},"Create a BYOC instance through the StreamNative portal.",[324,25547,25548],{},"Select Ursa as the engine when creating the Instance.",[48,25550,25551],{},[384,25552],{"alt":18,"src":25553},"\u002Fimgs\u002Fblogs\u002F672148fba32f7af3298b4693_AD_4nXd7pjO6GGT_Wzh1xFMt3Xb-A-F2PHDuC3TtcXEb6HUlSPNrZ5jNCkBjzYkZwxkhK7TMwnMJXeTMRZerLDxtsb8NrMDI3P6ezFnfF829Ft5EJ1PhF8njzk3taL6xQYl9XuCJOJby49zyq7tXAL2hePJmyRk.png",[48,25555,25556,25557,4003,25560,190],{},"For a step-by-step tutorial, refer to our ",[55,25558,7120],{"href":23665,"rel":25559},[264],[55,25561,25564],{"href":25562,"rel":25563},"https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=R6Wga1JNrts",[264],"instructional video",[40,25566,25568],{"id":25567},"one-more-thing-how-to-migrate-to-ursa-engine","One More Thing: How to Migrate to Ursa Engine",[48,25570,25571,25572,25575,25576,18054],{},"So, Ursa is great—but how do you migrate to it? Since the Ursa Engine uses Oxia for metadata storage, whereas the Classic Engine relies on ZooKeeper, there is no direct upgrade path from the Classic Engine to the Ursa Engine. However, we are introducing a new data replication tool called ",[55,25573,18561],{"href":25574},"\u002Funiversal-linking-lp"," to streamline the migration process from both Kafka and Pulsar clusters to an Ursa Engine-powered cluster. Please checkout our ",[55,25577,25580],{"href":25578,"rel":25579},"http:\u002F\u002Fstreamnative.io\u002Fblog\u002Fintroducing-universal-linking-revolutionizing-data-replication-and-interoperability-across-data-streaming-systems",[264],"announcement blog post",[32,25582,25584],{"id":25583},"migration-from-classic-engine","Migration from Classic Engine",[1666,25586,25587,25590,25593,25596],{},[324,25588,25589],{},"Offload data from your Classic Engine cluster to Lakehouse Storage.",[324,25591,25592],{},"Spin up a new Ursa Engine instance.",[324,25594,25595],{},"Connect both clusters via UniLink and allow data replication.",[324,25597,25598],{},"Redirect consumers to the Ursa Engine instance, finalize data offloading, and switch producers.",[32,25600,25602],{"id":25601},"migration-from-kafka","Migration from Kafka",[1666,25604,25605,25608,25611,25614],{},[324,25606,25607],{},"Create an Ursa Engine cluster.",[324,25609,25610],{},"Configure UniLink to replicate data from your Kafka cluster.",[324,25612,25613],{},"Migrate consumers first, then switch producers.",[324,25615,25616],{},"Complete the migration with minimal downtime.",[40,25618,25620],{"id":25619},"up-to-90-savings-with-ursa-engine","Up to 90% Savings with Ursa Engine",[48,25622,25623],{},"In summary, the Ursa Engine leverages Ursa Stream Storage to deliver significant cost savings and performance benefits:",[321,25625,25626,25629,25632],{},[324,25627,25628],{},"Leaderless brokers: Any broker can serve produce or consume requests for any partition.",[324,25630,25631],{},"Object storage based writes: Produce requests are batched at the broker and sent directly to object storage before acknowledgment, eliminating the need for broker replication and avoiding inter-AZ network charges.",[324,25633,25634],{},"Elastic Throughput Units (ETUs): Ursa clusters auto-scale based on workload demands, 
ensuring you only pay for the throughput and resources you need, when you need them.",[48,25636,25637],{},"These innovations are designed for high-throughput use cases with relaxed latency requirements, delivering up to 90% cost savings compared to other data streaming platforms that rely on disk-based replication mechanisms.",[40,25639,25641],{"id":25640},"sign-up-for-the-public-preview-today","Sign Up for the Public Preview Today",[48,25643,25644,25645,25648],{},"We are excited for you to experience the power of Ursa Engine-powered clusters. ",[55,25646,25339],{"href":17075,"rel":25647},[264]," and give it a try!",{"title":18,"searchDepth":19,"depth":19,"links":25650},[25651,25652,25653,25654,25655,25656,25657,25658,25662,25663],{"id":25394,"depth":19,"text":25395},{"id":25429,"depth":19,"text":25430},{"id":25446,"depth":19,"text":25447},{"id":25470,"depth":19,"text":25471},{"id":25491,"depth":19,"text":25492},{"id":25507,"depth":19,"text":25508},{"id":25536,"depth":19,"text":25537},{"id":25567,"depth":19,"text":25568,"children":25659},[25660,25661],{"id":25583,"depth":279,"text":25584},{"id":25601,"depth":279,"text":25602},{"id":25619,"depth":19,"text":25620},{"id":25640,"depth":19,"text":25641},"2024-10-30","Discover the Ursa Engine—StreamNative’s latest innovation in data streaming. Now available for public preview on AWS BYOC clusters, the Ursa Engine offers groundbreaking flexibility, scalability, and cost-efficiency. Explore its No Keepers architecture, Oxia metadata storage, S3-based Write-Ahead Log, and potential 90% cost savings for latency-relaxed workloads. Dive into the future of data streaming architecture!","\u002Fimgs\u002Fblogs\u002F67214accc37a794d464be872_Ursa-Engine-Public-Preview_BlogPost.png",{},{"title":25365,"description":25665},"blog\u002Fannouncing-the-ursa-engine-public-preview-for-streamnative-byoc-clusters",[1332,799,821,10322,16985,5376,5954],"RCfUP7D8yyOjkScjZJsdvhYaqybeJaLFpGh88ceprlI",{"id":25673,"title":18857,"authors":25674,"body":25675,"category":3550,"createdAt":290,"date":25664,"description":25985,"extension":8,"featured":294,"image":25986,"isDraft":294,"link":290,"meta":25987,"navigation":7,"order":296,"path":18856,"readingTime":17934,"relatedResources":290,"seo":25988,"stem":25989,"tags":25990,"__hash__":25991},"blogs\u002Fblog\u002Fintroducing-universal-linking-revolutionizing-data-replication-and-interoperability-across-data-streaming-systems.md",[806],{"type":15,"value":25676,"toc":25976},[25677,25688,25696,25700,25703,25718,25727,25735,25740,25742,25746,25749,25752,25757,25759,25762,25832,25834,25839,25841,25844,25847,25850,25854,25857,25860,25863,25866,25869,25873,25876,25880,25883,25887,25890,25894,25897,25901,25904,25907,25912,25915,25923,25928,25933,25938,25943,25948,25952,25955,25958,25962,25970,25972],[48,25678,2609,25679,25684,25685,25687],{},[55,25680,25683],{"href":25681,"rel":25682},"http:\u002F\u002Fstreamnative.io\u002Fblog\u002Fannouncing-the-ursa-engine-public-preview-for-streamnative-byoc-clusters",[264],"previous blog post",", we were excited to announce the public preview release of the Ursa Engine, now available on StreamNative Cloud. At its core, the Ursa Engine features a headless, multi-modal storage layer called Ursa Stream Storage, which leverages object storage (S3) for both replication and storage to achieve cost-effective data streaming. 
Now, we're taking this principle even further by introducing ",[55,25686,1249],{"href":25574},"—a solution designed to address the challenges of migration and data replication across diverse environments while reducing network costs.",[48,25689,25690,25691,25695],{},"Built on top of Ursa Stream Storage, Universal Linking is available as a Private Preview feature on StreamNative Cloud for our ",[55,25692,25694],{"href":15569,"rel":25693},[264],"early access"," partners. This innovative solution redefines how independent data streaming clusters—whether Kafka or Pulsar—can be connected and migrated, enabling seamless data replication between clusters and management of \"mirrored\" data streams, even when different protocols are involved. We’re excited to share the possibilities Universal Linking opens up, enhancing interoperability between data streaming platforms and potentially transforming the future of data streaming.",[40,25697,25699],{"id":25698},"topic-mirroring-in-apache-kafka-and-geo-replication-in-apache-pulsar","Topic Mirroring in Apache Kafka and Geo-replication in Apache Pulsar",[48,25701,25702],{},"Data replication tools are commonly used to replicate data between clusters for various purposes, such as active-passive setups for disaster recovery, active-active configurations for high availability, and geo-replication for data locality. However, existing solutions have their limitations.",[48,25704,25705,25706,25711,25712,25717],{},"In the Kafka ecosystem, several tools are used for data replication, including ",[55,25707,25710],{"href":25708,"rel":25709},"https:\u002F\u002Fkafka.apache.org\u002Fdocumentation\u002F#georeplication",[264],"Kafka MirrorMaker"," (open source), Uber’s ",[55,25713,25716],{"href":25714,"rel":25715},"https:\u002F\u002Fgithub.com\u002Fuber\u002FuReplicator",[264],"uReplicator"," (a fork of Kafka MirrorMaker, open source), Confluent Replicator (commercial), and Confluent Cluster Linking (commercial). While MirrorMaker and uReplicator are open-source tools that provide basic data replication between Kafka clusters, they have limitations in offset management, schema replication, and operational complexity. Confluent Replicator builds on MirrorMaker with enterprise-grade features like schema integration, data filtering, and enhanced monitoring but requires adopting Confluent's platform. Confluent Cluster Linking offers native, real-time replication between clusters with minimal operational overhead but is proprietary and requires licensing.",[48,25719,25720,25721,25726],{},"On the other hand, Apache Pulsar offers a ",[55,25722,25725],{"href":25723,"rel":25724},"https:\u002F\u002Fpulsar.apache.org\u002Fdocs\u002F4.0.x\u002Fconcepts-replication\u002F",[264],"built-in geo-replication"," feature that enables policy-based replication across clusters within a single Pulsar instance. Despite its power and flexibility, this feature is limited to Pulsar clusters using the Pulsar protocol.",[48,25728,25729,25730,25734],{},"While these tools use different frameworks—some leveraging Kafka Connect as external tools, others being built directly into brokers—they share some commonalities. All these tools use a wire protocol to replicate data over the network and are restricted to a single protocol. 
As discussed in previous blog posts, ",[55,25731,25733],{"href":25732},"\u002Fblog\u002Fa-guide-to-evaluating-the-infrastructure-costs-of-apache-pulsar-and-apache-kafka#network-costs","networking is one of the most expensive components when operating a data streaming platform in the public cloud",", making traditional approaches costly for replicating data within the same region.",[48,25736,25737],{},[384,25738],{"alt":18,"src":25739},"\u002Fimgs\u002Fblogs\u002F67214b86f9544c691afb9d54_AD_4nXd4ckcR0XwnFKlrXBGDycee_cRd0OB2kU20QnluzFWRU3UjmXM3FgZRAW_R-r7PO3pHKuTPpNLK-KSNhH4ZYBZlcZ12YgSQGN5HGzT2dYwcwP_PeoaALPsojOdKslHRsfPWJyStkM9vXnHdorxUgnmRTNxN.png",[48,25741,3931],{},[40,25743,25745],{"id":25744},"introducing-universal-linking-reimagining-data-replication-between-data-streaming-systems","Introducing Universal Linking: Reimagining Data Replication Between Data Streaming Systems",[48,25747,25748],{},"At StreamNative, we envision democratizing data streaming to make it accessible, affordable, and scalable for organizations of all sizes. Cost-effectiveness and interoperability are two guiding principles of our vision. We have already achieved significant progress with single-cluster deployments through the introduction of the Ursa Engine on StreamNative Cloud. However, to enable global data streaming while maintaining these goals, we needed to rethink how existing platforms and clusters could interoperate and redefine what cross-cluster and cross-platform data replication should look like. Today, we are proud to unveil Universal Linking—our vision for seamless, cost-effective, interoperable data replication across different data streaming platforms.",[48,25750,25751],{},"Universal Linking is a data replication service that replicates data from an external data streaming cluster—whether it's a self-managed Kafka cluster, a fully-managed Kafka cluster, a self-managed Pulsar cluster, or a StreamNative Private Cloud cluster—to clusters within a StreamNative instance (as illustrated below). Programmatically, it creates perfect copies of topics from the external cluster and keeps the data in sync with your StreamNative instance.",[48,25753,25754],{},[384,25755],{"alt":18,"src":25756},"\u002Fimgs\u002Fblogs\u002F67214b86288c8cdea26c385d_AD_4nXcJlPv2uUAHHV0vdwq096fxjyB0B3Wx1Ofv_g7W8OvjouPkDOY-KG8ucW2OT_c_3B4s9zCgNBcWGU8LXKKi1VQrh4M3o7SGYGMPRJefo2GpdePf5Y87J7MJCgVGin_k38n0ZFRq66dcKk1GoYFPYGxdSC62.png",[40,25758,2697],{"id":2696},[48,25760,25761],{},"Universal Linking allows you to replicate or migrate topics from an external Kafka or Pulsar cluster—replicating data partition by partition, byte by byte, along with all relevant metadata into a designated cluster within your StreamNative instance. Establishing a Universal Link (UniLink) is as simple as configuring geo-replication between Pulsar clusters. You can do this by creating a UniLink—a named object that defines the properties of the source and destination clusters. 
That's all you need to set up this link.",[8325,25763,25767],{"className":25764,"code":25765,"language":25766,"meta":18,"style":18},"language-shell shiki shiki-themes github-light github-dark","\nsnctl create unilink sfo-nyc\n\n  –source-cluster-name \u003Cname-to-identify-source-cluster>\n\n  –source-cluster-server \u003Csource-cluster-server>:9092\n\n  –source-cluster-config ‘{..}’\n\n  –destination-instance \u003Cstreamnative-instance>\n\n  –destination-cluster \u003Cstreamnative-cluster-to-link>\n\n","shell",[4926,25768,25769,25776,25781,25785,25790,25794,25799,25804,25810,25815,25821,25826],{"__ignoreMap":18},[2628,25770,25773],{"class":25771,"line":25772},"line",1,[2628,25774,25775],{"emptyLinePlaceholder":7},"\n",[2628,25777,25778],{"class":25771,"line":19},[2628,25779,25780],{},"snctl create unilink sfo-nyc\n",[2628,25782,25783],{"class":25771,"line":279},[2628,25784,25775],{"emptyLinePlaceholder":7},[2628,25786,25787],{"class":25771,"line":20920},[2628,25788,25789],{},"  –source-cluster-name \u003Cname-to-identify-source-cluster>\n",[2628,25791,25792],{"class":25771,"line":20934},[2628,25793,25775],{"emptyLinePlaceholder":7},[2628,25795,25796],{"class":25771,"line":20948},[2628,25797,25798],{},"  –source-cluster-server \u003Csource-cluster-server>:9092\n",[2628,25800,25802],{"class":25771,"line":25801},7,[2628,25803,25775],{"emptyLinePlaceholder":7},[2628,25805,25807],{"class":25771,"line":25806},8,[2628,25808,25809],{},"  –source-cluster-config ‘{..}’\n",[2628,25811,25813],{"class":25771,"line":25812},9,[2628,25814,25775],{"emptyLinePlaceholder":7},[2628,25816,25818],{"class":25771,"line":25817},10,[2628,25819,25820],{},"  –destination-instance \u003Cstreamnative-instance>\n",[2628,25822,25824],{"class":25771,"line":25823},11,[2628,25825,25775],{"emptyLinePlaceholder":7},[2628,25827,25829],{"class":25771,"line":25828},12,[2628,25830,25831],{},"  –destination-cluster \u003Cstreamnative-cluster-to-link>\n",[48,25833,3931],{},[48,25835,25836],{},[384,25837],{"alt":18,"src":25838},"\u002Fimgs\u002Fblogs\u002F67214b863a1397eb9ebf6248_AD_4nXfiipMmBQPlWC6Pgv_zf0soti9oJMPLN4IjEIpHMDcSuJi9-GPnswTwvPx5KacAGBks3iS4XeIjoON-3Vd2j1ty5VJh71bNFiPuiNORvKcCjLVFdc8Agh6Se7Mgw_p1eIQIeQSIG7BAu5HMU0bycisjKZdP.png",[48,25840,3931],{},[48,25842,25843],{},"Once a Universal Link is created, a background UniLink job starts. Topics from the source cluster are copied and saved into a storage bucket, with data written in the Ursa Stream format. The storage bucket is then mounted by the destination cluster, allowing the brokers powered by the Ursa Engine to read the data and serve it using the Pulsar or Kafka protocol, making the mirrored topic nearly indistinguishable from the original source topic from the consumer's perspective.",[48,25845,25846],{},"The replication is real-time, continuous, and asynchronous, minimizing the impact on the source topic and avoiding strict latency requirements between clusters—enabling global replication. Importantly, Universal Linking operates independently of the internal replication within the source and destination clusters, ensuring no coupling between them. In a steady state, a mirrored topic will lag behind its source by only a few seconds, ensuring low recovery point objectives.",[48,25848,25849],{},"In addition to data replication, Universal Linking mirrors source topics' configurations, consumer offsets, and ACLs to ensure full synchronization. 
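To give a feel for what offset mirroring involves, the sketch below reads a consumer group's committed offsets from a source Kafka cluster and commits the equivalent positions on a destination cluster using the confluent-kafka client. This is only a conceptual illustration; UniLink performs this synchronization (along with configurations and ACLs) automatically, and the cluster addresses, topic name, and partition count here are placeholders.

```python
# Conceptual illustration of mirroring a consumer group's committed offsets from
# a source Kafka cluster to a destination cluster. UniLink does this (plus
# configurations and ACLs) automatically; this sketch is not its implementation.
from confluent_kafka import Consumer, TopicPartition

src = Consumer({"bootstrap.servers": "source-kafka:9092", "group.id": "orders-app"})
dst = Consumer({"bootstrap.servers": "dest-streamnative:9092", "group.id": "orders-app"})

# Placeholder topic with three partitions.
partitions = [TopicPartition("orders", p) for p in range(3)]

# Read the group's committed positions on the source cluster.
committed = src.committed(partitions, timeout=10)

# Because the mirrored topic is a byte-to-byte copy, the same offsets remain
# valid on the destination, so they can be committed there directly.
dst.commit(offsets=committed, asynchronous=False)

src.close()
dst.close()
```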
Mirroring can be stopped at any time, promoting the mirrored topic to a normal, mutable topic on the destination.",[40,25851,25853],{"id":25852},"a-paradigm-shift-in-data-replication","A Paradigm Shift in Data Replication",[48,25855,25856],{},"Universal Linking represents a paradigm shift in data replication. Instead of relying on direct networking over streaming protocols like Pulsar or Kafka, Universal Linking leverages object storage (such as S3) for both networking and storage. This architecture enables cost-effective, robust, flexible, and scalable replication across heterogeneous environments.",[48,25858,25859],{},"Data is copied into object storage using the Ursa Stream format, where it is persisted in lakehouse tables. From there, it can be consumed via Kafka or Pulsar protocols through stateless brokers, allowing seamless interoperability without the complexity of managing multiple replication protocols.",[48,25861,25862],{},"Universal Linking operates on a pull-based replication model, where the destination cluster fetches data from the source. Rather than writing data directly to the destination, it first saves it into object storage in the Ursa Stream format. If both the source and destination clusters are running in the same region, the UniLink job minimizes the networking requirements for replication.",[48,25864,25865],{},"The UniLink object contains all the required configurations for the destination cluster to fetch, store, and retrieve data. Once established, UniLink functions similarly to Pulsar’s \"Cluster\" concept, enabling persistent links that can be referenced by name. The destination cluster can then create mirror topics that replicate the source topic's data, partitions, and configurations. Any changes in the source topic, such as new partitions, are automatically detected, ensuring accurate replication.",[48,25867,25868],{},"This approach fundamentally differentiates Universal Linking from other replication methods and offers several key advantages:",[3933,25870,25872],{"id":25871},"_1-byte-to-byte-copying-metadata-preservation","1. Byte-to-Byte Copying & Metadata Preservation",[48,25874,25875],{},"Universal Linking preserves data fidelity by performing byte-to-byte copying and retaining essential metadata like topic information, offsets, and more. Once data is persisted in S3, it is ready for consumption via the Ursa Engine as if it were still in the original Kafka cluster.",[3933,25877,25879],{"id":25878},"_2-s3-for-networking-and-storage-layers","2. S3 for Networking and Storage Layers",[48,25881,25882],{},"By utilizing object storage like S3 for both replication and storage, Universal Linking reduces the complexity and cost of maintaining high-throughput networking layers. S3’s scalability, combined with the efficiency of the Ursa Stream format, ensures data is both accessible and durable.",[3933,25884,25886],{"id":25885},"_3-backup-and-restore-for-kafka-clusters","3. Backup and Restore for Kafka Clusters",[48,25888,25889],{},"Universal Linking can also be used as a cost-effective backup-and-restore tool. Snapshot your Kafka cluster, store it in a lakehouse in one region, and restore it in another region with ease—making it an ideal solution for disaster recovery and multi-region management.",[3933,25891,25893],{"id":25892},"_4-interoperability-between-kafka-and-pulsar","4. 
Interoperability Between Kafka and Pulsar",[48,25895,25896],{},"With multi-protocol support in the Ursa Engine, Universal Linking bridges the gap between Kafka and Pulsar, offering seamless data mobility and interoperability. This empowers organizations to integrate different data streaming platforms without sacrificing flexibility.",[40,25898,25900],{"id":25899},"an-operational-view-of-universal-linking","An Operational View of Universal Linking",[48,25902,25903],{},"From an operational standpoint, Universal Linking significantly simplifies the mobility of data streams across various data streaming clusters. Whether migrating from an open-source Kafka cluster to StreamNative's Ursa Engine or moving between on-premises infrastructure and the cloud, Universal Linking ensures a smooth transition. Once you've created a new StreamNative cluster powered by the Ursa Engine, you can easily establish a link between your existing and new environments.",[48,25905,25906],{},"After setting up the Universal Link, replication begins in the background, automatically preserving critical data attributes such as topic partitions and consumer offsets. Once replication is complete, Universal Linking's initial release provides the ability to \"fail forward\" your mirror topics, promoting them to fully independent, writable topics on the destination cluster. This means you can seamlessly move clients from your old cluster to the new one without complex data reconciliation, ensuring minimal downtime and operational disruption.",[48,25908,25909],{},[384,25910],{"alt":18,"src":25911},"\u002Fimgs\u002Fblogs\u002F67214b8619efd0e9ab095f3c_AD_4nXfSwJnx1yFewUJPY3lO-m53LdBwFj-90t27AQRHiPbOp49zLYZEl63hwNsY_sxzZTcIqgrkQAYl_w52gEr0dM0VI__xPApZLnbhjKEfydyu954pldgRKe9KQ385-zGBstNPwc5PNG_gDfih2aEaUnpyEgOw.png",[48,25913,25914],{},"As organizations increasingly adopt hybrid cloud architectures, Universal Linking enables seamless data stream management across on-premises and cloud environments. Imagine having a hybrid cloud setup where your on-premises cluster contains critical topics that need to be used in the cloud—or vice versa. With Universal Linking, you can effortlessly connect (or migrate) your on-prem Kafka cluster to your StreamNative Cloud environment. This creates a reliable, cost-effective data highway that facilitates hybrid cloud applications, temporary cloud bursts, or full-scale migrations with minimal downtime and no data loss. Universal Linking ensures that your operations remain flexible and scalable, no matter where your data resides. 
This makes Universal Linking an excellent data replication solution for:",[321,25916,25917,25920],{},[324,25918,25919],{},"Global data streaming architecture with multi-cloud and hybrid-cloud environments",[324,25921,25922],{},"Data migration from a Pulsar or Kafka cluster to StreamNative Clusters powered by Ursa Engine",[48,25924,25925],{},[384,25926],{"alt":18,"src":25927},"\u002Fimgs\u002Fblogs\u002F67214b86e15836e8245410fd_AD_4nXf_Og04oWmxB-O9eNfrCsB9gv67AEFtE9Nn48JAKgt9IOfsgZxHSuWpZY-1Gi_Ki4emosKbRAfzS1Oo1j-FbITpUKipBi7j52lbXFjXF9xzlmmpapJZ6DvgjPn7KZNqQxZnl_ZsGwGDPzr8VOwlMmVEkNv5.png",[321,25929,25930],{},[324,25931,25932],{},"Backing up your streaming data into a data lake for analytics",[48,25934,25935],{},[384,25936],{"alt":18,"src":25937},"\u002Fimgs\u002Fblogs\u002F67214b869a4e977e69b08d29_AD_4nXcTQ3BAkX3scM2CfbJg_3sRyON63-IPS85l4Cut81pbilJ7_6WeQHLo1Fo5ZIyl3DWeklwjjzw14xePo-qKum0pvFYXvsPj8tRL0lol-Lu_cud0wm5MRL4R1HHkphIhqdgqmQPxHtWXLOsg-U784OYyP-7q.png",[321,25939,25940],{},[324,25941,25942],{},"Syncing data between production instances and staging or development instances",[48,25944,25945],{},[384,25946],{"alt":18,"src":25947},"\u002Fimgs\u002Fblogs\u002F67214b8648dd84424bbdb797_AD_4nXeSX07EcJ39LyUIk1cPeQSYbLose_wmnl3tKUTHEokmY0wjx3yq56jeeMY7gAAL9Xpexs_PhpBhX8bx9qd58gTxGex1R_9tt5I0R38rN5rDP3wbhZIxdsb5jksJ00Pq3nekMhqv1U99O4OMEE8CtRAZv40.png",[40,25949,25951],{"id":25950},"a-new-beginning-for-streaming-data-replication-and-interoperability","A New Beginning for Streaming Data Replication and Interoperability",[48,25953,25954],{},"Universal Linking transforms the way organizations approach data replication and bridges the gaps between different messaging protocols by making them interoperable through storing data in an open standard format. Whether you're managing multi-region Kafka clusters, migrating from various Kafka vendors to StreamNative, ensuring seamless interoperability between Kafka and Pulsar, or seeking a cost-effective disaster recovery solution, Universal Linking provides an efficient, scalable replication solution tailored for multi-cloud environments.",[48,25956,25957],{},"With this release, we are only scratching the surface of what the Ursa Stream Format can achieve. Exciting new features are on our roadmap, and we can't wait to share them with you.",[40,25959,25961],{"id":25960},"ready-to-get-started","Ready to Get Started?",[48,25963,25964,25965,25969],{},"Universal Linking is currently in private preview as part of our ",[55,25966,25968],{"href":15569,"rel":25967},[264],"early access program",", available for BYOC clusters. 
Join us and discover how Universal Linking can transform your data replication and migration strategy.",[48,25971,3931],{},[25973,25974,25975],"style",{},"html .default .shiki span {color: var(--shiki-default);background: var(--shiki-default-bg);font-style: var(--shiki-default-font-style);font-weight: var(--shiki-default-font-weight);text-decoration: var(--shiki-default-text-decoration);}html .shiki span {color: var(--shiki-default);background: var(--shiki-default-bg);font-style: var(--shiki-default-font-style);font-weight: var(--shiki-default-font-weight);text-decoration: var(--shiki-default-text-decoration);}html .dark .shiki span {color: var(--shiki-dark);background: var(--shiki-dark-bg);font-style: var(--shiki-dark-font-style);font-weight: var(--shiki-dark-font-weight);text-decoration: var(--shiki-dark-text-decoration);}html.dark .shiki span {color: var(--shiki-dark);background: var(--shiki-dark-bg);font-style: var(--shiki-dark-font-style);font-weight: var(--shiki-dark-font-weight);text-decoration: var(--shiki-dark-text-decoration);}",{"title":18,"searchDepth":19,"depth":19,"links":25977},[25978,25979,25980,25981,25982,25983,25984],{"id":25698,"depth":19,"text":25699},{"id":25744,"depth":19,"text":25745},{"id":2696,"depth":19,"text":2697},{"id":25852,"depth":19,"text":25853},{"id":25899,"depth":19,"text":25900},{"id":25950,"depth":19,"text":25951},{"id":25960,"depth":19,"text":25961},"Explore Universal Linking, now in private preview on StreamNative Cloud, transforming data replication between Kafka and Pulsar with cost-effective, interoperable solutions. Built on Ursa Stream Storage, Universal Linking enables seamless cross-cluster and cross-platform data streaming with S3-based replication and multi-protocol support. Join our early access program to experience the future of data mobility and interoperability across hybrid and multi-cloud environments.","\u002Fimgs\u002Fblogs\u002F67214c905de0562859f51d1b_Universal-Linking_BlogPost.png",{},{"title":18857,"description":25985},"blog\u002Fintroducing-universal-linking-revolutionizing-data-replication-and-interoperability-across-data-streaming-systems",[11899,302,799,1332],"X-vdugnr_cvvwnasJlVHpsC6XhpTsY8SRKR7Eaed5AA",{"id":25993,"title":18766,"authors":25994,"body":25995,"category":3550,"createdAt":290,"date":26391,"description":26392,"extension":8,"featured":294,"image":26393,"isDraft":294,"link":290,"meta":26394,"navigation":7,"order":296,"path":18765,"readingTime":18649,"relatedResources":290,"seo":26395,"stem":26396,"tags":26397,"__hash__":26398},"blogs\u002Fblog\u002Fthe-evolution-of-log-storage-in-modern-data-streaming-platforms.md",[806],{"type":15,"value":25996,"toc":26382},[25997,26006,26009,26012,26015,26019,26022,26025,26028,26031,26036,26038,26041,26052,26055,26058,26061,26075,26078,26082,26085,26096,26099,26116,26121,26123,26126,26137,26140,26151,26154,26160,26165,26167,26171,26174,26177,26190,26195,26198,26209,26213,26216,26221,26224,26227,26231,26234,26237,26240,26245,26247,26250,26253,26256,26279,26282,26286,26300,26307,26310,26315,26317,26320,26325,26327,26333,26338,26340,26352,26357,26359,26363,26366,26369],[48,25998,25999,26000,26005],{},"In 2013, Jay Kreps wrote a pivotal blog post, \"",[55,26001,26004],{"href":26002,"rel":26003},"https:\u002F\u002Fengineering.linkedin.com\u002Fdistributed-systems\u002Flog-what-every-software-engineer-should-know-about-real-time-datas-unifying",[264],"The Log: What Every Software Engineer Should Know About Real-Time Data’s Unifying Abstraction","\". 
This post laid the conceptual foundation for Apache Kafka and catalyzed the development of the modern data streaming ecosystem. The core premise was both simple and transformative: the log is the backbone of real-time data streaming architectures, serving as a fundamental abstraction for managing and processing data streams. It simplifies critical concerns like state management, fault tolerance, and scalability.",[48,26007,26008],{},"In the decade since, the landscape has shifted dramatically, especially with the advent of cloud-native environments. The ecosystem has expanded to include a wide array of platforms, such as Apache Pulsar, Redpanda, and other commercial offerings. The rise of S3 as a primary storage layer for data streaming has further reshaped the conversation, with many vendors claiming their solutions are significantly more cost-efficient than self-managed Kafka. As a result, we are navigating a far more complex and diverse environment today.",[48,26010,26011],{},"This blog post explores this evolution by focusing on log storage and how data streaming engines have evolved. We will examine key architectural shifts and analyze the direction the industry is heading in terms of scalability, cost-efficiency, and integration with cloud-native storage technologies.",[48,26013,26014],{},"Let’s start with some basics.",[40,26016,26018],{"id":26017},"what-is-a-log","What is a Log?",[48,26020,26021],{},"At the core of any data streaming system lies the concept of a log. A log is an append-only sequence of records that are added in a strict order and never modified. Each record is assigned a unique sequential identifier, usually called an \"offset,\" which allows developers to quickly locate the record within the log.",[48,26023,26024],{},"The ordering of records in a log establishes a notion of \"time\"—records appended earlier are considered older than those appended later. This sequential identifier can be thought of as a logical \"timestamp\" for each record. Importantly, this concept of ordering is decoupled from any physical clock, which is critical in distributed systems where consistency and ordering guarantees are paramount.",[48,26026,26027],{},"Each record in the log consists of a payload—typically a bag of bytes. The structure of this payload is determined by a schema, which defines the data format and provides a contract between data producers and consumers for interpreting the data.",[48,26029,26030],{},"In many ways, a log is conceptually similar to both a file in a filesystem and a table in a relational database, though with key differences. A file is an unstructured sequence of bytes without inherent record boundaries, giving applications complete flexibility in interpreting the data. A table, on the other hand, is an array of records that can potentially be updated or overwritten. A log is essentially an append-only version of a table, with records arranged in a strict order. 
This leads to the principle of table-stream duality, a concept we’ll revisit later.",[48,26032,26033],{},[384,26034],{"alt":18,"src":26035},"\u002Fimgs\u002Fblogs\u002F671a91761001b72879fd47c1_AD_4nXfaupZpk7woQqDiAYY3DSHSsUSjqTc3CH3Aw8DO5oyqCDeuX1TJ0DH_MirEfmGx923cMMjFczT2JgpjYmq7cMrpv8NJqnpq2oBP_HS3dOZVbytrEC6Juza3tuMpsJteQdQnGaGHrsUHbSHg7BC7oG2B43Pa.png",[48,26037,3931],{},[48,26039,26040],{},"Due to its immutability, operations on a log are simple and predictable:",[321,26042,26043,26046,26049],{},[324,26044,26045],{},"Write: Unlike a table, where records can be updated or deleted, a log's records are immutable once appended. The write operation is straightforward—records are appended sequentially to the end of the log. Each append returns a unique sequential identifier, commonly referred to as an offset in Kafka or a message ID in Pulsar.",[324,26047,26048],{},"Read: While relational databases support point lookups by key or predicate, reading from a log is simpler. Log reads follow a seek-and-scan pattern: consumers specify the offset where they want to start reading, and the system seeks to that offset and scans the records sequentially. Depending on the starting offset, the read can be classified as either:
Tailing Read: The consumer is close to the log’s tail and reads new records as they are appended, keeping up with the producer.",[324,26050,26051],{},"Catch-up Read: The consumer starts reading from an older offset and processes the backlog of records to eventually catch up to the log’s tail.",[48,26053,26054],{},"In addition to the seek-and-scan behavior, log implementations usually track consumer state, which refers to the offset in the log that the reader has reached. This typically involves tracking offsets for consumer groups, allowing the system to monitor the \"lag\" between a consumer’s current offset and the tail of the log. Some systems go beyond simple offset tracking—for example, Pulsar provides individual acknowledgments for shared and key-shared subscriptions, allowing fine-grained control over which records have been consumed. This topic is complex and deserves a deeper dive in a future post. For simplicity, we will focus on simple offset tracking in this discussion.",[48,26056,26057],{},"Finally, a log system must manage its size over time. Continuously appending records to the log would eventually exhaust storage, or the older data would become obsolete. To handle this, logs support retention policies, which can be time-based, size-based, or explicitly managed via truncation. A truncate operation is used to reduce the log’s size by discarding old or unneeded records according to the retention rules.",[48,26059,26060],{},"Up to this point, we’ve covered the core fundamental operations that any log system needs to support. 
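To make those operations concrete before summarizing, here is a toy, in-memory sketch of a single log. It is purely illustrative; a real system must also provide durability, replication, and distribution across many logs.

```python
# A toy, in-memory log exposing the four operations discussed above:
# write (append), read (seek and scan), offset tracking, and truncate.
class Log:
    def __init__(self):
        self.records = {}         # offset -> payload
        self.next_offset = 0      # next sequential identifier to assign
        self.reader_offsets = {}  # reader name -> last committed offset

    def append(self, payload: bytes) -> int:
        """Write: append a record and return its unique, sequential offset."""
        offset = self.next_offset
        self.records[offset] = payload
        self.next_offset += 1
        return offset

    def read(self, from_offset: int, max_records: int = 100):
        """Read: seek to an offset, then scan forward sequentially."""
        found = [(o, self.records[o]) for o in range(from_offset, self.next_offset)
                 if o in self.records]
        return found[:max_records]

    def commit(self, reader: str, offset: int) -> None:
        """Offset tracking: remember how far a reader has progressed."""
        self.reader_offsets[reader] = offset

    def truncate(self, before_offset: int) -> None:
        """Truncate: discard records older than the given offset (retention)."""
        for o in list(self.records):
            if o < before_offset:
                del self.records[o]


log = Log()
for i in range(5):
    log.append(f"event-{i}".encode())
print(log.read(from_offset=2))   # a catch-up read starting at offset 2
log.commit("analytics", 4)       # this reader has processed up to offset 4
log.truncate(before_offset=3)    # retention drops offsets 0-2
```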
To summarize, if you were to build a log system from scratch, you would need to account for the following key concepts:",[321,26062,26063,26066,26069,26072],{},[324,26064,26065],{},"Write: Append records to the log and return an offset as a unique sequential identifier.",[324,26067,26068],{},"Read: Provide an API to seek a specific offset and scan records from that point onward.",[324,26070,26071],{},"Offset Tracking: Track offsets for each reader to allow resuming from the last position.",[324,26073,26074],{},"Truncate: Implement a truncate operation to discard older records at a specified offset.",[48,26076,26077],{},"While implementing a system for a single log is straightforward, the real challenge arises in building a platform capable of efficiently managing thousands or even millions of logs concurrently. This need has driven significant innovation in the storage layer of data streaming platforms over the past decade. Let’s explore how modern data streaming systems have evolved to meet these demands.",[40,26079,26081],{"id":26080},"apache-kafka-the-first-practical-implementation","Apache Kafka: The First Practical Implementation",[48,26083,26084],{},"To create a data streaming platform that manages a large number of logs within a distributed cluster, three key challenges must be addressed:",[1666,26086,26087,26090,26093],{},[324,26088,26089],{},"Distribution: How can logs be distributed across different nodes within the cluster, so clients know where to append and read records?",[324,26091,26092],{},"Sequencing: How can writes be sequenced to maintain the order of records within a log?",[324,26094,26095],{},"Truncation: How can logs be truncated when necessary?",[48,26097,26098],{},"Apache Kafka was the first practical implementation that effectively managed thousands of logs in a distributed cluster setup. Originally developed by engineers at LinkedIn, Kafka centers around the log concept as its primary architectural construct. In Kafka, topics represent logs that are partitioned for scalability and distributed across brokers for load balancing and fault tolerance. Let’s break down Kafka’s implementation.",[321,26100,26101,26104,26107,26110,26113],{},[324,26102,26103],{},"Partitions: Each topic in Kafka is divided into partitions, with each partition serving as a log that represents a totally ordered sequence of messages (or records). This design allows Kafka to scale horizontally by distributing partitions across multiple brokers.",[324,26105,26106],{},"Offsets: Every record within a partition is assigned a unique offset that acts as an identifier. This enables consumers to track their progress through the log without impacting ordering guarantees.",[324,26108,26109],{},"Replication: Kafka ensures durability and fault tolerance through replication. Each partition has a configurable number of replicas, with one designated as the leader. The leader handles all reads and writes, while the other replicas, known as followers, replicate data from the leader. If the leader fails, one of the followers is promoted to leader, ensuring high availability.",[324,26111,26112],{},"Log Segments: A partition in Kafka is not a single large file but is divided into smaller segments. Each segment is a file on disk, and Kafka retains only a portion of the log in memory for performance. 
Older segments are periodically flushed to disk, and retention policies—either time-based or size-based—determine how long data is kept.",[324,26114,26115],{},"Data Retention: Kafka supports both time-based and size-based retention policies. After reaching a specified threshold, older data may be deleted or compacted. The log compaction feature allows Kafka to retain the most recent state of a key, making it suitable for use cases like event sourcing.",[48,26117,26118],{},[384,26119],{"alt":18,"src":26120},"\u002Fimgs\u002Fblogs\u002F671a917793ddc6c9a6f7caf8_AD_4nXfID6G1rrovv7VUi2_IeTc4_iAl9zekFpOZsn6jm_UTmBL_ItqPNnFbOkDO9DfAh0fY20pYVzJaxKFKZZDCbJXLHHsepaVFQZa_ivhYMHKGUyqwNxuqaSspZjSskSGXb8fm5RZ6WHydycQWXeLRz8Tl_yEv.png",[48,26122,3931],{},[48,26124,26125],{},"Now, let’s examine how Kafka addresses the challenges outlined earlier:",[321,26127,26128,26131,26134],{},[324,26129,26130],{},"Distribution: Kafka distributes partitions across brokers, with each partition owned by a set of brokers consisting of a leader and its followers. This ownership information is stored in a centralized metadata store, initially implemented with ZooKeeper and now replaced with KRaft-based coordinators. Locating the log involves querying the metadata storage to find the topic owner. Once identified, all writes are sequenced by the leader broker, while reads can be served by either the leader or the followers. Kafka brokers maintain local indexes to enable consumers to seek to a given offset, effectively locating the corresponding log segment file on local disks.",[324,26132,26133],{},"Sequencing: The leader broker is responsible for sequencing writes, ensuring the order of records within the log is preserved.",[324,26135,26136],{},"Truncation: Log truncation is managed via time-based or size-based retention policies. Since data is organized into log segment files, outdated log segments can be deleted according to these retention rules.",[48,26138,26139],{},"To summarize, Kafka’s implementation showcases several key highlights that influence how a data streaming platform manages its log storage layer:",[321,26141,26142,26145,26148],{},[324,26143,26144],{},"Partition-centric: Kafka employs a partition-centric approach, where partition locations are maintained in a centralized metadata store or determined through distribution algorithms. Once a partition is located, all operations are performed by the nodes storing and serving that partition. Local offset-based indexes facilitate effective seeking.",[324,26146,26147],{},"Leader-based: The system relies on a leader-based and replication-based approach. The leader is responsible for storing and serving logs, sequencing writes, and ensuring the log's ordering properties.",[324,26149,26150],{},"Replication-based: Fault tolerance is achieved via a disk-based replication algorithm from the leader to followers.",[48,26152,26153],{},"The partition-centric, leader-based, and replication-based approach is a common implementation for the storage layer of most Kafka or Kafka-like data streaming platforms. Systems like Redpanda and AutoMQ also adopt this model. For example, Redpanda re-implements Kafka using C++, with the primary difference being its use of the Raft consensus algorithm for replication instead of Kafka's in-sync replica (ISR) mechanism. 
Despite this change, Redpanda’s architecture remains largely partition-centric and leader-based.",[48,26155,26156,26157,190],{},"While this relatively straightforward approach worked well in a monolithic architecture, it is not well suited for the cloud-native world. As autoscaling becomes the norm, data partition rebalancing during scaling becomes a significant challenge. Scaling, which ideally should be automated and seamless, often requires intervention from SREs, risking uptime and impacting error budgets. We have explored these challenges in",[55,26158,26159],{"href":21492}," previous blog posts",[48,26161,26162],{},[384,26163],{"alt":18,"src":26164},"\u002Fimgs\u002Fblogs\u002F671a91761c1bdbda9876eb33_AD_4nXd6wWSfuz0-PMWCH3Zvie9k6Q3pL0tzyW7a8_5Z5aacscq3QvCHOstAr8S1aVQC6-SmOC0JCuOEsWNR6WDT24zPYt2DNlDKsiL3I2ZWULPBbDIsvAXf72ZeLs2inDm_ro2Kq8M65J55lrvxYSq1gj8rENKJ.png",[48,26166,3931],{},[40,26168,26170],{"id":26169},"apache-pulsar-address-data-rebalancing-challenges-via-separation-of-compute-and-storage","Apache Pulsar: Address Data Rebalancing Challenges via Separation of Compute and Storage",[48,26172,26173],{},"Unlike other Kafka alternatives, Apache Pulsar, initially developed by Yahoo!, employs a distinct strategy to address the challenges of scaling and operational complexity. To mitigate the issues arising from the coupling of data storage and serving, Pulsar separates its storage layer from the brokers, which are designed solely for serving requests. This separation shifts the storage model from a partition-centric approach to a segment-centric one.",[48,26175,26176],{},"In this new storage model:",[321,26178,26179,26187],{},[324,26180,26181,26182,190],{},"Logical Partitions: Partitions in Pulsar are treated as logical entities. Brokers manage the ownership of topic partitions but do not store any data locally. Instead, all data is stored in a remote segment storage system called",[55,26183,26186],{"href":26184,"rel":26185},"https:\u002F\u002Fbookkeeper.apache.org",[264]," Apache BookKeeper",[324,26188,26189],{},"Distributed Segments: Data segments are distributed across multiple storage nodes (or \"bookies\") and are not tied to a specific broker, allowing for greater flexibility and scalability beyond the limitations of physical disk capacity.",[48,26191,26192],{},[384,26193],{"alt":18,"src":26194},"\u002Fimgs\u002Fblogs\u002F671a9178a01de306efb26e41_AD_4nXcFdQc_GzvNtfmV_PAUwnT99QsBQYNYLr-Vmm49XUsq5zOm59_15ZOYaiCmQXll1xL_XipHOAUyZNUWHbHdKlPjbYoir_ySS9eY58pPLIkOfjsHWNe75n_e-cf7KlRkE4Be42IxQziTjDQc8LJyhCWg-68.png",[48,26196,26197],{},"Now, let’s explore how Pulsar effectively addresses the previously outlined challenges:",[321,26199,26200,26203],{},[324,26201,26202],{},"Distribution: Pulsar’s architecture features two layers of distribution:some textFirst, while partitions are still distributed across brokers, the partitions now become logical partitions, which are not bound to physical brokers, unlike in Kafka. Brokers manage the ownership of these logical partitions, with ownership information stored in a centralized metadata store. Unlike Apache Kafka, Pulsar organizes topics into namespace bundles to leverage its multi-tenancy capabilities, assigning these bundles to brokers for efficient resource management.",[324,26204,26205,26206,190],{},"Second, data segments are distributed across multiple bookies, ensuring redundancy and fault tolerance. Locating a partition involves querying the metadata store to identify the topic owner. 
Once identified, all write operations are sequenced by the owner broker, while read requests can be fulfilled by either the owner broker or designated read-only brokers. Pulsar brokers utilize metadata about log segments to pinpoint the appropriate storage node for reading data.\nSequencing: The owner broker is tasked with sequencing writes, ensuring that the order of records within each log is maintained. This is crucial for applications that rely on ordered event processing.Truncation: Log truncation in Pulsar is managed through time-based or size-based retention policies, as well as explicit truncation after all data has been acknowledged by all subscriptions. Since data is organized into log segments, acknowledged and expired segments can be safely deleted according to these retention rules.\nIn summary, while Apache Pulsar continues to employ a leader-based and replication-based approach, it shifts from a partition-centric model to a segment-centric architecture, effectively decoupling the storage layer from the serving layer. I refer to this as a \"Giant Write-Ahead Log\" approach in",[55,26207,26208],{"href":10453}," the \"Stream-Table Duality\" blog post",[40,26210,26212],{"id":26211},"tiered-storage-a-must-have","Tiered Storage - A Must Have",[48,26214,26215],{},"In this new storage model, where topic partitions are treated as logical and virtual entities, the introduction of a distributed segment storage architecture provides an effective abstraction for log segments. These log segments can be stored in a low-latency, disk-based replication system like Apache BookKeeper or as objects in object storage solutions such as S3, Google Cloud Storage, or Azure Blob Storage. This distributed segment storage abstraction introduces the concept of tiered storage into the data streaming landscape, making it a must-have feature for modern data streaming platforms.",[48,26217,26218],{},[384,26219],{"alt":18,"src":26220},"\u002Fimgs\u002Fblogs\u002F671a917693532c9ddc9e91ee_AD_4nXdEaeHoU9V25aD4LedDphly1jg-oQuVI7VBGPxT8IbZknvwQfk1rV-l1Wgxpo2-Xxe106ELewJcEvOATGY3sqYrC7N0nQikkpT9qmiBOPiAL4xm9pOeL4D-jr3KXsPvQbNKX3mQWVWb4Snp3lqMremU2lQv.png",[48,26222,26223],{},"However, not all tiered storage implementations are created equal. Most Kafka-based tiered storage solutions focus on moving log segments to tiered storage while still maintaining local indexes within brokers for locating those segments. While this approach may reduce the impact of data rebalancing, it does not eliminate the need for it. During scaling operations, the system must move both the local indexes and any log segments that have not yet been offloaded to different brokers.",[48,26225,26226],{},"In contrast, Pulsar stores the indexes of log segments in a centralized metadata store, allowing log segments to remain completely remote and decoupled from the brokers. Consequently, during scaling operations, there is no need to rebalance either the log segments or their indexes. We refer to this important concept as the \"index\u002Fdata split\" or \"metadata\u002Fdata split,\" which will ultimately set the stage for the future evolution of the storage layer in data streaming. 
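A small sketch helps illustrate what the index/data split means in practice: the index lives in a shared metadata store, the segments live in object storage, and any broker can resolve an offset without holding local state. The structures and paths below are illustrative, not Pulsar's or Ursa's actual metadata format.

```python
# Illustration of the index/data split: segment indexes live in a shared metadata
# store while segment data lives in object storage, so any broker can locate data
# without holding local state. Field names and paths are illustrative.
from dataclasses import dataclass


@dataclass
class SegmentIndexEntry:
    topic: str
    partition: int
    start_offset: int   # first offset stored in this segment
    end_offset: int     # last offset stored in this segment
    object_key: str     # where the segment bytes live in object storage


# Imagine this list living in a centralized metadata store rather than on brokers.
segment_index = [
    SegmentIndexEntry("events", 0, 0, 999, "s3://example-bucket/events/0/segment-0"),
    SegmentIndexEntry("events", 0, 1000, 1999, "s3://example-bucket/events/0/segment-1"),
]


def locate(topic: str, partition: int, offset: int) -> str:
    """Resolve an offset to an object-storage key using metadata alone."""
    for entry in segment_index:
        if (entry.topic, entry.partition) == (topic, partition) \
                and entry.start_offset <= offset <= entry.end_offset:
            return entry.object_key
    raise KeyError("offset not found; it may have been truncated")


print(locate("events", 0, 1500))  # -> s3://example-bucket/events/0/segment-1
```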
I will explain it further in the following paragraphs.",[40,26228,26230],{"id":26229},"from-leader-based-to-leaderless-addressing-cost-challenges-with-s3-as-primary-storage-for-data-streaming","From Leader-Based to Leaderless: Addressing Cost Challenges with S3 as Primary Storage for Data Streaming",[48,26232,26233],{},"2024 is set to be a pivotal year for data streaming as the trend shifts from using object storage as a secondary option through tiered storage to adopting it as the primary storage layer. Notable examples in this trend include WarpStream (acquired by Confluent), Confluent's Freight Clusters, and StreamNative's Ursa engine, all leveraging S3 as their main storage solution.",[48,26235,26236],{},"Cost is the key driving force behind this transition. Leveraging S3 as primary storage allows organizations to use it not only as a storage medium but also as a replication layer. By relying on S3, organizations can eliminate traditional disk-based replication solutions, which typically involved managing replicas across brokers or storage nodes, requiring extensive cross-availability zone network traffic and complex coordination to ensure data redundancy and availability. This approach significantly reduces the costs of operating a data streaming platform on public cloud infrastructures. Many organizations adopting S3 as their primary storage claim to be 10x cheaper than self-managed Kafka, primarily due to reduced networking costs.",[48,26238,26239],{},"Beyond cost savings, this object-storage-based approach also represents a significant architectural shift from a leader-based to a leaderless approach in data streaming. In this approach, brokers create batches of produce requests and write them directly to object storage before acknowledging them to clients. The leaderless approach allows any broker to handle produce or fetch requests for any partition, improving availability by eliminating a single point of failure. However, these savings come with trade-offs: produce requests must wait for acknowledgments from object storage, introducing latency, although this is potentially offset by cost reductions of up to 90%.",[48,26241,26242],{},[384,26243],{"alt":18,"src":26244},"\u002Fimgs\u002Fblogs\u002F671a91760c2f78100993cc15_AD_4nXdDsXygPVkFJ-zEE_hRbITD_voEQJCzD0m0R_UVqJsIpHLqbmIL9d4awAlWgoDMSbaxFvQIoSoMCgN42NCTbcQyRlmI-0wgmNVK_r_GBJyjR5c9b5g7uz8bNt2NDsvbZgS6trdEyuSKg8L8Eut4CK5RvgOS.png",[48,26246,3931],{},[48,26248,26249],{},"This trade-off explains why Confluent offers a separate cluster type known as \"Freight Clusters,\" while StreamNative provides configurable storage classes as part of tenant and namespace policies. Not all applications can tolerate higher latency, particularly those handling mission-critical transactional workloads.",[48,26251,26252],{},"Transitioning to a leaderless, object-storage-based architecture marks a significant step forward in aligning data streaming with the data lakehouse paradigm, where object storage serves as the headless storage layer for both streaming and batch data. This headless storage layer stores data independently of any specific compute nodes or services, providing greater flexibility in accessing and processing data without dependency on a particular storage server or broker. The key to this architectural shift is taking the index\u002Fdata split concept a step further by moving all metadata and index information for both partitions and log segments, as well as the sequencing work, to a centralized metadata store. 
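A minimal sketch of this idea is shown below: any broker commits its batch to a metadata service, which atomically assigns the next offset range and records where the batch lives. The MetadataStore class is a toy stand-in for a real service such as Oxia; its API is an assumption for illustration only.

```python
# Toy model of leaderless sequencing: any broker commits its batch to a metadata
# service, which atomically assigns the next offset range and records where the
# batch is stored. MetadataStore stands in for a real service such as Oxia.
import threading


class MetadataStore:
    def __init__(self):
        self._lock = threading.Lock()  # models the store's atomic updates
        self._next_offset = {}         # (topic, partition) -> next offset to assign
        self._wal_index = []           # ordered record of committed batches

    def commit_batch(self, topic, partition, object_key, record_count):
        """Assign a contiguous offset range to a batch and index its location."""
        with self._lock:
            start = self._next_offset.get((topic, partition), 0)
            self._next_offset[(topic, partition)] = start + record_count
            self._wal_index.append((topic, partition, start, object_key))
            return start


store = MetadataStore()

# Two different brokers (no leader) commit batches for the same partition;
# the metadata store alone decides the global order.
offset_a = store.commit_batch("events", 0, "s3://wal/batch-a", record_count=3)
offset_b = store.commit_batch("events", 0, "s3://wal/batch-b", record_count=2)
print(offset_a, offset_b)  # 0 3
```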
This enables all brokers to efficiently access the information needed to locate partitions and log segments and generate sequences in a decentralized manner.",[48,26254,26255],{},"Now, let’s examine how a leaderless data streaming platform effectively addresses the previously outlined challenges:",[321,26257,26258,26261,26276],{},[324,26259,26260],{},"Distribution: With the index\u002Fdata split, all metadata (index) is stored in a centralized metadata store, allowing any broker to access the necessary information to locate partitions and their corresponding log segments. This enables any broker to serve write and read requests efficiently.",[324,26262,26263,26264,26269,26270,26275],{},"Sequencing: In the absence of a leader broker, sequencing tasks are managed in the metadata layer. Brokers accepting write requests commit their writes to the metadata storage, which then generates the sequences for those writes to determine the order. This approach is not entirely new; it has been utilized for many years, as introduced by",[55,26265,26268],{"href":26266,"rel":26267},"https:\u002F\u002Flogdevice.io",[264]," Facebook\u002FMeta’s LogDevice",". For a deeper comparison, check out",[55,26271,26274],{"href":26272,"rel":26273},"https:\u002F\u002Fwww.splunk.com\u002Fen_us\u002Fblog\u002Fit\u002Fcomparing-logdevice-and-apache-pulsar.html",[264]," a blog post by Pulsar PMC member Ivan Kelly",", which contrasts Apache Pulsar with LogDevice.",[324,26277,26278],{},"Truncation: With all metadata\u002Findex managed in a centralized location, truncating logs becomes straightforward. Truncating a log is as simple as removing indices from files in the centralized metadata storage.",[48,26280,26281],{},"The index\u002Fdata split, which has been a foundational element of Pulsar since its inception, underpins this latest evolution in the data streaming landscape. This concept facilitates a seamless transition from a leader-based to a leaderless architecture—an evolution likely to be embraced by all data streaming vendors.",[40,26283,26285],{"id":26284},"the-indexdata-split-multi-modality-and-stream-table-duality","The Index\u002FData Split, Multi-Modality, and Stream-Table Duality",[48,26287,26288,26289,26294,26295,190],{},"In addition to S3 reshaping the design of storage layers for distributed data infrastructures, several other paradigm shifts are influencing the broader data landscape. One such shift is the notion that \"batch is a special case of streaming\". Coined nearly a decade ago, this idea has faced practical implementation challenges. Over the years, various efforts have aimed to consolidate batch and stream processing across different layers. Notably, ",[55,26290,26293],{"href":26291,"rel":26292},"https:\u002F\u002Fbeam.apache.org\u002F",[264],"Apache Beam"," introduced a unified model and set of APIs for both batch and streaming data processing. Additionally, frameworks like Spark and Flink have sought to integrate these processing paradigms within the same runtime and engine. 
However, because these efforts primarily occur above the data storage layer, they often lead to some form of ",[55,26296,26299],{"href":26297,"rel":26298},"https:\u002F\u002Fen.wikipedia.org\u002Fwiki\u002FLambda_architecture",[264],"Lambda architecture",[48,26301,26302,26303,26306],{},"The concept that unbounded data encompasses bounded data, where both batch and streaming jobs have windows (with batch jobs having a global window), makes logical sense but has proven difficult to execute—especially when historical sources and real-time streaming sources originate from different systems. With the emergence of S3 as the primary storage for data streaming, along with the index\u002Fdata split, achieving stream-table duality and supporting multi-modality over the same set of physical data files is now possible. Confluent's Tableflow and ",[55,26304,26305],{"href":24893},"StreamNative's Ursa engine"," are moving forward in this direction.",[48,26308,26309],{},"Returning to the initial discussion, it's important to recognize that a stream is fundamentally similar to an append-only table. A lakehouse table is essentially a collection of Parquet files. These Parquet files can be indexed by a Write-Ahead Log (WAL) in Delta Lake, manifest files in Apache Iceberg, or a timeline in Apache Hudi. In essence, lakehouse tables are composed of a set of files (representing the data) and an index (representing the metadata), encapsulating the concept of the index\u002Fdata split discussed in data streaming (see the diagram below).",[48,26311,26312],{},[384,26313],{"alt":18,"src":26314},"\u002Fimgs\u002Fblogs\u002F671a9176830b18c40532029d_AD_4nXeyO_f69QGtx35kOtPdDKzTEOxhIToksTDy4EiUTX1v_9Rx0GStCMPzJNTJU81lX20oRg7psMhMz9xivyDNBz-6G62YE58oLkH4H7oNgq5E1XodQwii3Uw2rZuuiiKn-1YWxswHTuuxStq0FrgN5SF4nu6B.png",[48,26316,3931],{},[48,26318,26319],{},"With S3 as the primary storage in the data streaming space, a stream can be seen as equivalent to a table in a lakehouse. A stream consists of a collection of data files (appended in row formats and compacted into columnar table formats) indexed by \"time\" (sequential numbers generated by the streaming system). This structure supports append-only write operations, seek-and-scan read operations, and offers either time-based or size-based retention policies for removing outdated data.",[48,26321,26322],{},[384,26323],{"alt":18,"src":26324},"\u002Fimgs\u002Fblogs\u002F671a91798a88b4e55d27243e_AD_4nXexzyYfZI5Pgmzeekit2TC049XDOIgNxPDUt9FT5_omffSqAP4ANUdUTxiyskeb4r8JTnuTgRdTUs7RqCPEwOm4GqTiByiWvTgmU5JvMGuLAlaDwVqzyV9Ulf_ddX9-ScL4DS0rq2dT-Q-Pb9hYSawhiX0.png",[48,26326,3931],{},[48,26328,26329,26330,26332],{},"We have now reached a point where we can maintain a single physical copy of data (stored as either row-based WAL files or columnar Parquet files) while providing different indexes for streaming and tabular access. This construct is referred to as \"Headless Multi-Modal Storage\", a key aspect of the \"",[55,26331,20194],{"href":18898},"\" concept. Data streamed into this storage is appended to WAL files, compacted into Parquet files, and organized as lakehouse tables—eliminating the need for separate copies of the data. 
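A small sketch of that compaction step, using pyarrow, shows how one physical copy can serve both views: the stream offset is kept as a column, so the same Parquet file supports a columnar table scan and an offset-based seek-and-scan. File names and the record layout are illustrative.

```python
# Sketch of compaction: row-oriented WAL records are rewritten as a columnar
# Parquet file, keeping the stream offset as a column so one physical copy can
# be scanned as a table or replayed as a stream. Paths and fields are illustrative.
import pyarrow as pa
import pyarrow.parquet as pq

# Records as they might appear in a row-based WAL object.
wal_records = [
    {"offset": 0, "key": "user-1", "value": "clicked"},
    {"offset": 1, "key": "user-2", "value": "viewed"},
    {"offset": 2, "key": "user-1", "value": "purchased"},
]

# Table view: compact the rows into a columnar file for batch query engines.
pa_table = pa.Table.from_pylist(wal_records)
pq.write_table(pa_table, "events-partition-0.parquet")

# Stream view: the same file can be replayed by seeking to an offset and scanning.
rows = pq.read_table("events-partition-0.parquet").to_pylist()
resume_from = 1
print([r for r in rows if r["offset"] >= resume_from])
```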
Changes made to lakehouse tables can be indexed as streams and consumed through Kafka or Pulsar protocols.",[48,26334,26335],{},[384,26336],{"alt":18,"src":26337},"\u002Fimgs\u002Fblogs\u002F671a91765a5b82533b2b9c02_AD_4nXfMCypmo0q4veyJcHGQVuR7jGXbiup9E28hr3yjklJii0PsVyZ8_SIq71RC1MyRk3jVC0WpeDodz2cSJLLQ3d_t7fDxo3qLHIDpb_TyvQbuOYJBM_OZ1zCwSdL4B1_nOyIL48XnBTgnqfqq9_SeXqzI5XaD.png",[48,26339,3931],{},[48,26341,26342,26343,26346,26347,26351],{},"By building multiple modalities (stream or table) over the same data, the system becomes headless, enabling various processing semantics. Data can be processed as continuous streams using engines like Flink or ",[55,26344,512],{"href":520,"rel":26345},[264]," or as cost-effective tables with batch query engines like ",[55,26348,2599],{"href":26349,"rel":26350},"https:\u002F\u002Fwww.databricks.com\u002F",[264]," or Trino. This architecture seamlessly integrates real-time streams and historical tables, providing the flexibility to use the right processing tool for each specific use case.",[48,26353,26354],{},[384,26355],{"alt":18,"src":26356},"\u002Fimgs\u002Fblogs\u002F66fdd3af3d6ed104965f5dae_AD_4nXdNjMOGmQd5Qnu1KMpXZMY8wJtNZtAuGbOpM8IRhaJruLQ5B6ONNiWOtvzmLA0ajhqBqMkiV_DGsIbeIjFuXPOPCG6si6ATNYQY_LffHDkDUaPsCP7_PHiKkb6gQ3D3T6AnfS2CdxbowbX-EsqQe9HPnNLb.png",[48,26358,3931],{},[40,26360,26362],{"id":26361},"final-thoughts","Final Thoughts",[48,26364,26365],{},"The evolution of log storage has been pivotal in shaping the modern data streaming ecosystem. Over the past decade, we have seen a transformation from partition-centric, leader-based models like Apache Kafka to more flexible, segment-centric architectures such as Apache Pulsar. The adoption of S3 and other cloud-native object storage as a primary storage layer marks a significant shift, driving cost-efficiency and simplifying operations by eliminating traditional disk-based replication. This has enabled a transition towards leaderless, headless architectures that align more closely with the data lakehouse paradigm.",[48,26367,26368],{},"As we move into the age of AI, these advancements are accelerating. The convergence of data streaming and lakehouse technologies has led to headless, multi-modal data storage solutions that seamlessly integrate streaming and batch data, creating a robust foundation for real-time generative AI. Lakehouse vendors are beginning to adopt streaming APIs for real-time ingestion, while streaming platforms are incorporating lakehouse formats to provide more versatile data processing capabilities.",[48,26370,26371,26372,26375,26376,26381],{},"The evolution of data streaming storage is ongoing, and headless, multi-modal architectures are emerging as the backbone for real-time generative AI. We look forward to discussing these topics further at the upcoming ",[55,26373,5376],{"href":5372,"rel":26374},[264],", taking place at the Grand Hyatt SFO on October 28-29, 2024. 
We ",[55,26377,26380],{"href":26378,"rel":26379},"https:\u002F\u002Fwww.eventbrite.com\u002Fe\u002Fdata-streaming-summit-2024-tickets-950220995577?aff=oddtdtcreator",[264],"invite you to join us"," for this exciting event focused on the future of data streaming.",{"title":18,"searchDepth":19,"depth":19,"links":26383},[26384,26385,26386,26387,26388,26389,26390],{"id":26017,"depth":19,"text":26018},{"id":26080,"depth":19,"text":26081},{"id":26169,"depth":19,"text":26170},{"id":26211,"depth":19,"text":26212},{"id":26229,"depth":19,"text":26230},{"id":26284,"depth":19,"text":26285},{"id":26361,"depth":19,"text":26362},"2024-10-25","Discover the transformation of log storage in modern data streaming platforms, from Apache Kafka’s partition-centric model to Apache Pulsar’s segment-centric architecture, and how cloud-native object storage is reshaping data streaming for real-time AI and multi-modal data systems.","\u002Fimgs\u002Fblogs\u002F671a9162c0417d052d8bd4e3_LogStorage_BlogPost-1.png",{},{"title":18766,"description":26392},"blog\u002Fthe-evolution-of-log-storage-in-modern-data-streaming-platforms",[1332,799,303],"Zp0O2KrIP8ednr9NTXzhJg7CUJb9o4mz7Vf4e-wsNWw",{"id":26400,"title":26401,"authors":26402,"body":26403,"category":821,"createdAt":290,"date":26740,"description":26741,"extension":8,"featured":294,"image":26742,"isDraft":294,"link":290,"meta":26743,"navigation":7,"order":296,"path":24075,"readingTime":18649,"relatedResources":290,"seo":26744,"stem":26745,"tags":26746,"__hash__":26748},"blogs\u002Fblog\u002Fannouncing-apache-pulsar-tm-4-0-towards-an-open-data-streaming-architecture.md","Announcing Apache Pulsar™ 4.0: Towards an Open Data Streaming Architecture",[6785],{"type":15,"value":26404,"toc":26721},[26405,26408,26411,26415,26418,26421,26424,26428,26437,26440,26444,26447,26450,26453,26456,26460,26463,26466,26470,26473,26475,26478,26482,26485,26487,26490,26494,26497,26500,26503,26507,26511,26514,26517,26520,26523,26527,26530,26533,26536,26539,26542,26546,26549,26552,26555,26558,26561,26565,26568,26571,26574,26577,26580,26583,26587,26596,26599,26613,26616,26627,26630,26647,26650,26652,26656,26659,26662,26679,26682,26685,26687,26690,26710,26713,26719],[48,26406,26407],{},"We are excited to announce the release of Apache Pulsar 4.0, the second Long-Term Support (LTS) version after the successful introduction of LTS with Pulsar 3.0 in May 2023. Pulsar 4.0 represents a pivotal step forward in our mission to make data streaming more accessible, affordable, and scalable. With a focus on modularity, observability, scalability, and security, this release extends Pulsar advantages for enterprise deployments, which necessarily emphasize this release’s enhanced Quality of Service (QoS) controls. With an ongoing trend towards simplicity and flexibility in data streaming today, this release drives Pulsar closer to becoming the foundation of an Open Data Streaming Architecture.",[48,26409,26410],{},"In this post, we’ll explore the key areas of innovation in Pulsar 4.0 and the Pulsar Improvement Proposals (PIPs) that have been instrumental in shaping this release.",[40,26412,26414],{"id":26413},"a-modular-data-streaming-architecture","A Modular Data Streaming Architecture",[48,26416,26417],{},"From the start, Pulsar was designed with modularity as a core principle. This philosophy has guided our contributors and committers as they continuously enhance the project to make data streaming available and accessible for organizations of all sizes. 
The modular architecture allows organizations to opt for deployment models that align with their specific security and infrastructure requirements. Over multiple releases, the Apache Pulsar community has transformed every layer of the Pulsar system—including metadata storage, data storage, protocol handling, and load balancing—into fully pluggable components. This flexibility fosters rapid innovation and allows Pulsar to adapt to the ever-changing demands of IT infrastructure.",[48,26419,26420],{},"Modularity has enabled the development of reusable components like Oxia for scalable metadata storage and the S3-based Write-Ahead Logging implementation in the Ursa Engine.",[48,26422,26423],{},"This modular architecture is the result of incremental improvements rather than a single large refactor. Below are some notable modularized components:",[32,26425,26427],{"id":26426},"metadata-storage","Metadata Storage",[48,26429,26430,26431,26436],{},"Metadata storage is the most critical component of a data streaming engine, acting as its central nervous system by managing node coordination, consensus, node membership tracking, and more. Historically, both Kafka and Pulsar have relied on ZooKeeper for metadata storage. While Kafka moved to KRaft as an alternative, the Pulsar community took a different approach by making the metadata storage pluggable (see",[55,26432,26435],{"href":26433,"rel":26434},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fwiki\u002FPIP-45%3A-Pluggable-metadata-interface",[264]," PIP-45","). This approach allows Pulsar to integrate alternative metadata implementations, such as RocksDB for local standalone instances, etcd as a ZooKeeper alternative, or more scalable solutions like Oxia.",[48,26438,26439],{},"After the Pulsar community stabilized the metadata interface, we looked into addressing a bigger challenge—scaling the number of topic partitions Pulsar could support. This led to the creation of Oxia, which provides a scalable, distributed metadata storage solution, completely different from KRaft's approach.",[32,26441,26443],{"id":26442},"data-storage","Data Storage",[48,26445,26446],{},"Pulsar has supported pluggable data storage from the beginning, implemented through the Managed Ledger. The Managed Ledger represents a segmented stream, with distributed log segments stored in remote storage. This interface was designed specifically for Pulsar's segmented storage architecture, providing an abstraction layer that separates compute from storage. This flexibility also enabled the introduction of Tiered Storage in 2018, making Pulsar the first solution in the market to support this feature.",[48,26448,26449],{},"While the Managed Ledger is inherently flexible, much of Pulsar's core code has historically depended on specific implementation classes such as ManagedLedgerImpl, ManagedCursorImpl, and PositionImpl. This tight coupling made it difficult to introduce new implementations. Pulsar 4.0 addresses this limitation by performing a major refactor of the Managed Ledger, significantly simplifying the process of integrating new implementations. One example implementation is the S3-based Write-Ahead Log in messaging platforms and services powered by Apache Pulsar, such as the ONE StreamNative Platform.",[48,26451,26452],{},"Another change in Pulsar 4.0 supporting new Managed Ledger implementations is the abstraction of the message ID implementation, eliminating the direct dependency on ledger IDs and entry IDs. 
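To see why this matters from an application's point of view, here is a small Pulsar Java client sketch (the service URL, topic, and subscription name are placeholders) in which the MessageId returned by the producer is handled purely as an opaque position:

```java
import org.apache.pulsar.client.api.*;

public class OpaqueMessageIdExample {
    public static void main(String[] args) throws Exception {
        try (PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650")
                .build()) {

            Producer<byte[]> producer = client.newProducer()
                    .topic("persistent://public/default/orders")
                    .create();

            // The returned MessageId is an opaque position; application code never needs
            // to know whether it maps to a ledger/entry pair or to an offset in some
            // other storage implementation.
            MessageId position = producer.send("order-created".getBytes());

            Consumer<byte[]> consumer = client.newConsumer()
                    .topic("persistent://public/default/orders")
                    .subscriptionName("audit")
                    .subscribe();

            // Rewind the subscription to the opaque position and consume from there.
            consumer.seek(position);
            Message<byte[]> msg = consumer.receive();
            consumer.acknowledge(msg);
        }
    }
}
```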
By centralizing the sequence generation process, this refactor enables more efficient, incremental sequence generation, which is a key requirement for the S3-based write-ahead log implementation.",[48,26454,26455],{},"These changes in Pulsar 4.0 accelerate innovation at the storage layer, opening the door to future improvements and integrations.",[32,26457,26459],{"id":26458},"pluggable-protocols","Pluggable Protocols",[48,26461,26462],{},"One of Pulsar’s key differentiators is its ability to store a single copy of data and serve it through multiple subscription models, each tailored to specific business requirements. This flexibility stands out compared to other technologies, but adopting these advanced features often requires development teams to modify their existing applications and learn new APIs - a process that can impact development timelines. Meanwhile, many existing applications are built around protocols and APIs like Kafka or MQTT. Rather than forcing users to migrate to the Pulsar protocol and APIs, Pulsar’s architecture supports the development of additional protocols and APIs that can leverage its underlying data storage capabilities.",[48,26464,26465],{},"This approach led to the creation of KoP (Kafka on Pulsar), MoP (MQTT on Pulsar), and other protocol handlers. These protocols add capabilities to Pulsar transforming it into an open platform capable of supporting multiple protocols natively. The protocol framework has significantly matured in recent years, particularly with StreamNative driving broader adoption through the Ursa Engine, which is Kafka API-compatible and built on top of Pulsar’s protocol handler framework and powered by Apache Pulsar. The pluggable protocol support in Apache Pulsar allows platforms such as the ONE StreamNative Platform to integrate seamlessly into existing ecosystems while still offering the architectural advantages of its storage layer.",[32,26467,26469],{"id":26468},"load-manager","Load Manager",[48,26471,26472],{},"The Load Manager is a key component in distributing workloads across multiple nodes in Pulsar. Over the course of several releases, the load manager has evolved to include various load-shedding strategies tailored to different types of workloads. This flexibility is one of the reasons why Pulsar can support a wide range of organizations, from startups and unicorns to large enterprises and hyperscalers. The Extensible Load Manager, introduced in Pulsar 3.0, has matured significantly by the time of the Pulsar 4.0 release.",[48,26474,3931],{},[48,26476,26477],{},"Many new functionalities have been built on top of it in the ONE StreamNative Platform, including graceful rollout capabilities and more. At the upcoming Data Streaming Summit, the author of the Extensible Load Manager will provide a deeper dive into its implementation and how it powers features like read-only brokers and graceful rollouts for high availability during cluster upgrades.",[32,26479,26481],{"id":26480},"ongoing-modular-approach","Ongoing Modular Approach",[48,26483,26484],{},"In addition to these major components, many other features in Pulsar are also pluggable, which fosters rapid innovation. These pluggable components include, but are not limited to, the delayed delivery queue implementation, topic compaction service, and more. 
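As a concrete illustration of the protocol pluggability described above, a standard Kafka Java producer like the one below would work unchanged against a Kafka-compatible protocol handler such as KoP; the bootstrap address and topic are placeholders, and the exact listener address depends on the deployment.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class KafkaProtocolClientExample {
    public static void main(String[] args) {
        // The bootstrap address is a placeholder; with a Kafka-compatible protocol
        // handler it would point at the cluster's Kafka listener rather than a Kafka broker.
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Unmodified Kafka client code keeps working; only the endpoint changes.
            producer.send(new ProducerRecord<>("user-actions", "user-42", "page_view"));
            producer.flush();
        }
    }
}
```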
This modularity allows the Pulsar community to evolve Pulsar’s implementations incrementally, without disrupting the core functionality.",[48,26486,3931],{},[48,26488,26489],{},"With Pulsar 4.0, we are proud to see how this modular approach is shaping the future of data streaming architecture. Pulsar now supports multiple protocols through the protocol handler framework, offers multi-tenancy with workload isolation between tenants, and enables the potential for multi-modality through pluggable storage classes that can be configured at the tenant and namespace levels. This flexibility paves the way for innovations like the StreamNative Ursa Engine, positioning Pulsar as a future-proof solution for modern data streaming needs.",[40,26491,26493],{"id":26492},"built-in-opentelemetry-for-comprehensive-observability","Built-in OpenTelemetry for Comprehensive Observability",[48,26495,26496],{},"As the industry continues to standardize around OpenTelemetry for observability, Pulsar 4.0 embraces this evolution by integrating with the framework. This integration provides robust telemetry data collection, greatly improving debugging, monitoring, and performance insights at scale.",[48,26498,26499],{},"The implementation of PIP-264 Enhanced OTel-based metric system introduces key improvements to manage the issue of cardinality in large-scale deployments where brokers may handle between 10k-100k topics. A central solution introduced in this PIP is the Topic Metric Group, a new aggregation level for metrics. This feature allows users to organize topics into groups through configurations using wildcards or regular expressions, effectively organizing large topic sets into manageable groups. By offering a more granular aggregation level—beyond just namespaces—users can control how topics are grouped, thus reducing the burden of tracking metrics across many topics. This approach strikes a balance between reducing cardinality and maintaining necessary levels of detail for observability.",[48,26501,26502],{},"Another critical aspect of PIP-264 is the fine-grained filtering mechanism. This rule-based dynamic configuration allows users to specify which metrics should be collected or dropped at the namespace, topic, or group level. By default, only a minimal subset of essential metrics is retained at the group or namespace level, while unnecessary metrics are discarded to maintain efficiency. However, when performance issues or anomalies arise, users can dynamically override these default settings, expanding metric collection at higher granularity levels—down to the topic or even consumer\u002Fproducer level. This dynamic filtering system allows for real-time responsiveness to issues, similar to adjusting logging levels dynamically. After the need for observing detailed metrics is resolved, users can disable these overrides to return to the default filtering settings, maintaining optimal system performance. As a result, performance remains optimized while still delivering valuable insights when necessary.",[40,26504,26506],{"id":26505},"increased-scalability-to-support-demanding-workloads","Increased Scalability to Support Demanding Workloads",[32,26508,26510],{"id":26509},"enhanced-load-balancing-for-millions-of-topics","Enhanced Load Balancing for Millions of Topics",[48,26512,26513],{},"Since the introduction of the Extensible Load Manager in Pulsar 3.0, Pulsar's load management capabilities have evolved significantly with each subsequent release. 
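Returning briefly to the observability changes above, the snippet below shows the general OpenTelemetry metrics idiom in Java: recording a counter with a coarse "topic group" attribute instead of a per-topic label is one way to keep cardinality manageable. The instrument and attribute names are illustrative and are not Pulsar's actual internal instrumentation.

```java
import io.opentelemetry.api.GlobalOpenTelemetry;
import io.opentelemetry.api.common.AttributeKey;
import io.opentelemetry.api.common.Attributes;
import io.opentelemetry.api.metrics.LongCounter;
import io.opentelemetry.api.metrics.Meter;

public class OtelMetricSketch {
    public static void main(String[] args) {
        Meter meter = GlobalOpenTelemetry.getMeter("example.messaging");

        LongCounter messagesIn = meter.counterBuilder("messaging.messages.in")
                .setDescription("Messages received, aggregated by topic group")
                .build();

        // Aggregating by a coarse group attribute keeps label cardinality bounded
        // even when a broker hosts tens of thousands of topics.
        messagesIn.add(1, Attributes.of(AttributeKey.stringKey("topic_group"), "orders"));
    }
}
```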
The Extensible Load Manager has consistently introduced new features designed to handle dynamic workloads efficiently while improving overall system performance and resource utilization. These advancements have made the load manager a crucial component in ensuring scalability and reliability in modern data streaming systems.",[48,26515,26516],{},"One of the load balancing enhancements came with PIP-307, which optimized the bundle transfer protocol for the Extensible Load Manager. This improvement eliminates the need for redundant topic lookups during bundle transfers, reducing publish latency spikes and improving performance when unloading large numbers of topics. The new protocol also introduces graceful Managed Ledger shutdown, minimizing potential race conditions during ownership transfers and ensuring smoother topic transitions between brokers.",[48,26518,26519],{},"Building on that, PIP-354 in Pulsar 4.0 introduces dynamic improvements that further enhance the adaptability of the load manager. A new automatic load-shedding mechanism enables brokers to autonomously adjust to fluctuating workloads by redistributing topics to balance the load across the cluster. This minimizes bottlenecks and prevents individual brokers from becoming overloaded, ensuring that the system remains stable even during periods of high traffic. Additionally, with enhanced metrics integration, the load manager can make more intelligent, real-time decisions on resource allocation, making Pulsar more resilient to sudden changes in workload patterns.",[48,26521,26522],{},"From the StreamNative perspective, the introduction of Oxia as a scalable metadata storage backend—improving observability and addressing the cardinality issues in metrics collection—combined with the enhancements to the load balancer, position the ONE StreamNative Platform to support beyond millions of topic partitions. These improvements ensure that Pulsar can scale effectively, maintaining performance and reliability in even the most demanding data streaming environments.",[40,26524,26526],{"id":26525},"enhanced-key_shared-subscription-scale-without-compromising-message-order","Enhanced Key_Shared Subscription: Scale Without Compromising Message Order",[48,26528,26529],{},"Key_Shared subscription is one of Pulsar's most valuable features, enabling organizations to scale their message processing capacity by adding multiple consumers while maintaining strict message ordering based on keys. This capability is crucial for applications requiring both high throughput and ordered processing, such as financial transactions, event processing, and real-time analytics.",[48,26531,26532],{},"In Pulsar 4.0, we've improved the Key_Shared subscription implementation through a significant enhancement with PIP-379. The new design ensures messages with the same key are handled by only one consumer at a time, while eliminating unnecessary message blocking that previously impacted system performance during consumer changes and application restarts.",[48,26534,26535],{},"The enhancement brings business value through improved service reliability and operational efficiency. Organizations can now scale their consumer application count dynamically without worrying about message ordering inconsistencies or system slowdowns. 
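A minimal Pulsar Java consumer using a Key_Shared subscription looks like the sketch below (the service URL, topic, and subscription name are placeholders); several such consumers can attach to the same subscription, and messages with the same key are dispatched to exactly one of them.

```java
import org.apache.pulsar.client.api.*;

public class KeySharedConsumerExample {
    public static void main(String[] args) throws Exception {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650")
                .build();

        // Per-key order is preserved while the subscription scales out across consumers.
        Consumer<byte[]> consumer = client.newConsumer()
                .topic("persistent://public/default/payments")
                .subscriptionName("fraud-detection")
                .subscriptionType(SubscriptionType.Key_Shared)
                .subscribe();

        while (true) {
            Message<byte[]> msg = consumer.receive();
            // msg.getKey() is the ordering key (for example, an account ID).
            System.out.printf("key=%s value=%s%n", msg.getKey(), new String(msg.getValue()));
            consumer.acknowledge(msg);
        }
    }
}
```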
When consumers are added or removed, only the affected message keys are temporarily managed, rather than blocking entire message streams.",[48,26537,26538],{},"Operations teams can quickly identify and resolve any Key_Shared ordered message delivery issues through comprehensive troubleshooting metrics in Pulsar topic stats. This translates to reduced system downtime and faster incident resolution, crucial for maintaining service level agreements in production environments. Future improvements will introduce a REST API that will further simplify troubleshooting by providing direct access to unacknowledged message details and powerful key-based search capabilities for resolving message delivery issues where typically the root cause is in an application that doesn't acknowledge a message and due to message ordering constraints, further messages for the key are blocked. Web based user interfaces and CLI tools can build upon this REST API, allowing also automation for resolving or alerting in operations. Related Key_shared troubleshooting metrics will also be exposed via Prometheus and OTel interfaces in future updates.",[48,26540,26541],{},"This major improvement positions Pulsar 4.0 as an even more compelling choice for organizations requiring both strict message ordering and high scalability in their data streaming architecture, particularly valuable for businesses processing millions of ordered events across a large number of consumers.",[40,26543,26545],{"id":26544},"enhanced-secure-docker-image-runtime-based-on-alpine-and-java-21","Enhanced Secure Docker Image Runtime Based on Alpine and Java 21",[48,26547,26548],{},"Pulsar 4.0 contains enhancements to its Docker runtime environment, combining the security benefits of Alpine Linux with the performance improvements of Java 21's runtime. PIP-324 introduced in Pulsar 3.3.0 aligns with our commitment to providing a secure, efficient, and resource-optimized platform for messaging workloads.",[48,26550,26551],{},"The new Docker images are now based on Alpine Linux instead of Ubuntu, reducing the image size while improving the security posture.",[48,26553,26554],{},"A key security enhancement is the elimination of CVEs in the base image. While the previous Ubuntu-based images carried 12 Medium\u002FLow CVEs with no available resolution, the new Alpine-based images start with zero CVEs, providing a more secure foundation for production deployments. This improvement is particularly valuable for organizations with strict security requirements and compliance needs.",[48,26556,26557],{},"The Docker images now include Java 21 with Generational ZGC, bringing significant improvements in garbage collection performance. Generational ZGC provides sub-millisecond pause times, better CPU utilization, and improved memory efficiency compared to previous garbage collectors. This translates to more predictable latencies and better resource utilization for Pulsar deployments.",[48,26559,26560],{},"These improvements make Pulsar 4.0's Docker runtime an even more compelling choice for organizations requiring both security and performance in their messaging infrastructure. 
The combination of Alpine Linux's minimal attack surface and Java 21's advanced garbage collection provides a robust foundation for running Pulsar in containerized environments.",[40,26562,26564],{"id":26563},"enhanced-quality-of-service-controls","Enhanced Quality of Service Controls",[48,26566,26567],{},"Apache Pulsar's truly multi-tenant architecture has made it a preferred choice for organizations building messaging-as-a-service platforms versus disparate, siloed clusters. The platform's ability to efficiently manage resources across multiple tenants while maintaining service reliability has proven particularly valuable in demanding enterprise environments.",[48,26569,26570],{},"In Pulsar 4.0, we highlight significant improvements in Quality of Service (QoS) controls, particularly through PIP-322 that was introduced in Pulsar 3.2. This enhancement refactors the rate limiting implementation, addressing critical performance issues that previously impacted service reliability and system performance during high-load scenarios.",[48,26572,26573],{},"Rate limiting serves as the foundation for comprehensive capacity management in multi-tenant environments. One of the key goals of capacity management in a multi-tenant system is to address the \"noisy neighbor\" problem - where one tenant's workload negatively impacts others - without requiring significant infrastructure overprovisioning to handle peak loads.",[48,26575,26576],{},"The new rate limiting implementation uses an efficient token bucket algorithm that provides accurate and consistent rate limiting across all levels - broker, topic, and resource group. This unified approach eliminates the need for previous separate \"default\" and \"precise\" rate limiters, significantly reducing CPU overhead and lock contention that previously affected IO threads and added unnecessary latency for resources that weren’t throttled.",[48,26578,26579],{},"The refactored rate limiting system provides more consistent behavior when handling various throttling scenarios. This ensures more predictable performance in multi-tenant environments where multiple rate limiting conditions may apply simultaneously.",[48,26581,26582],{},"These QoS improvements position Pulsar as an even more robust platform for messaging-as-a-service teams, enabling better service level management and capacity control in large-scale deployments. The enhanced rate limiting system provides a foundation for future QoS features, particularly valuable for organizations requiring precise control over resource utilization and improved service reliability across multiple tenants.",[40,26584,26586],{"id":26585},"the-pips-behind-pulsar-40","The PIPs Behind Pulsar 4.0",[48,26588,26589,26590,26595],{},"Pulsar 4.0 ",[55,26591,26594],{"href":26592,"rel":26593},"https:\u002F\u002Fpulsar.apache.org\u002Frelease-notes\u002Fversioned\u002Fpulsar-4.0.0\u002F",[264],"includes numerous Pulsar Improvement Proposals (PIPs)"," that have enhanced the platform's capabilities across multiple areas. 
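Before turning to the list of PIPs, here is a toy sketch of the token bucket algorithm referenced in the QoS discussion above. It only illustrates the refill-and-consume mechanics; it is not Pulsar's PIP-322 implementation and is not tuned for the lock-free, multi-level behavior a broker needs.

```java
public class TokenBucketSketch {
    private final long capacity;        // maximum burst size, in tokens
    private final double refillPerNano; // steady-state rate, tokens per nanosecond
    private double tokens;
    private long lastRefillNanos;

    public TokenBucketSketch(long ratePerSecond, long capacity) {
        this.capacity = capacity;
        this.refillPerNano = ratePerSecond / 1_000_000_000.0;
        this.tokens = capacity;
        this.lastRefillNanos = System.nanoTime();
    }

    /** Returns true if the request may proceed, false if it should be throttled. */
    public synchronized boolean tryAcquire(long permits) {
        long now = System.nanoTime();
        tokens = Math.min(capacity, tokens + (now - lastRefillNanos) * refillPerNano);
        lastRefillNanos = now;
        if (tokens >= permits) {
            tokens -= permits;
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        TokenBucketSketch limiter = new TokenBucketSketch(1_000, 2_000); // 1k msg/s, 2k burst
        System.out.println(limiter.tryAcquire(500));   // true: within the initial burst
        System.out.println(limiter.tryAcquire(5_000)); // false: exceeds available tokens
    }
}
```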
Here are some of the most significant improvements:",[48,26597,26598],{},"Core Architecture",[321,26600,26601,26604,26607,26610],{},[324,26602,26603],{},"PIP-264: Implements comprehensive OpenTelemetry integration for improved observability",[324,26605,26606],{},"PIP-335: Introduces Oxia as a scalable metadata storage solution, offering improved scalability and reliability",[324,26608,26609],{},"PIP-376: Makes topic policies service pluggable for better extensibility; PIP-379: Enhances Key_Shared subscription with improved message ordering and troubleshooting capabilities",[324,26611,26612],{},"PIP-384: Decouples ManagedLedger interfaces for more flexible storage implementations",[48,26614,26615],{},"Performance and Scalability",[321,26617,26618,26621,26624],{},[324,26619,26620],{},"PIP-354: Applies topK mechanism to ModularLoadManagerImpl for better resource utilization",[324,26622,26623],{},"PIP-358: Enhances resource weight functionality across load management components; PIP-364: Introduces a new AvgShedder load balancing algorithm",[324,26625,26626],{},"PIP-378: Adds ServiceUnitStateTableView abstraction for improved state management",[48,26628,26629],{},"Security and Operations",[321,26631,26632,26635,26638,26641,26644],{},[324,26633,26634],{},"PIP-324: Introduces Alpine-based Docker images for reduced attack surface and smaller footprint",[324,26636,26637],{},"PIP-337: Adds SSL Factory Plugin for customized SSL Context and Engine generation",[324,26639,26640],{},"PIP-347: Adds role field in consumer statistics for better authentication tracking",[324,26642,26643],{},"PIP-369: Introduces flag-based selective unload for namespace isolation policy changes",[324,26645,26646],{},"PIP-383: Supports granting\u002Frevoking permissions for multiple topics",[48,26648,26649],{},"Many of these improvements build upon features introduced in Pulsar 3.x releases, such as PIP-322 (Rate Limiting Refactoring) from 3.2, which laid the groundwork for better multi-tenancy and Quality of Service controls. The combination of these PIPs positions Pulsar 4.0 as a significant step forward in building a more robust, scalable, and secure streaming platform.",[48,26651,3931],{},[40,26653,26655],{"id":26654},"thank-you-to-apache-pulsar-contributors","Thank You to Apache Pulsar Contributors",[48,26657,26658],{},"Apache Pulsar 4.0 represents the collaborative effort of a vibrant and growing open-source community. This landmark release was made possible through the dedication and contributions of developers, organizations, and users worldwide who share our vision of making data streaming more accessible, affordable, and scalable.",[48,26660,26661],{},"We extend our deepest gratitude to:",[321,26663,26664,26667,26670,26673,26676],{},[324,26665,26666],{},"The individual contributors who developed new features, reported bugs, fixed bugs, and improved documentation",[324,26668,26669],{},"The committers and PMC members who guided the project's technical direction",[324,26671,26672],{},"The organizations that have deployed Pulsar in production and shared their valuable feedback",[324,26674,26675],{},"The users who participated in testing and provided invaluable input during the release process",[324,26677,26678],{},"The broader Apache Software Foundation community for their continued support",[48,26680,26681],{},"Your collective efforts have not only shaped this release but continue to strengthen Apache Pulsar's position as a leading data streaming platform. 
The improvements in Pulsar 4.0 reflect our community's commitment to technical excellence and innovation.",[48,26683,26684],{},"We welcome new contributors to join our community and help us build the future of data streaming technology. Whether through code contributions, documentation improvements, or sharing your Pulsar deployment experiences, every contribution helps make Pulsar better for everyone.",[40,26686,2125],{"id":2122},[48,26688,26689],{},"Apache Pulsar 4.0 marks a transformative milestone as our second Long-Term Support (LTS) release, delivering major advancements in modularity, observability, and scalability. This release significantly enhances Pulsar's position as the foundation for an open data streaming architecture, with improvements that address critical enterprise needs:",[321,26691,26692,26695,26698,26701,26704,26707],{},[324,26693,26694],{},"A fully modular architecture that enables flexible deployment models and storage options, from Oxia for metadata to S3-based write-ahead logging",[324,26696,26697],{},"Built-in OpenTelemetry integration providing deep insights into system performance and behavior",[324,26699,26700],{},"Enhanced Key_Shared subscriptions bringing improved message ordering while maintaining scalability",[324,26702,26703],{},"Advanced load balancing capabilities supporting millions of topic partitions",[324,26705,26706],{},"Strengthened Quality of Service controls with refined rate limiting",[324,26708,26709],{},"A more secure and efficient containerized runtime based on Alpine Linux and Java 21",[48,26711,26712],{},"These enhancements make Pulsar 4.0 a compelling choice for organizations building modern data streaming applications, from startups to global enterprises. The combination of enterprise-grade features, operational simplicity, and robust security positions Pulsar as a foundational technology for the future of data streaming.",[48,26714,26715,26716,26718],{},"We invite the global community of developers, architects, and data engineers to explore Pulsar 4.0's capabilities and join us in advancing the state of the art in data streaming technology. We also invite you to ",[55,26717,24379],{"href":6392}," to explore the advantages that we provide on top of Pulsar in our ONE StreamNative Platform which simplifies your data streaming initiatives.",[48,26720,3931],{},{"title":18,"searchDepth":19,"depth":19,"links":26722},[26723,26730,26731,26734,26735,26736,26737,26738,26739],{"id":26413,"depth":19,"text":26414,"children":26724},[26725,26726,26727,26728,26729],{"id":26426,"depth":279,"text":26427},{"id":26442,"depth":279,"text":26443},{"id":26458,"depth":279,"text":26459},{"id":26468,"depth":279,"text":26469},{"id":26480,"depth":279,"text":26481},{"id":26492,"depth":19,"text":26493},{"id":26505,"depth":19,"text":26506,"children":26732},[26733],{"id":26509,"depth":279,"text":26510},{"id":26525,"depth":19,"text":26526},{"id":26544,"depth":19,"text":26545},{"id":26563,"depth":19,"text":26564},{"id":26585,"depth":19,"text":26586},{"id":26654,"depth":19,"text":26655},{"id":2122,"depth":19,"text":2125},"2024-10-24","Discover Apache Pulsar 4.0, the latest LTS release featuring a modular architecture, advanced load balancing, improved observability with OpenTelemetry, and enhanced Quality of Service controls. 
Ideal for scalable, secure, and flexible data streaming solutions.","\u002Fimgs\u002Fblogs\u002F671975f63c706f74bca76169_Pulsar_BlogPost.png",{},{"title":26401,"description":26741},"blog\u002Fannouncing-apache-pulsar-tm-4-0-towards-an-open-data-streaming-architecture",[302,821,4301,26747],"Observability","Bap7isUYbRsRN9sWxobQTYJMUQ9b4OLV4bacD4Eq2ls",{"id":26750,"title":26751,"authors":26752,"body":26753,"category":290,"createdAt":290,"date":26896,"description":26897,"extension":8,"featured":294,"image":26898,"isDraft":294,"link":290,"meta":26899,"navigation":7,"order":296,"path":26900,"readingTime":17934,"relatedResources":290,"seo":26901,"stem":26902,"tags":26903,"__hash__":26904},"blogs\u002Fblog\u002Fbyoc2-portable-data-plane-and-the-vision-of-making-data-streaming-accessible-and-affordable.md","BYOC², Portable Data Plane, and the vision of making data streaming accessible and affordable",[806],{"type":15,"value":26754,"toc":26890},[26755,26771,26774,26777,26781,26784,26787,26793,26796,26799,26802,26806,26809,26812,26815,26818,26821,26824,26829,26831,26835,26842,26844,26847,26852,26854,26857,26860,26863,26866,26870,26873,26876,26879,26882],[48,26756,26757,26758,26760,26761,26765,26766,26770],{},"At StreamNative, we understand the challenges organizations face in today’s economic environment: rising costs, data sovereignty concerns, data accessibility issues, and the need for economic scalability. Our vision is to simplify data streaming by offering an affordable, accessible, and scalable solution that addresses these pain points directly. With our ",[55,26759,4725],{"href":24893},", we empower users to optimize storage by selecting the most cost-effective storage classes without sacrificing availability or performance. This flexibility allows businesses to balance their data needs with budget constraints, ensuring seamless data streaming even in uncertain times. Our recent ",[55,26762,26764],{"href":26763},"\u002Fisvs\u002Fververica","partnership with Ververica"," brings us closer to realizing this vision. This collaboration is built on a foundation we call BYOC², offering a ",[55,26767,26769],{"href":26768},"\u002Fflink","Managed Flink service"," to our customers.",[48,26772,26773],{},"BYOC² stands for \"Bring Your Own Compute\" and \"Bring Your Own Cloud.\" Your data remains in your own bucket, and you bring your own engine to process it—all within your cloud account with a fully managed experience.",[48,26775,26776],{},"We envision BYOC² as a model for how we partner with compute engine vendors to bring the most affordable and secure data streaming solution to our customers. Let's delve deeper into the model, the rationale behind this strategic partnership, its origins, and the direction we are heading.",[40,26778,26780],{"id":26779},"byoc-bring-your-own-compute","BYOC: Bring Your Own Compute",[48,26782,26783],{},"The first part of the equation is Bring Your Own Compute.",[48,26785,26786],{},"The concept of Bring-Your-Own-Compute (BYOC) has been present in data lakes, data warehouses, and lakehouses for many years. With the emergence of S3 as the primary storage and the adoption of open table formats as a standard for storing structured tabular data in S3, BYOC is evident. You own your data in your bucket, and you can plug in your processing engine to access that data. 
This paradigm is well-established in the broader data ecosystem, often referred to as \"Headless Data Storage\", where all your data is stored in a central repository, and the engines plug in to process it.",[48,26788,26789,26790,190],{},"However, this concept has not been as popular in data streaming. One major barrier to the mainstream adoption of data streaming is the high cost, especially networking expenses. Unlike batch processing in lakehouses, stream processing engines using the Kafka protocol heavily depend on networking, as it requires streaming data in and out, which can be expensive, as we have previously discussed as part of the",[55,26791,26792],{"href":18969}," New CAP theorem",[48,26794,26795],{},"This has started to change, particularly as more data streaming vendors transition from using S3 as tiered storage to using S3 as primary storage. The concept of stream-table duality has now become a reality. Our Ursa Engine represents an innovative approach that evolves the storage engine into a headless, multi-modality storage solution for both streams and tables. Essentially, you only need to store one copy of the data and can consume it either through stream semantics or table semantics. This reduces the cost, complexity, and redundancy of storing data across different systems.",[48,26797,26798],{},"With a headless, multi-modality storage engine and standard streaming APIs like Pulsar or Kafka, Bring-Your-Own (Streaming) Compute becomes a reality for data streaming. You can keep your data in your bucket, consume it as a table or stream, and bring the streaming compute engine into your account, eliminating inter-account or inter-zone traffic.",[48,26800,26801],{},"Our Managed Flink offering, announced in partnership with Ververica, is built on this concept of BYO-Compute. Through our strategic OEM partnership, we bring Apache Flink closer to where the data resides. Your Kafka or Pulsar data stays in your cloud, while Flink jobs run in your own VPC. You maintain ownership of the data, incur no expensive networking costs, and enjoy the same fully managed experience.",[40,26803,26805],{"id":26804},"byoc-dedicated-private-all-you-need-is-a-portable-data-plane","BYOC, Dedicated, Private - All You Need is a Portable Data Plane",[48,26807,26808],{},"The other part of the equation is BYO-Cloud. BYO-Cloud has gained attention recently, partly due to the WarpStream acquisition. In our view, whether it's BYO-Cloud, Dedicated\u002FServerless, or self-managed in a Private Cloud, what organizations truly need is a Portable Data Plane. This allows them to choose the deployment option that best suits their needs (balancing data sovereignty, a fully managed experience, and cost) while providing the flexibility to switch between options.",[48,26810,26811],{},"A Data Streaming Platform is unique in the data infrastructure landscape. It serves as both a data platform and an integration platform, crucial for an enterprise's multi-cloud and hybrid-cloud strategies. Many use cases for data streaming involve moving data between public and private clouds. Therefore, having a portable data stack that can operate seamlessly in the public cloud as a fully managed SaaS or BYOC, and in a private cloud as a self-managed solution, is essential.",[48,26813,26814],{},"While Pulsar or Kafka can run in any cloud environment, ensuring the same fully managed experience across different environments is much more challenging. 
To make data streaming accessible to organizations of all sizes, we designed our data plane to be a Portable Data Plane from day one.",[48,26816,26817],{},"A Portable Data Plane is one that is portable across different deployment options—suitable for Dedicated & Serverless, BYOC, and Private deployments. To make a Data Plane portable, the entire data stack running in the data plane must have no external dependencies. The unavailability of any other data plane or the centralized control plane should not affect the applications interacting with the services in the data plane.",[48,26819,26820],{},"The first step to ensure a data plane is portable is adopting the concept of immutable infrastructure, aligned with cloud-native computing philosophies. This means the entire data plane can be described as code or configuration (e.g., Terraform states, Kubernetes manifests). This code and configuration can be used to rebuild the data plane, enabling quick migration from one location to another—from one public cloud to another, or between public and private clouds.",[48,26822,26823],{},"With a portable data plane, the control plane can become lightweight and avoid being a bottleneck or a single point of failure, serving primarily as the orchestration layer.",[48,26825,26826],{},[384,26827],{"alt":18,"src":26828},"\u002Fimgs\u002Fblogs\u002F6712857a41cd68ec71b2324c_AD_4nXc5tDFMrHZDkEZq3hI9fGoPmU6pR-EDRjnJZeZce_ZgPUDFvyEvvsSBF_ytClSUQaHSNuc1cLMYb4rgQWaFchoyF9qtmzLa7e3v4bwQuOeHSPCrmQdrR6te7jZEq6a4N1qd8iqLFIWIOGIvFKwCk_3yJOU.png",[48,26830,3931],{},[40,26832,26834],{"id":26833},"build-a-simplified-portable-cloud-on-kubernetes","Build A Simplified Portable Cloud on Kubernetes",[48,26836,26837,26838,26841],{},"We have designed our Portable Data Plane in layers and embraced open standards as much as possible. Our Portable Data Plane is built around Kubernetes, the cornerstone of cloud-native computing. We divided resource provisioning into two layers: Infrastructure and Workloads. The Infrastructure layer provisions workload pools, which are collections of Kubernetes clusters used for deploying workloads. A workload can be a Pulsar cluster, a ",[55,26839,11512],{"href":26840},"\u002Funiconn"," connector, a Pulsar function, or a Flink job. The infrastructure layer is managed with Terraform. Within the workload pools, all workloads are defined as Custom Resources and managed by Kubernetes operators.",[48,26843,3931],{},[48,26845,26846],{},"In this model, the distinction between different deployment options lies in infrastructure provisioning and management. In the Dedicated\u002FServerless deployment model, we own, provision, and manage the infrastructure. In the BYOC deployment model, customers own the resources, and we provision and manage them on their behalf. In a Private deployment model, customers own, provision, and manage the infrastructure.",[48,26848,26849],{},[384,26850],{"alt":18,"src":26851},"\u002Fimgs\u002Fblogs\u002F6712857a790e0aad20781930_AD_4nXd45u2Y0azRQNmQL_Np_Gu7SW1zPvZ20ycAyByW-pgv5Ex8M4opuZ3mhQtE4_HzurbQaf2ZF0Xbk_o3IKxo7yewv6Rndoz7RqjOPTN8PvUMOTMHIS0XltzkZc18EEL3UcbDkzSLFg_nrHy_Y8KQMrBTsj3R.png",[48,26853,3931],{},[48,26855,26856],{},"Once infrastructure is provisioned, all workloads are provisioned and managed by Kubernetes operators, making everything uniform.",[48,26858,26859],{},"The security model for BYOC becomes straightforward. 
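Before getting into that security model, the toy sketch below illustrates the declarative reconcile pattern just described, in which operators continually converge the data plane toward the declared manifests. All types here are hypothetical and only show the control flow; a real operator would watch Custom Resources through the Kubernetes API.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ReconcileLoopSketch {

    record WorkloadSpec(String name, int replicas) {}

    // Desired state: what the control plane (or a human) declared.
    static final Map<String, WorkloadSpec> desired = new ConcurrentHashMap<>();
    // Actual state: what is currently running in the data plane.
    static final Map<String, WorkloadSpec> actual = new ConcurrentHashMap<>();

    static void reconcile() {
        // Create or update anything that differs from the declared manifests.
        desired.forEach((name, spec) -> {
            if (!spec.equals(actual.get(name))) {
                System.out.println("Applying " + spec);
                actual.put(name, spec);
            }
        });
        // Remove workloads that are no longer declared.
        actual.keySet().removeIf(name -> {
            boolean orphaned = !desired.containsKey(name);
            if (orphaned) System.out.println("Deleting " + name);
            return orphaned;
        });
    }

    public static void main(String[] args) {
        desired.put("pulsar-cluster-a", new WorkloadSpec("pulsar-cluster-a", 3));
        reconcile(); // converges the data plane toward the declared state
        desired.put("pulsar-cluster-a", new WorkloadSpec("pulsar-cluster-a", 5));
        reconcile(); // scaling is just another diff between desired and actual
    }
}
```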
We can use cross-account IAM permissions for provisioning infrastructure or delegate infrastructure provisioning to the customer using our Terraform scripts. Once infrastructure is provisioned, our control plane only requires management permissions to access the Kubernetes clusters and provision and manage workloads. The control plane's role is to propagate the manifests describing workload changes. The operators running in the data plane reconcile the state to apply these changes.",[48,26861,26862],{},"This model is simple, powerful, and secure. No closed-source agents need to be installed—everything is executed via Kubernetes API and standards, ensuring an open approach. Changes are auditable through Kubernetes tools, allowing for self-auditing. Unlike an agent-based approach, you do not have to blindly trust the vendor's promises.",[48,26864,26865],{},"If desired, you can disconnect the entire data plane by revoking our control plane's permissions to access your Kubernetes cluster, halting the propagation of changes to the clusters.",[40,26867,26869],{"id":26868},"byoc-a-new-approach-towards-data-sovereignty","BYOC²: A New Approach Towards Data Sovereignty",[48,26871,26872],{},"Your data remains in your bucket, and you bring your own engine to process it—all within your cloud account with a fully managed experience.",[48,26874,26875],{},"Fully managed SaaS services provide excellent building blocks to develop real-time AI applications faster and bring ideas to fruition more rapidly. However, with the increasing complexity of regulations and geopolitical situations, sovereignty has become a major concern for many enterprises. While startups can quickly bring ideas to reality with fully managed SaaS services, they often relinquish ownership of their data and, consequently, access to it. In many cases, data becomes confined by SaaS providers. A simple query can incur significant costs, and access to the data is restricted to a specific API.",[48,26877,26878],{},"With BYO-Cloud, you retain ownership of the data in your bucket. We are extending this concept by collaborating with partners in the ecosystem to bring Compute Engine to the data. We believe BYOC² represents a new approach to data sovereignty throughout the entire data lifecycle, offering a more affordable solution.",[48,26880,26881],{},"We are not alone in this vision. The partnership between StreamNative and Ververica exemplifies the BYOC² model, and we are excited to collaborate with other stream processing engine vendors to provide the best tools to our customers. The future looks promising with a straightforward, portable cloud under the BYOC² framework.",[48,26883,26884,26885,26889],{},"We look forward to discussing these topics further at the upcoming ",[55,26886,5376],{"href":26887,"rel":26888},"http:\u002F\u002Fdatastreaming-summit.org",[264],", scheduled for October 28-29, 2024, at the Grand Hyatt SFO. Join us for this engaging event focused on all aspects of data streaming.",{"title":18,"searchDepth":19,"depth":19,"links":26891},[26892,26893,26894,26895],{"id":26779,"depth":19,"text":26780},{"id":26804,"depth":19,"text":26805},{"id":26833,"depth":19,"text":26834},{"id":26868,"depth":19,"text":26869},"2024-10-18","Discover how StreamNative's BYOC² model, powered by the Ursa Engine, simplifies data streaming with a flexible, cost-effective, and scalable solution. 
Learn about our partnership with Ververica and how BYO-Compute and BYO-Cloud enhance data sovereignty and reduce costs.","\u002Fimgs\u002Fblogs\u002F6712cc8c4fea103b02df24f2_Screenshot-2024-10-18-at-12.31.27-PM.png",{},"\u002Fblog\u002Fbyoc2-portable-data-plane-and-the-vision-of-making-data-streaming-accessible-and-affordable",{"title":26751,"description":26897},"blog\u002Fbyoc2-portable-data-plane-and-the-vision-of-making-data-streaming-accessible-and-affordable",[10322,1332,5954],"jBA61n6m6HQMERguiUadW56Xvxrsrl2-sC4uP3fMXuc",{"id":26906,"title":26907,"authors":26908,"body":26909,"category":290,"createdAt":290,"date":27317,"description":27318,"extension":8,"featured":294,"image":27319,"isDraft":294,"link":290,"meta":27320,"navigation":7,"order":296,"path":27321,"readingTime":22989,"relatedResources":290,"seo":27322,"stem":27323,"tags":27324,"__hash__":27325},"blogs\u002Fblog\u002Fintroducing-managed-apache-flink-in-streamnative-cloud.md","Introducing Managed Flink In StreamNative Cloud",[311],{"type":15,"value":26910,"toc":27302},[26911,26914,26917,26920,26923,26925,26930,26932,26938,26946,26949,26953,26955,26960,26962,26965,26969,26972,26977,26980,26983,26987,26990,26994,26997,27014,27019,27022,27026,27029,27032,27037,27039,27042,27046,27050,27053,27064,27067,27071,27075,27078,27082,27090,27094,27105,27109,27120,27124,27127,27131,27134,27138,27149,27153,27164,27168,27179,27181,27185,27196,27200,27203,27207,27212,27216,27227,27231,27236,27238,27242,27245,27249,27260,27264,27269,27273,27278,27280,27284,27292,27295,27297,27300],[48,26912,26913],{},"As enterprises face increasing demands for real-time insights, they are under growing pressure to adopt stream processing solutions such as Apache Flink® to power real-time data and AI use cases.",[48,26915,26916],{},"In particular, enterprises leveraging a Bring Your Own Cloud (BYOC) model for stream processing encounter several challenges. These include network latency and performance bottlenecks when streaming and processing data across clouds or regions, as well as performance complex network configurations involving VPC peering, private endpoints, and secure connections to ensure seamless data flow. Additionally, data transfer and egress costs can rise significantly, especially when large volumes of streaming data are transmitted between clouds and processing environments.",[48,26918,26919],{},"Overcoming these challenges requires optimized connectivity, secure communication, and efficient cost management to maintain smooth, secure, and cost-effective data streaming and processing. One effective approach is to bring processing closer to data by deploying stream processing solutions within the same network, minimizing cross-cloud data movement and reducing latency and networking costs.",[48,26921,26922],{},"In this blog, you'll discover how two of the most powerful engines in the streaming ecosystem—StreamNative's URSA for Data Streaming and Ververica's VERA for Stream Processing—are coming together to deliver an integrated, out-of-the-box solution for building stream processing applications. As part of this offering, StreamNative is bringing processing closer to data by deploying Ververica's VERA engine alongside URSA within the same Virtual Private Cloud (VPC) in StreamNative Cloud, delivering a fully managed Flink service. This strategic integration minimizes data movement across networks, reduces latency, and optimizes costs, providing a seamless experience for real-time data ingestion and processing. 
This combination is set to transform how enterprises manage real-time data, empowering them with efficient and scalable streaming capabilities.",[48,26924,3931],{},[48,26926,26927],{},[384,26928],{"alt":18,"src":26929},"\u002Fimgs\u002Fblogs\u002F670d4de0dbcf18dcd84d4d6c_AD_4nXcfmrosNVRNTsdLic_H_XDu1s38KQJ3sv9POoE7svylIPCEraT8Bmwksv7jH7sY011m3ls-n0Fp72FWMQkUmeRdmqPMGYMvOkd7eTgDhU-i6mcokydNFnEAarM6A_BodBPtKTHZa2uRvjokxrUfoxhZf8CO.png",[48,26931,3931],{},[48,26933,26934,26937],{},[55,26935,26936],{"href":24893},"StreamNative's URSA engine"," is a truly cloud-native data streaming platform, designed to be much easier to run while reducing TCO by 60% and increasing throughput by 2.5 times. Ursa empowers organizations to effortlessly manage high-throughput, low-latency data streams across a wide range of use cases on Apache Kafka and Apache Pulsar, delivering exceptional performance and cost efficiency.",[48,26939,26940,26945],{},[55,26941,26944],{"href":26942,"rel":26943},"https:\u002F\u002Fdocs.ververica.com\u002Fvvc\u002Fabout-ververica-cloud\u002Fvera\u002F",[264],"Ververica's VERA engine"," offers up to 2x the performance of Apache Flink through advanced features like SQL optimization, GeminiStateBackend, and Autopilot tuning. It also includes key capabilities such as Advanced CDC, Dynamic CEP, built-in Flink ML, and Apache Paimon connectors for enhanced stream processing.",[48,26947,26948],{},"When VERA is paired with URSA, users get a native, integrated solution that takes care of both ends of the real-time data pipeline—ingesting and processing data in real time with minimal effort and maximum efficiency.",[40,26950,26952],{"id":26951},"introducing-streamnatives-fully-managed-service-for-apache-flink","Introducing StreamNative’s Fully Managed Service for Apache Flink",[48,26954,3931],{},[48,26956,26957],{},[384,26958],{"alt":18,"src":26959},"\u002Fimgs\u002Fblogs\u002F670d4de15b7527bff8fb32f5_AD_4nXc_W1LUPhlTpjbdKkk9xPEKk_9_lAFHbqMXHkvUQIdRgFNR4rC8FwIzK_PCjdRZ-S3ScWcyQinA6szXFG8i3xOXVaoTIiVCYHRNan4aFhV6d_QN7ftfGNd-pScYqdpbsQWO_UQTv7WIn3L7t2O15qJ5Lw4.png",[48,26961,3931],{},[48,26963,26964],{},"An outcome of this collaboration is StreamNative's Fully Managed Service for Apache Flink, powered by Ververica's VERA engine. This managed service is tailored for enterprises transitioning from batch to real-time stream processing. With this service, businesses can focus on building stream processing applications without worrying about infrastructure management or operational overhead. Enterprises can now build, deploy, and scale stream processing applications quickly, benefiting from the combined expertise of both StreamNative and Veverica in the streaming data space.",[40,26966,26968],{"id":26967},"flink-capabilities-in-streamnative-cloud","Flink Capabilities in StreamNative Cloud",[48,26970,26971],{},"StreamNative Cloud empowers users to create and deploy stream processing applications as Flink Deployments within a StreamNative Workspace. While Java-based deployments are currently supported, future updates will enable users to leverage Python and Flink SQL to interact with backend systems seamlessly.",[48,26973,26974],{},[384,26975],{"alt":5878,"src":26976},"\u002Fimgs\u002Fblogs\u002F67106471b2a055e9f0831c5d_6710646dbeb230632314af72_StreamNativeFlinkWorkspace.png",[48,26978,26979],{},"A StreamNative Workspace serves as a unified environment for organizing and managing Flink, Kafka, and Pulsar compute resources, facilitating the efficient execution of stream processing applications. 
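For orientation, a Java-based Flink Deployment of the kind described here typically packages a job like the generic sketch below. The bootstrap servers, topic, and group id are placeholders, and the sketch uses only standard Flink and Kafka connector APIs rather than anything StreamNative-specific.

```java
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.connector.kafka.source.KafkaSource;
import org.apache.flink.connector.kafka.source.enumerator.initializer.OffsetsInitializer;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class ClickstreamJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // In a managed deployment these would point at the Kafka-compatible endpoint
        // of a cluster in the same workspace/VPC.
        KafkaSource<String> source = KafkaSource.<String>builder()
                .setBootstrapServers("BOOTSTRAP_SERVERS")
                .setTopics("user-actions")
                .setGroupId("clickstream-analytics")
                .setStartingOffsets(OffsetsInitializer.earliest())
                .setValueOnlyDeserializer(new SimpleStringSchema())
                .build();

        DataStream<String> events =
                env.fromSource(source, WatermarkStrategy.noWatermarks(), "user-actions");

        // Trivial transformation standing in for real business logic.
        events.map(String::toUpperCase).print();

        env.execute("clickstream-job");
    }
}
```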
Users can configure Flink deployments to access and process data from topics across one or multiple Kafka\u002FPulsar clusters within the workspace.",[48,26981,26982],{},"Users have the ability to perform full lifecycle management (create, update, and delete) of their Flink deployments, ensuring operational flexibility and control.",[40,26984,26986],{"id":26985},"stateful-flink-deployments","Stateful Flink Deployments",[48,26988,26989],{},"StreamNative Cloud also supports stateful Flink deployments, utilizing cloud-based storage for state management. Currently, Google Cloud Storage (GCS) is supported, with plans to expand support to AWS S3 and Azure Blob Storage in the future. Flink automatically manages the state configuration, eliminating the need for users to handle backend storage settings. This fully managed approach simplifies the deployment of stateful stream processing applications, ensuring reliable state management with minimal user involvement.",[40,26991,26993],{"id":26992},"value-proposition-streamnative-cloud-managed-flink","Value Proposition - StreamNative Cloud Managed Flink",[48,26995,26996],{},"The fully managed Apache Flink service within StreamNative Cloud delivers exceptional value, combining VERA-powered performance with the cost efficiencies of deploying Flink in your own VPC.",[321,26998,26999,27002,27005,27008,27011],{},[324,27000,27001],{},"VERA-Powered Performance: Enhanced processing speed and reliability, powered by Ververica's VERA engine.",[324,27003,27004],{},"BYOC Flexibility: Deploy Flink in your own VPC for increased security, optimized performance, and cost savings.",[324,27006,27007],{},"Unified Streaming Integration: Seamlessly integrate Flink with Pulsar and Kafka for a comprehensive streaming solution.",[324,27009,27010],{},"Connector Flexibility: Utilize Kafka Connect or Pulsar IO connectors to expand integration capabilities.",[324,27012,27013],{},"Lakehouse Compatibility: Support for integrating Flink workloads with Lakehouse architecture, driving advanced analytics.",[48,27015,27016],{},[384,27017],{"alt":18,"src":27018},"\u002Fimgs\u002Fblogs\u002F670d4de0b796b0cb1fd5a701_AD_4nXekh1rHGd1ZAc8Sqi9WGMJXGSzFlTwSx2kRcYJyZUNDxc-7de0BFN2iPsGNWxtYRQx9QZ-p9IUxS85ebVHDUe78elZQhEMrIXaih0sb6ZjGhShiqYKAehILpVk_G2efCId65l6nmZN9jpaYkUXJ695-ES4a.png",[48,27020,27021],{},"Among the various differentiators outlined, I would like to highlight the BYOC flexibility related to networking configuration. When enterprises integrate a third-party Flink service with StreamNative Cloud, they often encounter complex networking setups and incur additional egress costs as data transfers between Flink and StreamNative clusters. However, with StreamNative’s native Flink service, users benefit from out-of-the-box Flink functionality within the same VPC as the StreamNative Cloud cluster. This eliminates the need for complex networking configurations and ensures that no egress costs are incurred when data flows between StreamNative clusters and Flink deployments, resulting in a seamless, cost-efficient deployment experience.",[40,27023,27025],{"id":27024},"use-case-deep-dive-e-commerce-real-time-analytics-using-streamnative-cloud","Use Case Deep Dive - E-Commerce Real-Time Analytics Using StreamNative Cloud",[48,27027,27028],{},"To gain a deeper understanding of the stream processing capabilities in StreamNative Cloud through the managed Apache Flink service, let's explore a specific use case. 
This will demonstrate how developers can leverage the combined power of URSA and VERA to deploy a data platform that is both faster and more cost-efficient than a self-managed solution built on Kafka and Flink.",[48,27030,27031],{},"In this scenario, we’ll explore how an e-commerce platform leverages StreamNative’s URSA (for Kafka and Pulsar-based data streaming) and StreamNative’s fully managed Flink service (powered by Ververica’s VERA) to build a comprehensive real-time analytics solution. We’ll highlight where each solution fits into the architecture and the unique value propositions they offer.",[48,27033,27034],{},[384,27035],{"alt":18,"src":27036},"\u002Fimgs\u002Fblogs\u002F670d4de1b796b0cb1fd5a712_AD_4nXdDqxauCj6TmLFbta0ktS-NE-oYiWHoXYeOeAyDI0i60WosQlTAVL57J6b_f8qIIQCHBmWPzHQJ-fFn7ISuMXTqMcjdRJj7Q7n_87uy76dpQjCtecu2Ffzr0m5PoPg-AftUsbWsn-8Jyo7CLIr8qcalXh1F.png",[48,27038,3931],{},[48,27040,27041],{},"Lets go left to right, as shown in the figure above.",[32,27043,27045],{"id":27044},"_1-data-ingestion-from-e-commerce-platform","1. Data Ingestion from E-Commerce Platform",[3933,27047,27049],{"id":27048},"source","Source",[48,27051,27052],{},"The e-commerce platform generates a variety of real-time events:",[321,27054,27055,27058,27061],{},[324,27056,27057],{},"User actions: Product views, clicks, and searches.",[324,27059,27060],{},"Transactions: Purchases, cart additions, and payment events.",[324,27062,27063],{},"Inventory updates: Stock changes due to orders, restocking, or promotions.",[48,27065,27066],{},"These events must be captured and processed in real time to power analytics, product recommendations, and operational alerts.",[32,27068,27070],{"id":27069},"_2-ingesting-data-into-ursa-kafkapulsar-cluster-via-kafka-connect-or-pulsar-io","2. Ingesting Data into URSA (Kafka\u002FPulsar Cluster) via Kafka Connect or Pulsar IO",[3933,27072,27074],{"id":27073},"streamnatives-ursa","StreamNative’s URSA:",[48,27076,27077],{},"StreamNative’s URSA powers the unified Kafka\u002FPulsar cluster, serving as the backbone for real-time messaging and data streaming. 
URSA supports both Kafka and Pulsar protocols, providing flexibility and high-throughput messaging.",[3933,27079,27081],{"id":27080},"data-ingestion-with-connectors","Data Ingestion with Connectors:",[321,27083,27084,27087],{},[324,27085,27086],{},"Kafka Connect JDBC Connector: This connector pulls transactional data (e.g., purchases) from the e-commerce platform’s relational database and streams it into URSA’s Kafka-compatible topics (e.g., the transactions topic); a registration sketch for this kind of connector appears after this section.",[324,27088,27089],{},"Pulsar IO Debezium Connector: This Pulsar IO connector captures change-data-capture (CDC) events from the inventory database, streaming real-time stock updates into Pulsar topics (e.g., inventory-updates).",[3933,27091,27093],{"id":27092},"data-flow-into-ursa","Data Flow into URSA:",[321,27095,27096,27099,27102],{},[324,27097,27098],{},"Kafka-compatible transactions topic: Ingests purchase and payment data.",[324,27100,27101],{},"Pulsar-based inventory-updates topic: Streams real-time changes in stock levels.",[324,27103,27104],{},"User-actions topic: Ingests user behavior events, such as product clicks and views.",[3933,27106,27108],{"id":27107},"value-of-ursa","Value of URSA:",[321,27110,27111,27114,27117],{},[324,27112,27113],{},"Unified Streaming Platform: URSA supports both Kafka and Pulsar protocols, simplifying the data ingestion layer and ensuring seamless integration with external data sources.",[324,27115,27116],{},"High-Throughput Messaging: URSA enables scalable, low-latency messaging, capable of handling millions of events per second with cost savings.",[324,27118,27119],{},"Flexible Data Ingestion: URSA allows connectors like Kafka Connect and Pulsar IO to ingest data from multiple sources, reducing the complexity of custom pipelines.",
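As a concrete illustration of the connector-based ingestion described above, the sketch below registers a JDBC source connector for the transactions table using the standard Kafka Connect REST API. The endpoint, database URL, and table are placeholders, and the connector class and property names follow the widely used Confluent JDBC source connector; in StreamNative Cloud the same configuration would normally be supplied through the Console or its connector APIs rather than a raw REST call.

```java
// Hedged sketch: registering a JDBC source connector via the Kafka Connect REST API.
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class RegisterTransactionsConnector {
    public static void main(String[] args) throws Exception {
        String connectorJson = """
            {
              "name": "transactions-jdbc-source",
              "config": {
                "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
                "connection.url": "jdbc:postgresql://orders-db:5432/shop",
                "mode": "incrementing",
                "incrementing.column.name": "id",
                "table.whitelist": "transactions",
                "topic.prefix": "",
                "tasks.max": "1"
              }
            }
            """;

        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://connect.example.internal:8083/connectors")) // placeholder endpoint
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(connectorJson))
                .build();

        // Topics are named <topic.prefix><table>, so records land in the "transactions" topic here.
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}
```

A Pulsar IO Debezium source would be configured analogously for the inventory database, with the CDC stream landing in the inventory-updates topic.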
[32,27121,27123],{"id":27122},"_3-processing-in-fully-managed-flink-service-in-streamnative-cloud-powered-by-ververicas-vera","3. Processing in Fully Managed Flink Service in StreamNative Cloud (Powered by Ververica’s VERA)",[48,27125,27126],{},"After data is ingested into URSA’s Kafka\u002FPulsar cluster, the next step is real-time processing, handled by the fully managed Flink service in StreamNative Cloud (powered by Ververica’s VERA engine). This service delivers advanced stream processing capabilities, ideal for large-scale real-time data analytics.",[3933,27128,27130],{"id":27129},"fully-managed-flink-in-streamnative-cloud-powered-by-vera","Fully Managed Flink in StreamNative Cloud (Powered by VERA):",[48,27132,27133],{},"The fully managed Flink service within StreamNative Cloud integrates directly with URSA to provide stateful stream processing and complex event processing, all powered by VERA.",[3933,27135,27137],{"id":27136},"flink-job-setup","Flink Job Setup",[321,27139,27140,27143,27146],{},[324,27141,27142],{},"Flink Sources: The Flink service subscribes to the Kafka\u002FPulsar topics in URSA. User-actions stream: Tracks user behavior in real time.",[324,27144,27145],{},"Transactions stream: Monitors sales and payment events.",[324,27147,27148],{},"Inventory-updates stream: Manages real-time stock tracking.",[3933,27150,27152],{"id":27151},"flink-processing-powered-by-vera","Flink Processing (Powered by VERA):",[321,27154,27155,27158,27161],{},[324,27156,27157],{},"User Behavior Aggregation: Flink (powered by VERA) processes user behavior data in windowed operations, generating insights like \"Top 10 most viewed products\" or identifying users with high purchase intent in real time (a minimal job sketch follows after this section).",[324,27159,27160],{},"Real-Time Sales Monitoring: VERA processes the transactions stream to deliver real-time revenue insights, track top-selling products, and calculate sales velocity.",[324,27162,27163],{},"Inventory Analytics: For inventory management, VERA processes the inventory-updates stream to monitor stock levels and trigger alerts when stock falls below critical thresholds.",[3933,27165,27167],{"id":27166},"key-flink-features-in-vera","Key Flink Features in VERA:",[321,27169,27170,27173,27176],{},[324,27171,27172],{},"Stateful Processing: VERA handles stateful computations, such as maintaining running totals for user sessions or sales, which are critical for real-time analytics in e-commerce.",[324,27174,27175],{},"Event Time Processing: VERA enables accurate processing of out-of-order events using event-time semantics and watermarks, ensuring real-time insights are timely and accurate.",[324,27177,27178],{},"Exactly-Once Semantics: VERA guarantees exactly-once processing, ensuring data integrity, especially for financial transactions and inventory updates.",[48,27180,3931],{},[3933,27182,27184],{"id":27183},"value-of-the-managed-flink-service-powered-by-vera","Value of the Managed Flink Service (Powered by VERA):",[321,27186,27187,27190,27193],{},[324,27188,27189],{},"Advanced Stream Processing: The fully managed service provides real-time stream processing for critical e-commerce workflows, powered by VERA’s high-performance, stateful stream processing engine.",[324,27191,27192],{},"Enterprise-Grade Reliability: With VERA, the Flink service in StreamNative Cloud ensures fault tolerance, scalability, and continuous availability for mission-critical applications.",[324,27194,27195],{},"Unified Batch and Stream: VERA can handle both real-time streaming and historical batch data, simplifying the data pipeline and allowing continuous insights.",
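Referenced in the User Behavior Aggregation bullet above, the following sketch shows the general shape of such a job: an event-time, windowed count of product views with exactly-once checkpointing. The topic name, endpoint, and the assumed "productId,action" record format are illustrative only, not the platform's actual schema.

```java
// Hedged sketch: event-time windowed product-view counts with exactly-once checkpoints.
import java.time.Duration;

import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.connector.kafka.source.KafkaSource;
import org.apache.flink.connector.kafka.source.enumerator.initializer.OffsetsInitializer;
import org.apache.flink.streaming.api.CheckpointingMode;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;

public class ProductViewCounts {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // Exactly-once checkpoints; the managed service supplies the state backend (e.g., GCS).
        env.enableCheckpointing(60_000, CheckpointingMode.EXACTLY_ONCE);

        KafkaSource<String> userActions = KafkaSource.<String>builder()
                .setBootstrapServers("<your-workspace-kafka-endpoint>:9093") // placeholder
                .setTopics("user-actions")                                   // placeholder topic
                .setGroupId("product-view-counter")
                .setStartingOffsets(OffsetsInitializer.earliest())
                .setValueOnlyDeserializer(new SimpleStringSchema())
                .build();

        env.fromSource(userActions,
                        // Tolerate events arriving up to 5 seconds out of order (event time + watermarks).
                        WatermarkStrategy.forBoundedOutOfOrderness(Duration.ofSeconds(5)),
                        "user-actions")
           .filter(line -> line.endsWith(",view"))               // assumed "productId,action" records
           .map(line -> Tuple2.of(line.split(",")[0], 1L))
           .returns(Types.TUPLE(Types.STRING, Types.LONG))
           .keyBy(viewCount -> viewCount.f0)                     // stateful, keyed by product
           .window(TumblingEventTimeWindows.of(Time.minutes(1))) // one-minute tumbling event-time windows
           .sum(1)                                               // views per product per window
           .print();                                             // replace with an OLAP, Kafka, or Iceberg sink

        env.execute("product-view-counts");
    }
}
```

The same pattern extends to the sales and inventory streams; swapping print() for a real sink leads into the sink step described next.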
[32,27197,27199],{"id":27198},"_4-sinking-data-into-destination-with-streamnative-clouds-lakehouse-storage-support-and-connectors","4. Sinking Data into Destinations with StreamNative Cloud’s Lakehouse Storage Support and Connectors",[48,27201,27202],{},"Once VERA has processed and enriched the data, it is stored in destinations that support both real-time insights and long-term analysis.",[3933,27204,27206],{"id":27205},"real-time-analytics-database-olap","Real-Time Analytics Database (OLAP):",[321,27208,27209],{},[324,27210,27211],{},"Aggregated metrics (e.g., top-selling products and revenue trends) are written to a real-time OLAP database (e.g., ClickHouse or Druid) for instant querying and business dashboards.",[3933,27213,27215],{"id":27214},"streamnative-clouds-lakehouse-storage-with-apache-iceberg","StreamNative Cloud’s Lakehouse Storage with Apache Iceberg:",[321,27217,27218,27221,27224],{},[324,27219,27220],{},"Lakehouse Offloading: StreamNative’s Lakehouse Storage support allows VERA to offload processed data to a Lakehouse architecture in Apache Iceberg format, which enables the following. Unified Real-Time and Historical Data: Apache Iceberg integrates real-time and batch data for a comprehensive view.",[324,27222,27223],{},"Optimized Data Storage: Iceberg provides efficient data storage with partitioning, snapshotting, and schema evolution.",[324,27225,27226],{},"Data Retention for Advanced Analytics: Offloaded data can be queried by engines like Trino or Apache Spark for deeper insights, machine learning, or long-term trend analysis.",[3933,27228,27230],{"id":27229},"alerts-and-notifications","Alerts and Notifications:",[321,27232,27233],{},[324,27234,27235],{},"Operational Alerts: Flink (VERA) generates real-time alerts (e.g., low stock or transaction spikes) and sends them back to URSA’s Kafka\u002FPulsar topics for immediate consumption by alert systems (e.g., SMS, email).",[48,27237,3931],{},[32,27239,27241],{"id":27240},"_5-real-time-analytics-consumed-by-end-users","5. Real-Time Analytics Consumed by End Users",[48,27243,27244],{},"Once the data is processed and stored, it is ready to be consumed by various stakeholders for real-time insights.",[3933,27246,27248],{"id":27247},"dashboards-and-visualizations","Dashboards and Visualizations:",[321,27250,27251,27254,27257],{},[324,27252,27253],{},"Real-Time Dashboards: Business teams can access interactive dashboards powered by Superset or Grafana, which visualize key metrics such as: Most viewed products and top user actions.",[324,27255,27256],{},"Real-time sales and revenue trends.",[324,27258,27259],{},"Inventory levels and low-stock alerts.",[3933,27261,27263],{"id":27262},"personalized-customer-experiences","Personalized Customer Experiences:",[321,27265,27266],{},[324,27267,27268],{},"Dynamic Web Pages: VERA's real-time analytics can be used to update the website dynamically, offering personalized product recommendations and promotions based on real-time user behavior.",[3933,27270,27272],{"id":27271},"operational-alerts","Operational Alerts:",[321,27274,27275],{},[324,27276,27277],{},"Efficient Operations: Alerts generated by VERA ensure that teams are notified immediately about critical events, such as inventory shortages or sales surges, allowing for swift operational responses.",[48,27279,3931],{},[40,27281,27283],{"id":27282},"summary-of-ursa-and-vera-in-the-e-commerce-use-case","Summary of URSA and VERA in the E-Commerce Use Case:",[321,27285,27286,27289],{},[324,27287,27288],{},"URSA (Kafka\u002FPulsar): URSA powers the real-time data ingestion and messaging, supporting both Kafka and Pulsar protocols for scalable and flexible data flow.
It handles high-throughput streams from user actions, transactions, and inventory updates.",[324,27290,27291],{},"VERA:VERA powers the fully managed Flink service within StreamNative Cloud, delivering stateful and advanced stream processing capabilities. It ensures real-time insights, exactly-once processing, and seamless integration with both streaming and batch use cases.",[48,27293,27294],{},"Together, StreamNative’s URSA and Ververica’s VERA offer a powerful, end-to-end solution for processing high-volume data streams and delivering real-time analytics. This ensures the e-commerce platform can efficiently scale and respond to evolving business needs.",[40,27296,2125],{"id":2122},[48,27298,27299],{},"StreamNative’s integration of URSA for Data Streaming and VERA for Stream Processing offers enterprises a powerful, fully managed stream processing solution within StreamNative Cloud. By bringing processing closer to data—deploying both engines within the same VPC—this solution minimizes data movement, reduces latency, and optimizes costs. The managed Flink service, powered by Ververica’s VERA, enables organizations to transition seamlessly from batch to real-time stream processing, unlocking advanced analytics, enhanced performance, and operational efficiency. With unified support for Kafka and Pulsar, as well as Lakehouse integration, StreamNative Cloud provides enterprises with the tools needed to build scalable, real-time applications that drive business value.",[48,27301,3931],{},{"title":18,"searchDepth":19,"depth":19,"links":27303},[27304,27305,27306,27307,27308,27315,27316],{"id":26951,"depth":19,"text":26952},{"id":26967,"depth":19,"text":26968},{"id":26985,"depth":19,"text":26986},{"id":26992,"depth":19,"text":26993},{"id":27024,"depth":19,"text":27025,"children":27309},[27310,27311,27312,27313,27314],{"id":27044,"depth":279,"text":27045},{"id":27069,"depth":279,"text":27070},{"id":27122,"depth":279,"text":27123},{"id":27198,"depth":279,"text":27199},{"id":27240,"depth":279,"text":27241},{"id":27282,"depth":19,"text":27283},{"id":2122,"depth":19,"text":2125},"2024-10-14","Read this blog post to learn more about how StreamNative is providing full end to end services for data streaming powered by Ververica","\u002Fimgs\u002Fblogs\u002F670d8a114dda5eb6ac94c659_Managed-Apache-Flink_BlogPost-1.png",{},"\u002Fblog\u002Fintroducing-managed-apache-flink-in-streamnative-cloud",{"title":26907,"description":27318},"blog\u002Fintroducing-managed-apache-flink-in-streamnative-cloud",[3550,8057,10322,303],"4cxHbRafUcXYDRfFHJ7x3SdjuNCXm7IQakXOQ1cpG9k",{"id":27327,"title":27328,"authors":27329,"body":27330,"category":290,"createdAt":290,"date":27475,"description":27476,"extension":8,"featured":294,"image":27477,"isDraft":294,"link":290,"meta":27478,"navigation":7,"order":296,"path":27479,"readingTime":290,"relatedResources":290,"seo":27480,"stem":27481,"tags":27482,"__hash__":27483},"blogs\u002Fblog\u002Fdata-streaming-summit-2024-the-future-of-streaming-and-ai.md","Data Streaming Summit 2024: The Future of Streaming and AI",[22998],{"type":15,"value":27331,"toc":27468},[27332,27339,27343,27346,27357,27360,27364,27367,27372,27386,27389,27393,27402,27416,27420,27423,27428,27431,27445,27448,27452,27455,27458,27466],[48,27333,27334,27335,27338],{},"In just 20 days, the ",[55,27336,24208],{"href":5372,"rel":27337},[264]," will kick off, and we couldn’t be more excited! 
This event marks a significant evolution in our journey, as we transition from the Pulsar Summit into something bigger—Data Streaming Summit. Why the change? Because the challenges and innovations in the data streaming world go beyond any single technology. Whether it’s Apache Pulsar, Kafka, or other messaging or streaming technologies, the ecosystem is expanding, and our summit is designed to reflect that growth.",[40,27340,27342],{"id":27341},"what-we-heard-from-you","What We Heard from You",[48,27344,27345],{},"Earlier this year, we had insightful conversations with customers and community members, where they shared their experiences with data streaming workloads. At StreamNative, while many of our customers use Apache Pulsar, we know that Kafka clusters and other messaging or streaming technologies workloads also play a major role in their ecosystems. Despite using different technologies, the pain points were strikingly similar across the board.",[1666,27347,27348,27351,27354],{},[324,27349,27350],{},"Scaling data streaming workloads: One of the biggest challenges is scaling data streaming environments, particularly with Kafka, where repartitioning requires a massive effort. For traditional message queue users, many of them struggle with the lack of scalable solutions.",[324,27352,27353],{},"Integration with batch processing platforms: There’s a growing demand to seamlessly integrate data streaming with batch processing tools like S3, Databricks, Snowflake, and BigQuery.",[324,27355,27356],{},"Generative AI: Across the board, everyone is looking to harness the power of generative AI to make their systems smarter and more efficient.",[48,27358,27359],{},"These discussions highlighted the need for a broader conversation within the data streaming community, which is why we’ve expanded the summit’s scope this year.",[40,27361,27363],{"id":27362},"announcing-an-exciting-lineup-of-speakers","Announcing an Exciting Lineup of Speakers",[48,27365,27366],{},"We are thrilled to bring together industry leaders and experts who are shaping the future of data streaming.",[48,27368,27369],{},[384,27370],{"alt":18,"src":27371},"\u002Fimgs\u002Fblogs\u002F670708a531b476d93b19db78_AD_4nXfpdC8DRTBW-zk2RqLpeW2p8ovYyzO44mPTtwtL_ulW2EN2-hoN7mnM9jFEhxM3vBEycxJRjtIXu3t51CMoj6h-E5nhJG3Km7dHyfSMH30odToaK5edsWscpHZ4HN5H-vpK5sC-r_yVCIKXOWaosXULxqiX.png",[321,27373,27374,27377,27380,27383],{},[324,27375,27376],{},"Hugo Smitter, Principal Platform Architect at FICO, will take the stage to talk about how FICO is transitioning its platform engineering team into API-first services. This shift is critical to powering all of FICO’s applications, ensuring scalability and future-proofing their infrastructure.",[324,27378,27379],{},"Vahid Hashemian, Staff Software Engineer at Pinterest, who is Apache Kafka Committer and PMC member will talk about innovation he led at Pinterest to created Tiered Storage for Kafka.",[324,27381,27382],{},"Hao Sun and Si Lao Software Engineer at Uber, and Matteo Merli, CTO at StreamNative will talk about Kafka and Pulsar replication in a back to back session.",[324,27384,27385],{},"Anup Ghatage and Venkateswara Rao Jujjuri at Salesforce will share their hands-on experience scaling BookKeeper from terabytes to petabytes.",[48,27387,27388],{},"You’ll also hear from Sijie Guo and Matteo Merli, the original creators of Apache Pulsar, who will give a sneak peek into the upcoming Pulsar 4.0 release as well as the StreamNative’s core engine URSA’s evolution. 
They’ll discuss how the data streaming landscape is evolving and what new features and capabilities are on the horizon.",[40,27390,27392],{"id":27391},"breakout-sessions-to-watch","Breakout Sessions to Watch",[48,27394,27395,27396,27401],{},"We’ve organized the ",[55,27397,27400],{"href":27398,"rel":27399},"https:\u002F\u002Fdatastreaming-summit.org\u002Fevent\u002Fnorth-america-2024\u002Fschedule",[264],"breakout sessions"," into three major tracks: Technology Deep Dive, Ecosystem and Use Cases, and AI. Here are some sessions you absolutely won’t want to miss:",[1666,27403,27404,27407,27410,27413],{},[324,27405,27406],{},"Powering Billion-Scale Vector Search at Milvus with Apache Pulsar by Zilliz: Discover how Apache Pulsar is enabling vector search at a massive scale for AI-driven workloads.",[324,27408,27409],{},"Scaling Kafka Replication at Uber’s Monumental Scale by Uber: Learn how Uber handles replication challenges as they scale Kafka to support their global operations.",[324,27411,27412],{},"Making Kafka Connectors Dance with Apache Pulsar by StreamNative and Google Cloud: This session will explore how Pulsar is enhancing Kafka connectors for seamless data flow and integration.",[324,27414,27415],{},"Pinterest Tiered Storage for Apache Kafka by Pinterest: Dive into how Pinterest has optimized their Kafka storage for efficiency and scale.",[40,27417,27419],{"id":27418},"rich-data-streaming-ecosystem-sponsors","Rich Data Streaming Ecosystem sponsors",[48,27421,27422],{},"We’re proud to announce that the Data Streaming Summit 2024 is supported by a diverse range of industry-leading sponsors. These companies are at the forefront of innovation in the data streaming, cloud infrastructure, and AI ecosystems, helping to drive the future of scalable, high-performance streaming technologies.",[48,27424,27425],{},[384,27426],{"alt":18,"src":27427},"\u002Fimgs\u002Fblogs\u002F670708a5b85473353526aa78_AD_4nXc5CupR4136O0ymOJzBEDhfqI90vZ8xTALJJWmcKXPk1wNjb5ZhdmbCPHNaIEnqK392q8yBggWUfLgFlcQOSnC2XpSe4PIkpAydlrMhsjl7S7lIqw1MIJxZ7NrHoa91cimGZQLvIHvBEgfPWvji9XSpqsfF.png",[48,27429,27430],{},"Our sponsors include:",[321,27432,27433,27436,27439,27442],{},[324,27434,27435],{},"Google Cloud: Bringing world-class cloud infrastructure and services to empower seamless data flow and AI-driven analytics.",[324,27437,27438],{},"Snowflake: A leader in cloud-based data warehousing, enabling secure and scalable data integration with streaming environments.",[324,27440,27441],{},"PingCAP: Innovators behind TiDB, a distributed SQL database, helping enterprises handle massive streaming and real-time analytics workloads.",[324,27443,27444],{},"Ververica: The creators of Apache Flink, providing powerful stream processing tools that scale across a wide range of use cases.",[48,27446,27447],{},"These partners are not just sponsors—they are collaborators in advancing the future of data streaming technologies. Their contributions ensure that the summit is a comprehensive, high-value experience for all attendees. Be sure to check out their booths and learn more about their innovations during the summit!",[40,27449,27451],{"id":27450},"a-new-era-for-data-streaming","A New Era for Data Streaming",[48,27453,27454],{},"With speakers from across the data streaming ecosystem, the Data Streaming Summit 2024 is the place to be if you want to stay ahead of the curve. This event is not just about Apache Pulsar; it’s about the entire data streaming landscape and how we can address common challenges—scaling, integration, and AI. 
We’re bringing together a community of developers, engineers, and data enthusiasts to share knowledge, experiences, and visions for the future.",[48,27456,27457],{},"If you’re working with data streaming technologies, this summit will offer invaluable insights into where the industry is heading, the tools that will lead the way, and how generative AI is shaping the future of streaming workloads.",[48,27459,27460,27461,27465],{},"So mark your calendars—October 28-29, 2024—and ",[55,27462,27464],{"href":26378,"rel":27463},[264],"get ready"," for a deep dive into the future of data streaming! We can’t wait to see you there.",[48,27467,3931],{},{"title":18,"searchDepth":19,"depth":19,"links":27469},[27470,27471,27472,27473,27474],{"id":27341,"depth":19,"text":27342},{"id":27362,"depth":19,"text":27363},{"id":27391,"depth":19,"text":27392},{"id":27418,"depth":19,"text":27419},{"id":27450,"depth":19,"text":27451},"2024-10-09","What is happening at Data Streaming Summit 2024? Read this blog post to learn more!","\u002Fimgs\u002Fblogs\u002F66df59d537f080a346f15d1e_Data-Streaming-Summit-Color-Scheme.png",{},"\u002Fblog\u002Fdata-streaming-summit-2024-the-future-of-streaming-and-ai",{"title":27328,"description":27476},"blog\u002Fdata-streaming-summit-2024-the-future-of-streaming-and-ai",[799,821,5376],"jIXkCwcXYf4p6wwmoQVq9dLOukGz4C1YYWhA3EdXzMI",{"id":27485,"title":27486,"authors":27487,"body":27488,"category":3550,"createdAt":290,"date":27654,"description":27655,"extension":8,"featured":294,"image":27656,"isDraft":294,"link":290,"meta":27657,"navigation":7,"order":296,"path":18898,"readingTime":17934,"relatedResources":290,"seo":27658,"stem":27659,"tags":27660,"__hash__":27661},"blogs\u002Fblog\u002Fintroducing-streaming-augmented-lakehouse-sal-for-the-data-foundation-of-real-time-gen-ai.md","Introducing Streaming-Augmented Lakehouse (SAL) for the Data Foundation of Real-Time Gen AI",[806],{"type":15,"value":27489,"toc":27646},[27490,27493,27496,27498,27502,27506,27509,27512,27516,27519,27522,27526,27529,27549,27553,27556,27560,27563,27566,27570,27578,27581,27585,27588,27591,27605,27609,27612,27615,27619,27636,27640,27643],[48,27491,27492],{},"Two weeks ago, I had the pleasure of attending Current 2024, a data-streaming conference hosted by Confluent. It was a valuable opportunity to connect with industry peers. While the event may not have featured product announcements from Confluent as notable as those at Kafka Summit London in March, it still offered insightful discussions and networking opportunities. The event focused heavily on the Shift-Left architecture, with Jay Kreps, CEO of Confluent, stating that a data streaming platform is the natural foundation for AI and calling out lakehouse as unsuitable.",[48,27494,27495],{},"While I agree that Shift-Left is an important data strategy for moving toward real-time streaming architectures and we respect Jay’s perspective on Lakehouses, we have a different viewpoint on it. There are additional factors to consider that provide a more comprehensive understanding of the situation. Data streaming is indeed critical for real-time generative AI, but the true data foundation for real-time Gen AI applications must combine data streaming and lakehouses. This synergy is what I call Streaming-Augmented Lakehouse (SAL), similar to Retrieval-Augmented Generation (RAG) in AI. 
Just as RAG enhances large language models (LLMs), SAL augments traditional lakehouses with real-time data streaming capabilities.",[48,27497,20198],{},[40,27499,27501],{"id":27500},"the-shift-left-principle","The Shift-Left Principle",[3933,27503,27505],{"id":27504},"_1-what-is-shift-left","1. What is Shift-Left?",[48,27507,27508],{},"Shift-Left is a modern data strategy that accelerates data processing by shifting from batch processing to real-time streaming architectures. Traditional data pipelines in lakehouses are batch-oriented, where data is ingested in bulk at scheduled intervals, transformed, and stored for downstream consumption. This approach works well for historical data analytics, machine learning model training, and big data processing, but it doesn’t meet the needs of applications requiring real-time insights or low-latency responses.",[48,27510,27511],{},"Shift-Left focuses on continuously processing data as it arrives, enabling businesses to build real-time data products such as recommendation engines, fraud detection systems, and dynamic AI-driven applications that respond to user interactions and changes in real-time.",[3933,27513,27515],{"id":27514},"_2-the-problem-with-batch-oriented-lakehouses-in-a-shift-left-world","2. The Problem with Batch-Oriented Lakehouses in a Shift-Left World",[48,27517,27518],{},"Lakehouses excel at managing large datasets and AI assets, with strengths in efficient querying, long-term storage, and batch processing. However, they struggle to meet the demands of real-time applications that require immediate processing and rapid decision-making. The key limitation is data latency (not query latency; in fact, many lakehouse systems excel at reduced query latency). Batch processing often occurs hours or days after data is generated, which is unacceptable for real-time Gen AI applications, where the freshness of data directly impacts decision-making quality. This becomes especially critical as Gen AI applications evolve from conversational chatbots to actionable agents.",[48,27520,27521],{},"This latency becomes a bottleneck in a Shift-Left architecture, and that’s where data streaming plays a crucial role. Lakehouses must adopt record-level ingestion to make them more suitable for real-time Gen AI applications.",[40,27523,27525],{"id":27524},"you-cant-only-shift-left","You Can’t Only Shift Left",[48,27527,27528],{},"While Shift-Left is undoubtedly a valuable strategy, it’s important to consider it as part of a broader data strategy rather than the only valid solution. A balanced approach that incorporates both Shift-Left strategy and Lakehouses together often yields the most comprehensive results. Here are some factors that prevent the full implementation of shift-left practices:",[1666,27530,27531,27534,27537],{},[324,27532,27533],{},"Cost Efficiency: Data streaming platforms are not as cost-effective as lakehouses for batch processing and managing historical data. Most of the world's data (90%) is processed in batches, and replacing batch workloads with streaming jobs doesn’t make economic sense. While systems like StreamNative’s Ursa Engine and Confluent’s Tableflow try to make streaming platforms cost-effective by persisting historical data in lakehouse formats, the core of this approach is still the lakehouse. 
Streaming serves as an augmentation layer rather than a replacement.",[324,27535,27536],{},"Multi-Hop Pipelines: Streaming platforms often require multi-stage pipelines, which can become complex, introducing challenges with intermediate result persistence, correctness, and robustness. The idea that lakehouses require multi-hops while streaming doesn’t is misleading—streaming jobs with too many stages can be more complex than chaining multiple batch jobs.",[324,27538,27539,27540,27545,27546,190],{},"Not All Jobs Require Low Latency: While technologies like Flink and RisingWave deliver low-latency computation, not all jobs require real-time processing. Systems like Spark are still better suited for many use cases. Especially with ",[55,27541,27544],{"href":27542,"rel":27543},"https:\u002F\u002Fwww.databricks.com\u002Fblog\u002F2022\u002F06\u002F28\u002Fproject-lightspeed-faster-and-simpler-stream-processing-with-apache-spark.html",[264],"Project Lightspeed",", Spark is now equipped for real-time data streaming as well. There is no one-size-fits-all solution, and businesses need to ",[55,27547,27548],{"href":18969},"strike a balance between cost and performance (latency)",[40,27550,27552],{"id":27551},"sal-augmenting-lakehouses-with-data-streaming","SAL: Augmenting Lakehouses with Data Streaming",[48,27554,27555],{},"Instead of solely focusing on Shift-Left, let’s introduce Streaming-Augmented Lakehouse (SAL). SAL combines the strengths of data lakehouses and real-time data streaming, addressing the shortcomings of traditional batch-oriented architectures and avoiding completely shifting-left towards a streaming-only architecture. It builds on the concept of Stream-Table Duality, treating a table as a stream and a stream as a table. At its core:",[3933,27557,27559],{"id":27558},"_1-data-lakehouses-the-foundation-for-managing-data-assets","1. Data Lakehouses: The Foundation for Managing Data Assets",[48,27561,27562],{},"Data lakehouses provide a strong foundation for managing both structured and unstructured data, offering both scalable storage and transactional capabilities. In Generative AI, lakehouses are crucial for storing and managing large datasets used for training, validation, and tuning models.",[48,27564,27565],{},"For example, training large-scale machine learning models like LLMs requires extensive datasets. Lakehouses ensure these datasets are organized, queryable, and accessible, while their transactional capabilities allow for accurate versioning and governance, ensuring that models can be retrained against historical data.",[3933,27567,27569],{"id":27568},"_2-data-streaming-the-real-time-layer-augmenting-lakehouses","2. Data Streaming: The Real-Time Layer Augmenting Lakehouses",[48,27571,27572,27573,27577],{},"While lakehouses manage historical data, data streaming provides the real-time layer needed to continuously feed up-to-the-second information into AI systems. 
As I discussed in my previous blog post, \"",[55,27574,27576],{"href":27575},"\u002Fblog\u002Fdata-streaming-for-generative-ai","Data Streaming for Generative AI","\", streaming enables AI models to adapt dynamically based on the latest inputs.",[48,27579,27580],{},"Streaming ensures that lakehouses are continuously updated with new data points, enabling real-time AI applications to respond to current events, user interactions, or sensor readings, making AI systems reactive and proactive.",[40,27582,27584],{"id":27583},"why-sal-is-different-from-lambda-architecture","Why SAL is Different from Lambda Architecture",[48,27586,27587],{},"You might wonder whether SAL resembles Lambda Architecture. While they share similarities, the core difference lies in how data is managed. Lambda Architecture uses two separate systems (streaming and batch), leading to challenges like data inconsistency, dual storage costs, and complex governance.",[48,27589,27590],{},"SAL, on the other hand, stores one copy of data, presenting it as either a stream or a table depending on the use case. It shifts ingestion & computation left to achieve low latency while ensuring data quality and governance. SAL emphasizes storing one copy of data and allowing it to be consumed via multiple modalities (stream or table), protocols (Kafka or Pulsar), and semantics (competing queues vs. sequential streams) tailored to specific business needs. The benefits of SAL include:",[1666,27592,27593,27596,27599,27602],{},[324,27594,27595],{},"Cost and Time Efficiency: You no longer need to move data between systems, saving both time and money. For example, users can access their data as tables and seamlessly integrate it with Athena, Databricks, Snowflake, or Redshift without relocating it.",[324,27597,27598],{},"Data Consistency: By eliminating the need for multiple copies of data, SAL reduces the occurrence of similar-yet-different datasets, leading to fewer data pipelines and simpler data management.",[324,27600,27601],{},"Bring Your Own Compute: With SAL, you can choose the best processing engine for each task—using Flink for one job and DuckDB for another. Since data is abstracted from the processing engines, you're not locked into any specific technology because of decisions made years ago.",[324,27603,27604],{},"Governance and Security: SAL ensures governance through data catalogs and centralized access control. This allows for fine-grained control over sensitive data, ensuring that private and financial information remains secure.",[40,27606,27608],{"id":27607},"ursa-engine-augmenting-lakehouses-with-data-streaming","Ursa Engine: Augmenting Lakehouses with Data Streaming",[48,27610,27611],{},"Many streaming platforms have already started integrating with lakehouses, including StreamNative’s Ursa Engine and Confluent’s Tableflow. However, not all integrations are created equal. Confluent’s Tableflow follows a Lambda-like architecture, storing two copies of data—one for streaming and one for the lakehouse.",[48,27613,27614],{},"In contrast, Ursa Engine implements SAL with a headless stream storage engine that augments lakehouse tables with streaming updates. At the core of the Ursa Engine is the Ursa Stream storage, a stream storage implementation over object storage that incorporates row-based WAL files for fast appends and columnar Parquet files for efficient scans and queries. 
Data streamed into Ursa is stored in Ursa Stream, compacted into Parquet files, and organized as lakehouse tables—eliminating the need for separate copies of the data. Changes made to the lakehouse tables can be indexed as streams and consumed via either Kafka or Pulsar protocols.",[48,27616,27617],{},[384,27618],{"alt":18,"src":26356},[48,27620,27621,27622,27625,27626,27630,27631,190],{},"By building multiple modalities over the same data, Ursa enables different semantics. You can process the data as continuous streams using a stream processing engine like Flink or ",[55,27623,512],{"href":520,"rel":27624},[264],", or as cost-effective tables with a batch query engine like Databricks or Trino. Ursa’s design allows for seamless integration between real-time streams and historical data, empowering AI models to operate in a Shift-Left manner. For more details about Ursa Engine, we ",[55,27627,27629],{"href":26378,"rel":27628},[264],"invite you"," to attend ",[55,27632,27635],{"href":27633,"rel":27634},"https:\u002F\u002Fdatastreaming-summit.org\u002Fevent\u002Fnorth-america-2024",[264],"the upcoming Data Streaming Summit at Grand Hyatt SFO on October 28-29, 2024",[40,27637,27639],{"id":27638},"sal-the-shift-left-foundation-for-real-time-gen-ai","SAL: The Shift-Left Foundation for Real-Time Gen AI",[48,27641,27642],{},"In the age of AI, businesses need a new data foundation that combines the real-time power of streaming with the robustness of lakehouses. Streaming-Augmented Lakehouse (SAL) is that foundation. It allows enterprises to build AI systems continuously fed with real-time data while also managing the vast historical datasets needed for training and improving models.",[48,27644,27645],{},"By embracing SAL, organizations can integrate real-time insights into their lakehouse architectures, ensuring they act on data as it arrives while maintaining governance, scalability, and analytical depth. 
SAL represents the future of the data foundation for real-time Gen AI, enabling businesses to continuously evolve and respond to real-time changes.",{"title":18,"searchDepth":19,"depth":19,"links":27647},[27648,27649,27650,27651,27652,27653],{"id":27500,"depth":19,"text":27501},{"id":27524,"depth":19,"text":27525},{"id":27551,"depth":19,"text":27552},{"id":27583,"depth":19,"text":27584},{"id":27607,"depth":19,"text":27608},{"id":27638,"depth":19,"text":27639},"2024-10-02","Read this thought-leadership blog post to learn how Streaming Augmented Lakehouse can be the future of data foundation for real-time gen AI","\u002Fimgs\u002Fblogs\u002F66fdd3c5636eecd9dc6434b2_Screenshot-2024-10-02-at-1.38.14-PM.png",{},{"title":27486,"description":27655},"blog\u002Fintroducing-streaming-augmented-lakehouse-sal-for-the-data-foundation-of-real-time-gen-ai",[10054,800],"-Dayey5Xx3fhYRYiAtLxwxpkFGM5o-mgSCB_MsKlC0M",{"id":27663,"title":27664,"authors":27665,"body":27666,"category":821,"createdAt":290,"date":27839,"description":27840,"extension":8,"featured":294,"image":27841,"isDraft":294,"link":290,"meta":27842,"navigation":7,"order":296,"path":27843,"readingTime":22989,"relatedResources":290,"seo":27844,"stem":27845,"tags":27846,"__hash__":27848},"blogs\u002Fblog\u002Fcelebrating-the-6th-anniversary-of-apache-pulsar-as-a-top-level-asf-project-a-journey-of-innovation-and-community.md","Celebrating the 6th Anniversary of Apache Pulsar as a Top-Level ASF Project: A Journey of Innovation and Community",[6785,806],{"type":15,"value":27667,"toc":27833},[27668,27681,27685,27698,27707,27710,27714,27717,27723,27732,27738,27744,27751,27758,27761,27787,27790,27794,27802,27808,27811,27815,27818,27821,27828,27831],[48,27669,27670,27671,27674,27675,27680],{},"About ten years ago, the Pulsar team at Yahoo was tasked with creating a multi-tenant unified messaging and data streaming platform. The technology was developed to power hundreds of real-time business applications within Yahoo! and eventually became ",[55,27672,821],{"href":23526,"rel":27673},[264],". Six years ago, Apache Pulsar graduated to become a ",[55,27676,27679],{"href":27677,"rel":27678},"https:\u002F\u002Fwww.splunk.com\u002Fen_us\u002Fblog\u002Fit\u002Fa-major-step-forward-for-apache-pulsar-new-top-level-apache-project.html",[264],"Top-Level Project (TLP)"," within the Apache Software Foundation (ASF). This milestone marked the beginning of an incredible journey—one filled with technological innovation, community growth, and an ever-evolving vision for democratizing data streaming. Today, we celebrate not just this achievement but also the people, the technology, and the vision that continues to propel Apache Pulsar forward as a fast-growing community in the age of data streaming.",[40,27682,27684],{"id":27683},"pulsars-community-growth-building-together","Pulsar’s Community Growth: Building Together",[48,27686,27687,27688,27692,27693,27697],{},"From its inception, Apache Pulsar was driven by a strong, collaborative community. Over the past six years, this community has blossomed into one of the most vibrant ecosystems in the Apache Software Foundation. 
(",[55,27689,27691],{"href":27690},"\u002Fblog\u002Fapache-pulsar-vs-apache-kafka-2022-benchmark","Pulsar was ranked as one of the top 5 most active projects in the ASF.",") Developers, contributors, and companies from across the globe have united around Pulsar’s ",[55,27694,27696],{"href":27695},"\u002Fblog\u002Fhow-pulsars-architecture-delivers-better-performance-than-kafka","unique storage-and-compute-separation architecture"," and its unparalleled capabilities, such as a unified messaging model, geo-replication, multi-tenancy, and more, driving its adoption across industries ranging from Financial Services to Automotive, Marketing Technologies, E-Commerce & Retail, IoT, and beyond. We have witnessed many mission-critical businesses built around Apache Pulsar, from powering tens of billions of billing requests in one of the largest billing & payment systems to supporting hundreds of millions of players in online games to providing company-wide messaging and streaming platforms for enterprises and unicorns.",[48,27699,27700,27701,27706],{},"The Pulsar community has grown exponentially. What started with a small group of dedicated developers has now ",[55,27702,27705],{"href":27703,"rel":27704},"https:\u002F\u002Fpulsar.apache.org\u002Fblog\u002F2023\u002F02\u002F03\u002Fapache-pulsar-hits-its-600th-contributor\u002F",[264],"expanded to hundreds, even thousands, of contributors",", with Pulsar user groups in major regions around the world. Many Fortune 100 enterprises, unicorns, startups, and developers alike have adopted Pulsar as the real-time messaging and streaming platform for mission-critical, real-time transactional workloads. This growth reflects not only the robustness of the technology but also the commitment of the community to advance the platform and make Pulsar accessible to all.",[48,27708,27709],{},"This community has contributed not only code but also knowledge through meetups, webinars, and events, ensuring that Pulsar is more than just a technology—it’s a movement. The success of Pulsar is a testament to the power of collaboration and open-source innovation.",[40,27711,27713],{"id":27712},"from-kafka-to-pulsar-to-ursa-shaping-the-future-of-data-streaming","From Kafka to Pulsar to Ursa: Shaping the Future of Data Streaming",[48,27715,27716],{},"Over the years, Apache Pulsar has evolved from a scalable messaging system into an open, multi-protocol data streaming platform. It has become the backbone of real-time data streaming architectures, empowering modern enterprises with mission-critical transactional workloads. Pulsar’s success can be attributed to its unique features across several key dimensions:",[48,27718,27719,27720,190],{},"Cloud-Native Architecture: Pulsar’s multi-layered architecture separates compute from storage, enabling unmatched scalability, durability, and flexibility. This design eliminates the need for data rebalancing, ",[55,27721,27722],{"href":21492},"allowing Pulsar to scale up to 1000x faster than other data streaming platforms",[48,27724,27725,27726,27731],{},"Unified Messaging & Data Streaming: Pulsar remains the only system that seamlessly unifies message queuing and data streaming into a single model. It offers ",[55,27727,27730],{"href":27728,"rel":27729},"https:\u002F\u002Fpulsar.apache.org\u002Fdocs\u002Fconcepts-messaging\u002F#subscriptions",[264],"a flexible subscription model",", allowing developers to store a single copy of data and consume it multiple times in various ways tailored to business needs. 
This unique queuing capability has empowered enterprises and unicorns to run their most mission-critical transactional workloads. Meanwhile, other platforms like Kafka are still in the early stages of trying to implement similar features, with production use still years away.",[48,27733,27734,27735,190],{},"Multi-Tenancy: Pulsar is the first and only data streaming platform to natively support multi-tenancy from day one. This feature is vital for enterprises and unicorns to reduce the total cost of ownership (TCO) when managing their data streaming infrastructure. Please check out our guide to ",[55,27736,27737],{"href":24363},"Evaluating the Infrastructure Costs of Apache Pulsar and Apache Kafka",[48,27739,27740,27741,190],{},"Oxia: Scalable Metadata & Coordination: One of Pulsar's newest innovations, Oxia, addresses the challenge of metadata scalability in large-scale environments. It provides a robust and scalable metadata and coordination layer, ensuring low-latency, high-performance coordination, and state management, which is essential for complex, distributed systems. See our blog post: ",[55,27742,27743],{"href":21529},"Introducing Oxia: Scalable Metadata and Coordination",[48,27745,27746,27747,190],{},"Geo-Replication: Pulsar’s innovative geo-replication feature, governed by policies, is widely adopted by businesses to meet disaster recovery (DR) requirements. See our blog post: ",[55,27748,27750],{"href":27749},"\u002Fblog\u002Ffailure-is-not-an-option-it-is-a-given","Failover strategies deliver additional resiliency for Apache Pulsar",[48,27752,27753,27754,27757],{},"Tiered Storage: Pulsar pioneered the use of object storage as tiered storage in the data streaming space. Today, tiered storage is a must-have functionality for data streaming platforms, and Pulsar continues to lead the way. With ",[55,27755,27756],{"href":23631},"the introduction of Lakehouse storage"," in the StreamNative Ursa engine, we are taking the story of tiered storage to new heights.",[48,27759,27760],{},"Reflecting on Pulsar’s development over the years, we are proud of how it has shaped the present of data streaming platforms. Many of the concepts and features introduced by Pulsar have been embraced by peer communities and competitors alike, pushing the entire data streaming ecosystem into the mainstream.",[48,27762,27763,27764,27766,27767,27770,27771,27775,27776,27779,27780,27782,27783,27786],{},"However, our journey of innovation is far from over. With the ongoing development of the ",[55,27765,4725],{"href":10389},", we are taking Pulsar’s core architecture to new heights, redefining the standards for what an open data streaming platform should be. Ursa introduces several cutting-edge advancements, including addressing the ",[55,27768,27769],{"href":18969},"New CAP Theorem",", offering ",[55,27772,27774],{"href":27773},"\u002Fdeployment","flexible deployment options"," across public and private clouds, and supporting ",[55,27777,10322],{"href":27778},"\u002Fblog\u002Fempowering-data-sovereignty-with-byoc-taking-control-in-a-cloud-centric-world",", Dedicated, and ",[55,27781,4839],{"href":11196}," deployments. It also enables seamless integration with lakehouses through ",[55,27784,27785],{"href":10453},"table-stream duality"," and supports multiple semantics via a range of protocols, from Pulsar to Kafka, MQTT, and beyond.",[48,27788,27789],{},"We believe that Pulsar—and now Ursa—represent the future of data streaming. 
They are not just tools; they are the foundation upon which the next generation of data streaming platforms will be built.",[40,27791,27793],{"id":27792},"from-pulsar-summit-to-data-streaming-summit-broadening-horizons","From Pulsar Summit to Data Streaming Summit: Broadening Horizons",[48,27795,27796,27797,27801],{},"Another key milestone in Pulsar’s journey has been the evolution of the Pulsar Summit. What began as a niche gathering for Pulsar enthusiasts has",[55,27798,27800],{"href":27799},"\u002Fblog\u002Fintroducing-data-streaming-summit-2024"," transformed into the Data Streaming Summit",", an upgraded industry event embracing the broader ecosystem of data streaming technologies.",[48,27803,3600,27804,27807],{},[55,27805,5376],{"href":5372,"rel":27806},[264]," isn’t just about Pulsar; it’s a platform for exchanging ideas on the latest trends, architectures, and innovations in data streaming. Our goal with these summits is to foster cross-community collaboration, uniting the best minds in open-source, cloud computing, data engineering, and real-time analytics to push the boundaries of what's possible with streaming data.",[48,27809,27810],{},"This transformation from Pulsar Summit to Data Streaming Summit mirrors our broader mission: to democratize data streaming by creating a platform that is open, scalable, and future-proof.",[40,27812,27814],{"id":27813},"looking-forward-pulsar-40-and-ursa-engine-at-the-data-streaming-summit","Looking Forward: Pulsar 4.0 and Ursa Engine at the Data Streaming Summit",[48,27816,27817],{},"As we look ahead to the next phase of Apache Pulsar, the upcoming release of Pulsar 4.0 promises to be one of the most significant updates yet. This release will introduce key innovations designed to make Pulsar an attractive open data streaming technology in multi-cloud and hybrid environments. From enhanced storage efficiency to improved latency handling, Pulsar 4.0 will continue to evolve for delivering unified messaging and data streaming platform at scale.",[48,27819,27820],{},"The Pulsar community will continue to grow. Much of that future revolves around the Ursa Engine. At the upcoming Data Streaming Summit, we’ll unveil exciting advancements in Ursa that will redefine what’s possible with data streaming and data lakehouses. Ursa will integrate more deeply with machine learning pipelines, stream processing, and generative AI, powering not only data streams but full-fledged real-time processing capabilities.",[48,27822,10386,27823,27827],{},[55,27824,27826],{"href":26378,"rel":27825},[264],"invite everyone to join us at the Data Streaming Summit"," to explore these exciting developments and celebrate the remarkable achievements of the Apache Pulsar community, alongside the broader data streaming ecosystem. The next chapter in the data streaming journey is just beginning, and we look forward to continuing this adventure with all of you.",[48,27829,27830],{},"Here’s to the future of data streaming and to the community that makes it all possible. 
Thank you for being part of this incredible journey.",[48,27832,3931],{},{"title":18,"searchDepth":19,"depth":19,"links":27834},[27835,27836,27837,27838],{"id":27683,"depth":19,"text":27684},{"id":27712,"depth":19,"text":27713},{"id":27792,"depth":19,"text":27793},{"id":27813,"depth":19,"text":27814},"2024-09-25","Apache Pulsar: fast-growing community in the age of data streaming","\u002Fimgs\u002Fblogs\u002F66f43693b8d7ee4f97c290fa_6thAnniversary_BlogPost-2.png",{},"\u002Fblog\u002Fcelebrating-the-6th-anniversary-of-apache-pulsar-as-a-top-level-asf-project-a-journey-of-innovation-and-community",{"title":27664,"description":27840},"blog\u002Fcelebrating-the-6th-anniversary-of-apache-pulsar-as-a-top-level-asf-project-a-journey-of-innovation-and-community",[302,821,27847,303],"Multi-Tenancy","i4X4ulgvtdAiEA5SkP1o0P7WOFslG05i8k1gxRBgPmA",{"id":27850,"title":27851,"authors":27852,"body":27853,"category":3550,"createdAt":290,"date":28154,"description":28155,"extension":8,"featured":294,"image":28156,"isDraft":294,"link":290,"meta":28157,"navigation":7,"order":296,"path":11363,"readingTime":22989,"relatedResources":290,"seo":28158,"stem":28159,"tags":28160,"__hash__":28161},"blogs\u002Fblog\u002Frevolutionizing-data-connectivity-introducing-streamnatives-universal-connectivity-uniconn-for-seamless-real-time-data-access.md","Revolutionizing Data Connectivity: Introducing StreamNative's Universal Connectivity (UniConn) for Seamless Real-Time Data Access",[311],{"type":15,"value":27854,"toc":28142},[27855,27858,27862,27865,27882,27886,27895,27900,27911,27915,27918,27923,27940,27943,27948,27951,27954,27957,27962,27965,27968,27970,27975,27978,27986,27988,27993,27997,28000,28009,28013,28018,28020,28023,28059,28062,28066,28069,28073,28076,28080,28083,28088,28090,28094,28103,28105,28110,28112,28115,28118,28121,28123,28126,28140],[48,27856,27857],{},"In the new age of AI-driven applications, the ability to leverage data effectively hinges on the capability to stream and access it in real-time. The rapid movement of data necessitates robust connectivity to external applications and data stores, serving as both the source and destination of this data. 
However, enterprises often face significant connectivity challenges that can impede their ability to fully utilize their data.",[40,27859,27861],{"id":27860},"connectivity-challenges","Connectivity Challenges",[48,27863,27864],{},"Enterprises encounter a myriad of connectivity challenges that can hinder their data integration efforts:",[1666,27866,27867,27870,27873,27876,27879],{},[324,27868,27869],{},"Diverse Systems: Integrating a variety of systems and applications can be complex without standardized connectors, making seamless communication difficult.",[324,27871,27872],{},"High Development Costs: Building and maintaining custom connectors require significant resources and skilled developers, leading to high costs.",[324,27874,27875],{},"Security and Compliance Risks: Ensuring secure data transmission and compliance with regulations is challenging without robust, standardized connectors.",[324,27877,27878],{},"Scalability Issues: Custom connectors may not be optimized for performance, causing scalability issues as data volumes grow.",[324,27880,27881],{},"Operational Challenges: Monitoring, debugging, and managing data flows can be difficult with custom-built connectors, leading to increased downtime and operational inefficiencies.",[40,27883,27885],{"id":27884},"introducing-uniconn-unified-connectivity-by-streamnative","Introducing UniConn: Unified Connectivity by StreamNative",[48,27887,27888,27889,27894],{},"StreamNative is excited to announce the launch of ",[55,27890,27893],{"href":27891,"rel":27892},"https:\u002F\u002Fwww.streamnative.io\u002Funiconn",[264],"UniversalConnectivity (UniConn)"," in Public Preview for StreamNative Cloud. UniConn provides a consistent and declarative experience to connect, process, debug, and monitor data pipelines powered by Kafka Connect or Pulsar IO connectivity frameworks. This innovative framework facilitates seamless data movement in and out of StreamNative Pulsar or Kafka clusters, and allows users to perform lightweight data transformations using their preferred programming languages such as Java, Go, or Python.",[48,27896,27897],{},[384,27898],{"alt":18,"src":27899},"\u002Fimgs\u002Fblogs\u002F66df2cadaf0e613775d95338_AD_4nXfsrSSqnCVZVF0lXBWm-lC6vJFXr1-TAsyKLyHMy7v84GRX8oJSgEYwAwWaIaFvnV_pcPPMCPU20_gL0ZUn5IwRUu0birnNDKyoLiJs0xuZGqjyoTmb3MjWIJKc2tfag0r5y5RhR62qDUzyuR1Lao9sCdQy.png",[321,27901,27902,27905,27908],{},[324,27903,27904],{},"Seamless Pipeline Building: Build robust pipelines with Kafka and Pulsar connectors, ensuring flexibility in technology choice.",[324,27906,27907],{},"Consistent User Experience: Enjoy a unified experience for developing, debugging, and monitoring pipelines with both Kafka and Pulsar.",[324,27909,27910],{},"Flexible Connector Options: Use built-in connectors or bring your own, whether custom-built, open-source, or from partners.",[40,27912,27914],{"id":27913},"introducing-kafka-connect-in-streamnative-cloud","Introducing Kafka Connect In StreamNative Cloud",[48,27916,27917],{},"StreamNative has long supported Pulsar IO, a framework for building connectivity with Apache Pulsar. With the introduction of UniConn, StreamNative now extends support to run Kafka Connect-based connectors within StreamNative Cloud. 
Kafka Connect, an open-source framework, is designed for developing connectors that link external data stores to Kafka clusters.",[48,27919,27920],{},[384,27921],{"alt":18,"src":27922},"\u002Fimgs\u002Fblogs\u002F66df2cad07eef30bd3fa9dc5_AD_4nXeWCFJbzme7MsBarMz_YiuYktYkYac1QGEDEtv1YGqFVhGRb9NAr1vCyCDsmYERG6sl6ZOZP0ZvCNHKAsCODnM80V-12Oc8kxn6b-5lvUz-5S0DXJUnQWIFtGda97RUm_iM9gpqv7h9l_GLupNXvEIWfqCv.png",[321,27924,27925,27928,27931,27934,27937],{},[324,27926,27927],{},"Scalable Data Integration: Efficiently integrates data between Kafka API-compatible systems and various systems.",[324,27929,27930],{},"Pre-Built Connectors: Wide range of connectors from community, and ISVs.",[324,27932,27933],{},"Distributed and Fault-Tolerant: High availability and automatic failure recovery.",[324,27935,27936],{},"Simplified Data Movement: Abstracts data ingestion and export complexities.",[324,27938,27939],{},"Easy to Manage: Built-in tools for simple deployment and management.",[48,27941,27942],{},"Users can now log in to StreamNative Cloud to access the newly added connectors in the Connector Catalog. These connectors are available under the Kafka Sinks and Kafka Source tabs.",[48,27944,27945],{},[384,27946],{"alt":18,"src":27947},"\u002Fimgs\u002Fblogs\u002F66df2cad4bbf44c30ecb57f0_AD_4nXdwiPVivWD_49xzlzmH1NpKE83frbLRgjXUQ0EDOeRfKsKu8JO_MRg_PKwvQA1wABstvOUl3Q8t5Hz-Hlc7ghGmYHRE1GIIP-j5TvojZFKvuAkTujBdm0NBf9jT-lzDJGWkg4_JPwIfL41YUEPQJpEbzu4.png",[48,27949,27950],{},"Through the StreamNative Cloud Console UI, users can create, debug, and monitor Kafka Connectors seamlessly.",[48,27952,27953],{},"Create Connectors In StreamNative Cloud",[48,27955,27956],{},"Users can quickly build a data pipeline by creating a connector in just seconds. StreamNative Cloud's built-in connector catalog provides a wide selection of connectors, including the newly added Kafka Connectors. Users simply select a connector, enter the required configuration, and deploy the connector with ease.",[48,27958,27959],{},[384,27960],{"alt":18,"src":27961},"\u002Fimgs\u002Fblogs\u002F66df2cad3a0d5d333097b8fd_AD_4nXeTisY5ZqZwWHp4FG2_peJ5szJjGqUvSLlQDlDdR6WBM4D4Ree2AarW7Y5ZVvvBye5Mzj3DIhpFv8jw88qi7P0_GJ_5iEYtVzciF08eDjwRaIvPSLL0NBDUikWKZDV9rDljSQcQXfgqI1iIu6sOyYI8lwB0.png",[48,27963,27964],{},"Debug Connectors In StreamNative Cloud",[48,27966,27967],{},"StreamNative Cloud offers robust debugging capabilities for connectors, ensuring users can troubleshoot issues efficiently. Users have full access to connector logs, which can be viewed directly within the Console UI or routed to a Kafka topic for integration with logging services such as Datadog, Elastic, and others.",[48,27969,3931],{},[48,27971,27972],{},[384,27973],{"alt":18,"src":27974},"\u002Fimgs\u002Fblogs\u002F66df2cad81d31f549af228f3_AD_4nXfKsCjEXzgt8sV4qbcOx7yNCnagFTyNbwvrK_9CLaHhHc_kxOeW1fEHQhv7ii9spEPr3e5UqcUJ1qp7-2cLw2TpxAf7lBhwir18dDmERULBjNVaGT4rkncIfRKN-Cj72pUPywp40FFF0MhzrVMa79r63s_a.png",[48,27976,27977],{},"Monitoring Connectors In StreamNative Cloud",[48,27979,27980,27981,190],{},"Once a connector is operational and data is flowing between the external system and Kafka, users can monitor its performance by viewing connector metrics in the Connector Dashboard or exporting the metrics to observability platforms such as Prometheus or Grafana. StreamNative Cloud offers a comprehensive set of metrics, enabling users to monitor various aspects of connector performance. 
",[55,27982,27985],{"href":27983,"rel":27984},"https:\u002F\u002Fdocs.streamnative.io\u002Fdocs\u002Fcloud-metrics-api#kafka-connect-metrics",[264],"Learn more about the Kafka Connect metrics supported by StreamNative Cloud here",[48,27987,3931],{},[48,27989,27990],{},[384,27991],{"alt":18,"src":27992},"\u002Fimgs\u002Fblogs\u002F66df2cad073fb3bdafba35a7_AD_4nXfBzdPEuOcL0kBMLrkz0CXS9fcn_SyDOozNLkKYSaZvNo4HQXjKytuy-sI8bzfGqlA40pUTDOAqXTLqPWrfsMB-xAGVONL6tmEWPLoHwjKuOeSQCNA5Q7LAb6i97mMWAl9LvmhtIDI3xiUxhp-8UCfz2xo.png",[40,27994,27996],{"id":27995},"support-for-single-message-transforms-smts-in-streamnative-cloud","Support for Single Message Transforms (SMTs) in StreamNative Cloud",[48,27998,27999],{},"Kafka Connect Single Message Transforms (SMTs) are lightweight transformations applied to individual messages as they pass through Kafka Connect. They allow users to modify, filter, or manipulate messages without writing custom code. SMTs are commonly used to alter message structures, add or remove fields, or apply simple logic such as routing, masking, or format conversions. These transformations help streamline the integration process between data sources and sinks, enhancing data flow between Kafka and external systems.",[48,28001,28002,28003,28008],{},"The newly launched Kafka Connect functionality in StreamNative Cloud fully supports ",[55,28004,28007],{"href":28005,"rel":28006},"https:\u002F\u002Fkafka.apache.org\u002Fdocumentation\u002F#connect_included_transformation",[264],"Single Message Transforms (SMTs)",", allowing users to apply real-time transformations to individual messages within their data pipelines. This feature enables seamless data manipulation and customization without the need for additional coding.",[40,28010,28012],{"id":28011},"new-kafka-connectors-in-streamnatives-built-in-catalog","New Kafka Connectors in StreamNative's Built-In Catalog",[48,28014,28015],{},[384,28016],{"alt":18,"src":28017},"\u002Fimgs\u002Fblogs\u002F66df2cad1d036a49234a7f07_AD_4nXeqA9khT37lIMYJUUwazR3AtRwpueSTBaB_s4od1Fmzq6yI5iDwE3tvWc0YE-o-NVnu7I42K_CUy40ej36YVM6b86d0lGbqO180paTQcmEF-0cDIC7WJn_HUVujkx4iNwoi2DAwgsdGfC_VWE4jikf5WsQ.png",[48,28019,3931],{},[48,28021,28022],{},"StreamNative is thrilled to announce the initial launch of four connectors within its built-in catalog, now available with Kafka Connect support in StreamNative Cloud. These connectors include:",[321,28024,28025,28032,28039,28045,28052],{},[324,28026,28027],{},[55,28028,28031],{"href":28029,"rel":28030},"https:\u002F\u002Fgithub.com\u002Fzilliztech\u002Fkafka-connect-milvus\u002Ftree\u002Fv0.1.3",[264],"Milvus Sink",[324,28033,28034],{},[55,28035,28038],{"href":28036,"rel":28037},"https:\u002F\u002Fgithub.com\u002Fmongodb\u002Fmongo-kafka\u002Ftree\u002Fr1.13.0",[264],"MongoDB Sink",[324,28040,28041],{},[55,28042,28044],{"href":28036,"rel":28043},[264],"MongoDB Source",[324,28046,28047],{},[55,28048,28051],{"href":28049,"rel":28050},"https:\u002F\u002Fgithub.com\u002Ftabular-io\u002Ficeberg-kafka-connect\u002Ftree\u002Fv0.6.19",[264],"Iceberg (OSS)",[324,28053,28054],{},[55,28055,28058],{"href":28056,"rel":28057},"https:\u002F\u002Fdocs.streamnative.io\u002Fhub\u002Fconnector-kafka-connect-yugabyte-cdc-source-v1.9",[264],"Yugabyte CDC Source",[48,28060,28061],{},"These new connectors enable seamless integration and data flow between Kafka and these popular data stores and services, enhancing the overall connectivity experience for users. 
Overtime StreamNative plans to add more connectors to the UniConn catalog based on the demand.",[40,28063,28065],{"id":28064},"scaling-connectors-without-high-costs","Scaling Connectors without High Costs",[48,28067,28068],{},"UniConn offers a scalable solution for running connectors on demand, without the worry of increased costs associated with higher levels of parallelism. This allows users to scale connectors efficiently and cost-effectively.",[40,28070,28072],{"id":28071},"user-provided-connectors","User-Provided Connectors",[48,28074,28075],{},"UniConn supports the ability to bring your own connectors to StreamNative Cloud, whether they are built in-house, by an open-source community, or a third-party vendor. Users can upload and self-manage these connectors while",[40,28077,28079],{"id":28078},"connector-portfolio-in-streamnative-hub","Connector Portfolio In StreamNative Hub",[48,28081,28082],{},"StreamNative Hub provides more than 50 connectors in StreamNative Hub. You can filter connectors by Kafka Connect or Pulsar IO.",[48,28084,28085],{},[384,28086],{"alt":18,"src":28087},"\u002Fimgs\u002Fblogs\u002F66df2cad35942b5530cd16df_AD_4nXcUje0qlJHz_qvlAyvSb5TZviIXCbnoJHkPsWUxEXV-oYT9OYBeQQIO3eU-nzi1SqFKJ0wuudZTnLztcAR3_QkhUc-5hQzbEIA_c15yGEvc_UiUOmBUYyoIi36TkIRzlLGNVHGdYb0ei4NgZjInysiDiHzB.png",[48,28089,3931],{},[40,28091,28093],{"id":28092},"connector-shared-responsibility-model","Connector Shared Responsibility Model",[48,28095,28096,28097,28102],{},"StreamNative Cloud operates under a ",[55,28098,28101],{"href":28099,"rel":28100},"https:\u002F\u002Fdocs.streamnative.io\u002Fdocs\u002Fconnector-overview#connectors-shared-responsibility",[264],"shared responsibility model for connectors",". Enterprise support for fully managed connectors is provided by StreamNative, while support for user-provided connectors involves a shared responsibility between StreamNative and the user. This model ensures robust support while empowering users with flexibility and control over their connectors.",[48,28104,3931],{},[48,28106,28107],{},[384,28108],{"alt":18,"src":28109},"\u002Fimgs\u002Fblogs\u002F66df2cad2e139686d19cde4f_AD_4nXd6pT8V-hA-AHfloQj2d09_ZTtaLTBZZDVFyAr2ChpRWZ2Y1NSVsMBbEMsyUeqzWuDZoYWNJIiW8-PzY-JfoByJmdhPEYrDTzCwMufVSqQUcS6b6BfTuodBZLqQQaxmRsWyduyCj_uOReNDq3_BuHf3oeI.png",[48,28111,3931],{},[48,28113,28114],{},"Custom Connectors: Customers are responsible for self-managing these connectors. StreamNative does not provide support for custom connectors or any other open-source connectors uploaded by the customers to StreamNative Cloud.",[48,28116,28117],{},"Partner Connectors: These are connectors which are built and supported by StreamNative partners.",[48,28119,28120],{},"Fully Managed Connectors: These are connectors which are built and supported by StreamNative as fully managed connectors in StreamNative Cloud.",[40,28122,2125],{"id":2122},[48,28124,28125],{},"With the launch of UniConn, StreamNative is set to revolutionize data connectivity, offering enterprises a robust, scalable, and cost-effective solution to meet their real-time data integration needs. Explore the possibilities with UniConn and transform your data connectivity experience today.",[48,28127,28128,28129,28133,28134,28139],{},"Visit StreamNative’s website to ",[55,28130,28132],{"href":27891,"rel":28131},[264],"learn more about UniConn"," and explore ",[55,28135,28138],{"href":28136,"rel":28137},"https:\u002F\u002Fdocs.streamnative.io\u002Fdocs\u002Fkafka-connect-overview",[264],"Kafka Connect documentation",". 
Transform the way you connect, process, and manage your data with the cutting-edge capabilities of StreamNative Cloud and UniConn.",[48,28141,3931],{},{"title":18,"searchDepth":19,"depth":19,"links":28143},[28144,28145,28146,28147,28148,28149,28150,28151,28152,28153],{"id":27860,"depth":19,"text":27861},{"id":27884,"depth":19,"text":27885},{"id":27913,"depth":19,"text":27914},{"id":27995,"depth":19,"text":27996},{"id":28011,"depth":19,"text":28012},{"id":28064,"depth":19,"text":28065},{"id":28071,"depth":19,"text":28072},{"id":28078,"depth":19,"text":28079},{"id":28092,"depth":19,"text":28093},{"id":2122,"depth":19,"text":2125},"2024-09-10","UniConn provides a consistent and declarative experience to connect, process, debug, and monitor data pipelines powered by Kafka Connect or Pulsar IO connectivity frameworks.","\u002Fimgs\u002Fblogs\u002F66df5a0f1b3a6adb917fd577_Universal-Connectivity_BlogPost.png",{},{"title":27851,"description":28155},"blog\u002Frevolutionizing-data-connectivity-introducing-streamnatives-universal-connectivity-uniconn-for-seamless-real-time-data-access",[799,821],"B7j1R8gxJQif00V5h7rWVE_XErLIS8vbJxsJ_02A8sc",{"id":28163,"title":28164,"authors":28165,"body":28166,"category":3550,"createdAt":290,"date":28339,"description":28340,"extension":8,"featured":294,"image":28341,"isDraft":294,"link":290,"meta":28342,"navigation":7,"order":296,"path":11196,"readingTime":22989,"relatedResources":290,"seo":28343,"stem":28344,"tags":28345,"__hash__":28346},"blogs\u002Fblog\u002Fintroducing-streamnative-serverless-instant-start-seamless-scaling-and-effortless-data-streaming.md","Introducing StreamNative Serverless: Instant Start, Seamless Scaling, and Effortless Data Streaming",[806],{"type":15,"value":28167,"toc":28333},[28168,28177,28181,28184,28187,28190,28204,28207,28216,28220,28243,28246,28249,28253,28261,28264,28275,28280,28283,28286,28291,28294,28299,28302,28307,28310,28316,28318,28321,28328,28331],[48,28169,28170,28171,28176],{},"Today’s businesses thrive on data, and real-time insights are crucial for staying competitive in an ever-evolving market. However, adopting data streaming has often been challenging due to the complexity and operational overhead required to size and manage the infrastructure properly. To address these challenges and make data streaming more accessible to organizations of all sizes, we are proud to introduce ",[55,28172,28175],{"href":28173,"rel":28174},"http:\u002F\u002Fstreamnative.io\u002Fdeployment\u002Fserverless",[264],"StreamNative Serverless","! Our new product delivers a Pulsar & Kafka API as a fully managed data streaming service that is instantly available, allowing organizations to get started on their data streaming journey much faster.",[40,28178,28180],{"id":28179},"serverless-data-streaming-instant-effortless-and-powerful","Serverless Data Streaming: Instant, Effortless, and Powerful",[48,28182,28183],{},"With StreamNative Serverless, we're changing how organizations approach data streaming by offering instant provisioning and elastic scaling—all without the need for complex sizing exercises, configuration, or resource management. 
Built on the robust and reliable Apache Pulsar architecture, our serverless solution is designed to meet the real-time demands of businesses without requiring extensive expertise in data engineering or infrastructure management.",[48,28185,28186],{},"Our goal is simple: empower businesses to focus on driving insights and innovation, rather than wrestling with the complexities of data streaming technology.",[48,28188,28189],{},"There’s no infrastructure to manage. You can get started instantly and stay flexible with:",[321,28191,28192,28195,28198,28201],{},[324,28193,28194],{},"Instant Provisioning: Get started with data streaming in seconds, without the need for complex sizing exercises.",[324,28196,28197],{},"Seamless Scaling: Autoscaling capabilities ensure that your streaming workloads dynamically adjust based on demand, whether you're handling surges in traffic or operating at minimal capacity.",[324,28199,28200],{},"Throughput-based Usage Pricing: Pay only for the traffic you use with our elastic, throughput-based pricing model, ensuring you’re not overpaying for idle capacity.",[324,28202,28203],{},"Simplified User Experience: A revamped user interface that makes getting started and managing data streams intuitive and easy, even for non-experts.",[48,28205,28206],{},"While Serverless removes the complexity of sizing, planning, and managing capacity, it continues to offer the same functionalities as our other product offerings, including Dedicated and BYOC clusters. Specifically, with a Serverless cluster, you can continue to leverage Pulsar’s unique multi-tenancy architecture. This ensures that businesses can run multiple workloads in one serverless environment with enhanced security and flexibility—critical for organizations handling diverse data streams.",[48,28208,28209,28210,28215],{},"Want to learn more about our Serverless offering directly from our team? ",[55,28211,28214],{"href":28212,"rel":28213},"https:\u002F\u002Fhs.streamnative.io\u002Fstreamnative-product-roadmap-webinar-for-q4-2024",[264],"Sign up for our Q4 Roadmap webinar on September 26th",". In the meantime, here’s what you need to know about our exciting new service.",[40,28217,28219],{"id":28218},"universal-connectivity-and-streamnative-partnership-ecosystem","Universal Connectivity and StreamNative Partnership ecosystem",[48,28221,28222,28223,4003,28227,28231,28232,28236,28237,28242],{},"In addition to introducing Serverless clusters to complement StreamNative Cloud’s ",[55,28224,24622],{"href":28225,"rel":28226},"http:\u002F\u002Fstreamnative.io\u002Fdeployment\u002Fdedicated",[264],[55,28228,10322],{"href":28229,"rel":28230},"http:\u002F\u002Fstreamnative.io\u002Fdeployment\u002Fbyoc",[264]," cluster options, we are also excited to announce ",[55,28233,11512],{"href":28234,"rel":28235},"http:\u002F\u002Fstreamnative.io\u002Funiconn",[264]," (aka “Universal Connectivity”), which supports running Kafka Connect connectors natively on StreamNative Cloud. We’re also launching the ",[55,28238,28241],{"href":28239,"rel":28240},"http:\u002F\u002Fstreamnative.io\u002Fpartners",[264],"StreamNative Partnership Program",". 
With Serverless and UniConn, we can help organizations accelerate opportunities to build innovative solutions across the data streaming ecosystem.",[48,28244,28245],{},"For example, you can create a GenAI application in minutes: Use a Serverless cluster to get a Kafka API managed service up and running in seconds, then instantly launch a MongoDB Kafka Connect connector to capture change events from your system of record. You can also write a Pulsar function to redact sensitive information, chunk, and transform the data, and finally, use a Pinecone Sink connector to write embeddings to Pinecone.",[48,28247,28248],{},"With Serverless, UniConn, and the StreamNative Partnership Network, we aim to deliver a first-class data streaming experience that helps organizations of all sizes grow their businesses with real-time data.",[40,28250,28252],{"id":28251},"getting-started-with-streamnative-serverless-a-quick-tour","Getting Started with StreamNative Serverless: A Quick Tour",[48,28254,28255,28256,28260],{},"It’s never been easier to start your data streaming journey with StreamNative Serverless. We’re offering $200 in free credits to help you explore the platform—no credit card required! Let’s take a quick tour to show you just how simple it is to ",[55,28257,28259],{"href":17075,"rel":28258},[264],"get started ","and begin working with streaming data in just a few seconds.",[48,28262,28263],{},"Creating a Serverless cluster takes just three easy steps:",[1666,28265,28266,28269,28272],{},[324,28267,28268],{},"Choose a Cloud Provider",[324,28270,28271],{},"Select a Region",[324,28273,28274],{},"Click \"Deploy\"",[48,28276,28277],{},[384,28278],{"alt":5878,"src":28279},"\u002Fimgs\u002Fblogs\u002F66df8f014979c56fde8c527f_66df8ea031f9401aca79e9e7_Screenshot-2024-09-09-at-10.59.34-AM_BORDER.png",[48,28281,28282],{},"Within moments, your Serverless cluster will be provisioned and ready to use.",[48,28284,28285],{},"There’s no need for complex sizing calculations. StreamNative Serverless automatically scales based on your needs—no need to worry about the number of brokers or configuring CPUs and memory. 
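Once the cluster is provisioned, producing your first events takes only a few lines of client code. The sketch below is an illustration rather than StreamNative's official snippet: it uses the widely adopted confluent-kafka Python client, and the bootstrap address, SASL settings, and credentials are placeholders you would replace with the values shown in your cluster's Kafka quickstart in the Console.

```python
# Rough sketch of producing to a Serverless cluster over the Kafka protocol.
# Bootstrap address, SASL mechanism, and credentials are placeholders; copy
# the real values from the "Kafka clients" quickstart in the Console.
from confluent_kafka import Producer

producer = Producer({
    "bootstrap.servers": "pc-xxxx.aws-usw2.example.streamnative.cloud:9093",  # placeholder
    "security.protocol": "SASL_SSL",
    "sasl.mechanisms": "PLAIN",
    "sasl.username": "unused",                # placeholder; see your quickstart
    "sasl.password": "token:<your-api-key>",  # placeholder; see your quickstart
})

producer.produce("demo-topic", value=b"hello from serverless")
producer.flush()  # block until delivery completes
```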
When you’re ready to grow beyond Serverless, you can easily transition to a Dedicated cluster for more predictable workloads or BYOC clusters to meet specific data sovereignty requirements.",[48,28287,28288],{},[384,28289],{"alt":5878,"src":28290},"\u002Fimgs\u002Fblogs\u002F66df8f024979c56fde8c528b_66df8ec343c2d990d66ffd54_Screenshot-2024-09-09-at-9.42.46-AM_BORDER.png",[48,28292,28293],{},"Once inside the StreamNative Console, you’ll find quickstart guides for using Pulsar CLI tools, Kafka CLI tools, and examples for Pulsar and Kafka client libraries to get you up and running quickly.",[48,28295,28296],{},[384,28297],{"alt":5878,"src":28298},"\u002Fimgs\u002Fblogs\u002F66df8f024979c56fde8c5296_66df8e6f3662214b7c5f9f71_Screenshot-2024-09-09-at-9.44.45-AM_BORDER.png",[48,28300,28301],{},"While most users interact with the service programmatically, the StreamNative Cloud Console also offers an intuitive interface to manage resources like Tenants, Namespaces, and Topics—perfect for rapid prototyping and testing.",[48,28303,28304],{},[384,28305],{"alt":5878,"src":28306},"\u002Fimgs\u002Fblogs\u002F66df8f024979c56fde8c5292_66df8ef099ea43028fe543bd_Screenshot-2024-09-09-at-10.20.41-AM_BORDER.png",[48,28308,28309],{},"In addition to the native Kafka and Pulsar APIs, you can also launch Kafka Connect connectors or Pulsar IO connectors directly within your Serverless clusters for seamless data integration.",[48,28311,28312,28313,190],{},"For more information on StreamNative Serverless, including pricing, supported features, and detailed instructions, check out our ",[55,28314,7120],{"href":24616,"rel":28315},[264],[40,28317,25961],{"id":25960},[48,28319,28320],{},"With StreamNative Serverless, we’re democratizing data streaming by making it accessible, affordable, and scalable. 
Whether you’re new to data streaming or looking to optimize your existing workflows, StreamNative Serverless gives you the power and flexibility you need—without the operational headaches.",[48,28322,28323,28324,28327],{},"Visit our ",[55,28325,7120],{"href":19105,"rel":28326},[264]," to learn more about how StreamNative Serverless can transform your data streaming journey.",[48,28329,28330],{},"Join us in breaking down the barriers to data streaming and unlocking the full potential of real-time data for your business.",[48,28332,3931],{},{"title":18,"searchDepth":19,"depth":19,"links":28334},[28335,28336,28337,28338],{"id":28179,"depth":19,"text":28180},{"id":28218,"depth":19,"text":28219},{"id":28251,"depth":19,"text":28252},{"id":25960,"depth":19,"text":25961},"2024-09-09","Experience StreamNative Serverless for instant provisioning, seamless scaling, throughput-based usage pricing and simplified user experience","\u002Fimgs\u002Fblogs\u002F66df5a1f1e964b7302f7acbd_Serverless_BlogPost.png",{},{"title":28164,"description":28340},"blog\u002Fintroducing-streamnative-serverless-instant-start-seamless-scaling-and-effortless-data-streaming",[799,821],"JAk4g5dmcbVRcEdssK_VkD1qqBR9Cwe4rVa4zfhJIeg",{"id":28348,"title":28349,"authors":28350,"body":28351,"category":3550,"createdAt":290,"date":28339,"description":28565,"extension":8,"featured":294,"image":28566,"isDraft":294,"link":290,"meta":28567,"navigation":7,"order":296,"path":28568,"readingTime":22989,"relatedResources":290,"seo":28569,"stem":28570,"tags":28571,"__hash__":28573},"blogs\u002Fblog\u002Fstreamnative-unveils-partner-program-to-expand-data-streaming-ecosystem.md","StreamNative Unveils Partner Program To Expand Data Streaming Ecosystem",[311],{"type":15,"value":28352,"toc":28549},[28353,28361,28365,28369,28372,28376,28379,28383,28386,28390,28393,28397,28400,28411,28415,28418,28422,28425,28436,28440,28443,28447,28450,28461,28465,28468,28470,28475,28477,28484,28488,28491,28511,28515,28523,28528,28542,28544,28547],[48,28354,28355,28356,28360],{},"StreamNative is excited to announce the launch of its ",[55,28357,24200],{"href":28358,"rel":28359},"https:\u002F\u002Fwww.streamnative.io\u002Fpartners",[264],", designed to accelerate innovation, enhance service delivery, and drive mutual growth through collaboration with a wide range of technology and services partners. This program brings together Independent Software Vendors (ISVs),System Integrators (SIs), and Affiliates to expand the reach of Apache Pulsar, Apache Kafka, and other modern data technologies by integrating them seamlessly into diverse environments.",[40,28362,28364],{"id":28363},"streamnative-partner-program-principles","StreamNative Partner Program Principles",[32,28366,28368],{"id":28367},"collaborative-innovation","Collaborative Innovation",[48,28370,28371],{},"At the heart of the StreamNative Partner Program is the principle of collaborative innovation. By working closely with partners, we enable co-creation of solutions leveraging Apache Pulsar, Apache Kafka, and cutting-edge data technologies. These joint efforts accelerate the prototyping process and speed up time-to-market for innovative solutions, ensuring both StreamNative and its partners remain at the forefront of technological advancements. Collaborative innovation fosters stronger partnerships and helps drive long-term success for all involved.",[32,28373,28375],{"id":28374},"mutual-growth","Mutual Growth",[48,28377,28378],{},"The StreamNative Partner Program is built on the foundation of mutual growth. 
We are committed to helping partners expand their market reach, enhance their service offerings, and grow their revenue. To support this growth, StreamNative provides comprehensive training resources, and access to our ecosystem. By prioritizing mutual growth, we ensure that partners thrive alongside StreamNative, creating a sustainable and symbiotic relationship that benefits all parties.",[40,28380,28382],{"id":28381},"streamnative-partner-programs","StreamNative Partner Programs",[48,28384,28385],{},"The StreamNative Partner Program offers distinct pathways for Technology Partners (ISVs), Services Partners (SIs), and Affiliates allowing each group to maximize the value of their expertise within the StreamNative ecosystem.",[40,28387,28389],{"id":28388},"streamnative-technology-program-for-independent-software-vendors-isvs","StreamNative Technology Program For Independent Software Vendors (ISVs)",[48,28391,28392],{},"Independent Software Vendors (ISVs) play a crucial role in the StreamNative Technology Partner Program by developing technology integrations tailored for specific use cases. These integrations showcase the combined power of their solutions with StreamNative’s cloud-based offerings, delivering significant value to our mutual customers.",[32,28394,28396],{"id":28395},"what-does-streamnative-offer-to-isvs","What does StreamNative offer to ISVs?",[48,28398,28399],{},"StreamNative provides ISVs with an array of benefits designed to accelerate the development of these technology integrations. This includes access to a structured framework for joint go-to-market (GTM) activities, and a collaborative environment to ensure seamless integration and promotion of solutions.",[321,28401,28402,28405,28408],{},[324,28403,28404],{},"Discover: Identify opportunities to integrate your technology with StreamNative’s platform. Work with our team to define use cases and the value your solution will add to customers.",[324,28406,28407],{},"Build: Utilize StreamNative’s resources to create robust, scalable integrations with StreamNative Cloud.",[324,28409,28410],{},"Market: Expand your market reach through joint marketing campaigns, leveraging StreamNative’s network to access new customers.",[32,28412,28414],{"id":28413},"streamnative-services-program-for-system-integrators-si","StreamNative Services Program For System Integrators (SI)",[48,28416,28417],{},"System Integrators (SIs) are vital in supporting customers as they adopt or migrate to StreamNative Cloud. SIs develop vertical-specific solutions based on the StreamNative platform, driving increased customer consumption and adoption of StreamNative’s technologies.",[3933,28419,28421],{"id":28420},"what-does-streamnative-offer-to-sis","What does StreamNative offer to SIs?",[48,28423,28424],{},"StreamNative equips SIs with the necessary tools and training to master its platform. Through structured training programs, SIs gain deep expertise in deploying StreamNative Cloud solutions. 
Additionally, the program provides a framework for joint GTM activities to promote integrations effectively.",[321,28426,28427,28430,28433],{},[324,28428,28429],{},"Assess: Work with StreamNative to evaluate customer needs and identify opportunities for adoption or migration to StreamNative Cloud.",[324,28431,28432],{},"Implement: Ensure smooth deployment and integration of StreamNative’s solutions within customer infrastructures, following best practices.",[324,28434,28435],{},"Optimize: Continuously improve the performance and efficiency of the deployed solutions, providing ongoing support and updates to meet evolving customer needs.",[32,28437,28439],{"id":28438},"streamnative-program-for-affiliates","StreamNative Program For Affiliates",[48,28441,28442],{},"StreamNative Affiliates play a key role by referring StreamNative Cloud to their networks. Leveraging their industry expertise and networks, Affiliates help introduce businesses to StreamNative’s advanced data streaming solutions.",[3933,28444,28446],{"id":28445},"what-does-streamnative-offer-to-affiliates","What does StreamNative offer to Affiliates?",[48,28448,28449],{},"StreamNative supports Affiliates with top-notch resources and dedicated assistance to ensure successful customer engagements and maximize impact.",[321,28451,28452,28455,28458],{},[324,28453,28454],{},"Refer: Leverage your industry connections and expertise to identify and refer potential customers to StreamNative who can benefit from our data streaming solutions.",[324,28456,28457],{},"Engage: Work closely with StreamNative’s sales team to align on customer needs and ensure the referred leads receive the best possible engagement and support.",[324,28459,28460],{},"Earn: Benefit from an evergreen commission program with no quotas or minimums for helping bring value to customers and expanding StreamNative’s customer base.",[40,28462,28464],{"id":28463},"streamnative-welcomes-the-following-partners-to-the-streamnative-partner-program","StreamNative Welcomes the Following Partners to the StreamNative Partner Program",[48,28466,28467],{},"StreamNative is thrilled to welcome an exceptional group of partners to our newly launched Partner Program. These companies have joined forces with StreamNative to push the boundaries of data streaming and deliver top-tier solutions for our shared customers:",[48,28469,3931],{},[48,28471,28472],{},[384,28473],{"alt":18,"src":28474},"\u002Fimgs\u002Fblogs\u002F66df2fd084422e7137e607ee_AD_4nXdrnBNKBRU8z8oUF5laIV_-RZDeSvLqNrvJzff7hvUikwAPiTosMLvKiRipr1b5X7kDpXNVn56tXjZxFAgp9zSuw8dv7MVcmsuo4emoD0py2umVQIuqTjczrxS3P2-0lvuu_7B4aE5KiG-ktqR-wU3oawZD.png",[48,28476,3931],{},[48,28478,28479,28480,190],{},"These partnerships will bring cutting-edge innovations to the data streaming ecosystem and further expand the capabilities of StreamNative Cloud, Apache Pulsar, and Apache Kafka. Learn more about StreamNative Partners on ",[55,28481,28483],{"href":28482},"\u002Fpartners#show-all-partners-scroll","StreamNative Partner Directory",[40,28485,28487],{"id":28486},"benefits-of-streamnative-partner-programs","Benefits Of StreamNative Partner Programs",[48,28489,28490],{},"The StreamNative Partner Program offers a set of benefits to ensure partners can maximize their potential within the ecosystem. 
These include:",[321,28492,28493,28496,28499,28502,28505,28508],{},[324,28494,28495],{},"Access to StreamNative brand elements",[324,28497,28498],{},"Promotion in StreamNative’s partner directory",[324,28500,28501],{},"Early access to beta releases and new features",[324,28503,28504],{},"Invitations to SPN-exclusive webinars and events",[324,28506,28507],{},"Opportunities for joint Go-To-Market activities",[324,28509,28510],{},"$200 in StreamNative Cloud credits (one-time use)",[40,28512,28514],{"id":28513},"streamnative-partner-program-sign-up-process","StreamNative Partner Program - Sign Up Process",[48,28516,28517,28522],{},[55,28518,28521],{"href":28519,"rel":28520},"https:\u002F\u002Fhs.streamnative.io\u002Fpartner-program-for-streamnative",[264],"Joining the StreamNative Partner Program"," is a streamlined process, designed to ensure ease of onboarding for potential partners. Here’s what the process looks like:",[48,28524,28525],{},[384,28526],{"alt":18,"src":28527},"\u002Fimgs\u002Fblogs\u002F66df2fd0d1f5587dc1958732_AD_4nXfVLBLqb3CPdaGFdNxrOyaN0D2e-8QvOMMzBKceJirec9XByTRsJEvwpkL424ssyZsf1heErVUsEf9QE4yd5LsuRxsXVhsDJzT3b-X6j0UAjjYAznTt1_LqgqvZ3QYkMGSZFJSOMgW_gFaVFfczKXE7Q4rW.png",[321,28529,28530,28533,28536,28539],{},[324,28531,28532],{},"Step 1: StreamNative reviews the request and schedules a discovery call.",[324,28534,28535],{},"Step 2: StreamNative and the partner validate and test integration or solution.",[324,28537,28538],{},"Step 3: The partner signs the StreamNative Partner Agreement.",[324,28540,28541],{},"Step 4: Partner gets listed in StreamNative Partner Directory.",[40,28543,2125],{"id":2122},[48,28545,28546],{},"The StreamNative Partner Program marks a significant milestone in our mission to expand the ecosystem surrounding modern data streaming technologies. By fostering collaboration, driving mutual growth, and enabling seamless integration, the program sets the stage for our partners to innovate, succeed, and grow alongside us. 
Whether you’re an ISV looking to integrate with StreamNative Cloud,an SI ready to help customers adopt modern streaming technologies, or Affiliate referring StreamNative solutions, our partner program offers the tools and support you need to thrive.",[48,28548,3931],{},{"title":18,"searchDepth":19,"depth":19,"links":28550},[28551,28555,28556,28561,28562,28563,28564],{"id":28363,"depth":19,"text":28364,"children":28552},[28553,28554],{"id":28367,"depth":279,"text":28368},{"id":28374,"depth":279,"text":28375},{"id":28381,"depth":19,"text":28382},{"id":28388,"depth":19,"text":28389,"children":28557},[28558,28559,28560],{"id":28395,"depth":279,"text":28396},{"id":28413,"depth":279,"text":28414},{"id":28438,"depth":279,"text":28439},{"id":28463,"depth":19,"text":28464},{"id":28486,"depth":19,"text":28487},{"id":28513,"depth":19,"text":28514},{"id":2122,"depth":19,"text":2125},"StreamNative Partner Program is designed to accelerate innovation, enhance service delivery, and drive mutual growth through collaboration with a wide range of technology and services partners.","\u002Fimgs\u002Fblogs\u002F66df5a2ace6900d605774936_PartnerProgram_BlogPost-1.png",{},"\u002Fblog\u002Fstreamnative-unveils-partner-program-to-expand-data-streaming-ecosystem",{"title":28349,"description":28565},"blog\u002Fstreamnative-unveils-partner-program-to-expand-data-streaming-ecosystem",[799,821,28572],"Connectors","cePsJIGoNMS913g1SJ32TQk0vhXbFbIM5bXQlCmFeGI",{"id":28575,"title":28576,"authors":28577,"body":28578,"category":290,"createdAt":290,"date":28654,"description":28655,"extension":8,"featured":294,"image":27477,"isDraft":294,"link":290,"meta":28656,"navigation":7,"order":296,"path":27799,"readingTime":290,"relatedResources":290,"seo":28657,"stem":28658,"tags":28659,"__hash__":28660},"blogs\u002Fblog\u002Fintroducing-data-streaming-summit-2024.md","Introducing Data Streaming Summit 2024",[806],{"type":15,"value":28579,"toc":28652},[28580,28583,28590,28599,28605,28611,28617,28623,28629,28637,28640,28642,28650],[48,28581,28582],{},"Data streaming is a rapidly evolving technology that is revolutionizing how businesses leverage real-time data. Innovations in data streaming platforms and stream processing are continuously emerging. In today’s data-driven world, the integration of data streaming technologies like Pulsar and Kafka with AI presents endless opportunities. Notably, Apache Pulsar has evolved from a single-protocol platform to a multi-protocol one, showcasing its extensibility. In addition, businesses often utilize multiple technologies in their data streaming journeys. Hence, we feel that a data streaming conference shouldn’t be limited to one technology itself.",[48,28584,28585,28586,28589],{},"With these advancements in mind, I'm excited to announce a significant upgrade to the Pulsar Summit: introducing the ",[55,28587,5376],{"href":5372,"rel":28588},[264],". This event will be held on October 28-29, 2024, at the Grand Hyatt SFO. The Data Streaming Summit aims to be the premier conference for all things data streaming, including Pulsar, Kafka, Kafka-compatible, Flink, Spark, and many other innovative technologies. It will be a gathering place for developers, architects, and technical executives to connect with peers in a broad data streaming community beyond just Pulsar or Kafka.",[48,28591,28592,28593,28598],{},"Our theme this year is \"DataStreaming + AI\", and we are seeking innovative, informative, and thought-provoking presentations. 
The Call for Papers (CFP) is now open, and the event has a growing list of sponsors. We invite you to ",[55,28594,28597],{"href":28595,"rel":28596},"https:\u002F\u002Fsessionize.com\u002Fpulsar-summit-north-america-2024",[264],"submit talks"," in the following areas:",[48,28600,28601,28604],{},[44,28602,28603],{},"📖 Learning Data Streaming (Pulsar or Kafka):"," Are you passionate about teaching? Present an entry-level talk introducing data streaming with Pulsar or Kafka, highlighting key features and best practices. Help beginners embark on their data streaming journey.",[48,28606,28607,28610],{},[44,28608,28609],{},"🔍 Deep Dive into Data Streaming (Pulsar or Kafka):"," Are you a vendor building Kafka-compatible technologies or a committer of Pulsar or Kafka? Share your deep technical insights, including inner workings, optimization techniques, or advanced features.",[48,28612,28613,28616],{},[44,28614,28615],{},"✨ Data Streaming Use Cases:"," Have you adopted Pulsar, Kafka, or Kafka-compatible technologies for your data streaming projects? Share your experiences and use cases with the data streaming community.",[48,28618,28619,28622],{},[44,28620,28621],{},"🤖 Data Streaming + AI:"," Discuss how you're leveraging data streaming technologies with AI to solve real-world problems. Whether it's real-time AI models, AI-driven anomaly detection, or optimizing data processing with machine learning, your journey can inspire others.",[48,28624,28625,28628],{},[44,28626,28627],{},"🐿 Stream Processing and Ecosystem:"," Share your experiences with stream processing technologies like Apache Flink, Apache Spark, Risingwave, or any other tools that enhance your data streams. If you're building innovative tools for Pulsar or Kafka, we also want to hear about your work.",[48,28630,28631,28632,28636],{},"Join us by ",[55,28633,28635],{"href":28595,"rel":28634},[264],"submitting your talks"," and become part of a vibrant community of experts, developers, and thought leaders who are shaping the future of DataStreaming and AI.",[48,28638,28639],{},"I look forward to seeing you all in San Francisco!",[48,28641,3931],{},[48,28643,28644,28645,28649],{},"p.s Don't forget to ",[55,28646,28648],{"href":26378,"rel":28647},[264],"register early"," to take advantage of the early bird pricing!",[48,28651,3931],{},{"title":18,"searchDepth":19,"depth":19,"links":28653},[],"2024-07-30","Join us for an exciting in-person event at the Grand Hyatt at SFO to learn about the latest trends and innovations in the world of data steaming, Pulsar and Kafka.",{},{"title":28576,"description":28655},"blog\u002Fintroducing-data-streaming-summit-2024",[5376,821,799],"OZT9Kd6VHT0JWEQEv2OwXgB2h5KzJWCHX2rO0ROsPkM",{"id":28662,"title":28663,"authors":28664,"body":28665,"category":3550,"createdAt":290,"date":28858,"description":28859,"extension":8,"featured":294,"image":28860,"isDraft":294,"link":290,"meta":28861,"navigation":7,"order":296,"path":24152,"readingTime":17934,"relatedResources":290,"seo":28862,"stem":28863,"tags":28864,"__hash__":28865},"blogs\u002Fblog\u002Fstreamnative-introduces-self-service-experience-for-byoc-infrastructure-setup.md","StreamNative Introduces Self-Service Experience for BYOC Infrastructure 
Setup",[311],{"type":15,"value":28666,"toc":28852},[28667,28673,28676,28678,28682,28685,28688,28690,28695,28697,28701,28706,28711,28713,28721,28723,28728,28736,28739,28742,28747,28749,28752,28757,28759,28762,28767,28769,28774,28777,28782,28784,28787,28789,28794,28796,28799,28801,28806,28808,28811,28813,28818,28820,28829,28831,28835,28838,28843,28845,28847,28850],[48,28668,3600,28669,28672],{},[55,28670,28671],{"href":27778},"Bring Your Own Cloud (BYOC) model is transforming data sovereignty,"," giving users unprecedented control over their data. StreamNative is now elevating this experience by offering self-service options for BYOC infrastructure, significantly enhancing convenience for users. With API, CLI, and Terraform interfaces already available for BYOC provisioning, StreamNative is taking the next step to refine the user experience.",[48,28674,28675],{},"In this post, we delve into the user journey within StreamNative Cloud for setting up f a BYOC infrastructure. This setup lays the foundation for creating and managing Pulsar clusters. Let’s explore how StreamNative Cloud simplifies the experience of BYOC.",[48,28677,3931],{},[40,28679,28681],{"id":28680},"introducing-the-self-service-streamnative-byoc-provisioning","Introducing the Self Service StreamNative BYOC Provisioning",[48,28683,28684],{},"We are thrilled to unveil the rollout of self-service  StreamNative Bring Your Own Cloud (BYOC). This new functionality enables customers to set up the infrastructure necessary to deploy StreamNative BYOC in a fully automated way without any SRE involvement. After setting up the BYOC infrastructure, users can create, manage and monitor Pulsar resources within their cloud environments.. This enhancement is designed to provide a more intuitive and efficient experience, empowering users to leverage the full potential of BYOC with ease.",[48,28686,28687],{},"The self-service BYOC setup process is completed in the following three steps.",[48,28689,3931],{},[48,28691,28692],{},[384,28693],{"alt":18,"src":28694},"\u002Fimgs\u002Fblogs\u002F667aefc509692f3f517db075_AD_4nXc-8gH_STln5rtKtSsTkmcvNLbak2L6Oe87yXdfBl0TEMIxJDbyD7yyBBcKhq97UeHDwUH5B_RBq_I3Jq-okHptgzTLCK5XurjXzKgpV-VppkDaid_QUosNavcp3GdC_nKmMnYkr-57WLmGGhJNwjQG0d9-.png",[48,28696,3931],{},[3933,28698,28700],{"id":28699},"setup-byoc-infrastructure-pool","Setup BYOC Infrastructure Pool",[1666,28702,28703],{},[324,28704,28705],{},"Grant Vendor Access to StreamNative: Start by securely granting StreamNative the necessary access to manage your cloud environment. This is an important step which is executed by running the Terraform scripts. 
StreamNative simplifies the configuration of essential policies and roles with a Terraform module.",[48,28707,28708],{},[384,28709],{"alt":18,"src":28710},"\u002Fimgs\u002Fblogs\u002F667aefc5a37a01be941a8bce_AD_4nXetCMuUOz4_T9D4zLR08aS-u3TrmpQlSblbZh1MbehYUm4Qhp5XyE2bYrRBkVyEnMsiJhcagoo5PPKf6j8oabmxJH-qWm2wa0Z4bAY61zdpffhK56KItbcDoK6XuWeAyUM9J4RR7JCkfSs3lN51KpXEYjTy.png",[48,28712,3931],{},[48,28714,28715,28716,28720],{},"This module can be deployed independently (",[55,28717,28719],{"href":19316,"rel":28718},[264],"as detailed here",") or integrated into existing Terraform projects.",[48,28722,3931],{},[1666,28724,28725],{},[324,28726,28727],{},"Create Cloud Connection: Establish a connection to your cloud provider, whether it's AWS, GCP, or Azure.",[48,28729,28730,28731,28735],{},"Once access is granted (described in Step1 above), you can ",[55,28732,28734],{"href":19325,"rel":28733},[264],"establish a Cloud Connection"," , enabling the StreamNative Cloud control plane to interface with your AWS account.",[48,28737,28738],{},"Cloud Connections enable StreamNative to link with your AWS, GCP, or Azure account, setting up your Cloud Environment to operate Pulsar Clusters. You can establish a Cloud Connection using either snctl , StreamNative's Terraform provider or the newly introduced user experience as shown below.",[48,28740,28741],{},"To create a new connection, navigate to User profile menu > Cloud Environments as shown in the figure below",[48,28743,28744],{},[384,28745],{"alt":18,"src":28746},"\u002Fimgs\u002Fblogs\u002F667aefc5278e1173e2979d5e_AD_4nXdCqHahk9wT4GbiQA3d_JhfY126cXzCzoJFdY-6nQYfwhwCt_MR83xTf9JknEqZjK0_VTTNqbZDFPxO_XGbvCGiQY2Itp8leQ83cMUY3Cbi_YTogGicnNNlkvudea7anr1ciKyrM_OQQ8E_fzYeOsJGYWA.png",[48,28748,3931],{},[48,28750,28751],{},"Click on New > Create connection as shown in the figure below",[48,28753,28754],{},[384,28755],{"alt":18,"src":28756},"\u002Fimgs\u002Fblogs\u002F667aefc526b05a94d9366fa1_AD_4nXfNk8SaUg4gbnqItJZBL6RktgGKX-oMK_WPz_bfHM_q_5ZIryDx71TM7sq_h-c_ehMZQNW-cBNGdxoHFYxjLuMXvblBItNAvzy0TrSN6fR6Vxh4hTHAm7gv3vdkxePsoan5vg0NA8jurHb9BzYA7iXR8XA.png",[48,28758,3931],{},[48,28760,28761],{},"Enter the Connection name, AWS account ID and check the box to acknowledge and Confirm the vendor access Terraform module is executed as shown in the figure below.",[48,28763,28764],{},[384,28765],{"alt":18,"src":28766},"\u002Fimgs\u002Fblogs\u002F667aefc524b8a5927793dd54_AD_4nXduJ3h5Q5od3WQDWh9DunlfV4KM1BoF_6bSKqVFszqNk2UBU_rVB7gB9T66A2tWU3XjchlPTV4jE-DpuoPv_QNEpuE083QvdF7AVQcmicOg8uCiMUzIFifIvstExI6xTopWLXzxEHfjUBmmygy8i-n94go.png",[48,28768,3931],{},[1666,28770,28771],{},[324,28772,28773],{},"Create Cloud Environment: Set up your cloud environment, ready for Pulsar resource provisioning.",[48,28775,28776],{},"To create a Cloud Environment, select the Cloud Connection you created in the previous step, as shown in the figure below and click on Environment setup.",[48,28778,28779],{},[384,28780],{"alt":18,"src":28781},"\u002Fimgs\u002Fblogs\u002F667aefc6931c708f315bdfb6_AD_4nXc-ErYgJEeXFRMBvpzo0TuG49Yk7buuliD4V1C3dc1LNJe_WW87-rVoApoWkynEo-6ydx-MlR0u43pNNs5bI0pyX72QRlynNlMt4BYUWauy5TdMZ31zSrPQDhJExgqBUHqGpORUbg39X7Q7VpjVx71Naupk.png",[48,28783,3931],{},[48,28785,28786],{},"Enter the details by selecting the region, zone, Network CIDR and Default Gateway and create the environment as shown in the figure 
below.",[48,28788,3931],{},[48,28790,28791],{},[384,28792],{"alt":18,"src":28793},"\u002Fimgs\u002Fblogs\u002F667aefc69d3cc8e88f8f44e0_AD_4nXfGwimCjLEtlRSp255AufKd0eJpw3bIZ9AuYSj1du8jkTdpmXM1olQxMC95TRNV8VzMt2YEponUFKM2THpiZe2u4eoOPdYOat8UL0ZZViKsELj2y6lEpv8YJsKMTxMzrtjIz6vOOhMKbO7y-Mam8so-MX6j.png",[48,28795,3931],{},[48,28797,28798],{},"Upon submitting all the necessary details to create a Cloud Environment, the user receives an email notification indicating that the Cloud Environment creation is underway, as illustrated in the figure below.",[48,28800,3931],{},[48,28802,28803],{},[384,28804],{"alt":18,"src":28805},"\u002Fimgs\u002Fblogs\u002F667aefc50c146132da7a5634_AD_4nXeiNsAGqQBj6NX5TSnPfvL4lXJt2ywseg1yzsl9BQzGQ46OKlvteXI5d-sVIqObQsQltSIArLX1SpLffM7nxNx6_68WvF4M0qSCqXERgAqmZVtXOwdovMlHBsbChk9gHy--l62vSSkuPUOscr00yrJR0VXL.png",[48,28807,3931],{},[48,28809,28810],{},"Once the cloud environment is successfully created, the user receives a second email notification confirming the completion of this step, as shown in the figure below.",[48,28812,3931],{},[48,28814,28815],{},[384,28816],{"alt":18,"src":28817},"\u002Fimgs\u002Fblogs\u002F667aefc50ae56d3e50995ccc_AD_4nXc095TkgAx5wnrisTmRvyqEvlpDBQuZ9igMzSrhYsVBvKLlw7oMastiDomuOw0vQUAc36GpyVQWJkIW-zzBuTNcmnhat2CWlUdLQ436Oipz3pG8IZN8e_PTFvw1fHRZBtzKhFpdW2C-e4bQ-EFs0WtAMSXE.png",[48,28819,3931],{},[48,28821,28822,28823,28828],{},"Finishing all the steps, you are now ready to ",[55,28824,28827],{"href":28825,"rel":28826},"https:\u002F\u002Fdocs.streamnative.io\u002Fdocs\u002Finstance",[264],"create Pulsar resources"," within the StreamNative Bring Your Cloud Environment.",[48,28830,3931],{},[40,28832,28834],{"id":28833},"completing-the-story-with-multi-interface-support","Completing the Story with Multi-Interface Support",[48,28836,28837],{},"StreamNative Cloud now delivers a comprehensive set of interfaces for managing your BYOC instance with the new self-service experience. In addition to the REST APIs, Command Line Interface via StreamNative CTL, and Terraform interface, the enhanced user experience provides a seamless and intuitive way to manage your Pulsar resources.",[48,28839,28840],{},[384,28841],{"alt":18,"src":28842},"\u002Fimgs\u002Fblogs\u002F667aefc5889377dadb0d63a5_AD_4nXdTaV9FxYVZUGxk01cAaMj_FBkuanHrvHYWnB_Ul21XejDzuE5Aswl5BEAMzG1Sufpp_tTAPOOufJEx0s9FLp-8sRa6ZdvfKhYuiOwyC9RqI7r3A-4dD_OV7L352ANReLciTnukkcvUna5YjSZ-bG1-n_k.png",[48,28844,3931],{},[32,28846,2125],{"id":2122},[48,28848,28849],{},"The self-serviceStreamNative Bring Your Own Cloud cloud is designed to empower you with greater control, enhanced security, and seamless management of your Pulsar resources. 
Embrace the future of data streaming with StreamNative’s BYOC functionality and take full advantage of your cloud environment.",[48,28851,3931],{},{"title":18,"searchDepth":19,"depth":19,"links":28853},[28854,28855],{"id":28680,"depth":19,"text":28681},{"id":28833,"depth":19,"text":28834,"children":28856},[28857],{"id":2122,"depth":279,"text":2125},"2024-06-25","Read this blog to learn about how to get started with StreamNative self service BYOC","\u002Fimgs\u002Fblogs\u002F667ed61a2a3aad2ef910e12c_Blog-Final.jpg",{},{"title":28663,"description":28859},"blog\u002Fstreamnative-introduces-self-service-experience-for-byoc-infrastructure-setup",[10322],"USdp2JonMV2fYQKsQtzgtW1cWGP9EWKvv9JnHzryhhU",{"id":28867,"title":28868,"authors":28869,"body":28871,"category":290,"createdAt":290,"date":28858,"description":29019,"extension":8,"featured":294,"image":29020,"isDraft":294,"link":290,"meta":29021,"navigation":7,"order":296,"path":29022,"readingTime":290,"relatedResources":290,"seo":29023,"stem":29024,"tags":29025,"__hash__":29026},"blogs\u002Fblog\u002Funpacking-the-latest-streaming-announcements-a-comprehensive-analysis.md","Unpacking the Latest Streaming Announcements: A Comprehensive Analysis",[28870],"Jesse Anderson",{"type":15,"value":28872,"toc":29011},[28873,28875,28878,28882,28885,28888,28892,28895,28898,28909,28912,28915,28918,28922,28934,28937,28946,28949,28952,28956,28959,28962,28965,28968,28972,28975,28978,28981,28984,28988,28991,29002,29005,29007,29009],[48,28874,3931],{},[48,28876,28877],{},"It’s conference season, and we’re interpreting the latest announcements from the various streaming vendors. This post will consider the recent StreamNative, Confluent, and WarpStream announcements.",[40,28879,28881],{"id":28880},"is-it-easy-now","Is It Easy Now?",[48,28883,28884],{},"Confluent has never shied away from saying Kafka is “easy,” and I disagree. During the Kafka Summit London Keynote, the speakers said “easy” 17 times; in the Kafka Summit Bangalore Keynote, they said it 18 times. It was said 0 times in the Pulsar Summit EMEA keynote hosted by StreamNative.",[48,28886,28887],{},"We’ll get deeper into the individual announcements, which are primarily related to operational changes. None of the changes by all three companies will affect the ease of architecture or development. This nuance is important because Confluent sends a strong message that will lead management and developers to think things are easy now. They aren’t easy.",[40,28889,28891],{"id":28890},"its-all-about-the-protocol","It’s All About the Protocol",[48,28893,28894],{},"I’ve been saying for a long time that Kafka’s value is in the protocol, and the protocol will outlive Apache Kafka.",[48,28896,28897],{},"Using Confluent Cloud? You’re using the Kafka protocol to connect to Confluent’s Kora, which, in turn, talks to the Kafka Cluster.",[48,28899,28900,28901,28903,28904,190],{},"Want to use the Kafka protocol with Pulsar? You can use ",[55,28902,1332],{"href":10389}," from StreamNative to treat Pulsar as ",[55,28905,28908],{"href":28906,"rel":28907},"https:\u002F\u002Fdocs.streamnative.io\u002Fdocs\u002Fkafka-on-cloud",[264],"a Kafka cluster",[48,28910,28911],{},"Using WarpStream? You’re using the Kafka protocol to connect the WarpStream agents that support the Kafka protocol and write to S3.",[48,28913,28914],{},"We’re going to see increasing competition in the space from others. I think the key will be the vendor’s support of the Kafka protocol. 
The leaders are Confluent Cloud and StreamNative Cloud, which support everything. WarpStream doesn’t support transactions.",[48,28916,28917],{},"The other key will be the extra features the new backend gives us. For example, Pulsar's two-tier architecture removes Kafka’s issues with rebalancing as it is rebalance-free. You’ll also get the built-in replication. Confluent mentioned Kafka supporting queuing in version 4.0. You could wait and see how well it works or have your cake and eat it too with Pulsar’s production-worthy queuing support. IMHO, if something is important enough to use queuing, you’d better be sure that queuing works right.",[40,28919,28921],{"id":28920},"the-keys-to-cost","The Keys to Cost",[48,28923,28924,28925,28930,28931,28933],{},"We’ve all heard of the ",[55,28926,28929],{"href":28927,"rel":28928},"https:\u002F\u002Fen.wikipedia.org\u002Fwiki\u002FCAP_theorem",[264],"CAP Theorem",". StreamNative asked us to consider a ",[55,28932,23294],{"href":18969},": cost, availability, and performance. When creating streaming systems, we can choose two of the three choices. If we’re going to be focused on cost, we will have to give up availability or performance. I think it’s an interesting way of framing the tradeoffs we deal with in streaming systems.",[48,28935,28936],{},"Pulsar is a more attractive choice in this regard because it allows us to mix different cost types on the same cluster. This mixing is made possible by Pulsar’s separation of brokers and storage layers. Time is bearing out that Pulsar’s storage architecture is standing the test of time.",[48,28938,28939,28940,28945],{},"There isn’t any multi-tenancy on Kafka, and brokers are highly coupled with storage. Confluent Cloud’s ",[55,28941,28944],{"href":28942,"rel":28943},"https:\u002F\u002Fwww.vldb.org\u002Fpvldb\u002Fvol16\u002Fp3822-povzner.pdf",[264],"Kora"," simulates multitenancy. You might think about Kora more as a proxy for a Kafka cluster than anything else.",[48,28947,28948],{},"An essential piece of the cost equation is the economy of scale for streaming clusters. At Kafka Summit, one speaker mentioned seeing Kafka clusters 2-3x overprovisioned. This overprovisioning corresponds to my experience in the field, except I usually see multiple Kafka clusters across the organization instead of one with multitenancy—the overprovisioning comes from Kafka’s lack of multi-tenancy and resource separation. Since Pulsar supports multi-tenancy, an enterprise could have a single cluster supporting each team’s load, and a single cluster costs less to maintain than multiple clusters.",[48,28950,28951],{},"It’s worth noting that Kora and Ursa are only available on their respective cloud offerings. However, Pulsar has more built-in functionality, and Kora is adding more necessary functionality to Kafka.",[40,28953,28955],{"id":28954},"cost-reductions","Cost Reductions",[48,28957,28958],{},"StreamNative, Confluent, and WarpStream optimize for cost by focusing on the CA (cost\u002Favailability) rather than the CP (cost\u002Fperformance). The performance tradeoff is using S3 or the cloud provider’s equivalent.",[48,28960,28961],{},"Confluent calls this their Kora Freight cluster. A significant part of a Kafka cluster’s cost arises from the bandwidth costs of replication. Having the data stored directly into S3, S3 will handle the replication. S3 and their equivalents don’t directly charge for replication bandwidth. 
The tradeoff for the low cost is the high latency.",[48,28963,28964],{},"WarpStream only operates by saving data to S3, which decreases the cost, but is at the mercy of S3’s various performance issues, including higher latency. In these scenarios, a lagging consumer can force their agent (“broker”) to read relatively recent data from S3 instead of a faster local disk.",[48,28966,28967],{},"StreamNative’s Ursa can be configured to use S3. The difference is that you can choose whichever namespaces to be stored directly in S3 (the fundamental breakdown in Pulsar is cluster->tenant->namespace->topic). This cost optimization will allow teams to pick which topics they want: CA, CP, or AP (availability\u002Fperformance).",[40,28969,28971],{"id":28970},"now-analyze-it","Now Analyze It",[48,28973,28974],{},"One of the difficulties of streaming systems has been landing the data somewhere to be analyzed. We’ve had many different ways of writing out a topic’s data to S3, such as Kafka Connect, Pulsar Connect, etc. Each one of these methods had tradeoffs, such as how much to write at once or how only to read data that was finished writing. There was a chasm between the pub\u002Fsub-system and the data lake.",[48,28976,28977],{},"Then Apache Iceberg came and changed how we write data. Confluent’s and StreamNative’s strategies are Kafka+Flink+Iceberg and Pulsar+Flink+Iceberg\u002FDelta Lake, respectively. Apache Flink does the processing, while Kafka and Pulsar write directly to S3 in Iceberg format. With batch systems such as Apache Spark, we do not have to start extra processes to write to S3 for reading. This change simplifies the operations of reading real-time data, which translates into cost savings.",[48,28979,28980],{},"While WarpStream writes to S3, it isn’t written in a format for processes other than their agents to read or use. I contacted WarpStream, and they said to stay tuned for forthcoming announcements.",[48,28982,28983],{},"With Databrick’s acquisition of Tabular, which was started by the founders of Iceberg, and Snowflake’s announcement of Polaris Catalog, it looks like we’re in for exciting times. If history repeats, we’re in for another proxy fight as the open source community deals with vendors' competing interests. In these situations, I suggest teams go with the most open option available that supports the most protocols, formats, and technologies. Look for the solution that supports the most possibilities, and you will avoid lock-in.",[40,28985,28987],{"id":28986},"what-does-this-mean-to-you","What Does This Mean to You?",[48,28989,28990],{},"I break these announcements into three categories. If any apply to you, I recommend you take action.",[321,28992,28993,28996,28999],{},[324,28994,28995],{},"Are you discounting other pub\u002Fsub systems because they’re not Apache Kafka? It’s about their support of the Kafka protocol, not the system itself.",[324,28997,28998],{},"Are you under pressure to be more cost-effective? Explore systems writing directly to S3 or, at minimum, use tiered storage.",[324,29000,29001],{},"Are you experiencing the pain of integrating streaming and batch processing? 
Look at using the built-in Iceberg integrations.",[48,29003,29004],{},"Note: This post was sponsored by StreamNative but did not have editorial control.",[48,29006,3931],{},[48,29008,3931],{},[48,29010,3931],{},{"title":18,"searchDepth":19,"depth":19,"links":29012},[29013,29014,29015,29016,29017,29018],{"id":28880,"depth":19,"text":28881},{"id":28890,"depth":19,"text":28891},{"id":28920,"depth":19,"text":28921},{"id":28954,"depth":19,"text":28955},{"id":28970,"depth":19,"text":28971},{"id":28986,"depth":19,"text":28987},"Dive into Jesse Anderson's in-depth analysis of the latest streaming announcements. Explore key insights, industry impacts, and future trends in data streaming technology.","\u002Fimgs\u002Fblogs\u002F679a56bab9d5848713ddea44_article.webp",{},"\u002Fblog\u002Funpacking-the-latest-streaming-announcements-a-comprehensive-analysis",{"title":28868,"description":29019},"blog\u002Funpacking-the-latest-streaming-announcements-a-comprehensive-analysis",[799,821,302],"vz1h58zM6H2hvh3y56GQAL-jQaTNy91PftjhwlEOzTs",{"id":29028,"title":29029,"authors":29030,"body":29031,"category":3550,"createdAt":290,"date":29204,"description":29029,"extension":8,"featured":294,"image":290,"isDraft":294,"link":290,"meta":29205,"navigation":7,"order":296,"path":27778,"readingTime":17161,"relatedResources":290,"seo":29206,"stem":29207,"tags":29208,"__hash__":29209},"blogs\u002Fblog\u002Fempowering-data-sovereignty-with-byoc-taking-control-in-a-cloud-centric-world.md","Empowering Data Sovereignty with BYOC: Taking Control in a Cloud-Centric World",[311],{"type":15,"value":29032,"toc":29192},[29033,29036,29043,29047,29050,29053,29056,29059,29063,29066,29069,29073,29085,29088,29093,29096,29100,29113,29116,29121,29124,29127,29131,29134,29138,29141,29145,29148,29152,29155,29159,29169,29190],[48,29034,29035],{},"In the dynamic world of technology, data sovereignty has emerged as a critical concept with the potential to reshape the future of managed cloud services. As businesses and individuals increasingly turn to the cloud for data storage, processing, and transformation, the issue of data control and governance becomes paramount.",[48,29037,29038,29039,29042],{},"Who holds the reins to your data in the cloud? Are you confident in its sovereignty? StreamNative BYOC offers a robust solution, ensuring data sovereignty while delivering a fully managed modern streaming data platform. It seamlessly scales to accommodate your growing data streams without the accompanying worries.In this comprehensive blog post, we will delve into the intricacies of data sovereignty, its growing significance, its implications for managed cloud services, and how we address data sovereignty through our ",[55,29040,29041],{"href":18949},"BYOC (Bring Your Own Cloud)"," deployment option on StreamNative.",[40,29044,29046],{"id":29045},"understanding-data-sovereignty","Understanding Data Sovereignty",[48,29048,29049],{},"Data sovereignty revolves around the principle that data is subject to the laws and regulations of the country or jurisdiction in which it physically resides. In simpler terms, data must adhere to the rules and governance structures of its physical location. This concept has gained prominence in response to global data privacy concerns and regulations that have surged due to the increasing digitalization of our lives.",[48,29051,29052],{},"One of the primary reasons for the rise of data sovereignty is the increasing importance of data privacy regulations. 
Initiatives like the General Data Protection Regulation (GDPR) in the European Union, the California Consumer Privacy Act (CCPA) in the United States, and similar laws worldwide mandate stringent data protection measures, including processing data within specific jurisdictions. Data sovereignty ensures compliance with these regulations.",[48,29054,29055],{},"In a world where data breaches and cyber threats loom large, data sovereignty empowers organizations to exercise greater control over the security of their data. By keeping data within their borders, they can implement security measures tailored to their specific needs.",[48,29057,29058],{},"Moreover, in the event of a legal dispute or government investigation, data sovereignty allows organizations to maintain access to their data without relying on third-party providers in other jurisdictions. This safeguards business continuity and mitigates the risk of data disruptions.",[40,29060,29062],{"id":29061},"data-privacy-vs-data-sovereignty","Data Privacy vs. Data Sovereignty",[48,29064,29065],{},"It's important to distinguish between data privacy and data sovereignty. Data privacy involves simple methods like access control and data policies, which protect specific personally identifiable information (PII) through clear actions such as deletion, masking, obfuscation, and indexing.",[48,29067,29068],{},"On the other hand, data sovereignty is fundamentally about an organization's ability to control the lifecycle of the resources that store its data. Essentially, there are no gray areas; data is either stored on resources that you control, or it isn't.",[40,29070,29072],{"id":29071},"streamnatives-deployment-options","StreamNative’s Deployment Options",[48,29074,29075,29076,29080,29081,190],{},"In StreamNative Cloud, there are two approaches to achieve data sovereignty. The first is through our ",[55,29077,29079],{"href":29078},"\u002Fdeployment\u002Fprivate-cloud-license","Private Cloud License",", a self-managed product offering installed on Kubernetes, on-premises or across hybrid environments. The Private Cloud offers the utmost privacy, security, and data sovereignty, because you retain full control over the message lifecycle, from message backlog quotas to retention policies.This choice is particularly well-suited for those following the emerging trend of \"cloud repatriation,\" which involves migrating resources back to on-premises or private cloud infrastructure. However, it's important to note that this approach may entail trade-offs, potentially sacrificing some of the operational, cost, and scalability benefits associated with a fully SaaS offering like ",[55,29082,29084],{"href":29083},"\u002Fdeployment\u002Fhosted","StreamNative Hosted",[48,29086,29087],{},"StreamNative Hosted, on the other hand, is a SaaS offering hosted on Pulsar clusters on StreamNative’s public cloud infrastructure. This offering exposes the Pulsar clusters to users as a SaaS, accessible through public and private networking options, and compatible with both Pulsar and Kafka APIs. 
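To make the multi-protocol point concrete, here is a minimal sketch (not an official StreamNative example) of writing data with a Kafka-protocol client and reading the same data back with a Pulsar client. The endpoints, the topic names, and the assumption that the Kafka topic `orders` maps to `persistent://public/default/orders` under the default tenant and namespace are illustrative placeholders, and authentication/TLS settings are omitted for brevity.

```python
# Sketch only: one topic, two protocols. Endpoints and topic mapping are placeholders.
from kafka import KafkaProducer   # pip install kafka-python
import pulsar                     # pip install pulsar-client

KAFKA_BOOTSTRAP = "your-cluster.example.cloud:9093"           # hypothetical Kafka endpoint
PULSAR_URL = "pulsar+ssl://your-cluster.example.cloud:6651"   # hypothetical Pulsar endpoint

# Produce over the Kafka wire protocol.
producer = KafkaProducer(bootstrap_servers=KAFKA_BOOTSTRAP)
producer.send("orders", b'{"order_id": 1, "status": "created"}')
producer.flush()

# Consume the same data over the Pulsar protocol.
client = pulsar.Client(PULSAR_URL)
consumer = client.subscribe(
    "persistent://public/default/orders",   # assumed mapping of the Kafka topic
    subscription_name="analytics-sub",
)
msg = consumer.receive(timeout_millis=10000)
print(msg.data())
consumer.acknowledge(msg)
client.close()
```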
The diagram provided below illustrates the deployment model for StreamNative Hosted.",[48,29089,29090],{},[384,29091],{"alt":18,"src":29092},"\u002Fimgs\u002Fblogs\u002F6671a6bb98e49774cf991d83_AD_4nXcNqMQlY9IIkRP7i__b0gI-8I_1QOOpjoPFFvig1Zw4xqoHULIvXMwg_KUaU2g2cMPwcKPh4HWdYiL3jcbjRRJA4HXV61auqCiV8YzoPTcqVNGcyL2ItED21Dg8gO3NRGxQoepBC_CGvzK_zVvox6NSVHfp.png",[48,29094,29095],{},"While StreamNative Hosted delivers a seamless SaaS experience for users, enabling them to leverage Data Streaming as a service, it introduces a challenge regarding data sovereignty for certain regulated industries.  For compliance reasons, if organizations have to keep sensitive applications on-premises indefinitely, SaaS might be a difficult choice for them.",[40,29097,29099],{"id":29098},"bring-your-own-cloud-byoc-achieving-streaming-data-sovereignty-for-managed-cloud-services","Bring-Your-Own-Cloud (BYOC): Achieving Streaming Data Sovereignty for Managed Cloud Services",[48,29101,29102,29103,29106,29107,29109,29110,29112],{},"Enter ",[55,29104,29105],{"href":18949},"Bring Your Own Cloud (BYOC)",", the third deployment option of StreamNative, which offers a third path that strikes a balance between self-managed ",[55,29108,29079],{"href":29078}," and fully-managed ",[55,29111,29084],{"href":29083}," service. BYOC provides the same fully managed experience as StreamNative Hosted while preserving data sovereignty.",[48,29114,29115],{},"In the BYOC deployment model, an organization's data remains within its virtual private cloud (VPC) while StreamNative’s control plane operates and maintains the software as a service remotely. This approach grants customers’ infrastructure teams greater visibility and control than a pure StreamNative Hosted model, all while allowing them to offload time-consuming and resource-intensive operational tasks to us. This model additionally frees teams to concentrate on critical business opportunities. The diagram below illustrates a BYOC cluster deployment in StreamNative Cloud.",[48,29117,29118],{},[384,29119],{"alt":18,"src":29120},"\u002Fimgs\u002Fblogs\u002F6671a6bb98e49774cf991d7f_AD_4nXePo2ukQChYjb95y0-MIijJViBqcXJWTvsGb0Q-d9Oad6Z9x6UX1gnzeG4P78PXWiaJNvHMaQ218anUhIhrF3BApnfYqjrEHygioGn0MOp07zaiP0IAmsFM1SCRd1Ha5fNZ-u9KyWUNOcbQ_CBdiYTRv-iE.png",[48,29122,29123],{},"Visibility, control, and operations are critical factors when managed Data Streaming services underpin an organization's streaming data infrastructure. Many data streaming infrastructure teams grapple with the complexity of supporting real-time data streaming workloads at scale in the cloud, often involving the maintenance of numerous Kafka or Pulsar clusters across cloud providers with a multi-availability zone setup.",[48,29125,29126],{},"Simultaneously, they contend with data sovereignty challenges as data regulations become more demanding. A BYOC model proves ideal for navigating compliance and regulatory requirements for real-time streaming data infrastructure, as the data plane remains within the customer's virtual private cloud, with StreamNative's control plane managing cluster operations.",[40,29128,29130],{"id":29129},"benefits-of-streamnative-byoc","Benefits of StreamNative BYOC",[48,29132,29133],{},"StreamNative BYOC bridges the gap between the self-managed private cloud and the fully-managed StreamNative Hosted models. It combines the convenience of a fully managed SaaS experience with the control and adaptability of self-management. 
StreamNative BYOC enables you to implement security measures tailored to your environment, reducing the burden of managing platform infrastructure while allowing you to delegate operational, support, and maintenance responsibilities to trusted data streaming experts.",[32,29135,29137],{"id":29136},"maintaining-control-while-enjoying-the-saas-experience","Maintaining Control While Enjoying the SaaS Experience",[48,29139,29140],{},"StreamNative BYOC offers a fully managed service that replicates the SaaS experience but enhances control over your data. This is achieved by separating the control plane, hosted in StreamNative’s environment, from the data plane, which resides within your own infrastructure. This structure ensures continuous operation and data accessibility, even if StreamNative's control plane goes offline.",[32,29142,29144],{"id":29143},"cost-efficiency-through-existing-cloud-commitments","Cost Efficiency Through Existing Cloud Commitments",[48,29146,29147],{},"Cloud providers often offer discounts for committed spending or usage. StreamNative BYOC allows organizations to capitalize on these discounts as though they were hosting the services themselves, thereby optimizing their cloud spend management.",[32,29149,29151],{"id":29150},"enhanced-security-and-compliance","Enhanced Security and Compliance",[48,29153,29154],{},"StreamNative BYOC not only addresses data sovereignty but also helps organizations adhere to stringent data privacy regulations. It employs zero-trust access control and isolated, protected clusters to provide robust security. This setup supports the enforcement of multiple security layers, all managed by your team. Additionally, StreamNative BYOC ensures that the principle of least privilege is maintained, as StreamNative’s control plane does not possess excessive credentials or permissions, bolstering overall security.",[40,29156,29158],{"id":29157},"choosing-the-right-option-to-handle-the-hybrid-world","Choosing the Right Option to Handle the Hybrid World",[48,29160,29161,29162,29165,29166,29168],{},"Data streaming teams are currently navigating a complex landscape marked by a wide array of technologies, escalating cloud costs, and an increase in service options. Added to these challenges is the need to address data sovereignty. StreamNative Cloud provides a variety of ",[55,29163,29164],{"href":27773},"deployment choices"," powered by the ",[55,29167,5579],{"href":10389}," that support both Pulsar and Kafka protocols, enabling organizations to choose the deployment that best meets their needs for data privacy, sovereignty, and cost-efficiency. For organizations that want the advantages of a self-managed solution—including control, observability, and governance—without the associated complexity and risk, StreamNative BYOC offers a compelling solution.",[48,29170,29171,29172,29177,29178,29180,29181,29184,29185,29189],{},"To get started, you have the option to ",[55,29173,29176],{"href":29174,"rel":29175},"https:\u002F\u002Fconsole.streamnative.cloud\u002F?defaultMethod=signup",[264],"sign up"," for StreamNative Cloud or ",[55,29179,24379],{"href":6392}," to initiate a trial of BYOC or explore our ",[55,29182,29183],{"href":29078},"Private Cloud"," distribution. 
",[55,29186,10265],{"href":29187,"rel":29188},"https:\u002F\u002Fhs.streamnative.io\u002Fstreamnative-roadmap-webinar-for-q3-2024",[264]," for our product launch webinar to learn more about BYOC on June 25th.",[48,29191,3931],{},{"title":18,"searchDepth":19,"depth":19,"links":29193},[29194,29195,29196,29197,29198,29203],{"id":29045,"depth":19,"text":29046},{"id":29061,"depth":19,"text":29062},{"id":29071,"depth":19,"text":29072},{"id":29098,"depth":19,"text":29099},{"id":29129,"depth":19,"text":29130,"children":29199},[29200,29201,29202],{"id":29136,"depth":279,"text":29137},{"id":29143,"depth":279,"text":29144},{"id":29150,"depth":279,"text":29151},{"id":29157,"depth":19,"text":29158},"2024-06-18",{},{"title":29029,"description":29029},"blog\u002Fempowering-data-sovereignty-with-byoc-taking-control-in-a-cloud-centric-world",[10322],"8IdUeafEqjBaQM-VsCyR0NPqhzpAvnTCeS6P6Co64-4",{"id":29211,"title":29212,"authors":29213,"body":29215,"category":3550,"createdAt":290,"date":29307,"description":29308,"extension":8,"featured":294,"image":29309,"isDraft":294,"link":290,"meta":29310,"navigation":7,"order":296,"path":29311,"readingTime":5505,"relatedResources":290,"seo":29312,"stem":29313,"tags":29314,"__hash__":29315},"blogs\u002Fblog\u002Fstreamnative-achieves-iso-27001-certification-elevating-data-security-standards.md","StreamNative Achieves ISO 27001 Certification: Elevating Data Security Standards",[29214],"Riva Dunn",{"type":15,"value":29216,"toc":29300},[29217,29220,29224,29227,29231,29234,29237,29241,29244,29248,29262,29266,29269,29292,29295,29298],[48,29218,29219],{},"We are thrilled to announce that StreamNative has successfully obtained ISO 27001 certification, a globally recognized standard for information security management. This significant milestone underscores our unwavering commitment to protecting our clients' data and maintaining the highest standards of security across all our operations.",[40,29221,29223],{"id":29222},"what-is-iso-27001-certification","What is ISO 27001 Certification?",[48,29225,29226],{},"ISO 27001 certification is awarded to organizations that meet rigorous criteria for establishing, implementing, maintaining, and continually improving an information security management system (ISMS). The ISMS is a systematic approach to managing sensitive company information so that it remains secure. It includes people, processes, and IT systems by applying a risk management process.",[40,29228,29230],{"id":29229},"how-we-achieved-iso-27001-certification","How We Achieved ISO 27001 Certification",[48,29232,29233],{},"ISO 27001 certification involves a comprehensive process to establish and maintain an Information Security Management System (ISMS). Prior to undergoing the external audit, we spent several months ensuring the necessary security controls, policies and procedures, assessments, and testing were in place and would meet the requirements of the ISO 27001 standard.",[48,29235,29236],{},"Once prepared, our external auditor, Prescient, conducted a two-stage audit: reviewing documentation and assessing implementation.",[40,29238,29240],{"id":29239},"benefits-to-our-customers","Benefits to Our Customers",[48,29242,29243],{},"Obtaining ISO 27001 certification offers several benefits to our customers. Our robust security controls help protect against data breaches and cyber threats, and we proactively identify and mitigate potential security risks to safeguard customer data. 
Our certification ensures adherence to global data protection regulations and industry standards, providing customers with confidence that their sensitive information is handled with the highest level of security.",[40,29245,29247],{"id":29246},"key-benefits-of-our-iso-27001-certification","Key Benefits of Our ISO 27001 Certification",[321,29249,29250,29253,29256,29259],{},[324,29251,29252],{},"Enhanced Data Security: We have implemented comprehensive security controls to protect against data breaches and cyber threats.",[324,29254,29255],{},"Risk Management: Our ISMS enables us to proactively identify and mitigate potential security risks.",[324,29257,29258],{},"Compliance: StreamNative's certification ensures that we comply with global data protection regulations and industry standards.",[324,29260,29261],{},"Customer Confidence: Our clients can have greater confidence in our ability to protect their sensitive information and maintain the integrity of our services.",[40,29263,29265],{"id":29264},"streamnative-security-features","StreamNative Security features",[48,29267,29268],{},"At StreamNative, we understand the paramount importance of data security in today's digital landscape. That's why we've implemented a range of robust security features to ensure the confidentiality, integrity, and availability of our clients' data.",[1666,29270,29271,29274,29277,29280,29283,29286,29289],{},[324,29272,29273],{},"End-to-End Encryption: We employ end-to-end encryption protocols to safeguard data in transit, ensuring that sensitive information remains secure as it travels between systems.",[324,29275,29276],{},"Access Controls: Our platform features granular access controls, allowing administrators to define and enforce access policies based on roles and responsibilities. This ensures that only authorized personnel can access sensitive data and resources.",[324,29278,29279],{},"Audit Logging: We maintain comprehensive audit logs that capture all user activities within our platform. This enables real-time monitoring and analysis of system events, facilitating rapid detection and response to security incidents.",[324,29281,29282],{},"Regular Security Audits: We conduct regular security audits and assessments to identify potential vulnerabilities and weaknesses in our systems. This proactive approach allows us to address security issues promptly and continuously improve our security posture.",[324,29284,29285],{},"Threat Detection and Prevention: Leveraging advanced threat detection technologies, we continuously monitor our systems for suspicious activities and potential security threats. In addition, we employ proactive measures to prevent unauthorized access and mitigate security risks.",[324,29287,29288],{},"Secure Data Storage: Our data storage infrastructure is designed with security in mind, utilizing encryption at rest and robust access controls to protect data stored within our systems.",[324,29290,29291],{},"Compliance Frameworks: StreamNative adheres to industry best practices and regulatory requirements, including GDPR, HIPAA, and CCPA. Our commitment to compliance ensures that our clients' data is handled in accordance with relevant data protection regulations.",[48,29293,29294],{},"We invested significant resources to enhance our security protocols, conduct risk assessments, and implement security measures to safeguard our clients' data. 
This achievement would not have been possible without the hard work and dedication of our entire team.",[48,29296,29297],{},"As we continue to grow, maintaining high standards of information security will remain a priority. We are committed to improving our security practices to meet the needs of our clients and the changing landscape of information security.",[48,29299,3931],{},{"title":18,"searchDepth":19,"depth":19,"links":29301},[29302,29303,29304,29305,29306],{"id":29222,"depth":19,"text":29223},{"id":29229,"depth":19,"text":29230},{"id":29239,"depth":19,"text":29240},{"id":29246,"depth":19,"text":29247},{"id":29264,"depth":19,"text":29265},"2024-06-13","StreamNative obtained an ISO 27001 certification","\u002Fimgs\u002Fblogs\u002F666a1581aa76404cb2703daa_ISO_IEC-27001_2022.png",{},"\u002Fblog\u002Fstreamnative-achieves-iso-27001-certification-elevating-data-security-standards",{"title":29212,"description":29308},"blog\u002Fstreamnative-achieves-iso-27001-certification-elevating-data-security-standards",[4301],"49W9cifHs3LHNSWxMTlujS9ch3DUi3tgiYFOVmUQumY",{"id":29317,"title":29318,"authors":29319,"body":29320,"category":290,"createdAt":290,"date":29563,"description":29564,"extension":8,"featured":294,"image":29565,"isDraft":294,"link":290,"meta":29566,"navigation":7,"order":296,"path":27575,"readingTime":11508,"relatedResources":290,"seo":29567,"stem":29568,"tags":29569,"__hash__":29570},"blogs\u002Fblog\u002Fdata-streaming-for-generative-ai.md","Streaming Data into the Future of Generative AI",[806],{"type":15,"value":29321,"toc":29552},[29322,29325,29334,29338,29341,29344,29347,29361,29364,29367,29371,29374,29377,29380,29398,29412,29416,29419,29435,29438,29442,29450,29465,29473,29477,29480,29483,29487,29490,29504,29511,29515,29518,29532,29536,29539,29542],[48,29323,29324],{},"Generative AI is revolutionizing the tech landscape, offering businesses unprecedented capabilities like hyper-personalization, data monetization, and enhanced customer interactions. The backbone of generative AI is its reliance on large language models (LLMs), which are trained on vast datasets to create outputs that reflect learned patterns.",[48,29326,29327,29328,29333],{},"This transformative potential of generative AI, with ",[55,29329,29332],{"href":29330,"rel":29331},"https:\u002F\u002Fwww.mckinsey.com\u002Fcapabilities\u002Fmckinsey-digital\u002Four-insights\u002Fthe-economic-potential-of-generative-ai-the-next-productivity-frontier#introduction",[264],"McKinsey estimating"," it could contribute between $2.6 and $4.4 trillion to the global economy annually. However, to realize this potential, we must use LLMs effectively with the right real-time data. This is because integrating generative AI into business needs domain-specific, real-time data. This is particularly crucial in fields such as customer service, where the relevance and timeliness of the data can significantly impact the quality of service delivered. For example, an airline customer service agent using a generative AI tool needs current, specific information about flight statuses and company policies to provide accurate assistance. 
Why is that?",[40,29335,29337],{"id":29336},"llms-trained-on-general-data-require-domain-specific-real-time-inputs","LLMs trained on general data require domain-specific, real-time inputs.",[48,29339,29340],{},"Large language models (LLMs) are trained on extensive public datasets, but to address specific queries such as, \"Is my flight delayed?\" or \"Can I upgrade to first class?\", they require domain-specific, real-time data. For instance, the answers to such questions depend on personal details about the traveler, the airline, and the flight timing. LLM cannot resolve these issues independently, as it is trained on public, historical data and cannot access private, real-time data.",[48,29342,29343],{},"This limitation cannot be overcome merely by enhancing OpenAI's capabilities or by integrating ChatGPT with search engines like Bing, which access only publicly available information. Instead, the airline must securely integrate its internal data sources with LLM to provide accurate, real-time responses to customer inquiries. This approach diverges significantly from traditional machine learning infrastructure.",[48,29345,29346],{},"In traditional machine learning setups, most data engineering tasks are performed during model training, using a specific dataset to optimize the model through feature engineering. Once trained, the model is generally static and tailored to a particular task. In contrast, LLMs utilize massive general datasets, allowing for a broad and reusable model created through deep learning algorithms. This shift means that LLMs, such as those used by OpenAI and Google, rely on continual prompt-based training rather than one-time, problem-specific training. Consequently, data engineering must handle real-time data streams to ensure prompt accuracy.",[48,29348,29349,29350,4003,29355,29360],{},"The shift to data streaming is crucial as companies adapt LLMs to their specific domains through ",[55,29351,29354],{"href":29352,"rel":29353},"https:\u002F\u002Faws.amazon.com\u002Fwhat-is\u002Fprompt-engineering\u002F",[264],"prompt engineering",[55,29356,29359],{"href":29357,"rel":29358},"https:\u002F\u002Fen.wikipedia.org\u002Fwiki\u002FFine-tuning_(deep_learning)",[264],"fine-tuning"," techniques. Prompt engineering involves crafting textual inputs that effectively communicate with LLMs, anchoring the AI in a domain-specific context to enhance accuracy and narrow the scope of semantic interpretation. Alternatively, fine-tuning adjusts pre-trained models with targeted datasets, aligning them more closely with specific business needs. However, fine-tuning can sometimes overwrite previous knowledge, potentially degrading the model’s performance on tasks it was originally trained on.",[48,29362,29363],{},"Effective use of LLMs in domains such as customer service requires adapting the model to handle industry-specific queries and ensuring that the model can access and utilize real-time data. For example, an AI assistant designed to manage flight delays must be informed about current specifics, not just general data about a train. This necessitates a system where data flows in real-time to the LLM at the moment of request, enabling truly intelligent, automated responses. 
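As a rough illustration of what "data flowing to the LLM at the moment of request" can look like, the sketch below consumes the latest flight-status event from a stream and injects it into the prompt. The topic, subscription, event field names, and the `ask_llm()` call are all hypothetical placeholders, not part of any StreamNative API.

```python
# Illustrative sketch: enrich an LLM prompt with a real-time event read from a stream.
import json
import pulsar  # pip install pulsar-client

client = pulsar.Client("pulsar://localhost:6650")
consumer = client.subscribe(
    "persistent://public/default/flight-status",  # placeholder topic
    subscription_name="assistant-sub",
)

# Read the most recent flight-status event (e.g. produced by the airline's ops systems).
msg = consumer.receive(timeout_millis=5000)
event = json.loads(msg.data())
consumer.acknowledge(msg)
client.close()

# Anchor the model in current, domain-specific context before asking the question.
prompt = (
    f"Flight {event['flight']} status as of {event['updated_at']}: {event['status']}.\n"
    "Using only the status above, answer the passenger's question: 'Is my flight delayed?'"
)
# ask_llm(prompt)  # hypothetical call to whichever LLM API is in use
```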
This real-time data integration unlocks AI's full potential in domain-specific applications.",[48,29365,29366],{},"We live in a world that needs data streaming more than ever.",[40,29368,29370],{"id":29369},"data-streaming-enabling-real-time-generative-ai-applications","Data Streaming: Enabling Real-Time Generative AI Applications",[48,29372,29373],{},"If you liken LLMs to rockets, then data streaming is the fuel that powers them. Without the real-time, business-specific, highly contextual knowledge provided by data streams, no LLM can function effectively.",[48,29375,29376],{},"Generative AI is transforming how we approach data engineering, business operations, and interactions with data. Data streaming catalyzes this change by enabling real-time generative AI applications not constrained by where the data lives. It liberates data from various silos, making it readily available and accessible for generative AI applications.",[48,29378,29379],{},"At StreamNative, we developed the ONE StreamNative Platform—a data streaming platform designed to ensure that the right data is available at the right place and time by routing relevant data streams anywhere they’re needed in the business, all in real-time.",[48,29381,29382,29383,29388,29389,29391,29392,29397],{},"Two weeks ago at the ",[55,29384,29387],{"href":29385,"rel":29386},"https:\u002F\u002Fwww.youtube.com\u002Fplaylist?list=PLqRma1oIkcWjlQzScK7p3jaR6JOzbuqUj",[264],"Pulsar Summit",", we were excited to introduce the ",[55,29390,5579],{"href":10389}," during our keynote presentation. ",[55,29393,29396],{"href":29394,"rel":29395},"https:\u002F\u002Fyoutu.be\u002FsSLIQrQ-Owk?list=PLqRma1oIkcWjlQzScK7p3jaR6JOzbuqUj&t=1616",[264],"As Matteo highlighted",", Ursa is vital to our grand data streaming vision. We believe that there are four core pillars of a data streaming platform essential for helping enterprises achieve real success with real-time data streams:",[321,29399,29400,29403,29406,29409],{},[324,29401,29402],{},"Stream: This foundational layer stores data streams and supplies real-time data feeds to other applications or services.",[324,29404,29405],{},"Connect: This feature enables the integration of segregated data sources with a data streaming platform, facilitating the flow of domain-specific and real-time knowledge into your business operations.",[324,29407,29408],{},"Secure: Our platform is designed to ensure that data stream access is secure and trustworthy. Robust governance ensures you know the data’s origin and lineage, creating a reliable data stream that teams can trust and access securely.",[324,29410,29411],{},"Everywhere: In today’s complex and hybrid environments, a data streaming platform must be versatile enough to operate anywhere to effectively deliver the right data to the right place at the right time.",[32,29413,29415],{"id":29414},"stream-deliver-fresh-data-as-streams","Stream: Deliver Fresh Data as Streams",[48,29417,29418],{},"The foundation of a data streaming platform is a store that stores data streams and offers the same dataset as real-time data feeds to other applications and services. Ursa is the core data streaming engine that fulfills our technology vision to enable data sharing across different teams, departments, and organizations. 
The Ursa engine provides the following major capabilities:",[321,29420,29421,29429,29432],{},[324,29422,29423,29428],{},[55,29424,29427],{"href":29425,"rel":29426},"https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=wWjcIeGAvoU&list=PLqRma1oIkcWjlQzScK7p3jaR6JOzbuqUj&index=2",[264],"Kafka API Compatible",": Ursa is Kafka API compatible, allowing you to continue using the Kafka API to build your streaming applications without needing to rewrite them. Additionally, Ursa is a multi-protocol engine that supports Pulsar and MQTT. This flexibility lets you choose the protocol that best meets your business needs, enabling you to utilize the Kafka ecosystem immediately and focus on building your generative AI applications.",[324,29430,29431],{},"Built on Top of Lakehouse: Ursa maximizes the capability for enabling data sharing by storing data streams in Lakehouse table formats. This compatibility with open lakehouse formats means you don’t need to create bespoke integrations to integrate data streams into data lakes, ensuring data freshness for training your models.",[324,29433,29434],{},"Designed for a Hybrid World: The Ursa engine is not merely designed for on-premises or solely for cloud environments. It adheres to architectural principles suited for hybrid settings, offering latency-optimized and cost-optimized data streams for various workloads and environments. This flexibility allows you to balance trade-offs between latency (performance), availability, and cost.",[48,29436,29437],{},"Overall, the Ursa engine offers a cost-effective solution to provide fresh data as streams for your business, allowing you to allocate saved capital toward advancing your generative AI journey.",[32,29439,29441],{"id":29440},"connect-bring-domain-specific-and-real-time-data-to-your-business","Connect: Bring Domain-Specific and Real-Time Data to Your Business",[48,29443,29444,29445,29449],{},"While Ursa supports multiple protocols, allowing users to choose how they write their streaming applications, not every piece of software is already designed with data streaming in mind. Some data generated by legacy software or other methods remains crucial for powering your generative AI applications. ",[55,29446,28572],{"href":29447,"rel":29448},"https:\u002F\u002Fdocs.streamnative.io\u002Fhub",[264],", including those specific to domain knowledge built using tools like Pulsar Functions, are vital for linking domain-specific data from various silos to a data streaming platform. These connectors make domain-specific knowledge readily available and easily accessible to generative AI-enabled applications.",[48,29451,29452,29453,29458,29459,29464],{},"Kafka Connect and ",[55,29454,29457],{"href":29455,"rel":29456},"https:\u002F\u002Fdocs.streamnative.io\u002Fdocs\u002Fconnector-overview",[264],"Pulsar I\u002FO"," are two common frameworks used to facilitate the integration of data from disparate silos into a data streaming platform. Traditionally, StreamNative has supported only Pulsar I\u002FO connectors. However, as announced at the Pulsar Summit, we are enhancing the ",[55,29460,29463],{"href":29461,"rel":29462},"https:\u002F\u002Ffunctionmesh.io\u002F",[264],"Function Mesh"," framework to create a unified connector framework that can accommodate both Kafka Connect and Pulsar I\u002FO connectors. This development means you no longer need to consider whether a connector is specifically for Pulsar or Kafka. 
The unified connectors are designed to efficiently transport data into and out of a data streaming platform, delivering domain-specific, real-time data to your generative AI applications.",[48,29466,29467,29472],{},[55,29468,29471],{"href":29469,"rel":29470},"https:\u002F\u002Fdocs.streamnative.io\u002Fdocs\u002Ffunction-develop-wasm",[264],"WASM"," is another significant innovation in our connector space. It enables users to write transformation logic in any programming language of their choice.",[32,29474,29476],{"id":29475},"secure-ensure-data-is-secured-and-trusted","Secure: Ensure Data is Secured and Trusted",[48,29478,29479],{},"While the \"Stream\" pillar provides the engine for storing data streams effectively, and \"Connect\" facilitates the integration of different systems with data streams to deliver domain-specific, real-time data to your business, the \"Secure\" aspect focuses on ensuring that access to your data is both secure and trustworthy. Data streaming platforms enforce robust governance measures so you know the origin and lineage of your data. With this knowledge, you have a reliable data stream that teams can confidently trust and access securely.",[48,29481,29482],{},"Features such as multi-tenancy and role-based access control are foundational to guaranteeing data security and trust. These features help manage and safeguard access, ensuring that only authorized personnel have the right level of interaction with sensitive information.",[32,29484,29486],{"id":29485},"everywhere-deploy-anywhere-to-handle-the-complex-and-hybrid-world","Everywhere: Deploy Anywhere to Handle the Complex and Hybrid World",[48,29488,29489],{},"This introduces the final pillar of data streaming: the necessity to manage data that might be generated and stored in diverse locations, including different places, data centers, or cloud providers across the globe. To ensure the right data is delivered to the right places at the right time, data streaming platforms need the capability to be deployed anywhere the business requires.",[48,29491,29492,29493,29496,29497,29499,29500,29503],{},"The ONE StreamNative platform was purposefully designed based on Kubernetes, allowing it to be deployed anywhere Kubernetes can run. In addition to these cloud-native capabilities, StreamNative offers various deployment options, ranging from ",[55,29494,29495],{"href":29083},"SaaS"," to ",[55,29498,10322],{"href":18949}," (Bring Your Own Cloud) to ",[55,29501,29502],{"href":29078},"Private Cloud licenses",". This flexibility lets you choose the best deployment option for your business needs.",[48,29505,29506,29507,29510],{},"Beyond our existing cloud offerings, we have recently expanded our services to include ",[55,29508,6869],{"href":29509},"\u002Fblog\u002Fstreamnative-cloud-supports-microsoft-azure"," and will soon introduce self-service BYOC capabilities in our UI.",[32,29512,29514],{"id":29513},"all-four-pillars-together","All Four Pillars Together",[48,29516,29517],{},"By integrating all four pillars, data streaming platforms are a crucial solution, providing the necessary infrastructure to support real-time, generative AI applications. These platforms facilitate the seamless flow of targeted data streams, ensuring that large language models (LLMs) receive the most relevant and current information. This capability is essential for maintaining the accuracy and reliability of AI-driven solutions, as it enables immediate responses to changing conditions and inputs. 
Data streaming platforms enable real-time generative applications at scale by offering the following:",[321,29519,29520,29523,29526,29529],{},[324,29521,29522],{},"Integrating diverse operational data in real time enhances the reliability and usability of business-specific knowledge.",[324,29524,29525],{},"The organization of unstructured data into structured formats that are more easily processed by AI systems.",[324,29527,29528],{},"Decoupling customer-facing applications from backend AI processes allows for scalable and efficient customer interactions.",[324,29530,29531],{},"The modular architecture supports ongoing technological upgrades without disrupting existing operations.",[40,29533,29535],{"id":29534},"enable-data-streaming-throughout-the-organization","Enable Data Streaming Throughout the Organization",[48,29537,29538],{},"Generative AI represents a paradigm shift for the entire software and tech industry. It not only changes how we interact with data but also how we engage with people. No matter what generative AI applications you build, they should not be treated as another traditional engineering project. Instead, there needs to be a shift in mindset of how to use data, from batch processing to data streaming, enabling data to flow throughout the organization. This approach allows for the selective incorporation of valuable data as needed, fostering experimentation and adaptation—treating it like modular building blocks.",[48,29540,29541],{},"Traditional project-based engineering approaches, which often rely on periodic data updates, can lead to outdated or irrelevant data. In contrast, data streaming offers a dynamic and continuous data integration strategy. This approach meets the immediate needs of generative AI applications and facilitates rapid adaptation and experimentation with new data sources and AI models.",[48,29543,29544,29545,5157,29548,29551],{},"Ultimately, embracing data streaming is not just about enhancing current capabilities but is a strategic move towards future-proofing business operations and leveraging real-time data for competitive advantage. Organizations should consider incorporating data streaming into their operational model to fully harness the potential of generative AI, ensuring they remain at the forefront of technological innovation and service excellence. StreamNative supports your transition to generative AI with the ",[55,29546,29547],{"href":10259},"most cost-effective data streaming platform",[55,29549,29550],{"href":6392},"Talk to us"," if you want to learn more about data streaming and generative AI.",{"title":18,"searchDepth":19,"depth":19,"links":29553},[29554,29555,29562],{"id":29336,"depth":19,"text":29337},{"id":29369,"depth":19,"text":29370,"children":29556},[29557,29558,29559,29560,29561],{"id":29414,"depth":279,"text":29415},{"id":29440,"depth":279,"text":29441},{"id":29475,"depth":279,"text":29476},{"id":29485,"depth":279,"text":29486},{"id":29513,"depth":279,"text":29514},{"id":29534,"depth":19,"text":29535},"2024-05-31","GenAI continues to be top of mind for many companies. However, most are coming to realize that LLMs don’t stand alone. RAG, or retrieval-augmented generation, has emerged as the common pattern for GenAI to extend the powerful LLM models to domain-specific data sets in a way that avoids hallucination and allows granular access controls. Data streaming platforms play a pivotal role in enriching RAG-enabled workloads with contextual and trustworthy data. 
This enables companies to tap into a continuous stream of real-time data from the systems that power the business and transform it into the right format to be used by vector databases for AI applications.","\u002Fimgs\u002Fblogs\u002F666a1201cfa40fabc459c526_Blog-3.png",{},{"title":29318,"description":29564},"blog\u002Fdata-streaming-for-generative-ai",[799,1332,821,1331],"EWwx43_1O10yCFkLqQesbLIhhcVCKRE3zG_v04HxCdM",{"id":29572,"title":29573,"authors":29574,"body":29575,"category":290,"createdAt":290,"date":29807,"description":29808,"extension":8,"featured":294,"image":29809,"isDraft":294,"link":290,"meta":29810,"navigation":7,"order":296,"path":10453,"readingTime":11508,"relatedResources":290,"seo":29811,"stem":29812,"tags":29813,"__hash__":29814},"blogs\u002Fblog\u002Fstream-table-duality-and-the-vision-of-enabling-data-sharing.md","Stream-Table Duality and the Vision of Enabling Data Sharing",[806],{"type":15,"value":29576,"toc":29799},[29577,29587,29617,29620,29624,29627,29630,29633,29636,29639,29647,29650,29654,29657,29660,29663,29666,29669,29673,29682,29685,29688,29691,29699,29704,29707,29718,29721,29730,29734,29737,29743,29746,29750,29757,29760,29771,29774,29778,29781],[48,29578,29579,29580,29583,29584,29586],{},"We were thrilled to ",[55,29581,29582],{"href":10389},"unveil our new data streaming engine"," at this week's Pulsar Summit. ",[55,29585,1332],{"href":24893}," is not something entirely new; those familiar with Pulsar and StreamNative will recognize it as the culmination of years of development by our talented engineering team at StreamNative.",[48,29588,29589,29590,29594,29595,29598,29599,4003,29603,29608,29609,29612,29613,29616],{},"Ursa is a Kafka API-compatible data streaming engine built on top of Lakehouse, and it simplifies management by eliminating the need for ZooKeeper and BookKeeper. Our journey toward Kafka API compatibility began with ",[55,29591,906],{"href":29592,"rel":29593},"https:\u002F\u002Fgithub.com\u002Fstreamnative\u002Fkop",[264],", evolved into ",[55,29596,910],{"href":29597},"\u002Fblog\u002Fkafka-on-streamnative-bringing-enterprise-grade-kafka-support-to-streamnative-pulsar-clusters",", and has now officially been made generally available as part of the Ursa engine. The concept of Lakehouse storage, initially introduced in 2021 as part of offloading in columnar formats, was developed into a streaming offloader for ",[55,29600,29602],{"href":29601},"\u002Fblog\u002Fstreaming-lakehouse-introducing-pulsars-lakehouse-tiered-storage","Lakehouse Tiered Storage",[55,29604,29607],{"href":29605,"rel":29606},"https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=9ZDv-W65NMA&pp=ygUYbGFrZWhvdXNlIHRpZXJlZCBzdG9yYWdl",[264],"presented at the Pulsar Summit in 2023",". It has now become the primary storage solution for the Ursa engine. ",[55,29610,5599],{"href":22142,"rel":29611},[264],", introduced in 2022 as a replacement for ZooKeeper, has become the robust metadata plane of the Ursa engine (I deliberately use \"metadata plane\" rather than \"metadata storage\"). Making BookKeeper optional is the final piece of our long-standing \"No Keepers\" initiative, which is expected to achieve GA status soon. 
While the \"No Keeper\" approach is forward-thinking, we must acknowledge that BookKeeper ",[55,29614,29615],{"href":21492},"remains essential"," for achieving ultra-low latency in data streaming and is particularly suitable for on-premise or cloud deployments without concerns about inter-AZ traffic costs.",[48,29618,29619],{},"While there are many technical details to explore behind these developments, we will continue to delve deeper into each in future blog posts. In this blog, I want to unpack the technology vision behind Ursa—enabling effective data sharing across teams, departments, and even organizations.",[40,29621,29623],{"id":29622},"data-sharing-cost-efficiency-and-the-first-principle","Data Sharing, Cost-Efficiency, and The First Principle",[48,29625,29626],{},"Cost is obviously a top priority for everyone. We want to help people reduce the total cost of ownership, enable them to reinvest the saved capital in other business innovations, and accelerate their time to market.",[48,29628,29629],{},"Strategies like separating compute from storage to avoid overprovisioning, saving data in more cost-effective storage mediums, reducing cross-AZ traffic, or even rewriting code in C++ or Rust are all tactics to achieve cost efficiency. It is common for vendors to periodically rewrite technologies to align with the underlying technological cycle, a typical pattern seen in the tech industry. Eventually, everyone catches up in one way or another.",[48,29631,29632],{},"So, the most important question is: What is the First Principle we should follow to achieve cost-effectiveness?",[48,29634,29635],{},"A data streaming platform's primary focus is sharing data between machines, services, teams, departments, and organizations. So, we asked ourselves, how can we share data efficiently?",[48,29637,29638],{},"When we created Pulsar, we had the answer almost from day one: “Save one copy of data and consume it in different ways the businesses need.” This is the fundamental principle behind Pulsar’s unified protocol (queuing and streaming) and multi-tenancy.",[1666,29640,29641,29644],{},[324,29642,29643],{},"Unified Queuing and Streaming: The concept of unified queuing and streaming involves storing the data in a single copy, which can be consumed as a stream or as a queue (competing consumers). This means you don’t need to save multiple copies.",[324,29645,29646],{},"Multi-Tenancy: The idea of multi-tenancy is to store the data once and allow multiple teams to use the same data within the same cluster without needing to copy it to another location.",[48,29648,29649],{},"We extend this principle further in Ursa: if we already have operational and transactional data coming in as streams and queues, why not keep just one copy of the data and make it available to analytical processors? This concept, known as the stream-table duality, allows for the sharing of transactional\u002Foperational data with analytical processors. I'll explore this concept further later in the blog post. Before that, I want to discuss “Unified” and “Data Sharing”.",[40,29651,29653],{"id":29652},"unified-vs-data-sharing","Unified vs Data Sharing",[48,29655,29656],{},"You probably hear much about \"unify\" and \"unification\" from the industry and the market. Common terms include unifying queuing and streaming, batch and streaming, etc. We used to discuss “unified messaging” quite extensively. 
However, this is actually a trap, as it creates data gravity by leading people to adopt certain protocols, which contradicts the openness and data-sharing nature of data streaming.",[48,29658,29659],{},"We are escaping this trap by elevating the Kafka protocol to a first-class citizen, evolving from a single protocol to a multi-protocol platform. We have opened up the underlying stream storage engine to support various protocols, enabling data sharing among teams who can choose the most suitable protocol for their business needs.",[48,29661,29662],{},"This mindset also extends to our goal of making operational and transactional data in the data streams available for analytical processors. Instead of unifying batch and stream processing on the compute engine side, we are reversing this approach. By implementing the Stream-Table duality in the storage engine layer, we make the data shareable and usable by analytical processors. Thus, you can bring your own compute (yet another BYOC) to process the tables materialized in the lakehouse.",[48,29664,29665],{},"Clearly, this Stream-Table duality would not be possible without the rise of the lakehouse and its open standard storage formats. Without adhering to a standard lakehouse table format, we would risk creating data gravity that forces people to adopt a vendor’s proprietary table format.",[48,29667,29668],{},"Hopefully, you understand our vision and the first principle behind building Ursa and StreamNative. Let’s dive deeper into the technology to understand how we follow this First Principle and how we built Ursa as a data streaming engine enabling data sharing.",[40,29670,29672],{"id":29671},"data-streams-turning-tables-inside-out","Data Streams: Turning Tables Inside-Out",[48,29674,29675,29676,29681],{},"If you're familiar with Kafka and stream processing, you probably know the concept of \"",[55,29677,29680],{"href":29678,"rel":29679},"https:\u002F\u002Fmartin.kleppmann.com\u002F2015\u002F11\u002F05\u002Fdatabase-inside-out-at-oredev.html",[264],"turning tables inside-out",".\" This idea, championed by Martin Kleppmann and Confluent in 2014, has influenced the entire data streaming technology and industry. Data streams have become a primitive for sharing in-motion data between microservices, business applications, and more. Vendors like Confluent, StreamNative, and many others have built their platforms around this concept, each with its unique implementation flavor.",[48,29683,29684],{},"Pulsar is notably unique. In the market, there are many different flavors of Kafka, but there is only one Pulsar. We have written various blog posts and presented numerous talks about Pulsar, so I won’t delve deeper here; I'll discuss this at a higher level.",[48,29686,29687],{},"The fundamental component powering a data streaming engine is a store of data streams or logs. Most implementations manage logs on a per-topic basis, known as the \"partition-based\" storage model.",[48,29689,29690],{},"Pulsar deviates significantly from this norm. It can be thought of as a giant write-ahead log that aggregates data from various topics (streams), with each topic represented by a different collar in the diagram below (Figure 1).",[1666,29692,29693,29696],{},[324,29694,29695],{},"All writes from different topics are first aggregated and appended to this giant write-ahead log. This approach allows efficient batching from millions of topics, supporting extremely high throughput without compromising latency. 
This secret sauce allows Pulsar to handle millions of topics, with ambitions to support hundreds of millions.",[324,29697,29698],{},"After data is appended to this write-ahead log, it is compacted by relocating messages or entries from the same topics into continuous data blocks (shown as “Data Segments” in the diagram), accompanied by a distributed index (shown as “Distributed Index” in the diagram). This index, used for locating the data segments and the data within those data segments, ensures fast data scans and lookups.",[48,29700,29701],{},[384,29702],{"alt":18,"src":29703},"\u002Fimgs\u002Fblogs\u002F664cbd3ce1f027e2b739b155_y1ftcv-5IGNlAMdwXhqDh0dPwkLQ4WYV2QltwEJYsXOYY0qrV-_TOHWVzNJ1fQJYz3sHaRytGmxghrfaDlsuXeCQcOR7ajiGThBGcwLqsAoDNAEJxeb97NJyLD1AlG3LpesWLhnYSpRtuHfhSsYpdag.png",[48,29705,29706],{},"The entire storage engine comprises three logical components:",[321,29708,29709,29712,29715],{},[324,29710,29711],{},"Write-Ahead Log (WAL): A giant WAL aggregates data for fast writing.",[324,29713,29714],{},"Data Segments: Compacted, continuous data blocks designed for quick scans and lookups.",[324,29716,29717],{},"Distributed Index: An index to locate and read the data segments.",[48,29719,29720],{},"Originally, Pulsar used BookKeeper for low-latency log storage, utilizing inter-node (inter-AZ, in the context of the cloud) data replication for high availability and reliability. In this setup, BookKeeper stores both the giant write-ahead log and the data segments. At the same time, BookKeeper and ZooKeeper manage the distributed index—the latter indexing the segments and the former indexing the data within those segments.",[48,29722,29723,29724,29729],{},"Now, with all these logical components in place, a natural way to reduce costs is data tiering by relocating those data segments from BookKeeper to Object Storage. This led to the introduction of ",[55,29725,29728],{"href":29726,"rel":29727},"https:\u002F\u002Fpulsar.apache.org\u002Fdocs\u002F3.2.x\u002Ftiered-storage-overview\u002F",[264],"tiered storage",", a concept now widely adopted by many vendors. It has become a must-have feature for any data streaming platform.",[40,29731,29733],{"id":29732},"lakehouse-tables-turning-streams-outside-in","Lakehouse Tables: Turning Streams Outside-In",[48,29735,29736],{},"But do we really need data tiering here? Implementing tiered storage often creates another layer of data gravity around the data streaming platform, as almost all tiered storage implementations use their own proprietary formats. This means the only way to retrieve the data is through the data stream API, whether Kafka or Pulsar. This does not align with our vision of enabling data sharing.",[48,29738,29739,29740,29742],{},"More importantly, the rationale behind generating data segments is not “data tiering.” The data is already appended to this giant WAL for durability. Data Segments are continuous blocks compacted from the giant WAL and designed for fast scans and lookups. What if we leverage the schema information already present in the data streams and convert the row-based data written in the giant WAL into columnar data formats, making the streams available as tables in the lakehouse? This process of data compaction is actually turning the streams outside-in into tables. This is the logic and principle behind the “",[55,29741,29602],{"href":29601},"” concept, presented at the Pulsar Summit in 2022 and 2023. 
It is now called “Lakehouse Storage'' in the Ursa engine, and we have removed “tiered” because there is no actual “data tiering” involved.",[48,29744,29745],{},"With this storage model, we can convert and materialize the WAL data into any format we want to support as part of the compaction process. This approach takes our vision of enabling data sharing to an extreme, allowing us to break the wall between the transactional & operational realm and the analytical realm, share the same copy of data across different use cases, and meet diverse business requirements.",[40,29747,29749],{"id":29748},"latency-and-cloud-economics","Latency and Cloud Economics",[48,29751,29752,29753,29756],{},"This brings us to the last two components of the entire storage engine: the write-ahead log (WAL) and the distributed index. We don’t believe that one model fits all. We operate in hybrid and complex environments: some companies move to the cloud while others revert; some people require ultra-low, single-digit millisecond latency, while others need solutions that align with cloud economics. We believe in understanding trade-offs, so we introduced the concept of the ",[55,29754,29755],{"href":18969},"New CAP theorem"," to explain the necessary compromises when selecting technology.",[48,29758,29759],{},"These trade-offs can be transformed into options that help enterprises find the right balance. This is the idea behind introducing a cost-optimized WAL while keeping BookKeeper as a latency-optimized WAL. Users can choose the best option based on the latency profiles of their workloads.",[321,29761,29762,29765,29768],{},[324,29763,29764],{},"Suppose your workload demands ultra-low latency, or you operate in an environment without cloud economic concerns around inter-AZ traffic (e.g., on-premise, private cloud, or certain public cloud environments). In that case, you can continue to use BookKeeper as your storage engine. That remains our secret sauce.",[324,29766,29767],{},"Alternatively, suppose your workload can tolerate higher latency, or you prefer to prioritize cost over latency. In that case, you can use a cost-optimized WAL implementation, which will soon be generally available (GA).",[324,29769,29770],{},"Furthermore, suppose you wish to make your transactional data available for analytical purposes. In that case, you can flip a switch to convert your data streams into lakehouse tables or vice versa.",[48,29772,29773],{},"Linking this back to a multi-tenancy model, there is no need to set up separate clusters for low-latency and high-throughput workloads. Everything can reside in one cluster, configured on a per-tenant basis. This enables effective data sharing across your teams and departments without adding operational burdens.",[40,29775,29777],{"id":29776},"the-future-of-ursa","The Future of Ursa",[48,29779,29780],{},"It has been a long journey to realize our vision of enabling data sharing across various teams, departments, and organizations, culminating in the Ursa engine. We believe that a data streaming platform is fundamentally different from other platforms. The Ursa engine is inherently open and designed to enhance organizational capabilities by facilitating data sharing between services and people. The future of data streaming platforms will be multi-protocol, multi-tenant, and multi-modal, much like Ursa.",[48,29782,29783,29784,29787,29788,5157,29793,29798],{},"If you want to try out the Kafka API capabilities in Ursa, sign up for ",[55,29785,3550],{"href":17075,"rel":29786},[264]," today. 
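Because the engine speaks the Kafka wire protocol, an existing Kafka application typically only needs its connection settings changed. The snippet below is a hypothetical configuration sketch using kafka-python; the endpoint, principal, and credential values are placeholders rather than real StreamNative settings.

```python
# Hypothetical sketch: an ordinary Kafka producer pointed at a Kafka-API-compatible endpoint.
from kafka import KafkaProducer  # pip install kafka-python

producer = KafkaProducer(
    bootstrap_servers="your-cluster.example.cloud:9093",  # placeholder endpoint
    security_protocol="SASL_SSL",
    sasl_mechanism="PLAIN",
    sasl_plain_username="public/default",   # placeholder principal
    sasl_plain_password="token:<API-KEY>",  # placeholder credential
)
producer.send("clickstream", b'{"user": "u42", "page": "/pricing"}')
producer.flush()
producer.close()
```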
If you want to learn more about the industry's direction, please consider signing up for ",[55,29789,29792],{"href":29790,"rel":29791},"https:\u002F\u002Fhs.streamnative.io\u002Fgigaom-webinar-join-us-for-a-deep-dive-into-data-streaming-trends",[264],"our upcoming Gigaom webinar",[55,29794,29797],{"href":29795,"rel":29796},"https:\u002F\u002Fwww.linkedin.com\u002Fbuild-relation\u002Fnewsletter-follow?entityUrn=7170952834232860672",[264],"Sign up for our newsletter"," to stay updated on our products and news.",{"title":18,"searchDepth":19,"depth":19,"links":29800},[29801,29802,29803,29804,29805,29806],{"id":29622,"depth":19,"text":29623},{"id":29652,"depth":19,"text":29653},{"id":29671,"depth":19,"text":29672},{"id":29732,"depth":19,"text":29733},{"id":29748,"depth":19,"text":29749},{"id":29776,"depth":19,"text":29777},"2024-05-21","Discover Ursa, StreamNative's latest innovation unveiled at the Pulsar Summit, a Kafka API-compatible data streaming engine that enhances data sharing and simplifies management by integrating Lakehouse storage and eliminating the need for ZooKeeper and BookKeeper. Explore the journey and technology behind Ursa, designed to enable effective cross-organizational data sharing.","\u002Fimgs\u002Fblogs\u002F664cc3b763604d5d53a39b14_stream_table_duality-copy.png",{},{"title":29573,"description":29808},"blog\u002Fstream-table-duality-and-the-vision-of-enabling-data-sharing",[1331,1332],"DW8z3rD_ppiC61nw9NUmw7btbVzXhfr3QGImXl7CEPU",{"id":29816,"title":29817,"authors":29818,"body":29819,"category":290,"createdAt":290,"date":30120,"description":29817,"extension":8,"featured":294,"image":30121,"isDraft":294,"link":290,"meta":30122,"navigation":7,"order":296,"path":23631,"readingTime":290,"relatedResources":290,"seo":30123,"stem":30124,"tags":30125,"__hash__":30126},"blogs\u002Fblog\u002Funlocking-lakehouse-storage-potential-seamless-data-ingestion-from-streamnative-to-databricks.md","Unlocking Lakehouse Storage Potential : Seamless Data Ingestion from StreamNative to Databricks",[311],{"type":15,"value":29820,"toc":30110},[29821,29828,29846,29850,29853,29858,29860,29863,29865,29876,29878,29882,29902,29904,29908,29911,29915,29920,29924,29933,29936,29941,29944,29955,29957,29961,29964,29967,29972,29975,29980,29995,30000,30002,30007,30016,30018,30023,30025,30028,30030,30037,30040,30045,30047,30052,30055,30058,30060,30063,30068,30071,30073,30076,30108],[48,29822,29823,29824,29827],{},"In November 2023, StreamNative unveiled a vision poised to transform Lakehouse data storage solutions for businesses globally: ",[55,29825,29826],{"href":29601},"Streaming Lakehouse: Introducing Pulsar’s Lakehouse Tiered Storage",". The Lakehouse Storage vision explains how Apache Pulsar uses a tiered storage system to organize data into hot, warm, and cold categories based on its lifecycle, which helps reduce storage costs. It also introduces Lakehouse Storage, which meets the ideal standards for tiered storage by embracing open standards, allowing for changes in data schema, supporting data streaming and transactions, and managing metadata effectively.",[48,29829,29830,29831,29836,29837,29841,29842,29845],{},"In this post, we'll explore how ",[55,29832,29835],{"href":29833,"rel":29834},"https:\u002F\u002Fdocs.streamnative.io\u002Fdocs\u002Flakehouse-tiere-storage-overview",[264],"StreamNative's Lakehouse Storage"," offloads and stores data in the open Lakehouse file formats, like ",[55,29838,1157],{"href":29839,"rel":29840},"https:\u002F\u002Fdelta.io\u002F",[264],". 
This format is particularly beneficial for applications that handle large amounts of data and need strong data management and complex processing pipelines. We'll also discuss how StreamNative's Lakehouse Storage works with Lakehouse Storage providers like ",[55,29843,2599],{"href":26349,"rel":29844},[264],", the creators of the Delta Lake standards.",[40,29847,29849],{"id":29848},"the-core-vision-of-streamnatives-lakehouse-storage","The Core Vision of StreamNative's Lakehouse Storage",[48,29851,29852],{},"StreamNative's Lakehouse Storage is designed to facilitate a seamless transition of data from the StreamNative cloud to various low-cost storage services. This model allows organizations to store data for extended periods without incurring the high costs traditionally associated with high-frequency data access and storage solutions. The essence of this vision lies in providing flexibility and reducing operational costs for businesses.",[48,29854,29855],{},[384,29856],{"alt":18,"src":29857},"\u002Fimgs\u002Fblogs\u002F6642574e3e504788b40cb93e_yp4gJzdkAWFvcCo55kdY-YUdXMu96vME0AA3y67KpyRVRVJiZOubGyC5v7-vlPmZYnd1w17llBpRA0Dzyo8Uoe1hj6m1MMaItxESJBiHfMyyw1hI85Xqyv2bU049LvUA8O9AVRtY9StlLoz4UU-3_ps.png",[48,29859,3931],{},[48,29861,29862],{},"Lakehouse Storage offers the following capabilities.",[48,29864,3931],{},[321,29866,29867,29870,29873],{},[324,29868,29869],{},"Data Offloading to Lakehouse: Easily transfer data from Pulsar topics to top Lakehouse formats like Delta Lake and Apache Iceberg instantly, and keep it in widely used data formats.",[324,29871,29872],{},"Streaming Read Capabilities: Enable streaming read operations from Lakehouse tables using Pulsar clients, ensuring timely access to real-time data streams for various use cases such AI\u002FML.",[324,29874,29875],{},"Batch Read Functionality: Facilitate batch read operations from Lakehouse products through popular query engines like Spark SQL, Flink SQL, and Trino, enhancing data analytics and processing capabilities.",[40,29877,3931],{"id":18},[40,29879,29881],{"id":29880},"streamnatives-lakehouse-storage-improves-upon-the-apache-pulsar-tiered-storage-by-offering-specific-advantages","StreamNative’s Lakehouse Storage improves upon the Apache Pulsar Tiered Storage by offering specific advantages.",[321,29883,29884,29887,29890,29893,29896,29899],{},[324,29885,29886],{},"Long-Term Data Retention: Define offload policies to store data in BookKeeper for real-time processing and in Lakehouse products for batch processing, ensuring comprehensive data retention strategies.",[324,29888,29889],{},"Cost-Effective Storage: Utilize Lakehouse products for storing cold data with open formats and compression, offering a cost-effective storage solution.",[324,29891,29892],{},"Unified Data Platform: Pulsar serves as a unified data storage and processing platform for real-time and batch data processing needs, enhancing operational efficiency.",[324,29894,29895],{},"Schema Evolution Management: Lakehouse Storage seamlessly handles schema evolution, ensuring synchronization between Pulsar topics and Lakehouse tables.",[324,29897,29898],{},"Data Query and Analysis: Enable data querying in Lakehouse products and utilize Pulsar consumers\u002Freaders to access data from BookKeeper and Lakehouse products.",[324,29900,29901],{},"Advanced Data Management Features: Benefit from data versioning, auditing, indexing, caching, and query optimization capabilities, merging the advantages of data lakes and data 
warehouses.",[48,29903,3931],{},[40,29905,29907],{"id":29906},"offloading-data-with-flexibility-and-efficiency","Offloading Data with Flexibility and Efficiency",[48,29909,29910],{},"A standout feature of the Lakehouse Storage is the ability to offload data in preferred Lakehouse formats such as Delta Lake, Apache Iceberg, and Apache Hudi. This flexibility ensures that organizations can choose the format that best fits their operational needs and technological preferences. Moreover, once the data is offloaded, it can be queried using popular third-party tools like Amazon Athena, Apache Spark, and Trino, for batch processing use cases.",[40,29912,29914],{"id":29913},"lakehouse-storage-in-ursa-engine","Lakehouse Storage In Ursa Engine",[48,29916,29917,29919],{},[55,29918,1332],{"href":24893}," is a data streaming engine that offers compatibility with Apache Kafka and can operate on low cost storage services like AWS S3, GCP GCS, and Azure Blob Storage. It saves data streams in formats compatible with lakehouse tables such as Hudi, Iceberg, and Delta Lake. This approach allows data to be readily available in the lakehouse and streamlines management by removing the need for ZooKeeper and soon, BookKeeper, which also cuts down on bandwidth costs between availability zones. Users can opt in or opt out to use or skip Apache BookKeeper on StreamNative cloud and stream data directly to a Lakehouse.",[40,29921,29923],{"id":29922},"spotlight-on-delta-lake","Spotlight on Delta Lake:",[48,29925,29926,29927,29932],{},"StreamNative Lakehouse Storage supports the ",[55,29928,29931],{"href":29929,"rel":29930},"https:\u002F\u002Fdocs.delta.io\u002Flatest\u002Findex.html",[264],"Delta Lake format",", pioneered by Databricks. Delta Lake is an open format to store data in lakehouse and it offers numerous capabilities for data management like ACID transactions, Schema enforcement and evolution, Time Travel, and more. Delta Lake support is currently in Private Preview within StreamNative  Private Cloud and Bring Your Own Cloud (BYOC) offerings.",[48,29934,29935],{},"With Delta Lake support, the data offloaded by StreamNative cloud stores a rich transaction history for tables as JSON files in a metadata folder, which includes transaction logs of operational and maintenance actions. The table data itself is stored in Parquet format. Delta Lake even supports reads in Hudi and Iceberg formats through a feature called UniForm to enable complex data ecosystems.",[48,29937,29938],{},[384,29939],{"alt":18,"src":29940},"\u002Fimgs\u002Fblogs\u002F6642574fe7b43cddc60967d6_XmI9hEfKQxY948BO2CZ2NjqyqgUZXZDeEfU84XSpqLkGO4utJ5PFzPJhu5aKXsTjbJv_-KzeIjmqSjq4s7Rpefhq04wY5j6jE4ppzbP3JddLeaTItGZtYs_SFBqbCkkpbP79t3IE609m1mkXzYT81XU.png",[48,29942,29943],{},"The integration of Delta Lake in StreamNative’s Lakehouse Storage provides major advantages for businesses, as outlined below",[321,29945,29946,29949,29952],{},[324,29947,29948],{},"Helps teams work together on accurate data, speeding up decision-making.",[324,29950,29951],{},"Lowers infrastructure and maintenance costs with best price performance.",[324,29953,29954],{},"Provides a secure, multi-cloud analytics platform based on an open format.",[48,29956,3931],{},[40,29958,29960],{"id":29959},"lakehouse-storage-integration-with-databricks","Lakehouse Storage Integration With Databricks",[48,29962,29963],{},"Looking towards the future, StreamNative aims to integrate more closely with various Lakehouse storage vendors like Databricks who created the Delta Lake format. 
These partnerships will likely standardize specific Lakehouse formats, facilitating a smoother data management process across different platforms. Such integration will also enhance compatibility and interoperability between different data systems, fostering a more cohesive data management ecosystem.",[48,29965,29966],{},"Users must undertake only a few manual steps to configure Databricks with StreamNative Lakehouse Storage.",[48,29968,29969],{},[384,29970],{"alt":18,"src":29971},"\u002Fimgs\u002Fblogs\u002F6642574e3d3414bef6182cd9_wteiJjB27gH5EHWNKXJGK7pKwd4bQxghEvI6Jc7juJBJp3MMODBFGWntG5rX4fbLUnaCA7fYc6VOL6t7ylJ1fBm-j-Uxu78oIgwzdXpT_W9I2sv8eKFcPKZwEe3A7nR45oOdmjcF4kDbbgFstJRvX88.png",[48,29973,29974],{},"Let's talk about a scenario where an enterprise is using StreamNative cloud in a Bring Your Own Cloud (BYOC) environment and wants to set up Databricks with it. In such a scenario users can perform the following three steps to successfully mount, discover, and query the offloaded data within Databricks workspace.",[1666,29976,29977],{},[324,29978,29979],{},"Add an external storage location",[48,29981,29982,29983,29988,29989,29994],{},"Within the Databricks Unity Catalog, ",[55,29984,29987],{"href":29985,"rel":29986},"https:\u002F\u002Fdocs.databricks.com\u002Fen\u002Fconnect\u002Funity-catalog\u002Fexternal-locations.html",[264],"add an external location"," which points to the storage bucket where StreamNative Cloud is streaming data. The external location can be set up by ",[55,29990,29993],{"href":29991,"rel":29992},"https:\u002F\u002Fdocs.databricks.com\u002Fen\u002Fconnect\u002Funity-catalog\u002Fstorage-credentials.html",[264],"setting up the right access permissions",". You can configure the access policy to be read only.",[48,29996,29997],{},[384,29998],{"alt":18,"src":29999},"\u002Fimgs\u002Fblogs\u002F6642574e3a6e718ccf395ba9_8cT0GagaWsqu0ac6TOP4Eys4-eZu64NGvcosFgAsSjIVxIL5WHWe0GhQ_0-orK2rw2GsyAznDLMmBWaIo719tRzBg4yz4L0a0M2oRfLRhONoXAP1Mkg3zA08nnjGZ-B27CGlDnyhxuV-W5pP2hyz-O4.png",[48,30001,3931],{},[1666,30003,30004],{},[324,30005,30006],{},"Create external table",[48,30008,30009,30010,30015],{},"Within the Databricks SQL Editor ",[55,30011,30014],{"href":30012,"rel":30013},"https:\u002F\u002Fdocs.databricks.com\u002Fen\u002Fdata-governance\u002Funity-catalog\u002Fcreate-tables.html#create-an-external-table",[264],"create an external table"," which points to the external storage location.",[48,30017,3931],{},[48,30019,30020],{},[384,30021],{"alt":18,"src":30022},"\u002Fimgs\u002Fblogs\u002F6642574e3ba65eebd0501148_Yaj_equrgxLYdZueSOOV09TQwF7dQnWIVf2RZNWI1GH38fbEi6LdJe3uKFq_VuoZdcqAzez7BTxqVAD7xlbSmI6YHcCgkjlZJZHj00kyTWkx0v5yu-ZagCbwKwIk3oSi7SOq27hwHiEtLlqDMaJtMc0.png",[48,30024,3931],{},[48,30026,30027],{},"Here is an example query which creates an external table called employeeinfo pointing to an external location path of an S3 bucket.",[48,30029,3931],{},[48,30031,30032,30033,30036],{},"CREATE TABLE IF NOT EXISTS employeeinfo AS SELECT * FROM delta.",[4926,30034,30035],{},"s3:\u002F\u002Fo-f3eih-lakehouse-storage-869f9603\u002Femployee-info",";",[48,30038,30039],{},"Once the table is created, you can view and explore the table and its schema within the Unity 
Catalog.",[48,30041,30042],{},[384,30043],{"alt":18,"src":30044},"\u002Fimgs\u002Fblogs\u002F6642574e57cfa06d2c6f84ba_9bON8-j02WobY52JN7wcz8V4Oh0CpB7gM1c_xOlzaOP3lwRVP8qNGqN259od3JC6KZSB1EXOb2EyDwHstWM6mflg-G7u_K6zbai7_1tuAFdguqhSBnRfNqSS4gKdBGeI2vV587i9JN8hp2Hd6Z4Om7w.png",[48,30046,3931],{},[1666,30048,30049],{},[324,30050,30051],{},"Query data",[48,30053,30054],{},"Once the external table is created in the Unity Catalog, users can perform queries to list, filter, and aggregate data.",[48,30056,30057],{},"Here is an example of a query fetching a few specific columns from the employee info table:",[48,30059,3931],{},[48,30061,30062],{},"SELECT firstName,lastName,middleName,email,address FROM kvyas_workspace_mappedto_free_tier.default.employeeinfo;",[48,30064,30065],{},[384,30066],{"alt":18,"src":30067},"\u002Fimgs\u002Fblogs\u002F6642574f8f622c4caa703ec0_s2VI7U5fOnYy5wGPrKE41bkyKk_cERR1F_n0JXP9t9jcdCur0B-RqJPLNKi_lAOD9tLjIJIoLUM3ec1GAhNF9TbyRan449nI87YSzRFAHxvyoaO66aGVbDc3qVWHlzatOimYUOBWA7-g194u87aOw-E.png",[48,30069,30070],{},"StreamNative's long-term strategy includes direct integration with Databricks' Unity Catalog. This integration will streamline processes, significantly reducing the manual effort required to discover and manage data.By directly publishing data to a unified catalog, StreamNative will enable users to leverage a centralized repository for managing and securing data across various environments. This evolution in data handling and storage exemplifies StreamNative's commitment to innovation and customer-centric solutions in the data lakehouse domain.",[40,30072,319],{"id":316},[48,30074,30075],{},"Here's a quick summary of the key content covered in the blog:",[321,30077,30078,30081,30090,30093,30096,30099,30102,30105],{},[324,30079,30080],{},"Lakehouse Storage Part of Ursa Engine: Ursa engine within StreamNative cloud natively supports Lakehouse Storage so users can opt in or out of using Apache BookKeeper for storage and directly write data to a Lakehouse.",[324,30082,30083,30084,30089],{},"Private Preview - The Lakehouse Storage is currently in Private Preview. 
Customers can",[55,30085,30088],{"href":30086,"rel":30087},"https:\u002F\u002Fsupport.streamnative.io\u002Fhc\u002Fen-us",[264]," file a ticket to enable Lakehouse Storage"," in their cloud environment.",[324,30091,30092],{},"Setting Industry Standards: StreamNative's Lakehouse Storage is defining the future of data management with its flexible, cost-effective solutions.",[324,30094,30095],{},"Leading Integration Efforts: By aligning with popular open lakehouse formats and tools, StreamNative is at the forefront of creating an efficient and interconnected data management landscape.",[324,30097,30098],{},"Cost Efficiency and Enhanced Accessibility: This initiative promises significant cost savings while improving data accessibility and analytics capabilities.",[324,30100,30101],{},"Empowering Businesses: It paves the way for businesses to fully leverage the potential of their data assets.",[324,30103,30104],{},"Streamlining Decision-Making: The partnership with Databricks simplifies and accelerates decision-making processes for customers.",[324,30106,30107],{},"Enhanced Collaboration: The collaboration with Databricks ensures that customers benefit from seamless integration and optimized data workflows, reducing complexities in data handling and analysis.",[48,30109,3931],{},{"title":18,"searchDepth":19,"depth":19,"links":30111},[30112,30113,30114,30115,30116,30117,30118,30119],{"id":29848,"depth":19,"text":29849},{"id":18,"depth":19,"text":3931},{"id":29880,"depth":19,"text":29881},{"id":29906,"depth":19,"text":29907},{"id":29913,"depth":19,"text":29914},{"id":29922,"depth":19,"text":29923},{"id":29959,"depth":19,"text":29960},{"id":316,"depth":19,"text":319},"2024-05-14","\u002Fimgs\u002Fblogs\u002F66427cc6ec7874035688ca0e_architecture.png",{},{"title":29817,"description":29817},"blog\u002Funlocking-lakehouse-storage-potential-seamless-data-ingestion-from-streamnative-to-databricks",[821,799,1332,2599],"RHRMuma-g5paqD_sEHe60Kt7QWkSZEWsuzO-XfH-5ks",{"id":30128,"title":20287,"authors":30129,"body":30130,"category":290,"createdAt":290,"date":30120,"description":20287,"extension":8,"featured":294,"image":30501,"isDraft":294,"link":290,"meta":30502,"navigation":7,"order":296,"path":10389,"readingTime":290,"relatedResources":290,"seo":30503,"stem":30504,"tags":30505,"__hash__":30506},"blogs\u002Fblog\u002Fursa-reimagine-apache-kafka-for-the-cost-conscious-data-streaming.md",[806],{"type":15,"value":30131,"toc":30486},[30132,30135,30138,30142,30145,30159,30161,30166,30170,30173,30175,30180,30187,30201,30205,30208,30211,30218,30221,30224,30227,30230,30234,30237,30240,30251,30254,30257,30260,30264,30267,30269,30274,30277,30288,30292,30295,30298,30305,30308,30311,30315,30318,30321,30324,30326,30331,30334,30337,30340,30343,30346,30349,30352,30355,30359,30362,30364,30375,30381,30385,30388,30391,30393,30398,30401,30404,30407,30410,30414,30417,30420,30442,30444,30449,30452,30460,30463,30467,30470,30484],[48,30133,30134],{},"Today, we are really excited to unveil the next-generation data streaming engine - Ursa, which powers the entire StreamNative Cloud. Ursa is a data streaming engine that speaks the native Kafka protocol and is built directly on top of Lakehouse storage. Developed atop Apache Pulsar, Ursa removes the need for BookKeeper and ZooKeeper, pushing the architectural tenets of Pulsar to new heights, specifically tailored for the cost-conscious economy.",[48,30136,30137],{},"Both Kafka and Pulsar are robust open-source platforms. 
Our development of the Ursa engine leverages the extensive knowledge and operational insights we've gained from our years working with both Pulsar and Kafka. Throughout this post, I will explore the reasons behind Ursa's creation, highlight its benefits, and provide insight into its underlying mechanisms.",[40,30139,30141],{"id":30140},"understand-the-origin-of-apache-kafka","Understand the origin of Apache Kafka",[48,30143,30144],{},"Before diving into Ursa, it’s important to grasp the significance of Kafka. Originally developed at LinkedIn, Kafka was open-sourced in 2011 and quickly became the go-to framework for building data streaming platforms. It emerged during the On-prem \u002F Hadoop Era (2000 to 2010), a time characterized by on-premises deployments with slow network speeds. Infrastructure software from this period was optimized for rack awareness to compensate for these limitations. Kafka, designed under these conditions, coupled data serving and storage on the same physical units—an approach that matched the technological constraints of the time.",[48,30146,30147,30148,30153,30154,190],{},"Fast forward to 2015, and the landscape has drastically shifted, particularly with the move towards cloud-native environments. Despite these changes, Kafka's core architecture has remained largely unchanged. Organizations have attempted to transition Kafka to the cloud, but the reality is that Kafka is cumbersome and costly to operate at scale in modern settings. The problem lies not with Kafka's API but with its implementation, which was conceived for on-prem data centers. This tightly coupled architecture is ill-suited for the cloud, leading to significant data rebalancing challenges when adjusting cluster topologies, resulting in high inter-AZ bandwidth costs and ",[55,30149,30152],{"href":30150,"rel":30151},"https:\u002F\u002Fdeveloper.paypal.com\u002Fcommunity\u002Fblog\u002Fscaling-kafka-to-support-paypals-data-growth\u002F",[264],"potential service disruptions",". Managing a Kafka cluster in such environments requires extensive, specialized tooling and ",[55,30155,30158],{"href":30156,"rel":30157},"https:\u002F\u002Fwww.confluent.io\u002Fblog\u002Funderstanding-and-optimizing-your-kafka-costs-part-2-development-and-operations\u002F",[264],"a dedicated support team",[48,30160,3931],{},[48,30162,30163],{},[384,30164],{"alt":18,"src":30165},"\u002Fimgs\u002Fblogs\u002F66f16f5ec93b13a30217f7ee_66424e2cda8b078d8259f8f9_Sl9T0EpCjEv28bNK-qb98CzS__7jd7usC6WE-bOn27Yp5A6OEzdkg06QdkPJspfk3xohJFXN0vdLgGVTxfWdu9ljRI5vubTAeNRmiMLZfr0vWyfLfsOzWpML3UCMpxeXd5NTYK9SZuIGQTMJsnNHS3c.png",[40,30167,30169],{"id":30168},"pulsar-reimagine-kafka-with-a-rebalance-free-architecture","Pulsar: Reimagine Kafka with a Rebalance-Free Architecture",[48,30171,30172],{},"In contrast to Kafka, Pulsar emerged during the cloud-native era (2010 to 2020), a time marked by the rise of containerized deployments and significantly faster network speeds. As organizations transitioned from on-premises to cloud environments, system designs increasingly prioritized elasticity over cost. 
This shift led to the widespread adoption of architectures that separate compute from storage, a strategy exemplified by platforms like Snowflake and Databricks.",[48,30174,3931],{},[48,30176,30177],{},[384,30178],{"alt":18,"src":30179},"\u002Fimgs\u002Fblogs\u002F66f16f5dc93b13a30217f7e8_66424e2cb3021b286e4eb919_1hkld2uxKe3jqe-8wocFapU2jkp8pEkFC-mz4hYGztZRY-SPeh4fRekJ0Z0KxGfVn7O2B5kdeRHYNGENUvKZUy0xGGX-lvcVHCMB4D2DCq-AbE6J4bVqHhQmu_YRDgA-9pE6JRDKODagUShx_-nN9O4.png",[48,30181,30182,30183,30186],{},"Pulsar embraced this modern design by decoupling data serving capabilities from the storage layer. With this architecture, it became the pioneer in the market, making it ",[55,30184,30185],{"href":21492},"1000x more elastic than Apache Kafka",". This architectural innovation has made Pulsar extremely attractive to those dealing with the challenges of data rebalancing when operating Kafka clusters.",[48,30188,30189,30190,4003,30195,30200],{},"Pulsar's architecture is rebalance-free and supports a unified messaging model that accommodates data streaming and message queuing. Its features, like built-in ",[55,30191,30194],{"href":30192,"rel":30193},"https:\u002F\u002Fpulsar.apache.org\u002Fdocs\u002Fconcepts-multi-tenancy\u002F",[264],"multi-tenancy",[55,30196,30199],{"href":30197,"rel":30198},"https:\u002F\u002Fpulsar.apache.org\u002Fdocs\u002Fconcepts-replication\u002F",[264],"geo-replication",", are key reasons many prefer Pulsar over other data streaming technologies.",[40,30202,30204],{"id":30203},"can-we-further-reduce-costs-in-this-complex-cost-conscious-economy","Can we further reduce costs in this complex, cost-conscious economy?",[48,30206,30207],{},"Today, there is a heightened focus on operational efficiency and cost reduction, which has slowed the migration to cloud solutions and, in some cases, even reversed it. We navigate a complex landscape that straddles both on-premises and cloud environments, where cost and network efficiency are critical considerations. This scenario underscores the need to shift towards more cost-aware and sustainable architectural approaches in data streaming services.",[48,30209,30210],{},"To meet the evolving demands in the current environment and further reduce the costs of running a data streaming platform, we need to revisit and possibly redesign the architecture we have built on with Apache Pulsar. We have identified the following main areas:",[48,30212,30213,30214,30217],{},"Infrastructure Cost: As mentioned in our guide on ",[55,30215,30216],{"href":24363},"evaluating the infrastructure costs of Apache Pulsar and Apache Kafka",", networking often represents the most significant expense for a data streaming platform. Because Pulsar operates as an AP (Availability and Performance) system, cross-AZ traffic incurs substantial costs due to the necessity for cross-AZ replication to ensure high availability and reliability. This availability, unfortunately, comes at a high cost. The cost of inter-AZ data transfer from replication can balloon for high-throughput workloads, accounting for up to 90% of infrastructure costs when self-managing Apache Kafka. Is it possible to completely eliminate cross-AZ traffic from Pulsar?",[48,30219,30220],{},"Operational Cost: Pulsar’s modular design, which includes components like ZooKeeper for metadata management and BookKeeper for ultra-low latency log storage, leverages the elasticity of Kubernetes but can be challenging for beginners. 
What if we could replace these Keeper services to simplify operations?",[48,30222,30223],{},"Migration Cost: Although Pulsar provides a unified messaging and streaming API, some applications are still written using the Kafka API. Could making Pulsar compatible with the Kafka API eliminate the need for costly application rewrites?",[48,30225,30226],{},"Integration Cost: Pulsar could instantly tap into the vast Kafka ecosystem by supporting the Kafka API, blending robust architectural design with an established user base.",[48,30228,30229],{},"These considerations mark the beginning of our new journey to reimagine Kafka and Pulsar, aiming for a more cost-effective data-streaming architecture that aligns with our industry's evolving needs.",[40,30231,30233],{"id":30232},"object-storage-lakehouse-are-all-you-need","Object Storage & Lakehouse are All You Need",[48,30235,30236],{},"Beyond the cost considerations already discussed, streaming data ultimately finds its home in a data lakehouse, which serves as the foundation for all subsequent analytical processing. However, connecting data streams to these lakehouses typically requires additional integrations to transfer data from the streaming system to the designated lakehouse, incurring significant costs for networking and compute resources.",[48,30238,30239],{},"Given these expenses, we pondered whether developing a Kafka-compatible data streaming system that runs directly atop a data lakehouse would be feasible. This approach could address several of the major challenges we face with current systems:",[1666,30241,30242,30245,30248],{},[324,30243,30244],{},"Cost Reduction: Operating directly on a lakehouse would significantly cut costs, as no major cloud provider charges for data transfer between VMs and object storage. For example, AWS has dedicated countless engineering resources to ensure the reliability and scalability of S3, thereby reducing the operational burden on users.",[324,30246,30247],{},"Simplified Management: Such a system would be easier to manage without needing local disk storage.",[324,30249,30250],{},"Immediate Data Availability: Data would be instantly available in ready-to-use lakehouse formats, allowing for more efficient and cost-effective real-time ETL processes and bypassing the costs of complex networking and bespoke integrations.",[48,30252,30253],{},"Implementing this concept is no small feat. Building a low-latency streaming infrastructure on top of the inherently high-latency lakehouse storage while maintaining full compatibility with the Kafka protocol and adhering to strict data agreements between streaming and lakehouse platforms poses a significant challenge.",[48,30255,30256],{},"So we asked ourselves: “What would Kafka or Pulsar look like if it was redesigned from the ground up today to run in the modern cloud data stack, directly on top of a lakehouse (which is the destination for most of the data streams) over commodity object storage, with no ZooKeeper and BookKeeper to manage, but still had to support the existing Kafka and Pulsar protocols?”",[48,30258,30259],{},"Ursa is our answer to that question.",[40,30261,30263],{"id":30262},"introducing-ursa","Introducing Ursa",[48,30265,30266],{},"Ursa is an Apache Kafka API-compatible data streaming engine that runs directly on top of commodity object stores like AWS S3, GCP GCS, and Azure Blob Storage and stores streams in lakehouse table formats (such as Hudi, Iceberg, Delta Lake). 
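To make the Kafka-compatibility point concrete before unpacking the rest: an existing Kafka application should be able to point at such an engine without code changes. The snippet below is a stock confluent-kafka producer, shown only as a sketch; the bootstrap endpoint, credentials, and topic name are placeholders, and the exact security settings depend on how a given cluster is configured.

```python
from confluent_kafka import Producer

# Ordinary Kafka client configuration; only the endpoint and credentials change, not the code.
producer = Producer({
    "bootstrap.servers": "my-cluster.example-cloud.dev:9093",  # placeholder endpoint
    "security.protocol": "SASL_SSL",
    "sasl.mechanisms": "PLAIN",
    "sasl.username": "user",   # placeholder credentials
    "sasl.password": "token",
})

def on_delivery(err, msg):
    # Standard delivery report callback.
    if err is not None:
        print(f"delivery failed: {err}")
    else:
        print(f"delivered to {msg.topic()}[{msg.partition()}] @ offset {msg.offset()}")

producer.produce("orders", key=b"order-123", value=b'{"amount": 42}', callback=on_delivery)
producer.flush()
```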
This setup makes data immediately available in the lakehouse and simplifies management by eliminating the need for ZooKeeper and, soon, BookKeeper—thereby reducing inter-AZ bandwidth costs.",[48,30268,3931],{},[48,30270,30271],{},[384,30272],{"alt":18,"src":30273},"\u002Fimgs\u002Fblogs\u002F66f16f5ec93b13a30217f7fe_66424e2c8592d4b0ddd495ea_XYmrGwDv7ngDdZZ5yWt78p1Tmpo-4aXKWZujklw6GkGAW1XoJnPu3Ng-A1RJZPuRwJDmJmJwsYgCYY7-OlzDklqmo_d6QZHdhHclurjvYT_2fl5Vt-v1uqgeCqZ0DRjd_rhkO-dKq-T8_uh9aXcksXs.png",[48,30275,30276],{},"That’s a lot to digest, so let’s unpack it by highlighting three major features within Ursa.",[1666,30278,30279,30282,30285],{},[324,30280,30281],{},"Kafka API Compatibility",[324,30283,30284],{},"Native Lakehouse Storage",[324,30286,30287],{},"No Keepers",[40,30289,30291],{"id":30290},"kafka-api-compatibility-embracing-the-best-of-pulsar-kafka","Kafka API Compatibility - Embracing the Best of Pulsar & Kafka",[48,30293,30294],{},"The development of the Ursa engine began with a project called KoP (Kafka-on-Pulsar). The original idea of KoP was to develop a Kafka API-compatible layer using the distributed log infrastructure of Apache Pulsar and its pluggable protocol handler framework. The project gained significant traction in the Apache Pulsar community and has been adopted by large-scale tech companies (such as WeChat, Didi, etc) to migrate their Kafka workloads to Apache Pulsar.",[48,30296,30297],{},"However, we quickly realized that more than KoP was needed to fulfill the mission of building a data streaming engine directly on a lakehouse. We needed to revolutionize the Kafka and Pulsar protocol implementations to fit the broad vision we had laid out with Ursa.",[48,30299,30300,30301,30304],{},"Hence, we took the experience gained in building KoP and evolved it into KSN (",[55,30302,30303],{"href":29597},"Kafka-on-StreamNative","), which became the core foundation of the Ursa Engine.",[48,30306,30307],{},"The Ursa Engine is compatible with Apache Kafka versions from 0.9 to 3.4. Modern Kafka clients will automatically negotiate protocol versions or utilize an earlier one that Ursa accepts. In addition to the basic produce and consume protocols, Ursa also supports Kafka-compatible transaction semantics and APIs and has built-in support for a schema registry.",[48,30309,30310],{},"With the Ursa Engine, your Kafka applications can directly work and run on StreamNative Cloud without rewriting your code. This eliminates the costs of rewriting and migrating your existing Kafka applications to the Apache Pulsar protocol. Ursa incorporates interoperability between Kafka and Pulsar protocols, enabling you to either begin developing new streaming applications with Pulsar's unified protocol, continue using the Kafka protocol if you already have Kafka developers, or start migrating some of your existing Kafka applications immediately. This also allows you to immediately enjoy the benefits of Apache Pulsar (such as multi-tenancy, geo-replication, etc.) along with a robust Kafka ecosystem. You get the best of both worlds.",[40,30312,30314],{"id":30313},"built-on-top-of-lakehouse-unify-data-streaming-and-data-lakehouse","Built on top of Lakehouse - Unify Data Streaming and Data Lakehouse",[48,30316,30317],{},"Ursa is built on top of lakehouse, enabling StreamNative users to store their Pulsar & Kafka topics and associated schemas directly into lakehouse tables. 
Our goal with Ursa is to simplify the process of feeding streaming data into your lakehouse.",[48,30319,30320],{},"Ursa utilizes the innovations we have developed to evolve the Pulsar tiered storage. Pulsar was the first data streaming technology to introduce tiered storage, which offloads sealed log segments into commodity object storage like S3, GCS, and Azure Blob Store. While trying to enhance the offloading performance and streamline the process, we realized that tiered storage could evolve for data destined for lakehouses. We asked ourselves: Why can't we store the data directly in lakehouses?",[48,30322,30323],{},"Taking a step back, Pulsar stores the data first in a giant, aggregated write-ahead log (WAL) backed by Apache BookKeeper (as illustrated in Figure 4 below), consolidating data entries from different topics with a smart distributed index for fast lookups.",[48,30325,3931],{},[48,30327,30328],{},[384,30329],{"alt":18,"src":30330},"\u002Fimgs\u002Fblogs\u002F66f16f5ec93b13a30217f7f8_66424e2ce7e2434485798b67_kgy1v3W0w5XE5mW7VrxLq2Kv20U5aZOvo85S5gamUuCwXu_5jebRuT4nUGHSgh8rJ32ipMmevwxuNfBGvqsEtmumWkTk6AVMqyPsiT4W87_lTtGHFrwDRSgFS0eIEzn8mWQQbPU3hraFhr2cimSe8sg.png",[48,30332,30333],{},"After the data is persisted to the WAL, it will be compacted and stored as continuous data objects in commodity object storage. Thus, in Pulsar’s design, there is no actual data tiering. Writing or moving data to object storage is effectively a “compaction” operation that reorganizes the data stored in WAL into continuous data objects grouped by topics for faster scans and lookups.",[48,30335,30336],{},"Given this capability, if the system is intelligent enough when compacting data, we can leverage schema information to store the data in columnar formats directly in standard lakehouse formats. This approach would make the data immediately available in the lakehouse, eliminating the need for bespoke integration between a data streaming platform and a data lakehouse.",[48,30338,30339],{},"These insights have led to the development of Lakehouse Storage, which now serves as Ursa's primary storage. We now refer to it simply as \"Lakehouse Storage,\" eliminating traditional data \"tiering.\" The data can be made immediately accessible in the lakehouse.",[48,30341,30342],{},"Hence, in the Ursa engine, instead of compacting the data into Pulsar’s proprietary storage format, the Ursa engine can now compact the data into other open standard formats, like lakehouse formats such as Apache Hudi, Apache Iceberg, and Delta Lake. Ursa taps into the schema registry during this compaction process to generate lakehouse metadata while managing schema mapping, evolution, and type conversions. This system eliminates the need for manual mappings, which often break when the upstream application updates. Data schemas are enforced upstream as part of the data stream contract—ensuring that incompatible data is detected early and not processed.",[48,30344,30345],{},"In addition to managing schemas, Ursa continuously compacts small parquet files generated by the streaming data into larger files to maintain good read performance. We are collaborating with lakehouse vendors such as Databricks, OneHouse, and others to offload some of these complexities, enabling users to optimize their use of these products for superior performance.",[48,30347,30348],{},"Lakehouse storage is currently available as a Public Preview feature in StreamNative Hosted and BYOC (Bring Your Own Cloud) deployments. 
StreamNative users can now access their data streams as Delta Lake tables, with the development of Iceberg & Hudi tables coming soon.",[40,30350,30287],{"id":30351},"no-keepers",[48,30353,30354],{},"Pulsar is designed as a modular system with distinct components for different functionalities, such as ZooKeeper for metadata management and BookKeeper for ultra-low latency log storage. While this design capitalizes on the elasticity of Kubernetes, it also introduces overhead that can challenge beginners. As a result, this inherent barrier has prompted initiatives within the Pulsar community and StreamNative to replace these 'Keeper' services, including ZooKeeper and BookKeeper.",[32,30356,30358],{"id":30357},"no-zookeeper","No ZooKeeper",[48,30360,30361],{},"Traditionally, Apache Pulsar has relied on Apache ZooKeeper for all coordination and metadata. Although ZooKeeper is a robust and consistent metadata service, it is difficult to manage and tune. Instead of simply replacing ZooKeeper, we adopted a more sophisticated approach by introducing a pluggable metadata interface, enabling Pulsar to support additional backends such as Etcd. However, there remains a need to design a system that can effectively overcome the limitations of existing solutions like ZooKeeper and Etcd:",[48,30363,3931],{},[321,30365,30366,30369,30372],{},[324,30367,30368],{},"Fundamental Limitation: These systems are not horizontally scalable. An operator cannot add more nodes and expand the cluster capacity since each node must store the entire dataset for the cluster.",[324,30370,30371],{},"Ineffective Vertical Scaling: Since the maximum dataset and throughput are capped, the next best alternative is to scale vertically by increasing CPU and IO resources on the same nodes. However, this stop-gap solution doesn’t fully resolve the issue.",[324,30373,30374],{},"Inefficient Storage: Storing more than 1 GB of data in these systems is highly inefficient due to their periodic snapshots. This snapshot process repeatedly writes the same data, consuming all the IO resources and slowing down write operations.",[48,30376,30377,30378,190],{},"Oxia represents a step toward overcoming these limitations and scaling Pulsar’s ability to support from 1 million topics to hundreds of millions, with efficient hardware and storage. Oxia is currently available for public preview on StreamNative Cloud. For more details about Oxia, you can check out our ",[55,30379,30380],{"href":21529},"blog post",[32,30382,30384],{"id":30383},"no-bookkeeper","No BookKeeper",[48,30386,30387],{},"BookKeeper is a high-performance, scalable log storage system that is the secret behind Pulsar’s ability to achieve a rebalance-free architecture and deliver vastly greater elasticity than Apache Kafka. However, BookKeeper’s design depends on replicating data across multiple storage nodes in different availability zones to ensure high availability. It is ideally suited for latency-optimized workloads. Deploying BookKeeper for high-volume data streaming workloads involves significant inter-AZ traffic, making operating in a multi-AZ deployment expensive. The cost of inter-AZ data transfer from replication can balloon for high-throughput workloads, accounting for up to 90% of infrastructure costs when self-managing Apache Kafka. 
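To see why that replication traffic dominates, a quick back-of-the-envelope calculation helps. The unit price, replication factor, and placement assumptions below are illustrative rather than figures from this post, but they show the shape of the cost that a WAL written directly to object storage avoids.

```python
# Rough monthly cost of cross-AZ replication for a replicated write-ahead log.
# Assumptions (illustrative): ~$0.02/GB effective inter-AZ transfer price, replication
# factor 3, and two of the three replicas landing in a different AZ than the writer.
INTER_AZ_PRICE_PER_GB = 0.02
CROSS_AZ_COPIES = 2

ingress_mb_per_s = 100
gb_ingested_per_month = ingress_mb_per_s * 30 * 24 * 3600 / 1000  # ~259,200 GB

replication_cost = gb_ingested_per_month * CROSS_AZ_COPIES * INTER_AZ_PRICE_PER_GB
print(f"~{gb_ingested_per_month:,.0f} GB/month ingested -> "
      f"~${replication_cost:,.0f}/month in cross-AZ replication traffic alone")

# Writing the WAL straight to object storage removes this line item, since cloud
# providers do not charge for in-region traffic between VMs and object storage.
```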
Although deploying Pulsar in a single-zone environment could reduce cross-AZ network traffic, it would trade off availability for cost and performance.",[48,30389,30390],{},"Since Ursa already utilizes object storage, what if we eliminate the need for BookKeeper as a WAL storage solution and instead directly leverage commodity object storage (S3, GCS, or ABS) as a write-ahead log for data storage and replication? This approach would eliminate the need for inter-AZ data replication and its associated costs. This is the essence of introducing a cost-optimized WAL based on object storage, which is at the heart of the Ursa engine (as illustrated in the diagram below).",[48,30392,3931],{},[48,30394,30395],{},[384,30396],{"alt":18,"src":30397},"\u002Fimgs\u002Fblogs\u002F66f16f5ec93b13a30217f7f5_66424e2d89a600a903ee8a19_Q_-JUi2fdM54P9J9Vl6q302K60L2fK8vE4R_zQIrGG1Hm5Yy1hNM_GVWb_YP86oFfNEUNwRQnWgF7NqTCvVArVhCehAfNz8icuPpsEdJl8lnM7TlPYQubjKmzmLHEvNP-o0c57t28_vH-UGQB4jUaLI.png",[48,30399,30400],{},"With this cost-optimized WAL, we can eliminate almost all cross-AZ traffic, significantly reducing the total infrastructure costs of running high-throughput, latency-relaxed workloads.",[48,30402,30403],{},"Due to its unbeatable durability and cost, Ursa was designed to use cloud object storage as a major storage layer. However, as most workloads need lower latency than what object stores typically provide, we didn't choose one implementation over the other; instead, we incorporated multi-tenancy features that allow users to select the most optimized storage profiles based on their needs for throughput, latency, and cost.",[48,30405,30406],{},"Therefore, you can optimize your tenants and topics based on latency versus cost. Workloads optimized for latency can continue using a latency-optimized WAL without converting data to lakehouse formatted tables. High-throughput and latency-relaxed workloads can choose a cost-optimized WAL to avoid costly cross-AZ data transfers. Data can be stored in lakehouse table formatted streams for longer-term storage and analytical purposes.",[48,30408,30409],{},"With this multi-model and modular Ursa storage engine, we are developing a unified data streaming platform that supports all types of workloads not only with the Kafka API for data streaming and the Pulsar API for messaging queues but also with lakehouse formats as the emerging standard for feeding your analytics systems.",[40,30411,30413],{"id":30412},"our-ambition-unify-data-streaming-and-data-lakes","Our Ambition: Unify Data Streaming and Data Lakes",[48,30415,30416],{},"The Ursa Engine, available on StreamNative Cloud, represents the culmination of years of development and operational experience with Pulsar and Kafka. It's designed to meet the evolving needs of StreamNative customers with a new data streaming engine to support a modern, cost-conscious data streaming cloud. Key developments include full support for the Kafka protocol, a transition from tiered storage to lakehouse storage, the introduction of the more robust metadata management with Oxia, and a shift to make BookKeeper optional by utilizing a WAL system based on commodity object storage.",[48,30418,30419],{},"The rollout of Ursa is structured into distinct phases:",[321,30421,30422,30430,30436,30439],{},[324,30423,30424,30425,30429],{},"Phase 1: Kafka API Compatibility. 
Achieved ",[55,30426,30428],{"href":28906,"rel":30427},[264],"general availability on StreamNative Cloud"," in January 2024.",[324,30431,30432,30433,30435],{},"Phase 2: Lakehouse Storage. Set for a ",[55,30434,22878],{"href":23631}," on StreamNative Cloud in May 2024.",[324,30437,30438],{},"Phase 3: No Keepers. Plans to remove ZooKeeper with Oxia entering public preview by Q2 2024 and to remove BookKeeper later in the year.",[324,30440,30441],{},"Phase 4: Stream \u003C-> Table Duality. Ursa currently enables the writing of data streams as data tables for storage, with future ambitions to allow users to consume Lakehouse Tables as Streams.",[48,30443,3931],{},[48,30445,30446],{},[384,30447],{"alt":18,"src":30448},"\u002Fimgs\u002Fblogs\u002F66f16f5ec93b13a30217f7f1_66424e2c55fc05c861673bed_eXG5P9aYmYKZSUnW79a-vXvDlE2XccwC6DP0Z2fwK5Szmw4QEVZFp51yQDPQS3gDJytuxh-aJCbPHooGveTNFGE3psnULVxyAXNLOMcO6Jj5tDbCeqzXr_Vb8gVRYwk2YV-2cgi0MN-GsxIYo99yROk.png",[48,30450,30451],{},"The rollout of Ursa engine dramatically reduces the time to insights on your data by uniting data streaming and data lakehouse technologies. We are thrilled about the advancements our team has made and the potential that the launch of the Ursa engine has for both your data streaming platform (DSP) and your lakehouse:",[321,30453,30454,30457],{},[324,30455,30456],{},"For the Lakehouse: Data remains perpetually fresh. It is received, processed, and made available in real time, ensuring it's ready for immediate analysis.",[324,30458,30459],{},"For the Data Streaming Platform: Stream processing jobs benefit from access to the entire historical dataset, simplifying tasks such as reprocessing old data or performing complex joins.",[48,30461,30462],{},"Additionally, we are streamlining the data ingestion pipeline to make it more robust and efficient, ensuring that defined data streams seamlessly integrate into your lake without the need for manual intervention.",[40,30464,30466],{"id":30465},"ursa-next-steps","Ursa Next Steps",[48,30468,30469],{},"Ursa represents a significant advancement in data streaming. We're simplifying the deployment and operation of data streaming platforms, accelerating data availability for your applications and lakehouses, and reducing the costs of managing a modern data streaming stack.",[48,30471,30472,30473,30476,30477,30480,30481,190],{},"While Ursa is still in its early stages, our ambitions are high, and we are eager for you to experience its capabilities. The Ursa engine, featuring Kafka API compatibility, Lakehouse storage, and the No Keeper architecture, is available on ",[55,30474,3550],{"href":17075,"rel":30475},[264],". 
If you want to learn more or try it out, ",[55,30478,29176],{"href":17075,"rel":30479},[264]," today or ",[55,30482,30483],{"href":6392},"talk to our data streaming experts",[48,30485,3931],{},{"title":18,"searchDepth":19,"depth":19,"links":30487},[30488,30489,30490,30491,30492,30493,30494,30495,30499,30500],{"id":30140,"depth":19,"text":30141},{"id":30168,"depth":19,"text":30169},{"id":30203,"depth":19,"text":30204},{"id":30232,"depth":19,"text":30233},{"id":30262,"depth":19,"text":30263},{"id":30290,"depth":19,"text":30291},{"id":30313,"depth":19,"text":30314},{"id":30351,"depth":19,"text":30287,"children":30496},[30497,30498],{"id":30357,"depth":279,"text":30358},{"id":30383,"depth":279,"text":30384},{"id":30412,"depth":19,"text":30413},{"id":30465,"depth":19,"text":30466},"\u002Fimgs\u002Fblogs\u002F66427ce8a899003a44963824_SN-SM-UrsaAnnounce.png",{},{"title":20287,"description":20287},"blog\u002Fursa-reimagine-apache-kafka-for-the-cost-conscious-data-streaming",[1332,821,799],"75ORZlnpf7BQ7ZExnL06MhfX7tO0om2QUTPAyPZvV3o",{"id":30508,"title":30509,"authors":30510,"body":30511,"category":821,"createdAt":290,"date":31035,"description":31036,"extension":8,"featured":294,"image":31037,"isDraft":294,"link":290,"meta":31038,"navigation":7,"order":296,"path":24363,"readingTime":31039,"relatedResources":290,"seo":31040,"stem":31041,"tags":31042,"__hash__":31043},"blogs\u002Fblog\u002Fa-guide-to-evaluating-the-infrastructure-costs-of-apache-pulsar-and-apache-kafka.md","A Guide to Evaluating the Infrastructure Costs of Apache Pulsar and Apache Kafka",[806],{"type":15,"value":30512,"toc":31014},[30513,30516,30519,30526,30529,30534,30537,30541,30558,30561,30564,30567,30571,30574,30581,30584,30591,30594,30599,30602,30606,30609,30620,30625,30630,30635,30640,30645,30650,30655,30660,30663,30666,30671,30674,30678,30681,30684,30692,30695,30697,30702,30705,30708,30712,30715,30719,30724,30728,30733,30737,30742,30745,30750,30754,30761,30766,30769,30773,30779,30784,30788,30791,30793,30798,30801,30804,30807,30821,30823,30828,30832,30835,30840,30845,30850,30858,30861,30864,30868,30871,30878,30880,30885,30889,30892,30903,30906,30917,30920,30923,30926,30929,30933,30936,30941,30944,30948,30963,30967,30970,30972,30977,30981,30984,30995,31001],[48,30514,30515],{},"In an era where cost optimization and efficiency are more crucial than ever due to rising interest rates, surging inflation, and the threat of a recession, businesses are being forced to scrutinize expenses that have gone unchecked for years. Among these, the cost of data streaming, a vital component of modern cloud infrastructure, stands out as a significant and complex challenge.",[48,30517,30518],{},"Having worked with numerous customers, users, and prospects across our fully managed cloud offerings and private cloud software, we at StreamNative have gained a profound understanding of the factors determining and optimizing costs associated with data streaming. Our experience, coupled with managing costs in operating our own StreamNative Hosted Cloud, has provided insights into the key cost drivers and optimization levers for Pulsar and other data streaming platforms.",[48,30520,30521,30522,30525],{},"However, assessing the true cost of operating a data streaming platform is a complex task. Traditional analyses often fall short, focusing predominantly on compute and storage requirements while neglecting the substantial costs associated with networking—the often overlooked yet largest infrastructure cost factor. 
Our previous discussions, such as in the blog post “",[55,30523,30524],{"href":18969},"The New CAP Theorem for Data Streaming: Understanding the Trade-offs Between Cost, Availability, and Performance","”, highlight these overlooked aspects and the significant, albeit less tangible, costs associated with development and operations personnel.",[48,30527,30528],{},"This blog series aims to shed light on the critical cost components of data streaming platforms, including infrastructure, operational, and downtime costs. These are pivotal due to their direct financial implications and potential to affect an organization's reputation and compliance standing. We are excited to launch this multi-part series to guide you through understanding and managing the costs associated with Apache Pulsar and Apache Kafka, providing insights that will help optimize your data streaming budget.",[321,30530,30531],{},[324,30532,30533],{},"Part 1: A Guide to Evaluating Infrastructure Costs of Apache Pulsar vs. Apache Kafka",[48,30535,30536],{},"Our first blog examines how much it costs to run Pulsar and Kafka. We'll compare the cost of compute, storage, networking, and extra tools they need to work well all the time.",[40,30538,30540],{"id":30539},"define-a-performance-profile-for-cost-consideration","Define a performance profile for cost consideration",[48,30542,30543,30544,30548,30549,30553,30554,30557],{},"As we compare infrastructure costs across various data streaming technologies, setting a standard performance benchmark is crucial to evaluate all technologies under equivalent conditions. This blog post focuses on selecting a low and stable end-to-end latency performance profile for cost analysis. We conducted \"",[55,30545,30547],{"href":30546},"\u002Fblog\u002Fapache-pulsar-vs-apache-kafka-2022-benchmark#benchmark-tests","Maximum Sustainable Throughput","\" tests using the ",[55,30550,30552],{"href":22218,"rel":30551},[264],"Open Messaging Benchmark"," to gauge how different systems maintain throughput alongside consistent end-to-end latency. Our ",[55,30555,30556],{"href":21458},"2022 benchmark"," report highlighted that Apache Pulsar achieved a throughput 2.5 times greater than Apache Kafka on an identical hardware setup consisting of three i3en.6xlarge nodes.",[48,30559,30560],{},"Employing an identical machine profile across the board negates the impact of various external factors, enabling a balanced comparison. Nonetheless, it's pertinent to acknowledge that adjustments in several parameters—such as the number of topics, producers, consumers, message size, and message batching efficiency—can influence performance and cost outcomes.",[48,30562,30563],{},"In preparation for a deeper analytical dive and comparison, we must also mention that numerous infrastructure elements are excluded from this preliminary discussion for brevity. These essential components, including load balancers, NAT gateways, Kubernetes clusters, and the observability stack, are integral to forging a production-grade data streaming setup and contribute to the overall infrastructure expenses. 
Additionally, this initial analysis focuses on the costs associated with managing a single cluster, though many organizations will likely operate multiple clusters across various setups.",[48,30565,30566],{},"Hence, while this comparative analysis provides a foundational framework that may underrepresent the total costs of independently running and supporting Pulsar or Kafka, it introduces a standardized method for facilitating cost comparisons among a broad spectrum of data streaming technologies.",[40,30568,30570],{"id":30569},"compute-costs","Compute Costs",[48,30572,30573],{},"The journey towards cost efficiency often begins with compute resources, which, while forming a minor part of the overall infrastructure expenses, are traditionally the first target for savings. This mindset refers to a time before cloud computing, when scaling compute resources was a significant challenge, largely due to their tight integration with storage solutions.",[48,30575,30576,30577,30580],{},"Understanding the compute costs associated with Apache Pulsar and Apache Kafka necessitates a look into ",[55,30578,30579],{"href":27695},"their architectural foundations",". Kafka employs a monolithic design that integrates serving and storage capabilities within the same node. Conversely, Pulsar opts for a more flexible two-layer architecture, allowing for the separation of serving and storage functions. Establishing a standardized cost measure is essential for accurately comparing compute costs between these two technologies.",[48,30582,30583],{},"To standardize compute cost evaluation, we introduce the concept of a Compute Unit (CU), defined by the capacity of 1 CPU and 8 GB of memory, as a baseline for comparison. This allows us to evaluate a) the cost per compute unit and b) the throughput each technology can achieve per compute unit.",[48,30585,30586,30587,30590],{},"Our analysis used three i3en.6xlarge machines, totaling 72 CUs, costing $0.11 per hour for each CU. Our ",[55,30588,30589],{"href":21458},"benchmark report"," revealed that Kafka could support 280 MB\u002Fs for both ingress and egress traffic, equating to 3.89 MB\u002Fs for both ingress and egress per CU. Pulsar, benefiting from its two-layer architecture, supports 700 MB\u002Fs for both ingress and egress. In this benchmark for Apache Pulsar, each node runs one broker and one bookie. One-third of the computing power is allocated for running brokers, while the remaining two-thirds is allocated for running bookies. 
This allocation translates to 24 CUs for brokers and 48 CUs for bookies, with a Broker Compute Unit supporting 29.17 MB\u002Fs and a Bookie Compute Unit supporting 14.58 MB\u002Fs for both ingress and egress.",[48,30592,30593],{},"So, the throughput efficiency per compute unit stands as follows between Pulsar and Kafka:",[48,30595,30596],{},[384,30597],{"alt":5878,"src":30598},"\u002Fimgs\u002Fblogs\u002F66f16fb2574712ac414405a6_6639781884fb5ef70a406679_Screenshot-2024-05-06-at-5.36.52-PM.png",[48,30600,30601],{},"This standardization aids in estimating the total compute cost for different workloads, revealing significant differences in compute unit requirements and the associated costs for various ingress and egress scenarios.",[32,30603,30605],{"id":30604},"compute-costs-among-different-workloads","Compute Costs among different workloads.",[48,30607,30608],{},"With this standardization, we compare the compute costs of Pulsar and Kafka under three different workloads and evaluate how costs change when the fanout ratio is changed:",[321,30610,30611,30614,30617],{},[324,30612,30613],{},"Low Fanout: 100 MBps Ingress, 100 MBps Egress",[324,30615,30616],{},"Moderate Fanout: 100 MBps Ingress, 500 MBps Egress",[324,30618,30619],{},"High Fanout: 100 MBps Ingress, 1000 MBps Egress",[1666,30621,30622],{},[324,30623,30624],{},"100 MBps Ingress, 100 MBps Egress Example Workload",[48,30626,30627],{},[384,30628],{"alt":5878,"src":30629},"\u002Fimgs\u002Fblogs\u002F66f16fb2574712ac414405a9_66397840758ef15f8d770483_Screenshot-2024-05-06-at-5.37.52-PM.png",[1666,30631,30632],{},[324,30633,30634],{},"100 MBps Ingress, 300 MBps Egress Example Workload",[48,30636,30637],{},[384,30638],{"alt":5878,"src":30639},"\u002Fimgs\u002Fblogs\u002F66f16fb2574712ac414405d7_6639788b45e2d10462f6f089_Screenshot-2024-05-06-at-5.40.34-PM.png",[1666,30641,30642],{},[324,30643,30644],{},"100 MBps Ingress, 500 MBps Egress Example Workload",[48,30646,30647],{},[384,30648],{"alt":5878,"src":30649},"\u002Fimgs\u002Fblogs\u002F66f16fb2574712ac414405d4_663978e087f82f6cba525809_Screenshot-2024-05-06-at-5.41.59-PM.png",[1666,30651,30652],{},[324,30653,30654],{},"100 MBps Ingress, 1000 MBps Egress Example Workload",[48,30656,30657],{},[384,30658],{"alt":5878,"src":30659},"\u002Fimgs\u002Fblogs\u002F66f16fb2574712ac414405da_6639792787f82f6cba52a562_Screenshot-2024-05-06-at-5.43.11-PM.png",[48,30661,30662],{},"Compiling the data from various workload scenarios highlights the significant difference in computing costs between Apache Pulsar and Apache Kafka. Notably, Pulsar offers a compute cost advantage, being 2.5 times more cost-effective than Kafka. The costs of Apache Kafka increase linearly as the egress rate increases because Kafka has to provision resources to meet egress requirements, which results in the unnecessary overprovisioning of ingress and the processing power of storage components.",[48,30664,30665],{},"In contrast, as the fanout ratio increases, Pulsar's cost efficiency becomes even more pronounced, reaching a point where it is six times more economical than Kafka with a tenfold increase in fanout. 
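For readers who want to see how the per-CU figures translate into these comparisons, here is a simplified sizing sketch. The sizing rule is an assumption on our part rather than the exact model behind the tables above: Kafka nodes are sized for the dominant traffic direction, Pulsar brokers for serving traffic, and bookies for ingested data.

```python
import math

CU_COST_PER_HOUR = 0.11            # $/hour per Compute Unit (1 CPU + 8 GB), from the setup above

# Sustained MB/s per CU, taken from the benchmark figures quoted in this section.
KAFKA_MBPS_PER_CU = 3.89           # serving and storage on the same nodes
PULSAR_BROKER_MBPS_PER_CU = 29.17  # serving layer only
PULSAR_BOOKIE_MBPS_PER_CU = 14.58  # storage layer only

def kafka_cus(ingress: float, egress: float) -> int:
    # Coupled architecture: the whole cluster is sized for the dominant traffic direction.
    return math.ceil(max(ingress, egress) / KAFKA_MBPS_PER_CU)

def pulsar_cus(ingress: float, egress: float) -> int:
    # Brokers scale with serving traffic; bookies scale with ingested data.
    brokers = math.ceil(max(ingress, egress) / PULSAR_BROKER_MBPS_PER_CU)
    bookies = math.ceil(ingress / PULSAR_BOOKIE_MBPS_PER_CU)
    return brokers + bookies

for egress in (100, 500, 1000):
    k, p = kafka_cus(100, egress), pulsar_cus(100, egress)
    print(f"100 MBps in / {egress:>4} MBps out: "
          f"Kafka ~{k:>3} CUs (~${k * CU_COST_PER_HOUR:.2f}/h), "
          f"Pulsar ~{p:>3} CUs (~${p * CU_COST_PER_HOUR:.2f}/h)")
```

Under these assumptions the ratios land close to the 2.5x and roughly 6x figures cited above.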
This superior cost efficiency stems from Pulsar's dual-layer architecture, allowing for scaling of resources as needed – for example, adding more brokers to increase serving capacity without necessarily increasing the processing power of storage components – which becomes increasingly advantageous as the fanout number grows.",[48,30667,30668],{},[384,30669],{"alt":18,"src":30670},"\u002Fimgs\u002Fblogs\u002F66397619b0abb3ca47801e8d_-kne_AbRcnfKEG-UioS1pTtztQLqJzbs4FzjNjW_XKtlIwXhIC6yFkFJDO_fAtzQd_aSR0fseJzoVZNjuPtkYxQw3yEciWz_MQahxi0BLYCVCm9eFwxo4xqVvtWH27eb-QOswNuVHI9oLzshJ0GDERg.png",[48,30672,30673],{},"While this analysis offers a clear framework for comparing compute costs between Pulsar and Kafka, it's important to note that real-world scenarios involve more complexity. Decisions regarding the optimal machine type for specific components, the appropriate number of these components, and how to best optimize machine types for each workload are nuanced and require careful consideration. The examples provided serve as a basis for illustration, emphasizing that the practical application of these findings can be as complex as the technologies themselves.",[40,30675,30677],{"id":30676},"storage-costs","Storage Costs",[48,30679,30680],{},"Calculating storage costs for data streaming platforms like Apache Pulsar and Apache Kafka can be complex, especially with factors such as IOPs and throughput influencing pricing. For simplicity, this analysis will focus solely on local EBS storage costs. However, it's crucial to consider additional IOPs and throughput expenses for clusters experiencing significant vertical scaling or high throughput demands.",[48,30682,30683],{},"Before we examine cost estimations, it's crucial to understand the distinct storage models utilized by Pulsar and Kafka. Kafka combines serving and storage functions within a single node, typically using general-purpose SSDs (e.g., gp3) to support a range of workloads. This configuration leads to a storage cost of $0.08 per GB-month for Kafka.",[48,30685,30686,30687,30691],{},"In contrast, Pulsar adopts a different approach by separating serving from storage, which involves ",[55,30688,30690],{"href":30689},"\u002Fblog\u002Fhow-pulsars-architecture-delivers-better-performance-than-kafka#pulsar-better-isolation","a dual-disk system on its storage nodes ","to handle write and read operations efficiently. BookKeeper, Pulsar’s storage component, uses separate storage devices for its journal and main ledger.",[48,30693,30694],{},"The journal, which requires fast and durable storage to manage write-heavy loads effectively, is usually stored on SSDs. For read operations, tailing reads are sourced directly from the memTable, while catch-up reads come from the Ledger Disk and Index Disk. This separation ensures that intense read activities do not affect the performance of incoming writes due to their isolation on different physical disks.",[48,30696,3931],{},[48,30698,30699],{},[384,30700],{"alt":18,"src":30701},"\u002Fimgs\u002Fblogs\u002F66397619014256f2490581cf_xrM0G7ptupn9YDQmUCDtXhK_Wh2K6_MNW18ZF1ozTott5JitzOI2Re_m91A-3givDFRGjUM_DQIoRiMbhOWdKmD1R8i_YsNIM0BhYpXqdY1iPQHWlffKbM5qBnsBzuN-bfhQD9Ac7MMXPldr8yYCXgE.png",[48,30703,30704],{},"The architecture employs general-purpose SSDs (at $0.08\u002FGB-month, plus additional throughput charges of $0.040 per MB\u002Fs for usage above 125MB\u002Fs) for the journal disk and throughput-optimized HDDs (at $0.045\u002FGB-month) for ledger disks. 
This dual-disk strategy allows for the optimization of both cost and performance.",[48,30706,30707],{},"To estimate storage costs, consider factors like ingress rate, replication factor, and retention period, which determine the required storage capacity. The cost can then be calculated based on the specific unit prices of the storage solutions employed.",[32,30709,30711],{"id":30710},"storage-costs-among-different-workloads","Storage Costs among different workloads",[48,30713,30714],{},"We compare the storage costs of Pulsar and Kafka with different ingress and data retention.",[3933,30716,30718],{"id":30717},"_100-mb-ingress-7-days-retention","100 MB Ingress, 7 days retention",[48,30720,30721],{},[384,30722],{"alt":5878,"src":30723},"\u002Fimgs\u002Fblogs\u002F66f16fb3574712ac414405f1_66398a1764a250b181f9807d_Screenshot-2024-05-06-at-6.52.44-PM.png",[3933,30725,30727],{"id":30726},"_100-mb-ingress-30-days-retention","100 MB Ingress, 30 days retention",[48,30729,30730],{},[384,30731],{"alt":5878,"src":30732},"\u002Fimgs\u002Fblogs\u002F66f16fb3574712ac414405f4_66398a4419f6fd7b2f626f59_Screenshot-2024-05-06-at-6.52.57-PM.png",[3933,30734,30736],{"id":30735},"_1000-mb-ingress-7-days-retention","1000 MB Ingress, 7 days retention",[48,30738,30739],{},[384,30740],{"alt":5878,"src":30741},"\u002Fimgs\u002Fblogs\u002F66f16fb3574712ac414405fa_66398b1a216f3fc262412aef_Screenshot-2024-05-06-at-6.59.34-PM.png",[48,30743,30744],{},"Compiling the data from various workload scenarios highlights the significant difference in storage costs between Apache Pulsar and Apache Kafka. Notably, Pulsar offers a storage cost advantage, being 1.7 times more cost-effective than Kafka.",[48,30746,30747],{},[384,30748],{"alt":18,"src":30749},"\u002Fimgs\u002Fblogs\u002F6639761987ca9625a39083e1_xCY_PnRQr1DCmmLjpfxpftAFVnc_SEyoZl4-dxRqKlFjjIJlnYbh-9KGMPb5aYE9uP2fVikxe1Aee_wCUYhjLNTet7EjZ5Q_KFJJoC0swsaVXWQ2sI10csAA9AS06CWxXj5SfiBd6TSYtp66GB0SAwc.png",[32,30751,30753],{"id":30752},"cost-efficiency-from-tiered-storage-solutions","Cost Efficiency from Tiered Storage Solutions",[48,30755,30756,30757,30760],{},"Pulsar's adoption of a ",[55,30758,29728],{"href":29726,"rel":30759},[264]," system in 2018 marks a significant step forward in reducing storage costs. By offloading older data to cheaper cloud-based object storage, such as Amazon S3, Pulsar reduces local storage needs and cuts costs dramatically, potentially by over 90%, depending on the distribution between local and object storage. This system also offers the benefit of lowering compute costs for high-retention scenarios by negating the need for additional storage nodes.",[48,30762,30763],{},[384,30764],{"alt":18,"src":30765},"\u002Fimgs\u002Fblogs\u002F663976198e3b49bb91aeeb13_xm2xucnS4kXSeF2RqnciC-OYlEXCO7xhy5N-ILbNWk9YWC3KD7ot3ztKCLGPGd3Hp7rTOlVgpXZP116pmN39fGeDMBZ2_fb53Id4FtE9X4bvqX_L_ltTRB5gxoddkzhChvm7jxqErOyFcJSrTNWk0_I.png",[48,30767,30768],{},"Kafka is evolving to incorporate tiered storage solutions as well, though with a notable difference: while Pulsar can directly serve data from tiered storage without reloading it to local disks, Kafka's model necessitates data reloading, requiring additional local storage planning and potentially incurring higher costs.",[32,30770,30772],{"id":30771},"from-tiered-storage-to-lakehouse-storage","From Tiered Storage to Lakehouse Storage",[48,30774,30775,30778],{},[55,30776,30777],{"href":29601},"StreamNative's introduction of Lakehouse tiered storage"," further enhances Pulsar's storage efficiency. 
By leveraging columnar data formats aligned with Pulsar's schema, this feature significantly reduces the volume of data stored in S3 and the operational costs associated with data access, offering savings of up to 3-6 times depending on retention policies and schema specifics. This innovation represents a critical advancement in optimizing storage costs and efficiency for Pulsar users.",[48,30780,30781],{},[384,30782],{"alt":18,"src":30783},"\u002Fimgs\u002Fblogs\u002F663976190339ea676b392022_OQo32EOA_aT1rSJvdTxxLkazSFNs__Z2B0aklXDfZjxm3OEo5YdrowqLLRnLy7Hv0NPNvEoNityJ-gk0RaRRE-At3lGCJ3TW8CZIK7lIF_5GkaXa6c5sXMLy0uCNWZPUBeQqWVZzZhXcZ82GTyWg5VU.png",[40,30785,30787],{"id":30786},"comparing-and-optimizing-resource-utilization","Comparing and Optimizing Resource Utilization",[48,30789,30790],{},"At this juncture, we've temporarily moved past the necessity of overprovisioning resources to accommodate your workload variations. Nevertheless, to maintain reliability and optimal performance, it's crucial to overprovision computing resources to safeguard against unforeseen spikes in throughput and to overprovision storage resources to prevent the risk of depleting disk space. In this section below, we will look into the resource utilization between Pulsar and Kafka in handling workload changes.",[48,30792,3931],{},[48,30794,30795],{},[384,30796],{"alt":18,"src":30797},"\u002Fimgs\u002Fblogs\u002F6639761993aaed0bc04a1e76_B2lmBfrQb4WWXPgKevnmu0x7ocL2x66NynSjNpjbl-JU9dXOvGLZsjzs0tXa225B1Abg1bv4FdrRKO-8AoTmQlKyhRpB5Dce7ldCCdNHxCVyqvXU8Oa-vstBJ_qtf_enS6Yh9yUheL8LZPbeJnwSuyY.png",[48,30799,30800],{},"In modern cloud-native environments, organizations frequently utilize Kubernetes and the inherent elasticity of cloud resources for scaling nodes in response to real-time traffic demands. However, Apache Kafka's design, which integrates serving and storage functions, presents a scalability challenge. Kafka necessitates a proportional increase in storage capacity to enhance serving capabilities and vice versa. Scaling operations in Kafka involve partition rebalancing, leading to potential service disruptions and significant data transfer over the network, thus impacting both performance and network-related costs.",[48,30802,30803],{},"This scenario often forces Kafka users to operate their clusters in a state of chronic overprovisioning, maintaining resource utilization levels well below optimal (sometimes below 20%). Despite the low usage, the financial implications of maintaining these additional resources are reflected in their cloud billing.",[48,30805,30806],{},"In contrast, engineered with the cloud-native landscape in mind, Apache Pulsar introduces an innovative approach to resource management, enabling precise optimization of compute and storage usage to control infrastructure expenses efficiently. Several key features distinguish Pulsar's architecture:",[321,30808,30809,30812,30818],{},[324,30810,30811],{},"Two-Layered Architecture: By decoupling message serving from storage, Pulsar facilitates independent scaling of resources based on actual demand. This flexibility allows for the addition of storage capabilities to extend data retention or the enhancement of serving nodes to increase consumption capacity without the need for proportional scaling.",[324,30813,30814,30817],{},[55,30815,30816],{"href":21492},"Rebalance-Free Storage",": Pulsar's design minimizes the network load associated with scaling by eliminating the need for partition rebalancing. 
This approach prevents service degradation during scaling events and curtails spikes in networking costs.",[324,30819,30820],{},"Quota Management and Resource Limits: With built-in support for multi-tenancy, Pulsar enables effective quota management and the imposition of resource limits, safeguarding high availability and scalability across diverse operational scenarios.",[48,30822,3931],{},[48,30824,30825],{},[384,30826],{"alt":18,"src":30827},"\u002Fimgs\u002Fblogs\u002F6639761976f22438881eea30_s2g2RzhG12VChgRVMgKIaoRe2ZpPDQ5ZXK6CLMapoaHvUQ2YoXS3WFze4ov3WTx4D3pUmwRTYKTtmahRypK98LQ4u41_TC0yTX5JN1CX2ktizoXBk98u4GpvVhbsTTnw5GyVvQ32tZ73vRfTk5lufmM.png",[32,30829,30831],{"id":30830},"a-note-on-multi-tenancy-in-data-streaming-platforms","A Note on Multi-Tenancy in Data Streaming Platforms",[48,30833,30834],{},"This blog post primarily addresses assessing infrastructure costs associated with one cluster. However, it is crucial to highlight the significant advantages of adopting a multi-tenant platform, particularly how multi-tenancy can decrease the total cost of ownership for such data streaming platforms. Here are several reasons why your business stands to benefit from multi-tenancy.",[48,30836,30837],{},[384,30838],{"alt":18,"src":30839},"\u002Fimgs\u002Fblogs\u002F6639761919aeef3f7bace98e_lD5w_euuiBPoYEsRXngD4rTpZ1aLSFHwCnAVVe1qzOgpVdN2cGopYNvLirnIliZ9hcEFk8fmb_mQXdZIMlPsVs9wmWofNQ4I2VkuuZmcSQKQGlVo1aUPtIuMuZ87HfYrLZ6uxibSEiGXMZLu46_gyfI.png",[1666,30841,30842],{},[324,30843,30844],{},"Lower cost: Companies adopting Apache Kafka often require multiple Kafka clusters to cater to different teams within an organization, primarily due to the inherent limitations in Kafka’ architecture (See Figure 8). This scenario typically leads to each cluster operating at less than 30% capacity. Conversely, Apache Pulsar allows for the use of a single, multi-tenant cluster. Such deployment facilitates cost-sharing across various teams and enhances overall resource utilization. By pooling resources, organizations can significantly diminish the fixed costs and reduce the total cost of ownership, as resources are more efficiently allocated and used (See Figure 9).",[48,30846,30847],{},[384,30848],{"alt":18,"src":30849},"\u002Fimgs\u002Fblogs\u002F663976191b63f3af890a5065_pblf3w95OyaGub89dmDu288J30p8I_u30BfssE4PbgqaOsd6QQB_OQhzU525eOuGssp_GWUAULsO_54jN4jyrGgyd-_9UQEoE8hX7m-st3mtOz-runmXG5ADc2Stme3ccvSqwtxEnAtP3CyvFHbdhSg.png",[1666,30851,30852,30855],{},[324,30853,30854],{},"Simpler operations: Managing a single multi-tenant cluster simplifies operations compared to maintaining multiple single-tenant clusters. With fewer clusters, there’s less complexity in managing access controls, credentials, networking configurations, schema registries, roles, etc. Moreover, the challenge of tracking and upgrading different software versions across various clusters can become an operational quagmire. By centralizing on a multi-tenant platform, organizations can drastically cut operational burdens and associated costs, streamlining the management process and reducing the likelihood of errors.",[324,30856,30857],{},"Greater reuse: When all teams utilize a common cluster, reusing existing data becomes significantly more straightforward. A simple adjustment in access controls can enable different teams to leverage topics and events created by others, thereby accelerating the delivery of new projects and value creation. Furthermore, minimizing data duplication between clusters can lead to substantial savings. 
In data streaming platforms, where networking often constitutes a major portion of the expenses, a multi-tenant solution that reduces the need for data copying can markedly decrease the total network costs. This optimizes infrastructure cost and enhances the speed and efficiency of data-driven initiatives within the organization.",[48,30859,30860],{},"By embracing a multi-tenant data streaming technology like Apache Pulsar, businesses can achieve a more cost-effective, streamlined, and scalable data streaming environment, thereby enhancing operational efficiency and reducing operational overheads.",[48,30862,30863],{},"Looking ahead, our future blog posts will delve deeper into these aspects, exploring how Pulsar's cloud-native capabilities can be leveraged to achieve optimal resource utilization and cost efficiency in data streaming applications.",[40,30865,30867],{"id":30866},"network-costs","Network Costs",[48,30869,30870],{},"Networking is often the most significant expense in integrating operational business applications with analytical data services through data streaming platforms. Unraveling networking costs is challenging, as cloud providers typically bundle these expenses with other organizational network usage without distinguishing between costs for specific technologies like Kafka and Pulsar.",[48,30872,30873,30874,30877],{},"To tackle this, we can construct a model to estimate these costs more accurately. In ",[55,30875,30876],{"href":18969},"AP (Availability and Performance) data streaming systems"," such as Kafka and Pulsar, cross-AZ (Availability Zone) traffic incurs substantial costs due to the necessity for cross-AZ replication to ensure high availability and reliability. For robustness, operating in a multi-zone cluster is recommended, avoiding single-zone deployments that risk downtime during zonal outages. 
Pulsar's two-layer architecture offers a strategic advantage, allowing storage nodes to span multiple zones for durability while consolidating broker nodes within a single zone to expedite failover processes.",[48,30879,3931],{},[48,30881,30882],{},[384,30883],{"alt":18,"src":30884},"\u002Fimgs\u002Fblogs\u002F66397619f4e55e3fa86a664a_WqfQ36V5ohjPSJcz1RsXDYp_BEuOhh7Nu7AMSdpLVPvkeYBdAqUyA4IiLsWe3WpGgjBQmzLQ3SQ7T4tt4-OgvGyEZest4jt3MxD9wB9iVNLhIIJPClccgFQZzBBPfRzOlaZWba1d9TNkfuJg6xAwnaQ.png",[32,30886,30888],{"id":30887},"networking-cost-models","Networking Cost Models",[48,30890,30891],{},"Hence, we examine the following three deployment scenarios:",[1666,30893,30894,30897,30900],{},[324,30895,30896],{},"Kafka (Multi-AZ): A Kafka cluster spans multiple availability zones.",[324,30898,30899],{},"Pulsar (Multi-AZ): A Pulsar cluster's broker and storage nodes are distributed across multiple zones.",[324,30901,30902],{},"Pulsar (Single-AZ Broker, Multi-AZ Storage): Storage nodes are multi-zone, but broker nodes are confined to a single zone for swift failover.",[48,30904,30905],{},"The primary driver for cross-AZ traffic for Kafka (Multi-AZ) and Pulsar (Multi-AZ) are similar.",[1666,30907,30908,30911,30914],{},[324,30909,30910],{},"Producers: Typically, topic owners in Pulsar or partition leaders in Kafka are distributed across three zones, leading to about 2\u002F3 of producer traffic crossing zones.",[324,30912,30913],{},"Consumers: Similarly, consumers often fetch data from topic owners or partition leaders in a different zone, generating cross-zone traffic roughly 2\u002F3 of the time.",[324,30915,30916],{},"Data Replication: Both systems replicate data across two additional zones for resilience.",[48,30918,30919],{},"We calculate the cross-AZ traffic for Multi-AZ deployments using the following formula:",[48,30921,30922],{},"Cross-AZ throughput (MB \u002F sec) = Producer cross-AZ throughput + Consumer cross-AZ throughput + Data replication cross-AZ throughput = (Ingress MBps * ⅔) + (Egress MBps * ⅔) + (Ingress MBps * 2)",[48,30924,30925],{},"Cross-AZ traffic from producers and consumers is eliminated for the unique Pulsar setup with Single-AZ Brokers and Multi-AZ Storage (see Figure 8), significantly reducing overall networking costs. So, we calculate only the data replication throughput across availability zones as the cross-AZ traffic.",[48,30927,30928],{},"Cross-AZ throughput (MB \u002F sec) = Data replication cross-AZ throughput = Ingress MBps * 2",[32,30930,30932],{"id":30931},"networking-cost-calculation-and-implications","Networking Cost Calculation and Implications",[48,30934,30935],{},"We’ve modeled how much cross-AZ traffic results from our workload below, multiplied by the standard cross-AZ charge of two cents per GB in AWS.",[48,30937,30938],{},[384,30939],{"alt":5878,"src":30940},"\u002Fimgs\u002Fblogs\u002F66f16fb2574712ac414405cc_66397b5ec586a5f71f852437_Screenshot-2024-05-06-at-5.52.35-PM.png",[48,30942,30943],{},"Reflecting on the data presented, it becomes evident that with the expansion of throughput and the increase in fanout, networking expenses swiftly become the predominant component of your infrastructure costs. Furthermore, while implementing Tiered Storage can significantly lower storage expenses, networking costs alone may still account for approximately 90% of total infrastructure expenditure. 
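Expressed as code, the two cross-AZ formulas above look like the sketch below. It assumes a 30-day month and the ~$0.02/GB inter-AZ rate mentioned earlier, and it models only producer, consumer, and replication traffic; the function and variable names are mine.

```python
# Sketch of the cross-AZ traffic formulas above, priced at the ~$0.02/GB inter-AZ rate.
CROSS_AZ_PER_GB = 0.02
SECONDS_PER_MONTH = 30 * 86_400

def cross_az_mbps(ingress_mbps, egress_mbps, single_az_brokers=False):
    replication = ingress_mbps * 2            # data replicated to two additional zones
    if single_az_brokers:                     # Pulsar: brokers in one AZ, storage spans AZs
        return replication
    producer = ingress_mbps * 2 / 3           # ~2/3 of producers write to an owner in another zone
    consumer = egress_mbps * 2 / 3            # ~2/3 of consumers read from another zone
    return producer + consumer + replication

def monthly_cross_az_cost(ingress_mbps, egress_mbps, **kwargs):
    mbps = cross_az_mbps(ingress_mbps, egress_mbps, **kwargs)
    gb_per_month = mbps * SECONDS_PER_MONTH / 1_000
    return mbps, gb_per_month * CROSS_AZ_PER_GB

print(monthly_cross_az_cost(100, 300))                          # multi-AZ Kafka or Pulsar
print(monthly_cross_az_cost(100, 300, single_az_brokers=True))  # Pulsar, single-AZ brokers
```

For the 100 MBps ingress / 300 MBps egress workload this yields about 466.7 MBps of cross-AZ traffic in a fully multi-AZ deployment versus 200 MBps when brokers are pinned to a single zone, which is why the single-AZ-broker layout cuts the networking bill by more than half.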
This underscores the importance of Apache Pulsar's dual-layer architecture, which is vital in minimizing networking fees while simultaneously upholding system reliability and availability.",[32,30945,30947],{"id":30946},"innovations-in-reducing-networking-costs","Innovations in Reducing Networking Costs",[48,30949,30950,30951,30956,30957,30962],{},"The introduction of technologies like ",[55,30952,30955],{"href":30953,"rel":30954},"https:\u002F\u002Fcwiki.apache.org\u002Fconfluence\u002Fdisplay\u002FKAFKA\u002FKIP-392%3A+Allow+consumers+to+fetch+from+closest+replica",[264],"follower fetching"," in Kafka and ",[55,30958,30961],{"href":30959,"rel":30960},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fwiki\u002FPIP-63:-Readonly-Topic-Ownership-Support",[264],"ReadOnly Broker"," in Pulsar is pivotal in further reducing networking expenses. These allow consumers to read from a broker within the same zone, avoiding the costs associated with cross-zone data leader access. This approach ensures that cross-zone replication costs are incurred just once, irrespective of the number of consumers, offering a path to substantial savings on networking expenses. For example, in a cluster setup with 100 MBps ingress and 300 MBps egress, the cross-AZ traffic can be reduced from 466.7 MBps to 266.7 MBps, resulting in a 42.9% reduction in traffic.",[40,30964,30966],{"id":30965},"putting-these-together","Putting these together",[48,30968,30969],{},"We're consolidating compute, storage, and networking expenses to assess the overall infrastructure costs of Pulsar versus Kafka. In a standard deployment scenario with 100 MBps ingress, 300 MBps egress, and a 7-day data retention workload, Pulsar offers a 77% reduction in compute costs and nearly a 60% reduction in total infrastructure costs, including both storage and network expenses.",[48,30971,3931],{},[48,30973,30974],{},[384,30975],{"alt":18,"src":30976},"\u002Fimgs\u002Fblogs\u002F66397619ffeba2ecbbdc0441_meUh2vJomh09FQeUeJzoTWfmxEfnpJzmIEUoOfrDaFe-Ar9wTY32kMyGyG9D1fTTj8EVNRunALEH4tWpJY20NPVnGnR5MnzkRrH3Wg_yHYwXwWUveecTSVyVfJx7U0FSEtpCEVIq7oe_hUjWYKVM7Ak.png",[40,30978,30980],{"id":30979},"so-how-can-you-save-money","So, how can you save money?",[48,30982,30983],{},"Up to this point, we've dissected the cost structure of your data streaming infrastructure, hopefully providing you with a clearer understanding of how a data streaming platform can impact your finances. This guide outlines a methodical approach for Kafka users to tackle each cost component. A logical first step could be enabling follower fetching to mitigate cross-AZ charges. Lowering the replication factor for non-essential topics can further optimize networking expenses tied to partition replication. It's also beneficial to evaluate different instance types to find the most cost-effective fit for your requirements. Finally, adjusting your cluster's scale based on workload fluctuations ensures you're not overspending on underutilized infrastructure.",[48,30985,30986,30987,30990,30991,30994],{},"In our experience assisting numerous Kafka users in evaluating the costs of self-hosted open-source Kafka against open-source Pulsar and our fully managed Kafka API-compatible cloud service, ",[55,30988,3550],{"href":30989},"\u002Fgetting-started",", we've consistently found that transitioning to a fully managed Kafka API-compatible cloud service and utilizing the architectural benefits of Apache Pulsar is the most cost-effective strategy. 
Apache Pulsar's distinctive, cloud-native architecture not only facilitates cost reductions but also introduces at-scale efficiencies. StreamNative Cloud, developed atop Apache Pulsar, offers ",[55,30992,30993],{"href":29597},"full Kafka compatibility",", enabling you to migrate without needing to overhaul your Kafka applications and capitalize on the cost advantages of Pulsar’s cloud-native design.",[48,30996,30997,30998,31000],{},"Don't miss our next installment in this series, which will delve into another crucial cost factor for data streaming platforms: development and operations personnel. If you're interested in calculating your Kafka expenses or discovering potential savings with StreamNative, we encourage you to ",[55,30999,30483],{"href":6392}," today and conduct a TCO analysis.",[48,31002,31003,31004,31008,31009,20076],{},"If you want to learn more about our development around Kafka API-compatible data streaming platforms, don’t miss the upcoming ",[55,31005,29387],{"href":31006,"rel":31007},"https:\u002F\u002Fpulsar-summit.org\u002Fevent\u002Feurope-2024",[264]," on May 14th next week. Our team will present our next-generation data streaming engine, purposely designed to reduce costs in this cost-conscious era. ",[55,31010,31013],{"href":31011,"rel":31012},"https:\u002F\u002Fregistration.socio.events\u002Fe\u002Fpulsarvirtualsummiteurope2024",[264],"Register for the event today",{"title":18,"searchDepth":19,"depth":19,"links":31015},[31016,31017,31020,31025,31028,31033,31034],{"id":30539,"depth":19,"text":30540},{"id":30569,"depth":19,"text":30570,"children":31018},[31019],{"id":30604,"depth":279,"text":30605},{"id":30676,"depth":19,"text":30677,"children":31021},[31022,31023,31024],{"id":30710,"depth":279,"text":30711},{"id":30752,"depth":279,"text":30753},{"id":30771,"depth":279,"text":30772},{"id":30786,"depth":19,"text":30787,"children":31026},[31027],{"id":30830,"depth":279,"text":30831},{"id":30866,"depth":19,"text":30867,"children":31029},[31030,31031,31032],{"id":30887,"depth":279,"text":30888},{"id":30931,"depth":279,"text":30932},{"id":30946,"depth":279,"text":30947},{"id":30965,"depth":19,"text":30966},{"id":30979,"depth":19,"text":30980},"2024-05-06","Is Apache Pulsar better than Kafka? StreamNative can reduce the total infrastructure cost of running a Kafka workload by 70%. Check out our blog post on how to evaluate the infrastructure costs for Pulsar and Kafka and find guidance on choosing the most cost-effective data streaming platform for the current cost-conscious era.","\u002Fimgs\u002Fblogs\u002F663e6969cf2c91731b2d59d7_SN-BP-CostComparison-2.png",{},"12 min read",{"title":30509,"description":31036},"blog\u002Fa-guide-to-evaluating-the-infrastructure-costs-of-apache-pulsar-and-apache-kafka",[799,821,5954],"TI2u3MWlDJJvLnn08Ccv9SE8RO3tiYqbo27lsR4xPIU",{"id":31045,"title":31046,"authors":31047,"body":31048,"category":821,"createdAt":290,"date":31275,"description":31276,"extension":8,"featured":294,"image":31277,"isDraft":294,"link":290,"meta":31278,"navigation":7,"order":296,"path":21492,"readingTime":4475,"relatedResources":290,"seo":31279,"stem":31280,"tags":31281,"__hash__":31282},"blogs\u002Fblog\u002Fno-data-rebalance-needed-kafka-and-pulsar.md","No Data Rebalance Needed! 
That's Why We Reimagined Kafka with Apache Pulsar to Make it 1000x More Elastic ",[806],{"type":15,"value":31049,"toc":31263},[31050,31053,31060,31064,31067,31070,31073,31078,31080,31084,31087,31090,31093,31098,31101,31103,31108,31115,31119,31122,31125,31128,31141,31145,31148,31159,31162,31165,31168,31171,31174,31178,31189,31193,31201,31205,31208,31211,31215,31223,31227,31230,31241,31244,31254,31261],[48,31051,31052],{},"Elasticity is fundamental to cloud-native computing, enabling businesses to rapidly scale, enhance system resilience, and reduce costs. At StreamNative, we cater to a wide range of customers who demand robust and cost-efficient data infrastructure. They seek a genuinely cloud-native data streaming platform that seamlessly scales across global data centers without the complexity of implementation.",[48,31054,31055,31056,31059],{},"To satisfy the need for modern streaming data pipelines, we developed ",[55,31057,821],{"href":31058},"\u002Fpulsar\u002Fwhat-is-pulsar"," with a cloud-native architecture designed to significantly enhance the platform's horizontal scalability. We didn't just improve it; we revolutionized it, boosting Apache Pulsar's elasticity by 100x to even 1000x, compared to other data streaming technologies such as Apache Kafka. But what exactly did we do to achieve such a leap in performance?",[40,31061,31063],{"id":31062},"understanding-how-apache-kafka-scales","Understanding How Apache Kafka Scales",[48,31065,31066],{},"To comprehend why Apache Pulsar is 100x or 1000x more elastic than Kafka, it is crucial to understand how Apache Kafka scales.",[48,31068,31069],{},"Consider a cluster consisting of three brokers. This cluster retains messages for 30 days and receives an average of 100 MBps from its producers. With a replication factor of three, the cluster holds up to 777.6 TB of storage, approximately 259.2 TB per broker if evenly distributed.",[48,31071,31072],{},"When the cluster is expanded by adding another broker to increase capacity, the dynamics change. Now with four brokers, the new broker must be integrated to start handling reads and writes. Assuming optimal data balancing, this new broker would need to store about 194.4 TB (777.6 TB divided by 4), which must be transferred from the existing brokers. On a 10 gigabit network, this data rebalance would take about 43 hours using Apache Kafka, and potentially even longer in practical scenarios.",[48,31074,31075],{},[384,31076],{"alt":18,"src":31077},"\u002Fimgs\u002Fblogs\u002F662fcf07387fefe514a95a1e_hEhKe7mp_8j40Ef1B8ypmdvdKaMcmwWuXZviPvtEJlzxhQCWiLtOvWMIJl40EBwo_Ox5CphlXJOfTtB7fPbl38BZs908WkA9lViqvnQiiY_np0HilPmoNRkqEPW_qp1Zl5Sj3F9UouaAGCwRMPmnaSg.png",[48,31079,3931],{},[40,31081,31083],{"id":31082},"delivering-1000x-elasticity-with-apache-pulsar","Delivering 1000x elasticity with Apache Pulsar",[48,31085,31086],{},"With Apache Pulsar, scaling up is 1000x faster than with Apache Kafka—without the typical burdens of capacity planning, data rebalancing, or other operational challenges involved in scaling streaming data infrastructure. The key to this 1000x elasticity is Apache Pulsar’s rebalance-free architecture.",[48,31088,31089],{},"In Apache Kafka, the capabilities of data serving and data storage are bound together on the same machines. This setup requires data rebalancing and movement whenever the cluster topology changes, such as when adding new nodes, removing old nodes, or when nodes fail. 
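The figures in the scaling example above can be reproduced with a few lines of arithmetic. The sketch below uses decimal units and assumes the 10 Gbit link is fully saturated by the rebalance, which is exactly why real-world rebalances usually take even longer than the 43 hours quoted.

```python
# Reproducing the scaling example above: 100 MB/s ingress, 30-day retention,
# replication factor 3, a 3-broker cluster, then a fourth broker added.
ingress_mb_s = 100
retention_s = 30 * 86_400
replication = 3

total_tb = ingress_mb_s * retention_s * replication / 1_000_000   # 777.6 TB across the cluster
per_broker_tb = total_tb / 3                                       # 259.2 TB per broker
moved_tb = total_tb / 4                                            # new broker's share: 194.4 TB

link_gb_s = 10 / 8                                                 # 10 Gbit/s ≈ 1.25 GB/s
rebalance_hours = moved_tb * 1_000 / link_gb_s / 3_600             # ≈ 43.2 hours
print(total_tb, per_broker_tb, moved_tb, round(rebalance_hours, 1))
```

The same scale-out in Pulsar moves only topic-ownership metadata rather than the 194.4 TB itself, which is why the operation completes in seconds rather than days.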
This data movement and rebalancing can be minimized but is often inevitable.",[48,31091,31092],{},"In contrast, Apache Pulsar decouples data serving and storage into two distinct layers: the serving and storage layers. This allows for independent scalability based on capacity needs without the risk of overprovisioning. Crucially, Pulsar uses a segment-based storage architecture that does not require data rebalancing when scaling up storage.",[48,31094,31095],{},[384,31096],{"alt":18,"src":31097},"\u002Fimgs\u002Fblogs\u002F662fcf07ee730f510374113c_XB7rT2Js237TeF1IPTdA1YnDFq5R0B2kPorn4Zp2NVhC3AJpeWayS0NvQhRvXMnrUURXJrId6HMveUeTN2vETeOPzGipPg2c4niVQruftB4CCeVJo7AQFXyE1NmpQ1VWSL7CKHTzXULBNznjtcg6IRs.png",[48,31099,31100],{},"So how does this work in practice? Brokers in Apache Pulsar divide the partitions into segments, which are evenly distributed across storage nodes. The location metadata of these segments is stored in a metadata storage for rapid access. Adding a broker is straightforward and immediate, as there is no actual data rebalancing or movement—since the data resides in the storage layer. Pulsar simply needs to rebalance the ownership of a given topic, which typically takes just seconds and involves only metadata retrieval. Similarly, adding a storage node is both easy and instantaneous. It does not affect the topic ownership at the broker layer, nor is there a need to move old segments from existing storage nodes to the new one. New data segments are immediately allocated in the newly added nodes, allowing for instant scalability.",[48,31102,3931],{},[48,31104,31105],{},[384,31106],{"alt":18,"src":31107},"\u002Fimgs\u002Fblogs\u002F662fcf072d49454f9331c9e1_QN1h4qmRO558B1qcgFLM2n_Muxb1Os8FKBBrzfHWfTPm-93092HCYXACSlFPeJA-kcP_otk5AEyON9SrG4YONTWNO7o1eV2yqVZ0Ylxehlfo_96xmC5IsAZw_0fOw99w1vBO1dp4WPBNuQmnyP6cws8.png",[48,31109,31110,31111,190],{},"When scaling the Pulsar cluster, the operation begins as soon as the system is prompted for more resources. The entire process is conducted online, meaning there is no downtime, no data rebalancing, and no service degradation. Typically, the complete scaling operation can be completed within a minute. Compared to the 43 hours required with Apache Kafka, this represents an improvement of up to 2580x faster! In reality, scaling an Apache Kafka cluster can take a couple of days, and ",[55,31112,31114],{"href":30150,"rel":31113},[264],"it usually makes the clusters unavailable during the rebalance period",[40,31116,31118],{"id":31117},"mitigating-data-rebalancing-impact-with-tiered-storage","Mitigating Data Rebalancing Impact with Tiered Storage",[48,31120,31121],{},"Many Kafka vendors aim to enhance elasticity by integrating various forms of tiered storage, which utilize multiple layers of cloud storage and workload heuristics to accelerate data rebalancing and movement. However, tiering data involves significant trade-offs between cost and performance, as efficiently moving data among in-memory caches, local disks, and object storage is a complex task. Despite these efforts, tiered storage doesn’t eliminate the need for data rebalancing during scaling events.",[48,31123,31124],{},"The strategy relies on the principle that faster data movement within the system equates to quicker scaling. For instance, consider a scenario where Kafka brokers store most of their data in object storage, keeping only a small fraction on local disks. 
Assuming a dynamic ratio where one day’s worth of data resides on local disks and the remainder in object storage—about a 1-to-30 ratio:",[48,31126,31127],{},"In this setup, each broker holds 8.6 TB locally and 251.6 TB in object storage. When the cluster is scaled up by adding a broker, tiered storage significantly reduces the amount of data that needs to be moved—only the data on physical brokers (resulting in 6.5 TB total, or 2.2 TB per broker) along with minimal references to the data in object storage. On a 10 Gigabit network, the scaling operation might only take 1.4 hours, which is up to 30 times faster than Kafka without tiered storage. Yet, when compared to Apache Pulsar, which requires no data rebalancing, it is still 84 times slower.",[48,31129,31130,31131,31135,31136,31140],{},"Apache Pulsar also incorporates ",[55,31132,29728],{"href":31133,"rel":31134},"https:\u002F\u002Fpulsar.apache.org\u002Fdocs\u002Ftiered-storage-overview\u002F",[264],", being ",[55,31137,31139],{"href":31138},"\u002Fblog\u002Fapache-pulsar-kafka-protocol-tiered-storage-and-beyond-heres-what-happened-at-pulsar-meetup-beijing-2023","the first data streaming system"," in the market to do so. This feature allows for longer data retention in more cost-effective storage solutions, although it is not designed to address the inherent limitations of data streaming architectures that require data rebalancing during scaling events.",[40,31142,31144],{"id":31143},"the-hidden-costs-of-data-rebalancing","The Hidden Costs of Data Rebalancing",[48,31146,31147],{},"Removing data rebalancing from an architecture doesn't only speed up scaling; it also eradicates the associated hidden costs. These costs include:",[321,31149,31150,31153,31156],{},[324,31151,31152],{},"Cross-AZ networking costs during data rebalancing",[324,31154,31155],{},"Labor costs for managing data rebalancing operations",[324,31157,31158],{},"Infrastructure and licensing costs for tools and automatic data rebalancers from Kafka vendors.",[48,31160,31161],{},"For instance, in a typical Kafka setup managing 194.4 TB of data on local disks that needs rebalancing data across three zones, approximately two-thirds of the traffic is cross-AZ during scaling events. This generates costs of around $2,654 per scaling operation, potentially higher with larger clusters.",[48,31163,31164],{},"Additionally, a Site Reliability Engineer (SRE) or operator, costing on average $140,000 annually in the US (~$50 per hour), may spend 43 hours on a rebalance operation, adding roughly $2,150 to each event. Although the SRE may not dedicate 100% of their time exclusively to data rebalancing, this process often leads to service degradation and unavailability. The duration of these interruptions can extend to a few days, depending on the volume of data flowing into the clusters, and significantly affect availability. Such disruptions can have a substantial impact on business revenue and divert the SRE's focus from other responsibilities. Consequently, in most cases, SREs find themselves primarily engaged in managing and executing these data rebalancing events.",[48,31166,31167],{},"Disregarding the cost implications of service degradation and unavailability caused by data rebalancing, and focusing solely on the combined networking and labor costs, a single scaling event could cost nearly $5,000. 
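To show where the "nearly $5,000 per event" figure comes from, here is a back-of-the-envelope version of the numbers above. The two-thirds cross-AZ share and the ~$0.02/GB rate come from the text itself; my rough multiplication lands near, but not exactly on, the quoted $2,654, so treat it as an approximation.

```python
# Back-of-the-envelope check of the hidden-cost figures for one scale-out event.
moved_gb = 194_400                     # data rebalanced when the fourth broker is added
cross_az_share = 2 / 3                 # ~2/3 of rebalance traffic crosses availability zones
cross_az_cost = moved_gb * cross_az_share * 0.02   # ≈ $2,600 (the post quotes ~$2,654)

sre_hourly = 50                        # ~$140k/year works out to roughly $50/hour
rebalance_hours = 43
labor_cost = sre_hourly * rebalance_hours          # ≈ $2,150

per_event = cross_az_cost + labor_cost             # ≈ $4,700+, i.e. "nearly $5,000"
per_year = 5_000 * 12 * 2                          # monthly scale-up and scale-down ≈ $120,000
print(round(cross_az_cost), labor_cost, round(per_event), per_year)
```

With a rebalance-free scale-out, both line items effectively drop to zero, which is the saving the next paragraph annualizes.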
If scaling events occur monthly, both up and down, the total annual expense could amount to $120,000.",[48,31169,31170],{},"While optimizing the work of an SRE with open-source tools or purchasing vendor solutions for automatic rebalancing might reduce some expenses, it still represents a significant cost.",[48,31172,31173],{},"Conversely, by adopting a rebalance-free platform like Apache Pulsar, these costs are almost eliminated. This allows organizations to reallocate capital towards developing and enhancing business applications, speeding up time to market significantly.",[40,31175,31177],{"id":31176},"elasticity-is-beyond-faster-scaling","Elasticity is beyond faster scaling",[48,31179,31180,31181,31183,31184,31188],{},"Making Apache Pulsar 1000x more elastic than Apache Kafka doesn't only enhance the speed of scaling. With the fully managed Apache Pulsar services like ",[55,31182,4496],{"href":10259},", it also ",[55,31185,31187],{"href":25530,"rel":31186},[264],"automates"," the operations and facilitates the reduction of cluster size when demand decreases, increases resilience against failures, and reduces the total cost of ownership.",[32,31190,31192],{"id":31191},"adapting-to-changing-workloads","Adapting to Changing Workloads",[48,31194,31195,31196,31200],{},"After peak periods like the holiday rush, you wouldn’t want an excessively provisioned cluster continuing to incur costs. Unlike teams using Apache Kafka, who face limited and time-consuming options for resizing—requiring complex processes of sizing, provisioning new clusters, setting up networks, and balancing traffic across brokers and partitions—Pulsar's architecture supports quick and efficient ",[55,31197,31199],{"href":25530,"rel":31198},[264],"scaling both up and down",". Often, the effort involved in Kafka systems does not justify the savings from operating a smaller cluster, leading to costly excess capacity just to avoid downtime risks.",[32,31202,31204],{"id":31203},"resiliency-through-elastic-rebalance-free-architecture","Resiliency Through Elastic Rebalance-Free Architecture",[48,31206,31207],{},"In the cloud, rapid response to failures is crucial. Apache Pulsar’s elastic nature ensures resilience by promptly addressing failures. For instance, managing a broker node failure is as simple as transferring topic ownership without moving any data. If a storage node fails or a service volume slows down, Pulsar's storage layer automatically detects and starts to decommission the faulty node, re-replicating data to functioning nodes—all without client disruption. This capability is pivotal, as Pulsar’s architecture requires no data rebalancing when cluster topology changes.",[48,31209,31210],{},"Upgrades also benefit from this elasticity. With no need for data rebalancing, Pulsar’s clusters are more agile, allowing for rapid system-wide updates, such as during critical vulnerabilities like the log4j issue, with no downtime. These frequent, disruption-free updates prevent costly outages and security breaches.",[32,31212,31214],{"id":31213},"broker-autoscaling","Broker Autoscaling",[48,31216,31217,31218,31222],{},"Leveraging its rebalance-free architecture, Apache Pulsar efficiently adapts to fluctuating workloads by easily scaling its brokers. Although Site Reliability Engineering (SRE) teams often integrate systems like Kubernetes' Horizontal Pod Autoscaler to maximize this architecture, StreamNative Cloud simplifies the process with its ",[55,31219,31221],{"href":25530,"rel":31220},[264],"built-in auto-scaling"," features. 
These features dynamically adjust resources in response to changes in workload, capable of managing abrupt increases in traffic or downscaling during quieter periods to optimize resource utilization and performance. Additionally, auto-scaling enhances load balancing across brokers, distributing message processing loads evenly and preventing any single broker from becoming a bottleneck, thereby enhancing system reliability and maintaining high throughput and low latency even under heavy load conditions.",[40,31224,31226],{"id":31225},"reducing-total-cost-of-ownership","Reducing Total Cost of Ownership",[48,31228,31229],{},"Combining these features, Pulsar’s rebalance-free data streaming architecture not only speeds up elasticity but also substantially lowers the total cost of ownership. This enables users to:",[321,31231,31232,31235,31238],{},[324,31233,31234],{},"Avoid Over-Provisioning: Unlike Kafka, which requires clusters to be provisioned for peak usage well in advance, Pulsar can scale up just hours before needed, avoiding weeks of unutilized capacity. Similarly, it can quickly scale down, saving costs on unnecessary capacity.",[324,31236,31237],{},"Deliver Faster, More Cost-Effective Service: Leveraging a rebalance-free architecture and object storage for tiered storage, Pulsar provides services at lower latency and competitive prices.",[324,31239,31240],{},"Eliminate Unnecessary Operational Efforts: Scaling with StreamNative can be automated using Kubernetes Operators or simply done with a click in the Cloud Console, freeing up engineering resources from routine tasks to focus on innovation and unique business solutions.",[48,31242,31243],{},"These aspects highlight how Apache Pulsar not only enhances operational efficiency but also positions businesses for better agility and economic performance.",[48,31245,11159,31246,31248,31249,31253],{},[55,31247,3550],{"href":30989},", you can access a cloud-native data streaming platform that is fully ",[55,31250,31252],{"href":28906,"rel":31251},[264],"compatible with Kafka"," but offers elastic scaling, resiliency in the face of failure, and a lower total cost of ownership, all powered by a rebalance-free architecture.",[48,31255,31256,31257,190],{},"To experience how StreamNative Cloud scales faster than Apache Kafka by leveraging a rebalance-free architecture, ",[55,31258,31260],{"href":17075,"rel":31259},[264],"sign up for a free trial",[48,31262,3931],{},{"title":18,"searchDepth":19,"depth":19,"links":31264},[31265,31266,31267,31268,31269,31274],{"id":31062,"depth":19,"text":31063},{"id":31082,"depth":19,"text":31083},{"id":31117,"depth":19,"text":31118},{"id":31143,"depth":19,"text":31144},{"id":31176,"depth":19,"text":31177,"children":31270},[31271,31272,31273],{"id":31191,"depth":279,"text":31192},{"id":31203,"depth":279,"text":31204},{"id":31213,"depth":279,"text":31214},{"id":31225,"depth":19,"text":31226},"2024-04-29","To satisfy the need for modern streaming data pipelines, we developed Apache Pulsar with a cloud-native architecture designed to significantly enhance the platform's horizontal scalability. We didn't just improve it; we revolutionized it, boosting Apache Pulsar's elasticity by 100x to even 1000x, compared to other data streaming technologies such as Apache Kafka. 
This blog post outlines what exactly we did to achieve such a leap in performance.","\u002Fimgs\u002Fblogs\u002F662fd25d01990449cd5ffd58_broker-scaling.png",{},{"title":31046,"description":31276},"blog\u002Fno-data-rebalance-needed-kafka-and-pulsar",[799,821],"oCB34eQKvhu62kBK6bpH3NEQHo8VHQ2g4BrlkLiFvPU",{"id":31284,"title":30524,"authors":31285,"body":31286,"category":3550,"createdAt":290,"date":31494,"description":31495,"extension":8,"featured":294,"image":31496,"isDraft":294,"link":290,"meta":31497,"navigation":7,"order":296,"path":18969,"readingTime":11508,"relatedResources":290,"seo":31498,"stem":31499,"tags":31500,"__hash__":31501},"blogs\u002Fblog\u002Fcap-theorem-for-data-streaming.md",[806],{"type":15,"value":31287,"toc":31487},[31288,31302,31305,31308,31311,31315,31318,31321,31332,31335,31346,31349,31353,31356,31367,31370,31373,31377,31380,31382,31387,31390,31401,31404,31408,31411,31416,31420,31444,31451,31453,31458,31462,31465,31469,31472,31475,31479,31482,31485],[48,31289,31290,31291,31295,31296,31301],{},"Recently, I had the opportunity to immerse myself in the Kafka Summit London, a major conference on data streaming organized by Confluent. My interactions with attendees, partners, and vendors sparked deep dives into the latest trends and technologies shaping data streaming. As insights from the conference flow through various channels, including insights from colleagues like ",[55,31292,31294],{"href":31293},"\u002Fblog\u002Fdata-streaming-trends-from-kafka-summit-london-2024","Julien Jakubowski"," and industry experts like ",[55,31297,31300],{"href":31298,"rel":31299},"https:\u002F\u002Frisingwave.com\u002Fblog\u002Fchanges-you-should-know-in-the-data-streaming-space-takeaways-from-kafka-summit-2024\u002F",[264],"Yingjun Wu"," of RisingWave, a recurring theme emerges: the critical consideration of cost in data streaming technologies. Amidst the buzz around Kafka, Flink, Iceberg, and more, the discourse is heavily skewed toward cost, with each vendor vying to be seen as the more cost-efficient solution.",[48,31303,31304],{},"In the current economic landscape, the focus on cost is understandable. However, this emphasis often comes without the necessary context of the trade-offs involved, leading us into a \"vendor's trap\" where cost discussions lack depth and utility. Noticing this gap, I was compelled to share my perspective on the intricate balance between cost, availability, and performance in the realm of data streaming platforms.",[48,31306,31307],{},"Almost twenty-five years ago, in 2000, Eric Brewer introduced the idea that there is a fundamental trade-off between consistency, availability, and partition tolerance. This trade-off, which has become known as the CAP Theorem, has been widely used to evaluate distributed system architectures ever since.",[48,31309,31310],{},"In light of the emerging proliferation of streaming platforms, a similar model for evaluating the trade-offs between cost, availability, and performance within data streaming platforms is required. 
Recognizing this, we propose what we term the New CAP Theorem for cloud-based data streaming systems.",[8300,31312,31314],{"id":31313},"introducing-the-new-cap-theorem","Introducing the New CAP theorem",[48,31316,31317],{},"The letters in CAP refer to three desirable properties of cloud-based data streaming platforms: Cost efficiency, Availability of the system for reads and writes, and Performance tolerance.",[48,31319,31320],{},"The new CAP theorem states that it is not possible to guarantee all three of the desirable properties in a cloud-based data streaming platform at the same time.",[321,31322,31323,31326,31329],{},[324,31324,31325],{},"Cost refers to the total infrastructure cost of the platform to provide data streaming functionality. This can be quantified in terms of cost per unit of work, e.g., MB of throughput, etc.",[324,31327,31328],{},"Availability simply means that the data streaming system can continue to serve requests even if a single availability zone experiences an outage.",[324,31330,31331],{},"Performance Tolerance refers to the minimal acceptable latency of the platform when servicing requests. This can be quantified in terms of p99 latencies for message publication and consumption.",[48,31333,31334],{},"This theorem is based on three foundational aspects of cloud-based data streaming systems:",[1666,31336,31337,31340,31343],{},[324,31338,31339],{},"To ensure high availability, data streaming systems employ a data replication algorithm. This algorithm replicates data across multiple availability zones to withstand single-zone failures, making inter-zone traffic indispensable for maintaining high availability.",[324,31341,31342],{},"Inter-zone traffic significantly contributes to the overall cost structure of a data streaming system. Although eliminating inter-zone traffic can reduce costs, it may adversely affect availability or performance.",[324,31344,31345],{},"The latency of low-cost cloud object storage solutions is not on par with persistent block storage or local SSDs. To meet specific performance requirements, the use of persistent block storage or local SSDs is essential.",[48,31347,31348],{},"In order to help you better understand the New CAP theorem, we will dive deeper into the cost structure of a data streaming system.",[40,31350,31352],{"id":31351},"infrastructure-costs-in-data-streaming-platforms","Infrastructure costs in data streaming platforms",[48,31354,31355],{},"The infrastructure costs of a data streaming infrastructure are typically comprised of 3 cost categories:",[1666,31357,31358,31361,31364],{},[324,31359,31360],{},"Compute: This refers to the servers and computing resources required to run the data streaming platform.",[324,31362,31363],{},"Storage: Essential for data retention, storage can range from local disks and persistent volumes to cost-effective cloud-based object stores.",[324,31365,31366],{},"Network: Critical for data movement, networking costs are incurred for transferring data to and from the platform and replicating data across various availability zones to ensure the availability of the platform.",[48,31368,31369],{},"Technologies designed for data streaming leverage cloud resources in different ways, leading to varied cost structures. Comparing these costs directly is challenging because technologies are not uniform in design or efficiency. Benchmark tests, commonly used by vendors to demonstrate the cost-effectiveness of their technology, often do not account for the full spectrum of business needs and scenarios. 
These benchmarks, typically conducted under conditions favorable to the vendor's product, can give rise to \"benchmarketing,\" which may not always present a realistic picture of costs.",[48,31371,31372],{},"A notable oversight in cost estimation is the heavy reliance on benchmarks to gauge the computational throughput of a system, leading to calculations based on how many units of compute are needed. This approach is fundamentally flawed as it fails to account for critical factors like storage and networking costs. The latter, in particular, can reveal hidden costs that become painfully apparent when organizations are confronted with unexpectedly high bills for network usage from their cloud service providers.",[40,31374,31376],{"id":31375},"the-overlooked-cost-of-networking-in-data-streaming","The Overlooked Cost of Networking in Data Streaming",[48,31378,31379],{},"Often underestimated in infrastructure software considerations, networking emerges as a significant (up to 90%!) expense in the realm of data streaming technologies. The voluminous data flow within and through these platforms incurs substantial costs. While these costs are often overlooked during the planning stages, they do represent a significant cost when operating at scale.",[48,31381,3931],{},[48,31383,31384],{},[384,31385],{"alt":18,"src":31386},"\u002Fimgs\u002Fblogs\u002F661ca480e22075bb757cf173_yCwn24p0ZyoFVCgmGW73bj7CJZCk2eahDKEGJMs4gBoDZzhq6faudv3v9wrfw6DleImz8Er_Lx4RQQK5An5AuV8gObM9A_QqpPVEiZIGdtY-tK1-KOeESS7_Ntua3ySDS4jGhJvwfCM9C4Ai030TuwM.png",[48,31388,31389],{},"As illustrated in the diagram above, networking costs are primarily attributed to:",[1666,31391,31392,31395,31398],{},[324,31393,31394],{},"Data Transmission: The ingress and egress of data to and from the data streaming system can lead to substantial fees.",[324,31396,31397],{},"Internal Replication Traffic: Data streaming platforms replicate data across multiple instances to ensure high availability. This replication, especially across different availability zones, can significantly increase costs based on the volume of data involved.",[324,31399,31400],{},"Inter-Cluster Replication: For purposes like geographic redundancy or disaster recovery, data streaming solutions often replicate data across regions, incurring significant expenses based on the distance and volume of data transferred.",[48,31402,31403],{},"Cloud service providers levy considerable fees for data egress and traffic between availability zones and regions. 
Without a deep understanding of these infrastructural nuances, organizations may face unexpectedly high charges from their cloud services, underscoring the criticality of incorporating networking cost considerations in the early planning and design phases of data streaming platforms.",[8300,31405,31407],{"id":31406},"categorization","Categorization",[48,31409,31410],{},"After understanding the cost structure of a data streaming system plus the foundational aspects of inter-zone traffic, we can categorize data streaming systems into three broad types:",[48,31412,31413],{},[384,31414],{"alt":18,"src":31415},"\u002Fimgs\u002Fblogs\u002F661d47732cca5bae7849b831_rYMQ_HN5iWszjS_qIcWWKB08Rgp8f3_jgwzDlUmlF9OnigJtYDIpMpdhqyn-0HhyiiTrsLVolQPH476uVXu6Nj296DR_gwVUhrM4EAsqZjJLw2QzNp3tpndp5pqUFbQb4cCkzM3zm66eK1Wr4K8K6uc.png",[40,31417,31419],{"id":31418},"ap-availability-and-performance-data-streaming-systems","AP (Availability and Performance) Data Streaming Systems",[48,31421,31422,31423,1186,31426,5422,31430,31435,31436,30956,31440,31443],{},"Popular systems like ",[55,31424,821],{"href":23526,"rel":31425},[264],[55,31427,799],{"href":31428,"rel":31429},"https:\u002F\u002Fkafka.apache.org\u002F",[264],[55,31431,31434],{"href":31432,"rel":31433},"https:\u002F\u002Fredpanda.com\u002F",[264],"Redpanda"," fall into this category, adopting a multi-AZ deployment strategy to ensure high availability despite single-zone failures. These systems utilize sophisticated replication mechanisms across availability zones, inherently increasing inter-zone traffic and, by extension, infrastructure costs. However, they deliver superior availability and reduced latency, making them ideal for mission-critical applications. Noteworthy efforts to mitigate cross-zone traffic include technologies like ",[55,31437,31439],{"href":30953,"rel":31438},[264],"Follower Fetching",[55,31441,30961],{"href":30959,"rel":31442},[264]," in Pulsar.",[48,31445,31446,31447,31450],{},"Despite the inevitability of cross-AZ traffic, Apache Pulsar stands out due to its unique two-layer architecture, which separates storage nodes from broker\u002Fserving nodes. This design allows for the strategic deployment of broker\u002Fserving nodes within a single AZ, while storage nodes can be distributed across multiple AZs. Since all broker nodes are stateless, they can swiftly failover to a different AZ in case of an outage in the current zone. ",[55,31448,31449],{"href":27695},"This compute-and-storage-separation architecture"," significantly reduces cross-zone traffic and ensures higher availability, demonstrating Pulsar's innovative approach to balancing the challenges of distributed data streaming infrastructure. In future blog posts, we will explain in more detail how compute-and-storage-separation architecture can help reduce cost while achieving high availability and low latency.",[48,31452,3931],{},[48,31454,31455],{},[384,31456],{"alt":18,"src":31457},"\u002Fimgs\u002Fblogs\u002F661ca480376d623b1b9d3739_XMSS6eveC3WFNwCxYqi64di58J-wqcjAFmPD5WTha3yic5IOiQqpcgwmnD8z7LJxw0AypAZSedb_wRsUEQLCg6kdw-Zth14O-CIGz2HBn4P02I-2LQzA6sd5zxvMCbwJVBzrd7KFQG_UJhmuvvftzPE.png",[40,31459,31461],{"id":31460},"cp-cost-and-performance-data-streaming-systems","CP (Cost and Performance) Data Streaming Systems",[48,31463,31464],{},"In a CP system, the focus shifts towards balancing high availability with cost efficiency, a goal often achieved by restricting operations to a single availability zone. 
This approach eliminates inter-zone traffic, reducing costs and latency at the risk of decreased availability in the event of a zone outage. While no technology is inherently designed as CP, adaptations of AP systems for single-zone deployment are common, with vendors including Confluent and StreamNative offering specialized zonal cluster configurations.",[40,31466,31468],{"id":31467},"ca-cost-and-availability-data-streaming-systems","CA (Cost and Availability) Data Streaming Systems",[48,31470,31471],{},"Addressing use cases like non-critical, high-volume data streaming (e.g., log data ingestion), CA systems prioritize low total cost and availability over performance. New innovations, which use cloud object storage for replication, exemplify efforts to minimize costs by avoiding inter-zone traffic. These systems, however, necessitate a tolerance for higher latency (> 1s).",[48,31473,31474],{},"Through the New CAP Theorem lens, we gain a structured approach to navigate the trade-offs between cost, availability, and performance tolerance, enabling informed decision-making in selecting data streaming technologies.",[8300,31476,31478],{"id":31477},"from-cost-saving-to-balanced-cost-optimization","From Cost Saving to Balanced Cost Optimization",[48,31480,31481],{},"The New CAP Theorem doesn't serve to rank systems but to highlight the importance of making informed choices based on a balance of cost, availability, and performance. At StreamNative, we champion the principle of cost-awareness over mere cost-efficiency. Recognizing that no single technology suits every scenario, evaluating each platform's trade-offs in the context of specific use cases and business requirements is vital.",[48,31483,31484],{},"In conclusion, the data streaming technology selection journey is complex and nuanced. By adopting a cost-aware approach and understanding the inherent trade-offs, engineers and decision-makers can navigate this landscape more effectively, selecting the right technologies to meet their unique challenges and opportunities.",[48,31486,3931],{},{"title":18,"searchDepth":19,"depth":19,"links":31488},[31489,31490,31491,31492,31493],{"id":31351,"depth":19,"text":31352},{"id":31375,"depth":19,"text":31376},{"id":31418,"depth":19,"text":31419},{"id":31460,"depth":19,"text":31461},{"id":31467,"depth":19,"text":31468},"2024-04-11","The new CAP theorem states that it is not possible to guarantee all three of the desirable properties in a cloud-based data streaming platform at the same time. At StreamNative, we champion the principle of cost-awareness over mere cost-efficiency. 
Recognizing that no single technology suits every scenario, evaluating each platform's trade-offs in the context of specific use cases and business requirements is vital.","\u002Fimgs\u002Fblogs\u002F661d6f64bdd8337b6ef6481b_BlogThe-New-CAP-Theorem-for-Data-Streaming-1.png",{},{"title":30524,"description":31495},"blog\u002Fcap-theorem-for-data-streaming",[799,821,1331],"s5dFDAeGaDBD3Bpftxm3WHJ9LP6U9RLxLL_m7qAGnGQ",{"id":31503,"title":31504,"authors":31505,"body":31506,"category":7338,"createdAt":290,"date":31706,"description":31707,"extension":8,"featured":294,"image":31708,"isDraft":294,"link":290,"meta":31709,"navigation":7,"order":296,"path":31293,"readingTime":11508,"relatedResources":290,"seo":31710,"stem":31711,"tags":31712,"__hash__":31713},"blogs\u002Fblog\u002Fdata-streaming-trends-from-kafka-summit-london-2024.md","Data Streaming Trends from Kafka Summit London 2024",[31294],{"type":15,"value":31507,"toc":31695},[31508,31511,31514,31517,31521,31524,31527,31535,31538,31545,31547,31551,31554,31557,31560,31564,31567,31570,31574,31577,31580,31584,31587,31590,31593,31597,31600,31603,31611,31614,31618,31621,31624,31630,31633,31636,31639,31642,31655,31657,31660,31664],[48,31509,31510],{},"I recently attended Kafka Summit London, a major data streaming conference by Confluent. This exciting event brought together a large community of messaging and data-streaming enthusiasts. Our team had valuable discussions with attendees, vendors, and colleagues about the growing importance of these technologies in today's industries.",[48,31512,31513],{},"I am excited to share the key trends we observed at the conference.",[48,31515,31516],{},"These insights confirmed that the design choices on Apache Pulsar and ONE StreamNative Platform have been innovative. They not only address common challenges faced by the data streaming community but have also been ahead of the curve for several years now.",[40,31518,31520],{"id":31519},"kafka-as-a-protocol","Kafka as a Protocol",[48,31522,31523],{},"The data streaming platform landscape is experiencing an increasing diversity of vendors, with established players facing increasing competition. Aiven and Redpanda are prominent examples, having built a presence over several years. Newer entrants like WarpStream are bringing innovative approaches, which we will discuss further in this blog post.",[48,31525,31526],{},"This diversity offers significant benefits to the data streaming community. We now have a more comprehensive range of choices with distinct value propositions. These solutions address common Kafka challenges with unique approaches.",[48,31528,31529,31530,31534],{},"Notably, some vendors provide implementations that remain compatible with Kafka clients to varying degrees while being independent of the original Apache Kafka codebase. Quoting Chris Riccomini's recent ",[55,31531,30380],{"href":31532,"rel":31533},"https:\u002F\u002Fmaterializedview.io\u002Fp\u002Fce-nest-pas-un-kafka",[264],": isn’t it time to accept the Kafka protocol is what really matters?",[48,31536,31537],{},"This trend of considering Kafka's protocol independently of its implementation might actually be a more general trend in the world of data infrastructure. Indeed, this is what happened with S3, and it's also happening with Postgres.",[48,31539,31540,31541,31544],{},"With our ",[55,31542,31543],{"href":10259},"ONE StreamNative Platform",", we are proud to be part of this evolving ecosystem. 
We are fully Kafka-compatible while providing Pulsar's unique advantages, such as multi-tenancy, unparalleled elasticity, and tiered storage. StreamNative allows the Kafka community to leverage these Pulsar features, which we believe is a significant benefit.",[48,31546,3931],{},[40,31548,31550],{"id":31549},"the-emergence-of-serverless-object-storage-for-reducing-costs","The emergence of ‘serverless’ object storage for reducing costs",[48,31552,31553],{},"A pervasive topic at the Kafka Summit London is moving streaming data out of the cluster to so-called 'serverless' storage systems (a marketing buzzword that actually means here: servers managed by someone else). This approach, primarily facilitated by cloud storage solutions like Amazon S3, offers a compelling blend of cost-efficiency and scalability that's hard to overlook.",[48,31555,31556],{},"It's fascinating to observe that what is considered a novel trend isn't really new. In fact, it's been over a decade since Apache Pulsar introduced an architecture that separates storage from computing. Moreover, Pulsar has natively incorporated Tiered Storage for over five years.",[48,31558,31559],{},"It’s encouraging to see that the market finally recognizes the Apache Pulsar approach as superior to traditional Kafka's architecture.",[32,31561,31563],{"id":31562},"cost-and-scalability","Cost and Scalability",[48,31565,31566],{},"The driving force behind this trend is cloud storage services' cost-effectiveness and scalability. With their virtually infinite storage capabilities provided at a minimal cost, cloud storage platforms like S3 are becoming the go-to solution for businesses looking to manage their data more efficiently.",[48,31568,31569],{},"A prime example of this trend in action is WarpStream's use of a stateless message broker, coupled with the placement of all data within S3. This architectural decision underscores the benefits of scalability and reduced storage costs. However, it's important to note that such a model may not be universally applicable. It's particularly suited to use cases where low latency isn't a critical requirement and dependency on a specific cloud storage provider is an acceptable trade-off.",[32,31571,31573],{"id":31572},"tiered-storage","Tiered Storage",[48,31575,31576],{},"The concept of Tiered Storage is also gaining traction, especially as a means to manage 'cold' data efficiently. Businesses can significantly reduce operational costs by relocating less frequently accessed data to cost-effective storage solutions like S3.",[48,31578,31579],{},"This approach has been implemented in platforms like the ONE StreamNative Platform and Apache Pulsar for years, although it's still in its nascent stages within the Kafka ecosystem. Tiered Storage for Kafka is implemented in a proprietary, commercial solution, and there is still no other production-ready implementation yet. It's not easy to navigate among those multiple implementations when you're a Kafka user.",[32,31581,31583],{"id":31582},"using-s3-as-a-workaround","Using S3 as a workaround",[48,31585,31586],{},"Scaling Apache Kafka clusters becomes increasingly challenging as data volumes grow. The traditional partition-based storage model hinders elasticity due to partition reassignment operations. Indeed, with more data to manage, partition reassignment operations become increasingly slow, resource-intensive, and detrimental to performance and reliability. 
Our booth discussions with attendees and several conference talks highlighted these concerns.",[48,31588,31589],{},"One strategy gaining traction involves offloading data to Amazon S3. This approach involves reducing as much as possible the amount of data stored locally in the cluster nodes or even not storing any data in the cluster at all. The goal is to circumvent the inherent limitations of Kafka's storage model based on partitions. Indeed, the less data there is in partitions, the less painful the partition reassignment operations are.",[48,31591,31592],{},"However, this approach introduces trade-offs, including latency increases and a strong dependency on S3. Migrating data out of the cluster should be a strategic decision, not solely a workaround for the limitations of the core Kafka storage model based on partitions.",[32,31594,31596],{"id":31595},"no-dilemma-with-pulsar","No dilemma with Pulsar",[48,31598,31599],{},"In contrast, Apache Pulsar has offered a compelling alternative for over ten years.",[48,31601,31602],{},"Indeed, Pulsar was designed from the start with a separation of compute and storage. This allows for exceptional elasticity, which is incomparable to what you can achieve with traditional Kafka. The decisive advantage of Pulsar is that it doesn't force you to choose between latency and elasticity. Indeed, with Pulsar, there's no dilemma:",[321,31604,31605,31608],{},[324,31606,31607],{},"You can benefit from Tiered Storage to reduce the storage costs of cold data. Pulsar’s Tiered Storage has been battle-tested for years and is available as open-source.",[324,31609,31610],{},"Thanks to an alternative storage model, you can also benefit from elasticity without the need for cloud storage and without sacrificing latency.",[48,31612,31613],{},"For more information, feel free to read the resources shared at the end of this blog post.",[40,31615,31617],{"id":31616},"stream-batch-processing-convergence","Stream & batch processing convergence",[48,31619,31620],{},"Historically, a separation existed between analytics and streaming data. These domains functioned within distinct infrastructures and ecosystems. Analytics relies on querying tables in batches while streaming data flows continuously.",[48,31622,31623],{},"However, there are signs of convergence. The boundaries are blurring, as evidenced by Confluent's recent announcement regarding Tableflow as the ability to expose a topic’s data as Iceberg tables.",[48,31625,31626,31627,190],{},"This recent announcement is particularly interesting. StreamNative has addressed this need for a long time, allowing users to seamlessly integrate streaming data with their data lake platform and leverage data warehouses and data lakehouses' native query capabilities. While the announcement itself was positive, it also validated the approach we implemented more than a year ago with the introduction of ",[55,31628,31629],{"href":29601},"Pulsar’s Lakehouse Tiered Storage",[48,31631,31632],{},"This industry trend of converging towards solutions bridging the gap between streaming and analytics aligns perfectly with StreamNative's position as a thought leader. It reinforces our belief that we're delivering the functionality users demand: the ability to capture streaming data, analyze it, and make it readily available for data-driven decision-making. 
Queryable open formats like Iceberg play a crucial role in this.",[48,31634,31635],{},"Given Confluent’s recent acquisition of Immerock, there was an increased focus on Flink at this year's conference. Notably, Flink facilitates processing in both streaming and batching modes using a unified programming model, further contributing to the dissolving boundaries.",[48,31637,31638],{},"The prominence of Flink at the Kafka Summit, with approximately thirty dedicated presentations, underscores its growing importance. Additionally, Confluent announced the general availability of their managed Flink offering during their keynote.",[48,31640,31641],{},"However, several alternatives, such as RisingWave and Timeplus, exhibit significant potential to capture substantial market share.",[48,31643,31644,31645,31650,31651,31654],{},"Another popular option is Databricks' Lakehouse platform. This mature platform provides seamless data streaming and analytics integration by combining ",[55,31646,31649],{"href":31647,"rel":31648},"https:\u002F\u002Fdocs.databricks.com\u002Fen\u002Fstructured-streaming\u002Findex.html",[264],"Apache Spark Structured Streaming"," for stream processing and ",[55,31652,1157],{"href":29839,"rel":31653},[264]," for storage. This platform ensures that streaming data is immediately ready for analytics.",[40,31656,2125],{"id":2122},[48,31658,31659],{},"The Kafka Summit London provided valuable insights, particularly how Apache Pulsar, the technology powering ONE StreamNative, addresses challenges highlighted in presentations, attendee discussions, and emerging trends. This underscores the advanced capabilities and continued relevance of ONE StreamNative in the streaming data landscape.",[3933,31661,31663],{"id":31662},"want-to-learn-more-about-apache-pulsar-and-streamnative","Want to Learn More About Apache Pulsar and StreamNative?",[1666,31665,31666,31673,31680,31687],{},[324,31667,31668,31669,31672],{},"Use our ONE StreamNative Platform to spin up a Kafka-compatible Pulsar cluster in minutes.",[55,31670,31671],{"href":27773}," Get started today"," with 200$ credit.",[324,31674,31675,31676],{},"Deep dive into the partition vs segment models: ",[55,31677,31679],{"href":31678},"\u002Fblog\u002Fdata-streaming-patterns-series-what-you-didnt-know-about-partitioning-in-stream-processing","Data Streaming Patterns: What You Didn't Know About Partitioning",[324,31681,31682,31683],{},"Explore the hurdles encountered in managing data retention in traditional Kafka, and the comparative benefits Pulsar & our Kafka-compatible platform provide: ",[55,31684,31686],{"href":31685},"\u002Fblog\u002Fchallenges-in-kafka-the-data-retention-stories-of-kevin-and-patricia","Challenges in Kafka: the Data Retention Stories of Kevin and Patricia",[324,31688,31689,31690,190],{},"Engage with the Pulsar community by joining the",[55,31691,31694],{"href":31692,"rel":31693},"https:\u002F\u002Fcommunityinviter.com\u002Fapps\u002Fapache-pulsar\u002Fapache-pulsar",[264]," Pulsar Slack channel",{"title":18,"searchDepth":19,"depth":19,"links":31696},[31697,31698,31704,31705],{"id":31519,"depth":19,"text":31520},{"id":31549,"depth":19,"text":31550,"children":31699},[31700,31701,31702,31703],{"id":31562,"depth":279,"text":31563},{"id":31572,"depth":279,"text":31573},{"id":31582,"depth":279,"text":31583},{"id":31595,"depth":279,"text":31596},{"id":31616,"depth":19,"text":31617},{"id":2122,"depth":19,"text":2125},"2024-03-31","Explore insights from Kafka Summit London on Apache Pulsar's innovations and trends in data 
streaming, including serverless storage and the convergence of stream and batch processing.","\u002Fimgs\u002Fblogs\u002F660e6808fe5de17f7f180507_ksl50.jpg",{},{"title":31504,"description":31707},"blog\u002Fdata-streaming-trends-from-kafka-summit-london-2024",[799,1331],"4_JVh3zJskCQZ9N9vYres87-dEW7YI4thQfws1vcnvc",{"id":31715,"title":31716,"authors":31717,"body":31719,"category":290,"createdAt":290,"date":31919,"description":31920,"extension":8,"featured":294,"image":31921,"isDraft":294,"link":290,"meta":31922,"navigation":7,"order":296,"path":31923,"readingTime":17161,"relatedResources":290,"seo":31924,"stem":31925,"tags":290,"__hash__":31926},"blogs\u002Fblog\u002Fintroduction-to-stream-processing.md","Introduction to Stream Processing",[31718],"Caito Scherr",{"type":15,"value":31720,"toc":31917},[31721,31724,31727,31730,31732,31735,31738,31758,31761,31764,31768,31771,31774,31777,31779,31784,31786,31789,31792,31804,31806,31811,31814,31817,31826,31829,31832,31835,31838,31841,31844,31847,31850,31853,31856,31859,31862,31865,31868,31871,31873,31876,31883,31885,31915],[48,31722,31723],{},"One of the ways in which the tech industry has become so competitive is the expectation that software should be able to intake mass amounts of data, process it rapidly, and with incredible precision. In the past, this standard was typically saved for areas like medical technology or FinTech, but it is now becoming the norm across the industry. Stream processing is the most reliable method for successfully meeting this expectation, although each streaming technology has its own strengths and weaknesses.",[48,31725,31726],{},"The benefit of stream processing is how well it can handle incredibly complex situations. Unfortunately, that also means it can be a challenge to learn. Even for those already working with streaming, that world is constantly changing, and it can be hard to keep up without up-to-date refreshers.",[48,31728,31729],{},"This post is part of a series that offers a full introduction to modern stream processing, including the newest innovations and debunking misconceptions. In the following posts, we’ll cover the challenges of stream processing and how to avoid them, as well as the streaming ecosystem like real time analytics and storage.",[3933,31731],{"id":18},[48,31733,31734],{},"Use Case: Fraud Detection",[48,31736,31737],{},"In this post, we’ll be using fraud detection as a use case. On the simplest level, fraud detection covers anything where it is important to verify that a user is who they claim to be, or that the user behavior is consistent with the user’s intentions. In terms of what makes for successful fraud detection, the focus is usually on security - how hard is it to access the data, are there opportunities for data to be exposed accidentally, etc.",[48,31739,31740,31741,15755,31746,31751,31752,31757],{},"Cyber attacks are becoming increasingly sophisticated, so we can’t rely only on security to prevent 100% of attacks. 
In the talk on ",[55,31742,31745],{"href":31743,"rel":31744},"https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=wdyKpDQxUHg",[264],"Cyber Security Breaches",[55,31747,31750],{"href":31748,"rel":31749},"https:\u002F\u002Fwww.meetup.com\u002Ffutureofdata-princeton\u002F",[264],"Future of Data’s 2020 Cyber Summer Data Slam",", speaker Carolyn Duby (Field CTO and Data at ",[55,31753,31756],{"href":31754,"rel":31755},"https:\u002F\u002Fwww.cloudera.com\u002F",[264],"Cloudera",") mentions that state sponsored hackers could compromise an entire network in less than 18 minutes. She also emphasized this need for access to real time AND correlative data in order to have any chance at responding to an anomaly accurately in a time that’s meaningful to the customer.",[48,31759,31760],{},"This means that you need a system that can ingest large amounts of live data. However, you also need a way to analyze and aggregate that data, and correlate it with other data. This data or metadata is likely being ingested from external systems, since current user data and historic user behavior patterns (and things like subscription or account information) would not be stored or coming in from the same source.",[48,31762,31763],{},"This is where stream processing really shines. Not only that, this is where stream processing becomes necessary. With the most powerful stream processing engines, you can achieve real-time data transfer and processing for large amounts of data, as well as support precision accuracy of the data. This also makes fraud detection a prime example to showcase what each of the core elements of streaming architecture is built for.\n‍",[3933,31765,31767],{"id":31766},"stream-processing-basics-architecture","Stream Processing Basics & Architecture",[48,31769,31770],{},"As the name implies, stream processing involves a continuous stream of data and is often referred to as “data in motion.” Another way of describing this has been as an “unbounded” stream of data. With an unbounded, continuous stream of data, there are a lot of benefits- most data is already naturally in this form, whether it’s input from a video game, weather sensor readings, or customer usage patterns. And, without breaks and waiting for a new batch of incoming records, it naturally has more potential for faster data transfer and processing.",[48,31772,31773],{},"Data stream processing is used as a distinction from batch processing, where data is published to a system or application in batches, or bounded units of data.",[48,31775,31776],{},"The most prevalent streaming use cases, and the ones we’ll be referring to here, are event streaming cases, where “event-driven” architecture is used. On the simplest level, event-driven architecture is a system where actions are triggered by an event. They are highly decoupled, asynchronous, and most importantly, allow for the possibility of real-time data processing. 
By contrast, a traditional transactional application depends on request to elicit a response, and on receiving acknowledgment to proceed.",[48,31778,3931],{},[48,31780,31781],{},[384,31782],{"alt":18,"src":31783},"\u002Fimgs\u002Fblogs\u002F65fd3599164edd59bf2aec3c_LUecACV6wDNqU13VZKgF2_VTVbPjjcynmi-KKgrPuGifCOgFFApBmStBErgJI9V4fdQ3Gtjv9M0IP1PbcLCX13RsEh3MpGcvdWziFCiJ7SjYQ1Q_jaCW1x7FmJju8UbNb0SVt4gWubVyPnWgSXEmOIE.png",[48,31785,3931],{},[48,31787,31788],{},"Stream processing architecture can come in many different “shapes.” However, the most common is a data pipeline, which is also a great way to showcase how streaming works.",[48,31790,31791],{},"The pipeline can consume data from a variety of “sources”, such as from another app, or, with some of the more advanced data streaming frameworks and platforms, directly from where the data is collected, which could be hardware like a weather sensor system, or it could be the original application where a fraudulent credit card purchase occurs.",[48,31793,31794,31795,4003,31798,31803],{},"In reality, the bulk of this pipeline is actually one or more stream processing applications, powered by a streaming framework or platform. Many of these technologies are open source, including some of the most powerful ones like ",[55,31796,821],{"href":23526,"rel":31797},[264],[55,31799,31802],{"href":31800,"rel":31801},"https:\u002F\u002Fflink.apache.org\u002F",[264],"Apache Flink",". Flink is a stateful processing engine (meaning that state can be persisted within a Flink application). Pulsar is a distributed messaging and streaming platform which is now also cloud-native!",[48,31805,3931],{},[48,31807,31808],{},[384,31809],{"alt":18,"src":31810},"\u002Fimgs\u002Fblogs\u002F65fd3599dd364072fd3fdf04_QgSGCzuKYJTCGyqAR0KvvI5X1ZXQDpwPc1Adc7uba1wWxDAi0L6iGFN95OaY_qNPHnPVRpGp3DE5RNzAbbkI7UBtIM8GxYcAnhcm1h_4nMDrnXFRGf-TBJ8iKZ5-egBTlqj1CeD_5Jgr63TonKHh61Q.png",[48,31812,31813],{},"In a complex use case like fraud detection, there would likely be several stream processing applications making up the whole system. Let’s say we have an example where a large tech company sells multiple different types of software products, and most of their users use at least two different products, ranging from hobbyists to enterprise customers. This company wants to use a stream processing system to ensure that there is no credit card fraud. This initial pipeline would likely need to aggregate the different accounts and subscriptions into values that can be compared to each other. Meanwhile, this system would likely be storing metadata and possibly other information like the user account status, in separate topics, logs or queues.",[48,31815,31816],{},"Queues and logs are such a core part of stream processing, that streaming platforms are often classified by whether they are queue or log-based. Both are used to route data from one system to another, but also to group related data together. Queues are first-in, first out, which can be beneficial to ensure that the order of messages is retained. With queues, the data is stored only until it is read by a consumer. Logs, on the other hand, can persist data and can also be shared among multiple different applications- they are a one-to-many routing system. Topics are more sophisticated and high-level than logs, but logs are still the underlying data structure. 
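To make the topic/log relationship concrete, here is a minimal, hedged sketch of subscribing to a topic with the Apache Pulsar Java client; the service URL, topic name, and subscription name are illustrative placeholders rather than part of the original example:

```java
import org.apache.pulsar.client.api.Consumer;
import org.apache.pulsar.client.api.Message;
import org.apache.pulsar.client.api.PulsarClient;

public class TopicConsumerSketch {
    public static void main(String[] args) throws Exception {
        // Placeholder service URL for a local Pulsar cluster.
        PulsarClient client = PulsarClient.builder()
            .serviceUrl("pulsar://localhost:6650")
            .build();

        // Each subscription tracks its own position in the underlying log,
        // so many consumers can read the same topic independently.
        Consumer<byte[]> consumer = client.newConsumer()
            .topic("transactions")
            .subscriptionName("fraud-detection-sub")
            .subscribe();

        Message<byte[]> msg = consumer.receive();
        // Acknowledge so the subscription's cursor moves past this message.
        consumer.acknowledge(msg);

        consumer.close();
        client.close();
    }
}
```
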
Logs (and, therefore, topics) also support publish-subscribe (or pub-sub) applications.",[48,31818,31819,31820,31825],{},"With pub-sub, consumers can “subscribe” to a topic, and messages that are published to that topic can be broadcast to all subscribed parties. At the same time, topics can also provide message filtering, so that subscribers only receive messages that match the set filters. Being able to broadcast and filter messages in this way allows this to be a highly efficient option, which is why topics are used for event-driven applications and real time stream processing. That being said, a true pub-sub system is not guaranteed just by using topics over queues. Making sure your application has an ",[55,31821,31824],{"href":31822,"rel":31823},"https:\u002F\u002Fpulsar.apache.org\u002Fdocs\u002Fnext\u002Fconcepts-messaging\u002F",[264],"advanced pub-sub structure like Apache Pulsar’","s is particularly useful for scalability.",[48,31827,31828],{},"In our fraud detection scenario, the topics become the “sinks” for this initial streaming system. A stream processing application will allow you to write to or, in this case, publish data to different varieties of sinks, like a datastore, logs, various types of storage, or even to another data stream.",[48,31830,31831],{},"At this point, the aggregated data topic, which is this pipeline’s sink, is now the source for another data pipeline. In a real world scenario, this could be even more complicated, with even more data streams. Incoming data could be joined and merged as it’s being consumed, or there could be more topics being ingested that originated from other applications with varying different data types that need to be aggregated upon ingest.",[48,31833,31834],{},"However, in this example, we’ll say there’s just the two streaming applications. In this second one, this is where the heavy lifting occurs. The best fraud detection will combine a machine learning model that has been trained on their use case with a high power stream processing engine.",[48,31836,31837],{},"If an anomalous action occurs - let's say a long time hobbyist purchases a large subscription to an enterprise level product, the machine learning model should pick this up quickly. However, the work isn’t done once the anomaly is detected. In a highly sensitive case, the anomalous transaction may require additional analysis, and either way, it would need to be sorted and alerted on in as near real-time as possible. There are several ways this anomaly could be alerted on. The simplest way would be that the fraudulent transaction would be sent to a topic designated just for anomalies, where each event in that topic is alerted on, meanwhile all other non-fraudulent transactions would continue on to a datastore or other sink.",[48,31839,31840],{},"At this point, a non-streaming application could likely handle the alerts, although it is just as likely that a company would want an additional streaming pipeline to rapidly sort the alerts, particularly if there are different algorithms for different combinations of thresholds: maybe there are alerts for anyone purchasing something over a certain amount, and separate alerts for any hobbyist attempting to buy enterprise software and the system needs to ensure the user is not being notified twice, etc. 
This could get progressively more complicated the more products and types of users and subscriptions that the company has, which would necessitate an additional need for rapid, large-scale data processing.",[48,31842,31843],{},"This brings us to throughput and latency. In this case, we’ve mentioned several times the importance of being able to quickly process large amounts of data. The “throughput” relates to how much data a streaming system can consume (and\u002For process). Ideally, a system would be capable of as high throughput as possible.",[48,31845,31846],{},"“Latency” refers to how much lateness there is in the data getting from one point to another. This means that the ideal scenario would be to have as low latency as possible - the least amount of delay in a message moving through the stream.",[3933,31848],{"id":31849},"_1",[48,31851,31852],{},"The importance of benchmarking",[48,31854,31855],{},"This combination has historically been very tricky to produce. The relationship between throughput and latency is often how scalability of a stream processing platform is measured. For instance, if you’re testing the performance of two different stream processing platforms, and you are only sending a small amount of data through, you’ll likely have the same amount of latency for both applications.",[48,31857,31858],{},"However, if you start adding drastically higher amounts of data into your system, you will likely notice that suddenly the latency goes up for the less performant software. This means that, unfortunately, when building out a proof of concept and selecting a streaming system to use, many people often select a less performant stream processing option if they aren’t testing with large amounts of data. In this case, these users often won’t find out that their software is insufficient for scaling with their company’s growth until they have already launched and are committed to ingesting increasingly larger amounts of user data. This is why checking out benchmarks and testing with as high throughput as possible is so important.",[3933,31860],{"id":31861},"_2",[48,31863,31864],{},"Data Accuracy and “Exactly Once”",[48,31866,31867],{},"In our fraud detection use case, we’ve discussed the requirement for high throughput of data and speed of processing and transferring that data. However, there’s another essential component. When ingesting large amounts of data, there is a risk that messages could get dropped or duplicated. The best machine learning models and algorithms won’t do much if the event is dropped and never makes it into the application. Duplicated events can be equally problematic, and particularly for this use case, could trigger a false positive as being an anomaly. This is where a concept called “exactly once guarantees” comes in. Conceptually, it is basically what it sounds like - a guarantee from the stream processing platform that messages will neither be dropped nor duplicated, but that each message will arrive successfully and only once. In practice though, this can be difficult to implement, particularly at scale.",[3933,31869],{"id":31870},"_3",[48,31872,2125],{},[48,31874,31875],{},"Stream processing is the necessary choice for any use case that requires data to be truly real-time accessible, particularly when there are large amounts of data, or could be large amounts of data in the future (always good to think ahead for scalability requirements!). 
However, not all stream processing platforms will have the right features or performance capabilities for every use case. It’s important to understand whether your application should be log- or queue-based, and if it can scale up with your project’s projected growth.",[48,31877,31878,31879,31882],{},"If you would like to learn more, stay tuned for our next blog in the intro series! Or if you’re ready for best practices and data patterns, check out our ",[55,31880,31881],{"href":31678},"Data Streaming Patterns"," series.",[3933,31884,31663],{"id":31662},[1666,31886,31887,31894,31900,31907],{},[324,31888,31889,31890,190],{},"Learn more about how leading organizations are using Pulsar by checking out ",[55,31891,31893],{"href":31892},"\u002Fsuccess-stories","the latest Pulsar success stories",[324,31895,31896,31897,190],{},"Use StreamNative Cloud to spin up a Pulsar cluster in minutes. ",[55,31898,31899],{"href":27773},"Get started today",[324,31901,31902,31903,190],{},"Engage with the Pulsar community by joining the ",[55,31904,31906],{"href":31692,"rel":31905},[264],"Pulsar Slack channel",[324,31908,31909,31910,190],{},"Expand your Pulsar knowledge today with free, on-demand courses and live training from ",[55,31911,31914],{"href":31912,"rel":31913},"https:\u002F\u002Fwww.academy.streamnative.io\u002F",[264],"StreamNative Academy",[48,31916,3931],{},{"title":18,"searchDepth":19,"depth":19,"links":31918},[],"2024-03-20","intro to stream processing","\u002Fimgs\u002Fblogs\u002F65fd38b29073e93f182cf6c6_IMG_4496.jpeg",{},"\u002Fblog\u002Fintroduction-to-stream-processing",{"title":31716,"description":31920},"blog\u002Fintroduction-to-stream-processing","JCIcC7RJhU4RRnxPnjmkTUVAvY5y1Ulf-WC9PYJTxs8",{"id":31928,"title":31686,"authors":31929,"body":31930,"category":821,"createdAt":290,"date":32279,"description":32280,"extension":8,"featured":294,"image":32281,"isDraft":294,"link":290,"meta":32282,"navigation":7,"order":296,"path":31685,"readingTime":4475,"relatedResources":290,"seo":32283,"stem":32284,"tags":32285,"__hash__":32286},"blogs\u002Fblog\u002Fchallenges-in-kafka-the-data-retention-stories-of-kevin-and-patricia.md",[31294],{"type":15,"value":31931,"toc":32265},[31932,31935,31938,31943,31947,31950,31953,31956,31959,31962,31966,31970,31973,31976,31979,31984,31987,31990,31993,31998,32001,32004,32007,32010,32015,32018,32023,32026,32031,32034,32037,32040,32049,32052,32055,32059,32062,32065,32068,32071,32076,32081,32084,32089,32094,32097,32103,32107,32110,32113,32116,32121,32124,32129,32132,32135,32138,32141,32144,32149,32151,32155,32160,32163,32166,32169,32172,32179,32183,32188,32191,32194,32198,32203,32206,32220,32223,32245,32248],[48,31933,31934],{},"In the realm of data streaming platforms like Kafka, retaining data is often a necessity. Yet, navigating the complexities of data storage management within Kafka can pose significant challenges. This blog post explores the hurdles encountered in managing data retention in Kafka, and the comparative benefits Pulsar provides in this context.",[48,31936,31937],{},"Meet Kevin, a Kafka aficionado, and Patricia, a Pulsar professional. They are both tasked with managing a large amount of data in their clusters. 
What will their experiences be like?",[48,31939,31940],{},[384,31941],{"alt":18,"src":31942},"\u002Fimgs\u002Fblogs\u002F65ca630ad7fef36c81f472f1_uZgBPw1RNC_vHhXlD_zndPDxWkOuxXV7hmhLjyuBdP5OgfhDub6r3EBOr7dIjkn-8_l-GcLoogyHVFedNMcb1x-T88biYzgAfEDpjJTefLnHa3bmjvme3SHzuOD7lVMaddPVvyyDbMC5h9AzMQX3lZo.png",[40,31944,31946],{"id":31945},"why-store-data-in-kafka","Why store data in Kafka",[48,31948,31949],{},"Before joining the adventures of our friends Kevin and Patricia, let's briefly recall the reasons why it can be useful to store a certain volume of data on a data streaming platform like Kafka and Pulsar.",[48,31951,31952],{},"In numerous scenarios, the necessity to reprocess data arises, whether due to updates in processing logic or to rectify errors. The availability of historical data allows for this reprocessing.",[48,31954,31955],{},"Kafka, commonly utilized for event sourcing, captures every modification to the application's state as a series of events. This retention of records offers a comprehensive history of state alterations, aiding in audit, debugging, and tracing the trajectory of an entity.",[48,31957,31958],{},"Furthermore, some applications demand data processing within specific time frames, such as the most recent 24 hours. Keeping these records allows for such time-bound analyses.",[48,31960,31961],{},"Additionally, various industries face regulatory mandates that dictate the duration for which data must be preserved. Kafka's configurable retention settings accommodate these compliance needs efficiently.",[40,31963,31965],{"id":31964},"challenge-1-reassigning-partitions","Challenge #1 - reassigning partitions",[32,31967,31969],{"id":31968},"kevins-experience","Kevin’s experience",[48,31971,31972],{},"Kevin designed his cluster to accommodate up to 128TB of data, which, with a replication factor of 3, requires 384TB due to redundancy. Kevin's cluster comprises 6 nodes, each with a 64TB disk attached - 64TB is the maximum storage capacity per broker node possible for Kevin’s cloud provider.",[48,31974,31975],{},"Kevin's cluster hosts over 100 topics, totaling over 1000 partitions. Like most Kafka brokers at a non-trivial scale, the data distribution across nodes and partitions is not perfectly uniform.",[48,31977,31978],{},"This is illustrated in the diagram below. For improved readability, only three nodes and three topics are depicted, and the replicas are not displayed.",[48,31980,31981],{},[384,31982],{"alt":18,"src":31983},"\u002Fimgs\u002Fblogs\u002F65f41e1dce91dfd31c1b59cf_nrbfhcHqoj01v0QZBX0MjRoxb8eGt41rWnWsaXW6TjKArSJC-pGdSmTk8hpvCZ_IWDLgf243eTr3OpEmetcJOWKF2TXr6f-kqkAS1KLI9UXvemgdEa_ppelrFYr08JcYRgoPNzq4Uj7GAXsKI1NSOF0.png",[48,31985,31986],{},"As business requirements evolve, Kevin must now accommodate an additional 20TB of data on a new existing topic divided into 3 partitions.",[48,31988,31989],{},"The total storage capacity available on the cluster is 30TB, so Kevin might believe the cluster can accommodate this.",[48,31991,31992],{},"However, one of the topics' partitions will exceed the capacity of one of the disks. 
Kevin faces an issue, as illustrated in the animation below:",[48,31994,31995],{},[384,31996],{"alt":18,"src":31997},"\u002Fimgs\u002Fblogs\u002F65f41e1db06684fc94b8960f_tqerpFaY8_zNS0eVoWmotryud8UwoyFCcJU6ziamW9z1floqHFS6-MEp3Nsw8lFJ6pPuMlX4epaGBAm6jKImUXn4EDg9bdWiRxEXWEEFo7mKluunwpaFrPbAiMEtF2y5n0TBWqfodrdWITHriLtndlU.gif",[48,31999,32000],{},"If Kevin doesn't intervene to prevent the partition from overflowing the first node's local storage, he will lose data.",[48,32002,32003],{},"Unfortunately, increasing the disk size isn't an option because it already reached the maximum size allowed by the cloud provider.",[48,32005,32006],{},"Here's the key point: in Kafka, message assignment to partitions is determined solely by message keys (for ordered delivery) or round-robin (for even distribution). The amount of free space on individual nodes does not play a role in this process.",[48,32008,32009],{},"Therefore, Kevin's only option to prevent data loss is to rebalance the partitions across nodes. The (very simplified) diagram below illustrates this concept. The yellow partition needs to be moved to another node to create space on the first node.",[48,32011,32012],{},[384,32013],{"alt":18,"src":32014},"\u002Fimgs\u002Fblogs\u002F65f41e1d2d8e4744548e9049_FCa_VcUb07S6ta0Iq80YsvYA3PUfD9dAG_XUuJPlZo5bKLEGLu4AvlG3xOT51yWiPXuR_vnehybLIEHyU1VOrMYr-UiRh8B-kevlm_TQhsRHn8VeWUVrGynsIy7yymhr8XLdka0gotEeXAeTaH-pEzM.png",[48,32016,32017],{},"Kevin generates a partition assignment plan and executes it to move these partitions so the cluster can accommodate the new topic’s data, as illustrated below:",[48,32019,32020],{},[384,32021],{"alt":18,"src":32022},"\u002Fimgs\u002Fblogs\u002F65f41e1dc7591073f246d3f5_jLLJbccmAEevXEHDdtgybFHQthFJnRsOyW7OByh2SYz3K2aKGfaz_I1spKnMRX3aLURwCxZ5FNAHfID9wTkdEZk218EhsSaa5PusJj9XJN4yHWbLR2H0ZDsaRoXb_5baYPJ8iXm2mTI7mAaR_QG2V3o.png",[48,32024,32025],{},"After the data migration, the new topic and its three partitions will fit on the nodes due to sufficient storage capacity on each node.",[48,32027,32028],{},[384,32029],{"alt":18,"src":32030},"\u002Fimgs\u002Fblogs\u002F65f41e1d599a80f3a5af37e7_DtMLHynw9lv7RzZRjZ3BrtEXW1EVeiR55bK1ZjLdF-NwNYjvt9g6s0vMwMkZiIC6deNUZnP8B2Px9d1E9rVv0afFA0ut2yhjl0wltzaHHxyqzNjpAhmXK1wmAiYjfuLiQJwpPQZP2VvGsOqfhvEKibs.png",[48,32032,32033],{},"The reassignment of partitions presents Kevin with several challenges.",[48,32035,32036],{},"This process transfers 1 TB of data, significantly consuming bandwidth and disk IO. With a network speed of 100 MB\u002Fs, this data transfer can take several hours!",[48,32038,32039],{},"The brokers' network and disks, being shared resources, face contention between the re-partitioning process and the ongoing data ingestion\u002Fconsumption by other topics. This contention affects the entire system's performance.",[48,32041,32042,32043,32048],{},"To minimize this impact, Kevin must enable ",[55,32044,32047],{"href":32045,"rel":32046},"https:\u002F\u002Fkafka.apache.org\u002Fdocumentation\u002F#rep-throttle",[264],"throttling",", which further increases the time required.",[48,32050,32051],{},"Unfortunately for Kevin, this operation took too long: one of the disks became full. 
Kevin starts losing data, and there is nothing he can do to alleviate it immediately.",[48,32053,32054],{},"While handling these issues in a small-scale Kafka cluster with limited data might be manageable, the challenges significantly increase as the volume of retained data grows.",[32,32056,32058],{"id":32057},"patricias-experience","Patricia’s experience",[48,32060,32061],{},"In this scenario, Patricia simply needs to ensure enough space is available in her Pulsar cluster for the incoming 20TB of data. She does not need to worry about the distribution of the existing data across the cluster.",[48,32063,32064],{},"The reason? Pulsar employs a segment-based storage model rather than a partition-based one. This approach enables smooth and even data distribution across all nodes, eliminating the need for large and long-running data transfers across broker nodes caused by partition reassignments.",[48,32066,32067],{},"Unlike partition-based storage models, where adding more data to a partition is capped by a single disk's available space, Pulsar's model avoids this constraint. Indeed, Pulsar stores the topic data using segments instead of partitions.",[48,32069,32070],{},"Below is a visual representation of these models:",[48,32072,32073],{},[384,32074],{"alt":18,"src":32075},"\u002Fimgs\u002Fblogs\u002F65f41e1da3eb64ed9b52e7bd_2cV-mjysDF7yNJWmf7gBw2rPe_27v7kPyQRAx9UQbDXpQk_6yWG1U8eApYkTzwDzAhMBpeQTuFwyApErKfW1smG52AYuUyLQWd6dfREsHSQ-tpR4j3grEJi0smME5p9DLorrf8IX3SSclimdv4sWkUE.png",[48,32077,32078],{},[384,32079],{"alt":18,"src":32080},"\u002Fimgs\u002Fblogs\u002F65f41e1d6b3a93245130a9e8_PMVbbBTBK2azuKahZ0jMQNSeuqb6lsSI76ANoZzfFxCA5HZzcail8D1xQhwNr8MSmU4AYWrN2lTPiyPzdOf-_36NDLoLemhZfnaQQJHkzhcsIbw9S7Gu7ZfSKoCrJNOWa9mMN0GlXcaBeABAsHgNlY8.png",[48,32082,32083],{},"‍‍Using the segment-based storage model above, adding more data is not capped by a single disk. The data distribution across the nodes is not coupled to the distribution of message keys in the topics. This allows a single topic to efficiently use the storage available across the entire cluster, as illustrated below:",[48,32085,32086],{},[384,32087],{"alt":18,"src":32088},"\u002Fimgs\u002Fblogs\u002F65f41e1d3fb1ae5d7c0c5c6d_FSJlHYo2ovbB0GTiuYPhUlQVKNHSH2fY6CoPbf_ZIxnR6uXZfOtBIjcCrz7qbGxdzi8_TUt1IyCjfwwRKgE685zt3XJ-MxQ-B31TN2HpUCaX2LlFMpUcbrXkia3QKF7MhTZ8mktL5H16UydsUdtd48Y.png",[48,32090,32091],{},[384,32092],{"alt":18,"src":32093},"\u002Fimgs\u002Fblogs\u002F65f41e1d9d263d337171b865_I3xX5vVzOKEfGy6d2xxABnjl75P8RU11U-yU5aDG-eZDnSaYUaxih3d1u1ii2JGDoI-VxO0TL4cJV3jgVgGeo0sZI4kqwS67Vov3sPmYf5cjJ03PpCT5vq-rWDbS9m_PWpkjXL9ddYYjyWkYRYPqeCg.png",[48,32095,32096],{},"This segment-based storage model means less operational burden for Patricia, reduced risks, and a better quality of service.",[48,32098,32099,32100,190],{},"For an in-depth exploration of the segment-based model and its distinctions from the partition-based model, feel free to read the ",[55,32101,32102],{"href":31678},"Data Streaming Patterns Series blog post on Segmentation",[40,32104,32106],{"id":32105},"challenge-2-expanding-the-storage-capacity","Challenge #2 - expanding the storage capacity",[48,32108,32109],{},"Kevin and Patricia have to prepare their respective clusters to store an additional 10TB on an existing topic. Unlike the previous scenario, their clusters do not have enough free storage space for that. They have to expand their cluster storage capacity by adding new nodes. 
To ensure the data is replicated at least three times, they each add three nodes.",[32,32111,31969],{"id":32112},"kevins-experience-1",[48,32114,32115],{},"Kevin has added three broker nodes. However, the partitions continue to grow, leading to storage space depletion on the existing nodes. The cluster isn't ready to utilize the additional storage capacity yet. This is illustrated in the diagram below - for better clarity, the diagram shows one new node without replicas:",[48,32117,32118],{},[384,32119],{"alt":18,"src":32120},"\u002Fimgs\u002Fblogs\u002F65f41e1eb06684fc94b89635_ks292TOzGzUQH75ToZcn8liu0ym-EmuaWa2LqtgbnyEhPgGUuRCByp7XSqUo00c-XppPgrEO1SKeLEd0Lzj7Y9z78aT3r3NostIV3AZIbWiNK8ot8-Q-_s5K9gB3Spp6Hnap6f45VOZF7AjkciZPA8Y.png",[48,32122,32123],{},"To avoid the situation described above, Kevin needs to reassign topics' partitions, ensuring that data is distributed across all nodes and that the storage space of the new nodes is effectively utilized. Therefore, Kevin has decided to rebalance the data to create more space on the nodes and achieve an even distribution of data across all nodes, preparing for future incoming messages on the topics:",[48,32125,32126],{},[384,32127],{"alt":18,"src":32128},"\u002Fimgs\u002Fblogs\u002F65f41e1d9306ecee89214083_YVAyI_KPTpxjbKdS5LeoAD7ZqlEUWEPrToUOxVM1PlBAd6QDmAbZ58WUCuHrDMZUJ0hJTmX56p-oUop0iOqE57VftTpRKz-lLmIcI2s3tJAf4MpXbVmBjSnMdOySEtQpIcAzixVoh8QG5evoUIBFg7Y.png",[48,32130,32131],{},"This operation requires the transfer of an enormous amount of data. Kevin finds himself dealing with the issues he faced in the previous scenario but on a larger scale.",[32,32133,32058],{"id":32134},"patricias-experience-1",[48,32136,32137],{},"Patricia needed to store an extra 10 terabytes of data on her Pulsar cluster. To achieve this, she simply added new storage nodes, following a similar approach as Kevin's.",[48,32139,32140],{},"However, unlike Kevin's experience, Patricia's process was smooth sailing. She avoided the time-consuming and manual tasks Kevin had to perform, like designing, executing, and monitoring partition reassignments. There was also no noticeable decline in message consumption or production for her clients.",[48,32142,32143],{},"This efficiency is, again, all thanks to Pulsar's segment-based storage model. In this model, as new data arrives on topics, new segments are automatically created to store it. These new segments are then automatically distributed and stored on the new nodes, allowing the new nodes to store new messages right away.",[48,32145,32146],{},[384,32147],{"alt":18,"src":32148},"\u002Fimgs\u002Fblogs\u002F65f41e1d3bc14448a43bb36c_i5qoEx4qTJYjwoNcaKTwvZZiJNf29___Vt-VPMFnRRzaIBNbOjxMKpnvEOOq4Ezzvv6TbY99_W5vhL09y41PMHM_PaRn3wCjJ4Jf2TKt1Wm7y5Y9AfUCkGO7JMFPhf__M23gd7zZ7UMfjHniDiR8iNM.png",[48,32150,3931],{},[40,32152,32154],{"id":32153},"keeping-costs-under-control","Keeping costs under control",[48,32156,32157],{},[384,32158],{"alt":18,"src":32159},"\u002Fimgs\u002Fblogs\u002F65ca630aeeb5738726dc052b_t5gR2yPfV0r35bgxFEJkSw3plw98FtQqCFkswIYSDLUBisCt-SEpkfPwJjz_mQ5VlZcNOk6A1OMujuawt0D8TR0UMMA4H8IrAXyvkMKy_P-G7wKGoLh0FhKqLrMrdkiW09TfSSDxsijUY4EvD9vD2QI.png",[48,32161,32162],{},"For Kevin and Patricia, managing costs in their massive clusters is crucial. However, Kevin faces a steeper challenge.",[48,32164,32165],{},"Firstly, scaling out Kevin's cluster impacted his budget significantly. The massive data movement during repeated partition reassignments caused data transfer costs to skyrocket. 
Indeed, cloud providers charge for transferred data, and these costs can be significant.",[48,32167,32168],{},"On the other hand, Patricia's infrastructure budget only increased due to additional nodes and disks, with no data transfer cost spikes.",[48,32170,32171],{},"Secondly, Patricia required far fewer manual, time-consuming, and risky operations in those scenarios mentioned in this blog post, saving both time and money.",[48,32173,32174,32175],{},"To explore the broader topic of TCO for messaging and data streaming platforms, which goes beyond just data retention, check out the following blog post: ",[55,32176,32178],{"href":32177},"\u002Fblog\u002Freducing-total-cost-of-ownership-tco-for-enterprise-data-streaming-and-messaging","Reducing Total Cost of Ownership (TCO) for Enterprise Data Streaming and Messaging",[40,32180,32182],{"id":32181},"what-about-tiered-storage","What about Tiered Storage?",[48,32184,32185],{},[384,32186],{"alt":18,"src":32187},"\u002Fimgs\u002Fblogs\u002F65f41e1d14c443ce7beff9ed_oBbv-Qy1y5R5sx1pAUoTv84D5CJyWy7etNq9IJQLjtOD94840f4_rdpSgdqMPAZG1oKVIBOaYJBl05tnE-rTc2gO6YsgIZGvrHfwHu1ZUykR28IuDRh96u-Fs6mSFLrQQsr7NtdaLAJ4lO_NcMBEWIo.png",[48,32189,32190],{},"Kevin has heard of Kafka's offloading feature, which allows data to be stored out of the cluster. He thinks this feature could solve all these problems. This would mean less data would be moved during partition reassignments because most of the data would not be stored on local disks. Kevin hopes this will remove performance issues, improve flexibility, and cut data transfer costs. Offloading could also lower storage costs by needing fewer nodes for data storage and leveraging a cost-effective storage solution such as Amazon S3.",[48,32192,32193],{},"However, Kevin will soon realize that Tiered Storage, despite its potential, also presents unique challenges. 
Keep an eye out for our next blog post, where we'll delve into Kevin's experiences with Tiered Storage.",[40,32195,32197],{"id":32196},"want-to-learn-more","Want to learn more?",[48,32199,32200],{},[384,32201],{"alt":18,"src":32202},"\u002Fimgs\u002Fblogs\u002F65f41e1d189459c7fb4d198a_L2EnoFCoCvYIgrbfcgYNsk-sfTm9csRxd1I42UJT1TtOvzcWoqroeZcpMJeKVmgUQp5v9vI5p59sRRUZQ70n4euBVQE8pvaSxCzjkPYTWTnKXY_BvQwuUunKLEDiiNCFpUJJDbTtyR41Wk2a_JkXwAA.jpeg",[48,32204,32205],{},"For deeper exploration of these subjects, you're encouraged to:",[321,32207,32208,32215],{},[324,32209,32210,32211],{},"View David Kjerrumgaard’s webinar for an insightful analysis of challenges and Pulsar’s solutions:  ",[55,32212,32214],{"href":32213},"\u002Fwebinars\u002Fdecoding-kafka-challenges-addressing-common-pain-points-in-kafka-deployments","Decoding Kafka Challenges - free webinar",[324,32216,32217,32218],{},"Discover the nuances between partitioning and segmentation in our blog by Caito Scherr: ",[55,32219,31679],{"href":31678},[48,32221,32222],{},"If you're inclined to experience Pulsar from Patricia's perspective rather than Kevin's:",[321,32224,32225,32233],{},[324,32226,32227,32228],{},"I invite you to try out Pulsar here: ",[55,32229,32232],{"href":32230,"rel":32231},"https:\u002F\u002Fpulsar.apache.org\u002Fdocs\u002F3.2.x\u002Fgetting-started-home\u002F",[264],"Get started | Apache Pulsar",[324,32234,32235,32236,1154,32240,32244],{},"An excellent way to quickly get started with Pulsar in production is through StreamNative: ",[55,32237,32239],{"href":32238},"\u002Fbook-a-demo","Book a demo",[55,32241,32243],{"href":32242},"\u002Fpricing","get started on StreamNative Hosted"," with a 200$ credit.",[48,32246,32247],{},"Looking to grasp Pulsar’s concepts quickly?",[321,32249,32250,32258],{},[324,32251,32252,32253],{},"Check out the video: ",[55,32254,32257],{"href":32255,"rel":32256},"https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=TKs5T6N78Tc",[264],"Understanding Apache Pulsar in 10 minutes video",[324,32259,32260,32261],{},"Read a concise guide: ",[55,32262,32264],{"href":32263},"\u002Fblog\u002Funderstanding-pulsar-10-minutes-guide-kafka-users","Understanding Apache Pulsar in 10 Minutes: A Guide for Kafka Users | StreamNative",{"title":18,"searchDepth":19,"depth":19,"links":32266},[32267,32268,32272,32276,32277,32278],{"id":31945,"depth":19,"text":31946},{"id":31964,"depth":19,"text":31965,"children":32269},[32270,32271],{"id":31968,"depth":279,"text":31969},{"id":32057,"depth":279,"text":32058},{"id":32105,"depth":19,"text":32106,"children":32273},[32274,32275],{"id":32112,"depth":279,"text":31969},{"id":32134,"depth":279,"text":32058},{"id":32153,"depth":19,"text":32154},{"id":32181,"depth":19,"text":32182},{"id":32196,"depth":19,"text":32197},"2024-03-15","Explore Kafka's data retention challenges and Pulsar's solutions in our blog. 
Learn about Kafka's storage limits, Pulsar's elastic storage, and how to reduce cloud costs while improving elasticity.","\u002Fimgs\u002Fblogs\u002F65f42b471a81499b808ac93c_kevin-patrici-1200x630.png",{},{"title":31686,"description":32280},"blog\u002Fchallenges-in-kafka-the-data-retention-stories-of-kevin-and-patricia",[799,7347],"h6lEd0AeTIEKxv5I-xmG6kVmJuItjJ31Qqu0Nat3QXI",{"id":32288,"title":32289,"authors":32290,"body":32292,"category":3550,"createdAt":290,"date":32474,"description":32289,"extension":8,"featured":294,"image":290,"isDraft":294,"link":290,"meta":32475,"navigation":7,"order":296,"path":32476,"readingTime":5505,"relatedResources":290,"seo":32477,"stem":32478,"tags":32479,"__hash__":32480},"blogs\u002Fblog\u002Fnew-streamnative-academy-course-getting-started-with-kafka-on-streamnative-ksn-for-kafka-developers.md","New StreamNative Academy Course: Getting Started with Kafka on StreamNative (KSN) for Kafka Developers",[32291],"Dustin Nest",{"type":15,"value":32293,"toc":32469},[32294,32296,32303,32306,32310,32313,32340,32343,32347,32350,32370,32373,32384,32387,32390,32394,32397,32414,32417,32421,32430,32433,32438,32443,32446,32450,32456,32464,32467],[40,32295,46],{"id":42},[48,32297,32298,32299,190],{},"Kafka developers can quickly start using StreamNative Cloud clusters for their streaming needs using Kafka on StreamNative (KSN). While this blog post doesn’t focus on the many benefits of using Pulsar clusters over Kafka clusters, the list is lengthy, and the benefits are significant. If you’re interested in learning more, check out ",[55,32300,32302],{"href":32301},"\u002Fblog\u002Foptimize-and-scale-how-kafka-on-streamnative-transforms-your-data-streaming-platform","Optimize and Scale: How Kafka-on-StreamNative Transforms Your Data Streaming Platform",[48,32304,32305],{},"Instead, in this blog post, we focus on how a Kafka Developer can get started using a StreamNative Cloud cluster as quickly as possible with their existing Kafka code. KSN embeds the Kafka protocol directly inside the Pulsar broker, allowing KSN-enabled StreamNative Pulsar clusters to support existing Kafka workloads.",[40,32307,32309],{"id":32308},"how-do-i-start-testing-kafka-code-against-a-ksn-enabled-streamnative-pulsar-cluster","How do I start testing Kafka code against a KSN-enabled StreamNative Pulsar cluster?",[48,32311,32312],{},"StreamNative Academy provides Kafka developers with three options for getting started testing Kafka code against a KSN-enabled StreamNative Pulsar cluster:",[1666,32314,32315,32324,32332],{},[324,32316,32317,32318,32323],{},"Recommended Get immediate and free access to StreamNative Academy’s ",[55,32319,32322],{"href":32320,"rel":32321},"https:\u002F\u002Fwww.academy.streamnative.io\u002Fcourses\u002Fcourse-v1:streamnative+DEV-210+2024\u002Fabout",[264],"Getting Started with Kafka on StreamNative (KSN) for Kafka Developers",". To complete the hands-on exercises, request access to a free coding environment with pre-loaded code examples to be tested against a training cluster. Access to the coding environment is typically granted within one business day. You will have access to the environment for one week.",[324,32325,32326,32327,32331],{},"Get free access to the course, but create your own free StreamNative Hosted cluster at ",[55,32328,32330],{"href":19356,"rel":32329},[264],"streamnative.io"," for testing KSN using $200 credit. 
The course includes detailed directions on creating a service account, creating a tenant and namespaces, and applying permissions required for KSN. This process should take only 15 minutes to complete. Use this option to test your own Kafka code.",[324,32333,32334,32335,190],{},"Just want to browse the course videos? ",[55,32336,32339],{"href":32337,"rel":32338},"https:\u002F\u002Fyoutube.com\u002Fplaylist?list=PL7-BmxsE3q4XrdXD9EvdCoX4qYe_GQfYK&si=hc37rZV4Kc1aIOHp",[264],"View course content on StreamNative Academy’s YouTube Channel",[48,32341,32342],{},"The short course includes ~3-4 hours of training content.",[3933,32344,32346],{"id":32345},"_1-enroll-in-streamnative-academys-free-short-course-getting-started-with-kafka-on-streamnative-ksn-for-kafka-developers-and-request-access-to-the-coding-environment-and-training-cluster","1. Enroll in StreamNative Academy’s free short course Getting Started with Kafka on StreamNative (KSN) for Kafka Developers and request access to the coding environment and training cluster",[48,32348,32349],{},"Upon registering for the course, you will be granted immediate access to the following training content using the Kafka Java Client:",[321,32351,32352,32355,32358,32361,32364,32367],{},[324,32353,32354],{},"Convert an existing client.properties file to use Pulsar JWT Tokens or StreamNative API-Keys to connect to a KSN-enabled StreamNative Hosted Pulsar cluster.",[324,32356,32357],{},"Convert an existing client.properties file to use OAuth2.",[324,32359,32360],{},"Connect to the KSN Kafka protocol handler for producing and consuming messages and register schemas using the Kafka schema registry.",[324,32362,32363],{},"In addition to using a client.properties file, code samples are also provided for configuring cluster connections directly in your Java code (JWT Token and OAuth2, for both protocol handler and schema registry).",[324,32365,32366],{},"Code samples tested during the course include producing and consuming messages, transactions, and KStreams. Further information for topic partitioning, compaction, and message retention is also provided.",[324,32368,32369],{},"In addition, you will have an opportunity to try KSN’s support for multi-tenancy and enable geo-replication to a second StreamNative Hosted Pulsar cluster by executing just one command.",[48,32371,32372],{},"To complete the hands-on exercises, we encourage you to request access to the free coding environment and cluster access. Access is provided for one week and includes the following (access is typically granted within one business day):",[321,32374,32375,32378,32381],{},[324,32376,32377],{},"Preconfigured web-based coding environment with code samples to complete the hands-on exercises mentioned above.",[324,32379,32380],{},"Tenant access in a StreamNative Hosted Pulsar cluster to complete all hands-on exercises. All required permissions are preconfigured for you.",[324,32382,32383],{},"You will have access to StreamNative’s Technical Trainer during the course.",[48,32385,32386],{},"After completing the short course, you will receive a certificate of completion and a badge for LinkedIn.",[48,32388,32389],{},"If you wish to test your own Kafka code, please use Option 2 below by creating your own StreamNative Hosted Pulsar cluster.",[3933,32391,32393],{"id":32392},"_2-enroll-in-the-free-short-course-but-create-your-own-streamnative-hosted-pulsar-cluster-for-testing-ksn-using-a-200-credit","2. 
Enroll in the free short course, but create your own StreamNative Hosted Pulsar cluster for testing KSN using a $200 credit",[48,32395,32396],{},"Want to test your own Kafka code against a StreamNative Hosted Pulsar cluster? StreamNative provides a $200 credit for creating your own StreamNative Hosted Pulsar cluster for testing. How will I convert my existing Kafka code to connect to the StreamNative Hosted Pulsar cluster? Skip to the part of the course that most closely resembles your existing Kafka code:",[321,32398,32399,32402,32405,32408,32411],{},[324,32400,32401],{},"Since you’re working in your own StreamNative Hosted Pulsar cluster, directions are provided on how to create a service account and obtain any needed security tokens or keys.",[324,32403,32404],{},"Use JWT or Oauth2 to connect to the Kafka protocol handler or Kafka schema registry of the cluster you created. It’s your choice. We have code examples for both and directions on obtaining all the relevant endpoints from the console.",[324,32406,32407],{},"Code examples are provided for configuring permissions using a client.properties file or directly in your Java code.",[324,32409,32410],{},"If you would like to test KSN support for multi-tenancy, directions are provided on creating tenants, namespaces, and all required permissions.",[324,32412,32413],{},"Since you’ll be using your own StreamNative Hosted Pulsar cluster, we won’t provide tenant access in our training cluster or one week's access to the preconfigured coding environment. What if you still want one week of access to the coding environment or our coding samples? Just ask! Coding example are available for download in the course and can be easily installed into your local IDE. If you request access to the pre-loaded coding environment, directions are included in the course to point this to your StreamNative Hosted Pulsar cluster.",[48,32415,32416],{},"Since there is no way to check for course completion, you will not be provided a certificate of completion or badge for LinkedIn after completing the short course when using your own StreamNative Hosted Pulsar cluster.",[3933,32418,32420],{"id":32419},"_3-view-course-content-on-streamnative-academys-youtube-channel","3. View course content on StreamNative Academy’s YouTube Channel",[48,32422,32423,32424,32429],{},"The short course videos are available on ",[55,32425,32428],{"href":32426,"rel":32427},"https:\u002F\u002Fwww.youtube.com\u002Fplaylist?list=PL7-BmxsE3q4XrdXD9EvdCoX4qYe_GQfYK",[264],"StreamNative Academy’s YouTube Channel",". Feel free to browse the video content to see how easy it is to convert an existing Kafka application to KSN. View the video descriptions for relevant code snippets showcased in each video.",[48,32431,32432],{},"As an example, the first exercise of the course, Connect Existing Kafka Application to Kafka on StreamNative, start with the simplest example of connecting to KSN with a JWT Token. In this first example, only the URL and sasl.jaas.config of client.properties must be edited to start publishing messages to KSN.",[48,32434,32435],{},[4926,32436,32437],{},"url=ksn endpoint",[48,32439,32440],{},[4926,32441,32442],{},"sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required username='public\u002Fdefault' password='token:\u003Ctoken>';",[48,32444,32445],{},"If you plan on testing KSN, we highly recommend enrolling in the free short course. 
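For reference, a fuller sketch of the same connection in Java might look like the following. This is a minimal, non-authoritative example assuming a SASL_SSL/PLAIN setup consistent with the PlainLoginModule JAAS line shown above; the endpoint, token, and topic name are placeholders you would replace with values from your own cluster.

```java
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class KsnProducerSketch {
    public static void main(String[] args) {
        // Placeholders: substitute your own KSN endpoint and JWT token.
        Properties props = new Properties();
        props.put("bootstrap.servers", "<ksn-endpoint>:9093");
        // Assumption: SASL_SSL with the PLAIN mechanism, matching the
        // sasl.jaas.config line quoted earlier in this post.
        props.put("security.protocol", "SASL_SSL");
        props.put("sasl.mechanism", "PLAIN");
        props.put("sasl.jaas.config",
            "org.apache.kafka.common.security.plain.PlainLoginModule required "
                + "username='public/default' password='token:<token>';");
        props.put("key.serializer",
            "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
            "org.apache.kafka.common.serialization.StringSerializer");

        // The producer code itself is unchanged from a stock Kafka client.
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("my-topic", "key", "hello from KSN"));
        }
    }
}
```
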
Course access includes access to StreamNative’s Technical Trainer.",[40,32447,32449],{"id":32448},"free-course-access","Free Course Access",[48,32451,32452,32453,190],{},"Get free and immediate access to the course ",[55,32454,267],{"href":32320,"rel":32455},[264],[48,32457,32458,32459,32463],{},"For Option 1 (Recommended), contact ",[55,32460,32462],{"href":32461},"mailto:training-help@streamnative.io","training-help@streamnative.io"," to request your free coding environment and cluster access. Access is typically granted within one business day. You will have access to the coding environment and cluster for one week.",[48,32465,32466],{},"For Option 2, sign up for the course and spin up your own StreamNative Pulsar cluster, using the course content to apply the required permissions. Contact training if you would like access to a pre-configured coding environment. Code examples are provided in the course.",[48,32468,3931],{},{"title":18,"searchDepth":19,"depth":19,"links":32470},[32471,32472,32473],{"id":42,"depth":19,"text":46},{"id":32308,"depth":19,"text":32309},{"id":32448,"depth":19,"text":32449},"2024-03-14",{},"\u002Fblog\u002Fnew-streamnative-academy-course-getting-started-with-kafka-on-streamnative-ksn-for-kafka-developers",{"title":32289,"description":32289},"blog\u002Fnew-streamnative-academy-course-getting-started-with-kafka-on-streamnative-ksn-for-kafka-developers",[799,3550,821],"hCp_p9y8s6Kp_-c60-DLIMq4Z7MfjVX3nSMIlvi3n4U",{"id":32482,"title":32302,"authors":32483,"body":32485,"category":3550,"createdAt":290,"date":32615,"description":32302,"extension":8,"featured":294,"image":32616,"isDraft":294,"link":290,"meta":32617,"navigation":7,"order":296,"path":32301,"readingTime":32618,"relatedResources":290,"seo":32619,"stem":32620,"tags":32621,"__hash__":32623},"blogs\u002Fblog\u002Foptimize-and-scale-how-kafka-on-streamnative-transforms-your-data-streaming-platform.md",[32484],"Karan Malhi",{"type":15,"value":32486,"toc":32609},[32487,32490,32494,32497,32514,32517,32520,32523,32534,32536,32540,32551,32555,32569,32571,32575,32593,32602,32607],[48,32488,32489],{},"We are very excited to announce the general availability of Kafka-on-StreamNative (KSN). Kafka-on-StreamNative (KSN) allows you to migrate your Kafka workloads to the ONE StreamNative Platform without rewriting client code. It promises to build upon your investments in Kafka with Pulsar's enterprise capabilities. Using KSN, you can quickly and cost-effectively modernize your Kafka-based data infrastructure, achieving a truly scalable, highly available, multi-tenant data streaming platform.",[32,32491,32493],{"id":32492},"understanding-ksn-and-the-one-streamnative-platform","Understanding KSN and the ONE StreamNative Platform",[48,32495,32496],{},"Kafka-on-StreamNative (KSN) brings native Apache Kafka protocol support to Apache Pulsar by introducing a Kafka protocol handler on Pulsar brokers. You can migrate your existing Kafka applications and services to Pulsar without modifying client code. 
This enables Kafka applications to leverage Pulsar’s powerful features, such as:",[321,32498,32499,32502,32505,32508,32511],{},[324,32500,32501],{},"Streamlined operations with enterprise-grade multi-tenancy",[324,32503,32504],{},"Simplified operations with a rebalance-free architecture",[324,32506,32507],{},"Automated topic rebalancing",[324,32509,32510],{},"Infinite event stream retention with Apache BookKeeper and tiered storage",[324,32512,32513],{},"Serverless event processing with Pulsar Functions",[48,32515,32516],{},"Kafka-on-StreamNative supports all enterprise features, including KStreams, KSQL, KTables with Topic Compaction, Schema Registry for the Java Client, and Kerberos Authentication for Kafka Clients.",[48,32518,32519],{},"The ONE StreamNative platform is a modern data streaming platform that uses Apache Pulsar and offers built-in multi-tenancy, geo-replication, and tiered storage.",[48,32521,32522],{},"The ONE StreamNative Platform is available in three deployment models:",[1666,32524,32525,32528,32531],{},[324,32526,32527],{},"Hosted - A fully managed SaaS offering",[324,32529,32530],{},"Bring-Your-Own-Cloud (BYOC) - Fully managed in your public cloud",[324,32532,32533],{},"Private Cloud - On-premises in your private cloud",[48,32535,3931],{},[32,32537,32539],{"id":32538},"kafka-challenges-addressed-by-ksn","Kafka challenges addressed by KSN",[1666,32541,32542,32545,32548],{},[324,32543,32544],{},"Increased downtime with Kafka: Kafka users need to repartition their cluster when they reach the storage limit, which results in downtime. With KSN, you run your Kafka workloads more efficiently on Apache Pulsar without any migration headaches. Built-in multi-tenancy and broker auto-rebalancing allow you to scale Kafka workloads without worrying about maintenance overhead.",[324,32546,32547],{},"Manual HA\u002FDR: Kafka users need to manually configure geo-replication to replicate data in multiple regions for HA\u002FDR and higher uptime. With KSN, real-time data is now part of all your mission-critical applications. You can leverage built-in geo-replication functionality to replicate your Kafka workloads seamlessly around the globe.",[324,32549,32550],{},"High bills: Kafka users often get very high bills due to unpredictable workloads that result in over provisioning, and do not get the right ROI from Kafka investment. With KSN, you dramatically reduce data streaming costs and management complexities by consolidating 10’s of Kafka clusters into a single Pulsar cluster.",[32,32552,32554],{"id":32553},"what-else-makes-ksn-stand-out","What else makes KSN stand out?",[1666,32556,32557,32560,32563,32566],{},[324,32558,32559],{},"Scalability: One of the key advantages of Pulsar is its ability to scale effortlessly. With KSN, you can tap into Pulsar's horizontally scalable architecture, enabling you to handle massive volumes of data without compromising on performance.",[324,32561,32562],{},"Multi-Tenancy: Pulsar's built-in support for multi-tenancy allows different teams or applications to share the same cluster securely. KSN extends this capability to Kafka users, providing isolation and resource management for diverse workloads.",[324,32564,32565],{},"Tiered Storage: Pulsar's tiered storage architecture, which seamlessly integrates with cloud object stores like Amazon S3 or Azure Blob Storage, enables cost-effective long-term data retention. 
KSN inherits this feature, offering organizations flexibility in managing data retention policies and reducing storage costs.",[324,32567,32568],{},"Compatibility with Kafka Ecosystem: KSN maintains compatibility with Kafka's APIs, including producer and consumer APIs, Kafka Connect, and Kafka Streams. This allows existing Kafka applications and tools to seamlessly integrate with Pulsar, minimizing migration efforts and ensuring a smooth transition.",[48,32570,3931],{},[32,32572,32574],{"id":32573},"getting-started-with-ksn","Getting started with KSN",[48,32576,32577,32578,32581,32582,32586,32587,32592],{},"Experience KSN with a free trial ($200 credit) by visiting our ",[55,32579,32580],{"href":32242},"pricing page",". Additionally, enhance your skills with our short training course, '",[55,32583,32585],{"href":32320,"rel":32584},[264],"Kafka on StreamNative (KSN) for Kafka Developers","'. Explore the comprehensive ",[55,32588,32591],{"href":32589,"rel":32590},"https:\u002F\u002Fdocs.streamnative.io\u002Fdocs\u002Findex-kafka",[264],"KSN documentation"," to delve deeper into our platform.",[48,32594,32595,32596,32601],{},"Finally, I will be hosting a ",[55,32597,32600],{"href":32598,"rel":32599},"https:\u002F\u002Fstreamnative.zoom.us\u002Fwebinar\u002Fregister\u002FWN_bvm7kfvwTKGTBIxlHuiCvA#\u002Fregistration",[264],"webinar on April 2nd 9amPT"," with my colleague Dustin to demonstrate this product.",[48,32603,32604,32605,190],{},"For further inquiries, don't hesitate to ",[55,32606,24379],{"href":6392},[48,32608,3931],{},{"title":18,"searchDepth":19,"depth":19,"links":32610},[32611,32612,32613,32614],{"id":32492,"depth":279,"text":32493},{"id":32538,"depth":279,"text":32539},{"id":32553,"depth":279,"text":32554},{"id":32573,"depth":279,"text":32574},"2024-03-13","\u002Fimgs\u002Fblogs\u002F65f2f15066015834c49c3247_SN_blog_KSN_01-1.png",{},"10min read",{"title":32302,"description":32302},"blog\u002Foptimize-and-scale-how-kafka-on-streamnative-transforms-your-data-streaming-platform",[799,821,27847,32622,5954],"Migration","AssbWgLiAiJYh-jxHZu0VxE6Uatg82_14ksdVJDOUlg",{"id":32625,"title":32626,"authors":32627,"body":32628,"category":3550,"createdAt":290,"date":32696,"description":32697,"extension":8,"featured":294,"image":290,"isDraft":294,"link":290,"meta":32698,"navigation":7,"order":296,"path":29509,"readingTime":290,"relatedResources":290,"seo":32699,"stem":32700,"tags":32701,"__hash__":32702},"blogs\u002Fblog\u002Fstreamnative-cloud-supports-microsoft-azure.md","StreamNative Cloud supports Microsoft Azure",[24776],{"type":15,"value":32629,"toc":32692},[32630,32633,32635,32638,32643,32647,32649,32666,32668,32670,32673,32675,32680,32682,32690],[48,32631,32632],{},"Today, we are happy to announce StreamNative Cloud on Microsoft Azure. This expands your options for deploying mission-critical event streaming and messaging workloads, unlocking the power of Pulsar on your preferred cloud platform, in addition to Amazon Web Services (AWS) and Google Cloud.",[48,32634,3931],{},[48,32636,32637],{},"Apache Pulsar has become one of the most popular messaging platforms in modern cloud environments. 
With StreamNative Cloud on Azure, our enterprise customers can easily build their event-driven applications with Apache Pulsar and get real-time value from their data",[48,32639,32640],{},[384,32641],{"alt":18,"src":32642},"\u002Fimgs\u002Fblogs\u002F65ea0e69c20aadaf7535c8b6_KbFyDocdgc-8YTGkUsxQEF2xWp3JIMDR-B-VmoIEnC6Fg94-c3h_5D-qolwWAz5G81OshM3aCg-DgINbet3UsNOjEbKiWeTBVBzkd63aESWYv3RU4aR6xgWkHavSvarnwwSjVm_RIJ7FRLvDh_7jlCM.png",[40,32644,32646],{"id":32645},"key-benefits-of-streamnative-cloud-on-azure","Key Benefits of StreamNative Cloud on Azure",[48,32648,3931],{},[321,32650,32651,32654,32657,32660,32663],{},[324,32652,32653],{},"Seamless Integration: StreamNative Cloud seamlessly integrates with Azure services, enabling users to harness the full power of both platforms without friction.",[324,32655,32656],{},"Global Scalability: Leveraging Azure's global network of data centers, StreamNative Cloud users can effortlessly scale their real-time applications to meet the demands of a global audience.",[324,32658,32659],{},"Enterprise-grade Security: Azure's robust security features, including identity management, encryption, and compliance certifications, bolster the security posture of StreamNative Cloud deployments, instilling confidence in users regarding data protection and regulatory compliance.",[324,32661,32662],{},"Cost Optimization: With Azure's flexible pricing models and StreamNative Cloud's resource-efficient architecture, organizations can optimize their cloud spending while maximizing the value derived from real-time data processing.",[324,32664,32665],{},"Enhanced Developer Productivity: By abstracting away the complexities of infrastructure management, StreamNative Cloud empowers developers to focus on writing code and building innovative streaming applications, accelerating time-to-market and fostering a culture of continuous innovation.",[48,32667,3931],{},[40,32669,2890],{"id":749},[48,32671,32672],{},"Whether you're an organization considering a move to the cloud or already running on Azure, supporting Azure with StreamNative Cloud gives you more choice and flexibility to accelerate your journey to cloud-native streaming data processing! How to get started? 
When you start StreamNative Cloud, you can simply choose Microsoft Azure as your preferred environment and select availability zones.",[48,32674,3931],{},[48,32676,32677],{},[384,32678],{"alt":18,"src":32679},"\u002Fimgs\u002Fblogs\u002F65ea0e694c1c60403cdd4276_BoS8i7p46M1KQyFkKYQBleZ6U37eEI7_UqwnLokWWcbl_t4UW_lnq7hXp_wsOJr0-IeXOnm5faDQfLIwLnQYXpxglGiANeTw8dKOFs5kquZCe3trkfJ2KBvDXEs6cqkdOkFFmN2DnWt-v_KEz2HV3kI.png",[48,32681,3931],{},[48,32683,32684,32685,190],{},"Microsoft Azure deployment is currently in Public Preview for StreamNative Cloud and BYOC, If you would like to try it out, please ",[55,32686,32689],{"href":32687,"rel":32688},"https:\u002F\u002Fsupport.streamnative.io\u002Fhc\u002Fen-us\u002Frequests\u002Fnew",[264],"contact our sales team",[48,32691,3931],{},{"title":18,"searchDepth":19,"depth":19,"links":32693},[32694,32695],{"id":32645,"depth":19,"text":32646},{"id":749,"depth":19,"text":2890},"2024-03-08","streamnative cloud supports microsoft azure",{},{"title":32626,"description":32697},"blog\u002Fstreamnative-cloud-supports-microsoft-azure",[3550,821,8058],"uHMIqZL47hwQcTtJQUuR9d6fer8aVOz5tg0asuZl2qM",{"id":32704,"title":32705,"authors":32706,"body":32708,"category":290,"createdAt":290,"date":32805,"description":32806,"extension":8,"featured":294,"image":32807,"isDraft":294,"link":290,"meta":32808,"navigation":7,"order":296,"path":32809,"readingTime":290,"relatedResources":290,"seo":32810,"stem":32811,"tags":32812,"__hash__":32813},"blogs\u002Fblog\u002Fthe-oxia-java-client-library-is-now-open-source.md","The Oxia Java Client Library is Now Open Source",[807,32707],"Vik Narayan",{"type":15,"value":32709,"toc":32803},[32710,32719,32721,32724,32726,32739,32741,32744,32786,32788,32801],[48,32711,32712,32713,32718],{},"Today marks an important milestone for StreamNative and the broader Apache Pulsar community as we proudly announce the open-sourcing of the ",[55,32714,32717],{"href":32715,"rel":32716},"https:\u002F\u002Fgithub.com\u002Fstreamnative\u002Foxia-java",[264],"Oxia Java client library",". This release is not just about sharing code - it's a testament to our commitment to the open-source community and driving innovation in real-time data streaming technologies.",[48,32720,3931],{},[48,32722,32723],{},"By open-sourcing Oxia Java, we are laying the groundwork for a direct integration between Oxia and Pulsar within the Pulsar GitHub repository, representing a significant enhancement to Pulsar and the data streaming world.",[48,32725,3931],{},[48,32727,32728,32729,32734,32735,32738],{},"The forthcoming integration of Oxia in Pulsar 3.3 will revolutionize how users architect and scale their Pulsar clusters. The shift away from ZooKeeper dependency means that Pulsar will soon be capable of scaling to support over one million topics per cluster, an unprecedented level of scalability and performance. The discussion within the Pulsar community about incorporating out of the box Oxia support into Pulsar 3.3, outlined in ",[55,32730,32733],{"href":32731,"rel":32732},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F22009",[264],"Pulsar Improvement Proposal 335",", underscores the excitement and anticipation for this integration. 
For a deeper dive into the impact of Oxia on Pulsar's performance, we highly recommend reading the ",[55,32736,32737],{"href":21529},"Oxia announcement blog post"," by StreamNative CTO Matteo Merli.",[48,32740,3931],{},[48,32742,32743],{},"The Oxia Java library enhances the functionality and developer experience of working with the Oxia metadata service. It includes several key components:",[321,32745,32746,32754,32762,32770,32778],{},[324,32747,32748,32753],{},[55,32749,32752],{"href":32750,"rel":32751},"https:\u002F\u002Fgithub.com\u002Fstreamnative\u002Foxia-java\u002Ftree\u002Fmain\u002Fclient",[264],"The Oxia Java Client Library",": This is the core component that enables Java applications to interact seamlessly with Oxia's metadata service, featuring request batching, asynchronous operation, notification callbacks, and record caching.",[324,32755,32756,32761],{},[55,32757,32760],{"href":32758,"rel":32759},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Ftree\u002Fmaster\u002Fpulsar-metadata\u002Fsrc\u002Fmain\u002Fjava\u002Forg\u002Fapache\u002Fpulsar\u002Fmetadata\u002Fimpl\u002Foxia",[264],"The Oxia integration for Pulsar",": This is the layer that allows Pulsar to be able to store and retrieve metadata in Oxia. This component was already donated to the Apache Software Foundation and it is now part of Apache Pulsar.",[324,32763,32764,32769],{},[55,32765,32768],{"href":32766,"rel":32767},"https:\u002F\u002Fgithub.com\u002Fstreamnative\u002Foxia-java\u002Fblob\u002Fmain\u002Fclient-metrics-opentelemetry",[264],"OpenTelemetry Metrics Integration",": This integration allows users to collect, analyze, and export metrics data, making it easier to monitor the performance and health of applications using Oxia.",[324,32771,32772,32777],{},[55,32773,32776],{"href":32774,"rel":32775},"https:\u002F\u002Fgithub.com\u002Fstreamnative\u002Foxia-java\u002Fblob\u002Fmain\u002Ftestcontainers",[264],"Testcontainer for Local Testing",": This tool simplifies the writing and executing of integration tests by providing a lightweight, disposable instance of Oxia for local testing environments.",[324,32779,32780,32785],{},[55,32781,32784],{"href":32782,"rel":32783},"https:\u002F\u002Fgithub.com\u002Fstreamnative\u002Foxia-java\u002Ftree\u002Fmain\u002Fperf",[264],"Performance Testing Tool",": With the inclusion of a dedicated performance testing tool, developers can benchmark the performance of their applications with Oxia, identify bottlenecks, and optimize for speed and efficiency.",[48,32787,3931],{},[48,32789,32790,32791,32795,32796,32800],{},"The open sourcing of Oxia Java represents a critical step towards achieving better performance, scalability, and efficiency in data streaming operations. We invite developers, contributors, and the wider community to explore the ",[55,32792,32794],{"href":22142,"rel":32793},[264],"Oxia GitHub repo"," and the ",[55,32797,32799],{"href":32715,"rel":32798},[264],"Oxia Java repo",". 
Your contributions, whether through starring the repository, forking it, engaging in discussions, or submitting pull requests, are invaluable as we continue to push the boundaries of real-time data streaming technology together.",[48,32802,3931],{},{"title":18,"searchDepth":19,"depth":19,"links":32804},[],"2024-02-28","open sourcing zookeeper replacement, Oxia","\u002Fimgs\u002Fblogs\u002F65e112c4dfe3d744b4a61eb6_image.png",{},"\u002Fblog\u002Fthe-oxia-java-client-library-is-now-open-source",{"title":32705,"description":32806},"blog\u002Fthe-oxia-java-client-library-is-now-open-source",[821,303],"egJ6jwPOiigMz9Y1dExlrbykDkcr96xlaBB7eQuVNsY",{"id":32815,"title":32816,"authors":32817,"body":32818,"category":821,"createdAt":290,"date":33199,"description":33200,"extension":8,"featured":294,"image":33201,"isDraft":294,"link":290,"meta":33202,"navigation":7,"order":296,"path":33203,"readingTime":33204,"relatedResources":290,"seo":33205,"stem":33206,"tags":33207,"__hash__":33208},"blogs\u002Fblog\u002Fchallenges-in-kafka-the-scaling-stories-of-kevin-and-patricia.md","Challenges in Kafka: The Scaling Stories of Kevin and Patricia",[31294],{"type":15,"value":32819,"toc":33182},[32820,32823,32826,32829,32832,32838,32841,32844,32847,32851,32855,32858,32861,32869,32875,32879,32882,32885,32888,32891,32894,32900,32902,32905,32908,32916,32919,32922,32931,32934,32936,32941,32943,32947,32950,32953,32956,32959,32962,32966,32969,32973,32976,32979,32982,32985,32989,32992,32997,33001,33004,33007,33011,33014,33017,33020,33023,33035,33039,33042,33045,33048,33051,33056,33060,33063,33066,33069,33072,33075,33079,33083,33086,33091,33096,33101,33104,33107,33110,33113,33116,33119,33122,33125,33128,33131,33134,33137,33139,33142,33154,33157,33168,33171],[48,32821,32822],{},"Delivering optimal services to users while ensuring cost-effectiveness - this is the age-old problem. How can a data streaming platform be leveraged to achieve this dream?",[48,32824,32825],{},"Elastic horizontal scalability is a key feature in what makes a reality.",[48,32827,32828],{},"Indeed, a horizontally scalable system allows for supporting a significant workload by distributing it across multiple nodes on commodity hardware. A horizontally scalable system can handle a workload of any size, provided it has enough nodes available.",[48,32830,32831],{},"To maintain cost-effectiveness, resources should be allocated to match the workload, scaling the system up by adding nodes or down by removing them as needed.",[48,32833,32834],{},[384,32835],{"alt":32836,"src":32837},"A chart representing the elastic scalability concept: provisioning just the right amount of resources to meet the demand","\u002Fimgs\u002Fblogs\u002F65ca62ec8c739bafad6709f9_8BG9RJGqSHqK_RENQYtDVscyMnsUacmZ75z9mj9wjjsCYL_9Pq0y3frbbR4ZwL4Z7EcTdN7KpQGK3yNvt-cI-853-cJjFLZJpfe_ZT9SkR8XtBMH6hqbReRcWQ2faaBS9UwV1r6Cmr3c-tzCrKApOvs.png",[48,32839,32840],{},"However, in the world of data streaming, not all platforms are equally elastic.",[48,32842,32843],{},"This blog post will explore scaling challenges through the experiences of Kafka users and Pulsar users.",[48,32845,32846],{},"Meet Kevin, a Kafka aficionado, and Patricia, a Pulsar professional. They work at competing e-commerce companies. Both face the task of expanding their clusters. 
How will their journeys unfold as they encounter challenges related to scaling a customer application, scaling the broker, and scaling the storage?",[48,32848,32849],{},[384,32850],{"alt":18,"src":31942},[40,32852,32854],{"id":32853},"challenge-1-scaling-a-consumer-application","Challenge #1: scaling a consumer application",[48,32856,32857],{},"Both companies share the same architecture pattern: a microservice within their e-commerce platform generates 'Order Placed' events and publishes them to a designated topic.",[48,32859,32860],{},"There are several applications subscribing to this topic, including:",[321,32862,32863,32866],{},[324,32864,32865],{},"A processing payment system. If the payment isn't successfully processed, the order will not be confirmed, and the customer will not receive their order.",[324,32867,32868],{},"A system performing real-time analytics on the 'order placed' events, requiring a strong ordering guarantee to ensure accurate analytics.",[48,32870,32871],{},[384,32872],{"alt":32873,"src":32874},"Diagram that visually represent the text above","\u002Fimgs\u002Fblogs\u002F65ca630a2af57b144f3629c3_ZpervGbjhPgCYpVwK-Lia_GVLo0qht8DHM5JyXKtAdcQ52WMo_7UKErtNqdXXHjtGWUeyUC7twiz1neFuiVvgzoa1LXkr98V6z3FYuJhGXlvtldT3UGs0iw9vRLWIpqRPDpe5_lcIiL4vYRilpYqPEI.png",[32,32876,32878],{"id":32877},"kevins-experience-with-kafka","Kevin’s experience with Kafka",[48,32880,32881],{},"Kevin has received an alert indicating that the average order payment processing time is increasing, resulting in a bad experience for the customers, and a bottleneck in the processing that impacts the bottom line. He understands that the payment processing system can no longer keep up with the pace.",[48,32883,32884],{},"Fortunately, since the payment system is horizontally scalable, Kevin can launch additional consumer instances to manage the load spike and catch up on processing the remaining orders.",[48,32886,32887],{},"That sounds easy, right?",[48,32889,32890],{},"The system has 3 active consumer instances. Kevin needs to allocate 3 more to bring their number to 6.",[48,32892,32893],{},"The topic was initially structured in 3 partitions. Since with Kafka, you cannot have more active consumers than partitions in the same consumer group, Kevin must then increase the topic from 3 to 6 partitions.",[48,32895,32896],{},[384,32897],{"alt":32898,"src":32899},"Diagram that represents the text above","\u002Fimgs\u002Fblogs\u002F65ca63098f0dee5afcd8f26c_1ChfoaQ6Vbhsi7aufti-jXl-bx_4Ypz-1eH_82NgTmukJ0G59qvvGHPaeuxLWnVB96BLgFYDxu-QSaUzInq3ZOkCpxtRDqyFzc0DlzXa1gffl11HOgij-2NdO-9fpCbsncKfCSxV7poDxHF5ZpMMFNA.png",[48,32901,3931],{},[48,32903,32904],{},"Kevin has to perform a partition rebalance to increase the number of partitions.",[48,32906,32907],{},"As a Kafka user, Kevin gets stressed out by repartitioning, especially under heavy load. 
But Kevin carries out the repartitioning because he has no other choice, and consequently, what he dreaded happens:",[321,32909,32910,32913],{},[324,32911,32912],{},"Changing the number of partitions also disrupts the other consumer systems because it changes the order in which messages are consumed, resulting in incorrect analytics outcomes.",[324,32914,32915],{},"Moreover, during the repartitioning, consumption has been blocked for several long minutes, worsening the customer experience.",[48,32917,32918],{},"Kevin has lost precious time, costing him customers.",[48,32920,32921],{},"Since repartitioning can be both painful and risky, Kevin needs to carefully plan the number of partitions in advance to be well-prepared for the anticipated workload. But you can’t expect the unexpected… Or can you?",[48,32923,32924,32925,32930],{},"Choosing the right number of partitions in advance is, indeed, a common challenge, as demonstrated by the ",[55,32926,32929],{"href":32927,"rel":32928},"https:\u002F\u002Fgoogle.gprivate.com\u002Fsearch.php?search?q=how+to+choose+the+right+number+of+partitions+for+a+Kafka+topic",[264],"plethora of resources ","on choosing the right partition count for a Kafka topic.",[48,32932,32933],{},"Kevin wonders how to prevent this issue from happening again. He might then be tempted to create extra partitions to prepare for such an unexpected surge in workload, but that leads to overprovisioning. And he still has to figure out how many of those extra partitions to create.",[48,32935,3931],{},[48,32937,32938],{},[384,32939],{"alt":18,"src":32940},"\u002Fimgs\u002Fblogs\u002F65ca630996795acea3156352_xt0_QKGeQf4kVPwLYqicn4fUipwXUdi0allzcVK47S9L8i0px-9YisJeGF-TUJTXrsSIFYNpROlqudLCXT_Qh_A41cZBbs9mKF9e9ZTFh6tyTriy_L7dvJMf7iWC66_efnR6N0RVHlVQ6WiSaOudnlM.png",[48,32942,3931],{},[32,32944,32946],{"id":32945},"patricias-experience-with-pulsar","Patricia’s experience with Pulsar",[48,32948,32949],{},"Patricia receives the same alert. She calmly responds, simply adding new consumer instances by scaling up the pods without needing additional operations.",[48,32951,32952],{},"She almost immediately observes a decrease in the order payment processing time.",[48,32954,32955],{},"The issue is resolved in under ten minutes, without any side effects.",[48,32957,32958],{},"Thanks to the Pulsar Shared Subscription feature, Patricia doesn’t need to concern herself with the topic's structure. In Pulsar, scaling consumer instances is independent of the underlying data storage distribution, offering flexibility in system scalability.",[48,32960,32961],{},"Moreover, Patricia contemplates that implementing auto-scaling for consumers could have automatically resolved the issue, eliminating the need for her manual intervention. She realizes the simplicity of doing this with Pulsar and plans to explore it soon.",[40,32963,32965],{"id":32964},"challenge-2-expanding-the-cluster-processing-power","Challenge #2: expanding the cluster processing power",[48,32967,32968],{},"Kevin has a Kafka cluster scaled for a specific load and storing 100TB+ of data. As his system gains popularity, it attracts more consumers and producers and manages more topics, pushing the cluster to its limits. 
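As a quick aside on Challenge #1, the sketch below shows roughly what Patricia's approach looks like in code: a Pulsar consumer on a Shared subscription. Every additional copy of this process joins the same subscription and immediately shares the load, with no change to the topic's structure. The service URL, topic, and subscription name are placeholder values for illustration, not taken from the story.

```java
import org.apache.pulsar.client.api.Consumer;
import org.apache.pulsar.client.api.Message;
import org.apache.pulsar.client.api.PulsarClient;
import org.apache.pulsar.client.api.PulsarClientException;
import org.apache.pulsar.client.api.SubscriptionType;

public class OrderPlacedConsumerSketch {
    public static void main(String[] args) throws PulsarClientException {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650") // placeholder service URL
                .build();

        // A Shared subscription lets any number of consumer instances attach to it;
        // messages are distributed across whatever instances happen to be running.
        Consumer<byte[]> consumer = client.newConsumer()
                .topic("persistent://public/default/order-placed") // hypothetical topic
                .subscriptionName("payment-processing")
                .subscriptionType(SubscriptionType.Shared)
                .subscribe();

        while (true) {
            Message<byte[]> msg = consumer.receive();
            // ... process the "Order Placed" event here ...
            consumer.acknowledge(msg);
        }
    }
}
```

Scaling up or down is then just a matter of starting or stopping instances of this process.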
Kevin needs to enhance the cluster's processing power by adding new nodes to accommodate the growing demand.",[32,32970,32972],{"id":32971},"scaling-kevins-kafka-cluster","Scaling Kevin’s Kafka Cluster",[48,32974,32975],{},"However, as an experienced Kafka user, Kevin knows this process can be challenging.",[48,32977,32978],{},"Indeed, the new nodes don't immediately contribute to handling the extra load. Kevin must first rebalance the data across the cluster, a resource-intensive task that affects performance and increases costs. Only after the data rebalance is complete can Kevin finally relax, knowing the cluster has been successfully scaled up.",[48,32980,32981],{},"Kevin is disappointed because when he urgently needs more processing power, he has to restructure the data storage, which impedes the system's ability to adapt to workload fluctuations quickly. Kevin recognizes that he should have anticipated this change, embodying the principle of \"expecting the unexpected.\" However, even if you can accurately anticipate your future needs, if you request those partitions ahead of time, you will be wasting that storage until you scale up to the amount requested. As we now know, re-partitioning in the future is also costly, so there is really no way to plan for the future when expansion is painful either way.",[48,32983,32984],{},"Kevin dreams of a solution where processing power and data storage are separate entities, enabling more efficient adaptability to change.",[32,32986,32988],{"id":32987},"scaling-patricias-pulsar-cluster","Scaling Patricia’s Pulsar Cluster",[48,32990,32991],{},"On the other hand, Patricia simply adds more broker nodes to her Pulsar cluster to manage the increased load. That's all there is to it! There's no need for Patricia to perform any data movement; the newly added node becomes instantly available. For Patricia, Kevin's dream of decoupling storage and computing is a reality.",[48,32993,32994],{},[384,32995],{"alt":18,"src":32996},"\u002Fimgs\u002Fblogs\u002F65ca630abf9723c895795bb1_TAbskdrAH7FpNVOlzseF3e99lz4utCO07gDZb5UWb0VRZ5BEwyGgPlTehmFCysHnnGsVqTg_aw6ZHcQ_zxR06tpp2bEWcOrVEX4vBYQtZoOreufQeJcv1VDX0syxciM4IZp42uZyheFczwsy64lldLQ.png",[40,32998,33000],{"id":32999},"challenge-3-expanding-the-storage-capacity","Challenge #3: expanding the storage capacity",[48,33002,33003],{},"Kevin must create a Kafka topic that can hold up to 100 TB of data.",[48,33005,33006],{},"Since Kevin is using Kafka, he must determine the number of partitions he needs to hold this much data, and since each of his Kafka broker nodes has 4TB of attached storage, he decides that 25 will be enough.",[32,33008,33010],{"id":33009},"kevins-experience-with-expanding-the-storage-capacity","Kevin’s experience with expanding the storage capacity",[48,33012,33013],{},"After a few months, Kevin sees that the Kafka broker nodes are starting to fill up. This is because he forgot to account for the replica storage. So he fixes it and thinks it’s done. But…",[48,33015,33016],{},"Now, the storage requirement for this topic has increased by 20%. Kevin has to increase the number of partitions on the topic AGAIN…",[48,33018,33019],{},"It means performing a partition rebalance, moving huge amounts of data across the broker, impacting the performance… Kevin feels wary of this. 
He feels like he should have anticipated this, but how?",[48,33021,33022],{},"Kevin understands that enabling tiered storage may not significantly alleviate his concerns due to several factors:",[1666,33024,33025,33028],{},[324,33026,33027],{},"The rate of incoming data exceeds the rate at which old data is offloaded. Since the data is coming in faster than the old data can be offloaded, he will still run out of space for the new data.",[324,33029,33030,33031],{},"When data needs to be read from tiered-storage, it must be stored on a local disk before it is read. Since Kevin’s cluster disk space is limited, then it doesn't have space to write it, making it inaccessible. See ",[55,33032,33033],{"href":33033,"rel":33034},"https:\u002F\u002Faiven.io\u002Fdocs\u002Fproducts\u002Fkafka\u002Fconcepts\u002Ftiered-storage-how-it-works#data-retrieval",[264],[32,33036,33038],{"id":33037},"patricias-experience-with-expanding-the-storage-capacity","Patricia’s experience with expanding the storage capacity",[48,33040,33041],{},"On the other hand, Patricia, as a Pulsar user, just creates a topic with a single partition and lets the Pulsar segment-based storage layer take care of the rest. She does not need to plan the number of partitions in advance.",[48,33043,33044],{},"Each of her storage nodes comes with 4TB of attached storage, yet there's no immediate need to set up 25 nodes to store the final 100TB. For the time being, 2 nodes suffice since the topic is nowhere near reaching 100TB for now, and she requires 4 additional nodes to ensure data replication.",[48,33046,33047],{},"After a few days, the volume of data stored in the topic is approaching the maximum accumulated storage capacity of the nodes. At this point, Patricia adds 3 more nodes, and that's all there is to it—no rebalancing, no data migration, and no downtime. Just the right amount of nodes, and the new ones are instantly ready to store more data.",[48,33049,33050],{},"Patricia is quite confident in her ability to smoothly expand storage, all thanks to Pulsar's segment-based architecture that intelligently distributes the data across the cluster.",[48,33052,33053],{},[384,33054],{"alt":18,"src":33055},"\u002Fimgs\u002Fblogs\u002F65ca630abc4f2a01264e9363_if_4sFFnGa3UYt7FpU7JjQY2k5qf4f_F7Ahf-0rvRmLNAr7N1TLZVAGEQOUbghVBD0ZBaW2lLJjE1FPhj6ndRpQFfLmu4z5f4fGQhzzJHFgszq4u6kiNTrKkNVT1P6JE-26CpMoXaNiXggqt333lTR4.png",[40,33057,33059],{"id":33058},"challenge-4-cutting-cluster-costs","Challenge #4: cutting cluster costs",[48,33061,33062],{},"Scaling down the cluster to cut costs is a smooth process for Patricia, but for Kevin, that's a tricky proposition.",[48,33064,33065],{},"Now that the workload has been reduced, Kevin ends up with an overprovisioned Kafka cluster.",[48,33067,33068],{},"Kevin faces challenges in reducing his cluster size. Shrinking means a complex rebalancing act, AGAIN, risking system availability and performance. Kevin knows it isn’t just about pulling plugs; careful planning and execution are needed to avoid disruptions. Kevin wonders: ‘What if I could have a truly elastic platform?’",[48,33070,33071],{},"Patricia, on the other hand, has a more flexible setup with Pulsar. Downscaling consumers? Just tear down consumer instances. Downscaling the cluster? Just tear down broker nodes. Shrinking the storage? Just offload the data to external, cheap storage like S3, then remove bookie nodes. 
All these capabilities are built into Pulsar and have been battle-tested for years in production.",[48,33073,33074],{},"Essentially, while Kevin navigates a tightrope to optimize costs with Kafka, Patricia finds it effortless with Pulsar.",[48,33076,33077],{},[384,33078],{"alt":18,"src":32159},[40,33080,33082],{"id":33081},"recap","Recap",[48,33084,33085],{},"Below is a summary of Kevin and Patricia's experiences.",[48,33087,33088],{},[44,33089,33090],{},"Challenge",[48,33092,33093],{},[44,33094,33095],{},"Kevin - Kafka",[48,33097,33098],{},[44,33099,33100],{},"Patricia - Pulsar",[48,33102,33103],{},"Scaling out consumers to manage an unexpected workload surge",[48,33105,33106],{},"The topic does not have enough partitions to parallelize consumption sufficiently. Kevin must then struggle to repartition the topic while minimizing the impact as much as possible.",[48,33108,33109],{},"The Pulsar partitionless subscription model allows Patricia to add more consumer instances seamlessly with peace of mind.",[48,33111,33112],{},"Expanding the processing power on the cluster",[48,33114,33115],{},"Kevin must wait for the data to rebalance across the cluster before the newly added nodes can take on the extra workload.",[48,33117,33118],{},"Patricia adds new broker nodes, immediately handling the additional workload.",[48,33120,33121],{},"Expanding the storage capacity on the cluster",[48,33123,33124],{},"Kevin must meticulously plan for the necessary storage, overprovisioning partitions and nodes.",[48,33126,33127],{},"Patricia is less concerned about these considerations. She can effortlessly expand storage capacity by adding more nodes.",[48,33129,33130],{},"Downscaling to cut costs",[48,33132,33133],{},"Shrinking the cluster by removing nodes means a painful rebalancing act for Kevin.",[48,33135,33136],{},"Patricia simply removes nodes.",[40,33138,32197],{"id":32196},[48,33140,33141],{},"To go further into these topics, feel free to:",[321,33143,33144,33149],{},[324,33145,33146,33147],{},"join David on February 22 as he delves into those challenges and explains how Pulsar addresses them: ",[55,33148,32214],{"href":32213},[324,33150,33151,33152],{},"Read our blog post in which Caito Scherr explores the differences between partitioning and segmentation: ",[55,33153,31679],{"href":31678},[48,33155,33156],{},"If you would rather be in Patricia's shoes instead of Kevin's:",[321,33158,33159,33164],{},[324,33160,32227,33161],{},[55,33162,32232],{"href":32230,"rel":33163},[264],[324,33165,32235,33166],{},[55,33167,32239],{"href":32238},[48,33169,33170],{},"Want to grasp Pulsar concepts in 10’?",[321,33172,33173,33178],{},[324,33174,33175],{},[55,33176,32257],{"href":32255,"rel":33177},[264],[324,33179,3931,33180],{},[55,33181,32264],{"href":32263},{"title":18,"searchDepth":19,"depth":19,"links":33183},[33184,33188,33192,33196,33197,33198],{"id":32853,"depth":19,"text":32854,"children":33185},[33186,33187],{"id":32877,"depth":279,"text":32878},{"id":32945,"depth":279,"text":32946},{"id":32964,"depth":19,"text":32965,"children":33189},[33190,33191],{"id":32971,"depth":279,"text":32972},{"id":32987,"depth":279,"text":32988},{"id":32999,"depth":19,"text":33000,"children":33193},[33194,33195],{"id":33009,"depth":279,"text":33010},{"id":33037,"depth":279,"text":33038},{"id":33058,"depth":19,"text":33059},{"id":33081,"depth":19,"text":33082},{"id":32196,"depth":19,"text":32197},"2024-02-12","This blog post delves into the world of data streaming, examining the scaling challenges on Kafka and Pulsar through 
stories.","\u002Fimgs\u002Fblogs\u002F65ca6138ef125c69ae4f20bc_kevin-patrici-1200x630.png",{},"\u002Fblog\u002Fchallenges-in-kafka-the-scaling-stories-of-kevin-and-patricia","7 min read",{"title":32816,"description":33200},"blog\u002Fchallenges-in-kafka-the-scaling-stories-of-kevin-and-patricia",[799,5954],"YTArMtmYt5B-F6dkKA7HRq-nAutVCXbApenfvguFIEA",{"id":33210,"title":33211,"authors":33212,"body":33213,"category":290,"createdAt":290,"date":33362,"description":33363,"extension":8,"featured":294,"image":33364,"isDraft":294,"link":290,"meta":33365,"navigation":7,"order":296,"path":31678,"readingTime":33366,"relatedResources":290,"seo":33367,"stem":33368,"tags":33369,"__hash__":33370},"blogs\u002Fblog\u002Fdata-streaming-patterns-series-what-you-didnt-know-about-partitioning-in-stream-processing.md","Data Streaming Patterns Series:  What You Didn’t Know About Partitioning in Stream Processing",[31718],{"type":15,"value":33214,"toc":33357},[33215,33219,33222,33225,33229,33232,33235,33239,33242,33244,33249,33252,33254,33257,33260,33263,33266,33270,33273,33276,33278,33283,33285,33288,33291,33294,33297,33304,33307,33310,33316,33318,33321,33324,33329,33332,33335,33355],[3933,33216,33218],{"id":33217},"partitioning-in-stream-processing-what-does-it-do-and-why-is-it-not-what-i-thought-it-was","Partitioning in stream processing: “what does it do, and why is it not what I thought it was?”",[48,33220,33221],{},"Partitioning has been strongly associated with stream processing from the beginning. At the start, Kafka changed the industry by introducing distributed logs as a groundbreaking way to solve the major challenges of ingesting large volumes of server logs into a single platform. The idea of separating out and distributing a single log file solved the problem of providing horizontally scalable storage, which is essential to sustainably streaming data. However, the choice to use partitions to implement this structure comes with drawbacks. The implementation has been exposed to the end user in a way that has made partitioning become synonymous with event streaming - it has become the default for many users in stream processing.",[48,33223,33224],{},"Unfortunately, the pain points that come with it have also become the norm for many use cases. However, different stream processing engines handle partitioning - and alternatives to partitioning - very differently. This article covers what is often misunderstood about partitioning, how it relates to steam processing (or does it?), and what alternatives are available.",[40,33226,33228],{"id":33227},"overview","Overview",[48,33230,33231],{},"Partitioning, and related concepts like logs, have been around for a long time in software engineering and are prevalent in other industries as well. This can make for a nice foundation for understanding how these concepts fit into stream processing. However, it can also be confusing when, even within the data streaming world, these concepts function very differently depending on the technology implementing them.",[48,33233,33234],{},"Even for concepts like stream processing itself, the definitions can vary widely, so following is a brief overview of how we will be using these concepts.",[3933,33236,33238],{"id":33237},"logs","Logs",[48,33240,33241],{},"Many of you will already be familiar with logs as a very basic data structure used across the software industry. Since logs are automatically and instantly instantiated, they are a highly reliable source of truth for a variety of functions. 
They are an essential component for analytics and testing, where the log acts as a record for any errors, anomalies, or configured alerts. Logs are also a core foundation for databases which rely on a changelog structure to track and record any modifications performed on their tables. On the most basic level, logs are an append-only sequence of inserted data, ordered by time. Logs can contain any data type.",[48,33243,3931],{},[48,33245,33246],{},[384,33247],{"alt":18,"src":33248},"\u002Fimgs\u002Fblogs\u002F65c6725bef2c5d8ab93a1cda_e-LQp83whIdhO5n3bVZC-GkINPzbGfydnzekylgE-XADm7qiucICaXflW12usL7Lyz72RAYDM3_fPwQFaHhPCtX_J6c6VfXG6a-5P8aefS6mCZkNzl97zTUc95Qaa5YHfBv51EpGELAjRUv-B-G6hY8.jpeg",[48,33250,33251],{},"Rebalancing can be risky.",[48,33253,3931],{},[48,33255,33256],{},"This has become the “norm” for many stream processing users, and it comes with some pain points that may be very familiar to you if you have experience in these technologies. Partitioning this way means the topics can’t accept more data than this fixed amount. This means that to scale the application at all, the number of partitions must be increased.",[48,33258,33259],{},"Unfortunately, this creates a dilemma where higher numbers of partitions can get very expensive, and increasing the number of partitions requires rebalancing. Rebalancing is required anytime partitions are increased because Kafka utilizes a hash key system for partitioning, where data placement is coupled to the hash key. A new partition being added means that the process essentially has to start all over again: all of the data will need to be reallocated across the new number of partitions. This also means that new data cannot continue to flow through the topics during the rebalancing.",[48,33261,33262],{},"Consider a typical process of building a new product or application within your company. In other technologies that implement partitions, the user would have to anticipate the number of partitions needed before the application can even be used- you can’t send data through a topic without having already established the number of partitions. It may be a while before the project needs to scale up at all, meaning that an expensive amount of storage could be reserved and not even used for a considerable amount of time, and many companies may not want to invest in this expense without a return on investment in sight. The only other option that the user has is to save on initial storage cost by setting a lower number of partitions. However, if the application needs to scale in the future (which is usually the goal for most stream processing applications), rebalancing is required, and this process comes with its own expenses, as well as risks as outlined above.",[48,33264,33265],{},"It is often overlooked that partitioning is also not inherently a streaming pattern. Users have had to predetermine the number of partitions from the very introduction of stream processing, making it the standard and expected experience. This strong association between partitioning and streaming leads to the impression that the two are intertwined. However, this is not the case: partitioning is simply an implementation detail of Kafka itself. 
Moreover, the fact that this internal detail is exposed to the end user violates basic OOP interface design principles, specifically of the implementation being able to be changed at a later point without breaking the API.",[40,33267,33269],{"id":33268},"segmentation-the-scalable-alternative","Segmentation: The Scalable Alternative",[48,33271,33272],{},"What is it?",[48,33274,33275],{},"Similar to partitioning, segmentation takes a single log and breaks it down into a sequence of smaller pieces. These pieces, or segments, are distributed across nodes that all reside within a separate storage layer. By default, there are 50k records per segment when the segments are instantiated, but this number is also configurable.",[48,33277,3931],{},[48,33279,33280],{},[384,33281],{"alt":18,"src":33282},"\u002Fimgs\u002Fblogs\u002F65c6725bc0273eff18538d4d_DzzAdjZYqPJ5Un22GJww4MUIDX6C5iIhiXKrTQ2UT7XJofLvj1cPF_l46V5KABIidaYkiZa5nRwtuG9Ww_rmiGiZY3Hwa1rHTVO5SF8fl4OTdPAOhhgQJ-CZCKTZVGXC3dcMhxWyq96MpMCfQbf9n3M.png",[48,33284,3931],{},[48,33286,33287],{},"A whole segment is stored on a single bookie. And, in order to ensure proper redundancy, Apache Pulsar, by default, creates several replicas of each segment. Moreover, these segments can be stored anywhere within this layer where storage is available. By comparison, in Kafka, partitions are stored on a specific broker within the cluster, which is very limiting. The only potential downside here would be tracking where all the segments are stored, but Apache Pulsar already automatically creates a metadata store that contains a record of segments in a topic.",[48,33289,33290],{},"How is this useful?",[48,33292,33293],{},"Utilizing the storage in this way allows any individual log the same storage availability of however many machines that its segments are stored across, making it truly horizontally scalable. Additionally, the number of nodes can be dynamically increased within this separate storage layer, thus making it easy to scale up the overall storage capacity. All that is required is to increase the number of bookies, and new segments will automatically be written to the newly created bookies.",[48,33295,33296],{},"As we covered earlier, in Kafka, this same action would have required the costly and risky act of rebalancing. The fact that each segment is stored on separate bookie brings even more of a contrast between Apache Pulsar vs Kafka. In Apache Pulsar, if a bookie goes down or incurs any issues, there is no impact on the rest of the system. However with Kafka, since brokers are shared between partitions, if there is an error with one partition, it could easily impact other topics that share that resource.",[48,33298,33299,33300,33303],{},"In Pulsar, the storage and computing layers are already separated, which naturally eliminates the need to rebalance the data. This creates a drastic difference in storage requirements and performance by eliminating most of the storage issues while maintaining resilient failover. Apache Pulsar’s architecture already decouples the layer serving the messages from the storage layer, which “allow",[2628,33301,33302],{},"s"," each to scale independently” (Kjerrumgaard, “Apache Pulsar in Action”). This contrasts with technologies like Kafka, where the serving and storage layers occur on the same instances or cluster nodes.",[48,33305,33306],{},"Additionally, the integration of Apache BookKeeper provides persistent message storage. Apache BookKeeper creates segments, converting and storing data streams into sequential logs. 
Each segment contains multiple logs distributed among multiple “Bookies” (BookKeeper nodes). These segments are all contained within the storage layer, and thus provide extra redundancy. They can also reside anywhere within this layer where there is sufficient storage capacity. Because of this and since there is not the same cost to increasing the number of segments, horizontal scalability is inherently possible.",[48,33308,33309],{},"Structuring the architecture in this way is what enables segmentation to be a true stream processing pattern. The storage is truly horizontally scalable, and all of this management and configuration can be done dynamically, with no limitations, slowdowns or pauses to the stream of data.",[48,33311,24328,33312,33315],{},[384,33313],{"alt":18,"src":33314},"\u002Fimgs\u002Fblogs\u002F65c6725b1470a364b08c12ac_NxnrTq_Ym8Esi4LtMoHdNL4w1ur50vhpW6CET6o1HD0uaOfZHEhgeMbN_kjRtkQTnReBsAmysiFBpUvA0OCXo_VfKZdEV4Bryz5j0HgXZWwMtLPR3-O6nNBYqBSufFJX4Mz6ffJF2JBDbixDkk6T_gQ.png","\n‍",[40,33317,2125],{"id":2122},[48,33319,33320],{},"Is partitioning a feature or a bug? The answer to this is that it is not a true stream processing concept. So, it may not be a bug in other contexts, however, there are alternatives out there that are better suited for stream processing.",[48,33322,33323],{},"Where partitioning is hindered by a rigid, predetermined number of partitions and therefore amount (and cost) of available storage, segmentation provides a dynamic option with true horizontal scaling. In contrast, segmentation also offers the best of both worlds -reliability and efficiency. Segmentation automatically creates replication where needed and guaranteed reads, while also providing a highly performant read\u002Fwrite process and capacity for low latency and high throughput. And then there’s the user experience - no rebalancing required! 
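To make "no pre-planned partitions" concrete, here is a small sketch using the Pulsar Admin Java client, with a placeholder admin URL and topic name: the topic is created without committing to any partition count, and its capacity later grows by adding bookies rather than by repartitioning.

```java
import org.apache.pulsar.client.admin.PulsarAdmin;
import org.apache.pulsar.client.admin.PulsarAdminException;
import org.apache.pulsar.client.api.PulsarClientException;

public class CreateTopicSketch {
    public static void main(String[] args) throws PulsarClientException, PulsarAdminException {
        // Placeholder admin URL; point this at your cluster's HTTP admin endpoint.
        PulsarAdmin admin = PulsarAdmin.builder()
                .serviceHttpUrl("http://localhost:8080")
                .build();

        // No partition count is specified: the topic's data is written as segments that
        // BookKeeper spreads across however many bookies the storage layer currently has.
        admin.topics().createNonPartitionedTopic("persistent://public/default/orders");

        admin.close();
    }
}
```

Adding storage later means adding bookies; the topic and its clients are untouched.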
Not only does this add to the lowered risk and cost of this method, but it eliminates many of the pain points that have often become expected with the development process of stream processing.",[48,33325,24328,33326],{},[384,33327],{"alt":18,"src":33328},"\u002Fimgs\u002Fblogs\u002F65c6725bf56d6f915984b25b_z7iiH_lSWWHKqOpvKoZs-cvNC5za4YCAuI3ey8gJ6aSf3gzQMXi61dxTtimTfhzCUxVgnUZJ4XGvPrlWDUYgeYORkU2i0Del8MDSmOl7nV_FK5cFEkg_HkdEiiel7k0rQ7l4sJ6pk8bIF52EXv_ObYE.png",[3933,33330,33331],{"id":32196},"Want to Learn More?",[48,33333,33334],{},"For more on Pulsar, check out the resources below.",[1666,33336,33337,33341,33345,33350],{},[324,33338,31889,33339,190],{},[55,33340,31893],{"href":31892},[324,33342,31896,33343,190],{},[55,33344,31899],{"href":27773},[324,33346,31902,33347,190],{},[55,33348,31906],{"href":31692,"rel":33349},[264],[324,33351,31909,33352,190],{},[55,33353,31914],{"href":31912,"rel":33354},[264],[48,33356,3931],{},{"title":18,"searchDepth":19,"depth":19,"links":33358},[33359,33360,33361],{"id":33227,"depth":19,"text":33228},{"id":33268,"depth":19,"text":33269},{"id":2122,"depth":19,"text":2125},"2024-02-09","A review of partitioning in stream processing - what does it do, and why this isn't what you were taught, and what available alternatives there are.","\u002Fimgs\u002Fblogs\u002F65ce579d76a7249d8216d22a_scaled-image.jpeg",{},"7 min.",{"title":33211,"description":33363},"blog\u002Fdata-streaming-patterns-series-what-you-didnt-know-about-partitioning-in-stream-processing",[7347,799,1331],"i0y81Ar6sJEmsDQ0_Rdi3vibW6pd1RneWxwxUGT24nM",{"id":33372,"title":33373,"authors":33374,"body":33375,"category":7338,"createdAt":290,"date":33687,"description":33688,"extension":8,"featured":294,"image":290,"isDraft":294,"link":290,"meta":33689,"navigation":7,"order":296,"path":33690,"readingTime":33691,"relatedResources":290,"seo":33692,"stem":33693,"tags":33694,"__hash__":33695},"blogs\u002Fblog\u002Fmeet-us-at-upcoming-conferences-jan-2024.md","Meet Us At Upcoming Conferences",[31294],{"type":15,"value":33376,"toc":33674},[33377,33380,33383,33386,33388,33392,33394,33398,33401,33404,33407,33410,33417,33421,33424,33427,33430,33436,33442,33446,33449,33452,33455,33461,33467,33471,33474,33477,33480,33484,33490,33494,33497,33500,33502,33506,33512,33516,33519,33522,33530,33534,33540,33544,33547,33551,33554,33557,33560,33564,33570,33574,33577,33580,33583,33587,33593,33597,33600,33603,33606,33610,33616,33620,33631,33635,33638,33641,33644,33648,33654,33656,33659,33672],[48,33378,33379],{},"Exciting news! StreamNative team of developer advocates will be attending a series of conferences worldwide, and we will also be at Kafka Summit London! 
This is your golden opportunity to meet our experts in person, dive into engaging conversations, and explore the latest in data streaming technology.",[48,33381,33382],{},"Our developer advocates are eager to share their knowledge, understand your challenges, and help you navigate the ever-evolving landscape of data streaming and messaging.",[48,33384,33385],{},"Here's where you can find us in the coming months:",[48,33387,3931],{},[8300,33389,33391],{"id":33390},"upcoming","Upcoming",[48,33393,3931],{},[40,33395,33397],{"id":33396},"jdevsummit-il","JDevSummit IL",[48,33399,33400],{},"Location: Tel Aviv, Israel",[48,33402,33403],{},"Date: April 4, 2024",[48,33405,33406],{},"Talk: Scalable distributed messaging & streaming with Apache Pulsar",[48,33408,33409],{},"Speaker: Asaf Mesika",[48,33411,33412,33413],{},"More information: ",[55,33414,33415],{"href":33415,"rel":33416},"https:\u002F\u002Fjdevsummitil.com\u002Fspeakers\u002Fasaf-mesika\u002F",[264],[40,33418,33420],{"id":33419},"devoxx-france","Devoxx France",[48,33422,33423],{},"Location: Paris, France",[48,33425,33426],{},"Date: April 17-19, 2024",[48,33428,33429],{},"Talk: Apache Pulsar: Finally an Alternative to Kafka? (in French)",[48,33431,33432,33433],{},"Speaker: ",[55,33434,31294],{"href":33435},"\u002Fpeople\u002Fjulien",[48,33437,33412,33438],{},[55,33439,33440],{"href":33440,"rel":33441},"https:\u002F\u002Fwww.devoxx.fr\u002Fen\u002F",[264],[40,33443,33445],{"id":33444},"real-time-analytics-summit","Real-Time Analytics Summit",[48,33447,33448],{},"Location: San Jose, California",[48,33450,33451],{},"Date: May 8, 2024",[48,33453,33454],{},"Talk: Empowering Real-Time IoT Analytics with Apache Pulsar and Apache Pinot",[48,33456,33432,33457],{},[55,33458,33460],{"href":33459},"\u002Fpeople\u002Fdavid","David Kjerrumgaard ",[48,33462,33412,33463],{},[55,33464,33465],{"href":33465,"rel":33466},"https:\u002F\u002Fwww.rtasummit.com\u002Fagenda\u002Fsessions\u002F566916",[264],[40,33468,33470],{"id":33469},"devops-pro-europe","DevOps Pro Europe",[48,33472,33473],{},"Location: Online",[48,33475,33476],{},"Date: May 19, 2024",[48,33478,33479],{},"Talk: Apache Pulsar: Finally an Alternative to Kafka?",[48,33481,33432,33482],{},[55,33483,31294],{"href":33435},[48,33485,33412,33486],{},[55,33487,33488],{"href":33488,"rel":33489},"https:\u002F\u002Fevents.pinetool.ai\u002F3152\u002F#sessions\u002F105084",[264],[40,33491,33493],{"id":33492},"jcon-open-blend","JCON Open Blend",[48,33495,33496],{},"Location: Portorož, Slovenia",[48,33498,33499],{},"Date: May 31, 2024",[48,33501,33479],{},[48,33503,33432,33504],{},[55,33505,31294],{"href":33435},[48,33507,33412,33508],{},[55,33509,33510],{"href":33510,"rel":33511},"https:\u002F\u002Fmakeit.si\u002Fsessions\u002F",[264],[40,33513,33515],{"id":33514},"community-over-code-eu","Community Over Code EU",[48,33517,33518],{},"Location: Bratislava, Slovakia",[48,33520,33521],{},"Date: June 3-5, 2024",[48,33523,33524,33525],{},"Talk: ",[55,33526,33529],{"href":33527,"rel":33528},"https:\u002F\u002Fsessionize.com\u002Fapp\u002Fspeaker\u002Fsession\u002F596860",[264],"Apache Oxia - Finally a Scalable Alternative to Apache Zookeeper",[48,33531,33432,33532],{},[55,33533,28],{"href":33459},[48,33535,33412,33536],{},[55,33537,33538],{"href":33538,"rel":33539},"https:\u002F\u002Feu.communityovercode.org\u002F",[264],[8300,33541,33543],{"id":33542},"past-talks-2024","Past talks - 2024",[48,33545,33546],{},"This year, we had the pleasure of speaking at the following conferences and 
meetups:",[40,33548,33550],{"id":33549},"spring-meetup-paris","Spring Meetup Paris",[48,33552,33553],{},"Location: Paris",[48,33555,33556],{},"Date: January 23, 2024",[48,33558,33559],{},"Talk: Apache Pulsar: Finally an Alternative To Kafka? (in French)",[48,33561,33432,33562],{},[55,33563,31294],{"href":33435},[48,33565,33412,33566],{},[55,33567,33568],{"href":33568,"rel":33569},"https:\u002F\u002Fwww.meetup.com\u002Fspring-meetup-paris\u002Fevents\u002F298027133\u002F",[264],[40,33571,33573],{"id":33572},"conf42-devops","Conf42 DevOps",[48,33575,33576],{},"Location: Virtual",[48,33578,33579],{},"Date: January 25, 2024",[48,33581,33582],{},"Talk: Choosing the Right Messaging Platform for Your Event-Driven Application: RabbitMQ, Kafka, or Pulsar?",[48,33584,33432,33585],{},[55,33586,31294],{"href":33435},[48,33588,33412,33589],{},[55,33590,33591],{"href":33591,"rel":33592},"https:\u002F\u002Fwww.conf42.com\u002FDevOps_2024_Julien_Jakubowski_right_messaging_platform_rabbitmq",[264],[40,33594,33596],{"id":33595},"bejug","BeJUG",[48,33598,33599],{},"Location: Kortrijk, Belgium",[48,33601,33602],{},"Date: February 26, 2024",[48,33604,33605],{},"Talk: Apache Pulsar: Finally an Alternative To Kafka?",[48,33607,33432,33608],{},[55,33609,31294],{"href":33435},[48,33611,33412,33612],{},[55,33613,33614],{"href":33614,"rel":33615},"https:\u002F\u002Fwww.meetup.com\u002Fbelgian-java-user-group\u002F",[264],[40,33617,33619],{"id":33618},"kafka-summit-london","Kafka Summit London",[48,33621,33622,33623,33627,33628,190],{},"From March 19 to March 20, you will find us at our booth at ",[55,33624,33619],{"href":33625,"rel":33626},"https:\u002F\u002Fwww.kafka-summit.org\u002Fevents\u002Fkafka-summit-london-2024\u002Fabout",[264],"! Book time with us ",[55,33629,267],{"href":33630},"\u002Flp\u002Fkafka-summit-meeting",[40,33632,33634],{"id":33633},"data-saturday-phoenix","Data Saturday Phoenix",[48,33636,33637],{},"Location: Phoenix, Arizona, United States",[48,33639,33640],{},"Date: March 24, 2024",[48,33642,33643],{},"Talk: From Zero to Streaming Hero: A Quick Introduction to Stream Processing",[48,33645,33432,33646],{},[55,33647,28],{"href":33459},[48,33649,33412,33650],{},[55,33651,33652],{"href":33652,"rel":33653},"https:\u002F\u002Fdatasaturdays.com\u002F2024-03-23-datasaturday0038\u002F#schedule",[264],[48,33655,3931],{},[48,33657,33658],{},"We're excited to meet you, share ideas, and explore the future of technology together. See you there!",[48,33660,33661,33662,4003,33667,190],{},"For the latest updates, follow us on ",[55,33663,33666],{"href":33664,"rel":33665},"https:\u002F\u002Ftwitter.com\u002Fstreamnativeio",[264],"X",[55,33668,33671],{"href":33669,"rel":33670},"https:\u002F\u002Fwww.linkedin.com\u002Fcompany\u002Fstreamnative\u002F",[264],"LinkedIn",[48,33673,3931],{},{"title":18,"searchDepth":19,"depth":19,"links":33675},[33676,33677,33678,33679,33680,33681,33682,33683,33684,33685,33686],{"id":33396,"depth":19,"text":33397},{"id":33419,"depth":19,"text":33420},{"id":33444,"depth":19,"text":33445},{"id":33469,"depth":19,"text":33470},{"id":33492,"depth":19,"text":33493},{"id":33514,"depth":19,"text":33515},{"id":33549,"depth":19,"text":33550},{"id":33572,"depth":19,"text":33573},{"id":33595,"depth":19,"text":33596},{"id":33618,"depth":19,"text":33619},{"id":33633,"depth":19,"text":33634},"2024-01-22","Join the StreamNative team at global conferences including Kafka Summit London for insights into data streaming technology. 
Meet our experts like Julien Jakubowski, explore Apache Pulsar as an alternative to Kafka, and delve into real-time analytics and DevOps. Stay updated with our worldwide events from Paris to San Jose and engage in transformative tech conversations. Follow us for the latest on these exciting opportunities.",{},"\u002Fblog\u002Fmeet-us-at-upcoming-conferences-jan-2024","3 min",{"title":33373,"description":33688},"blog\u002Fmeet-us-at-upcoming-conferences-jan-2024",[799,5376],"h5bRz00NqziLLWSgWt--ulGgSpkz58neKYCVS8QDl3k",{"id":33697,"title":33698,"authors":33699,"body":33700,"category":3550,"createdAt":290,"date":34087,"description":34088,"extension":8,"featured":294,"image":34089,"isDraft":294,"link":290,"meta":34090,"navigation":7,"order":296,"path":34091,"readingTime":11508,"relatedResources":290,"seo":34092,"stem":34093,"tags":34094,"__hash__":34095},"blogs\u002Fblog\u002Fstreamnatives-2023-year-in-review.md","StreamNative’s 2023 Year in Review",[4496],{"type":15,"value":33701,"toc":34080},[33702,33705,33709,33712,33722,33725,33736,33739,33745,33748,33756,33759,33767,33770,33773,33780,33783,33789,33792,33799,33802,33810,33813,33828,33831,33838,33840,33846,33849,33863,33867,33870,33881,33884,33895,33898,33912,33915,33919,33968,33972,33975,34049,34053,34060,34063,34066,34074],[48,33703,33704],{},"At StreamNative, we had a productive 2023, improving the developer and operator experiences for our customers, and making it easier than ever to leverage Pulsar’s capabilities. We also continued to collaborate closely with the Apache Pulsar community and reiterate StreamNative’s commitment to open source ecosystems. Let’s conclude 2023 by recapping some of these accomplishments, and look at where we’re going this year.",[40,33706,33708],{"id":33707},"features","Features",[48,33710,33711],{},"Kafka on StreamNative (KSN) (Public Preview)",[48,33713,11159,33714,33716,33717,190],{},[55,33715,1582],{"href":29597},", now in Public Preview, it’s easier than ever for enterprises using Kafka to leverage Pulsar's enhanced capabilities, including multi-tenancy, geo-replication, tiered storage, and unmatched scalability and elasticity. KSN builds on KoP, but contains even more features. To learn more about KSN, check out our  \u002F ",[55,33718,33721],{"href":33719,"rel":33720},"https:\u002F\u002Fwww.youtube.com\u002Fplaylist?list=PL7-BmxsE3q4Vcs4_i1zTZ4nWL6Y39N91x",[264],"YouTube Playlist",[48,33723,33724],{},"StreamNative Private Cloud (Public Preview)",[48,33726,33727,33731,33732],{},[55,33728,33730],{"href":33729},"\u002Fblog\u002Fintroducing-streamnative-private-cloud","StreamNative Private Cloud",", the StreamNative Operator, streamlines the deployment, scaling, and management of self-managed Pulsar clusters. This allows businesses to effortlessly orchestrate intricate data streams, shifting their focus to gleaning valuable insights from their data. 
",[55,33733,4108],{"href":33734,"rel":33735},"https:\u002F\u002Fdocs.streamnative.io\u002Fprivate\u002Fprivate-cloud-overview",[264],[48,33737,33738],{},"Streaming Lakehouse: Pulsar’s Lakehouse tiered storage (Private Preview)",[48,33740,33741,33744],{},[55,33742,33743],{"href":29601},"Pulsar's Streaming Lakehouse"," can integrate with well-known lakehouse storage solutions like Delta Lake, Apache Hudi, and Apache Iceberg, enabling cost savings by using the tiered storage solution of your choice.",[48,33746,33747],{},"Functions on StreamNative Cloud",[48,33749,33750,33755],{},[55,33751,33754],{"href":33752,"rel":33753},"https:\u002F\u002Fdocs.streamnative.io\u002Fdocs\u002Ffunctions-overview",[264],"Pulsar Functions are now available on StreamNative Cloud."," Functions enable you to build lightweight, real-time data pipelines for ETL jobs, event-driven applications, and simple data analytics applications.",[48,33757,33758],{},"Functions on StreamNative allow easy debugging via function logs. Sidecar mode enables use of function logs in production with minimum performance impact",[48,33760,33761,33762,190],{},"To learn more about using Pulsar Functions on StreamNative Cloud, check out our ",[55,33763,33766],{"href":33764,"rel":33765},"https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=l7VwA8HucH8&list=PL7-BmxsE3q4V8cMgsTtDA64OJtC25blxn",[264],"playlist of Pulsar functions",[48,33768,33769],{},"StreamNative also provides the SQL-like abstraction - pfsql to enable users to compose filtering\u002Frouting\u002Fprojecting functions. It largely simplifies the development and deployment process for these tasks.",[48,33771,33772],{},"Revocable Cloud API Keys (Public Preview)",[48,33774,33775,33779],{},[55,33776,33778],{"href":33777},"\u002Fblog\u002Fsecure-your-pulsar-cluster-with-revocable-api-keys","StreamNative API Keys"," offer a dual advantage: they provide a flexible authentication solution compatible with a wide range of clients and enable key rotation at regular intervals to bolster security and compliance.",[48,33781,33782],{},"Broker Autoscaling (Public Preview)",[48,33784,33785,33788],{},[55,33786,31214],{"href":25530,"rel":33787},[264]," balances the message processing load across brokers, proactively preventing any single broker from becoming a bottleneck. This guarantees the system maintains high throughput and low latency, even when dealing with hefty workloads.",[48,33790,33791],{},"Enhanced Connector Experience",[48,33793,33794,33798],{},[55,33795,33797],{"href":33796},"\u002Fblog\u002Funveiling-streamnatives-enhanced-connector-experience","Enhanced Connectors experience"," gives you the power to develop Connectors tailored to your unique data needs. With the ability to design connectors from the ground up, we're ensuring you will never be limited to the existing IO Connectors we offer.",[48,33800,33801],{},"Pulsar 3.0 available on StreamNative Cloud",[48,33803,33804,33809],{},[55,33805,33808],{"href":33806,"rel":33807},"https:\u002F\u002Fwww.streamnative.io\u002Fblog\u002Fpulsar-3-0-is-available-for-testing-on-streamnative-cloud",[264],"Pulsar 3.0, the next evolution of Pulsar",", is now available on StreamNative Cloud. Pulsar 3.0 allows teams to run even bigger workloads, and is easier for developers to work with on their local machines. 
Updates include LTS support, an improved load balancer, several performance improvements, and Docker images for M1\u002FM2 Macs.",[48,33811,33812],{},"Cluster Metrics now available on StreamNative console",[48,33814,33815,33816,33821,33822,33827],{},"StreamNative Cloud now has a ",[55,33817,33820],{"href":33818,"rel":33819},"https:\u002F\u002Fdocs.streamnative.io\u002Fdocs\u002Fcloud-metrics-api",[264],"new endpoint and dashboard for Cluster Metrics"," to provide visibility into your Pulsar clusters and track performance over time. Collect and monitor cluster health and performance metrics in real-time, analyze trends, and proactively maintain your applications to ensure optimal performance. Additionally, ",[55,33823,33826],{"href":33824,"rel":33825},"https:\u002F\u002Fdocs.streamnative.io\u002Fdocs\u002Fview-usage-console",[264],"you can also view and export your cluster usage metrics"," from the StreamNative Console.",[48,33829,33830],{},"Rest API",[48,33832,33833,33834,190],{},"In 2023, we also launched the StreamNative Rest API, allowing users to produce and consume messages with simple API calls, rather than having to use the Pulsar TCP protocol or client libraries to do so. To learn more about the Rest API, check out the ",[55,33835,7120],{"href":33836,"rel":33837},"https:\u002F\u002Fdocs.streamnative.io\u002Fdocs\u002Frestapi-reference",[264],[48,33839,5599],{},[48,33841,33842,33843,33845],{},"The StreamNative team continued contributing to the open source community by open sourcing Oxia, a scalable metadata store that enables Pulsar to scale to over one million topics. To learn more about Oxia, check out the ",[55,33844,25580],{"href":21529},". And stay tuned for its arrival on StreamNative Cloud.",[48,33847,33848],{},"Certification Programs and Learning Content",[48,33850,33851,33852,33856,33857,33862],{},"As part of our initiative to grow and empower our community, we launched our Developer Certification program through ",[55,33853,31914],{"href":33854,"rel":33855},"https:\u002F\u002Fwww.academy.streamnative.io\u002Fabout",[264],". We also released many videos on YouTube on learning to use Pulsar and StreamNative - check out the ",[55,33858,33861],{"href":33859,"rel":33860},"https:\u002F\u002Fwww.youtube.com\u002F@streamnativeacademy8484\u002Fplaylists",[264],"playlists"," posted on YouTube.",[40,33864,33866],{"id":33865},"key-events-recap","Key events recap",[48,33868,33869],{},"Pulsar Summit Europe 2023",[48,33871,33872,33875,33876,190],{},[55,33873,33869],{"href":33874},"\u002Fblog\u002Fpulsar-virtual-summit-europe-2023-key-takeaways",": This event witnessed a remarkable milestone as over 400 attendees from 20+ countries joined the virtual stage to explore the cutting-edge advancements in Apache Pulsar and the real-world success stories of Pulsar-powered companies. This record-breaking turnout at the Pulsar Summit not only demonstrates the surging adoption of Pulsar but also highlights the ever-growing enthusiasm and curiosity surrounding this game-changing technology. It featured 5 keynotes on Apache Pulsar and 12 breakout sessions on tech deep dives, use cases, and ecosystem talks. They came from companies like Lego, VMWare, Datastax, RisingWave, Axon, Zafin, and others. 
",[55,33877,33880],{"href":33878,"rel":33879},"https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=XjIu9nXSSiI&list=PLqRma1oIkcWjMn9ytQueYSP9HCc28756R",[264],"Watch the sessions",[48,33882,33883],{},"Pulsar Summit North America 2023",[48,33885,33886,33887,33890,33891,190],{},"We were honored to organize the ",[55,33888,33883],{"href":33889},"\u002Fblog\u002Fq3-23-streamnative-cloud-launch-deliver-a-modern-data-streaming-platform-for-enterprises"," in San Francisco served as a nexus for the Pulsar community and industry experts to converge, sharing invaluable insights into the future of Apache Pulsar and data streaming. Hosted in person, the summit featured nearly 200 attendees and showcased over 20 enlightening sessions, each a testament to the vibrancy and innovation within the Pulsar ecosystem. They came from companies like Cisco, Discord, Iterable, Attentive, VMware, Flipkart, Boomi, and others.  ",[55,33892,33880],{"href":33893,"rel":33894},"https:\u002F\u002Fwww.youtube.com\u002Fplaylist?list=PLqRma1oIkcWhOZ6W-g4D_3JNxJzYnwLNX",[264],[48,33896,33897],{},"Partnering with DataBricks",[48,33899,33900,33901,33905,33906,33911],{},"StreamNative partnered with DataBricks on Pulsar-Spark Connector 3.4.1.0. Detailed changes can be found here: ",[55,33902,33903],{"href":33903,"rel":33904},"https:\u002F\u002Fgithub.com\u002Fstreamnative\u002Fpulsar-spark\u002Fpull\u002F171",[264],". Also, check out the ",[55,33907,33910],{"href":33908,"rel":33909},"https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=K_7NTSFqucI",[264],"Pulsar Spark Connector talk"," at Pulsar Summit North America!",[48,33913,33914],{},"Collaborated with the Flink community on fixing bugs and contributing the Table API support.",[40,33916,33918],{"id":33917},"contributions-to-the-pulsar-ecosystem","Contributions to the Pulsar ecosystem",[321,33920,33921,33929,33937,33945,33957],{},[324,33922,33923,33928],{},[55,33924,33927],{"href":33925,"rel":33926},"https:\u002F\u002Fdocs.streamnative.io\u002Fhub\u002Fconnector-debezium-mssql-source",[264],"Debezium \u002F Microsoft SQL Server Source"," - Pulls messages from the SQL Server and persists the messages to Pulsar topics.",[324,33930,33931,33936],{},[55,33932,33935],{"href":33933,"rel":33934},"https:\u002F\u002Fdocs.streamnative.io\u002Fhub\u002Fconnector-google-bigquery-source",[264],"Google BigQuery Source"," - Feeds data from Google Cloud BigQuery tables and writes data to Pulsar topics.",[324,33938,33939,33944],{},[55,33940,33943],{"href":33941,"rel":33942},"https:\u002F\u002Fdocs.streamnative.io\u002Fhub\u002Fconnector-google-bigquery-sink",[264],"Google BigQuery Sink"," - Pulls data from Pulsar topics and persists data to Google Cloud.",[324,33946,33947,33952,33953],{},[55,33948,33951],{"href":33949,"rel":33950},"https:\u002F\u002Fdocs.streamnative.io\u002Fhub\u002Fconnector-snowflake-sink",[264],"Snowflake Sink"," (available through the CLI) - Loads data from Pulsar topics to Snowflake in real-time.",[55,33954,3931],{"href":33955,"rel":33956},"https:\u002F\u002Fdocs.streamnative.io\u002Fhub\u002Fconnector-aws-eventbridge-sink",[264],[324,33958,33959,33963,33964,190],{},[55,33960,33962],{"href":33955,"rel":33961},[264],"AWS EventBridge Sink Connector"," Pull data from a Pulsar cluster and persist it to AWS EventBridge, making it easier to build scalable event-driven applications. 
To learn more, check out ",[55,33965,33967],{"href":33966},"\u002Fblog\u002Famazon-eventbridge-connector-is-now-integrated-with-streamnative-cloud","our blog post",[40,33969,33971],{"id":33970},"top-content-of-2023","Top content of 2023",[48,33973,33974],{},"We provided 50+ new pieces of educational content to help more people understand and get more value out of Pulsar. Here are the top pieces of content enjoyed by the streaming community:",[1666,33976,33977,33984,33991,33998,34003,34009,34015,34022,34028,34035,34042],{},[324,33978,33979,33983],{},[55,33980,33982],{"href":33981},"\u002Fpulsar-vs-kafka","Comparing Pulsar and Kafka: Which streaming technology is right for you?:"," Compare architecture and features, and learn the best use cases for choosing Apache Pulsar over Apache Kafka.",[324,33985,33986,33990],{},[55,33987,33989],{"href":33988},"\u002Fblog\u002Fcomparison-of-messaging-platforms-apache-pulsar-vs-rabbitmq-vs-nats-jetstream","A Comparison of Messaging Platforms: Apache Pulsar vs. RabbitMQ vs. NATS JetStream"," The tests assessed each messaging platform’s throughput and latency under varying workloads, node failures, and backlogs.",[324,33992,33993,33997],{},[55,33994,33996],{"href":33995},"\u002Fblog\u002Ffutureproof-kafka-applications-and-embrace-pulsar-with-streamnative-cloud","Futureproof Kafka Applications and Embrace Pulsar with StreamNative Cloud"," The most important benefit of the Kafka protocol on StreamNative Cloud is that it allows organizations to harness the strengths of both systems without disrupting their legacy Kafka applications. With a unified event streaming platform, they can take advantage of the features that Pulsar has to offer.",[324,33999,34000,34002],{},[55,34001,32178],{"href":32177}," In a context of higher cost scrutiny, saving costs free up resources for other strategic initiatives, enhancing the overall efficiency and competitive edge of the organization. Apache Pulsar offers several key advantages to save costs.",[324,34004,34005,34008],{},[55,34006,34007],{"href":27695},"How Pulsar’s architecture delivers better performance than Kafka"," This blog explores the factors contributing to Pulsar's impressive performance despite the architectural differences.",[324,34010,34011,34014],{},[55,34012,34013],{"href":33777},"Secure Your Pulsar Cluster with Revocable API Keys"," A feature that’s only available on StreamNative Hosted and BYOC Pulsar clusters. StreamNative API keys offer both a flexible authentication solution that can work with any client, and a revokable key that can be rotated on a regular interval for security and compliance, or immediately revoked in the event of a security incident.",[324,34016,34017,34021],{},[55,34018,34020],{"href":34019},"\u002Fblog\u002Fgeneral-availability-for-pulsar-functions-on-all-new-clusters-in-sn-cloud","General Availability for Pulsar Functions on All New Clusters in SN Cloud"," With Pulsar Functions generally available, it’s now vastly easier to do lightweight stream processing on a StreamNative Pulsar cluster. 
There’s no separate computing cluster you need to set up, you can easily deploy Pulsar Functions using Terraform or pulsarctl, and you can see logs and exceptions directly within the StreamNative Console.",[324,34023,34024,34027],{},[55,34025,34026],{"href":29597},"Kafka on StreamNative: Bringing Enterprise-Grade Kafka Support to StreamNative Pulsar Clusters (KSN) ","Excited to present Kafka on StreamNative (KSN), running on Pulsar 3.1, now in Public Preview, an offering tailored to enterprises using Kafka who want to leverage Pulsar's enhanced capabilities.",[324,34029,34030,34034],{},[55,34031,34033],{"href":34032},"\u002Fblog\u002Fextensible-load-balancer-pulsar-3-0","Introducing Extensible Load Balancer in Pulsar 3.0"," Thrilled to introduce our latest addition to the Apache Pulsar version 3.0, Extensible Load Balancer, which improves the existing Pulsar Broker Load Balancer.",[324,34036,34037,34041],{},[55,34038,34040],{"href":34039},"\u002Fblog\u002Fcompliance-and-data-governance-with-apache-pulsar-and-streamnative","Enhance Your Compliance and Data Governance with Apache Pulsar and StreamNative"," In this article, Apache Pulsar, with its robust and flexible features, provides a comprehensive solution to meet the complex requirements of data compliance, even more in highly regulated industries where data lineage, governance, and compliance are critical.",[324,34043,34044,34048],{},[55,34045,34047],{"href":34046},"\u002Fblog\u002Fa-practical-guide-to-enterprise-grade-security-in-apache-pulsar","A Practical Guide to Enterprise-Grade Security in Apache Pulsar"," This blog introduced available security combinations in Pulsar and then give some best practices for implementing authentication and authorization.",[40,34050,34052],{"id":34051},"looking-ahead-to-2024","Looking ahead to 2024",[48,34054,34055,34056,20076],{},"We’re going to be launching a new podcast by the end of March, 2024! The podcast - Sweet Streams - is inclusive of all stream processing and related technologies! We’ll be interviewing community members as well as founders and experts from various streaming software companies and open-source organizations. 
If you are interested in being interviewed, you can fill out our form ",[55,34057,267],{"href":34058,"rel":34059},"https:\u002F\u002Fforms.gle\u002F2uuyQ2cgwnv6wXwu5",[264],[48,34061,34062],{},"In 2024, we’re excited to continue growing the StreamNative community and empowering our customers to build the next generation of cloud-native streaming and messaging applications.",[48,34064,34065],{},"Some of our upcoming product enhancements for StreamNative Cloud include Pulsar RBAC, KSN GA, Private Cloud 2.0 GA, Function autoscaling GA, Bookie Autoscaling, Oxia on cloud GA, LDAP integration, Azure support and many more.",[48,34067,34068,34073],{},[55,34069,34072],{"href":34070,"rel":34071},"https:\u002F\u002Fshare.hsforms.com\u002F1IS56E-RvSVuMXU-ghlkoFA3x5r4",[264],"Subscribe to the StreamNative newsletter"," to stay updated on the exciting announcements we have planned for 2024.",[48,34075,34076],{},[34077,34078],"binding",{"value":34079},"cta-blog",{"title":18,"searchDepth":19,"depth":19,"links":34081},[34082,34083,34084,34085,34086],{"id":33707,"depth":19,"text":33708},{"id":33865,"depth":19,"text":33866},{"id":33917,"depth":19,"text":33918},{"id":33970,"depth":19,"text":33971},{"id":34051,"depth":19,"text":34052},"2024-01-05","A recap of what features and events we launched last year that enhanced Apache Pulsar on StreamNative Cloud, making it easier for customers to improve their streaming and messaging.","\u002Fimgs\u002Fblogs\u002F65a1ad2c4018b0453abd54ca_Template.png",{},"\u002Fblog\u002Fstreamnatives-2023-year-in-review",{"title":33698,"description":34088},"blog\u002Fstreamnatives-2023-year-in-review",[302,9636,799,28572],"F7RPzT7LzlFF9XvQHi5Qi40bDEE1kKQrmAtkDx_PgzM",{"id":34097,"title":34098,"authors":34099,"body":34100,"category":7338,"createdAt":290,"date":34192,"description":34193,"extension":8,"featured":294,"image":34194,"isDraft":294,"link":290,"meta":34195,"navigation":7,"order":296,"path":34196,"readingTime":11508,"relatedResources":290,"seo":34197,"stem":34198,"tags":34199,"__hash__":34200},"blogs\u002Fblog\u002Fpulsar-summit-north-america-2023-a-deep-dive-into-the-on-demand-summit-videos.md","Pulsar Summit North America 2023 Recap",[31718],{"type":15,"value":34101,"toc":34184},[34102,34105,34109,34118,34122,34125,34129,34132,34136,34145,34149,34158,34162,34170,34177,34180],[48,34103,34104],{},"The recent Pulsar Summit North America 2023 in San Francisco served as a nexus for the Pulsar community and industry experts to converge, sharing invaluable insights into the future of Apache Pulsar and data streaming. Hosted in person, the summit featured nearly 200 attendees and showcased over 20 enlightening sessions, each a testament to the vibrancy and innovation within the Pulsar ecosystem.",[40,34106,34108],{"id":34107},"highlights-from-pulsar-summit-north-america-2023","Highlights from Pulsar Summit North America 2023",[48,34110,34111,34112,34117],{},"The event kicked off with a warm welcome from StreamNative's CTO, Matteo Merli, who set the stage with a deep dive into the evaluation of data streaming platforms for modern enterprises. 
",[55,34113,34116],{"href":34114,"rel":34115},"https:\u002F\u002Fyoutu.be\u002FIyMRL_wvQ7A",[264],"Matteo’s insights"," paved the way for an engaging exploration of the latest advancements in Apache Pulsar, Kafka on StreamNative, and the broader data streaming landscape.",[40,34119,34121],{"id":34120},"streamnative-private-cloud-unveiled","StreamNative Private Cloud Unveiled",[48,34123,34124],{},"A standout announcement from this year's summit is the launch of StreamNative Private Cloud, StreamNative's self-managed offering of Apache Pulsar. This new addition promises enhanced flexibility and control for organizations seeking a self-managed Pulsar solution, providing a powerful tool in the evolving data management toolkit.",[40,34126,34128],{"id":34127},"pulsar-and-kafka-a-powerful-duo","Pulsar and Kafka: A Powerful Duo",[48,34130,34131],{},"The summit also announced improvements to Apache Pulsar, Pulsar Functions, and Connectors, highlighting the continuous commitment to refining the Pulsar experience. As the industry evolves, so does Pulsar, ensuring it remains at the forefront of data streaming innovation.",[40,34133,34135],{"id":34134},"ciscos-journey-to-apache-pulsar","Cisco’s journey to Apache Pulsar",[48,34137,34138,34139,34144],{},"The event was not just about announcements; it was a forum for deep insights and practical use cases. Cisco Senior Director Chandra Ganguly and Principal Engineer Alec Hothan shared ",[55,34140,34143],{"href":34141,"rel":34142},"https:\u002F\u002Fyoutu.be\u002FnNUKnxfJeeo",[264],"Cisco's journey deploying Pulsar on Cisco’s Cloud Native IoT Platform",", offering a firsthand account of leveraging Pulsar to modernize infrastructure and meet the demands of a rapidly evolving technological landscape.",[40,34146,34148],{"id":34147},"discords-transition-a-case-study-in-innovation","Discord's Transition: A Case Study in Innovation",[48,34150,34151,34152,34157],{},"David Christle, Staff Machine Learning Engineer at Discord, presented a captivating case study on their transition from Google Pub\u002FSub to Pulsar. In his session on \"",[55,34153,34156],{"href":34154,"rel":34155},"https:\u002F\u002Fyoutu.be\u002FijzZQqvRUT4",[264],"Streaming Machine Learning with Flink and Iceberg",",\" Christle illuminated how Discord utilizes Pulsar to power real-time machine learning applications, underscoring the platform's adaptability and efficacy in handling large-scale data streaming challenges.",[40,34159,34161],{"id":34160},"embark-on-your-deep-dive-journey","Embark on Your Deep Dive Journey",[48,34163,34164,34165,34169],{},"As we release the ",[55,34166,34168],{"href":33893,"rel":34167},[264],"on-demand videos"," from Pulsar Summit North America 2023, we invite you to embark on a deep dive into these illuminating sessions. 
Explore the insights, innovations, and practical applications that define the future of data streaming.",[48,34171,34172,34173,34176],{},"Access the on-demand videos ",[55,34174,267],{"href":33893,"rel":34175},[264]," and be part of the transformative journey in data streaming.",[48,34178,34179],{},"Your breakthrough awaits—happy watching!",[48,34181,34182],{},[34077,34183],{"value":34079},{"title":18,"searchDepth":19,"depth":19,"links":34185},[34186,34187,34188,34189,34190,34191],{"id":34107,"depth":19,"text":34108},{"id":34120,"depth":19,"text":34121},{"id":34127,"depth":19,"text":34128},{"id":34134,"depth":19,"text":34135},{"id":34147,"depth":19,"text":34148},{"id":34160,"depth":19,"text":34161},"2023-11-30","Discover key takeaways from Pulsar Summit North America 2023 in San Francisco. Explore insights into Apache Pulsar's future, the launch of StreamNative Private Cloud, and real-world applications like Cisco’s journey and Discord's innovative case study. Access on-demand sessions for a comprehensive view of the evolving data streaming landscape.","\u002Fimgs\u002Fblogs\u002F656a0cc2a4a33b48d5bd1f23_PulsarSummit-blog-banner.png",{},"\u002Fblog\u002Fpulsar-summit-north-america-2023-a-deep-dive-into-the-on-demand-summit-videos",{"title":34098,"description":34193},"blog\u002Fpulsar-summit-north-america-2023-a-deep-dive-into-the-on-demand-summit-videos",[5376,302,799],"LYlPzqPeuthaxdTrB_zEEGEpbC1BwE0duO8DaHtDW8c",{"id":34202,"title":34203,"authors":34204,"body":34205,"category":821,"createdAt":290,"date":34419,"description":34203,"extension":8,"featured":294,"image":34420,"isDraft":294,"link":290,"meta":34421,"navigation":7,"order":296,"path":33729,"readingTime":3556,"relatedResources":290,"seo":34422,"stem":34423,"tags":34424,"__hash__":34425},"blogs\u002Fblog\u002Fintroducing-streamnative-private-cloud.md","Introducing StreamNative Private Cloud",[24776],{"type":15,"value":34206,"toc":34410},[34207,34210,34215,34218,34235,34239,34242,34253,34256,34260,34263,34266,34272,34278,34291,34295,34298,34301,34307,34310,34321,34325,34328,34331,34348,34352,34361,34372,34377,34380,34383,34394,34396,34399],[48,34208,34209],{},"We are excited to announce StreamNative Private Cloud, a powerful tool that simplifies the deployment, scaling, and management of Pulsar clusters.",[48,34211,34212],{},[384,34213],{"alt":18,"src":34214},"\u002Fimgs\u002Fblogs\u002F654a9108bcd27b611b5f3214_AvKj0Cjr0zeoyKdKgnZP-sc8okqU6AI7QrDa298pMx6NW6S3UWsRh4ubbrVTsTNZcUmCLe4dJQ4rLOc3NvpDAbZ_3Hn-tyEHIUDJswH1DGCd8cFBrY_nsH5bA8r7vTzcHWsHG_i0gv7TlFnf_4NMbBc.png",[48,34216,34217],{},"With StreamNative Private Cloud, you can simplify operations and maintenance, including:",[321,34219,34220,34223,34226,34229,34232],{},[324,34221,34222],{},"Simplified deployment: StreamNative Private Cloud automates the deployment of Pulsar clusters, so teams don't have to worry about manually configuring and managing the individual components.",[324,34224,34225],{},"High Availability: StreamNative Private Cloud sets up clusters in a highly available manner by default. It manages replica placement, broker distribution, and failover mechanisms, ensuring that event streams stay reliable even in the face of failures.",[324,34227,34228],{},"Declarative configuration:  StreamNative Private Cloud uses declarative APIs, so teams can define the Pulsar cluster configuration in Kubernetes manifest. 
This makes it easy to manage the Pulsar cluster and to roll back changes if necessary.",[324,34230,34231],{},"Automated operation: StreamNative Private Cloud supports Auto-Scaling so you can adjust resource allocation in response to incoming workloads.  ‍",[324,34233,34234],{},"Cost efficiency: StreamNative Private Cloud supports the Lakehouse tiered storage to offload your cold data to a lakehouse system in Parquet format which brings you cost savings on storage and also supports efficient historical data analysis.",[40,34236,34238],{"id":34237},"one-single-operator-for-a-seamless-deployment","One single operator for a seamless deployment",[48,34240,34241],{},"In 2021, we introduced StreamNative Platform, leveraging StreamNative Pulsar Operators. As we rolled out the initial design of StreamNative Platform, we discovered opportunities to enhance the user experience:",[321,34243,34244,34247,34250],{},[324,34245,34246],{},"We expanded the support of Pulsar Operators beyond core components to provide a more uniform experience, moving away from the dependence on Helm.",[324,34248,34249],{},"We unified Pulsar Operators to streamline the integration of new components like Oxia, Cloud API Keys, and pfSQL.",[324,34251,34252],{},"We introduced a centralized resource to manage vital configurations, such as authentication.",[48,34254,34255],{},"To address these enhancements, we revamped StreamNative Private Cloud to introduce an all-in-one operator. This new design encompasses the existing ZooKeeperCluster, BookKeeperCluster, PulsarBroker, and PulsarProxy resources along with StreamNative's new components. Furthermore, we introduced the PulsarCoordinator global resource, facilitating cluster-wide high-level configurations. Other components now conveniently refer to the PulsarCoordinator configurations through labels.",[40,34257,34259],{"id":34258},"simplified-pulsar-services-management-with-a-fully-supported-declarative-apis","Simplified Pulsar services management with a fully supported declarative APIs",[48,34261,34262],{},"On StreamNative Private Cloud, every component is managed through declarative APIs which simplifies your management for Pulsar services - you just have to define the desired state for them. In the earlier version of the StreamNative Platform, Helm was necessary to provision and manage services such as the Console, detector, and toolset. 
Now, these services can be effortlessly managed through CustomResourceDefinitions.",[48,34264,34265],{},"Enabling the Console, detector, and toolset is as easy as this:",[8325,34267,34270],{"className":34268,"code":34269,"language":8330},[8328],"apiVersion: k8s.streamnative.io\u002Fv1alpha1\nkind: Console\nmetadata:\n  name: sn-private-console\n  namespace: pulsar\n  labels:\n    k8s.streamnative.io\u002Fcoordinator-name: private-cloud\nspec:\n  image: streamnative\u002Fprivate-cloud-console:v2.3.3\n  webServiceUrl: http:\u002F\u002Fbrokers-broker:8080\n",[4926,34271,34269],{"__ignoreMap":18},[8325,34273,34276],{"className":34274,"code":34275,"language":8330},[8328],"apiVersion: k8s.streamnative.io\u002Fv1alpha1\nkind: PulsarCoordinator\nmetadata:\n  name: private-cloud\n  namespace: pulsar\nspec:\n  image: streamnative\u002Fprivate-cloud:3.0.1.4\n  detector:\n    serviceEndpoint:\n      pulsarServiceURL: pulsar:\u002F\u002Fbrokers-broker:6650\n      webServiceURL: http:\u002F\u002Fbrokers-broker:8080\n  toolSet:\n    enabled: true\n    replicas: 2\n",[4926,34277,34275],{"__ignoreMap":18},[48,34279,34280,34281,4003,34286,34290],{},"You can also install the ",[55,34282,34285],{"href":34283,"rel":34284},"https:\u002F\u002Fgithub.com\u002Fstreamnative\u002Ffunction-mesh",[264],"function mesh operator",[55,34287,34289],{"href":20667,"rel":34288},[264],"pulsar-resources-operator"," to manage functions, connectors and Pulsar resources through declarative APIs which bring you the fully cloud native usage experience with Pulsar.",[40,34292,34294],{"id":34293},"autoscaling","AutoScaling",[48,34296,34297],{},"StreamNative Private Cloud supports AutoScaling to automatically adjust the available resources of your deployments, and eliminates the need for scripts or manual updates to make scaling decisions.",[48,34299,34300],{},"You can start the AutoScaling by specifying a range of minimum and maximum nodes that your Pulsar cluster can automatically scale to:",[8325,34302,34305],{"className":34303,"code":34304,"language":8330},[8328],"apiVersion: pulsar.streamnative.io\u002Fv1alpha1\nkind: PulsarBroker\nmetadata:\n  name: brokers\n  namespace: pulsar\n  labels:\n    k8s.streamnative.io\u002Fcoordinator-name: private-cloud\nspec:\n  image: streamnative\u002Fprivate-cloud:3.0.1.4\n  replicas: 4\n  zkServers: zookeepers-zk:2181\n  pod:\n    resources:\n      requests:\n        cpu: 200m\n        memory: 512Mi\n    securityContext:\n      runAsNonRoot: true\n  autoScalingPolicy:\n    minReplicas: 1\n    maxReplicas: 4\n",[4926,34306,34304],{"__ignoreMap":18},[48,34308,34309],{},"This feature brings the following benefits:",[321,34311,34312,34315,34318],{},[324,34313,34314],{},"Dynamic Resource Allocation: Pulsar Broker AutoScaling dynamically adjusts resources based on the incoming workload. Whether it's handling a sudden spike in traffic or scaling down during periods of low activity, StreamNative Private Cloud ensures optimal resource utilization, leading to cost savings and improved performance.",[324,34316,34317],{},"Efficient Load Balancing: AutoScaling in Pulsar ensures that the message processing load is evenly distributed across brokers. This prevents any single broker from becoming a bottleneck, allowing the system to maintain high throughput and low latency even under heavy loads.",[324,34319,34320],{},"Cost-Effective Scaling: Traditional scaling methods often result in over-provisioning to handle peak loads, leading to unnecessary costs. 
Pulsar Broker Autoscaling optimizes resource allocation, ensuring that organizations pay only for the resources they need, making it a cost-effective solution for real-time data processing.",[40,34322,34324],{"id":34323},"message-rest-api","Message Rest API",[48,34326,34327],{},"StreamNative Private Cloud supports a RESTful messaging interface to Pulsar clusters, meaning that you can produce and consume messages without using the native Pulsar protocol or clients. The Rest API supports both non-partitioned and partitioned topics as well as basic and Avro base struct schema.",[48,34329,34330],{},"Example use cases include:",[321,34332,34333,34336,34339,34342,34345],{},[324,34334,34335],{},"Send data to Pulsar from any frontend application built in any language",[324,34337,34338],{},"Integrate Pulsar with existing automation tools",[324,34340,34341],{},"Ingest Pulsar data into corporate dashboards and monitoring systems",[324,34343,34344],{},"Provide instant access to data in motion for data scientist notebooks",[324,34346,34347],{},"Ingest messages into a stream processing framework that may not support Pulsar",[40,34349,34351],{"id":34350},"certified-as-red-hat-openshift-operators","Certified as Red Hat OpenShift Operators",[48,34353,34354,34355,34360],{},"StreamNative Private Cloud is already ",[55,34356,34359],{"href":34357,"rel":34358},"https:\u002F\u002Fcatalog.redhat.com\u002Fsoftware\u002Fcontainer-stacks\u002Fdetail\u002F6500b1d85cf8282ed8058fea",[264],"certified as a Red Hat OpenShift Operator",". The certifications of StreamNative Private Cloud on OpenShift brings three key benefits to StreamNative customers:",[321,34362,34363,34366,34369],{},[324,34364,34365],{},"Enterprise-grade security and reliability: Organizations with strict security protocols can confidently use the operators to run Pulsar on OpenShift knowing the operators meet Red Hat’s standards of security and reliability.",[324,34367,34368],{},"Easy installation: Available in the Red Hat Ecosystem Catalog, you can easily install Pulsar Operators in the OpenShift GUI at the click of a button.",[324,34370,34371],{},"Automated operator upgrades: You can automate upgrades for the operators through OpenShift without requiring extra effort to execute the upgrade.",[48,34373,34374],{},[384,34375],{"alt":18,"src":34376},"\u002Fimgs\u002Fblogs\u002F654a94ad0f32e4a77d7c2f6f_joGtqXm6yszbge0I4lnRGL8uhIf6TpiAPRu4VoRtaJ_xTiZOduD36jHDF20OhJVSKFWRohjmXiFAnXk2W4PaxX37--RnwsX-VyCvJAlnr6PSsVrfLtd0oN6YULBEr4R8wZBnhL7V9kj53VYmPc4txUc.png",[40,34378,34379],{"id":1727},"What’s next",[48,34381,34382],{},"StreamNative Private Cloud is in rapid iteration. 
Our next key roadmap items include:",[321,34384,34385,34388,34391],{},[324,34386,34387],{},"Integrate with API Keys: API keys offer both a flexible authentication solution that can work with any client, and a revokable key that can be rotated on a regular interval for security and compliance, or immediately revoked in the event of a security incident.",[324,34389,34390],{},"Integrate with pfSQL: pfSQL is a lightweight SQL-like tool that simplifies real-time data processing built on top of Pulsar functions.‍",[324,34392,34393],{},"Full support for Kafka on StreamNative (KSN): KSN is an enterprise solution supports KStreams, KSQL, KTables with Topic Compaction, Schema Registry for the Java Client, and Kerberos Authentication for Kafka Clients.",[40,34395,2125],{"id":2122},[48,34397,34398],{},"With StreamNative Private Cloud, it's easier than ever to manage your Pulsar Cluster, and you'll get more functionality than ever before.",[48,34400,34401,34405,34406,190],{},[55,34402,34404],{"href":34403},"\u002Fdeployment\u002Fstart-free-trial","Sign up now for a trial license"," to try StreamNative Private Cloud for yourself. For more detailed guidance and insights, check out ",[55,34407,34409],{"href":33734,"rel":34408},[264],"the Private Cloud docs",{"title":18,"searchDepth":19,"depth":19,"links":34411},[34412,34413,34414,34415,34416,34417,34418],{"id":34237,"depth":19,"text":34238},{"id":34258,"depth":19,"text":34259},{"id":34293,"depth":19,"text":34294},{"id":34323,"depth":19,"text":34324},{"id":34350,"depth":19,"text":34351},{"id":1727,"depth":19,"text":34379},{"id":2122,"depth":19,"text":2125},"2023-11-07","\u002Fimgs\u002Fblogs\u002F65651a8ab3803cd436578980_Introducing-StreamNative-Private-Cloud.png",{},{"title":34203,"description":34203},"blog\u002Fintroducing-streamnative-private-cloud",[302,821],"ODGV8I2c9V9Qn18pb_wl34Hv5-gbmSXt9Q9C1Zip8FA",{"id":34427,"title":34428,"authors":34429,"body":34430,"category":3550,"createdAt":290,"date":34707,"description":34434,"extension":8,"featured":294,"image":34708,"isDraft":294,"link":290,"meta":34709,"navigation":7,"order":296,"path":33889,"readingTime":5505,"relatedResources":290,"seo":34710,"stem":34711,"tags":34712,"__hash__":34713},"blogs\u002Fblog\u002Fq3-23-streamnative-cloud-launch-deliver-a-modern-data-streaming-platform-for-enterprises.md","Q3 '23 StreamNative Cloud Launch: Deliver a modern data streaming platform for enterprises",[32707,24776,806],{"type":15,"value":34431,"toc":34700},[34432,34435,34438,34445,34448,34452,34455,34458,34472,34474,34479,34483,34486,34502,34506,34509,34512,34527,34539,34542,34545,34548,34559,34562,34567,34570,34587,34598,34601,34604,34613,34622,34625,34634,34637,34648,34660,34663,34666,34669,34676,34680,34689],[48,34433,34434],{},"This Q3 StreamNative Cloud Launch comes to you from Pulsar Summit North America 2023, where the Pulsar community and messaging & data streaming industry experts have come together to share insights into the future of Apache Pulsar and data streaming, as well as explore new areas of innovation. This year, we’re introducing StreamNative Private Cloud- StreamNative’s self-managed offering of Apache Pulsar, Kafka on StreamNative, improvements to Apache Pulsar, Pulsar Functions, Connectors, insights into how Pulsar and the data streaming ecosystem work together, and much more.",[48,34436,34437],{},"This year's Pulsar Summit witnessed nearly 200 in-person attendees and featured over 20 enlightening sessions delivered by industry leaders. 
The event commenced with a warm welcome from StreamNative's CTO, Matteo Merli, who delved into the evaluation of data streaming platforms for modern enterprises. We then learned about Cisco’s journey deploying Pulsar on Cisco’s Cloud Native IoT Platform from Cisco Senior Director Chandra Ganguly and Principal Engineer Alec Hothan. Additionally, David Christle, Staff Machine Learning Engineer at Discord, elucidated their transition from Google Pub\u002FSub to Pulsar for Streaming Machine Learning with Flink and Iceberg.",[48,34439,34440,34441,34444],{},"Stay tuned to ",[55,34442,34443],{"href":10293},"our blog"," for a concise overview of the highlights from Pulsar Summit North America 2023; session recordings will be available on our website shortly.",[48,34446,34447],{},"We are thrilled to align our roadmap with the visionary insights shared by our esteemed speakers for the Q3 launch in the latest release of StreamNative Cloud. These features empower customers to deliver a state-of-the-art data streaming platform for enterprises, enabling them to build mission-critical business applications from end to end.",[8300,34449,34451],{"id":34450},"what-defines-a-modern-data-streaming-platform-for-enterprises","What defines a modern data streaming platform for enterprises?",[48,34453,34454],{},"So, what defines a modern data streaming platform for enterprises?",[48,34456,34457],{},"In his keynote address, Matteo Merli outlined the criteria for evaluating a data streaming platform or an organization. I summarized them as the following four pillars.",[1666,34459,34460,34463,34466,34469],{},[324,34461,34462],{},"Unified: Modern enterprises need a unified platform that allows them to store a single copy of data and enables different applications and teams to consume the data in various semantics. This includes both queuing and streaming, supporting their preferred APIs and protocols such as Pulsar, Kafka, AMQP, MQTT, and JMS. This integration bridges the operational and analytical domains, providing developers with a suite of tools to create end-to-end real-time data streaming applications.",[324,34464,34465],{},"Multi-tenant: Traditional messaging and data streaming systems were designed for single teams, making them unsuitable for organizations where multiple teams must share data and collaborate on innovation. A modern data streaming platform must be multi-tenant, reducing the need to manage and operate numerous clusters.",[324,34467,34468],{},"Cost-efficient: A modern data streaming platform must be cost-efficient across multi-cloud and hybrid cloud environments, facilitating effective operation even during economic downturns.",[324,34470,34471],{},"Global: Modern enterprises operate globally, spanning multiple regions, jurisdictions, and cloud environments. 
A modern data streaming solution must be designed to meet data privacy and sovereignty requirements across diverse regions while operating as a unified solution.",[48,34473,3931],{},[48,34475,34476],{},[384,34477],{"alt":18,"src":34478},"\u002Fimgs\u002Fblogs\u002F6538980baa37c7cd3d2f20fe_gYHOL4BUHpzjo33l9a_HqpiCMwVXEuGAlO5QjwwtLv5JRxo8qwBoaR13dTedYJozGnxF4RIwxusVWqsxoJgOPpB4qzgi5Ay5fjfrHOBPRVJJ5EaZRm-b7hBLbFr_5oOxOOiMwkFHcUS9at4ZNs0axXk.png",[8300,34480,34482],{"id":34481},"overview-of-the-latest-features","Overview of the latest features",[48,34484,34485],{},"Now, let's explore the latest features that advance our vision of a modern data streaming platform for enterprises:",[321,34487,34488,34491,34493,34495,34497,34499],{},[324,34489,34490],{},"Functions GA on StreamNative Cloud",[324,34492,1582],{},[324,34494,33724],{},[324,34496,33772],{},[324,34498,33782],{},[324,34500,34501],{},"Lakehouse tiered storage (Private Preview)",[40,34503,34505],{"id":34504},"functions-generally-available-on-streamnative-cloud","Functions Generally Available on StreamNative Cloud",[48,34507,34508],{},"Pulsar Functions stands as one of the distinctive features offered by Apache Pulsar. It presents an efficient means to consume messages from one or multiple topics, apply user-defined logic, and publish the processed results to other topics. This Pulsar-native lightweight computing framework solution empowers developers to focus on their core business logic and code creation, eliminating the necessity for complicated stream processing frameworks for the majority of use cases. However, the activation of Pulsar Functions was previously reliant on specific requests within Hosted and BYOC clusters. It has consistently ranked among the most sought-after features for broader availability within StreamNative Cloud.",[48,34510,34511],{},"We are thrilled to announce the General Availability of Pulsar Functions on StreamNative Cloud for all new Hosted and BYOC clusters. So, what does this mean for you?",[321,34513,34514,34517,34524],{},[324,34515,34516],{},"Serverless Computing: With Pulsar Functions seamlessly integrated into StreamNative Cloud, there is no need to concern yourself with setting up and managing a separate stream processing cluster. Everything your Pulsar Functions need is already set up by StreamNative Cloud.",[324,34518,34519,34520,34523],{},"Simplified Management: The process of managing and monitoring your functions has become significantly simplified. You can effortlessly submit or manage functions using Terraform or ",[4926,34521,34522],{},"pulsarctl",", and conveniently access function details and logs directly from the StreamNative Console.",[324,34525,34526],{},"Continued Support: At StreamNative, our unwavering commitment to providing top-notch support remains steadfast. Whether you represent an enterprise or a digital-native startup, our dedicated team is readily available to offer guidance and assistance.",[48,34528,34529,34530,34533,34534,34538],{},"For more comprehensive information regarding this announcement and our future plans for Functions on StreamNative Cloud, we invite you to explore ",[55,34531,34532],{"href":34019},"our informative blog post",". With this announcement, all new Hosted and BYOC clusters will offer access to Pulsar Functions. 
As for existing clusters, please don't hesitate to ",[55,34535,34537],{"href":16162,"rel":34536},[264],"reach out to our support team"," to request activation within your current setup.",[40,34540,1582],{"id":34541},"kafka-on-streamnative-ksn",[48,34543,34544],{},"Pulsar represents the future of data streaming technology, offering a plethora of advantages over Apache Kafka, particularly for enterprises. It offers native multi-tenancy, geo-replication, and unparalleled scalability and elasticity. However, transitioning to a new technology is not a simple decision, especially for enterprises heavily invested in Kafka.",[48,34546,34547],{},"StreamNative has been at the forefront of addressing this challenge. Over the past few years, we introduced  Kafka-on-Pulsar (KoP), an open-source project that ensures Kafka protocol compatibility within Pulsar. With KoP, Kafka developers can seamlessly harness Pulsar's innovations while retaining the familiarity of Kafka. Yet, we recognized the need to bridge certain gaps, especially to cater to the advanced requirements of enterprise Kafka users, including support for additional Kafka features like KStreams or KSQL.",[48,34549,34550,34551,34555,34556,190],{},"We are thrilled to introduce ",[55,34552,1582],{"href":34553,"rel":34554},"https:\u002F\u002Fdocs.streamnative.io\u002Fdocs\u002Fkafka-on-cloud#ksn-vs-ko-p-why-you-need-ksn",[264],", a tailor-made enterprise solution designed for Kafka users looking to leverage Pulsar's enhanced capabilities. KSN encompasses all the familiar Kafka features you rely on, including but not limited to KStream, KSQL, KTables with Topic Compaction, Kafka Schema Registry, Kerberos Authentication, and more. Kafka Transaction is also available as part of KSN, currently in a private preview phase. For a more detailed overview, please explore ",[55,34557,34558],{"href":29597},"our dedicated blog post on KSN",[48,34560,34561],{},"Furthermore, we have extensively fortified KSN through rigorous testing, ensuring it not only meets but exceeds the demands of large-scale Kafka deployments. With KSN, it delivers throughput equivalent to what you can achieve with native Pulsar. Be sure to consult our performance comparison, which delves into the differences between native Pulsar and KSN, covering two distinct entry formats (entry format refers to how KSN stores data published from Kafka clients, with KSN supporting both Pulsar and Kafka formats).",[48,34563,34564],{},[384,34565],{"alt":18,"src":34566},"\u002Fimgs\u002Fblogs\u002F6538980b48a060b46da037b4_xP2RWvyJVlL-Z9SbW2SMvY8Y-CmJfMf5OFcdKOm-fcqxJzVzCIKas3F8tln_-yOTleQXwhtC48Th0La19asQSCp70aCqqqdm9IBDzyl-E0rjJ2w7WtLA0GUBjDmCeU43nnv3YSpNUwxysikVpoTxxGo.png",[48,34568,34569],{},"Benefits of the StreamNative Private Cloud include:",[321,34571,34572,34575,34578,34581,34584],{},[324,34573,34574],{},"Streamlined Deployment: StreamNative Private Cloud automates the deployment of Pulsar clusters, so teams don't have to worry about manually configuring and managing the individual components.",[324,34576,34577],{},"High Availability: StreamNative Private Cloud sets up clusters in a highly available manner by default. They manage replica placement, broker distribution, and failover mechanisms, ensuring that event streams stay reliable even in the face of failures.",[324,34579,34580],{},"Declarative Configuration: StreamNative Private Cloud uses a declarative API, so teams can define the Pulsar cluster configuration in Kubernetes manifests. 
This makes it easy to manage the Pulsar cluster and to roll back changes if necessary.",[324,34582,34583],{},"Automated operation: StreamNative Private Cloud supports Auto-Scaling, which supports adjusting resource allocation in response to incoming workloads.",[324,34585,34586],{},"Cost efficiency: StreamNative Private Cloud supports the Lakehouse tiered storage to offload your cold data to a lakehouse system in parquet format, which brings you the cost saving on storage and also supports efficient historical data analysis.",[48,34588,34589,34590,34593,34594,34597],{},"For a comprehensive overview of StreamNative Private Cloud, explore our ",[55,34591,7120],{"href":33734,"rel":34592},[264],". Interested in experiencing it firsthand? ",[55,34595,34596],{"href":34403},"Request a trial license"," and establish a Pulsar cluster in your private cloud setup.",[40,34599,33772],{"id":34600},"revocable-cloud-api-keys-public-preview",[48,34602,34603],{},"Before you can send or receive a single message from a StreamNative Cloud Pulsar cluster, configuring the authentication mechanism is a prerequisite. By default, StreamNative Cloud employs OAuth2 authentication, considered one of the most advanced methods for authenticating clients accessing your Pulsar clusters. However, despite its sophistication, configuring OAuth2 can be complex, and many clients and integrations may not fully support it. Therefore, it is crucial to strike a balance between flexibility and security, ensuring protection against unauthorized access without confining your development team to a limited set of tools and clients.",[48,34605,34606,34607,34612],{},"In pursuit of this goal, we have introduced ",[55,34608,34611],{"href":34609,"rel":34610},"https:\u002F\u002Fdocs.streamnative.io\u002Fdocs\u002Fcloud-api-keys",[264],"StreamNative Cloud API Keys"," as a novel authentication mechanism for your StreamNative Cloud clusters. StreamNative Cloud API Keys are JWT-based authentication tokens that empower Pulsar clients to establish connections with your Pulsar clusters on StreamNative. These keys have a substantial lifespan, featuring configurable expiration dates and the ability to be revoked at any time through the API or StreamNative Console. This feature is now accessible for both Hosted and BYOC clusters.",[48,34614,34615,34616,34618,34619,20076],{},"The Cloud API Keys feature offers a dual advantage: it provides a flexible authentication solution compatible with a wide range of clients and enables key rotation at regular intervals to bolster security and compliance. Additionally, it allows for immediate revocation in the event of a security breach. For more in-depth information about this feature, please refer to our dedicated ",[55,34617,25580],{"href":33777},". If you wish to explore Cloud API Keys before its general availability, don't hesitate to",[55,34620,34621],{"href":6392}," reach out to us",[40,34623,33782],{"id":34624},"broker-autoscaling-public-preview",[48,34626,34627,34628,34633],{},"One of the standout features of Apache Pulsar has always been its ability to decouple storage from computing, enabling independent scalability between stateless serving and stateful storage. 
While this architectural innovation successfully addresses scalability challenges at their core, the process of scaling brokers based on CPU and Memory requirements has, until now, often required manual intervention or configuring a ",[55,34629,34632],{"href":34630,"rel":34631},"https:\u002F\u002Fkubernetes.io\u002Fdocs\u002Ftasks\u002Frun-application\u002Fhorizontal-pod-autoscale\u002F",[264],"Horizontal Pod Autoscaler",". This manual approach has proven to be inefficient and cost-ineffective, increasing operational expenses. But fear not, because we're thrilled to announce the solution: Broker Autoscaling.",[48,34635,34636],{},"This feature brings the following benefits to our valued customers:",[321,34638,34639,34642,34645],{},[324,34640,34641],{},"Broker Autoscaling continually adjusts resource allocation in response to incoming workloads. Whether your system is suddenly inundated with traffic or experiences quieter periods, Pulsar ensures optimal resource utilization, resulting in substantial cost savings and notably improved system performance.",[324,34643,34644],{},"Broker Autoscaling balances the message processing load across brokers, proactively preventing any single broker from becoming a bottleneck. This guarantees the system maintains high throughput and low latency, even when dealing with hefty workloads.",[324,34646,34647],{},"Broker Autoscaling streamlines resource allocation, ensuring that organizations only pay for the precise resources they need. This strategic approach makes it a cost-effective solution, particularly in the realm of real-time data streaming.",[48,34649,34650,34651,34654,34655,34659],{},"You can read the ",[55,34652,7120],{"href":25530,"rel":34653},[264]," for more information. This feature is now available for open preview across all the Hosted clusters. For our BYOC (Bring Your Own Cluster) users, we encourage you to get in touch with ",[55,34656,34658],{"href":16162,"rel":34657},[264],"our dedicated support team"," to experience the benefits of Broker Autoscaling firsthand.",[40,34661,34501],{"id":34662},"lakehouse-tiered-storage-private-preview",[48,34664,34665],{},"Apache Pulsar has been a pioneer in introducing the concept of tiered storage. This feature, which has also been adopted by competitors like Kafka, Confluent, and Redpanda, has become a cornerstone for many companies, including tech giants like Tencent, in their pursuit of cost-effective long-term streaming data storage. However, while tiered storage has been a game-changer, it was initially implemented using Pulsar’s proprietary storage format. This approach comes with inherent limitations that restrict the full potential of Apache Pulsar. In response, we’ve taken a bold step by adopting open industry-standard storage formats, a move we believe will greatly benefit Apache Pulsar users and the broader data streaming community.",[48,34667,34668],{},"We are excited to introduce Lakehouse tiered storage to Apache Pulsar as a Private Preview feature on StreamNative Cloud. With this feature, well-known lakehouse storage options like Delta Lake, Apache Hudi, and Apache Iceberg become the tiered storage layer for Apache Pulsar. This development effectively transforms Apache Pulsar into a Streaming Lakehouse, allowing you to ingest directly into any lakehouse storage using popular messaging and streaming APIs and protocols such as Pulsar, Kafka (via KSN), AMQP (via AoP), and more. 
Our tests have demonstrated a 5x reduction in storage size compared to retaining data in BookKeeper and tiered storage using the existing Pulsar format.",[48,34670,34671,34672,34675],{},"For an in-depth understanding of the Streaming Lakehouse, we invite you to explore ",[55,34673,34674],{"href":29601},"our blog post series",". This feature is now available for BYOC customers. If you are interested in trying it out, please contact us. Your feedback will be invaluable as we continue to refine and enhance the tiered storage solution. Whether you're a Lakehouse vendor, a data processing or streaming SQL vendor, or an Apache Pulsar user, we welcome collaboration to define and iterate APIs for processing and querying data in this exciting realm of the \"Streaming Lakehouse.\"",[8300,34677,34679],{"id":34678},"start-building-with-new-streamnative-cloud-features","Start building with new StreamNative Cloud features",[48,34681,34682,34683,34688],{},"Are you eager to begin? We're excited to introduce our Quarterly Launch demo webinars. Don't forget to ",[55,34684,34687],{"href":34685,"rel":34686},"https:\u002F\u002Fstreamnative.zoom.us\u002Fwebinar\u002Fregister\u002FWN_TEmqVJgaRI2fZ_yw77Eb_w#\u002Fregistration",[264],"secure your spot"," for the Q3 '23 Launch demo webinar on November 14. It's an excellent opportunity to gain firsthand insights from our product and developer relations teams on how to effectively leverage these fresh features.",[48,34690,34691,34692,34695,34696,34699],{},"If you haven't already, we encourage you to ",[55,34693,34694],{"href":29078},"request a trial license"," for our new Private Cloud to experience self-management capabilities or ",[55,34697,34698],{"href":30989},"sign up for StreamNative Cloud"," to explore the latest features. Feel free to reach out to us for Proof of Concept (POC) opportunities with cloud credits as well.",{"title":18,"searchDepth":19,"depth":19,"links":34701},[34702,34703,34704,34705,34706],{"id":34504,"depth":19,"text":34505},{"id":34541,"depth":19,"text":1582},{"id":34600,"depth":19,"text":33772},{"id":34624,"depth":19,"text":33782},{"id":34662,"depth":19,"text":34501},"2023-10-25","\u002Fimgs\u002Fblogs\u002F65389bbb69bf2bf3eab5d70d_q3-cloud-launch-v2.png",{},{"title":34428,"description":34434},"blog\u002Fq3-23-streamnative-cloud-launch-deliver-a-modern-data-streaming-platform-for-enterprises",[302,3550,799,821,28572,9636,5376],"iv7YUW2SCbqVY7jQW-1ZajbIFATWNzJhh1iUsc0iJ1A",{"id":34715,"title":29826,"authors":34716,"body":34717,"category":3550,"createdAt":290,"date":34707,"description":34724,"extension":8,"featured":294,"image":34949,"isDraft":294,"link":290,"meta":34950,"navigation":7,"order":296,"path":29601,"readingTime":3556,"relatedResources":290,"seo":34951,"stem":34952,"tags":34953,"__hash__":34954},"blogs\u002Fblog\u002Fstreaming-lakehouse-introducing-pulsars-lakehouse-tiered-storage.md",[809,806],{"type":15,"value":34718,"toc":34945},[34719,34722,34725,34728,34731,34735,34738,34743,34746,34754,34757,34760,34771,34776,34779,34782,34801,34805,34808,34819,34822,34826,34829,34832,34835,34849,34853,34856,34859,34862,34870,34873,34878,34882,34885,34890,34894,34897,34900,34902,34907,34910,34913,34917,34923,34925,34928,34941,34943],[48,34720,34721],{},"Apache Pulsar has been a pioneer in introducing the concept of tiered storage. 
This feature, which has also been adopted by competitors like Kafka, Confluent, and Redpanda, has become a cornerstone for many companies, including tech giants like Tencent, in their pursuit of cost-effective long-term data storage. However, while tiered storage has been a game-changer, it was initially implemented using Pulsar's proprietary storage format. This approach comes with inherent limitations that restrict the full potential of Apache Pulsar. In response, we've taken a bold step by adopting open industry-standard storage formats, a move we believe will greatly benefit Apache Pulsar users.",[48,34723,34724],{},"We are thrilled to introduce Pulsar's Lakehouse Tiered Storage as a Private Preview feature on StreamNative Cloud. With this feature, well-known lakehouse storage solutions like Delta Lake, Apache Hudi, and Apache Iceberg become the tiered storage layer for Apache Pulsar. This development effectively transforms Apache Pulsar into a Streaming Lakehouse, allowing you to ingest data directly into your lakehouse using popular messaging and streaming APIs and protocols such as Pulsar, Kafka, AMQP, and more.",[48,34726,34727],{},"This series of blog posts will delve deep into the details of Pulsar’s Lakehouse Tiered Storage. In this first post, we will explore the origins of Pulsar’s tiered storage and how we’ve evolved it into a Lakehouse tiered storage solution. The second blog post will provide a comprehensive look at the implementation details of Lakehouse Tiered Storage, and we will conclude this series with a discussion on how data query engines can leverage the power of Lakehouse Tiered Storage to achieve unified stream and batch processing.",[48,34729,34730],{},"Let’s dive right into it.",[8300,34732,34734],{"id":34733},"tiered-storage-optimizing-data-storage-costs-for-streaming-infrastructure","Tiered Storage: Optimizing Data Storage Costs for Streaming Infrastructure",[48,34736,34737],{},"Apache Pulsar has always stood out for its ability to decouple storage from computing, allowing for independent scaling between stateless serving and stateful storage. The architecture's multi-layer structure is illustrated below.",[48,34739,34740],{},[384,34741],{"alt":18,"src":34742},"\u002Fimgs\u002Fblogs\u002F65388eab57c74c83b4ef45a5_waZ7YxG_AmvI84WY5dIhdy1MdNDymbmXhukDLmrM3DYgrdpBvNa9LehUsXLUXg6u7hGMSajw6Aki88eIPgPzcpiiN5cAC_XK4Q4w-xNzvAALbXWevD_HRff_UZCwldcyOLB5t3b5Aj5kGtuOBog11gk.png",[48,34744,34745],{},"Pulsar's storage layer is built on Apache BookKeeper, known for its robust, scalable log storage capabilities. It employs a quorum-based parallel replication mechanism, ensuring high data persistence, repeatable consistent reads, and high availability for both reading and writing. BookKeeper is particularly effective when used with high-performance disks like SSDs, providing low-latency streaming reads without compromising write latency.",[48,34747,34748,34749,34753],{},"However, as organizations seek to retain data for extended periods, the volume of data stored in BookKeeper can result in higher storage costs. Take ",[55,34750,34752],{"href":34751},"\u002Fblog\u002Fapache-pulsar-kafka-protocol-tiered-storage-and-beyond-heres-what-happened-at-pulsar-meetup-beijing-2023#tiered-storage-the-art-of-cost-efficient-data-streaming-infrastructure","WeChat",", for example; only 1% of WeChat's use cases demand real-time data processing, characterized by message lifecycles of less than 10 minutes. 
In contrast, 9% of use cases necessitate catch-up reads and batch processing, relying on data freshness within a 2-hour window. The remaining 90% of use cases revolve around data replay and data backup, spanning data older than 2 hours.",[48,34755,34756],{},"This pattern is common across enterprises. If we were to keep all the data of varying lifecycle requirements in the same storage layer, it would pose a large cost challenge. A natural approach to get around this is to move this 90% of data into a cold storage tier backed by much cheaper storage options such as object storage (S3, GCS, Azure Blob storage) or on-premise HDFS. Thus, we introduced Tiered Storage to Apache Pulsar in 2018, creating an additional storage layer.",[48,34758,34759],{},"With the introduction of the Tiered Storage layer, Pulsar could separate data based on its lifecycle:",[1666,34761,34762,34765,34768],{},[324,34763,34764],{},"Hot Data (~1% of data): Cached in Brokers' memory for low-latency streaming.",[324,34766,34767],{},"Warm Data (~9% of data): Stored in BookKeeper with replication for high availability . This data eventually gets moved to cold storage.",[324,34769,34770],{},"Cold Data (~90% of data): Tiered and stored in cost-efficient object storage .",[48,34772,34773],{},[384,34774],{"alt":18,"src":34775},"\u002Fimgs\u002Fblogs\u002F65388eaa83b8a2a470607d9f_AuQf3IPwBlX5R2u6SFdB9O5eGl4AszCSytJQu9xup8BPqDxqSoq3SyLV66_hsWw-7D72kWdhgwHcMsD9TkbJo1mUy5Xu2LhdvpJVTY7ndnQNxbuGyKI-wE3vRggfjqTLwTSIGfpm-fRl_eGaFw8yZV0.png",[48,34777,34778],{},"Pulsar's tiered storage was introduced with Pulsar 2.2.0, using a segmented stream model. When a segment in Pulsar is sealed, it's offloaded to tiered storage based on configured policies. Data is stored in Pulsar's format with additional indices for efficient reading. Unlike some other tiered storage solutions, Pulsar's approach allows brokers to read directly from tiered storage, saving memory, bandwidth, and cross-zone traffic.",[48,34780,34781],{},"However, while Pulsar's tiered storage is cost-effective and enhances stability, it has certain limitations:",[1666,34783,34784,34787,34795,34798],{},[324,34785,34786],{},"Proprietary Format: Data is stored in a proprietary format, making integration with broader data processing ecosystems challenging.",[324,34788,34789,34790,190],{},"Lakehouse Integration: Ingesting data into a lakehouse system requires additional effort and typically relies on tools like the ",[55,34791,34794],{"href":34792,"rel":34793},"https:\u002F\u002Fgithub.com\u002Fstreamnative\u002Fpulsar-io-lakehouse",[264],"Pulsar Lakehouse IO connector",[324,34796,34797],{},"Performance Tuning: Substantial tuning efforts are needed for optimizing read performance across various workloads due to the proprietary format.",[324,34799,34800],{},"Lack of Schema Information: Offloaded data lacks schema information, necessitating schema retrieval via reading data from Pulsar brokers, increasing costs and limiting integration possibilities.",[8300,34802,34804],{"id":34803},"what-is-the-ideal-tiered-storage","What is the Ideal Tiered Storage?",[48,34806,34807],{},"So, what is the ideal tiered storage for Pulsar and data streaming? 
In our opinion, in addition to high performance and cost savings, the ideal tiered storage solution for Pulsar and data streaming should offer:",[1666,34809,34810,34813,34816],{},[324,34811,34812],{},"Schema Enforcement and Governance: The ability to reason about data schema and integrity without involving Pulsar brokers, with robust governance and auditing features.",[324,34814,34815],{},"Openness: An open, standardized storage format with APIs that allow various tools and engines to access data effectively.",[324,34817,34818],{},"Cost Efficiency: Reduced data size for storage and transfer, resulting in cost savings in storage and data transfer.",[48,34820,34821],{},"Does such an ideal tiered storage solution exist? The answer is yes. Lakehouse is the solution.",[8300,34823,34825],{"id":34824},"introducing-lakehouse","Introducing Lakehouse",[48,34827,34828],{},"Lakehouse represents a transformative approach to data management, merging the best attributes of data lakes and traditional data warehouses. Lakehouse combines data lake scalability and cost-effectiveness with data warehouse reliability, structure, and performance. Three key technologies—Delta Lake, Apache Hudi, and Apache Iceberg—play pivotal roles in the Lakehouse ecosystem.",[48,34830,34831],{},"Delta Lake ensures data integrity and ACID compliance within data lakes, enabling reliable transactions and simplified data management. Apache Hudi offers upsert capabilities, making it efficient to handle changing data in large-scale datasets. Apache Iceberg provides a table format abstraction that improves data discoverability, schema evolution, and query performance. Together, these technologies form the core of the Lakehouse ecosystem, facilitating a harmonious balance between data storage, reliability, and analytical capabilities within a single, unified platform.",[48,34833,34834],{},"The Lakehouse aligns with the criteria for an ideal tiered storage solution:",[1666,34836,34837,34840,34843,34846],{},[324,34838,34839],{},"Schema and Schema Evolution: They offer tools for managing schema and schema evolution.",[324,34841,34842],{},"Stream Ingest and Transaction Support: They support streaming data ingestion with transactional capabilities and change streams.",[324,34844,34845],{},"Metadata Management: These solutions excel in managing metadata for vast datasets.",[324,34847,34848],{},"Open Standards: Lakehouse technologies are open standards, enabling seamless integration with various data processing systems.",[8300,34850,34852],{"id":34851},"introducing-pulsars-lakehouse-tiered-storage","Introducing Pulsar’s Lakehouse Tiered Storage",[48,34854,34855],{},"Pulsar's Lakehouse Tiered Storage takes the form of a streaming tiered storage offloader. This offloader can operate within the Pulsar broker or as a separate service in Kubernetes. It streams messages received by the broker to the Lakehouse immediately upon reception. Data offloaded to the tiered storage can be read by the broker in a streaming manner or accessed directly by external systems such as Trino, Spark, Flink, and others.",[48,34857,34858],{},"The lifecycle management of offloaded data can be handled either by Pulsar in Managed mode or by external Lakehouse systems in External mode. With Lakehouse tiered storage, you can store data for extended periods in a cost-efficient manner.",[48,34860,34861],{},"Pulsar's Lakehouse Tiered Storage has effectively transformed Apache Pulsar into an infinite streaming lakehouse. 
In this streaming lakehouse, you can retain infinite streams and access them through two distinct APIs:",[1666,34863,34864,34867],{},[324,34865,34866],{},"Streaming API: Continue using popular streaming protocols like Pulsar and Kafka APIs to ingest and consume data in real time.",[324,34868,34869],{},"Table\u002FBatch API: Query the data that has been offloaded into your Lakehouse using external query engines such as Spark, Flink, and Trino, or managed cloud query engines like Snowflake, BigQuery, and Athena.",[48,34871,34872],{},"This approach not only accommodates existing streaming and batch applications but also enables query engines to combine both streaming and batch data for unified batch and stream processing—a concept that offers endless possibilities for data analytics and insights. At the upcoming Pulsar Summit North America 2023, Yingjun Wu, Founder and CEO of Risingwave, will demonstrate how Risingwave leverages this combination to unlock new capabilities in querying both streaming and historical data together.",[48,34874,34875],{},[384,34876],{"alt":18,"src":34877},"\u002Fimgs\u002Fblogs\u002F65388eaba677c0d80abfa513_IVoz1ZkBM-fRb29Sv8y2mPNxrkeTV77_2tSDfGlaVT8vTMCddqpIa2LIAqP05OOCKj7gpkh_TvLNA9Xjv22IWExG5IEvtphgQwk0uvupbFgix5vizboVIERw6Wl4Yh_DKedWEEyjlcQalTMVaNj7wp0.png",[40,34879,34881],{"id":34880},"pulsar-tiered-storage-vs-lakehouse-tiered-storage","Pulsar Tiered Storage vs. Lakehouse Tiered Storage",[48,34883,34884],{},"Besides using an open storage format standard, Lakehouse Tiered Storage has many differentiators compared to the existing Pulsar tiered storage. Those differentiators are highlighted in the following table:",[48,34886,34887],{},[384,34888],{"alt":18,"src":34889},"\u002Fimgs\u002Fblogs\u002F65388f7368188fbea0cba757_Screenshot-2023-10-24-at-8.45.42-PM.png",[40,34891,34893],{"id":34892},"additional-benefits-of-lakehouse-tiered-storage","Additional Benefits of Lakehouse Tiered Storage",[48,34895,34896],{},"With the Lakehouse tiered storage, you can enjoy additional benefits compared to the existing tiered storage implementations.",[48,34898,34899],{},"Cost Reduction: By leveraging schema information to convert row-based message data into columnar formats stored in Parquet within the Lakehouse, storage sizes are drastically reduced, resulting in significant cost savings. In tests, we achieved a 5x reduction in storage size compared to retaining data in BookKeeper or tiered storage using Pulsar's format.",[48,34901,3931],{},[48,34903,34904],{},[384,34905],{"alt":18,"src":34906},"\u002Fimgs\u002Fblogs\u002F65388eaa72af65ff3f9ff64e_UbCMNO0JSa-GKuQNRyNuvyXoeMqFpQ2CUo2tUAKvhjxQhwk0rMehLIlIDnGL28chSn96NAZeXSmctz75Qc2km6dMoN6YkRzf_I1vmIax7GK6PvIHu0VOKYgzga33fOtbS4mYvxbqczo7fLFz258KwcE.png",[48,34908,34909],{},"Bandwidth Savings: Reduced data retrieval from tiered storage results in lower network bandwidth usage. External processing engines can directly access data from Lakehouse storage, further reducing networking costs.",[48,34911,34912],{},"Extended Data Retention: Lakehouse Tiered Storage enables cost-effective long-term data retention, opening up numerous use cases previously hindered by data retention limitations in Pulsar. 
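To illustrate the Table/Batch side, the sketch below reads an offloaded topic as a Delta table with PySpark. The object-store path, the column name, and the Delta configuration are assumptions for illustration; any engine that understands your chosen lakehouse format (Delta Lake, Hudi, or Iceberg) can play the same role.

```python
from pyspark.sql import SparkSession

# Hypothetical object-store location where the topic has been offloaded as a Delta table.
OFFLOADED_TABLE_PATH = "s3a://my-lakehouse-bucket/public/default/orders"

spark = (
    SparkSession.builder.appName("query-offloaded-stream")
    # Delta Lake support; requires the delta-spark package on the classpath.
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog", "org.apache.spark.sql.delta.catalog.DeltaCatalog")
    .getOrCreate()
)

# A plain batch query over data that arrived through streaming APIs.
orders = spark.read.format("delta").load(OFFLOADED_TABLE_PATH)
orders.groupBy("status").count().show()  # "status" is a hypothetical column
```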
It facilitates effective batch access through Lakehouse storage formats, allowing seamless data processing with real-time streaming and historical batch data.",[8300,34914,34916],{"id":34915},"lakehouse-tiered-storage-private-preview-on-streamnative-cloud","Lakehouse Tiered Storage: Private Preview on StreamNative Cloud",[48,34918,34919,34920,34922],{},"Lakehouse Tiered Storage is now available for Private Preview on StreamNative Cloud, specifically for BYOC clusters. If you're interested in trying it out, please ",[55,34921,24379],{"href":6392},". Your feedback will be invaluable as we continue to refine and enhance the tiered storage solution. Whether you're a Lakehouse vendor, a data processing or streaming SQL vendor, or an Apache Pulsar user, we welcome collaboration to define and iterate APIs for processing and querying data in this exciting realm of the \"Streaming Lakehouse\".",[8300,34924,319],{"id":316},[48,34926,34927],{},"Tiered storage is the linchpin for cost-efficient data streaming in the cloud. However, most vendors tend to develop their proprietary storage formats for offloading data to cloud-native object stores, limiting integration possibilities. The introduction of Lakehouse Tiered Storage breaks down these silos, connecting the data streaming and data lakehouse ecosystems seamlessly. It streamlines integration for users and customers and marks a transformative shift in how we perceive end-to-end data streaming. In the upcoming blog posts, we will delve deeper into the implementation details of Lakehouse Tiered Storage and how query engines can leverage both streaming and historical data within a unified abstraction.",[48,34929,34930,34931,34933,34934,34936,34937,34940],{},"Excited about the new features from StreamNative? ",[55,34932,34596],{"href":29078}," for our new Private Cloud to experience self-management capabilities, or ",[55,34935,34698],{"href":30989}," to explore the latest features. Feel free to ",[55,34938,34939],{"href":6392},"reach out to us"," for Proof of Concept (POC) opportunities with cloud credits as well.",[48,34942,3931],{},[48,34944,3931],{},{"title":18,"searchDepth":19,"depth":19,"links":34946},[34947,34948],{"id":34880,"depth":19,"text":34881},{"id":34892,"depth":19,"text":34893},"\u002Fimgs\u002Fblogs\u002F653891f5aac4dd3f6fe69abc_Screenshot-2023-10-24-at-8.37.37-PM.png",{},{"title":29826,"description":34724},"blog\u002Fstreaming-lakehouse-introducing-pulsars-lakehouse-tiered-storage",[302,1331],"gH5H-yCrOU2KawSGnGhH5_QRThDfkBmyn4dVQki3Dis",{"id":34956,"title":34020,"authors":34957,"body":34958,"category":3550,"createdAt":290,"date":35062,"description":35063,"extension":8,"featured":294,"image":35064,"isDraft":294,"link":290,"meta":35065,"navigation":7,"order":296,"path":34019,"readingTime":4475,"relatedResources":290,"seo":35066,"stem":35067,"tags":35068,"__hash__":35069},"blogs\u002Fblog\u002Fgeneral-availability-for-pulsar-functions-on-all-new-clusters-in-sn-cloud.md",[32707,810],{"type":15,"value":34959,"toc":35056},[34960,34963,34966,34971,34974,34985,34989,34992,34995,34998,35017,35021,35024,35035,35037,35040,35049,35054],[32,34961,15627],{"id":34962},"pulsar-functions",[48,34964,34965],{},"At StreamNative, we constantly strive to innovate and give you better tools to suit your streaming and messaging needs. Pulsar Functions have always stood out as one of those tools, providing an efficient and versatile solution for lightweight stream processing. 
Designed to operate seamlessly atop Pulsar, Pulsar Functions provide a lightweight way to consume messages from one or more topics, apply sophisticated user-defined logic, and then publish the processed messages to other topics.",[48,34967,34968],{},[384,34969],{"alt":18,"src":34970},"\u002Fimgs\u002Fblogs\u002F653820846a8bd15ab18cb834_pulsar-functions-diagram.png",[48,34972,34973],{},"For those who might be unfamiliar, Pulsar Functions offer an array of benefits:",[321,34975,34976,34979,34982],{},[324,34977,34978],{},"Seamlessly Integrated with Pulsar: Being natively integrated with Apache Pulsar, Pulsar Functions allow you to tap into the power of Pulsar with ease.",[324,34980,34981],{},"Lightweight: With Pulsar Functions, developers can focus on their business logic without setting up an elaborate stream processing framework.",[324,34983,34984],{},"Flexibility with Language Choices: Pulsar Functions can be written in Java or Python, with more options on the way, so you can use the language you are most comfortable with.",[32,34986,34988],{"id":34987},"pulsar-functions-on-streamnative-cloud","Pulsar Functions On StreamNative Cloud",[48,34990,34991],{},"While other Stream Processing Engines such as Apache Flink, Apache Spark, Apache Storm, and Apache Heron have undeniably carved a niche in the stream processing world, they often come with the overhead of prolonged ramp-up time, complex configurations, deployment challenges, and extra cost. On the other hand, Pulsar Functions offers a more integrated solution designed from the ground up to work in harmony with Pulsar.",[48,34993,34994],{},"That’s why we are excited to announce the General Availability of Pulsar Functions on StreamNative Cloud for all new Hosted and BYOC clusters.",[48,34996,34997],{},"What does this mean for you?",[321,34999,35000,35002,35005,35008],{},[324,35001,34516],{},[324,35003,35004],{},"Simplified Management: The process of managing and monitoring your Pulsar Functions has become significantly simplified. You can effortlessly submit or manage Functions using Terraform or pulsarctl, and conveniently access Function details and logs directly from the StreamNative Console.",[324,35006,35007],{},"Continued Support: At StreamNative, we remain committed to providing top-notch support for all your Pulsar Functions needs. Whether you represent an enterprise or a digital-native startup, our dedicated team is readily available to offer guidance and assistance with setting up and running Pulsar Functions.",[324,35009,35010,35011,35016],{},"pfSQL (Public Preview): pfSQL allows users to write Pulsar Functions using a SQL-like language, providing a rapid and lightweight stream processing method. In its first iteration, pfSQL will support filtering, routing, transformation queries, built-in user-defined functions (UDFs), and query result preview functionality, allowing users to inspect stream data for data analysis use cases. Check out our ",[55,35012,35015],{"href":35013,"rel":35014},"https:\u002F\u002Fdocs.streamnative.io\u002Fdocs\u002Fpfsql-get-started",[264],"docs"," to learn more!",[32,35018,35020],{"id":35019},"whats-next-for-pulsar-functions-on-streamnative","What’s next for Pulsar Functions on StreamNative",[48,35022,35023],{},"While we’re excited to announce GA for Pulsar Functions, we’re not stopping there. 
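If you haven't written one before, here is what that looks like in practice: a minimal Python sketch in which the alert topic and the routing rule are hypothetical, but the shape (a process method that receives each message plus a context object) is the standard Functions interface.

```python
from pulsar import Function

class RouteBySeverity(Function):
    """Consume log events, flag critical ones, and forward them to an alerts topic."""

    def process(self, input, context):
        logger = context.get_logger()
        event = input.strip()

        if "CRITICAL" in event:
            # Publish to a hypothetical alerts topic in addition to the normal output.
            context.publish("persistent://public/default/critical-alerts", event)
            logger.info("Routed a critical event")

        # The return value is published to the function's configured output topic.
        return f"processed: {event}"
```

Deploying a function like this is then just a matter of pointing your tooling of choice (Terraform, pulsarctl, or pulsar-admin) at the file, as noted above.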
We’re continuing to innovate on Functions and have a lot of upcoming updates coming to Functions on StreamNative Cloud.",[321,35025,35026,35029,35032],{},[324,35027,35028],{},"Stateful Functions (coming soon): functions that can hold state, rather than requiring all rules to be preprogrammed, are extremely useful for advanced and even intermediate stream processing use cases. We are building this functionality to support even more sophisticated uses of Pulsar Functions.",[324,35030,35031],{},"Autoscaling (coming soon): scaling Functions horizontally and vertically is important for your application’s ability to respond to greater traffic, and for cost savings when there is less traffic. We are working on a robust, configurable solution allowing you to scale your Pulsar Functions up and down based on your changing needs.",[324,35033,35034],{},"Generic Runtime (coming soon): we want to support writing Pulsar Functions in even more languages than we do now, so we’re working on a generic runtime for functions, which will support Node.js, web assembly, and many more in the future.",[32,35036,2125],{"id":2122},[48,35038,35039],{},"With Pulsar Functions generally available, it’s now vastly easier to do lightweight stream processing on a StreamNative Pulsar cluster. There’s no separate computing cluster you need to set up, you can easily deploy Pulsar Functions using Terraform or pulsarctl, and you can see logs and exceptions directly within the StreamNative Console. On top of that, we’re working on a wide variety of upcoming features that will let you do even more: writing stateful functions, scaling them up and down in response to traffic, and writing them in a wider variety of languages.",[48,35041,35042,35043,35048],{},"To learn more about Pulsar Functions, check out our ",[55,35044,35047],{"href":35045,"rel":35046},"https:\u002F\u002Fwww.youtube.com\u002Fplaylist?list=PL7-BmxsE3q4V8cMgsTtDA64OJtC25blxn",[264],"YouTube playlist",", which walks you through writing and deploying a Pulsar Function and our docs on them.",[48,35050,35051,35052,190],{},"If you’d like to try out Pulsar Functions, ",[55,35053,32689],{"href":32238},[48,35055,4446],{},{"title":18,"searchDepth":19,"depth":19,"links":35057},[35058,35059,35060,35061],{"id":34962,"depth":279,"text":15627},{"id":34987,"depth":279,"text":34988},{"id":35019,"depth":279,"text":35020},{"id":2122,"depth":279,"text":2125},"2023-10-24","Pulsar Functions can now be used on all new Pulsar Clusters in StreamNative Cloud, both BYOC and Hosted Clusters.","\u002Fimgs\u002Fblogs\u002F6538ae07121d67b1064784d9_functions-ga.png",{},{"title":34020,"description":35063},"blog\u002Fgeneral-availability-for-pulsar-functions-on-all-new-clusters-in-sn-cloud",[9636,5376],"hZatxa8GXDNeuK7DSBR1ApnoVdkg_Q8j2YGIahjoxUQ",{"id":35071,"title":35072,"authors":35073,"body":35074,"category":3550,"createdAt":290,"date":35062,"description":35164,"extension":8,"featured":294,"image":35165,"isDraft":294,"link":290,"meta":35166,"navigation":7,"order":296,"path":29597,"readingTime":11508,"relatedResources":290,"seo":35167,"stem":35168,"tags":35169,"__hash__":35170},"blogs\u002Fblog\u002Fkafka-on-streamnative-bringing-enterprise-grade-kafka-support-to-streamnative-pulsar-clusters.md","KSN: Bringing Enterprise-Grade Kafka Support to StreamNative Pulsar Clusters",[32707,808],{"type":15,"value":35075,"toc":35158},[35076,35084,35087,35095,35098,35102,35105,35116,35120,35123,35134,35138,35150,35156],[916,35077,35078],{},[48,35079,35080,35081,35083],{},"KSN is now part of the 
",[55,35082,1332],{"href":24893}," engine.",[48,35085,35086],{},"Apache Pulsar is the next generation of data streaming technology, bringing a slew of advantages over Apache Kafka, especially for large-scale enterprise applications, offering native multi-tenancy, geo-replication, tiered storage, and unmatched scalability and elasticity. However, switching technologies isn't a simple decision. Organizations that have heavily invested in Kafka need a smooth transition path if they are to consider a move to Pulsar.",[48,35088,35089,35090,35094],{},"At StreamNative, we've long recognized this migration challenge. Over the past few years, we've empowered Kafka developers with the ability to utilize Pulsar via our open source project, ",[55,35091,35093],{"href":29592,"rel":35092},[264],"KoP",", which embeds the Kafka protocol handler inside the Pulsar broker. With KoP, Kafka developers can immediately begin leveraging Pulsar's innovations while retaining the familiarity of Kafka.",[48,35096,35097],{},"While KoP marked a significant milestone, we still identified gaps that needed to be filled, especially to cater to the sophisticated needs of enterprise Kafka users — specifically, the need for supporting key Kafka features like KStreams or KSQL.",[40,35099,35101],{"id":35100},"introducing-ksn","Introducing KSN",[48,35103,35104],{},"That's why we're excited to present KSN, a Kafka protocol compatible layer running on Pulsar 3.1, now in Public Preview, an offering tailored to enterprises using Kafka who want to leverage Pulsar's enhanced capabilities. KSN builds on KoP, but contains even more features:",[321,35106,35107,35110,35113],{},[324,35108,35109],{},"All of the Kafka features you're used to: KSN supports all the Kafka features you're accustomed to, including KStreams, KSQL, KTables with Topic Compaction, Schema Registry for the Java Client, and Kerberos Authentication for Kafka Clients.",[324,35111,35112],{},"Great developer experience: KSN features streamlined local testing, with its own testcontainers module.",[324,35114,35115],{},"Robust and reliable: Architecturally designed for scale and subjected to rigorous resilience testing. We've fortified KSN through comprehensive testing, ensuring it meets and exceeds the demands of large-scale Kafka operations.",[32,35117,35119],{"id":35118},"whats-next-for-ksn-coming-soon","What’s next for KSN (Coming Soon)",[48,35121,35122],{},"We’ve already started work on the next version of KSN, which will contain some exciting enhancements over and above what’s already in this version:",[321,35124,35125,35128,35131],{},[324,35126,35127],{},"Transaction support with Topic Compaction: KSN’s next version will support transactions with Topic Compaction, allowing for atomic writes and highly accurate stream processing.",[324,35129,35130],{},"Role Based Access Control: KSN’s next version will feature role based access control, built upon Pulsar’s updated authorization model.",[324,35132,35133],{},"Unified Schema Registry: KSN already implements a Kafka Schema Registry, but KSN’s next version will expand on this by featuring a unified schema registry that can be used by both Pulsar and Kafka clients interchangeably. 
In other words, Pulsar consumers can consume messages with a schema produced by Kafka producers, and vice versa, making the transition from Kafka to Pulsar on StreamNative even easier.",[40,35135,35137],{"id":35136},"take-your-streaming-capabilities-to-the-next-level","Take Your Streaming Capabilities to the Next Level",[48,35139,35140,35141,35145,35146,35149],{},"KSN is now available for our StreamNative Hosted and BYOC customers running Pulsar 3.1 or later. If you’re a current StreamNative customer and would like to try it out, just ",[55,35142,35144],{"href":32687,"rel":35143},[264],"contact our support team",". To learn more about KSN, check out our ",[55,35147,35047],{"href":33719,"rel":35148},[264],", which walks you through getting started, and using Kafka features on StreamNative, such as KStreams, KSQL and using the Kafka Schema Registry.",[48,35151,35152,35153,20076],{},"Position yourself at the forefront of streaming technology with KSN. If you have questions or want to know more, our sales team is available to guide you - ",[55,35154,35155],{"href":32238},"reach out today",[48,35157,3931],{},{"title":18,"searchDepth":19,"depth":19,"links":35159},[35160,35163],{"id":35100,"depth":19,"text":35101,"children":35161},[35162],{"id":35118,"depth":279,"text":35119},{"id":35136,"depth":19,"text":35137},"KSN (now part of Ursa engine) builds on top of open source Kafka-on-Pulsar (KoP) to provide even more features for Kafka developers who want to migrate to Pulsar, including KSQL, KStreams, KTables with Topic Compaction, and Kerberos Authentication.","\u002Fimgs\u002Fblogs\u002F664da1b58905a31c14ef224f_66427ce8a899003a44963824_SN-SM-UrsaAnnounce.png",{},{"title":35072,"description":35164},"blog\u002Fkafka-on-streamnative-bringing-enterprise-grade-kafka-support-to-streamnative-pulsar-clusters",[302,799],"_rVpjl-uALqN6RWu7YTB9_A0lZj4ZblVfImZuDJNf8Q",{"id":35172,"title":34013,"authors":35173,"body":35174,"category":3550,"createdAt":290,"date":35062,"description":35231,"extension":8,"featured":294,"image":290,"isDraft":294,"link":290,"meta":35232,"navigation":7,"order":296,"path":33777,"readingTime":11508,"relatedResources":290,"seo":35233,"stem":35234,"tags":35235,"__hash__":35236},"blogs\u002Fblog\u002Fsecure-your-pulsar-cluster-with-revocable-api-keys.md",[32707],{"type":15,"value":35175,"toc":35226},[35176,35179,35184,35188,35191,35194,35199,35203,35206,35212,35214],[48,35177,35178],{},"Before you can send or consume a single message from a Pulsar cluster, you first must connect to it. It’s important to do so in a way that’s both flexible and secure, protecting against unauthorized access without limiting your developer team to a small set of tools\u002Fclients. To that end, I’m excited to announce StreamNative’s latest feature, now in Public Preview: StreamNative API Keys. StreamNative API Keys are JWT-based tokens enable Pulsar clients to connect to Pulsar clusters on StreamNative. They are long-lived, with a configurable expiration date, and revokable at any time via the StreamNative Console - a feature that’s only available on StreamNative Hosted and BYOC Pulsar clusters, not open source ones. 
StreamNative API keys offer both a flexible authentication solution that can work with any client, and a revokable key that can be rotated on a regular interval for security and compliance, or immediately revoked in the event of a security incident.",[48,35180,35181],{},[384,35182],{"alt":18,"src":35183},"\u002Fimgs\u002Fblogs\u002F65380af5dde9a2d8b0ea8696_create-api-key.gif",[32,35185,35187],{"id":35186},"secure-and-easy-to-manage","Secure and Easy to Manage",[48,35189,35190],{},"StreamNative API keys can be created and managed from within the StreamNative console, to easily grant and revoke API access to users, applications, or third-party services. Create a key by specifying a service account, instance, and expiration date.",[48,35192,35193],{},"It is also straightforward to rotate keys by setting an expiration date on a key, or by manually revoking a key and replacing it with a new one. This enables teams to comply with regulations, and keep your organization secure.",[48,35195,35196],{},[384,35197],{"alt":18,"src":35198},"https:\u002F\u002Fuploads-ssl.webflow.com\u002F639226d67b0d723af8e7ca56\u002F65380b2ffcbf722c23b404f7_Untitled%20(2).png",[32,35200,35202],{"id":35201},"works-everywhere","Works Everywhere",[48,35204,35205],{},"Because StreamNative API Keys are JWT-based, using them with a Pulsar client is as simple as copy-pasting them wherever you use token-based authentication. That also means that you don’t have to worry about whether a client supports OAuth2.0 - these API keys will work everywhere. See the snippet below for how it works:",[8325,35207,35210],{"className":35208,"code":35209,"language":8330},[8328],"import pulsar\n\nclient = pulsar.Client(\"pulsar+ssl:\u002F\u002Fcluster-url-here:6651\",authentication=pulsar.AuthenticationToken(\"paste-your-token-here\"))\nproducer = client.create_producer(\"persistent:\u002F\u002Fpublic\u002Fdefault\u002Ftest\")\nfor i in range(10):\n    producer.send(('Hello-%d' % i).encode('utf-8'))\n\nclient.close()\n",[4926,35211,35209],{"__ignoreMap":18},[32,35213,2125],{"id":2122},[48,35215,35216,35217,35221,35222,20076],{},"StreamNative API Keys represent a big step forward in our ongoing commitment to security and customer satisfaction. We know security is a constant effort, and we’re just getting started - we will continue to invest in cutting-edge technologies and features to stay ahead of evolving threats. For more information about creating and using API keys, ",[55,35218,35220],{"href":34609,"rel":35219},[264],"check out our documentation",". 
If you’d like to try StreamNative API Keys while it’s in Public Preview before it’s generally available, ",[55,35223,35225],{"href":32687,"rel":35224},[264],"get in touch",{"title":18,"searchDepth":19,"depth":19,"links":35227},[35228,35229,35230],{"id":35186,"depth":279,"text":35187},{"id":35201,"depth":279,"text":35202},{"id":2122,"depth":279,"text":2125},"StreamNative Cloud Pulsar Clusters now have a secure way to connect, via revocable API keys that enhance security.",{},{"title":34013,"description":35231},"blog\u002Fsecure-your-pulsar-cluster-with-revocable-api-keys",[302,821],"YF5swSbfKlWFWIpwK30ww7HJoC3rBVkdo1qNRBU5qkA",{"id":35238,"title":35239,"authors":35240,"body":35241,"category":3550,"createdAt":290,"date":35062,"description":35340,"extension":8,"featured":294,"image":290,"isDraft":294,"link":290,"meta":35341,"navigation":7,"order":296,"path":33796,"readingTime":11508,"relatedResources":290,"seo":35342,"stem":35343,"tags":35344,"__hash__":35345},"blogs\u002Fblog\u002Funveiling-streamnatives-enhanced-connector-experience.md","Unveiling StreamNative’s Enhanced Connector Experience",[32707,810],{"type":15,"value":35242,"toc":35333},[35243,35246,35250,35253,35276,35280,35283,35291,35295,35303,35307,35310,35314,35317,35326,35329,35331],[48,35244,35245],{},"Today, with data spread far and wide across myriad systems, the seamless movement of this data between diverse systems and platforms has become paramount. Pulsar IO Connectors have been a bridge facilitating this critical data transfer, ensuring that every byte finds its rightful place whether you’re moving data into Pulsar or channeling it outwards into other systems. That’s why we’re excited to announce an enhanced IO Connector experience, exclusively on StreamNative Cloud. These updates will not only streamline the process but also significantly reduce development efforts, making the sometimes onerous task of data migration between systems as smooth as possible.",[32,35247,35249],{"id":35248},"custom-connectors-generally-available-on-all-new-hosted-and-byoc-pulsar-clusters-on-streamnative-cloud","Custom Connectors: Generally Available on all new Hosted and BYOC Pulsar Clusters on StreamNative Cloud",[48,35251,35252],{},"While the list of connectors available on StreamNative Cloud is ever-growing, there will always be a need to connect a new system to your Pulsar Cluster, or to connect an existing system to Pulsar in a way the current connector offerings don’t support. That realization convinced us that we need to do two things at once: not only add more StreamNative-build IO Connectors, but also give customers a way to build their own. To that end, we’re excited to announce that Custom Connectors are now generally available on StreamNative Cloud for all new Hosted and Bring Your Own Cloud (BYOC) Clusters!",[48,35254,35255,35256,35260,35261,1154,35265,35270,35271,35275],{},"Built atop Pulsar Functions, Custom Connectors give you the power to develop bespoke Connectors tailored to your unique data needs. With the ability to design connectors from the ground up, we're ensuring that you will never be limited to the existing IO Connectors we offer (though these are pretty good too - check them out ",[55,35257,267],{"href":35258,"rel":35259},"https:\u002F\u002Fhub.streamnative.io\u002F",[264],"). 
All you need to do is implement the ",[55,35262,27049],{"href":35263,"rel":35264},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fblob\u002Fmaster\u002Fpulsar-io\u002Fcore\u002Fsrc\u002Fmain\u002Fjava\u002Forg\u002Fapache\u002Fpulsar\u002Fio\u002Fcore\u002FSource.java",[264],[55,35266,35269],{"href":35267,"rel":35268},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fblob\u002Fmaster\u002Fpulsar-io\u002Fcore\u002Fsrc\u002Fmain\u002Fjava\u002Forg\u002Fapache\u002Fpulsar\u002Fio\u002Fcore\u002FSink.java",[264],"Sink"," interface, and you’ve got your own Connector! To learn more Custom Connectors, check out the ",[55,35272,7120],{"href":35273,"rel":35274},"https:\u002F\u002Fpulsar.apache.org\u002Fdocs\u002F3.1.x\u002Fio-develop\u002F",[264]," that walks users through creating, testing, and deploying a custom connector.",[32,35277,35279],{"id":35278},"io-connector-debugging-and-management-generally-available-on-all-streamnative-clusters","IO Connector Debugging and Management: Generally Available on All StreamNative Clusters",[48,35281,35282],{},"As the gates that control data flowing into and out of your Pulsar Cluster, Connectors are a fundamentally important part of your Cluster. So it’s critical to have insight into what's going on with them. We understand this, and have shipped a number of improvements to the IO Connector management and debugging experience:",[321,35284,35285,35288],{},[324,35286,35287],{},"Users can now easily access logs and exceptions directly from the StreamNative Console, providing a clearer window into IO Connector operations and streamlining the debugging process. Just navigate to an individual Connector’s page from the Connectors page in the Console, and click on the logs and exceptions tabs.",[324,35289,35290],{},"Additionally, IO Connector logs are now seamlessly routed to a Pulsar topic through sidecar, enabling programmatic consumption, so if you want to integrate your IO Connector logs into your logging and alerting solutions, you can now do so.",[32,35292,35294],{"id":35293},"terraform-and-pulsarctl-support-for-connectors-generally-available-for-new-pulsar-clusters-on-streamnative-cloud","Terraform and pulsarctl Support for Connectors: Generally Available for new Pulsar Clusters on StreamNative Cloud",[48,35296,35297,35298,35302],{},"For users who want to manage their Pulsar Cluster programmatically, we’ve got good news: you can now submit and modify Source and Sink Connectors via both Terraform and pulsarctl. So if you want to manage your Pulsar Cluster solely via Terraform so that all your settings are checked into source control, or you prefer to to do via CLI instead of the Console, you can now do so! To learn more about using Terraform and pulsarctl with Connectors check out our ",[55,35299,35015],{"href":35300,"rel":35301},"https:\u002F\u002Fdocs.streamnative.io\u002Fdocs\u002Fdeploy-connector-index",[264]," on deploying Connectors.",[32,35304,35306],{"id":35305},"coming-soon-to-streamnative-cloud-a-comprehensive-overhaul-of-the-streamnative-console-connector-experience","Coming Soon to StreamNative Cloud: A Comprehensive Overhaul of the StreamNative Console Connector Experience",[48,35308,35309],{},"Setting up a Connector has always been an inherently laborious task, requiring users to track down various credentials and settings during the setup process, then testing whether the Connector works, and modifying it to optimize performance once setup. 
Our team has crafted a new experience that will redefine the Connector creation process, making it a breeze to set up new IO Connector on StreamNative Cloud. The new experience will guide you through setting up your connectors, with step-by-step instructions for each setting, and recommendations on configuration for maximum performance. Here’s a quick preview of it:",[32,35311,35313],{"id":35312},"the-future-of-io-connectors-on-streamnative-cloud","The future of IO Connectors on StreamNative Cloud",[48,35315,35316],{},"While we’re excited about the improvements we’ve made so far, this is merely the beginning of an exhilarating journey ahead. We’re going to continue and accelerate our work of onboarding more IO Connectors, and improving our existing Connectors. We envision a vibrant ecosystem of constantly improving connectors, and a rich experience across the StreamNative Console, CLI tools, and Terraform. Our goal is to give you a robust set of tools that work no matter your tech stack, that works so seamlessly you forget it’s even running.",[48,35318,35319,35320,2869,35323,35325],{},"If you’d like to try out our enhanced Connector experience for yourself, ",[55,35321,35322],{"href":30989},"sign up for a StreamNative Hosted Cluster today",[55,35324,32689],{"href":32238}," if you’d like to try a StreamNative BYOC Cluster.",[48,35327,35328],{},"Stay tuned for updates, and as always, happy Streaming!",[48,35330,3931],{},[48,35332,3931],{},{"title":18,"searchDepth":19,"depth":19,"links":35334},[35335,35336,35337,35338,35339],{"id":35248,"depth":279,"text":35249},{"id":35278,"depth":279,"text":35279},{"id":35293,"depth":279,"text":35294},{"id":35305,"depth":279,"text":35306},{"id":35312,"depth":279,"text":35313},"StreamNative's new experience for Pulsar IO Connectors makes it easy to move data into and out of your Apache Pulsar Cluster. Featuring support for Custom Connectors, deploying Connectors via Terraform and pulsarctl, logs and exceptions from the StreamNative Console, and more, this product update significantly enhances the connector experience on StreamNative Cloud.",{},{"title":35239,"description":35340},"blog\u002Funveiling-streamnatives-enhanced-connector-experience",[28572,302],"CE6XHqTUfTOyYg791rvlS5YLYgs07TsXYuTs93QjZc0",{"id":35347,"title":35348,"authors":35349,"body":35350,"category":7338,"createdAt":290,"date":35552,"description":35553,"extension":8,"featured":294,"image":35554,"isDraft":294,"link":290,"meta":35555,"navigation":7,"order":296,"path":31138,"readingTime":11508,"relatedResources":290,"seo":35556,"stem":35557,"tags":35558,"__hash__":35560},"blogs\u002Fblog\u002Fapache-pulsar-kafka-protocol-tiered-storage-and-beyond-heres-what-happened-at-pulsar-meetup-beijing-2023.md","Apache Pulsar, Kafka Protocol, Tiered storage and Beyond! Here’s What Happened at Pulsar Meetup Beijing 2023",[806],{"type":15,"value":35351,"toc":35544},[35352,35360,35363,35366,35371,35375,35378,35383,35386,35395,35398,35403,35406,35413,35416,35419,35426,35430,35433,35438,35441,35449,35452,35460,35464,35467,35472,35475,35478,35482,35485,35490,35498,35505,35508,35512,35525,35527,35540],[48,35353,35354,35355,35359],{},"This past Saturday marked an eventful day in Beijing, as we hosted a one-day gathering dedicated to Apache Pulsar. Known as \"Apache Pulsar Meetup Beijing 2023,\" this event served as a condensed version of the renowned ",[55,35356,29387],{"href":35357,"rel":35358},"https:\u002F\u002Fpulsar-summit.org\u002F",[264]," APAC. 
It's been four years since we convened the very first in-person \"Apache Pulsar Meetup Beijing\" in collaboration with Tencent and Yahoo! JAPAN back in 2019. Despite the numerous virtual Pulsar Summits held during the pandemic, there was palpable excitement in bringing together the APAC community once again to delve into the world of Apache Pulsar and real-time data streaming.",[48,35361,35362],{},"Held in the heart of Beijing, this meetup drew approximately 100 in-person attendees, featuring adoption stories and best practices from tech giants such as Tencent, WeChat, Didi, Huawei, and Zhaopin.com. These companies are operating Pulsar at a large scale in their production environments. The event boasted a total of 12 engaging talks, each led by domain experts.",[48,35364,35365],{},"Without further ado, let's explore the highlights of this gathering.",[48,35367,35368],{},[384,35369],{"alt":18,"src":35370},"\u002Fimgs\u002Fblogs\u002F652e6189e9ff94701530f662_txXdm7fknPlr_1v06Q_L2Q3LrVffz4brW0CvpfbpgjqbUed8QrR8OzJ-gdxJiw2iSD9xEqDEk4lJoQL-emwDJVCtxlqNV5ZbOIVA-GgfYBe0xSSGBj2On4QXG8RkExEMff9I3pRb8HLsbrH2ReaX0KU.jpeg",[40,35372,35374],{"id":35373},"pulsar-30-and-beyond","Pulsar 3.0 and Beyond",[48,35376,35377],{},"Matteo Merli, the co-creator of Pulsar and the CTO of StreamNative, set the stage by sharing his inspirational journey of initiating Pulsar at Yahoo. He also narrated how the Pulsar community transformed Pulsar from a distributed pub-sub messaging platform into a multi-protocol streaming data platform. This metamorphosis led to Pulsar's full compatibility with other popular messaging and streaming protocols, including the likes of Kafka, AMQP, and MQTT. An interesting tidbit he shared was that Apache Pulsar was the pioneer in introducing the concept of tiered storage – a feature that has since been emulated by the competition, such as Kafka, Confluent, and Redpanda.",[48,35379,35380],{},[384,35381],{"alt":18,"src":35382},"\u002Fimgs\u002Fblogs\u002F652e619e1e0739426712e2e2_NVoShilZqejhqanDvx6UGyCblO5m7F150g88RT684C-gYJ_v03oc1QxdV2Q8etEQNf73H4YJef1_9l2_r4UXGS8_b2loWmsMrKTPO65v6gVTJEpXPCgFQzKBduIHEBgTe4plzUyheMFFnik2CJDV-6A.png",[48,35384,35385],{},"Matteo then pulled the curtain back on Pulsar 3.0, giving us a peek at the innovations and features nestled within this momentous release.",[48,35387,35388,35389,35394],{},"Pulsar 3.0 marked the introduction of the ",[55,35390,35393],{"href":35391,"rel":35392},"https:\u002F\u002Fpulsar.apache.org\u002Fcontribute\u002Frelease-policy\u002F",[264],"Long Term Support (LTS)"," release model, designed to provide prolonged support for releases. This approach aims to free users from the constant pressure of upgrading while still paving the way for rapid innovation to make Pulsar even more powerful.",[48,35396,35397],{},"Interestingly, Pulsar 3.0 also introduced BookKeeper 4.16, which came with numerous improvements aimed at decreasing both CPU usage and contention. These enhancements resulted in a remarkable boost in throughput and latency, particularly in scenarios involving numerous topics and scenarios where message batching was either ineffective or disabled. 
To put this into perspective, it almost doubled throughput when compared to running Pulsar 2.10 on a modest 3-node cluster housing over 10,000 topics.",[48,35399,35400],{},[384,35401],{"alt":18,"src":35402},"\u002Fimgs\u002Fblogs\u002F652e619d8f11016cfe9c8494_jGlQQGOXlvH5B08gNrA92uPzz7MJpXdwjikehUj7Eay5m-RElY8iL88ysFryvdUJKLrsIZ_92TWI4JXcDR_dc8juIGeQ2GI-HWgP-YwNacX4gu49SvlgAzsL9ty6SdDMUwKA3sM82FKPol2Qv3AAEFE.png",[48,35404,35405],{},"Pulsar 3.0 didn't stop there; it introduced features such as scalable delayed delivery, a novel load balancer, and support for multi-arch images. These additions promised to elevate Pulsar's performance and user-friendliness across a spectrum of use cases.",[48,35407,35408,35409,35412],{},"While Pulsar 3.0 represents a significant milestone, there are exciting innovations looming on the horizon. Matteo divulged that the Pulsar community's focal point is enhancing the user experience, scaling Pulsar to handle an excess of 1 million topics efficiently, and possibly even accommodating 10 million or 100 million topics to unlock previously unattainable use cases. The advent of ",[55,35410,5599],{"href":22142,"rel":35411},[264],", a scalable metadata storage solution crafted for modern cloud-native ecosystems, is set to tackle metadata and coordination challenges at an unprecedented scale. Furthermore, there is a concerted effort to address issues concerning metrics, service discovery, and session establishment within expansive clusters.",[48,35414,35415],{},"Pulsar 3.0 has inaugurated a new chapter in Pulsar's release management, with a plethora of improvements already introduced in 3.0 and 3.1, and many more slated for 2024. The Pulsar community now stands better equipped to expedite the delivery of features and enhancements, all while ensuring robustness and quality.",[8300,35417,35418],{"id":8006},"Session Highlights",[48,35420,35421,35422,35425],{},"Attempting to encapsulate all the sessions and panel discussions within the confines of a single blog post would be a Herculean task. Fret not, though – in a week's time, you can revisit this page ",[55,35423,267],{"href":35424},"\u002Fpulsar-summit"," to catch all the session recordings. In the interim, here's a sneak peek at some overarching themes encompassing use cases, deep dives into technology, and the ecosystem.",[40,35427,35429],{"id":35428},"message-queue-emerges-as-pulsars-killer-use-case","Message Queue Emerges as Pulsar's Killer Use Case",[48,35431,35432],{},"Were you aware that Didi, China's ridesharing giant, has harnessed the power of Apache Pulsar for several years to enrich the real-time experience for both its drivers and passengers? Qiang Huang from Didi graced the stage and regaled the audience with Didi's evolution from Kafka to RocketMQ, and eventually to Pulsar within their DDMQ (Didi Message Queue) platform. The DDMQ platform now spans thousands of machines, serving over 10,000 topics and processing trillions of messages daily, with peak traffic reaching a staggering 10 million messages per second. Remarkably, the DDMQ platform boasts a staggering 99.996%+ availability.",[48,35434,35435],{},[384,35436],{"alt":18,"src":35437},"\u002Fimgs\u002Fblogs\u002F652e619df8453b07949ab662_BpwveE37QnefdPkJcPflY4quSV_OM098oMJ_xqty3QClKA3TCqBGsRdF_SeJ-rCoukIbVPE3p0COrid_J9w8IxLuQFVo3-_SNJ0Wgh95edcowELT6pCGf_DAWOvCbcZbLFlEEZhELh6cMqCl_Q1xlqc.jpeg",[48,35439,35440],{},"Qiang delved into the rationale behind Didi's decision to transition from RocketMQ and Kafka-based backends to Pulsar. 
The motivations ranged from embracing a cloud-native architecture, drastically reducing operating costs (thanks to Pulsar's lower CPU usage, write latency, and end-to-end latency), exploiting the flexibility of SSDs and HDDs for storage, leveraging high-performance capabilities, and basking in the collaborative spirit of the rapidly growing Pulsar community.",[48,35442,35443,35444,35448],{},"Qiang further elucidated how Didi executed the seamless migration from RocketMQ and Kafka to Pulsar using the Protocol Handler framework, ensuring a frictionless transition. It's worth noting that this framework also serves as the bridge to facilitate protocol compatibility with ",[55,35445,35447],{"href":29592,"rel":35446},[264],"Kafka Protocol"," and various other messaging and streaming protocols.",[48,35450,35451],{},"Qiang then passed the baton to his colleague Bo Cong, a committer for both Pulsar and BookKeeper. Bo underscored Didi's prominent role as one of the largest contributors in China, with over 400 contributions to both Pulsar and BookKeeper communities. He shed light on some of the challenges encountered during Didi's adoption of Pulsar, including discussions around the incorporation of Pulsar-native features such as delayed messages and tiered storage.",[48,35453,35454,35455,35459],{},"These revelations from Didi perfectly aligned with ",[55,35456,35458],{"href":35457},"\u002Fblog\u002Femerging-patterns-in-data-streaming-insights-from-current-2023","our observations in Current 2023",". Notably, message queues have emerged as a leading use case for Pulsar, propelling its adoption. Additionally, our radar picked up on the trend of cost reduction and the strategic deployment of tiered storage solutions.",[40,35461,35463],{"id":35462},"tiered-storage-the-art-of-cost-efficient-data-streaming-infrastructure","Tiered Storage: The Art of Cost-Efficient Data Streaming Infrastructure",[48,35465,35466],{},"Yingqun Zhong from WeChat graced the stage to present WeChat's adept utilization of Pulsar's tiered storage. He commenced by elaborating on the various facets of data lifecycle and use cases within WeChat. Remarkably, only 1% of WeChat's use cases demand real-time data processing, characterized by message lifecycles of less than 10 minutes. In contrast, 9% of use cases necessitate catch-up reads and batch processing, relying on data freshness within a 2-hour window. The remaining 90% of use cases revolve around data replay and data backup, spanning data older than 2 hours. Within WeChat's operations, Pulsar holds the position of a mission-critical infrastructure. Yet, in certain scenarios, businesses require the long-term storage of data within Pulsar. This poses a cost challenge, particularly when employing BookKeeper with SSDs for storage.",[48,35468,35469],{},[384,35470],{"alt":18,"src":35471},"\u002Fimgs\u002Fblogs\u002F652e619dedcee5c748e434e7_dYtorfl2Ue_fd8KXapwWQ4jFBxQEuO6xKeSMahBeaPN0QAGiB4RiHA1yGz17jnM8dUFLezCt5Rj7BSkjkC8jHf4Caxr--M1h59F9Hd1yazMse35ePMnjZQDYBdoyOG3RXDK3AYZPUzEWQtpgwj9EGR8.jpeg",[48,35473,35474],{},"Enter Pulsar's tiered storage, also known as the offload framework, which proved to be a silver bullet solution for WeChat's challenges. 
With Pulsar's multi-layered architecture, WeChat adeptly segregates data – \"Hot Data\" remains in the brokers, \"Warm Data\" finds its home in the BookKeeper storage layer, while \"Cold Data\" is gracefully offloaded to more cost-effective storage options, such as cloud object stores or on-premises HDFS.",[48,35476,35477],{},"Zhong then delved into the meticulous optimizations performed to fine-tune Pulsar's tiered storage performance, ensuring it aligns seamlessly with WeChat's requirements. He also highlighted the collaborative efforts with StreamNative in adopting Lakehouse as a tiered storage solution for Pulsar. The resounding message was clear – tiered storage significantly diminishes data storage expenses, amplifies operational stability when Pulsar acts as the data bus, and propels Pulsar into the realm of infinite storage capacity.",[40,35479,35481],{"id":35480},"performance-and-scalability-pulsars-north-star","Performance and Scalability: Pulsar's North Star",[48,35483,35484],{},"In addition to the insightful use case presentations, many talks by Pulsar committers and contributors shone a spotlight on features and performance enhancements within Pulsar. Cong Zhao introduced a new implementation of Delayed Message, illuminating their genesis and evolution from an in-memory priority queue-based solution to a bucket-based solution. The revamped Delayed Message implementation can seamlessly accommodate arbitrary delayed messages while maintaining a low memory footprint (i.e., 40 MB memory usage to support 100 million delayed messages) and reducing delay message recovery time substantially. For example, it took 6120 seconds to recover a topic with 30 million delayed messages using in-memory priority queue-based solution, while it only took 90 seconds to recover a topic with 100 million delayed messages using the bucket-based solution.",[48,35486,24328,35487,33315],{},[384,35488],{"alt":18,"src":35489},"\u002Fimgs\u002Fblogs\u002F652e619f74d86658b89ed060_VHvRjbF-G14vly11wID9i31nNpOqFDts28ywCI4RzTwHB-zRW9CHhgL-uJ5Ikhuh7_CtTjTFCo-QLa4a3FxtUDV3xdlKmW0F0M28yn2CmKTJJJ17MdR1cxWCuMVwxAWuFTsJoYlj5blPf73Y1X4B59Y.png",[48,35491,35492,35493,190],{},"Yong Zhang, a BookKeeper committer, plunged into the AutoRecovery implementation – the linchpin ensuring data durability and fault tolerance within BookKeeper. Recognizing the shifting landscape where most Pulsar clusters are deployed within containerized or cloud environments, StreamNative is diligently evolving the AutoRecovery implementation into a modern cloud-native implementation, anchored by the ",[55,35494,35497],{"href":35495,"rel":35496},"https:\u002F\u002Fdocs.streamnative.io\u002Foperator",[264],"Kubernetes operator",[48,35499,35500,35501,35504],{},"Functions emerged as a hot topic of discussion. Rui Fu from StreamNative dissected the challenges associated with running functions using Function Worker on Kubernetes, showcasing how StreamNative developed ",[55,35502,29463],{"href":29461,"rel":35503},[264]," to bolster reliability. Meanwhile, Pengcheng shed light on a generic runtime implementation within Apache Pulsar. This innovation circumvents existing limitations by providing a framework that supports multiple programming languages, based on Rust and WebAssembly. As a cherry on top, any compiled WebAssembly modules can effortlessly slot into Pulsar Functions. 
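The bucket-based delayed delivery work described above is surfaced through the ordinary client API, so applications don't need anything special to use it. A minimal sketch with the Python client follows; the broker URL and topic are placeholders, and delayed delivery takes effect for consumers on Shared or Key_Shared subscriptions.

```python
from datetime import timedelta

import pulsar

client = pulsar.Client("pulsar://localhost:6650")  # placeholder broker URL
producer = client.create_producer("persistent://public/default/reminders")

# Ask the broker to deliver this message to consumers roughly five minutes from now.
producer.send(b"send-renewal-reminder", deliver_after=timedelta(minutes=5))

client.close()
```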
The alpha version of this generic runtime for Pulsar Functions is set to debut on StreamNative Cloud in the near future – stay tuned for more!",[48,35506,35507],{},"Last but certainly not least, Penghui Li embarked on a comprehensive exploration of Topic Compaction, unraveling the intricacies within both Apache Pulsar and Apache Kafka. He explained how Topic Compaction was successfully implemented in Kafka on Pulsar, adding yet another feather to Pulsar's cap.",[40,35509,35511],{"id":35510},"the-rise-of-data-streaming-beyond-flink","The Rise of Data Streaming Beyond Flink",[48,35513,35514,35515,35518,35519,35524],{},"As highlighted in ",[55,35516,35517],{"href":35457},"our Current 2023 blog post",", Apache Flink has firmly established itself as the gold standard in streaming data processing. However, new innovators are emerging, poised to challenge Apache Flink's dominance. One such trailblazer is ",[55,35520,35523],{"href":35521,"rel":35522},"https:\u002F\u002Fwww.risingwave.com\u002F",[264],"Risingwave",", and we were thrilled to have Zilin Chen from Risingwave grace the meetup stage. He walked through the design details of Risingwave, how it is integrated with Apache Pulsar, and what is the difference between Apache Flink and Risingwave. The integration between Risingwave and Pulsar is now fully available in Risingwave. You can experience firsthand by downloading Risingwave and taking it for a spin.",[40,35526,319],{"id":316},[48,35528,35529,35530,35534,35535,35539],{},"The gathering of the APAC community and the vibrant showcase of Pulsar's pivotal role as a mission-critical component for real-time data streaming workloads left us brimming with excitement. This event served as a spirited prelude to the impending ",[55,35531,33883],{"href":35532,"rel":35533},"https:\u002F\u002Fpulsar-summit.org\u002Fevent\u002Fnorth-america-2023",[264],", just around the corner. We eagerly anticipate meeting more of our community members face-to-face at the summit – a reunion that is set to be quite exciting. 
",[55,35536,10265],{"href":35537,"rel":35538},"https:\u002F\u002Fregistration.socio.events\u002Fe\u002Fpulsarsummitna2023",[264]," today!",[48,35541,35542],{},[34077,35543],{"value":34079},{"title":18,"searchDepth":19,"depth":19,"links":35545},[35546,35547,35548,35549,35550,35551],{"id":35373,"depth":19,"text":35374},{"id":35428,"depth":19,"text":35429},{"id":35462,"depth":19,"text":35463},{"id":35480,"depth":19,"text":35481},{"id":35510,"depth":19,"text":35511},{"id":316,"depth":19,"text":319},"2023-10-16","Recap of Apache Pulsar Meetup Beijing 2023: Insights into Pulsar 3.0, tiered storage, best practices from tech giants, and the future of real-time data streaming.","\u002Fimgs\u002Fblogs\u002F652eac15b2c14020a359e9fb_IMG_9095.JPG",{},{"title":35348,"description":35553},"blog\u002Fapache-pulsar-kafka-protocol-tiered-storage-and-beyond-heres-what-happened-at-pulsar-meetup-beijing-2023",[302,799,35559,5376],"Success Stories","DRgjglw8AWEror79tzCy82B10eTBFnTJ7Qq4CGhpGjU",{"id":35562,"title":35563,"authors":35564,"body":35565,"category":821,"createdAt":290,"date":35757,"description":35758,"extension":8,"featured":294,"image":35759,"isDraft":294,"link":290,"meta":35760,"navigation":7,"order":296,"path":35761,"readingTime":11508,"relatedResources":290,"seo":35762,"stem":35763,"tags":35764,"__hash__":35765},"blogs\u002Fblog\u002Fapache-pulsar-enterprise-messaging-data-streaming-platform.md","Unpacking the Power of Apache Pulsar: The One-Stop Solution for Enterprise Messaging and Data Streaming",[31294],{"type":15,"value":35566,"toc":35746},[35567,35569,35572,35589,35592,35595,35598,35602,35607,35610,35613,35616,35619,35623,35626,35629,35632,35637,35644,35648,35651,35654,35659,35662,35666,35669,35672,35675,35678,35681,35685,35690,35693,35696,35700,35703,35706,35711,35714,35716,35719,35723,35726,35742],[40,35568,46],{"id":42},[48,35570,35571],{},"One of the key responsibilities of a centralized data platform for an enterprise is to provide essential services like Messaging and Data Streaming to multiple internal teams. This means facing numerous challenges, including:",[321,35573,35574,35577,35580,35583],{},[324,35575,35576],{},"Provisioning the right amount of resources at the right cost and at the right time without under-provisioning or over-provisioning.",[324,35578,35579],{},"Managing disparate technologies like RabbitMQ or Kafka",[324,35581,35582],{},"Managing multiple cluster instances",[324,35584,35585,35586],{},"Ensuring ",[55,35587,35588],{"href":34039},"compliance and data governance",[48,35590,35591],{},"In short, a lot of complexity!",[48,35593,35594],{},"Enter Apache Pulsar: a unified platform offering robust messaging and data streaming capabilities. With features like elasticity and multi-tenancy, Pulsar is uniquely equipped to help overcome these challenges.",[48,35596,35597],{},"This article delves into the key features that make Apache Pulsar the ideal choice for modern enterprises, focusing on its ability to deliver high performance, reliability, and cost-effectiveness.",[40,35599,35601],{"id":35600},"best-of-both-worlds-messaging-and-data-streaming","Best of Both Worlds: Messaging and Data Streaming",[48,35603,35604],{},[384,35605],{"alt":18,"src":35606},"\u002Fimgs\u002Fblogs\u002F652953735c71778d00230720_Screenshot-2023-09-08-at-6.26.11-PM.png",[48,35608,35609],{},"Data Streaming and Message Queuing may appear similar at first glance, but they serve distinct purposes. 
Data streaming platforms like Kafka are ill-suited for serving as a genuine message queue, whereas message queuing platforms like RabbitMQ are not intended for data streaming tasks. This is why separate solutions are available in the market to cater to these specific needs.",[48,35611,35612],{},"This is how you find yourself in the position of managing multiple separate technologies: one for data streaming and another for messaging.",[48,35614,35615],{},"However, Pulsar brings the best of both worlds into one package. Imagine the convenience of consolidating these disparate technologies into a single, unified platform. Not only does it cut down on the operational challenges of managing multiple systems, but it also creates a more streamlined developer experience.",[48,35617,35618],{},"No more juggling between multiple platforms for different needs. With Pulsar, you get a unified solution, making it simpler to manage and maintain.",[40,35620,35622],{"id":35621},"scalability-and-elasticity-adapt-quickly-to-your-needs-and-save-costs","Scalability and Elasticity: Adapt Quickly to Your Needs and Save Costs",[48,35624,35625],{},"One of the most compelling features of Pulsar is its unparalleled scalability and elasticity. While other platforms like Kafka can scale well, they don't offer the same level of elasticity.",[48,35627,35628],{},"Pulsar allows for the seamless addition of new consumers to address growing demand. Unlike Kafka, you usually don't need to create partitions to scale up consumers in Pulsar. In Kafka, this process can be cumbersome and use lots of resources. Pulsar's architecture lets you add new consumers easily, improving throughput without hassle.",[48,35630,35631],{},"Elasticity ensures you can quickly adapt to workload changes, which translates to both an excellent quality of service for demanding applications and significant cost savings. Pulsar’s design allows you to scale up or down easily, avoiding over-provisioning and thereby reducing infrastructure costs.",[48,35633,35634],{},[384,35635],{"alt":18,"src":35636},"\u002Fimgs\u002Fblogs\u002F652953b6e4fe2ebf82a04621_Untitled.png",[48,35638,35639,35640,190],{},"To understand Pulsar’s elastic architecture, check out ",[55,35641,18391],{"href":35642,"rel":35643},"https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=TKs5T6N78Tc&t=141s",[264],[40,35645,35647],{"id":35646},"redundancy-native-geo-replication-a-safe-bet-for-your-data","Redundancy & Native Geo-replication: A Safe Bet for Your Data",[48,35649,35650],{},"Pulsar places a premium on the durability of your data. 
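Concretely, how many copies are kept is governed by namespace-level persistence policies. Below is a hedged sketch of setting them through the Java admin client; the namespace name and quorum values are illustrative:

```java
import org.apache.pulsar.client.admin.PulsarAdmin;
import org.apache.pulsar.common.policies.data.PersistencePolicies;

public class PersistenceSketch {
    public static void main(String[] args) throws Exception {
        PulsarAdmin admin = PulsarAdmin.builder()
                .serviceHttpUrl("http://localhost:8080")   // placeholder admin endpoint
                .build();

        // Spread each ledger over 3 bookies, write 3 copies of every entry,
        // and treat a write as durable once 2 bookies have acknowledged it.
        admin.namespaces().setPersistence(
                "my-tenant/my-namespace",                  // illustrative namespace
                new PersistencePolicies(3, 3, 2, 0.0));

        admin.close();
    }
}
```

With these settings, every entry is written to three separate storage nodes and considered durable once two of them acknowledge it.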
Features like message replication across multiple nodes ensure that data is never lost, even if some nodes fail.",[48,35652,35653],{},"Additionally, if a data center or cloud region experiences failure, Pulsar's built-in geo-replication ensures that data can be recovered.",[48,35655,35656],{},[384,35657],{"alt":18,"src":35658},"\u002Fimgs\u002Fblogs\u002F6529545b97b07d2dabb5214b_Screenshot-2023-09-08-at-6.34.21-PM.png",[48,35660,35661],{},"Your data is safe and replicated, giving you peace of mind.",[40,35663,35665],{"id":35664},"multi-tenancy-share-resources-not-data","Multi-Tenancy: Share Resources, Not Data",[48,35667,35668],{},"Multi-tenancy in Apache Pulsar enables a single software instance to serve multiple internal customers within an enterprise, keeping each one's data and configurations isolated.",[48,35670,35671],{},"This is invaluable for resource optimization, as it allows multiple departments to share a single Pulsar cluster, thereby reducing operational costs.",[48,35673,35674],{},"It also ensures data security, as each tenant's data remains segregated.",[48,35676,35677],{},"Moreover, management becomes more straightforward with centralized updates, monitoring, and backups.",[48,35679,35680],{},"From its inception, Pulsar has had built-in multi-tenancy, setting it apart from other messaging and data streaming platforms.",[40,35682,35684],{"id":35683},"protocol-handlers-compatibility-and-easy-transition","Protocol Handlers: Compatibility and Easy Transition",[48,35686,35687],{},[384,35688],{"alt":18,"src":35689},"\u002Fimgs\u002Fblogs\u002F6529548a82fc0676c44efa1b_Screenshot-2023-09-08-at-6.52.37-PM.png",[48,35691,35692],{},"Protocol Handlers in Apache Pulsar offer compatibility with Apache Kafka, RabbitMQ, and MQTT producers & consumers. They facilitate a more fluid migration process from one of these platforms to Pulsar.",[48,35694,35695],{},"Additionally, enterprises can benefit from Pulsar's advanced capabilities while leveraging their existing application portfolio and developers’ skillset. This ensures not only the ease of integration but also the preservation of past investments.",[40,35697,35699],{"id":35698},"rich-feature-set-with-no-vendor-lock-in","Rich Feature Set with No Vendor Lock-In",[48,35701,35702],{},"The rich feature set of Apache Pulsar is delivered without the burden of vendor lock-in, thanks to its open-source nature and the active and growing community behind it. This offers a great degree of flexibility and choice.",[48,35704,35705],{},"The open-source model also promotes innovation and continuous improvement, as contributions come from diverse organizations and individuals.",[48,35707,35708],{},[384,35709],{"alt":18,"src":35710},"\u002Fimgs\u002Fblogs\u002F652954960ce2830aa8d50166_Screenshot-2023-09-08-at-6.35.51-PM.png",[48,35712,35713],{},"Consequently, Pulsar is a future-proof solution that safeguards your investment while delivering robust and cutting-edge services.",[40,35715,2125],{"id":2122},[48,35717,35718],{},"Apache Pulsar addresses the common challenges faced by enterprises with its unique blend of features, offering a unified messaging and data streaming platform that is scalable, elastic, secure, and cost-effective. If you're looking for a centralized platform that can adapt to your evolving needs, look no further. 
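To make the multi-tenancy model above concrete: tenants and namespaces are first-class resources managed through the admin API, so each team carves out its own space on a shared cluster. A minimal, hedged Java sketch follows; the tenant, namespace, and cluster names are illustrative, and the builder-style TenantInfo assumes a reasonably recent client:

```java
import java.util.Set;
import org.apache.pulsar.client.admin.PulsarAdmin;
import org.apache.pulsar.common.policies.data.TenantInfo;

public class MultiTenancySketch {
    public static void main(String[] args) throws Exception {
        PulsarAdmin admin = PulsarAdmin.builder()
                .serviceHttpUrl("http://localhost:8080")        // placeholder
                .build();

        // Each team gets its own tenant and namespaces on the shared cluster,
        // instead of a dedicated, over-provisioned cluster per team.
        admin.tenants().createTenant("team-payments",
                TenantInfo.builder()
                        .allowedClusters(Set.of("standalone"))  // illustrative cluster name
                        .build());
        admin.namespaces().createNamespace("team-payments/prod");

        admin.close();
    }
}
```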
Pulsar is not just another tool; it’s a comprehensive solution that brings numerous benefits, from cost savings to operational efficiency, making it the ideal choice for any modern enterprise.",[40,35720,35722],{"id":35721},"next-steps","Next Steps",[48,35724,35725],{},"The following resources will help you delve deeper into the topics covered in this blog article:",[321,35727,35728,35734,35738],{},[324,35729,35730],{},[55,35731,35733],{"href":32255,"rel":35732},[264],"Understanding Apache Pulsar in 10 minutes",[324,35735,35736],{},[55,35737,32178],{"href":32177},[324,35739,35740],{},[55,35741,34040],{"href":34039},[48,35743,35744],{},[34077,35745],{"value":34079},{"title":18,"searchDepth":19,"depth":19,"links":35747},[35748,35749,35750,35751,35752,35753,35754,35755,35756],{"id":42,"depth":19,"text":46},{"id":35600,"depth":19,"text":35601},{"id":35621,"depth":19,"text":35622},{"id":35646,"depth":19,"text":35647},{"id":35664,"depth":19,"text":35665},{"id":35683,"depth":19,"text":35684},{"id":35698,"depth":19,"text":35699},{"id":2122,"depth":19,"text":2125},{"id":35721,"depth":19,"text":35722},"2023-10-13","Explore the transformative capabilities of Apache Pulsar, a unified platform for enterprise messaging and data streaming. Dive into its unique features, from scalability and elasticity to multi-tenancy and protocol compatibility, that make it a cost-effective, reliable, and high-performance alternative to Kafka.","\u002Fimgs\u002Fblogs\u002F65651ac2723ac0c831c4e8bb_image-9.png",{},"\u002Fblog\u002Fapache-pulsar-enterprise-messaging-data-streaming-platform",{"title":35563,"description":35758},"blog\u002Fapache-pulsar-enterprise-messaging-data-streaming-platform",[7347,799,821,11043,5954],"CZ5suBR6eSc1hPgKAmZnwHGQFZpMJPyTKbLIXMtIOYU",{"id":35767,"title":35768,"authors":35769,"body":35770,"category":7338,"createdAt":290,"date":35901,"description":35902,"extension":8,"featured":294,"image":35903,"isDraft":294,"link":290,"meta":35904,"navigation":7,"order":296,"path":35905,"readingTime":4475,"relatedResources":290,"seo":35906,"stem":35907,"tags":35908,"__hash__":35909},"blogs\u002Fblog\u002Fstreaming-sql-databases-meet-streaming.md","Streaming SQL: Databases Meet Streaming",[806],{"type":15,"value":35771,"toc":35893},[35772,35775,35779,35782,35785,35788,35792,35795,35798,35802,35805,35808,35811,35815,35818,35821,35846,35849,35857,35860,35863,35866,35870,35878,35880,35887,35889],[48,35773,35774],{},"SQL has been a fundamental tool for software engineers in building applications for several decades. However, the world has undergone significant transformations since SQL's inception in 1974. In this article, we will explore how SQL is adapting to the changing landscape of data usage in application development. We'll delve into the history of SQL, its traditional role in data systems, and its evolution to meet the demands of streaming data. Additionally, we'll take a closer look at various technologies and vendors that are shaping the field of Streaming SQL.",[40,35776,35778],{"id":35777},"the-role-of-sql-today","The Role of SQL Today",[48,35780,35781],{},"Over fifty years ago, when Edgar Codd introduced the concept of a relational database and SQL as its query language, our world and technological landscape were vastly different from what we experience today. Relational databases were designed to work with a single shared data set, enabling efficient query processing and computations to enhance productivity. 
SQL played a crucial role in simplifying data manipulation and storage, revolutionizing tasks such as inventory management and financial accounting.",[48,35783,35784],{},"However, our present-day reality is characterized by constant online activity and an incessant generation and consumption of data. Data never rests; it flows from various sources, driving computations and business logic in our applications. Unlike in the past, we no longer work with a single, shared data set. Instead, we interact with data streams that originate from diverse locations. In this new era, our infrastructures must be capable of processing these data streams in real time. Unfortunately, the traditional relational database and SQL were not designed to meet the demands of this futuristic world.",[48,35786,35787],{},"To address the challenges posed by streaming data, technologies like Apache Pulsar and Apache Kafka emerged, enabling the creation, collection, storage, and processing of streaming and messaging data. While these advancements have significantly improved the field of stream processing, the developer experience for working with streaming data is still a far cry from the simplicity and familiarity of writing declarative SQL statements in a traditional relational database.",[40,35789,35791],{"id":35790},"introducing-streaming-sql","Introducing Streaming SQL",[48,35793,35794],{},"One of the primary obstacles faced by companies adopting stream processing technologies is the steep learning curve associated with stream processing systems. Unlike conventional databases like MySQL and PostgreSQL, which provide SQL as the interactive interface, most streaming systems require users to learn platform-specific programming interfaces, often in Java, to manipulate streaming data. This learning process can be daunting, especially for non-technical individuals. Additionally, stream processing systems represent data in a different manner than databases, necessitating the creation of complex data extraction logic to facilitate data transit between streaming systems and databases.",[48,35796,35797],{},"Given the evolving landscape of data streaming and the need for user-friendly solutions, the concept of \"Streaming SQL\" has emerged. Streaming SQL aims to provide new language abstractions and query semantics that can handle both streaming and static data, simplifying the process of solving complex use cases. By leveraging the familiar declarative nature of SQL, Streaming SQL allows users to focus on what they want to achieve, while the underlying stream processing engine handles the intricacies of execution.",[40,35799,35801],{"id":35800},"the-basics","The Basics",[48,35803,35804],{},"When using Streaming SQL, several key distinctions become apparent. Traditional SQL queries on a database return static results from a specific point in time. In contrast, Streaming SQL queries operate on data streams, rendering point-in-time answers less relevant. Instead, continuous queries that update themselves, often referred to as materialized views, become more valuable in the streaming context. Each Streaming SQL vendor has its own approach to achieving materialized views.",[48,35806,35807],{},"Similarly, the concept of response time in traditional databases differs from the notion of lag in streaming SQL systems. While traditional databases focus on query response times, streaming SQL systems introduce the concept of time lag, which represents the delay between input events and the corresponding output results. 
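To make the idea of a continuous query concrete, here is a small illustration using Apache Flink's Table API from Java — purely as one example of the pattern, since every engine has its own dialect; the connector and schema below are illustrative:

```java
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class ContinuousQuerySketch {
    public static void main(String[] args) {
        // A streaming TableEnvironment evaluates SQL continuously rather than once.
        TableEnvironment tEnv = TableEnvironment.create(EnvironmentSettings.inStreamingMode());

        // A source table backed by a stream; the datagen connector fabricates
        // rows so the example is self-contained.
        tEnv.executeSql(
                "CREATE TABLE clicks (user_id STRING, url STRING) " +
                "WITH ('connector' = 'datagen')");

        // Unlike a point-in-time query, this aggregation keeps updating its
        // result as new rows arrive -- effectively a materialized view.
        tEnv.executeSql("SELECT user_id, COUNT(*) AS clicks FROM clicks GROUP BY user_id")
            .print();
    }
}
```

Because the result keeps updating as events arrive, the practical question becomes how fresh the answer is (lag) rather than how quickly a one-off query returns (response time).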
Understanding the existence of time lag helps users write and utilize streaming SQL in ways that avoid potential issues.",[48,35809,35810],{},"Another distinction lies in the work creation process. Traditional databases remain idle until a query is received, whereas streaming SQL systems generate work based on incoming data from the stream. Different vendors employ various strategies to handle this work creation process.",[40,35812,35814],{"id":35813},"the-benefits-and-challenges","The Benefits and Challenges",[48,35816,35817],{},"Streaming SQL is particularly well-suited for use cases that involve repetitive queries, such as dashboards, reports, and automation. However, its introduction also presents challenges. The official SQL standard lacks support for Streaming SQL functionality, leading to vendors adopting their own syntax or dialect extensions to existing SQL standards like Postgres. As a result, users face the challenge of choosing the right Streaming SQL system among the diverse offerings in the market.",[48,35819,35820],{},"To help users navigate through the challenges, we categorize the Streaming SQL vendors into three groups.",[321,35822,35823,35834,35843],{},[324,35824,35825,35826,4003,35830,35833],{},"Stream Processors: ",[55,35827,2139],{"href":35828,"rel":35829},"https:\u002F\u002Fspark.apache.org\u002F",[264],[55,35831,31802],{"href":31800,"rel":35832},[264]," (Flink vendors include but are not limited to Ververica, Confluent, Decodable, and DeltaStream)",[324,35835,35836,35837,4003,35840],{},"Stream Storage Systems: ",[55,35838,821],{"href":23526,"rel":35839},[264],[55,35841,799],{"href":31428,"rel":35842},[264],[324,35844,35845],{},"New Vendors Building Streaming SQL Solutions: Risingwave, Timeplus, etc.",[48,35847,35848],{},"Apache Spark and Apache Flink are the most popular data processing engines that support both batch and stream processing. While Apache Flink is considered the de facto standard for stream processing, Apache Spark is widely used for batch processing. Both systems offer SQL layers on top of their data processing engines, simplifying the writing of data processing jobs for users. Ververica, the company founded by the original creators of Flink, was the first pioneer in commercializing Apache Flink. Both Confluent and Decodable also provide product offerings based on Apache Flink to address the management headache of Apache Flink. Deltastream, founded by the creator of KSQL, aims to provide a powerful solution powered by Apache Flink, offering both streaming analytics and a streaming database in one comprehensive package.",[48,35850,35851,35852,35856],{},"Apache Kafka and Apache Pulsar are two highly popular streaming data storage systems. Confluent, the company behind Kafka, introduced KStream and KSQL years ago, providing users with tools for data processing and streaming SQL within Kafka. This allowed Confluent to compete with the Flink ecosystem for data processing. However, after acquiring Immerok earlier this year, Confluent decided to adopt Apache Flink as its data processing engine. On the other hand, the founders of StreamNative took a different approach. Instead of introducing a separate data processing engine, they built Pulsar Functions, a lightweight serverless event processing framework with an emphasis on simplicity and seamless integration with Pulsar. Pulsar Functions address a significant portion of trivial stream processing use cases to help users reduce the steep learning curve and heavy maintenance overhead. 
StreamNative additionally introduced a ",[55,35853,35855],{"href":35854},"\u002Fvideos\u002Fpulsar-summit-san-francisco-2022-ecosystem-simplify-pulsar-functions-development-with-sql","SQL extension for Pulsar Functions called \"pfSQL\""," during Pulsar Summit San Francisco 2022, enabling Pulsar users to write SQL-like declarative statements for event processing.",[48,35858,35859],{},"While it is common to see stream processors and stream storage systems incorporating SQL to simplify stream processing, new players in the data streaming space completely abstract the data processing layer from end users. Instead, they introduce streaming SQL directly as the user interface for interaction.",[48,35861,35862],{},"Risingwave, for example, is an open-source distributed SQL streaming database designed for the cloud. It is built from scratch using Rust and seamlessly integrates with the Postgres SQL ecosystem.",[48,35864,35865],{},"Timeplus takes a different approach by offering a unified platform for streaming and historical OLAP. They have recently open-sourced their core stream processing engine, Proton.",[40,35867,35869],{"id":35868},"choosing-the-right-solution","Choosing the Right Solution",[48,35871,35872,35873,35877],{},"With the growing prominence of Streaming SQL, selecting the appropriate streaming SQL product can be a daunting task. To address this challenge, StreamNative is organizing a keynote panel discussion, “Streaming SQL: Databases Meet Streaming”, during ",[55,35874,35876],{"href":35532,"rel":35875},[264],"Pulsar Summit North America 2023 on Wednesday, October 25, in San Francisco",". The event will bring together core technologists from leading vendors such as Databricks, Deltastream, Risingwave, Timeplus, and StreamNative to discuss Streaming SQL and explore the future of data streaming. This conference offers an excellent opportunity for data streaming community users to connect with like-minded enthusiasts and engage in insightful discussions about the future of data streaming.",[40,35879,319],{"id":316},[48,35881,35882,35883,35886],{},"Streaming SQL has emerged as a game-changer in the data streaming landscape, enabling users to simplify their stream processing tasks. However, the lack of standardized support poses challenges for users seeking a unified solution. By understanding the different categories of vendors and their offerings, users can make informed decisions when selecting a Streaming SQL solution. Events like ",[55,35884,33883],{"href":35532,"rel":35885},[264]," provide a platform for industry experts to share their insights and collectively shape the future of data streaming.",[48,35888,3931],{},[48,35890,35891],{},[34077,35892],{"value":34079},{"title":18,"searchDepth":19,"depth":19,"links":35894},[35895,35896,35897,35898,35899,35900],{"id":35777,"depth":19,"text":35778},{"id":35790,"depth":19,"text":35791},{"id":35800,"depth":19,"text":35801},{"id":35813,"depth":19,"text":35814},{"id":35868,"depth":19,"text":35869},{"id":316,"depth":19,"text":319},"2023-10-10","Explore the evolution of SQL in the world of data streaming, from its traditional role to the emergence of Streaming SQL. 
Learn about challenges, benefits, and key vendors shaping the landscape.","\u002Fimgs\u002Fblogs\u002F65651a36b3803cd436575085_image-8.png",{},"\u002Fblog\u002Fstreaming-sql-databases-meet-streaming",{"title":35768,"description":35902},"blog\u002Fstreaming-sql-databases-meet-streaming",[1331,5376],"LnJ920yFKa-KOyv3zjSGMpiWrsNakv9z_FffFCvxX_c",{"id":35911,"title":35912,"authors":35913,"body":35914,"category":290,"createdAt":290,"date":36045,"description":36046,"extension":8,"featured":294,"image":290,"isDraft":294,"link":290,"meta":36047,"navigation":7,"order":296,"path":35457,"readingTime":4475,"relatedResources":290,"seo":36048,"stem":36049,"tags":36050,"__hash__":36051},"blogs\u002Fblog\u002Femerging-patterns-in-data-streaming-insights-from-current-2023.md","Emerging Trends in Data Streaming: Insights from Current 2023",[806],{"type":15,"value":35915,"toc":36038},[35916,35919,35923,35926,35929,35932,35936,35939,35948,35951,35954,35957,35961,35970,35973,35976,35979,35982,35985,35988,35991,35995,35998,36007,36010,36017,36019,36022,36034],[48,35917,35918],{},"Last week, I had the pleasure to attend Current 2023, a data-streaming conference hosted by Confluent. This dynamic convention united thousands of aficionados from the realms of messaging and data streaming. The conference allowed our team to engage in profound discussions with a multitude of participants, vendors, and peers, elucidating the escalating impact of messaging and data streaming in the current industrial landscape. I am thrilled to impart some critical insights and observations I garnered during the conference. These insights and observations have confirmed that our design choices involving Apache Pulsar and StreamNative Cloud have been pioneering in several fields for years. These fields include but are not limited to, queue semantics, cost-efficient multi-tenancy, integration with Flink, BYOC model, and so forth.",[40,35920,35922],{"id":35921},"message-queuing-for-kafka-users","Message queuing for Kafka users",[48,35924,35925],{},"Kafka has established its reputation as an event streaming system, predominantly conceived for transferring data between pipelines and services. Confluent’s representation of Kafka as a messaging system has led to widespread acknowledgment in the community and industry, albeit with some missing pieces. Kafka, though impactful, misses several queuing functionalities found in conventional messaging queuing systems, such as scheduled messages, delayed messages, Time-To-Live (TTL), dead-letter-queue, individual acknowledgment, and more.",[48,35927,35928],{},"This limitation of Kafka API and semantics has caused adopters with extensive engineering resources like tech unicorns to layer additional queuing semantics atop Kafka. Others, feeling the constraints, turn to alternatives like Apache Pulsar. 
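As a hedged illustration of the queuing semantics in question — delayed delivery, dead-letter handling, and per-message acknowledgment — a Pulsar producer and consumer might look like the following Java sketch; the topic, subscription, and delay values are illustrative:

```java
import java.util.concurrent.TimeUnit;
import org.apache.pulsar.client.api.Consumer;
import org.apache.pulsar.client.api.DeadLetterPolicy;
import org.apache.pulsar.client.api.Message;
import org.apache.pulsar.client.api.Producer;
import org.apache.pulsar.client.api.PulsarClient;
import org.apache.pulsar.client.api.Schema;
import org.apache.pulsar.client.api.SubscriptionType;

public class QueueSemanticsSketch {
    public static void main(String[] args) throws Exception {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650")             // placeholder
                .build();

        Producer<String> producer = client.newProducer(Schema.STRING)
                .topic("persistent://public/default/tasks")        // illustrative topic
                .create();

        // Delayed delivery: the broker withholds this message for 10 minutes.
        producer.newMessage()
                .value("send-reminder-email")
                .deliverAfter(10, TimeUnit.MINUTES)
                .send();

        // Shared subscription with a dead-letter policy: after 3 failed
        // redeliveries, the message is routed to a dead-letter topic.
        Consumer<String> worker = client.newConsumer(Schema.STRING)
                .topic("persistent://public/default/tasks")
                .subscriptionName("workers")
                .subscriptionType(SubscriptionType.Shared)
                .deadLetterPolicy(DeadLetterPolicy.builder().maxRedeliverCount(3).build())
                .subscribe();

        Message<String> msg = worker.receive();
        worker.acknowledge(msg);   // per-message (individual) acknowledgment

        client.close();
    }
}
```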
Apache Pulsar has effectively positioned itself as a unified data streaming platform, offering a flexible messaging model supporting both event streaming and message queuing, appealing to a broad user base seeking the scalability of Kafka and advanced messaging queue semantics.",[48,35930,35931],{},"Addressing the noted gaps, Confluent endeavors to integrate Queue semantics into Kafka through KIP-932, aiming to bring about a unified streaming data platform that accommodates both event streaming and message queuing needs, emulating what Pulsar has already achieved for years.",[40,35933,35935],{"id":35934},"cost-efficiency-is-what-everyone-cares-about","Cost efficiency is what everyone cares about",[48,35937,35938],{},"Given the current economic recession, it's imperative for companies globally to prioritize cost reduction. During the keynote at Current 2023, Confluent unveiled Kora, its cloud-native Kafka engine. This introduction promises a potential cost reduction of up to 40% for Confluent customers.",[48,35940,35941,35942,35947],{},"Similarly, Redpanda emerges as another pivotal vendor focusing on cost reduction, asserting a claim of delivering a sixfold lower cloud spend compared to conventional Kafka offerings. The ongoing battle for cost savings between Redpanda and Confluent is evident. Remarkably, Redpanda has been running a challenge aiming to cut Confluent customers’ bills by 50%. However, it’s noteworthy that Redpanda’s primary emphasis regarding cost reduction predominantly pivots on the performance of a single cluster. ",[55,35943,35946],{"href":35944,"rel":35945},"https:\u002F\u002Fjack-vanlightly.com\u002Fblog\u002F2023\u002F5\u002F15\u002Fkafka-vs-redpanda-performance-do-the-claims-add-up",[264],"Jack Vanlightly’s insightful post ","provides an extensive analysis contrasting the performance of Kafka and Redpanda to invalidate Redpanda’s claims.",[48,35949,35950],{},"It's crucial to acknowledge that although each of these systems may offer its distinct benefits in particular scenarios, the efficacy of a streaming data system is constrained by the inherent network and disk bandwidth of the underlying resources. Therefore, real cost reductions are achieved through adept optimization of network and disk utilization. The substantial cost predominantly emanates from the absence of multi-tenancy functionality. Consequently, within most organizations, the norm becomes establishing a separate Kafka cluster for each team and overprovisioning resources to accommodate growth projections. Dialogues with Kafka users managing dozens of Kafka clusters revealed that approximately 70-80% of those clusters are underutilized.",[48,35952,35953],{},"When juxtaposed with the approaches of Kafka (Confluent) and Redpanda, StreamNative’s resolution to this predicament is fundamentally ingrained in the incorporation of native multi-tenancy features within the core of Apache Pulsar. Hence, it’s accessible across deployments utilizing Pulsar, irrespective of whether the deployment utilizes open-source Pulsar or StreamNative products.",[48,35955,35956],{},"Multi-tenancy is ascending as the imminent significant trend within streaming data systems for achieving cost efficiency. 
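For readers newer to Pulsar, it is worth restating what multi-tenancy looks like in practice: rather than running one cluster per team, the usual model is one cluster shared by many tenants, each with its own namespaces, quotas, and policies, which is precisely what keeps utilization high and infrastructure bills low. A minimal, hedged Java admin sketch of that workflow appears alongside the earlier discussion of Pulsar's tenant and namespace model; the underlying point here is that the isolation is enforced by the platform itself, not by provisioning separate clusters.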
I conjecture that to catch up with Pulsar and StreamNative, Confluent will inevitably integrate this feature into its future product offerings.",[40,35958,35960],{"id":35959},"bring-your-own-cloud-byoc-is-the-path-to-untangle-data-privacy-and-data-sovereignty-in-the-cloud","Bring Your Own Cloud (BYOC) is the path to untangle data privacy and data sovereignty in the cloud",[48,35962,35963,35964,35969],{},"While Jack Vanlightly's recent exposition on \"",[55,35965,35968],{"href":35966,"rel":35967},"https:\u002F\u002Fjack-vanlightly.com\u002Fblog\u002F2023\u002F9\u002F25\u002Fon-the-future-of-cloud-services-and-byoc",[264],"The Future Of Cloud Services And BYOC","\" has made for an engaging read, it notably leans towards a preference for Confluent Cloud. However, interactions with numerous vendors at the Current conference highlighted a considerable gap in perspective. A majority of vendors, including Decodable, Ververica, DeltaStream, RisingWave, and others, have expressed their unequivocal support for the Bring Your Own Cloud (BYOC) deployment model.",[48,35971,35972],{},"Currently, the prevalent offerings to manage data streaming platforms are self-hosted and vendor-hosted (SaaS). Both have distinct advantages and disadvantages.",[48,35974,35975],{},"Self-hosted solutions, revered for the unparalleled control they offer over data, are particularly appealing to organizations emphasizing data privacy, security, and sovereignty. However, these require significant initial investments in infrastructure and human resources.",[48,35977,35978],{},"In contrast, SaaS solutions serve as a comprehensive solution for setup, monitoring, maintenance, and scaling but might face challenges regarding transparency, access control, and residency, potentially resulting in trust issues.",[48,35980,35981],{},"Vendors championing BYOC assert that it amalgamates the advantages of both self-managed and SaaS solutions. It enables companies to set up their clusters within their Virtual Private Cloud (VPC), maintaining data within their environment while outsourcing operations and maintenance. This methodology not only assures data privacy and compliance but also facilitates scalability on the organization's infrastructure, aligning seamlessly with data sovereignty requisites.",[48,35983,35984],{},"Furthermore, BYOC allows organizations to capitalize on infrastructure discounts offered by cloud providers, rewarding long-term spending commitments with substantial discounts. In the prevailing economic recession, BYOC stands out as a beneficial approach, enabling organizations to optimize their existing cloud commitments.",[48,35986,35987],{},"Although the allure of a fully SaaS model is undeniable, the pragmatic reality underscores BYOC as a beacon for data sovereignty, providing a meticulously managed cloud model. Jack’s contention in his blog post is that BYOC falls short in delivering operational efficiency to customers, a statement that holds both validity and contradiction. It is indeed true for numerous systems, including Kafka, primarily due to its lack of multi-tenancy, necessitating the deployment of multiple Kafka clusters into a customer’s VPC by vendors. However, this is not the case for Apache Pulsar. Given its native multi-tenancy support, Pulsar inherently achieves operational efficiency even when deployed via BYOC.",[48,35989,35990],{},"The unfolding debate between Confluent and other BYOC vendors is indeed riveting. 
At StreamNative, we are steadfast in our belief that BYOC is the path forward for data privacy and sovereignty. It enables the provision of operational efficiency through native multi-tenancy and lays down robust foundations for ensuring data privacy and sovereignty.",[40,35992,35994],{"id":35993},"the-rise-of-a-data-streaming-platform-flink-is-the-de-facto-standard-for-stream-processing","The rise of a Data Streaming Platform; Flink is the de facto standard for stream processing",[48,35996,35997],{},"I am uncertain whether the nuanced shift in Confluent’s platform—from event streaming to data in motion and now to a data streaming platform—has caught widespread attention. This transformation occurred after Confluent’s acquisition of Immerok. This shift, arguably, signals the limitations of KStream and KSQL, as, within the framework of a Data Streaming platform, supporting two disparate processing technologies seems counterintuitive.",[48,35999,36000,36001,36006],{},"While Confluent is not the pioneer of Apache Flink, it has played a significant educational role in propagating this technology, inadvertently aiding other Flink vendors by positioning Flink more prominently in mainstream discussions. Ververica has maintained its market vigor post the Immerok spinoff. Conversations with the Ververica team resonate with palpable enthusiasm, making the upcoming ",[55,36002,36005],{"href":36003,"rel":36004},"https:\u002F\u002Fwww.flink-forward.org\u002Fseattle-2023",[264],"Flink Forward Seattle 2023"," in November a highly anticipated event. Beyond Confluent and Ververica, Decodable is streamlining Flink's intricate lower-level details to offer users simplified stream processing capabilities, and DeltaStream is introducing a serverless Streaming SQL platform empowered by Apache Flink.",[48,36008,36009],{},"Apache Flink, a noteworthy entity in the big-data ecosystem, has encountered critiques regarding its user-friendliness and cost-efficiency. Both Ververica and Confluent are navigating these challenges by providing fully managed Flink and Flink SQL services. However, emerging entities like RisingWave and Timeplus are demonstrating considerable potential to secure larger market segments.",[48,36011,36012,36013,36016],{},"Moreover, Streaming SQL is persistently generating discussions among vendors specializing in stream processing products. We at StreamNative, are slated to moderate a panel discussion “Streaming SQL: Databases Meet Stream Processing” with these vendors at the forthcoming ",[55,36014,33883],{"href":35532,"rel":36015},[264]," on Wednesday, October 25, in San Francisco. For those intrigued by industry trends surrounding Streaming SQL, this summit presents an invaluable opportunity to engage with the creators and vendors shaping streaming SQL.",[40,36018,319],{"id":316},[48,36020,36021],{},"The Current 2023 event showcased intriguing trends in the data streaming era, illuminating the future of multi-tenant data streaming platforms. These platforms are poised to support both event streaming and message queuing, facilitate interconnections between microservices and data pipelines\u002Fservices, and offer SQL and stream processing capabilities. The event was highly enlightening.",[48,36023,36024,36025,36028,36029,36033],{},"For those who have a keen interest in delving deeper into data streaming trends, I extend an invitation to attend the ",[55,36026,33883],{"href":35532,"rel":36027},[264]," on October 25, 2023. 
",[55,36030,36032],{"href":35357,"rel":36031},[264],"Register now ","to continue exploring the exciting realm of data streaming in San Francisco!",[48,36035,36036],{},[34077,36037],{"value":34079},{"title":18,"searchDepth":19,"depth":19,"links":36039},[36040,36041,36042,36043,36044],{"id":35921,"depth":19,"text":35922},{"id":35934,"depth":19,"text":35935},{"id":35959,"depth":19,"text":35960},{"id":35993,"depth":19,"text":35994},{"id":316,"depth":19,"text":319},"2023-10-02"," A comprehensive reflection on the Current 2023 data-streaming conference highlighting critical insights into the evolution of data streaming platforms. The article emphasizes Apache Pulsar's unique capabilities, the significance of multi-tenancy, BYOC's role in the cloud landscape, and the ascent of Apache Flink as the go-to for stream processing.",{},{"title":35912,"description":36046},"blog\u002Femerging-patterns-in-data-streaming-insights-from-current-2023",[1331,799,27847],"MkznTkod8KKEDUZiU785S4rTB0zki-B0RCEdagauZhE",{"id":36053,"title":36054,"authors":36055,"body":36057,"category":821,"createdAt":290,"date":36269,"description":36270,"extension":8,"featured":294,"image":36271,"isDraft":294,"link":290,"meta":36272,"navigation":7,"order":296,"path":36273,"readingTime":11180,"relatedResources":290,"seo":36274,"stem":36275,"tags":36276,"__hash__":36277},"blogs\u002Fblog\u002Fapache-pulsar-3-1.md","Introducing Apache Pulsar 3.1",[36056],"Tison Chen",{"type":15,"value":36058,"toc":36258},[36059,36062,36070,36074,36078,36087,36090,36093,36102,36106,36115,36118,36121,36129,36133,36136,36139,36142,36150,36154,36162,36165,36173,36177,36180,36183,36186,36188,36202,36206,36215,36256],[48,36060,36061],{},"We are proud to contribute to the release of Apache Pulsar 3.1.0, a new feature release! This is a remarkable community effort, with over 80 contributors submitting more than 360 commits for feature enhancements and bug fixes. We are glad to collaborate with all the contributors to make this release happen!",[48,36063,36064,36065,190],{},"This blog post will highlight some of the more prominent features. For a full list of changes, be sure to check ",[55,36066,36069],{"href":36067,"rel":36068},"https:\u002F\u002Fpulsar.apache.org\u002Frelease-notes\u002Fversioned\u002Fpulsar-3.1.0\u002F",[264],"the release notes",[40,36071,36073],{"id":36072},"whats-new-in-apache-pulsar-31","What’s new in Apache Pulsar 3.1?",[32,36075,36077],{"id":36076},"pluggable-topic-compaction-service","Pluggable topic compaction service",[48,36079,36080,36081,36086],{},"Pulsar's ",[55,36082,36085],{"href":36083,"rel":36084},"https:\u002F\u002Fpulsar.apache.org\u002Fdocs\u002F3.1.x\u002Fconcepts-topic-compaction\u002F",[264],"Topic Compaction"," feature provides a key-based data retention mechanism that allows users to keep only the most recent message associated with a specific key. This helps reduce storage space and improve system efficiency.",[48,36088,36089],{},"Data in topics can be stored in various formats. For example, KoP (Kafka protocol handler) can store data in Kafka format.",[48,36091,36092],{},"Previously, Pulsar always compacted topic data, assuming that messages were in the Pulsar data format. 
However, this approach had limitations, as it prevented protocol handlers from utilizing the topic compaction feature with customized data formats such as the Kafka format used by KoP.",[48,36094,36095,36096,36101],{},"That's why ",[55,36097,36100],{"href":36098,"rel":36099},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F20624",[264],"PIP-278"," introduced a pluggable topic compaction service interface to support customization of the actual compaction logic. This customization can be done while the major compaction task is still controlled by the Pulsar broker. This change primarily benefits protocol handlers developers.",[32,36103,36105],{"id":36104},"pluggable-partition-assignment-strategy","Pluggable partition assignment strategy",[48,36107,36108,36109,36114],{},"Pulsar offers robust support for ",[55,36110,36113],{"href":36111,"rel":36112},"https:\u002F\u002Fpulsar.apache.org\u002Fdocs\u002F3.1.x\u002Fconcepts-broker-load-balancing-concepts\u002F",[264],"load balancing"," to ensure efficient resource utilization across Pulsar clusters.",[48,36116,36117],{},"The fundamental unit for load balancing is the topic bundle, which refers to a group of topics within the same namespace.",[48,36119,36120],{},"Previously, the only strategy for assigning a topic to a topic bundle was consistent hashing. However, this strategy doesn't fit all scenarios.",[48,36122,36123,36128],{},[55,36124,36127],{"href":36125,"rel":36126},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fissues\u002F19806",[264],"PIP-255"," introduced a pluggable topic bundle (partition) assignment interface to allow customization of the assignment algorithm. This enables users to adjust the strategy according to their specific scenarios.",[32,36130,36132],{"id":36131},"metadata-size-threshold-for-compression","Metadata size threshold for compression",[48,36134,36135],{},"Previously, even if the metadata was small, we had to apply compression. Now, we support a size-based threshold.",[48,36137,36138],{},"Starting from version 2.9, Pulsar supports compressing managed ledger information and managed cursor information stored in the metadata store. This feature can significantly reduce the size of large metadata.",[48,36140,36141],{},"However, for small metadata, compression doesn't provide significant benefits and may consume unnecessary computational resources.",[48,36143,36144,36149],{},[55,36145,36148],{"href":36146,"rel":36147},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fissues\u002F20307",[264],"PIP-270"," introduces two configuration options: managedLedgerInfoCompressionThresholdInBytes and managedCursorInfoCompressionThresholdInBytes. These options allow users to customize the size threshold for compressing metadata, with the default value set to 16 KB.",[32,36151,36153],{"id":36152},"lazy-creation-of-offload-resources","Lazy creation of offload resources",[48,36155,36156,36161],{},[55,36157,36160],{"href":36158,"rel":36159},"https:\u002F\u002Fpulsar.apache.org\u002Fdocs\u002F3.1.x\u002Ftiered-storage-overview\u002F",[264],"Tiered storage"," is an essential technology that enables the migration of old topic data from BookKeeper to long-term and more cost-effective storage while maintaining transparent client access to the topic data.",[48,36163,36164],{},"Tiered storage operates through offloaders. 
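Offloading is usually driven by a namespace-level size threshold (it can also be triggered manually per topic). A hedged Java admin sketch, with an illustrative namespace and threshold:

```java
import org.apache.pulsar.client.admin.PulsarAdmin;

public class OffloadSketch {
    public static void main(String[] args) throws Exception {
        PulsarAdmin admin = PulsarAdmin.builder()
                .serviceHttpUrl("http://localhost:8080")           // placeholder
                .build();

        // Automatically offload ledgers to the configured long-term store
        // once a topic's BookKeeper footprint exceeds ~10 GiB.
        admin.namespaces().setOffloadThreshold(
                "my-tenant/my-namespace", 10L * 1024 * 1024 * 1024);

        admin.close();
    }
}
```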
Previously, when a topic was created, the offloader immediately generated the associated offload resources, even though these resources remained unused until the actual offloading task was triggered.",[48,36166,36167,36172],{},[55,36168,36171],{"href":36169,"rel":36170},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F20775",[264],"PR-20775"," modifies this behavior by lazily creating the offload blob store. This means that the actual allocation occurs only when the offloading task is triggered, preventing excessive preallocation of resources.",[40,36174,36176],{"id":36175},"compatibility-between-releases","Compatibility between releases",[48,36178,36179],{},"When upgrading an existing Pulsar installation, it's crucial to perform component upgrades in a sequential manner.",[48,36181,36182],{},"Starting from version 3.0, users have the option to perform live upgrades or downgrades between two consecutive LTS versions or two consecutive feature versions (which also include LTS versions).",[48,36184,36185],{},"For the 3.1 series, you should be able to upgrade directly from version 3.0 or downgrade from the subsequently released version 3.2. If you are currently using an earlier version, please ensure that you upgrade to version 3.0 before proceeding further.",[40,36187,752],{"id":749},[48,36189,36190,36191,36196,36197,190],{},"Pulsar 3.1.0 is now available for ",[55,36192,36195],{"href":36193,"rel":36194},"https:\u002F\u002Fpulsar.apache.org\u002Fdownload\u002F",[264],"download",". To get started with Pulsar, you can run a Pulsar cluster ",[55,36198,36201],{"href":36199,"rel":36200},"https:\u002F\u002Fpulsar.apache.org\u002Fdocs\u002F3.1.x\u002Fgetting-started-home\u002F",[264],"on your local machine, Docker, or Kubernetes",[40,36203,36205],{"id":36204},"getting-involved","Getting involved",[48,36207,36208,36209,36214],{},"Apache Pulsar is one of the fastest-growing open-source projects, recognized by the ",[55,36210,36213],{"href":36211,"rel":36212},"https:\u002F\u002Fthestack.technology\u002Ftop-apache-projects-in-2021-from-superset-to-nuttx\u002F",[264],"Apache Software Foundation"," as a Top 5 Project based on engagement. The vitality of Pulsar relies on continued community growth, which would not be possible without each and every contributor to the project. The Pulsar community welcomes contributions from anyone with a passion for open source, messaging, and streaming, as well as distributed systems! Looking for more ways to stay connected with the Pulsar community? Check out the following resources:",[321,36216,36217,36226,36245],{},[324,36218,36219,36220,36225],{},"Read the ",[55,36221,36224],{"href":36222,"rel":36223},"https:\u002F\u002Fpulsar.apache.org\u002Fcontribute\u002F",[264],"Apache Pulsar Contribution Guide"," to start your first contribution.",[324,36227,11133,36228,36233,36234,36239,36240,190],{},[55,36229,36232],{"href":36230,"rel":36231},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar",[264],"Pulsar GitHub repository",", follow ",[55,36235,36238],{"href":36236,"rel":36237},"https:\u002F\u002Ftwitter.com\u002Fapache_pulsar",[264],"@apache_pulsar"," on Twitter\u002FX , and join the ",[55,36241,36244],{"href":36242,"rel":36243},"https:\u002F\u002Fapache-pulsar.slack.com\u002F",[264],"Pulsar community on Slack",[324,36246,36247,36248,36250,36251,36255],{},"Get started with ",[55,36249,3550],{"href":30989}," to unlock the full power of Apache Pulsar as a cloud service. 
Follow ",[55,36252,36254],{"href":33664,"rel":36253},[264],"@streamnativeio"," on Twitter\u002FX for news.",[48,36257,3931],{},{"title":18,"searchDepth":19,"depth":19,"links":36259},[36260,36266,36267,36268],{"id":36072,"depth":19,"text":36073,"children":36261},[36262,36263,36264,36265],{"id":36076,"depth":279,"text":36077},{"id":36104,"depth":279,"text":36105},{"id":36131,"depth":279,"text":36132},{"id":36152,"depth":279,"text":36153},{"id":36175,"depth":19,"text":36176},{"id":749,"depth":19,"text":752},{"id":36204,"depth":19,"text":36205},"2023-10-01","Discover the new features in Apache Pulsar 3.1.0! Dive into pluggable compaction, partition strategies, metadata compression thresholds, and more.","\u002Fimgs\u002Fblogs\u002F653b0de15866dac5fbac0228_pulsar-1200-630.png",{},"\u002Fblog\u002Fapache-pulsar-3-1",{"title":36054,"description":36270},"blog\u002Fapache-pulsar-3-1",[302,821],"qrNyu17XwIU2EK1EoazUkU7DL2eMlBc6mpR_k1ZvIUw",{"id":36279,"title":36280,"authors":36281,"body":36282,"category":3550,"createdAt":290,"date":36269,"description":36383,"extension":8,"featured":294,"image":36384,"isDraft":294,"link":290,"meta":36385,"navigation":7,"order":296,"path":36386,"readingTime":11180,"relatedResources":290,"seo":36387,"stem":36388,"tags":36389,"__hash__":36390},"blogs\u002Fblog\u002Fstreamnative-cloud-console-2023-beginners-guide.md","A Beginner’s Guide to the StreamNative Cloud Console in 2023",[32707],{"type":15,"value":36283,"toc":36376},[36284,36287,36290,36293,36297,36300,36307,36311,36314,36320,36324,36327,36333,36337,36340,36346,36350,36353,36359,36361,36364,36374],[48,36285,36286],{},"As a next-generation streaming technology, Apache Pulsar has many features representing a huge improvement over Kafka and other streaming technologies. Better performance and scalability, cost savings thanks to multi-tenancy, and tiered storage are just a few.",[48,36288,36289],{},"With the new StreamNative Hosted QuickStart, it is easier than ever to get started building applications atop Pulsar. Now, application developers have access to a concise two-minute video for each component of the StreamNative Hosted onboarding experience, allowing them to quickly grasp the essential knowledge needed to get started on StreamNative Hosted.",[48,36291,36292],{},"The StreamNative Hosted QuickStart videos provide comprehensive insights into every aspect of the onboarding process. From cluster setup to sending messages, each video offers a concise yet comprehensive overview of the corresponding component. By watching these videos, developers can gain a clear understanding of the steps involved, allowing them to accelerate their progress and start utilizing the full capabilities of StreamNative Hosted in no time.",[40,36294,36296],{"id":36295},"create-a-service-account","Create a Service Account",[48,36298,36299],{},"Service accounts are specialized accounts that let developers connect their applications to Pulsar clusters. After creating a service account and using it in an application, they can then produce and consume messages from Pulsar topics. 
This video walks developers through service account creation.",[48,36301,36302],{},[55,36303,36306],{"href":36304,"rel":36305},"https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=KMVTz_yx3Ds&list=PL7-BmxsE3q4WHbm0eTZBjwy_R6CI9vH4j&index=1&pp=iAQB",[264],"watch video",[40,36308,36310],{"id":36309},"deploy-a-pulsar-cluster","Deploy a Pulsar cluster",[48,36312,36313],{},"While provisioning and deploying an open-source Pulsar can be tricky, on StreamNative Hosted it is as simple as clicking through a few settings. This video walks developers through connecting a cloud provider (either AWS or Google Cloud), what each setting means, and what settings to use, given the scale of messages a developer will send to Pulsar.",[48,36315,36316],{},[55,36317,36306],{"href":36318,"rel":36319},"https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=3q-xVJzL7Ok&list=PL7-",[264],[40,36321,36323],{"id":36322},"view-the-default-tenants-and-namespaces-and-learn-about-multi-tenancy","View the default tenants and namespaces, and learn about multi-tenancy",[48,36325,36326],{},"Tenants let multiple teams securely use the same Pulsar cluster without interfering with each other. This video explains how to view and understand tenants and namespaces, allowing each team within an organization to control and manage their own resources.",[48,36328,36329],{},[55,36330,36306],{"href":36331,"rel":36332},"https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=eh1OcIl9cR4&list=PL7-BmxsE3q4WHbm0eTZBjwy_R6CI9vH4j&index=3",[264],[40,36334,36336],{"id":36335},"create-a-topic","Create a topic",[48,36338,36339],{},"Topics are how messages are grouped within Pulsar clusters, and understanding them is essential to getting the most value out of Pulsar. This video introduces the concept of topics and how to create and manage them on StreamNative Hosted.",[48,36341,3931,36342],{},[55,36343,36306],{"href":36344,"rel":36345},"https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=jOHeTfdP3_g&list=PL7-BmxsE3q4WHbm0eTZBjwy_R6CI9vH4j&index=4&pp=iAQB",[264],[40,36347,36349],{"id":36348},"send-messages-to-a-topic","Send messages to a topic",[48,36351,36352],{},"Once a cluster is set up with topics and a service account is configured, it’s time to start sending messages! This video walks through setting up a Pulsar client via a client library or CLI tool and using the client to send messages.",[48,36354,36355],{},[55,36356,36306],{"href":36357,"rel":36358},"https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=WU9A10befA8&list=PL7-BmxsE3q4WHbm0eTZBjwy_R6CI9vH4j&index=5&pp=iAQB",[264],[48,36360,3931],{},[48,36362,36363],{},"Developers can come back to these videos whenever they need a refresher or want to explore more advanced features.",[48,36365,36366,36367,36370,36371,20076],{},"To learn more about StreamNative Cloud, ",[55,36368,36369],{"href":32238},"reach out to book a demo",". Or - check out ",[55,36372,36373],{"href":33995},"our recent blog post about future-proofing Kafka Applications with StreamNative Cloud",[48,36375,3931],{},{"title":18,"searchDepth":19,"depth":19,"links":36377},[36378,36379,36380,36381,36382],{"id":36295,"depth":19,"text":36296},{"id":36309,"depth":19,"text":36310},{"id":36322,"depth":19,"text":36323},{"id":36335,"depth":19,"text":36336},{"id":36348,"depth":19,"text":36349},"Discover StreamNative Hosted QuickStart with Apache Pulsar: Better performance, multi-tenancy, and easy onboarding. 
Watch concise videos for cluster setup, service accounts, topics, and more!","\u002Fimgs\u002Fblogs\u002F65651b43a218e58244845efe_image-10.png",{},"\u002Fblog\u002Fstreamnative-cloud-console-2023-beginners-guide",{"title":36280,"description":36383},"blog\u002Fstreamnative-cloud-console-2023-beginners-guide",[3550,799,821,27847,5954],"FwvyJWEJ40gkW0OOX2u3CpxsTejE3wMhfFukgGmiZO0",{"id":36392,"title":36393,"authors":36394,"body":36395,"category":7338,"createdAt":290,"date":36514,"description":36393,"extension":8,"featured":294,"image":36515,"isDraft":294,"link":290,"meta":36516,"navigation":7,"order":296,"path":36517,"readingTime":33691,"relatedResources":290,"seo":36518,"stem":36519,"tags":36520,"__hash__":36521},"blogs\u002Fblog\u002Fannouncing-speakers-and-agenda-for-pulsar-summit-north-america-2023.md","Announcing Speakers and Agenda for Pulsar Summit North America 2023!",[31718],{"type":15,"value":36396,"toc":36509},[36397,36400,36403,36412,36416,36421,36424,36427,36432,36435,36438,36443,36446,36449,36453,36468,36474,36478,36507],[48,36398,36399],{},"We are excited to announce our amazing speakers for Pulsar Summit North America 2023!",[48,36401,36402],{},"Join the Apache Pulsar community in person on October 25 at the Hotel Nikko for a full day of knowledge sharing, exciting announcements, and connecting with the vibrant Pulsar community.",[48,36404,36405,36406,36411],{},"The Pulsar Summit gathers developers, architects, and data engineers to discuss the latest in real-time data streaming and message queuing. Past Pulsar Summits have featured more than 200 interactive sessions presented by tech leaders from Intuit, Micro Focus, Salesforce, Splunk, Verizon Media, Tencent, and more. The Summits garnered 2,000+ global attendees representing top technology, fintech, and media companies, such as Google, Amazon, eBay, Microsoft, American Express, LEGO, Athena Health, Paypal, and many more.\n‍\n",[55,36407,36410],{"href":36408,"rel":36409},"https:\u002F\u002Fpulsar-summit.org\u002Fevent\u002Fnorth-america-2023\u002Fschedule",[264],"Full agenda here\n‍","\nThis year, Pulsar Summit North America will include tech deep dives, adoption stories, best practices, and insights into Pulsar’s global adoption and thriving community. Take a sneak peek below at a few of the featured sessions:",[40,36413,36415],{"id":36414},"featured-sessions","Featured Sessions",[1666,36417,36418],{},[324,36419,36420],{},"Streaming Machine Learning with Flink, Pulsar & Iceberg‍",[48,36422,36423],{},"David Christle, Staff Machine Learning Engineer, Discord Inc.",[48,36425,36426],{},"Learn how Discord leverages Pulsar along with Flink and Iceberg to power real-time machine learning applications for fighting abuse at scale & keeping over 150M active users safe. Together, the three technologies unlock faster feature engineering, backfilling, point-in-time accuracy, and minimize offline-online skew, making this architecture compelling for practical real-time ML in production.",[1666,36428,36429],{"start":19},[324,36430,36431],{},"The CAP Dilemma of Cloud-based Streaming Data Systems: Cost, Availability, Performance",[48,36433,36434],{},"Sijie Guo, Co-founder and CEO, StreamNative",[48,36436,36437],{},"Navigating the selection of a suitable streaming data technology for the cloud can be complex due to the myriad of factors involved. Among these, Cost, Availability, and Performance stand out as pivotal. Yet, a streaming data system in the cloud typically allows optimization in only two of these areas at once. 
In this session, we'll delve into the evolved CAP paradigm for Streaming Data systems in the cloud, analyze the behavior of various streaming data systems, and explore how Pulsar offers versatile solutions to cater to diverse needs.",[1666,36439,36440],{"start":279},[324,36441,36442],{},"A Journey to Deploy Pulsar on Cisco's Cloud Native IoT Platform",[48,36444,36445],{},"Alec Hothan, Principal Engineer, Cisco\nChandra Ganguly, Senior Director, Software Engineering, Cisco",[48,36447,36448],{},"Learn how Cisco, with one of the largest IoT platforms in the world, modernized this platform to be cloud native GitOps where Pulsar is replacing legacy message queue services and is deployed to multiple Kubernetes clusters. Discover how leveraging Pulsar contributed to dramatic performance improvement, operational overhead and OPEX reduction, including how they addressed disaster recovery, integrating with the cloud native platform, and integrating with the CI\u002FCD pipeline to allow seamless and fast Pulsar upgrades from the time a new release is available to rolling upgrades on test and production clusters using FluxCD.",[40,36450,36452],{"id":36451},"two-ways-to-participate","Two ways to participate:",[1666,36454,36455,36461],{},[324,36456,36457],{},[55,36458,36460],{"href":35537,"rel":36459},[264],"Sign up to attend.",[324,36462,36463,36464,20076],{},"Become a sponsor! Learn how your company can stand out as a thought leader and be highly visible to the Apache Pulsar community by becoming a Summit Sponsor. Secure your spot while they’re available ",[55,36465,267],{"href":36466,"rel":36467},"https:\u002F\u002Fpulsar-summit.s3.us-west-2.amazonaws.com\u002Fpulsar-north-america-2023-sponsorship-deck.pdf",[264],[48,36469,36470],{},[55,36471,36473],{"href":35537,"rel":36472},[264],"Register Now!",[40,36475,36477],{"id":36476},"more-resources","More Resources",[1666,36479,36480,36493],{},[324,36481,36482,36483,1154,36488,36492],{},"Apache Pulsar Training: Take the ",[55,36484,36487],{"href":36485,"rel":36486},"https:\u002F\u002Fwww.academy.streamnative.io\u002Ftracks",[264],"self-paced Pulsar courses",[55,36489,36491],{"href":36490},"\u002Ftraining\u002F","instructor-led Pulsar training"," developed by the original creators of Pulsar. This will get you started with Pulsar and help accelerate your learning.",[324,36494,36495,36496,36501,36502,36506],{},"Meet the Apache Pulsar community: ",[55,36497,36500],{"href":36498,"rel":36499},"https:\u002F\u002Fpulsar.apache.org\u002Fcommunity#section-welcome",[264],"Subscribe to the Pulsar mailing lists"," for user-related or Pulsar development discussions. 
You can also ",[55,36503,36505],{"href":31692,"rel":36504},[264],"join the Pulsar Slack"," for quick questions or join discussions on specialized topics.",[48,36508,3931],{},{"title":18,"searchDepth":19,"depth":19,"links":36510},[36511,36512,36513],{"id":36414,"depth":19,"text":36415},{"id":36451,"depth":19,"text":36452},{"id":36476,"depth":19,"text":36477},"2023-09-25","\u002Fimgs\u002Fblogs\u002F6494540e4fe43cd7a1b56d49_image-4.png",{},"\u002Fblog\u002Fannouncing-speakers-and-agenda-for-pulsar-summit-north-america-2023",{"title":36393,"description":36393},"blog\u002Fannouncing-speakers-and-agenda-for-pulsar-summit-north-america-2023",[5376,821,303],"LvNBHQL18gfpD1AcZEz7k1Q46XcgHzSooQ4gChk0m5U",{"id":36523,"title":34033,"authors":36524,"body":36527,"category":821,"createdAt":290,"date":37150,"description":37151,"extension":8,"featured":294,"image":290,"isDraft":294,"link":290,"meta":37152,"navigation":7,"order":296,"path":34032,"readingTime":31039,"relatedResources":290,"seo":37153,"stem":37154,"tags":37155,"__hash__":37156},"blogs\u002Fblog\u002Fextensible-load-balancer-pulsar-3-0.md",[36525,36526],"Heesung Sohn","Kai Wang",{"type":15,"value":36528,"toc":37132},[36529,36540,36543,36552,36556,36559,36562,36571,36575,36578,36581,36584,36587,36590,36593,36597,36600,36603,36606,36609,36612,36615,36619,36628,36632,36641,36644,36668,36681,36684,36688,36693,36696,36700,36705,36708,36711,36719,36728,36731,36735,36743,36752,36755,36766,36771,36774,36777,36780,36783,36787,36790,36799,36802,36810,36819,36822,36826,36829,36834,36837,36840,36845,36848,36851,36856,36859,36862,36867,36870,36873,36875,36879,36882,36885,36888,36891,36896,36899,36902,36907,36910,36918,36920,36923,36926,36931,36934,36942,36944,36947,36950,36953,36956,36961,36963,36966,36969,36974,36977,36980,36988,36991,36993,37001,37003,37006,37009,37011,37016,37018,37023,37025,37033,37036,37039,37047,37050,37052,37055,37060,37067,37070,37075,37078,37086,37089,37097,37100,37105,37107,37110,37118,37127,37130],[916,36530,36531],{},[48,36532,36533,36534,36539],{},"If you use the StreamNative Platform, refer to ",[55,36535,36538],{"href":36536,"rel":36537},"https:\u002F\u002Fdocs.streamnative.io\u002Fplatform\u002Fbroker-lb",[264],"this guide"," for steps to activate or update the Extensible Load Balancer. For those on the StreamNative Cloud, please reach out to the support team for help.",[40,36541,7347],{"id":36542},"intro",[48,36544,36545,36546,36551],{},"We are thrilled to introduce our latest addition to the Apache Pulsar version 3.0, ",[55,36547,36550],{"href":36548,"rel":36549},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fissues\u002F16691",[264],"Extensible Load Balancer",", which improves the existing Pulsar Broker Load Balancer. For those seeking more details, in this blog, we're sharing the specifics of the enhancements and the obstacles we've overcome during the implementation process.",[32,36553,36555],{"id":36554},"what-is-the-pulsar-broker-load-balancer","What is the Pulsar Broker Load Balancer?",[48,36557,36558],{},"The Pulsar Broker Load Balancer is a component within the Apache Pulsar messaging system. The Pulsar’s compute-storage separation architecture enables the Pulsar Broker Load Balancer to seamlessly balance groups(bundles*) of topic sessions among brokers without involving message copies. 
This helps ensure efficient broker resource utilization, prevents individual brokers' overloading or underloading, and provides fault tolerance by promptly redistributing the orphan workload to available brokers.",[48,36560,36561],{},"Topics are grouped into bundles in Pulsar, and Bundle is the broker load balancer unit.",[48,36563,36564,36565,36570],{},"The Pulsar community has recently made notable improvements to its ",[55,36566,36569],{"href":36567,"rel":36568},"https:\u002F\u002Fpulsar.apache.org\u002Fdocs\u002F3.1.x\u002Fconcepts-broker-load-balancing-overview\u002F",[264],"load balancer documentation",". We recommend looking at the updated documentation if you're interested in this topic.",[32,36572,36574],{"id":36573},"why-do-we-introduce-a-new-load-balancer","Why do we introduce a new load balancer?",[48,36576,36577],{},"Legacy Maintenance Issues: Over time, the old load balancer's architecture introduced a long-standing maintenance challenge due to its historical design decisions. The old load balancer's design might not have been as modular as desired, making introducing new architectures, strategies, or logic challenging without affecting the existing functionality. This lack of modularity could hinder experimenting with improvements and innovations. Keeping up with maintenance might have become increasingly difficult, leading to the need for a more modern and manageable solution. This complexity could hinder implementing new features or fixing issues quickly.",[48,36579,36580],{},"Scalability Issues: As Pulsar clusters grew with more brokers and topics, the load balancer faced challenges in efficiently distributing metadata(including load balance data). The mechanism for replicating load data across brokers via metadata store(e.g. ZooKeeper) watchers became less scalable, resulting in potential performance bottlenecks and increased replication overhead.",[48,36582,36583],{},"Load Balancing Strategy: The previous load balancing strategy might have needed to have been more optimal for evenly distributing the workload, especially when dealing with dynamic load changes in adding or removing brokers.",[48,36585,36586],{},"Topic Availability During Unloading: The old load balancer might have led to resource access conflicts, causing longer temporary unavailability of topics during the unloading process, affecting the user experience and resource utilization.",[48,36588,36589],{},"Centralized Decision Making: Only the leader broker makes load balance decisions in the previous load balancer. This centralized approach could create bottlenecks and limit the system's ability to distribute the workload efficiently.",[48,36591,36592],{},"Operation: Sometimes, debugging the load balance decisions could have been clearer. Observability needs to be improved.",[32,36594,36596],{"id":36595},"how-do-we-solve-the-problems-with-the-new-load-balancer","How do we solve the problems with the New Load Balancer?",[48,36598,36599],{},"Legacy Maintenance Challenges: The new load balancer is written with new classes with a cleaner design. This will facilitate easier maintenance, updates, and the integration of new features without disrupting the existing functionality. This enhances the system's manageability and adaptability over time.",[48,36601,36602],{},"Scalability Issues: To overcome scalability challenges, the new load balancer stores load and ownership data in Pulsar native topics and reads them via Pulsar table views. 
This reduces replication overhead and potential bottlenecks, ensuring smooth load data distribution even in larger Pulsar clusters.",[48,36604,36605],{},"Load Balancing Strategy: All load balancing strategies(assignment, unloading, and splitting) are revisited with the new load balancer to ensure better workload distribution. It adapts to dynamic changes and efficiently handles new broker additions and deletions, resulting in a more balanced and optimized distribution of tasks. Load balance operations and states’ idempotency have been revisited when retrying upon failures.",[48,36607,36608],{},"Topic Availability During Unloading: The new load balancer minimizes topic unavailability during unloading by pre-assigning the owner broker and gracefully transferring ownership with the bundle transfer option. This minimizes resource access conflicts and reduces temporary topic downtime, enhancing user experience.",[48,36610,36611],{},"Centralized Decision Making: The new load balancer explores decentralized decision-making (assignment and splitting), distributing load balance decisions to local brokers as much as possible rather than relying solely on a central leader. This minimizes bottlenecks, enabling more efficient and distributed workload management.",[48,36613,36614],{},"Operation: Besides, the new load balancer also introduces a new set of metrics and load balancer debug-mode dynamic config to print more useful load balance decisions in the logs.",[32,36616,36618],{"id":36617},"how-do-we-enable-the-new-load-balancer","How do we enable the New Load Balancer?",[48,36620,36621,36622,36627],{},"The community updated the ",[55,36623,36626],{"href":36624,"rel":36625},"https:\u002F\u002Fpulsar.apache.org\u002Fdocs\u002Fnext\u002Fconcepts-broker-load-balancing-migration\u002F",[264],"load balancer migration steps"," on the Pulsar website to explain how to migrate from the modular load balancer to the extensible load balancer and vice versa.",[40,36629,36631],{"id":36630},"extensible-load-balancer-design","Extensible Load Balancer Design",[48,36633,36634,36635,36640],{},"To summarize, the ",[55,36636,36639],{"href":36637,"rel":36638},"https:\u002F\u002Fpulsar.apache.org\u002Fdocs\u002Fnext\u002Fconcepts-broker-load-balancing-types\u002F",[264],"modular (current) and extensible (new) load balancers"," implement similar load balancing functionalities with different system designs.",[48,36642,36643],{},"For example, they both employ a similar approach to distributing data loads among brokers, including:",[321,36645,36646,36654,36661],{},[324,36647,36648,36649],{},"Dynamic ",[55,36650,36653],{"href":36651,"rel":36652},"https:\u002F\u002Fpulsar.apache.org\u002Fdocs\u002Fnext\u002Fconcepts-broker-load-balancing-concepts\u002F#bundle-assignment",[264],"bundle-broker assignment",[324,36655,36648,36656],{},[55,36657,36660],{"href":36658,"rel":36659},"https:\u002F\u002Fpulsar.apache.org\u002Fdocs\u002Fnext\u002Fconcepts-broker-load-balancing-concepts\u002F#bundle-splitting",[264],"bundle splitting",[324,36662,36648,36663],{},[55,36664,36667],{"href":36665,"rel":36666},"https:\u002F\u002Fpulsar.apache.org\u002Fdocs\u002Fnext\u002Fconcepts-broker-load-balancing-concepts\u002F#bundle-unloading",[264],"bundle unloading (shedding)",[48,36669,36670,36671,4003,36676,190],{},"However, for bundle ownership and load data stores, the modular load balancer uses a configurable metadata store (e.g., ZooKeeper), whereas the extensible load balancer uses Pulsar native 
",[55,36672,36675],{"href":36673,"rel":36674},"https:\u002F\u002Fpulsar.apache.org\u002Fdocs\u002Fnext\u002Fconcepts-messaging\u002F#system-topic",[264],"System topics",[55,36677,36680],{"href":36678,"rel":36679},"https:\u002F\u002Fpulsar.apache.org\u002Fdocs\u002Fnext\u002Fconcepts-clients\u002F#tableview",[264],"Table views",[48,36682,36683],{},"Table View has been introduced to Pulsar since 2.10, which provides a continuously updated key-value map view of the compacted topic data. This innovation greatly simplifies the new load balancer’s data architecture since each broker needs to publish load data to non-persistent (in-memory) system topics and replicate the latest views on table views. Similarly, for bundle ownership data, each broker can publish the ownership change messages to a persistent system topic and replicate the latest views on the table views.",[32,36685,36687],{"id":36686},"load-data-flow","Load Data Flow",[48,36689,36690],{},[384,36691],{"alt":18,"src":36692},"\u002Fimgs\u002Fblogs\u002F650df06a725d024ec33df194_179900738-b492415f-713a-4860-84ef-ab2aa8577240.png",[48,36694,36695],{},"The exchange of load data holds significant importance in achieving optimal load balancing, as incorrect or sluggish load data can negatively affect balancing efficiency. In this new design, brokers periodically share their broker load and top k bundle load data by publishing them to separate in-memory system topics. Each broker utilizes this broker load data for assignments and its local bundle load data for splitting, while the leader broker triggers the global bundle unloading based on both global broker and bundle load data. This new design decouples load data stores depending on the use cases to clean the data model and ensure the modularity of the load-balancing system.",[32,36697,36699],{"id":36698},"bundle-state-channel","Bundle State Channel",[48,36701,36702],{},[384,36703],{"alt":18,"src":36704},"\u002Fimgs\u002Fblogs\u002F650df08022183662e00eea8e_220518178-dadb7c34-f4c2-45ec-a85c-fa9f2ab1b2c3.png",[48,36706,36707],{},"The new load balancer introduced a bundle state machine like the above to define the possible states and transitions in the bundle(group of topics) life cycle. Also, to communicate these state changes across brokers and react to them, we introduced an event-source channel, Bundle State Channel, where each actor (broker) broadcasts messages containing these state transitions to the system topic and accordingly plays the roles upon received. Since these state changes persist in the system topic, it can ensure persistence, (eventual) consistency, and idempotency of the bundle state changes, even after failing and retrying.",[48,36709,36710],{},"Managing bundle ownership and resolving conflicts among brokers presents challenges. This complexity is heightened when multiple brokers concurrently assign ownership over the same bundle, or when such assignments occur during operations like bundle splitting or unloading. Effective conflict resolution strategies are pivotal in maintaining system integrity. Several approaches are available:",[1666,36712,36713,36716],{},[324,36714,36715],{},"Centralized Leadership Model: This method designates a singular leader among the brokers to oversee conflict resolution. The leader takes charge of resolving ownership conflicts and ensuring uniform state transitions. 
While this centralizes conflict resolution, it introduces the potential for a single point of failure and potential bottlenecks if the leader becomes overwhelmed.",[324,36717,36718],{},"Decentralized Approach: An alternative is a decentralized model, wherein individual brokers incorporate the identical conflict resolution mechanism. The brokers algorithmically deduce a consistent pathway for state transitions at the cost of additional messages on each broker.",[48,36720,36721,36722,36727],{},"The latter approach is pursued in the present implementation to circumvent reliance on the single leader. This involves embedding conflict resolution logic in each broker with the benefits of the “early broadcast” to defer the client lookups until the ownership is finalized. This could prevent clients from retrying lookups redundantly in the middle of bundle state changes. Also, this conflict resolution logic is straightforward to place on each broker without auxiliary metadata — given the linearized message sequence of this system topic, any message with a valid state transition and version ID will be accepted; otherwise, rejected. To generalize this custom conflict resolution strategy, the Pulsar community introduced a ",[55,36723,36726],{"href":36724,"rel":36725},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fissues\u002F18099",[264],"configurable conflict resolution strategy"," for both topic compaction and table views (only enabled for system topics as of today).",[48,36729,36730],{},"Also, we need to ensure the bundle ownership integrity recovers from disaster cases, such as network failure and broker crashes. Failure of this disaster recovery can cause ownership inconsistency, orphan ownerships, or state changes stuck in in-transit states. To rectify such invalid ownership states, the leader broker listens to any broker unavailability and metadata (ZooKeeper) connection stability and accordingly assigns new brokers. The leader also periodically monitors bundle states and fixes any invalid states that remain too long.",[32,36732,36734],{"id":36733},"transfershedder","TransferShedder",[48,36736,36737,36738,36742],{},"The new load balancer introduced a new shedding strategy, ",[55,36739,36734],{"href":36740,"rel":36741},"https:\u002F\u002Fpulsar.apache.org\u002Fdocs\u002Fnext\u002Fconcepts-broker-load-balancing-concepts\u002F#transfershedder",[264],". Here, we would like to highlight the following characteristics.",[48,36744,36745,36746,36751],{},"One major improvement is that the bundle transfer option makes the unloading process more graceful. Previously, upon unloading, the modular load balancer relied on clients’ lookups to assign new owner brokers via the leader broker. 
(note that the modular load balancer has ",[55,36747,36750],{"href":36748,"rel":36749},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F20822",[264],"recently improved this behavior",") However, with this bundle transfer option (by default), TransferShedder pre-assigns new owner brokers and helps clients bypass the client-leader-involved assignment.",[48,36753,36754],{},"Another major algorithmic change is with this transfer protocol, TransferShedder unloads bundles from the next highest load brokers to the next lowest load brokers until all of the following are true:",[321,36756,36757,36760,36763],{},[324,36758,36759],{},"The standard deviation of the broker load distribution is below the configured threshold.",[324,36761,36762],{},"There are no significantly underloaded brokers.",[324,36764,36765],{},"There are no significantly overloaded brokers.",[48,36767,36768],{},[384,36769],{"alt":18,"src":36770},"\u002Fimgs\u002Fblogs\u002F650df0b43c80ac729a203250_image.jpeg",[48,36772,36773],{},"Essentially, the goal is to keep the load distribution under the target at minimal steps. For this, TransferShedder tracks the global load score distribution(Standard Deviation) and tries to keep it lower than the configured threshold, loadBalancerBrokerLoadTargetStd, by moving the loads from the highest to the lowest loaded brokers. If there are any outliers (significantly underloaded or overloaded brokers), it will try to prioritize them to unload.",[48,36775,36776],{},"Also, it helps the load balance convergence. Too aggressive load balancing could often result in infinite unloading or bundle oscillation (bouncing bundles). One example is that if one broker is slightly more overloaded than the others, unloading a bundle from that broker might overload the other broker (again slightly more than others). If the target bundle unloading is not as effective, the logic should stop further unloading to avoid this bundle oscillation. The bundle transfer option enables TransferShedder to consider this case, which helps the load balance convergence.",[48,36778,36779],{},"TransferShedder uses the same methodology for broker load score computation as ThresholdShedder, which is based on the exponential moving average of the max of the weighted resource usages among CPU, memory, and network load. It also introduced the loadBalancerSheddingConditionHitCountThreshold config to further control the sensitivity of unloading decisions when the traffic pattern is spiky. Sometimes, traffic might burst and come down soon, and users might want to avoid triggering unloading. In this case, the user could increase this threshold to make the unloading less sensitive to traffic bursts.",[48,36781,36782],{},"Additionally, the extensible load balancer exposes loadBalancerMaxNumberOfBundlesInBundleLoadReport and loadBalancerMaxNumberOfBrokerSheddingPerCycle configs to control the maximum number of bundles and brokers for each unloading cycle. If users need to slow down the load balance impact and limit the impacted bundles and brokers for each unloading cycle (default 1 min), these configs could help them.",[32,36784,36786],{"id":36785},"operational-improvement","Operational Improvement",[48,36788,36789],{},"Recently, Pulsar improved the bundle unload command to specify the destination broker. 
This will continue to work for the new load balancer, so if manual unloading is needed, admins could try this command as a one-time resolution.",[48,36791,36792,36793,36798],{},"Operationally, we introduced additional metrics from this new load balancer. The community recently ",[55,36794,36797],{"href":36795,"rel":36796},"https:\u002F\u002Fpulsar.apache.org\u002Fdocs\u002F3.1.x\u002Freference-metrics\u002F#loadbalancing-metrics",[264],"updated the metrics page"," to reflect this addition. To summarize, we are trying to show additional breakdown metrics for the decision count grouped by the reason label.",[48,36800,36801],{},"It is also possible to closely monitor the load score for each broker. This will better inform the actual load score used for the load balance decision instead of tracking the root signals, such as memory, CPU, and network load. Additionally, there are other metrics to show what the current load score distribution (avg and std) is.",[321,36803,36804,36807],{},[324,36805,36806],{},"pulsar_lb_resource_usage_stats{feature=max_ema, stat=avg} (gauge) - The average of brokers' load scores.",[324,36808,36809],{},"pulsar_lb_resource_usage_stats{feature=max_ema, stat=std} - The standard deviation of brokers’ load scores.",[48,36811,36812,36813,36818],{},"We added a ",[55,36814,36817],{"href":36815,"rel":36816},"https:\u002F\u002Fgithub.com\u002Fstreamnative\u002Fapache-pulsar-grafana-dashboard\u002Fpull\u002F93",[264],"new sample load balancer dashboard for these metrics here",", so please try it and let us know if you have any questions about how to read them.",[48,36820,36821],{},"Lastly, we added a dynamic config loadBalancerDebugModeEnabled. Often, printing out logs can be the best way to debug issues, and under this debug flag, we tried to put as many decision logs as possible. You can enable this flag without restarting brokers and check the logs for the load balance decisions. This could help to tune the configs. Once the debugging is done, the admin can simply turn off this flag again without restarting brokers.",[40,36823,36825],{"id":36824},"modularcurrent-vs-extensiblenew-load-balancer-performance-tests","Modular(Current) vs. Extensible(New) Load Balancer Performance Tests",[48,36827,36828],{},"We performed four tests to evaluate performance improvement from this new load balancer. These four tests separately ran on the existing load balancer (modular load balancer) and the new one (extensible load balancer). These tests used Puslar-3.0.",[1666,36830,36831],{},[324,36832,36833],{},"Assignment Scalability Test:",[48,36835,36836],{},"Goal: When many clients reconnect, many systems, including Pulsar, suffer from “thundering herd reconnection,” where the brokers are suddenly bombarded by many reconnection(lookup) requests. We expect the shortened lookup path by the new load balancer to help in this scenario.",[48,36838,36839],{},"Methodology: We measure the start and end time to reconnect a large number of publishers(100k) when a large cluster(100 brokers) with many bundles(60k) restarts all brokers in a short time frame(2 mins).",[1666,36841,36842],{"start":19},[324,36843,36844],{},"Assignment Latency Test:",[48,36846,36847],{},"Goal: We are also interested in how the new load balancer improves individual message delays(how quickly an individual message can be re-published) when restarting brokers one by one. 
Similarly, we expect the shortened lookup path to reduce the latency in this scenario.",[48,36849,36850],{},"Methodology: We compare p99.99 latency of messages(10k partitions, 1000 bundles at 1000 msgs\u002Fs) published when a cluster(10 brokers) restarts brokers one by one.",[1666,36852,36853],{"start":279},[324,36854,36855],{},"Unload Test:",[48,36857,36858],{},"Goal: Automatic Topic(bundle) unloading helps load balancing, especially when scaling brokers up or down because such scaling events suddenly cause load imbalance. We expect the new way of sharing load data, via in-memory non-persistent topics, to propagate load data faster and more lightweight than the Metadata store(ZK). Also, we want to compare a new unloading strategy, TransferShedder, with the current default, ThresholdShedder.",[48,36860,36861],{},"Methodology: We compare time to unload and balance the load(100 bundles, 10k topics\u002F publishers) when a set of brokers joins\u002Fleaves the cluster(5→10, 10→5 broker scaling).",[1666,36863,36864],{"start":20920},[324,36865,36866],{},"Split(Hot-spot) Test:",[48,36868,36869],{},"Goal: Automatic bundle splitting is the other important Pulsar load balance feature when  topics are suddenly overloaded, “hot-spot.” This bundle split can isolate such hot-spot topics by splitting the owner bundles into smaller pieces. The child bundles can be more easily unloaded to other brokers to reduce the load on the issuing broker. We want to measure how the new load balancer can improve this process.",[48,36871,36872],{},"Methodology: We compare the time to split one bundle to 128 bundles and balance the load(10k topics\u002F publishers) when the topics have a high load.",[48,36874,3931],{},[32,36876,36878],{"id":36877},"test-results","Test Results",[48,36880,36881],{},"Assignment Scalability Test Result",[48,36883,36884],{},"100k Publisher Connection Recovery Time",[48,36886,36887],{},"Modular LB",[48,36889,36890],{},"At 12:25, the restart happened",[48,36892,36893],{},[384,36894],{"alt":18,"src":36895},"https:\u002F\u002Fuploads-ssl.webflow.com\u002F639226d67b0d723af8e7ca56\u002F650df1556821f1691a39d5d6_image%20(10).png",[48,36897,36898],{},"Extensible LB",[48,36900,36901],{},"At 09:33, the restart happened",[48,36903,36904],{},[384,36905],{"alt":18,"src":36906},"https:\u002F\u002Fuploads-ssl.webflow.com\u002F639226d67b0d723af8e7ca56\u002F650df1380d6c6e1a2489c0b6_image%20(4).png",[48,36908,36909],{},"Publisher Connection Recovery Time:",[321,36911,36912,36915],{},[324,36913,36914],{},"Modular LB: 20 mins",[324,36916,36917],{},"Extensible LB: 10 mins",[48,36919,3931],{},[48,36921,36922],{},"Assignment Latency Test Result",[48,36924,36925],{},"p99.99 Pub Latency when restarting brokers one by one (total 10 brokers)",[48,36927,36928],{},[384,36929],{"alt":18,"src":36930},"https:\u002F\u002Fuploads-ssl.webflow.com\u002F639226d67b0d723af8e7ca56\u002F650df16c5c0fb7011197b187_image%20(5).png",[48,36932,36933],{},"p99.99 Pub Latency:",[321,36935,36936,36939],{},[324,36937,36938],{},"Modular LB: 1841 ms",[324,36940,36941],{},"Extensible LB: 1228 ms",[48,36943,3931],{},[48,36945,36946],{},"Unload Test Result",[48,36948,36949],{},"‍Modular LB",[48,36951,36952],{},"At 01:38, scaled down from 10 to 5",[48,36954,36955],{},"At 01:58, scaled up from 5 to 10",[48,36957,36958],{},[384,36959],{"alt":18,"src":36960},"https:\u002F\u002Fuploads-ssl.webflow.com\u002F639226d67b0d723af8e7ca56\u002F650df18c9f9e9b8097b4948b_image%20(6).png",[48,36962,36898],{},[48,36964,36965],{},"At 21:58, scaled down from 10 to 
5",[48,36967,36968],{},"At 22:08, scaled up from 5 to 10",[48,36970,36971],{},[384,36972],{"alt":18,"src":36973},"https:\u002F\u002Fuploads-ssl.webflow.com\u002F639226d67b0d723af8e7ca56\u002F650df19b0d6c6e1a248a133a_image%20(7).png",[48,36975,36976],{},"Case 1: Time to balance the load from scaling down from 10 brokers to 5 brokers",[48,36978,36979],{},"Time to balance the load:",[321,36981,36982,36985],{},[324,36983,36984],{},"Modular LB: 5 mins",[324,36986,36987],{},"Extensible LB: 3 mins",[48,36989,36990],{},"Case 2: Time to balance the load from scaling up from 5 brokers to 10 brokers",[48,36992,36979],{},[321,36994,36995,36998],{},[324,36996,36997],{},"Modular LB: 7 mins",[324,36999,37000],{},"Extensible LB: 5 mins",[48,37002,3931],{},[48,37004,37005],{},"Split Test Result",[48,37007,37008],{},"Time to balance the load by splitting bundles starting from 1 bundle (up to 128 bundles) and unloading to 10 brokers",[48,37010,36887],{},[48,37012,37013],{},[384,37014],{"alt":18,"src":37015},"https:\u002F\u002Fuploads-ssl.webflow.com\u002F639226d67b0d723af8e7ca56\u002F650df1aa4a302a693609ed53_image%20(8).png",[48,37017,36898],{},[48,37019,37020],{},[384,37021],{"alt":18,"src":37022},"https:\u002F\u002Fuploads-ssl.webflow.com\u002F639226d67b0d723af8e7ca56\u002F650df1b65e7e9bcbec79b1c2_image%20(9).png",[48,37024,36979],{},[321,37026,37027,37030],{},[324,37028,37029],{},"Modular LB: 15 mins",[324,37031,37032],{},"Extensible LB: 13 mins",[48,37034,37035],{},"Also, with loadBalancerBrokerLoadTargetStd=0.1, the new load manager shows a better topic load balance, max - min= 1.1k -779  = 321, than the old load manager’s, 1.6k - 394= 1.2k, which is about 4x better.",[48,37037,37038],{},"Max Topic Count - Min Topic Count:",[321,37040,37041,37044],{},[324,37042,37043],{},"Modular LB: 1.1k -779  = 321",[324,37045,37046],{},"Extensible LB: 1.6k - 394= 1.2k",[48,37048,37049],{},"Please note that the split and unloading cycles occur concurrently. Because of that, unloading could be delayed if the next split occurs faster before unloading. We could further optimize this behavior by splitting the parent bundles in the n-way instead of the current 2-way and immediately triggering unloading post splits. Meanwhile, users could tune loadBalancerSplitIntervalMinutes(default 1min) and loadBalancerSheddingIntervalMinutes(default 1min) if they need to tune those frequencies.",[48,37051,3931],{},[48,37053,37054],{},"Test Result Summary",[48,37056,37057],{},[384,37058],{"alt":18,"src":37059},"\u002Fimgs\u002Fblogs\u002F650dfa4ad3d1705f95192450_Screenshot-2023-09-22-at-10.33.42-PM.png",[48,37061,37062,37063,37066],{},"As we shared earlier",[2628,37064,37065],{},"How do we solve the problems in New Load Balancer?",", the new load balancer implemented the following changes, and we are glad to share that these changes can help the above load balance cases up to 2x better.",[48,37068,37069],{},"Distributed load balance decisions",[321,37071,37072],{},[324,37073,37074],{},"Topic lookup and split decisions on every broker instead of going through the leader",[48,37076,37077],{},"Optimized load data sharing",[321,37079,37080,37083],{},[324,37081,37082],{},"The load data is shared in a shorter path. Broker and bundle load data are shared with other brokers via non-persistent(in-memory) Pulsar system topics instead of involving disk persistence in the metadata store(ZK). This makes load balance decisions more up-to-date. 
Pulsar takes one step closer to a ZK-less architecture.",[324,37084,37085],{},"The amount of shared load data is minimized. Each broker shares only the top K bundles’ load instead of all, which scales better when there are many bundles. Broker and bundle load data are decoupled into different topics because their update cadence differs with different consumption patterns.",[48,37087,37088],{},"Optimized ownership data sharing",[321,37090,37091,37094],{},[324,37092,37093],{},"Ownership data is shared via a Pulsar system topic instead of via metadata store (ZK).",[324,37095,37096],{},"Bundle ownership transfers(pre-assigns) to other brokers upon unloading and broker shutdown.",[48,37098,37099],{},"Improved Shedding algorithm",[321,37101,37102],{},[324,37103,37104],{},"TransferShedder improves the unloading behavior to redistribute the load with minimal steps.",[40,37106,2125],{"id":2122},[48,37108,37109],{},"Extensible Load Balancer reduced the ZK dependencies in Pulsar by Pulsar native topics and table views. Along with this architectural design change, the test data shows that distributed load balance decisions, optimized load data and ownership data sharing, and new load balance algorithms with the bundle transfer option help to improve the broker load balance performance.",[48,37111,37112,37113,37117],{},"Last year, the pulsar community worked hard to push this load balancer improvement project out to the public, including the ",[55,37114,37116],{"href":36624,"rel":37115},[264],"load balancer docs and migration steps",". We very much appreciate all of the contributors to this project. We are excited to introduce this new load balancer in Pulsar 3.0 with promising performance results.",[48,37119,37120,37121,37126],{},"Furthermore, in addition to this load balancer improvement, there are other innovations in Pulsar 3.0. We strongly recommend checking this ",[55,37122,37125],{"href":37123,"rel":37124},"https:\u002F\u002Fpulsar.apache.org\u002Fblog\u002F2023\u002F05\u002F02\u002Fannouncing-apache-pulsar-3-0\u002F",[264],"Pulsar-3.0 release post",", and we look forward to hearing feedback and contributions from the Pulsar community.",[48,37128,37129],{},"StreamNative proudly holds the position of a major contributor to the development of Apache Pulsar. Our dedication to driving innovation within the Apache Pulsar project remains resolute, and we are steadfast in our commitment to pushing its boundaries even further.",[48,37131,3931],{},{"title":18,"searchDepth":19,"depth":19,"links":37133},[37134,37140,37146,37149],{"id":36542,"depth":19,"text":7347,"children":37135},[37136,37137,37138,37139],{"id":36554,"depth":279,"text":36555},{"id":36573,"depth":279,"text":36574},{"id":36595,"depth":279,"text":36596},{"id":36617,"depth":279,"text":36618},{"id":36630,"depth":19,"text":36631,"children":37141},[37142,37143,37144,37145],{"id":36686,"depth":279,"text":36687},{"id":36698,"depth":279,"text":36699},{"id":36733,"depth":279,"text":36734},{"id":36785,"depth":279,"text":36786},{"id":36824,"depth":19,"text":36825,"children":37147},[37148],{"id":36877,"depth":279,"text":36878},{"id":2122,"depth":19,"text":2125},"2023-09-22","We are thrilled to introduce our latest addition to the Apache Pulsar version 3.0, Extensible Load Balancer, which improves the existing Pulsar Broker Load Balancer. 
For those seeking more details, in this blog, we're sharing the specifics of the enhancements and the obstacles we've overcome during the implementation process.",{},{"title":34033,"description":37151},"blog\u002Fextensible-load-balancer-pulsar-3-0",[3550,821],"tYJtCICpNx4uOv2GItUHeQ-J-9CWsMePb9psN6NyZRA",{"id":37158,"title":37159,"authors":37160,"body":37162,"category":290,"createdAt":290,"date":37403,"description":37404,"extension":8,"featured":294,"image":37405,"isDraft":294,"link":290,"meta":37406,"navigation":7,"order":296,"path":37407,"readingTime":4475,"relatedResources":290,"seo":37408,"stem":37409,"tags":37410,"__hash__":37411},"blogs\u002Fblog\u002Fintroducing-streamnative-pulsar-operators.md","Introducing StreamNative Operators for Apache Pulsar",[24776,37161],"Gilles Barbier",{"type":15,"value":37163,"toc":37392},[37164,37177,37181,37192,37200,37203,37206,37217,37221,37224,37241,37245,37248,37259,37263,37272,37275,37284,37288,37291,37302,37306,37309,37312,37318,37321,37327,37337,37341,37344,37347,37353,37356,37363,37386],[48,37165,37166,37167,37171,37172,37176],{},"Apache Pulsar is the most advanced data streaming technology available today, however, managing efficiently the different components of a Pulsar cluster can be a complex task. To ease this complexity, StreamNative released the ",[55,37168,37170],{"href":35495,"rel":37169},[264],"StreamNative Operator for Apache Pulsar"," and offered it under a ",[55,37173,37175],{"href":37174},"\u002Fcommunity-licence","free community license",", which allows enterprises to start working with open-source Pulsar without dealing with the initial complexity of operating it so that teams can focus on trying and start implementing applications on top of it.",[40,37178,37180],{"id":37179},"running-apache-pulsar-on-kubernetes","Running Apache Pulsar On Kubernetes",[48,37182,37183,37184,37187,37188,37191],{},"Apache Pulsar is the ideal choice for enterprises that want an open-source, elastic, and multi-tenant data streaming and messaging platform, allowing them to ",[55,37185,37186],{"href":34039},"centralize data management"," and drastically ",[55,37189,37190],{"href":32177},"reduce operation costs"," compared to having multiple clusters of older technologies such as Kafka or RabbitMQ. Well-designed for optimal functionality and flexibility in a containerized environment, its architecture is somehow sophisticated as it ties together multiple open-source projects, in particular Apache BookKeeper for its storage component, and Apache Zookeeper for the management of distributed metadata.",[48,37193,37194,37195,37199],{},"To ease this complexity, StreamNative - the company founded by the original creators of Apache Pulsar - released its battle-tested ",[55,37196,37198],{"href":35495,"rel":37197},[264],"Kubernetes operators for Apache Pulsar",", which embody StreamNative's accumulated expertise by the creators of Apache Pulsar and years of experience managing Pulsar clusters in a Kubernetes environment for large-scale companies like Verizon, Discord, or Iterable.",[48,37201,37202],{},"This set of Kubernetes operators makes it easy to deploy, manage, and scale Apache Pulsar clusters. 
They provide a declarative API that allows teams to define the Pulsar cluster configuration in Kubernetes manifests, and automatically manage the lifecycle of Pulsar brokers, proxies, and BookKeeper bookies.",[48,37204,37205],{},"The StreamNative Pulsar Operators offer a number of benefits, including:",[321,37207,37208,37211,37214],{},[324,37209,37210],{},"Simplified deployment: The operators automate the deployment of Pulsar clusters, so teams don't have to worry about manually configuring and managing the individual components.",[324,37212,37213],{},"High Availability: StreamNative Pulsar Operators set up clusters in a highly available manner by default. They manage replica placement, broker distribution, and failover mechanisms, ensuring that event streams stay reliable even in the face of failures.",[324,37215,37216],{},"Declarative configuration: The operators use a declarative API, so teams can define the Pulsar cluster configuration in Kubernetes manifests. This makes it easy to manage the Pulsar cluster and to roll back changes if necessary.",[32,37218,37220],{"id":37219},"protocol-ecosystems","Protocol ecosystems",[48,37222,37223],{},"StreamNative Pulsar Operators support pillars of the StreamNative protocol ecosystem: Kafka on Pulsar (KoP), MQTT on Pulsar (MoP), and AMQP on Pulsar (AoP). These projects extend the capabilities of Apache Pulsar by providing compatibility with popular messaging and streaming protocols, offering organizations the flexibility to choose the best fit for their event-driven architectures.",[48,37225,37226,37227,1186,37230,5422,37235,37240],{},"With StreamNative Pulsar Operators, teams can leverage ",[55,37228,35093],{"href":29592,"rel":37229},[264],[55,37231,37234],{"href":37232,"rel":37233},"https:\u002F\u002Fgithub.com\u002Fstreamnative\u002Fmop",[264],"MoP",[55,37236,37239],{"href":37237,"rel":37238},"https:\u002F\u002Fgithub.com\u002Fstreamnative\u002Faop",[264],"AoP"," to empower organizations to embrace event-driven architectures to open up new avenues for innovation in IoT, real-time communication, and enterprise integration.",[32,37242,37244],{"id":37243},"automated-operations","Automated operations",[48,37246,37247],{},"Like all stateful applications, a BookKeeper cluster on Kubernetes is challenging to manage, but StreamNative Pulsar Operators offer several enhancements to facilitate the bookies' operation:",[321,37249,37250,37253,37256],{},[324,37251,37252],{},"Auto disable autorecovery: when upgrading the bookie’s version or changing the bookie configurations, the operator will automatically disable the autorecovery service in the cluster to avoid unexpected ledger replication and enable the autorecovery back after the upgrade ends.",[324,37254,37255],{},"Auto decommission job: Before removing a bookie pod in the cluster, the StreamNative Pulsar Operator will automatically trigger a decommissioning job to start the ledger replication, avoiding the risk of data loss.",[324,37257,37258],{},"Auto JVM configuration: The operator supports generating the JVM configuration based on the current Pod size with StreamNative Pulsar experts’ best practice.",[32,37260,37262],{"id":37261},"olm-and-openshift-support","OLM and OpenShift support",[48,37264,37265,37266,37271],{},"StreamNative Pulsar Operators support the ",[55,37267,37270],{"href":37268,"rel":37269},"https:\u002F\u002Folm.operatorframework.io\u002F",[264],"Operator Lifecycle Manager"," to provide a simple declarative way to install, manage, and upgrade operators on a 
cluster.",[48,37273,37274],{},"Also, these operators are certified as Red Hat OpenShift Operators, which benefits:",[321,37276,37277,37280,37282],{},[324,37278,37279],{},"Enterprise-grade security and reliability: Organizations with strict security protocols can confidently use the operators to run Pulsar on OpenShift, knowing the operators meet Red Hat’s standards of security and reliability.",[324,37281,34368],{},[324,37283,34371],{},[32,37285,37287],{"id":37286},"cloud-native-networking","Cloud-native networking",[48,37289,37290],{},"Istio is a popular cloud-native networking platform, and StreamNative Pulsar Operators integrate with Istio to bring a better cloud-native experience for Pulsar users by:",[321,37292,37293,37296,37299],{},[324,37294,37295],{},"Support creating and managing Istio VirtualService and Gateway resources for Pulsar",[324,37297,37298],{},"Support exposing Pulsar Protocol, Kafka Protocol, and MQTT Protocol through the Istio Gateway in a unified way.",[324,37300,37301],{},"Leverage the Istio to provide encrypted traffic communication between Pulsar components",[32,37303,37305],{"id":37304},"getting-started-with-streamnative-pulsar-operators","Getting started with StreamNative Pulsar Operators",[48,37307,37308],{},"A few lines are enough to set and run an operational Pulsar cluster.",[48,37310,37311],{},"Using the Helm to start the StreamNative Pulsar Operators:",[8325,37313,37316],{"className":37314,"code":37315,"language":8330},[8328],"helm repo add streamnative https:\u002F\u002Fcharts.streamnative.io\nhelm repo update\nhelm upgrade --install pulsar-operator streamnative\u002Fpulsar-operator\n",[4926,37317,37315],{"__ignoreMap":18},[48,37319,37320],{},"Using below commands to provision a Pulsar cluster:",[8325,37322,37325],{"className":37323,"code":37324,"language":8330},[8328],"kubectl create ns pulsar\nkubectl apply -f https:\u002F\u002Fraw.githubusercontent.com\u002Fstreamnative\u002Fcharts\u002Fmaster\u002Fexamples\u002Fpulsar-operators\u002Fquick-start.yaml\n",[4926,37326,37324],{"__ignoreMap":18},[48,37328,37329,37330,37333,37334],{},"Look at our ",[55,37331,7120],{"href":35495,"rel":37332},[264]," for more details.\n",[384,37335],{"alt":18,"src":37336},"\u002Fimgs\u002Fblogs\u002Fa.png",[40,37338,37340],{"id":37339},"empowering-enterprises-with-advanced-streaming-technology","Empowering Enterprises with Advanced Streaming Technology",[48,37342,37343],{},"These tools are part of StreamNative's commitment to fostering a robust ecosystem for Apache Pulsar. By simplifying the deployment and management of Pulsar in an enterprise environment, we enable teams across your organization to benefit from this cutting-edge technology. Apache Pulsar offers superior messaging and streaming capabilities, and with the new StreamNative Kubernetes operator, it's more accessible than ever.",[48,37345,37346],{},"As data is increasingly becoming the backbone of successful decision-making and innovation, the value of an advanced, easy-to-manage data streaming technology cannot be overstated. We believe that our contribution will open the doors to new possibilities for businesses seeking to leverage real-time data. It is a significant step forward in making Apache Pulsar the go-to solution for enterprise data streaming needs.",[48,37348,37349,37350,37352],{},"The StreamNative Kubernetes operators for Apache Pulsar are available under a ",[55,37351,37175],{"href":37174},". 
We invite you to explore how these tools can enhance your data streaming capabilities and help you unlock the full potential of Apache Pulsar in your organization.",[40,37354,3550],{"id":37355},"streamnative-cloud",[48,37357,37358,37359,37362],{},"Some enterprises would want access to Pulsar experts, 24\u002F7 support, and fully managed Pulsar clusters. For those StreamNative proposes ",[55,37360,3550],{"href":37361},"\u002Fproduct",", a fully managed, enterprise-grade version of Pulsar with additional features such as:",[321,37364,37365,37368,37371,37374,37377,37380,37383],{},[324,37366,37367],{},"Enhanced compatibility with the Kafka protocol",[324,37369,37370],{},"Battle-tested Pulsar Functions",[324,37372,37373],{},"Audit Logs",[324,37375,37376],{},"Health detector",[324,37378,37379],{},"Additional Security & Authentication features",[324,37381,37382],{},"Managed connectors (sources & sinks)",[324,37384,37385],{},"In-house console and CLIs to manage your clusters",[48,37387,37388,37389,37391],{},"StreamNative Cloud can be deployed on-premise, fully managed on your public cloud account, or used in SaaS mode. StreamNative has already helped dozens of engineering teams worldwide make the move to Pulsar, ",[55,37390,24379],{"href":6392}," for more information.",{"title":18,"searchDepth":19,"depth":19,"links":37393},[37394,37401,37402],{"id":37179,"depth":19,"text":37180,"children":37395},[37396,37397,37398,37399,37400],{"id":37219,"depth":279,"text":37220},{"id":37243,"depth":279,"text":37244},{"id":37261,"depth":279,"text":37262},{"id":37286,"depth":279,"text":37287},{"id":37304,"depth":279,"text":37305},{"id":37339,"depth":19,"text":37340},{"id":37355,"depth":19,"text":3550},"2023-09-05","Discover StreamNative Pulsar Operators, a set of Kubernetes tools designed to streamline Apache Pulsar cluster deployment and management. 
With features like automated deployment, high availability, and declarative configuration, managing Pulsar clusters has never been easier.","\u002Fimgs\u002Fblogs\u002F64f75d15e8e697796d63d278_operator.jpg",{},"\u002Fblog\u002Fintroducing-streamnative-pulsar-operators",{"title":37159,"description":37404},"blog\u002Fintroducing-streamnative-pulsar-operators",[821,16985],"_yslhHlJyZ7w4ROvN6eS874VI8QWtUWJLqfR_KRyO4c",{"id":37413,"title":34007,"authors":37414,"body":37416,"category":821,"createdAt":290,"date":37921,"description":37922,"extension":8,"featured":294,"image":37923,"isDraft":294,"link":290,"meta":37924,"navigation":7,"order":296,"path":27695,"readingTime":16196,"relatedResources":290,"seo":37925,"stem":37926,"tags":37927,"__hash__":37928},"blogs\u002Fblog\u002Fhow-pulsars-architecture-delivers-better-performance-than-kafka.md",[37415,31294],"Yiming Zang",{"type":15,"value":37417,"toc":37899},[37418,37420,37423,37426,37430,37433,37437,37440,37445,37448,37451,37453,37457,37460,37465,37468,37471,37474,37482,37485,37491,37494,37498,37501,37504,37518,37522,37525,37529,37532,37535,37546,37550,37553,37556,37561,37564,37568,37573,37576,37579,37582,37585,37588,37590,37593,37597,37600,37603,37607,37610,37613,37617,37620,37623,37626,37629,37640,37643,37651,37655,37658,37663,37666,37670,37673,37678,37681,37695,37698,37700,37704,37708,37711,37716,37719,37724,37727,37730,37738,37742,37745,37750,37753,37758,37761,37764,37767,37770,37781,37784,37789,37791,37795,37798,37801,37815,37818,37838,37841,37845,37848,37851,37854,37868,37873,37876,37879,37882,37885,37887,37889,37892,37895],[40,37419,33228],{"id":33227},[48,37421,37422],{},"In the realm of distributed messaging systems, Apache Kafka and Apache Pulsar stand out as popular choices for high-throughput, real-time data streaming. While both platforms excel in their respective capabilities, Pulsar has garnered attention for its remarkable speed.",[48,37424,37425],{},"This may be surprising given that the architecture of Pulsar is more sophisticated, notably involving the presence of an extra network hop between multiple layers. And yet, despite the presence of a network, Pulsar can outperform Kafka in terms of performance. This article explains how this is possible.",[32,37427,37429],{"id":37428},"understanding-the-architectural-differences","Understanding the Architectural Differences",[48,37431,37432],{},"To comprehend the disparities in performance, it is crucial to examine the architectural variances between Pulsar and Kafka.",[32,37434,37436],{"id":37435},"kafka-architecture","Kafka Architecture",[48,37438,37439],{},"Apache Kafka operates on a three-tier architecture with the presence of Application Clients, Kafka Brokers and ZooKeeper.",[48,37441,37442],{},[384,37443],{"alt":18,"src":37444},"\u002Fimgs\u002Fblogs\u002F64dcd185982c22533d1ddaab_8-H9QT1d7gJ0-21EhE9rrdXVy6HMCUOLlpu7GiT2aTpYt-MivAYRANWyatgLVyKwzxynNxdI_5Lo_55yQhyMZ7f2nBYiaA9rLqoukZdS6NblRpq4yEMrZdZ3cV54P4F861xQWQe2yuKJtyoglYnfo7E.png",[48,37446,37447],{},"Kafka Producer and Consumer clients connect directly to Kafka brokers for reads and writes. 
Zookeeper serves as the metadata layer providing partition ownership and leader election, which doesn’t directly serve in the critical read or write path.",[48,37449,37450],{},"Additionally, by utilizing Kraft in a Kafka cluster, the traditional need for Zookeeper nodes could be eliminated in Kafka, resulting in a two-tier architecture.",[48,37452,3931],{},[32,37454,37456],{"id":37455},"pulsar-architecture","Pulsar Architecture",[48,37458,37459],{},"In comparison, Apache Pulsar adopts a four-tier architecture comprising Clients, Pulsar Brokers, Apache Bookkeeper, and a configurable Metadata Store(e.g ZooKeeper, Etcd, RocksDB or Oxia). In some cases, we would even need a five-tier architecture with a Pulsar Proxy.",[48,37461,37462],{},[384,37463],{"alt":18,"src":37464},"\u002Fimgs\u002Fblogs\u002F64dcd1851093465ed076eb03_x46m91bNCzjfpvlrhjCSLS8SU2Wzxrxb-YTh8lfggRhxvT9UlflVgS8vLQ_4EUqamP4j-7K6NK9v4zdGGK706_Q0shmd5tWiaJAY1xM4h56E_ZBIWCXB88Vvc8yGDVf4B9j6Wp7fm78jXd5NoPEf7Es.png",[48,37466,37467],{},"Pulsar Client can connect directly to Brokers to produce and consume messages as well as for topic lookup.",[48,37469,37470],{},"Pulsar proxy is an optional gateway component, which can be used when direct connections between Clients and Pulsar brokers are either infeasible or undesirable. For example, StreamNative Cloud adopts Istio in place of Pulsar Proxy to achieve high availability and performance.",[48,37472,37473],{},"Pulsar Brokers is a stateless component which serves mainly for two purposes:",[1666,37475,37476,37479],{},[324,37477,37478],{},"Topic lookup, which tells you which topic partition is owned by which broker",[324,37480,37481],{},"Dispatcher, which transfers data and dispatch them via managed ledger by talking to Bookkeeper",[48,37483,37484],{},"Bookkeeper is simply the storage layer, similar to Kafka brokers in the sense where it persists all the data and serves reads and writes.",[48,37486,37487,37488,37490],{},"Pulsar supports configurable Metadata Store (e.g ZooKeeper, Etcd, RocksDB, Oxia). Historically, Apache ZooKeeper has been the most popular primary Pulsar metadata store. StreamNative recently invented ",[55,37489,5599],{"href":21529},", a  better scalable metadata store. This metadata store is very critical to Pulsar since it is being used for coordination, and storing key metadata information such as topic partition ownership, Bookkeeper ledger metadata, etc.",[48,37492,37493],{},"When comparing the architectures of Pulsar and Kafka, it's noticeable that Pulsar assigns more granular roles to its components. However, intriguingly, Pulsar still manages to outperform Kafka in most scenarios. In the upcoming sections, we'll explore the factors contributing to Pulsar's impressive performance despite this architectural difference.",[40,37495,37497],{"id":37496},"kafka-lack-of-isolation","Kafka - Lack of Isolation",[48,37499,37500],{},"One prominent factor contributing to Kafka's relatively slower performance lies in its design.",[48,37502,37503],{},"Kafka does not inherently provide good IO isolation, leading to substantial interference between read and write traffic.",[321,37505,37506,37509,37512,37515],{},[324,37507,37508],{},"Read and writes can impact each other: Kafka brokers do not have separate dedicated IO threads exclusively for read and write requests. 
Besides, high write throughput can lead to increased disk and I\u002FO pressure on Kafka brokers which can affect read performance, and vice versa, high read rates from consumers can potentially impact write performance as well.",[324,37510,37511],{},"Lack of Isolation among different topics or partitions: There is no hard separation or resource isolation between topics or partitions, and they can all compete with each other for disk and network IO. This means one hot topic\u002Fpartition can impact its local neighbors' performance a lot in a persistent manner, which makes Kafka difficult to be maintained in a multi-tenant environment",[324,37513,37514],{},"Lack of isolation between disks, cpu and network resources: Kafka brokers act as both a serving layer and the persistence storage layer. You can not scale them out independently, which means you will always hit the bottleneck for one of the resources first, either the disk first or the cpu\u002Fnetwork first. For on-prem users, it’s fairly difficult to tune resource allocation to perfectly fit the use case or traffic pattern. For cloud users, unfortunately, Kafka itself is not very cloud friendly, e.g. running Kafka on K8S is still a challenge in terms of operation.",[324,37516,37517],{},"Tightly coupled partitioning model: Kafka partitions need to be scaled accordingly for the consumer jobs based on the data processing speed.  However, having too many partitions at the same time will hurt batch efficiency as well as compression rate. So it is sometimes hard to tune the number of partitions for a kafka topic.",[40,37519,37521],{"id":37520},"pulsar-better-isolation","Pulsar - Better Isolation",[48,37523,37524],{},"Apache Pulsar has been designed with a focus on achieving strong IO isolation, which is one of its key architectural strengths.",[32,37526,37528],{"id":37527},"separating-compute-from-storage","Separating Compute from Storage",[48,37530,37531],{},"Pulsar’s architecture allows for independent scaling of Brokers for compute and network, and scaling Bookies for disk space or IO, which provides a much better isolation and decoupling between disk and network throughput limitation.",[48,37533,37534],{},"This would be beneficial for certain scenarios, such as:",[321,37536,37537,37540,37543],{},[324,37538,37539],{},"Read heavy or high fan-out, but write throughput is low: You can independently scale out Brokers to handle more reads",[324,37541,37542],{},"Extremely short or long retention: You can dynamically scale in or out bookies based on disk usage you need",[324,37544,37545],{},"Write heavy or high fan-in, but reads are small: You can independently scale out bookies to handle more write throughput",[32,37547,37549],{"id":37548},"segmented-storage","Segmented Storage",[48,37551,37552],{},"Writes to a topic partition are split into segments, which are then stripped across multiple bookie nodes, instead of a single bookie.",[48,37554,37555],{},"This provides a better isolation between topics and partitions so that one hot topic or partition will not keep impacting other topics living on the same node.",[48,37557,37558],{},[384,37559],{"alt":18,"src":37560},"\u002Fimgs\u002Fblogs\u002F64dcd189f25c6bb275c31711_LTbd20CEAXx4_HkXBkvXeLEk8oVeg6fxDapyAKd2HTCgzv8JQD5yu1G1WkPC-0NoWExlATjMODHsU39VBOkDH54bCA0jYCZYxZ7UQaIH2YkzBJWC7ilGX0KdYdssILlnnCHIymEHjw4TVbwJsoGbc88.png",[48,37562,37563],{},"To be more specific, the data of any Pulsar partition will be spread across the whole bookkeeper cluster, unlike Kafka, all of the data of a single 
partition always stays on the same set of brokers. The benefit of this is that, if a single partition becomes really hot or overloaded, it’s not an issue for Pulsar because load will be spread evenly to all Bookies, but it’s a big problem for Kafka because it will overload the three Kafka brokers which own that partition, and thus cause issues for other topics owned by the same set of brokers.",[32,37565,37567],{"id":37566},"bookkeeper-io-isolation","Bookkeeper IO Isolation",[48,37569,37570],{},[384,37571],{"alt":18,"src":37572},"\u002Fimgs\u002Fblogs\u002F64dcd186c6b49b490adff562_3p_DpJnB_1M_Y-4nYX_Grzk7YdwAK2kno-icx6jNCqcIhBvuV_amY3rS1A8WEmT-6hgxr-Ij2EiNUUvDkSTuoEIVQ1OypZAcxQjBH9W0cwo5ZrpQVNAzNodwDs8b6b-_q_EJez8OH1rGbkcx1qSjh7s.png",[48,37574,37575],{},"Write Ahead Log (Journal)",[48,37577,37578],{},"BookKeeper uses a write-ahead log mechanism for durability. Data is first written to the Journal in sequential order and append-only manner before being persisted to the main ledger disk.",[48,37580,37581],{},"Storage Separation",[48,37583,37584],{},"BookKeeper separates the storage device used for the journal from the main ledger storage and the journal is usually stored on faster and more durable storage (e.g., SSDs) to handle the write-intensive workload effectively.",[48,37586,37587],{},"Tailing reads are always served from the memTable, and only catch up reads will be served from the Ledger Disk and Index Disk. As a result, heavy reads in Bookkeeper will not impact incoming write performance because they are served from different physical disks, and have pretty good isolation.",[32,37589,319],{"id":316},[48,37591,37592],{},"In summary, even though Kafka requires one or two less network hops than Pulsar, the latency overhead brought by read-write interference could potentially be a few magnitudes higher than network hop latency. Therefore, Kafka's performance can be influenced a lot by inefficiencies in other areas rather than the number of network hops.",[40,37594,37596],{"id":37595},"network-hops","Network Hops",[48,37598,37599],{},"Compared to disk performance, network latency is always a bigger concern since the network can become unreliable and thus network latency can go extremely high. A frequent concern raised by Pulsar users revolves around the network could potentially act as a bottleneck in Pulsar's multi-tier architecture.",[48,37601,37602],{},"Now let’s deep dive into this topic and see whether network hops are a real concern for Pulsar or not",[32,37604,37606],{"id":37605},"network-hop-latency-can-be-very-low","Network Hop Latency can be very Low",[48,37608,37609],{},"Network hop latency can be optimized and tuned to a very low level if machines are close to each other or sharing the same network. For example, servers in the same datacenter or region can achieve a ping latency with less than 1ms. EC2 instances on AWS within the same region typically have a ping latency with less than 1ms as well even for hosts in different AZs, and similar for GCP.",[48,37611,37612],{},"With this data in mind, as long as Pulsar components are all deployed within the same region, one or two extra hops would just mean a few extra milliseconds of latency overhead, which can be almost ignorable.",[32,37614,37616],{"id":37615},"write-path","Write Path",[48,37618,37619],{},"Just with the fact that Pulsar has a more complicated multi-layer architecture doesn’t mean it has more hops in its read and write critical paths. 
The actual end to end latency for publishing and consuming depends mainly on how it works under the hood.",[48,37621,37622],{},"So let’s dive more into how Kafka and Pulsar write paths look like.",[48,37624,37625],{},"Assuming the below replication settings being used:",[48,37627,37628],{},"Kafka Broker:",[321,37630,37631,37634,37637],{},[324,37632,37633],{},"ack=all",[324,37635,37636],{},"replication.factor=3",[324,37638,37639],{},"min.insync.replicas=2",[48,37641,37642],{},"Pulsar:",[321,37644,37645,37648],{},[324,37646,37647],{},"write quorum size=3",[324,37649,37650],{},"ack quorum size=2",[3933,37652,37654],{"id":37653},"kafka-leader-follower-replication","Kafka: Leader Follower Replication",[48,37656,37657],{},"Kafka relies on a poll model, where write requests are sent to the leader broker first. The leader broker then has to wait for ALL of its ISR followers to fetch and replicate data, and acknowledge them back. This is more error prone because one slow follower Broker in the ISR can cause write operations to become significantly slower or even time out, impacting overall performance. And if the Leader Broker is experiencing some slowness, writes will also be impacted.",[48,37659,37660],{},[384,37661],{"alt":18,"src":37662},"\u002Fimgs\u002Fblogs\u002F64dd0b67ab10f4ad122b02a2_KZRZqqxFl1ddQHlqGj9Bm8zCtqrCEejfPhlfwtgLrCfnbtKX0qbSoBJe06GFbm08u0amn32RHCjLyYsykjhso_02lBafqspiePHr7KBKxvprTfRHVS4rAodVwPtsvRvrA9XYuNCEt33J9XtFmh_Xlv8.png",[48,37664,37665],{},"Of course, we can choose to reduce replica.lag.time.max.ms config so that if the follower broker becomes too slow, it will drop out of ISR and then we will be able to satisfy the write requests faster with only 2 brokers. However it also means that you will observe under replicated partitions and constant ISR expand\u002Fshrink operations, which impacts overall durability and reliability.",[3933,37667,37669],{"id":37668},"pulsar-parallel-replication","Pulsar: Parallel Replication",[48,37671,37672],{},"In contrast, when writing to BookKeeper, Pulsar leverages a parallel replication strategy, where it sends the writes to all 3 bookies at the same time waiting for 2 acknowledgments.",[48,37674,37675],{},[384,37676],{"alt":18,"src":37677},"\u002Fimgs\u002Fblogs\u002F64dd0bd577cacd54094f9d86_FezkGPM3pEkWt-5ODFC7ybSL9p3Rp1MvR74OVbTQ-491oAidHEo6XilxjzFVE5pi8SCYvkfaOgZRFORSromTeORfgVelGHMRBh-XnR8QvKZBhf59fFYJjzVsxhbXJ4XuEeifXmchC658q5wslT1USjg.png",[48,37679,37680],{},"Since writes happen in parallel, with regards to write operations, both Pulsar and Kafka have to go through four hops end to end as shown in the above diagrams. 
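For reference, the durability settings assumed in this comparison map onto configuration roughly as follows; this is only an illustrative sketch, the topic and namespace names are placeholders, and exact flags may vary by version:

```bash
# Kafka side (illustrative): a topic with 3 replicas that requires 2 in-sync replicas,
# written by a producer configured with acks=all
bin/kafka-topics.sh --bootstrap-server localhost:9092 --create \
  --topic example-topic --partitions 3 \
  --replication-factor 3 --config min.insync.replicas=2

# Pulsar side (illustrative): namespace persistence policy with
# ensemble size 3, write quorum 3, ack quorum 2
bin/pulsar-admin namespaces set-persistence public/default \
  --bookkeeper-ensemble 3 --bookkeeper-write-quorum 3 --bookkeeper-ack-quorum 2 \
  --ml-mark-delete-max-rate 0
```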
Those hops on high level are:",[1666,37682,37683,37686,37689,37692],{},[324,37684,37685],{},"Client send a produce request to Kafka\u002FPulsar broker",[324,37687,37688],{},"Kafka\u002FPulsar Broker replicate data by talking to the other replicas",[324,37690,37691],{},"Kafka\u002FPulsar Broker get the acknowledgement response from other replicas",[324,37693,37694],{},"Kafka\u002FPulsar Broker send the write response back to the client",[48,37696,37697],{},"In conclusion, the major difference between Kafka and Pulsar for writes path is how the underline replication works, and the end to end number of hops are actually the same for both.",[48,37699,3931],{},[32,37701,37703],{"id":37702},"read-path","Read Path",[3933,37705,37707],{"id":37706},"kafka-read-from-leader","Kafka: Read from Leader",[48,37709,37710],{},"Let’s first look at how Kafka fetch requests are being served.",[48,37712,37713],{},[384,37714],{"alt":18,"src":37715},"\u002Fimgs\u002Fblogs\u002F64dcd18595ea02055a076cc2_z1FJ1-iAvP5B5QKCmWxQuHdXMxlPRxIMaxLIeFW0oEuhTgWa_ati9f_eYn6kj7SazBEXNsaivh5Av9byzQi6noHPsXo1OGbjQAx7APw-Hj7Z5UlB4xayDTxkd6gFWuSMI8HmkWyPe2_NdR0RjYkRkEU.png",[48,37717,37718],{},"For fetch requests, the Kafka client will send it to the leader broker and just wait for the leader to send the response back, so there are only 2 hops which is extremely simple.",[48,37720,37721],{},[384,37722],{"alt":18,"src":37723},"\u002Fimgs\u002Fblogs\u002F64dcd18537fd18a6625a4b54_4zSCNuAnTw0UGZiUx1y42GAaHvZ0yHI2UbZJbVbFfXpL_PhOUyKE_gbJcG3ykXs7quv8gZ85FGGZKvnVKKq-1Cil7IYhucmf3M4Sx6HP5paQPhiECig8patnJR-AuCPTTfUI5kM1nN_JGvFh5uj8pBg.png",[48,37725,37726],{},"If we look into what happens inside the leader broker, when the leader broker receives the FetchRequest, it will first try to fetch the batch of records from PageCache, and if it can’t find the data from PageCache, then it will start seeking the data from the disks.",[48,37728,37729],{},"This overall process looks simple, but one critical drawback for this approach is that it has a very tight dependency on the leader broker being healthy and responsive, not being overloaded or having some network issues.",[48,37731,37732,37733,37737],{},"As we know that leader election will only happen when the leader becomes totally unresponsive, but not being slow. Kafka by default doesn’t allow you to read from any follower replicas, so if the leader encounters any failures or becomes slow, it will slow down all the fetch requests by a lot and consumers will start falling behind. Although this can be improved by adopting a specific cross-region replica distribution model such as ",[55,37734,37736],{"href":30953,"rel":37735},[264],"KIP-392",", it is not straightforward to configure and thus it is not very widely adopted by most Kafka users.",[3933,37739,37741],{"id":37740},"pulsar-speculative-read","Pulsar: Speculative Read",[48,37743,37744],{},"Now let’s take a look at how Pulsar serves read requests.",[48,37746,37747],{},[384,37748],{"alt":18,"src":37749},"\u002Fimgs\u002Fblogs\u002F64dcd185982c22533d1ddb50_wVcJ9lOBUgWX8CLWxDYr0poJNti1sxXGIo0duwmFO6PAQtCqPe88-vluDJzL46CLkTzxs-31fkdBjp8sZoODwmDyaqVqVQzlO1WS8ZCtaiByLcRUmRWzHIYqd9xb8bhrNGzbnQPnrcTeVz3voWo1qQ4.png",[48,37751,37752],{},"Similar to the leader concept in Kafka, Pulsar also has an ownership concept for its brokers, so one of the Pulsar brokers will own the partition which the client is trying to consume from. 
Tailing reads would be served from the Pulsar Broker managed ledger cache, but in case of cache miss, Pulsar Broker will get the data from the Bookies via speculative read.",[48,37754,37755],{},[384,37756],{"alt":18,"src":37757},"\u002Fimgs\u002Fblogs\u002F64dcd185682ccbc3bd32a5ee_wu0lO6b2uiwXURQ0rwSMUNelBwqAF9KMK5plney6Uo-1rn_LiDlCASqpzQdrkbjfgZG6oyUFgl5kr1W-ptgYkD8pS3WJdcHJopc4CcPc2JesNNxiKZ0k0IRYwt5zzB4j80aYBBwrff6sWS0sE-cpGe0.png",[48,37759,37760],{},"As shown in the above graph, Bookies also have their own ledger cache to serve tailing reads, so most tailing read requests still won’t incur any disk read IOPs.",[48,37762,37763],{},"While Pulsar entails one potential additional network hop from the read perspective, it does not necessarily result in slower performance. This is because tailing read requests in Pulsar are primarily served from the Pulsar Broker managed ledger cache, rather than from Bookie storage layer. Thus, the extra network hop does not really impose a significant performance penalty for read operations.",[48,37765,37766],{},"Compared to Kafka, Pulsar's design has its own pros and cons.",[48,37768,37769],{},"Pros",[321,37771,37772,37775,37778],{},[324,37773,37774],{},"In most scenarios, serving tailing reads only need 2 hops for both Kafka and Pulsar",[324,37776,37777],{},"Kafka requests are sticky to the leader, whereas Pulsar can easily unload a partition and switch owner",[324,37779,37780],{},"Pulsar is resilient to single or multiple Bookie failures using speculative reads, where as Kafka can not serve reads when partition is offline",[48,37782,37783],{},"Cons",[321,37785,37786],{},[324,37787,37788],{},"When partition ownership change or catch up reads happens, Pulsar Broker will not have any cache in memory, so all reads will have to be served from Bookies, resulting in extra hops and network bandwidth usage",[48,37790,3931],{},[32,37792,37794],{"id":37793},"latency-sensitivity","Latency Sensitivity",[48,37796,37797],{},"Kafka is actually much more network latency sensitive than Pulsar, which means when there’s a network degradation, e.g. a bad\u002Fslow link, you would more often see latency impact in Kafka than in Pulsar for read and write.",[48,37799,37800],{},"One reason behind this is due to Kafka’s tightly coupled partitioning model. 
When a single broker becomes slow due to a degraded network, it will impact a lot of topic partitions across multiple brokers in the cluster.",[321,37802,37803,37806,37809,37812],{},[324,37804,37805],{},"As partition leader, its followers might have trouble fetching data for replication.",[324,37807,37808],{},"As a follower, it might have trouble to keep up with other leaders",[324,37810,37811],{},"Consumers can only fetch data from the partition leader, not the followers",[324,37813,37814],{},"Producers can only publish data to the partition leader, not the followers",[48,37816,37817],{},"Pulsar on the other hand is more resilient than Kafka in terms of network failures or degradation, this is benefited from several factors",[321,37819,37820,37823,37826,37829,37832,37835],{},[324,37821,37822],{},"Single slow bookie will have very minimum impact to the overall performance",[324,37824,37825],{},"Parallel Replication",[324,37827,37828],{},"Speculative Reads",[324,37830,37831],{},"Ensemble Change on write failures",[324,37833,37834],{},"Pulsar Broker and partitions are loosely coupled and topic partition can easily be unloaded",[324,37836,37837],{},"Auto Rebalance",[48,37839,37840],{},"Pulsar brokers are stateless, the broker load balancer can easily rebalance the topic ownerships based on the broker's dynamic load. This is also helpful when a broker is unavailable(network-partitioned, down), since orphan topics will be immediately reassigned to the available brokers.",[40,37842,37844],{"id":37843},"connection-limitations","Connection Limitations",[48,37846,37847],{},"Kafka suffers from fan-in connection limitations since, without a proxy layer, all connections are strictly routed to the partition leader. This limitation leads to significant garbage collection problems and restricts scalability to support a high number of producer connections.",[48,37849,37850],{},"To test out the connection limitation for Kafka brokers, we have done some specific benchmarks, in which we keep sending a fixed amount of write traffic, around 300k events\u002Fsecond with average event size of 500 bytes per broker, but through a different number of producer clients.",[48,37852,37853],{},"The Kafka benchmark is performed using the following bare metal setup:",[321,37855,37856,37859,37862,37865],{},[324,37857,37858],{},"CPU: Intel Xeon or AMD EPYC 2.1 GHz processor with 64 cores",[324,37860,37861],{},"Storage: Two 4TB RPM NVMe SSD drives (JBOD)",[324,37863,37864],{},"Memory: 512GB of RAM",[324,37866,37867],{},"Network: 100Gb Nic bandwidth",[48,37869,37870],{},[384,37871],{"alt":18,"src":37872},"\u002Fimgs\u002Fblogs\u002F64dcd185c0039ddb3b4a1c2f_d0ZdivC_Lfp3sTlg9Z_xoiC3cssf7xNtL6SA_3584-toJq6PjxQDj62iz3iRhU2oEMNsA4ztOibCpTQTwTd2onWzrTJqPrxMSg9YNAYYt9R2YvNdcg__wx2K1TP7YcHK8mRE9EDF2yQmUyw1HK7fp0M.png",[48,37874,37875],{},"The above Kafka benchmark result shows that as the total number of fan-in connections for each Kafka broker crossed over 100k, we started to see performance degradation. When the number of connections got close to 150k, we started to see more produce failures as well as high GC time (around 20 to 40 seconds)",[48,37877,37878],{},"Scaling the number of Kafka brokers just for connections is expensive, and it does not help a lot because when using round robin partitioning, each single client instance will still create a dedicated connection to every single broker. So adding more brokers doesn’t change the fact that the whole Kafka cluster can still only support up to around 120k-150k clients. 
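To put numbers on that: with round-robin partitioning, every producer instance opens a connection to every broker, so roughly 150,000 client instances against a 10-broker cluster put about 150,000 connections on each broker (around 1.5 million in total); adding brokers multiplies the total but leaves the per-broker count, and therefore the ceiling, essentially unchanged.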
The major reason why performance starts to degrade when connections are high is due to the GC overhead brought by extra connections.",[48,37880,37881],{},"In contrast, Pulsar overcomes this challenge by having stateless Pulsar Brokers as a proxy layer between client and storage, which can be independently scalable to handle fan-in connections.",[48,37883,37884],{},"With Pulsar brokers' stateless characteristic, adding new brokers is relatively cheap, and Pulsar Broker Load Balancer would seamlessly rebalance partitions in a bundle to the new brokers without the need to move data at the bookie layer. As a result, a Pulsar cluster can easily handle many more connections than a Kafka cluster with its more sophisticated architecture.",[48,37886,3931],{},[40,37888,2125],{"id":2122},[48,37890,37891],{},"Despite potential network bottlenecks in its architecture, Pulsar outperforms Kafka in terms of speed in a lot of scenarios. This advantage stems from efficient IO isolation, optimized read and write performance, expandable network bandwidth, and more scalable connection handling.",[48,37893,37894],{},"As both Pulsar and Kafka continue to evolve, it remains crucial for organizations to evaluate their strengths and weaknesses based on specific use cases and requirements. Understanding the nuances of their architectures empowers decision-makers to make informed choices when selecting the most suitable messaging system for their real-time data streaming needs.",[48,37896,37897],{},[34077,37898],{"value":34079},{"title":18,"searchDepth":19,"depth":19,"links":37900},[37901,37906,37907,37913,37919,37920],{"id":33227,"depth":19,"text":33228,"children":37902},[37903,37904,37905],{"id":37428,"depth":279,"text":37429},{"id":37435,"depth":279,"text":37436},{"id":37455,"depth":279,"text":37456},{"id":37496,"depth":19,"text":37497},{"id":37520,"depth":19,"text":37521,"children":37908},[37909,37910,37911,37912],{"id":37527,"depth":279,"text":37528},{"id":37548,"depth":279,"text":37549},{"id":37566,"depth":279,"text":37567},{"id":316,"depth":279,"text":319},{"id":37595,"depth":19,"text":37596,"children":37914},[37915,37916,37917,37918],{"id":37605,"depth":279,"text":37606},{"id":37615,"depth":279,"text":37616},{"id":37702,"depth":279,"text":37703},{"id":37793,"depth":279,"text":37794},{"id":37843,"depth":19,"text":37844},{"id":2122,"depth":19,"text":2125},"2023-08-16","Pulsar outperforms Kafka in terms of speed in a lot of scenarios. This advantage stems from efficient IO isolation, optimized read and write performance, expandable network bandwidth, and more scalable connection handling. 
","\u002Fimgs\u002Fblogs\u002F64dd0f2ac53f5aded86ca0c4_Illustration-2.png",{},{"title":34007,"description":37922},"blog\u002Fhow-pulsars-architecture-delivers-better-performance-than-kafka",[799,27847],"H0WumwRBLCe_azd5noGLXIVx9gqpse2cxYZAxYj-vrQ",{"id":37930,"title":34040,"authors":37931,"body":37933,"category":290,"createdAt":290,"date":38124,"description":38125,"extension":8,"featured":294,"image":38126,"isDraft":294,"link":290,"meta":38127,"navigation":7,"order":296,"path":34039,"readingTime":4475,"relatedResources":290,"seo":38128,"stem":38129,"tags":38130,"__hash__":38131},"blogs\u002Fblog\u002Fcompliance-and-data-governance-with-apache-pulsar-and-streamnative.md",[37161,37932],"Marshall Portwood",{"type":15,"value":37934,"toc":38118},[37935,37938,37941,37944,37958,37961,37970,37974,37977,37997,38000,38004,38007,38048,38051,38054,38058,38061,38093,38096,38098,38101,38104,38107],[48,37936,37937],{},"In today's modern enterprises, engineering teams are confronted with multiple challenges. These include not only meeting strict deadlines but also ensuring adherence to regulatory compliance and establishing robust data governance.",[48,37939,37940],{},"Non-compliance can lead to severe penalties, reputational damage, and loss of customer trust. Therefore, it is imperative for organizations to leverage robust technologies that can facilitate comprehensive data compliance, especially in regulated and compliance-driven industries.",[48,37942,37943],{},"One such technology that has emerged as a powerful tool in this context is Apache Pulsar, an open-source distributed streaming and messaging system originally created at Yahoo and now part of the Apache Software Foundation. Apache Pulsar has become one of the most powerful pieces of technology for those concerned with data compliance:",[321,37945,37946,37949,37952,37955],{},[324,37947,37948],{},"Its multitenancy feature facilitates logical data separation based on teams, applications, or customers.",[324,37950,37951],{},"It enables message replay for verifying data processing activities and long-term data retention to maintain an audit trail.",[324,37953,37954],{},"Its built-in schema registry assures only predefined data schemas are accepted.",[324,37956,37957],{},"It offers fine-grained access control, end-to-end data encryption for secure transportation, and supports multiple enterprise-grade authentication protocols to prevent unauthorized access.",[48,37959,37960],{},"We will discuss how those features meet the most demanding technical requirements for building a compliant data management system today, taking GDPR as an example.",[48,37962,37963,37964,37966,37967,190],{},"Those features are just a few of the reasons that enterprises are turning to Apache Pulsar to solve these issues and choosing ",[55,37965,4496],{"href":10259},", a company founded by the original creators of Apache Pulsar, to help them with an enterprise-grade managed ",[55,37968,37969],{"href":37361},"out-of-the-box solution",[40,37971,37973],{"id":37972},"technical-requirements-for-building-a-compliant-data-management-system-today","Technical Requirements for Building a Compliant Data Management System Today",[48,37975,37976],{},"Building a compliant data management system is a complex but essential task. 
It requires key technical capabilities to ensure data security, privacy, integrity, and accessibility to comply with regulatory requirements:",[321,37978,37979,37982,37985,37988,37991,37994],{},[324,37980,37981],{},"Data Security: A compliant data management system must have robust security measures in place. This includes encryption of data, strong access controls to prevent unauthorized access and secure data transfer protocols.",[324,37983,37984],{},"Data Privacy: Privacy is a fundamental aspect of data compliance. The system should have measures such as anonymization and pseudonymization to protect sensitive data. It should also ensure secure data storage and provide controls for data subjects to manage their data.",[324,37986,37987],{},"Data Integrity: Ensuring the accuracy, consistency, and reliability of data is crucial. The system should have data validation and integrity checks to prevent data corruption or loss. It should also support data versioning to track changes over time.",[324,37989,37990],{},"Data Retention and Deletion: Regulatory requirements often specify how long certain types of data should be retained and when they should be deleted. The system should have clear data retention and deletion policies and mechanisms to enforce them.",[324,37992,37993],{},"Auditability: A compliant data management system should have comprehensive logging and monitoring capabilities. This allows for auditing and accountability, ensuring that all data processing activities are transparent and traceable.",[324,37995,37996],{},"Data Portability: Regulations like GDPR require that individuals should be able to move, copy or transfer their personal data easily from one IT environment to another. The system should support data portability to comply with such requirements.",[48,37998,37999],{},"Building a compliant data management system with these capabilities can be a challenging task. However, technologies like Apache Pulsar can significantly simplify this process. In the following sections, we will explore how.",[40,38001,38003],{"id":38002},"key-features-of-apache-pulsar-for-data-compliance","Key Features of Apache Pulsar for Data Compliance",[48,38005,38006],{},"Apache Pulsar's design and features make it a powerful tool for building a compliant data streaming and messaging system. Here are some key features contributing to data compliance:",[321,38008,38009,38017,38020,38028,38031,38034,38041],{},[324,38010,38011,38012,38016],{},"Multitenancy: Apache Pulsar's multi tenancy feature allows for logical separation of data within the same Pulsar instance. This means that data from different teams (tenants), applications, or customers ",[55,38013,38015],{"href":38014},"\u002Fblog\u002Fpulsar-isolation-depth-look-how-to-achieve-isolation-in-pulsar","can be isolated"," and remain invisible to others. Different tenants can have different policies. For example, one tenant might require data to be retained for seven years due to regulatory requirements, while another tenant might only need data to be retained for one year. Furthermore, having one Pulsar instance - instead of multiple Kafka clusters for example - inherently reduces risks by allowing a centralized management of data security and privacy.",[324,38018,38019],{},"Message Replay: Apache Pulsar allows for message replay, which means that data can be reprocessed from a certain point in time. Message replay can be used to verify the accuracy of data processing activities. 
For example, in an audit, it might be necessary to replay messages to verify that all transactions were processed correctly.",[324,38021,38022,38023,38027],{},"Long-term Retention: With Apache Pulsar, data ",[55,38024,38026],{"href":38025},"\u002Fblog\u002Fdeep-dive-into-topic-data-lifecycle-apache-pulsar","can be stored for extended periods"," at a reasonable cost, thanks to its tiered-storage feature. Retaining messages for a certain period allows organizations to have an audit trail of data. This can be crucial for investigations or audits to verify that data processing activities comply with internal policies and external regulations. This feature is mandatory for compliance with regulations that require data to be retained for specific periods.",[324,38029,38030],{},"Schema Registry: Apache Pulsar has a built-in schema registry. Using schemas ensures that only data conforming to a predefined schema is accepted, preventing data corruption. Schemas ensure consistency and reliability of data across multiple producers and consumers through the organization. The schema registry supports schema evolution, which means that schemas can be updated over time while maintaining compatibility with older versions. This is crucial for data compliance as it allows organizations to adapt to changing data requirements while ensuring that older data is still valid and accessible.",[324,38032,38033],{},"Fine-Grained Access Control: Apache Pulsar allows administrators to control who can publish to a topic, who can subscribe to a topic, and who can consume from a topic. This can be configured at the namespace level or the individual topic level, providing a high degree of flexibility and control. This feature is crucial for maintaining data security and privacy, as many regulations require organizations to ensure that only authorized individuals can access certain types of data.",[324,38035,38036,38037,38040],{},"End-to-End Encryption: Apache Pulsar supports ",[55,38038,38039],{"href":34046},"end-to-end encryption of data",". This means that data is encrypted from the point it enters the system until it reaches the intended recipient, safeguarding data privacy and security during transit and storage.",[324,38042,38043,38044,38047],{},"Enterprise-Grade Authentication Protocols: Apache Pulsar supports ",[55,38045,38046],{"href":34046},"multiple authentication providers",", including JWT, Athenz, Kerberos, and TLS. These enterprise-grade authentication protocols prevent unauthorized access to data.",[48,38049,38050],{},"These features of Apache Pulsar not only ensure data compliance but also provide flexibility and scalability, making it a suitable choice for organizations of all sizes.",[48,38052,38053],{},"In the next section, we will discuss how Apache Pulsar aligns with the requirements of the General Data Protection Regulation (GDPR).",[40,38055,38057],{"id":38056},"apache-pulsar-and-gdpr-compliance","Apache Pulsar and GDPR Compliance",[48,38059,38060],{},"The General Data Protection Regulation (GDPR) is a critical regulation in the data privacy landscape that applies to all organizations processing the personal data of individuals in the European Union. It imposes strict requirements on data security, privacy, and governance. 
Let's explore how Apache Pulsar's features align with these requirements:",[321,38062,38063,38066,38069,38072,38075,38078,38081,38084,38087,38090],{},[324,38064,38065],{},"Data Minimization and Purpose Limitation: GDPR mandates that only necessary data should be collected and processed for specified, explicit, and legitimate purposes. Apache Pulsar's schema registry ensures that only data conforming to a predefined schema is accepted, thereby supporting data minimization. Its multi tenancy feature allows for logical separation of data, ensuring that data is processed only for its intended purpose.",[324,38067,38068],{},"Data Accuracy: GDPR requires that personal data should be accurate and kept up to date. Apache Pulsar's schema validation helps maintain data accuracy by ensuring that only data conforming to the schema is accepted.",[324,38070,38071],{},"Data Security: GDPR requires organizations to implement appropriate technical and organizational measures to ensure data security. Apache Pulsar's features such as access control, end-to-end encryption, and multi tenancy provide robust data security.",[324,38073,38074],{},"Accountability and Transparency: Under GDPR, organizations must be able to demonstrate compliance with data protection principles and provide transparent information to data subjects about how their data is processed. Apache Pulsar's message replay and comprehensive logging capabilities support auditing and transparency.",[324,38076,38077],{},"Data Portability: GDPR gives individuals the right to receive their personal data in a structured, commonly used, and machine-readable format. Apache Pulsar's flexible data processing capabilities make it easy to retrieve and aggregate data of a specific user.",[324,38079,38080],{},"Data Retention: GDPR mandates that personal data should not be retained longer than necessary. Apache Pulsar's long-term retention feature allows organizations to implement and enforce data retention policies.",[324,38082,38083],{},"Right To Erasure: Under Article 17 of the GDPR, individuals have the right to have their personal data erased under circumstances where the data is no longer necessary for the purpose it was originally collected. This can be implemented in a few different ways with Apache Pulsar:",[324,38085,38086],{},"~ Topic deletion: The high cardinality of topics in Apache Pulsar allows for an architecture where individual topics can be dedicated to specific users or customers. By deleting these dedicated topics, organizations can effectively exercise the right to erasure.",[324,38088,38089],{},"~ Encryption techniques: By proactively eliminating the encryption key associated with a particular user's data, an organization effectively renders the user's data irretrievable.",[324,38091,38092],{},"~ Data Retention Policies: Apache Pulsar's ability to set data retention policies at the namespace or topic level can be leveraged for GDPR compliance. 
For example, by configuring a policy to discard messages immediately after they are consumed, organizations can ensure that personal data is not unnecessarily retained, aligning with GDPR's data minimization principles and right to erasure.",[48,38094,38095],{},"By aligning with these GDPR requirements, Apache Pulsar can help organizations not only achieve compliance but also build trust with their customers by ensuring the protection of their personal data.",[40,38097,2125],{"id":2122},[48,38099,38100],{},"As we have explored in this article, Apache Pulsar, with its robust and flexible features, provides a comprehensive solution to meet the complex requirements of data compliance, even more in highly regulated industries where data lineage, governance, and compliance are critical.",[48,38102,38103],{},"Apache Pulsar's features such as multitenancy, schema registry, long-term retention, message replay, access control, and end-to-end encryption, all contribute to ensuring data security, privacy, and integrity. For example, Apache Pulsar lets you align with the General Data Protection Regulation (GDPR) making it a compelling choice for organizations operating in or dealing with the European Union.",[48,38105,38106],{},"Of course, it's important to remember that while technology provides the tools for data compliance, it is the organization's responsibility to implement and maintain these tools effectively. Compliance is not a one-time task but an ongoing process that requires continuous monitoring, evaluation, and improvement.",[48,38108,38109,38110,38114,38115,38117],{},"Interested by Pulsar? StreamNative helps engineering teams worldwide make the move to Pulsar. Founded by the original creators of Apache Pulsar, StreamNative is one of the leading contributors to the open-source project and the author of the ",[55,38111,38113],{"href":35495,"rel":38112},[264],"StreamNative Operators"," for running Apache Pulsar on Kubernetes, and of ",[55,38116,3550],{"href":37361},", a fully managed service to help teams accelerate time-to-production.",{"title":18,"searchDepth":19,"depth":19,"links":38119},[38120,38121,38122,38123],{"id":37972,"depth":19,"text":37973},{"id":38002,"depth":19,"text":38003},{"id":38056,"depth":19,"text":38057},{"id":2122,"depth":19,"text":2125},"2023-07-24","Article describing how modern enterprises leverage Apache Pulsar for challenges like regulatory compliance, data governance and deadlines. 
Apache Pulsar is ideal for data compliance in regulated industries.","\u002Fimgs\u002Fblogs\u002F64bed6a5fbd7456e630fde0e_compliance.png",{},{"title":34040,"description":38125},"blog\u002Fcompliance-and-data-governance-with-apache-pulsar-and-streamnative",[7347,821],"ulCDKHezvVtN0EzdaTmQmBWQVsQVb2u6dArBueiGFeY",{"id":38133,"title":38134,"authors":38135,"body":38136,"category":821,"createdAt":290,"date":38433,"description":38434,"extension":8,"featured":294,"image":38435,"isDraft":294,"link":290,"meta":38436,"navigation":7,"order":296,"path":38437,"readingTime":38438,"relatedResources":290,"seo":38439,"stem":38440,"tags":38441,"__hash__":38443},"blogs\u002Fblog\u002Fdeep-dive-into-data-placement-policies.md","Data Placement Policy Best Practices for Apache Pulsar",[809],{"type":15,"value":38137,"toc":38419},[38138,38141,38145,38148,38155,38161,38164,38171,38175,38178,38181,38184,38186,38194,38196,38199,38202,38204,38208,38211,38214,38220,38222,38225,38231,38234,38245,38251,38258,38262,38265,38271,38274,38277,38280,38283,38286,38290,38293,38298,38302,38305,38309,38312,38318,38321,38327,38330,38336,38340,38343,38349,38351,38357,38359,38365,38368,38370,38373,38377,38386],[48,38139,38140],{},"In this article, we’re going to take an in-depth look at data placement policies in Apache Pulsar. But before that, we need to first understand isolation policies in Pulsar. Data placement policies can help us isolate data in Pulsar and achieve different levels of disaster tolerance.",[40,38142,38144],{"id":38143},"isolation-policies-in-apache-pulsar","Isolation policies in Apache Pulsar",[48,38146,38147],{},"For a Pulsar cluster, a Pulsar instance provides services to multiple teams. When organizing resources across multiple teams, you want to make a suitable isolation plan to avoid resource competition between different teams and applications, thus providing high-quality messaging services. In this case, you need to take resource isolation into consideration and weigh your intended actions against expected and unexpected consequences.",[48,38149,38150,38151,190],{},"Pulsar supports isolation at both the broker level and BookKeeper level. As shown in the image below, for broker level isolation, you can divide brokers into different groups and assign different groups to each namespace. In this way, we can bind topics in the namespace to a set of brokers that belong to specific groups. For detailed information, refer to ",[55,38152,38154],{"href":38153},"\u002Fblog\u002Fengineering\u002F2022-05-26-pulsar-isolation-part-4-single-cluster-isolation\u002F","Pulsar Isolation Part IV: Single Cluster Isolation",[48,38156,38157],{},[384,38158],{"alt":38159,"src":38160},"Figure 1. Pulsar isolation - broker level isolation","\u002Fimgs\u002Fblogs\u002F63b3ff329ebbee7d04ca6992_pulsar-isolation.png",[48,38162,38163],{},"Pulsar brokers not only provide message services, but also offer message storage isolation with the help of BookKeeper clients. For message services, we can bind topics to a set of brokers to own them. 
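As a rough sketch of what these two kinds of binding look like in practice (the cluster, policy, namespace, and group names below are hypothetical):

```bash
# Broker-level isolation (illustrative): pin a namespace to a group of brokers
bin/pulsar-admin ns-isolation-policy set my-cluster my-policy \
  --namespaces 'my-tenant/my-namespace' \
  --primary 'broker-group-a-.*' \
  --auto-failover-policy-type min_available \
  --auto-failover-policy-params min_limit=1,usage_threshold=80

# Storage-level isolation (illustrative): pin the same namespace's data
# to a group of bookies configured via set-bookie-rack
bin/pulsar-admin namespaces set-bookie-affinity-group my-tenant/my-namespace \
  --primary-group group1
```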
For message storage isolation, we need to configure BookKeeper data placement policies for BookKeeper clients.",[48,38165,38166,38167,38170],{},"Because this is a large topic, in this article we will mainly focus on BookKeeper’s data placement policy and provide guidance on how to configure these policies with ",[4926,38168,38169],{},"pulsar-admin"," commands.",[40,38172,38174],{"id":38173},"bookie-data-isolation-level","Bookie data isolation level",[48,38176,38177],{},"Bookie data isolation is controlled by the bookie client. For Pulsar, there are two kinds of bookie clients to read and write data. One is on the broker side. Pulsar brokers use bookie clients to read and write topic messages. The other one is on the bookie autoRecovery side. The bookie auditor will check whether ledger replicas fulfill the expected placement policy and the bookie replication worker will write ledger replicas to target bookies according to the configured placement policy.",[48,38179,38180],{},"To enable a placement policy for Pulsar, we should configure it both on the Pulsar broker and BookKeeper auto recovery side. To do so, we can simply use the bin\u002Fpulsar-admin bookies set-bookie-rack command. This command will write the placement policy into ZooKeeper, and both bookie clients on the broker and auto recovery side will read the placement policy from ZooKeeper and apply it.",[48,38182,38183],{},"BookKeeper provides two placement policies:",[48,38185,3931],{},[321,38187,38188,38191],{},[324,38189,38190],{},"RackawareEnsemblePlacementPolicy",[324,38192,38193],{},"RegionAwareEnsemblePlacementPolicy",[48,38195,3931],{},[48,38197,38198],{},"You can use RackawareEnsemblePlacementPolicy and RegionAwareEnsemblePlacementPolicy in all kinds of deployments where the rack is a subset of a region (the former is included in the latter).",[48,38200,38201],{},"Now let’s take a deeper dive into how RackawareEnsemblePlacementPolicy and RegionAwareEnsemblePlacementPolicy work.",[48,38203,3931],{},[32,38205,38207],{"id":38206},"achieving-rack-level-disaster-tolerance","Achieving rack-level disaster tolerance",[48,38209,38210],{},"RackAwareEnsemblePlacementPolicy is a policy that forces different data replicas to be placed in different racks to guarantee data rack-level disaster tolerance. In a data center, we usually have a lot of racks, and each rack has many storage nodes. In production, we need to place different data replicas into different racks to tolerate rack-level failure. We can configure rack information for each bookie node, and RackAwareEnsemblePlacementPolicy can help us avoid rack-level failure.",[48,38212,38213],{},"If you use the RackawareEnsemblePlacementPolicy, you should configure bookie instances with their own rack. 
The related commands are as follows:",[8325,38215,38218],{"className":38216,"code":38217,"language":8330},[8328],"bin\u002Fpulsar-admin bookies set-bookie-rack --bookie bookie1:3181 --hostname bookie1.pulsar.com:3181 --group group1 --rack \u002Frack1\nbin\u002Fpulsar-admin bookies set-bookie-rack --bookie bookie2:3181 --hostname bookie2.pulsar.com:3181 --group group1 --rack \u002Frack1\nbin\u002Fpulsar-admin bookies set-bookie-rack --bookie bookie3:3181 --hostname bookie3.pulsar.com:3181 --group group1 --rack \u002Frack1\nbin\u002Fpulsar-admin bookies set-bookie-rack --bookie bookie4:3181 --hostname bookie4.pulsar.com:3181 --group group1 --rack \u002Frack1\nbin\u002Fpulsar-admin bookies set-bookie-rack --bookie bookie5:3181 --hostname bookie5.pulsar.com:3181 --group group1 --rack \u002Frack2\n",[4926,38219,38217],{"__ignoreMap":18},[48,38221,3931],{},[48,38223,38224],{},"In Figure 2, the BookKeeper cluster has 4 racks and 13 bookie instances. If a topic is configured with EnsembleSize = 3, WriteQuorum=3, AckQuorum=2, the bookie client will choose 3 racks from the 4 total racks, such as rack 1, rack 3, and rack 4. For each rack, it will choose 1 bookie instance to write, such as bookie 1, bookie 8, and bookie 12.",[48,38226,38227],{},[384,38228],{"alt":38229,"src":38230},"Figure 2. BookKeeper cluster with 4 racks and 13 bookie instances","\u002Fimgs\u002Fblogs\u002F63b3ff33a7bd88ce7e39c31f_rack-placement-example-4.png",[48,38232,38233],{},"If all of the bookie instances in rack 3 and rack 4 have failed and the 3 rack requirements cannot be met, the client will choose bookies for new ledger creation and old ledger recovery based on the EnforceMinNumRacksPerWriteQuorum and MinNumRacksPerWriteQuorum=3 field.",[321,38235,38236,38239,38242],{},[324,38237,38238],{},"If you set EnforceMinNumRacksPerWriteQuorum=true and MinNumRacksPerWriteQuorum=3, the bookie client will fail to choose bookies to write to and throw a BKNotEnoughBookiesException because there are only 2 racks available and MinNumRacksPerWriteQuorum=3 is not fulfilled. This means that the new ledger cannot be created and that the old ledger cannot be recovered.",[324,38240,38241],{},"If you set EnforceMinNumRacksPerWriteQuorum=true and MinNumRacksPerWriteQuorum=2, for ledger recovery, for example, the old ledger’s ensemble is \u003Cbookie1, bookie8, bookie12>. The bookie client will choose 2 bookies from rack1 and rack2, such as bookie2 and bookie7, to place 2 replicas. For new ledger creation, the bookie client will choose 2 bookies from rack1 and rack2, such as bookie1 and bookie6, and for the last replica, it will randomly choose one bookie to place.",[324,38243,38244],{},"If you set EnforceMinNumRacksPerWriteQuorum=false, for ledger recovery, for example, the old ledger’s ensemble is \u003Cbookie1, bookie8 and bookie12>. The bookie client will choose 2 bookies from rack1 and rack2, such as bookie2 and bookie7, to place 2 replicas. For new ledger creation, the bookie client will choose 2 bookies from rack1 and rack2, such as bookie2 and bookie5, and for the last replica, it will randomly choose one bookie to place.",[48,38246,38247],{},[384,38248],{"alt":38249,"src":38250},"Figure 4. 
BookKeeper cluster with 4 regions","\u002Fimgs\u002Fblogs\u002F63b4000091816246e2adaabf_rack-placement-example.png",[48,38252,38253,38254,38257],{},"If 2 regions fail as shown in Figure 5 (For ledger recovery, for example, the old ledger’s ensemble is \u003Cbookie5, bookie17, bookie21>), the bookie client will choose one bookie from Region A or Region D to replace the failed bookie17. For new ledger creation, the bookie client will choose Region A and Region D to write replicas. In Region A, it will fall back to RackawareEnsemblePlacementPolicy and choose 2 bookie instances from rack1 and rack2. For Region D, it will choose one bookie instance from rack8. In the end, it may choose bookie1, bookie6, and bookie23 to write ledger replicas.\n",[384,38255],{"alt":18,"src":38256},"\u002Fimgs\u002Fblogs\u002F63b40000eaf15016e782c3f6_rack-placement-example-3.png","Figure 5",[40,38259,38261],{"id":38260},"how-data-placement-policies-work","How data placement policies work",[48,38263,38264],{},"The bookie isolation group makes use of the existing BookKeeper rack-aware placement policy. The “rack” concept can be anything (e.g. rack\u002Fregion\u002Favailability zone). In this example, we use the bin\u002Fpulsar-admin bookies set-bookie-rack command to configure the isolation policy.",[8325,38266,38269],{"className":38267,"code":38268,"language":8330},[8328],"bin\u002Fpulsar-admin bookies set-bookie-rack\nThe following options are required: [-b | --bookie], [-r | --rack]\n\nThen we need to update the rack placement information for a specific bookie in the cluster. Note that the bookie address format is `address:port`.\nUsage: set-bookie-rack [options]\n  Options:\n  * -b, --bookie\n      Bookie address (format: `address:port`)\n    -g, --group\n      Bookie group name\n      Default: default\n    --hostname\n      Bookie host name\n  * -r, --rack\n      Bookie rack name\n",[4926,38270,38268],{"__ignoreMap":18},[48,38272,38273],{},"In this command, we can specify the rack name and group name for each bookie. The rack name is used to represent which region or rack this bookie belongs to. You can assign the group name to a specific namespace to achieve namespace-level isolation.",[48,38275,38276],{},"The bin\u002Fpulsar-admin bookies set-bookie-rack command writes the configured placement policy into ZooKeeper, and the bookie clients will get the placement policy from ZooKeeper and apply it when choosing ledger ensembles.",[48,38278,38279],{},"The basic idea of the rack-aware placement policy is this: when the ensemble for a ledger is formed or replicated, the client picks bookies from different racks to reduce the possibility of data unavailability. If bookies from different racks are not available, the policy falls back to choosing randomly across available bookies.",[48,38281,38282],{},"In contrast to the rack-aware policy, the basic idea of the region-aware placement policy is when the ensemble for a ledger is formed or replicated, the client picks bookies from different regions, and for the selected region, it will pick bookies from different racks if more than one ensemble falls into the same region.",[48,38284,38285],{},"Another tip for choosing ensembles is​ to take current disk usage weight into consideration. When the BookKeeper cluster runs for a long time, you may notice that different bookies’ ledger disk usage is unbalanced. You can enable disk weight by setting ​DiskWeightBasedPlacementEnabled=true in conf\u002Fbroker.conf. 
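Note that in conf/broker.conf the setting carries the BookKeeper client prefix, as the configuration section below also shows:

```bash
bookkeeperDiskWeightBasedPlacementEnabled=true
```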
After enabling disk weight, the bookie client will take the disk usage into consideration when choosing ensembles for the ledger.",[40,38287,38289],{"id":38288},"pulsar-and-bookkeepers-isolation-policy-working-together-for-namespace-isolation","Pulsar and BookKeeper’s isolation policy: working together for namespace isolation",[48,38291,38292],{},"As shown in Figure 1 above, we need to configure three parts if we want to enable the isolation policy for both Pulsar brokers and BookKeeper.",[48,38294,38295,38296,190],{},"For detailed operations, refer to ",[55,38297,38154],{"href":38153},[40,38299,38301],{"id":38300},"how-to-configure-the-placement-policy","How to configure the placement policy",[48,38303,38304],{},"If you want to enable the placement policy, you need to enable it both on the broker and bookie auto recovery side. The following is an example of enabling the region-aware placement policy.",[32,38306,38308],{"id":38307},"enable-the-policy-on-the-broker-side","Enable the policy on the broker side",[48,38310,38311],{},"In conf\u002Fbroker.conf, configure the following field:",[8325,38313,38316],{"className":38314,"code":38315,"language":8330},[8328],"bookkeeperClientRegionawarePolicyEnabled=true\n",[4926,38317,38315],{"__ignoreMap":18},[48,38319,38320],{},"To enable MinNumRacksPerWriteQuorum, configure the following fields:",[8325,38322,38325],{"className":38323,"code":38324,"language":8330},[8328],"bookkeeperClientMinNumRacksPerWriteQuorum=2\nbookkeeperClientEnforceMinNumRacksPerWriteQuorum=true\n",[4926,38326,38324],{"__ignoreMap":18},[48,38328,38329],{},"To enable disk weight placement, configure the following field:",[8325,38331,38334],{"className":38332,"code":38333,"language":8330},[8328],"bookkeeperDiskWeightBasedPlacementEnabled=true\n",[4926,38335,38333],{"__ignoreMap":18},[32,38337,38339],{"id":38338},"enable-the-policy-on-the-bookie-auto-recovery-side","Enable the policy on the bookie auto recovery side",[48,38341,38342],{},"In conf\u002Fbookkeeper.conf, configure the following fields:",[8325,38344,38347],{"className":38345,"code":38346,"language":8330},[8328],"ensemblePlacementPolicy=org.apache.bookkeeper.client.RegionAwareEnsemblePlacementPolicy\nreppDnsResolverClass=org.apache.pulsar.zookeeper.ZkBookieRackAffinityMapping\n",[4926,38348,38346],{"__ignoreMap":18},[48,38350,38320],{},[8325,38352,38355],{"className":38353,"code":38354,"language":8330},[8328],"minNumRacksPerWriteQuorum=2\nenforceMinNumRacksPerWriteQuorum=true\n",[4926,38356,38354],{"__ignoreMap":18},[48,38358,38329],{},[8325,38360,38363],{"className":38361,"code":38362,"language":8330},[8328],"diskWeightBasedPlacementEnabled=true\n",[4926,38364,38362],{"__ignoreMap":18},[48,38366,38367],{},"For broker and BookKeeper placement policy configuration, refer to the previous sections.",[40,38369,319],{"id":316},[48,38371,38372],{},"This article gives an overview of Pulsar isolation policies, both on the Pulsar broker and BookKeeper side. We also look at how Pulsar and BookKeeper’s isolation policy can work together for namespace isolation. For the BookKeeper isolation policy, we explain how the rack-aware placement policy and region-aware placement policy work respectively. 
These data placement policies provide the ability for different level disaster tolerance.",[40,38374,38376],{"id":38375},"more-on-apache-pulsar","More on Apache Pulsar",[48,38378,38379,38380,38385],{},"Pulsar has become ",[55,38381,38384],{"href":38382,"rel":38383},"https:\u002F\u002Fblogs.apache.org\u002Ffoundation\u002Fentry\u002Fapache-in-2021-by-the",[264],"one of the most active Apache projects"," over the past few years, with a vibrant community that continues to drive innovation and improvements to the project.",[321,38387,38388,38394,38406,38412],{},[324,38389,38390,38391,190],{},"Start your on-demand Pulsar training today with ",[55,38392,31914],{"href":31912,"rel":38393},[264],[324,38395,38396,38397,38400,38401,38405],{},"Interested in ",[55,38398,38399],{"href":37361},"fully-managed Apache Pulsar"," with enhanced reliability, tools, and features? ",[55,38402,38404],{"href":38403},"\u002Fthank\u002Fcontact-us","Contact us"," now!",[324,38407,36219,38408,38411],{},[55,38409,38410],{"href":21458},"2022 Pulsar vs. Kafka Benchmark Report"," for a side-by-side comparison of Pulsar and Kafka performance, including tests on throughput, latency, and more.",[324,38413,38414,38415,38418],{},"Watch sessions from ",[55,38416,38417],{"href":35424},"Pulsar Summit San Francisco 2022"," for best practices and the future of messaging and event streaming technologies.",{"title":18,"searchDepth":19,"depth":19,"links":38420},[38421,38422,38425,38426,38427,38431,38432],{"id":38143,"depth":19,"text":38144},{"id":38173,"depth":19,"text":38174,"children":38423},[38424],{"id":38206,"depth":279,"text":38207},{"id":38260,"depth":19,"text":38261},{"id":38288,"depth":19,"text":38289},{"id":38300,"depth":19,"text":38301,"children":38428},[38429,38430],{"id":38307,"depth":279,"text":38308},{"id":38338,"depth":279,"text":38339},{"id":316,"depth":19,"text":319},{"id":38375,"depth":19,"text":38376},"2023-07-16","Learn Pulsar isolation policies on Pulsar broker side and BookKeeper side. 
Discover how these policies enable different levels of disaster tolerance.","\u002Fimgs\u002Fblogs\u002F63c7f9f43d691125f4ff6429_63b3ff32877620a3217db989_deep-dive-of-data-placement-policies-top-.jpeg",{},"\u002Fblog\u002Fdeep-dive-into-data-placement-policies","11 min read",{"title":38134,"description":38434},"blog\u002Fdeep-dive-into-data-placement-policies",[38442,821],"Tutorials","VbN7NwYWgd4ra7Z9VEhm2aqJHHNpI0BnwWLSGPonidA",{"id":38445,"title":38446,"authors":38447,"body":38448,"category":7338,"createdAt":290,"date":38937,"description":38938,"extension":8,"featured":294,"image":38939,"isDraft":294,"link":290,"meta":38940,"navigation":7,"order":296,"path":33874,"readingTime":16196,"relatedResources":290,"seo":38941,"stem":38942,"tags":38943,"__hash__":38944},"blogs\u002Fblog\u002Fpulsar-virtual-summit-europe-2023-key-takeaways.md","Pulsar Virtual Summit Europe 2023: Key Takeaways",[31294],{"type":15,"value":38449,"toc":38919},[38450,38459,38462,38465,38473,38476,38497,38501,38504,38516,38520,38523,38526,38529,38537,38540,38548,38551,38554,38557,38560,38565,38569,38572,38583,38586,38589,38600,38603,38606,38609,38618,38627,38641,38648,38651,38658,38666,38669,38673,38676,38680,38688,38691,38698,38705,38709,38712,38715,38718,38726,38729,38740,38744,38747,38754,38757,38761,38768,38776,38784,38787,38791,38797,38802,38805,38808,38816,38823,38826,38829,38843,38846,38849,38856,38860,38868,38872,38880,38883,38886,38899,38901,38904,38907,38914,38917],[48,38451,38452,38453,38458],{},"Pulsar ",[36,38454,38455],{},[44,38456,38457],{},"Virtual"," Summit Europe 2023 brought the Apache Pulsar Community together to share best practices and discuss the future of streaming technologies.",[48,38460,38461],{},"May 23rd witnessed a remarkable milestone as over 400 attendees from 20+ countries joined the virtual stage to explore the cutting-edge advancements in Apache Pulsar and the real-world success stories of Pulsar-powered companies. 
This record-breaking turnout at the Pulsar Summit not only demonstrates the surging adoption of Pulsar but also highlights the ever-growing enthusiasm and curiosity surrounding this game-changing technology!",[48,38463,38464],{},"Some key facts:",[321,38466,38467,38470],{},[324,38468,38469],{},"400+ attendees representing 20+ countries",[324,38471,38472],{},"24 speakers from companies, including The Lego Group, Zafin, VMware, Axon, HSL, and more",[48,38474,38475],{},"In this blog post, I will share the key takeaways I gained from the Pulsar Virtual Summit Europe:",[321,38477,38478,38485,38488,38491,38494],{},[324,38479,38480,38484],{},[55,38481,38483],{"href":38482},"\u002Fblog\u002Fpulsar-3-0-is-available-for-testing-on-streamnative-cloud","Pulsar 3.0"," introduces new features that make Pulsar an even better choice as an Enterprise ready technology for real-time event-driven architecture with the introduction of Long-term-support and improvements that allow for supporting millions of topics.",[324,38486,38487],{},"The Pulsar ecosystem continues to expand, with over 10,000 active Slack users and over 600 contributors.",[324,38489,38490],{},"Pulsar's developer experience is continuously improving, with recent enhancements including the availability of Docker images for ARM64 architectures and a revamped website.",[324,38492,38493],{},"Building real-time data pipelines is becoming easier with the availability of low-code transformations and Apache NiFi integration.",[324,38495,38496],{},"Pulsar is widely adopted across various industries, including finance, telecommunications, transport, and manufacturing, as showcased by real-world examples presented during the Pulsar Summit Europe.",[40,38498,38500],{"id":38499},"pulsar-as-an-enterprise-ready-messaging-data-streaming-service","Pulsar as an Enterprise-ready messaging & data-streaming service.",[48,38502,38503],{},"Pulsar has key features that confirm it as an ideal choice for Enterprises, such as multi-tenancy, the ability to handle hundreds of thousands of topics, elasticity, and built-in geo-replication.",[48,38505,38506,38507,4003,38511,38515],{},"At this summit, companies such as the ",[55,38508,38510],{"href":38509},"\u002Fvideos\u002Fpulsar-virtual-summit-europe-2023-challenges-of-hosting-a-pulsar-as-a-service-platform-under-a-shared-responsibility","Lego Group",[55,38512,38514],{"href":38513},"\u002Fvideos\u002Fpulsar-virtual-summit-europe-2023-from-an-asyncapi-definition-to-a-deployed-pulsar-topology-via-gitops","Raiffeisen Bank International (RBI)"," share their experience in adopting and operating a Pulsar cluster as their centralized messaging & streaming service at their company scale. 
You can read the details in the paragraphs above.",[32,38517,38519],{"id":38518},"lts-with-pulsar-30","LTS with Pulsar 3.0",[48,38521,38522],{},"LTS is essential for big companies as it provides stability, security, compatibility, and reduced disruption, enabling them to maintain smooth operations, protect data, and effectively manage their software infrastructure.",[48,38524,38525],{},"Matteo Merli, Apache Pulsar PMC Chair & StreamNative CTO, announced Pulsar's Long Term Support model (LTS), starting at the Pulsar 3.0.x release.",[48,38527,38528],{},"The two main goals of LTS are:",[321,38530,38531,38534],{},[324,38532,38533],{},"providing a path for more extended support of releases so that users can upgrade at their pace",[324,38535,38536],{},"and at the same time, providing a path for fast innovation",[48,38538,38539],{},"Depending on what you need, you can decide which Pulsar version you should be using:",[321,38541,38542,38545],{},[324,38543,38544],{},"LTS release for more stability: only fixes are backported",[324,38546,38547],{},"or: feature releases to benefit from improvement and newer features",[48,38549,38550],{},"Using an LTS release, you’ll benefit from bug fixes for up to 24 months & security patches for up to 36 months.",[48,38552,38553],{},"Additionally, the feature releases will be published in a predictable schedule: every three months.",[48,38555,38556],{},"Furthermore, LTS will provide a smoother path for upgrades.",[48,38558,38559],{},"Quoting Matteo Merli:",[916,38561,38562],{},[48,38563,38564],{},"Pulsar 3.0 is a new chapter. I truly believe we are now even better positioned to deliver features & improvements fast & safely.",[32,38566,38568],{"id":38567},"support-of-millions-of-topics-at-the-enterprise-scale","Support of millions of topics at the Enterprise scale",[48,38570,38571],{},"Handling a large volume of topics is a strong requirement for a centralized and multi-tenant messaging platform. Pulsar stands out as the ideal solution for this scenario, as it enables a single cluster to manage over 1 million topics efficiently. This is indeed impressive, but can we push the boundaries even further?",[48,38573,38574,38575,38579,38580,38582],{},"During his ",[55,38576,38578],{"href":38577},"\u002Fvideos\u002Fpulsar-virtual-summit-europe-2023-oxia-scaling-pulsars-metadata-to-100x","captivating keynote presentation at the Pulsar Summit",", Matteo Merli, the Apache Pulsar PMC Chair and StreamNative CTO, unveils the exciting potential of ",[55,38581,5599],{"href":21529},", a new open-source metadata store, to enable Pulsar's remarkable scalability.",[48,38584,38585],{},"In the context of distributed systems, it is crucial to have real-time knowledge of the specific node responsible for serving a particular resource. Furthermore, Pulsar, functioning as a storage system, necessitates the storage of metadata, including ‘pointers’ to other data. To fulfill these requirements, Pulsar heavily relies on a distributed coordination and metadata storage system. This robust system addresses various inquiries, such as determining the assigned broker node for a specific topic or retrieving the data retention policy associated with a given topic, among others.",[48,38587,38588],{},"You have the option to select which metadata provider system to utilize. Since PIP-45, the metadata provider has become pluggable. The primary implementations available are ZooKeeper and etcd. 
However, both of them have limitations that hinder Pulsar's scalability when it comes to increasing the number of topics:",[321,38590,38591,38594,38597],{},[324,38592,38593],{},"They lack horizontal scalability.",[324,38595,38596],{},"Vertical scaling only offers limited improvements.",[324,38598,38599],{},"Their data storage capacity is restricted to a few gigabytes.",[48,38601,38602],{},"To address these challenges, StreamNative developed Oxia, which resolves metadata and coordination issues on a large scale. Oxia introduces a novel architecture that leverages modern Kubernetes environments.",[48,38604,38605],{},"With a conventional metadata and coordination provider, a single Pulsar cluster can currently handle over 1 million topics. In contrast, Oxia aims to enable a single Pulsar cluster to support more than 100 million topics, which is truly remarkable.",[48,38607,38608],{},"Oxia is an open-source solution and not limited to Pulsar. It can be employed for other distributed coordination and metadata requirements.",[48,38610,38611,1154,38614,38617],{},[55,38612,38613],{"href":38577},"Watch Matteo’s keynote",[55,38615,38616],{"href":21529},"read this blog article"," to learn more.",[48,38619,38620,38621,38626],{},"Dealing with metrics from a vast number of topics can pose significant challenges. To address this, Asaf Mesika from StreamNative has put forward an enhancement proposal for the Pulsar metrics system (",[55,38622,38625],{"href":38623,"rel":38624},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fissues\u002F20197",[264],"PIP-264",") to facilitate monitoring for a large number of topics. The proposed improvements include:",[321,38628,38629,38632,38635,38638],{},[324,38630,38631],{},"Aggregating metrics for groups of topics.",[324,38633,38634],{},"Implementing fine-grained metrics filtering.",[324,38636,38637],{},"Unifying metrics using a standardized naming convention.",[324,38639,38640],{},"Consolidating all existing metrics libraries into a single one, namely OpenTelemetry.",[48,38642,38643,38644,190],{},"For further details, you can watch Asaf's talk at ",[55,38645,38647],{"href":38646},"\u002Fvideos\u002Fpulsar-virtual-summit-europe-2023-the-future-of-metrics-in-apache-pulsar","this link",[48,38649,38650],{},"The proposal shows great promise, and in my opinion, anyone responsible for monitoring a cluster with a high number of topics should closely follow this PIP and consider contributing to it.",[48,38652,38653,38654,38657],{},"Finally, in the ",[55,38655,38656],{"href":38482},"latest release of Pulsar (Pulsar 3.0 LTS)",", various enhancements have been introduced to enhance your ability to manage a larger number of topics. These improvements include:",[321,38659,38660,38663],{},[324,38661,38662],{},"The introduction of a new version of BookKeeper, which greatly enhances throughput and reduces latency, especially in scenarios involving a high number of topics. 
You can find more information about this in Matteo's keynote announcement.",[324,38664,38665],{},"A more efficient service discovery and session establishment mechanism, enabling newly connected Pulsar clients to initiate message sending and consumption much more quickly.",[48,38667,38668],{},"These updates in Pulsar 3.0 LTS provide significant benefits for managing a higher volume of topics.",[32,38670,38672],{"id":38671},"pulsar-performance-continuously-improves","Pulsar performance continuously improves.",[48,38674,38675],{},"In Pulsar 3.0, an upgraded version of BookKeeper is introduced, resulting in significant enhancements in throughput and latency. These improvements are particularly noticeable when dealing with numerous topics or when message batching is disabled or ineffective. For instance, Pulsar 3.0 achieves twice the throughput compared to previous versions when operating with over 10,000 topics. Additionally, the utilization of Direct IO enhances IO speed, especially in containerized environments.",[40,38677,38679],{"id":38678},"pulsar-has-a-thriving-ecosystem-and-an-engaged-community","Pulsar has a thriving ecosystem and an engaged community.",[48,38681,38682,38683,38687],{},"During the ",[55,38684,38686],{"href":38685},"\u002Fvideos\u002Fpulsar-virtual-summit-europe-2023-pulsar-the-state-of-the-wave","opening keynote"," of the Pulsar Virtual Summit Europe 2023, Sijie Guo, the CEO of StreamNative, highlighted the remarkable growth of the Pulsar community. Starting with just a couple of contributors in its early days, Pulsar now boasts a staggering 600 contributors and is a top-5 Apache Software Foundation project.",[48,38689,38690],{},"Thousands of organizations worldwide have also embraced Pulsar. Moreover, there is a community of over 10,000 Slack members ready to provide assistance.",[48,38692,38693,38694,190],{},"Pulsar benefits from an extensive range of open-source connectors, offloaders, and protocol adapters, allowing smooth integration with various systems. It also offers a comprehensive collection of client libraries, enabling developers to code event-driven applications in their preferred programming language. Furthermore, Pulsar seamlessly integrates with popular open-source processing engines like Apache Flink and Apache Spark. To explore this ecosystem, you can visit the ",[55,38695,38697],{"href":29447,"rel":38696},[264],"StreamNative Hub",[48,38699,38700,38701,190],{},"During the Pulsar Summit Europe, numerous sessions delve into the seamless integration of Pulsar with Spring, Apache Pinot, RisingWave, Nifi, and other technologies. To learn more, you can access the ",[55,38702,38704],{"href":38703},"\u002Fblog\u002Fpulsar-virtual-summit-europe-2023-on-demand-videos-available-now#ecosystem","videos from the Ecosystem track",[40,38706,38708],{"id":38707},"pulsars-elasticity-enables-achieving-optimal-performance-while-maintaining-cost-efficiency","Pulsar's elasticity enables achieving optimal performance while maintaining cost efficiency.",[48,38710,38711],{},"Horizontal scalability is a crucial requirement for any data streaming platform, but it should not be confused with elasticity.",[48,38713,38714],{},"Horizontal scalability involves adding more resources to handle increased workloads. 
On the other hand, elasticity refers to the ability to quickly adapt to changes in workload by efficiently allocating and deallocating resources, thereby achieving optimal performance at the right cost.",[48,38716,38717],{},"While some data streaming platforms lack elasticity and require careful resource allocation planning in advance, Pulsar stands out with its exceptional elasticity.",[48,38719,38720,38721,38725],{},"In ",[55,38722,38724],{"href":38723},"\u002Fvideos\u002Fpulsar-virtual-summit-europe-2023-scalable-distributed-messaging-streaming-with-apache-pulsar","the first part of his presentation",", Julien Jakubowski, Developer Advocate EMEA at StreamNative, elucidates how Pulsar's sophisticated architecture delivers both scalability and elasticity. He further explores the three levels of elasticity offered by Pulsar.",[48,38727,38728],{},"The load balancer plays a pivotal role in Pulsar's elasticity. In Pulsar 3.0, the community has enhanced the load balancer with a specific focus on elasticity. The new load balancer in Pulsar 3.0 ensures the following:",[321,38730,38731,38734,38737],{},[324,38732,38733],{},"Efficiently balancing traffic within the cluster, even during periods of abrupt workload spikes.",[324,38735,38736],{},"Quickly achieving an optimal state for the cluster.",[324,38738,38739],{},"Maximizing topic availability during reassignments.",[40,38741,38743],{"id":38742},"the-developer-experience-with-pulsar-is-constantly-being-enhanced","The developer experience with Pulsar is constantly being enhanced.",[48,38745,38746],{},"The developer experience is crucial, and Pulsar has benefited from significant improvements in this area.",[48,38748,38749,38750,38753],{},"In his ",[55,38751,38752],{"href":38685},"keynote presentation",", Matteo Merli announced that Docker images are now available for both x86-64 and ARM64 architectures starting from Pulsar 3.0. Developers using Apple Silicon-based MacBooks can now enjoy an enhanced experience as Pulsar Docker containers boot and run faster on these devices.",[48,38755,38756],{},"Furthermore, Pulsar has unveiled a brand new website! The credit goes to Emidio Cardeira, Asaf Mesika, Tison Chen from StreamNative, and Kiryl Valkovich from Teal Tools for implementing this update. The Apache Pulsar website now features a refreshed and visually appealing design that perfectly captures the futuristic essence of our dynamic community and next-generation solution.",[40,38758,38760],{"id":38759},"building-real-time-data-pipelines-with-minimal-programming-skills-is-becoming-easier","Building real-time data pipelines with minimal programming skills is becoming easier.",[48,38762,38763,38764,38767],{},"Sijie Guo explains in his ",[55,38765,38766],{"href":38685},"opening talk"," that you don’t need to bring a full-fledged stream processing technology such as Apache Flink or Apache Spark to build all your pipelines. For the less advanced use cases, you can build a pipeline with the comprehensive set of Pulsar IO connectors without writing a single line of code. You can also create Pulsar Functions to write simple and easy-to-deploy processing logic with just a few lines of code.",[48,38769,38770,38771,38775],{},"During ",[55,38772,38774],{"href":38773},"\u002Fvideos\u002Fpulsar-virtual-summit-europe-2023-build-low-code-stream-data-pipelines-with-pulsar-transformations","his presentation",", Christophe Bornet from DataStax introduced an innovative advancement in Pulsar known as Pulsar Transformations. 
These transformations enable users to manipulate data through low-code techniques while harnessing the power of existing components within Pulsar.",[48,38777,38778,38779,38783],{},"Additionally, Apache Pulsar and Apache NiFi can be combined to create real-time data pipelines without coding. By using a drag-and-drop interface, users can easily connect different data sources and destinations. This integration allows for seamless data flow and processing, enabling users to handle complex data tasks efficiently. For more information and a demo, you can ",[55,38780,38782],{"href":38781},"\u002Fvideos\u002Fpulsar-virtual-summit-europe-2023-using-apache-nifi-with-apache-pulsar-for-fast-data-on-ramp","watch Tim's talk"," on this topic.",[48,38785,38786],{},"You can efficiently build a full data pipeline with minimal or zero lines of code without the need to handle the complexity of setting up and managing a stream processing infrastructure. Pulsar IO connectors, Pulsar functions, Pulsar transformations, or Apache NiFi provide convenient options for creating data pipelines without the complexities of infrastructure configuration and maintenance.",[40,38788,38790],{"id":38789},"pulsar-supports-various-use-cases-in-several-industries","Pulsar supports various use cases in several industries.",[48,38792,38793,38794,4031],{},"Pulsar has already been deployed by thousands of companies across the globe in various industries. Quoting Sijie Guo in the ",[55,38795,38796],{"href":38685},"first keynote",[916,38798,38799],{},[48,38800,38801],{},"From med-tech to financial services, IoT, manufacturing, e-commerce, gaming, and more, Pulsar became part of the modern data stack.",[48,38803,38804],{},"During the event, numerous industry professionals showcased real-world examples of how Pulsar transforms data streaming applications across various sectors. Engineers representing prominent companies demonstrated their successful implementation of Pulsar, emphasizing the reasons behind their choice and its benefits.",[32,38806,25163],{"id":38807},"finance",[48,38809,38810,38811,38815],{},"George Orban (Daiwa Capital Market) ",[55,38812,38814],{"href":38813},"\u002Fvideos\u002Fpulsar-virtual-summit-europe-2023-pulsar-in-finance-a-tale-of-a-migration","shares a \"Pulsar love story\""," where he discusses migrating a pricing engine and trading system to Apache Pulsar. He delves into the motivations behind selecting Pulsar, highlighting its suitability for finance and enterprise applications, and outlines the notable enhancements it brought to their stack in terms of resilience, robustness, and speed.",[48,38817,38818,38819,38822],{},"Raiffeisen Bank International (RBI) is one of the top European banks focusing on digital transformation, sustainability, and customer experience. They help over 60M customers with all kinds of financial services. Their central backbone for all their data integration initiatives is powered by Pulsar. 
Watch ",[55,38820,38821],{"href":38513},"Markus Falkner & Armin Woworsky’s keynote",", where they share their experience on Async API & GitOps on this platform.",[48,38824,38825],{},"Zafin offers an enterprise platform that enables banks to separate products and pricing from their core systems and consolidate them into a cross-enterprise product innovation layer.",[48,38827,38828],{},"Zafin selected Pulsar for a complex & sensitive data streaming use case because:",[321,38830,38831,38834,38837,38840],{},[324,38832,38833],{},"Pulsar can scale out rapidly and dynamically to increase throughput without restarting the applications",[324,38835,38836],{},"Pulsar's geo-replication feature allows them to seamlessly replicate their entire data to a disaster recovery region without experiencing performance drawbacks.",[324,38838,38839],{},"Pulsar’s Tier Storage feature allows for multi-year data retention on cheap storage.",[324,38841,38842],{},"A Pulsar cluster can be upgraded without downtime.",[48,38844,38845],{},"By using StreamNative Cloud to manage Pulsar, Zafin is able to ensure observability with out-of-the-box monitoring. In addition, StreamNative Cloud greatly simplifies Zafin’s cluster management.",[48,38847,38848],{},"Zafin has been partnering with StreamNative for over a year, and its timely delivery has garnered high satisfaction from all parties involved.",[48,38850,38851,38852,190],{},"Lloyd Chandran & Matt Hefford from Zafin ",[55,38853,38855],{"href":38854},"\u002Fvideos\u002Fpulsar-virtual-summit-europe-2023-how-we-simplified-a-highly-complex-and-sensitive-data-stream-using-apache-pulsar","share their experience in this presentation",[32,38857,38859],{"id":38858},"telco","Telco",[48,38861,38862,38863,38867],{},"Habip Kenan Üsküda (Axon Networks) ",[55,38864,38866],{"href":38865},"\u002Fvideos\u002Fpulsar-virtual-summit-europe-2023-pulsar-observability-in-high-topic-cardinality-deployments-for-telco","shares their journey"," of constructing an observability stack for their Pulsar-based platform in the telecommunications field. This stack empowered their monitoring infrastructure to expand seamlessly, accommodating an impressive scale of 1 million topics.",[32,38869,38871],{"id":38870},"transport","Transport",[48,38873,38874,38875,38879],{},"Jaakko Malkki ",[55,38876,38878],{"href":38877},"\u002Fvideos\u002Fsystem-level-testing-of-a-pulsar-based-microservice-application","explains in his presentation"," how Helsingin Seudun Liikenne (Helsinki Regional Transport Authority) utilizes Transitdata, a microservice application based on Pulsar, to process real-time public transport data like predictions for stop times, vehicle locations, and service notifications. He highlights the difficulties of testing applications with a microservice architecture at the system level and discusses their approach to simplifying the creation of automated tests. This strategy enables the rapid rollout of new features.",[32,38881,25249],{"id":38882},"manufacturing",[48,38884,38885],{},"At Pulsar Summit Europe, engineers from LEGO Group discuss their utilization of Pulsar as a messaging and streaming platform, implemented across various domains within the company. They delve into their experiences in hosting and managing this platform at an enterprise level and highlight their successful experience with StreamNative Cloud. 
Watch the videos below to learn more:",[321,38887,38888,38893],{},[324,38889,38890],{},[55,38891,38892],{"href":38509},"Challenges of Hosting a Pulsar-as-a-Service Platform Under a Shared Responsibility Model",[324,38894,38895],{},[55,38896,38898],{"href":38897},"\u002Fvideos\u002Fpulsar-virtual-summit-europe-2023-documentation-as-configuration-for-management-of-apache-pulsar","Documentation as Configuration for Management of Apache Pulsar",[40,38900,2125],{"id":2122},[48,38902,38903],{},"In conclusion, the Pulsar Virtual Summit Europe was an incredible event that showcased the power and potential of Apache Pulsar. From enlightening keynote sessions to deep-dive technical presentations, attendees gained valuable insights into leveraging Pulsar for their real-time data applications.",[48,38905,38906],{},"But the excitement doesn't end there! The upcoming Pulsar Summit North America is just around the corner, taking place on October 25, 2023, in San Francisco. The summit promises to be a hub of innovation, collaboration, and knowledge sharing among industry experts, developers, and enthusiasts.",[48,38908,38909,38910,190],{},"If you have valuable insights, experiences, or breakthroughs related to Apache Pulsar, remember to submit your ideas before the Call for Speakers closing date on July 7, 2023. This is your chance to contribute and share your expertise with the Pulsar community. Be part of shaping the future of real-time data processing with Pulsar by ",[55,38911,38913],{"href":38912},"\u002Fblog\u002Fpulsar-summit-na-2023-call-for-speakers","submitting your proposal",[48,38915,38916],{},"Let's come together to learn, network, and take Pulsar to new heights. See you there!",[48,38918,3931],{},{"title":18,"searchDepth":19,"depth":19,"links":38920},[38921,38926,38927,38928,38929,38930,38936],{"id":38499,"depth":19,"text":38500,"children":38922},[38923,38924,38925],{"id":38518,"depth":279,"text":38519},{"id":38567,"depth":279,"text":38568},{"id":38671,"depth":279,"text":38672},{"id":38678,"depth":19,"text":38679},{"id":38707,"depth":19,"text":38708},{"id":38742,"depth":19,"text":38743},{"id":38759,"depth":19,"text":38760},{"id":38789,"depth":19,"text":38790,"children":38931},[38932,38933,38934,38935],{"id":38807,"depth":279,"text":25163},{"id":38858,"depth":279,"text":38859},{"id":38870,"depth":279,"text":38871},{"id":38882,"depth":279,"text":25249},{"id":2122,"depth":19,"text":2125},"2023-07-11","Pulsar Virtual Summit Europe 2023 brought the Apache Pulsar Community together to share best practices and discuss the future of streaming technologies.","\u002Fimgs\u002Fblogs\u002F64ad4fdc326277875a82816b_image-4.png",{},{"title":38446,"description":38938},"blog\u002Fpulsar-virtual-summit-europe-2023-key-takeaways",[5376,821],"PtxEw2OG8yvrrVlFoGqWjCvexXURnri-tHH0G08yXwo",{"id":38946,"title":38947,"authors":38948,"body":38949,"category":3550,"createdAt":290,"date":39243,"description":39244,"extension":8,"featured":294,"image":39245,"isDraft":294,"link":290,"meta":39246,"navigation":7,"order":296,"path":33966,"readingTime":39247,"relatedResources":290,"seo":39248,"stem":39249,"tags":39250,"__hash__":39251},"blogs\u002Fblog\u002Famazon-eventbridge-connector-is-now-integrated-with-streamnative-cloud.md","Amazon EventBridge connector is now integrated with StreamNative 
Cloud",[6969],{"type":15,"value":38950,"toc":39228},[38951,38955,38966,38970,38978,38982,38984,38987,39001,39004,39012,39014,39017,39021,39030,39047,39049,39053,39062,39074,39090,39094,39097,39123,39127,39136,39139,39148,39152,39160,39166,39170,39179,39182,39188,39192,39195,39200,39204,39206,39209,39212,39218,39220,39223,39226],[32,38952,38954],{"id":38953},"what-is-amazon-eventbridge","What is Amazon EventBridge?",[48,38956,38957,38962,38963],{},[55,38958,38961],{"href":38959,"rel":38960},"https:\u002F\u002Fdocs.aws.amazon.com\u002Feventbridge\u002Flatest\u002Fuserguide\u002Feb-what-is.html",[264],"EventBridge"," is a serverless service that uses events to connect application components together, making it easier for you to build scalable event-driven applications. It can be used to route events from sources such as home-grown applications, AWS services, and third-party software to consumer applications across organizations. EventBridge provides a simple and consistent way to ingest, filter, transform, and deliver events to build new applications quickly.\n",[384,38964],{"alt":18,"src":38965},"\u002Fimgs\u002Fblogs\u002F64abd1eb22f316f32804b3b7__PG0nRKs1CmJUJde2h9d9gdVD4xbaau3XTBHb3_JthQcTlUZQWOxwU3MSYG12DIsMxRcBGo374GguKpqYuHWPOqJ9HAUwSHMt2X5FzmkFbT9IDjcgWCnNnb1rwQ5CdrXS8NBWDpeARmGq8-y7aQF04E.png",[32,38967,38969],{"id":38968},"what-is-streamnative-cloud","What is StreamNative Cloud?",[48,38971,38972,38974,38975,33315],{},[55,38973,3550],{"href":37361}," is a fully managed cloud-native messaging and event streaming service built on Apache Pulsar.\n",[384,38976],{"alt":18,"src":38977},"\u002Fimgs\u002Fblogs\u002F64abd1ebc40106827cc81ba3_d3jbRRXUibkY3ioUh3_Ca_mb2o4ctL9TOQfPgWsVsO1l7M9bwvfsDFfWTgZ-cc-d1bbo4go5igIrIHpZjgGwIxLGQgvKGmfau7CUIVW9mL7SRPi8igwcl9CH13VMRqQjM3ZTKOIyjpqNMtqUB_yrr0E.png",[40,38979,38981],{"id":38980},"why-integrate-streamnative-cloud-with-amazon-eventbridge","Why integrate StreamNative Cloud with Amazon EventBridge?",[48,38983,3931],{},[48,38985,38986],{},"StreamNative Cloud and Amazon EventBridge have different ecosystems and capabilities.",[48,38988,38989,38990,1186,38995,39000],{},"Synchronizing StreamNative Cloud data to EventBridge can help teams quickly access the AWS ecosystem, such as ",[55,38991,38994],{"href":38992,"rel":38993},"https:\u002F\u002Faws.amazon.com\u002Flambda\u002F",[264],"Lambda function",[55,38996,38999],{"href":38997,"rel":38998},"https:\u002F\u002Faws.amazon.com\u002Fapi-gateway\u002F",[264],"API Gateway",", etc. Likewise, synchronizing Amazon EventBridge data to StreamNative Cloud can take advantage of more features of the Pulsar ecosystem, such as functions, data order guarantees, and flexible publish and subscribe models.",[48,39002,39003],{},"To integrate StreamNative Cloud with Amazon EventBridge, there are two data flow directions:",[321,39005,39006,39009],{},[324,39007,39008],{},"StreamNative Cloud → EventBridge: Specify a topic in Pulsar, and whenever there is data in that topic, it is synchronized to an event bus in AWS EventBridge.",[324,39010,39011],{},"EventBridge → StreamNative Cloud: Specify an event bus in AWS EventBridge, and whenever there is data in that event bus, it is synchronized to a topic in Pulsar.",[48,39013,3931],{},[48,39015,39016],{},"Here's how to implement each scenario.",[32,39018,39020],{"id":39019},"streamnative-cloud-data-to-aws-eventbridge","StreamNative cloud data to AWS EventBridge.",[48,39022,39023,39024,39029],{},"This scenario can be implemented using a pulsar sink connector. 
StreamNative open-sourced the ",[55,39025,39028],{"href":39026,"rel":39027},"https:\u002F\u002Fgithub.com\u002Fstreamnative\u002Fpulsar-io-aws-eventbridge",[264],"AWS EventBridge sink connector"," and has already integrated it into StreamNative Cloud. This connector can be used to easily send StreamNative Cloud messages to AWS EventBridge.",[48,39031,39032,39033,39035,39038,39041,39042,190],{},"Create an AWS EventBridge sink connector on the StreamNative Cloud console.",[15918,39034],{},[384,39036],{"alt":18,"src":39037},"\u002Fimgs\u002Fblogs\u002F64abd1ebd32c171d3dc87f3f_lm_9DVh5gX-2gzhEjDSZ2dY-RU54JsPiYyS7vhwEZk609o-btLDoOXQ8zS-JCHp3CGlVxaBt8_IdePmzeWMU5dc-qR1-EyjSzI6jEcnoZ2e5ab-fgT3uPZnsISKehLr4jx0mVLk-27CE8Q6A9AP6m8Y.png",[384,39039],{"alt":18,"src":39040},"\u002Fimgs\u002Fblogs\u002F64abd1eccd7f63151605f7d9_9Os2s1TI3P06af1WxLNTn1bVRJCmObk-Dc6I-Krbmmuer9Ey21Q2JzV_qM-qbHeMloD3afUl1GhQmQQYsQkiS6ffjr0OYCv7cBm8og7HaBF41j7Joq4vxF12QBZBZye3aNlP1VBPEtQVVhAuPZTxLl0.png","\nFor more configuration information, please refer to the ",[55,39043,39046],{"href":39044,"rel":39045},"https:\u002F\u002Fdocs.streamnative.io\u002Fhub\u002Fconnector-aws-eventbridge-sink-v2.10.4.3",[264],"StreamNative Hub documentation",[48,39048,3931],{},[40,39050,39052],{"id":39051},"tutorial-aws-eventbridge-data-to-streamnative-cloud","Tutorial: AWS EventBridge data to StreamNative cloud.",[48,39054,39055,39056,39061],{},"Since AWS EventBridge does not provide a similar poll\u002Freceive interface, this means that we cannot provide source connectors for StreamNative Cloud. AWS EventBridge uses ",[55,39057,39060],{"href":39058,"rel":39059},"https:\u002F\u002Fdocs.aws.amazon.com\u002Feventbridge\u002Flatest\u002Fuserguide\u002Feb-api-destinations.html",[264],"API destinations"," to send msg from EventBridge to the third-party app.",[48,39063,3600,39064,39068,39069,39073],{},[55,39065,39067],{"href":33836,"rel":39066},[264],"StreamNative REST API"," can be configured in AWS EventBridge ",[55,39070,39072],{"href":39058,"rel":39071},[264],"API destinations ","to send events to StreamNative Cloud.",[916,39075,39076],{},[48,39077,39078,39079,39084,39085,39089],{},"Note: If you use native Pulsar without StreamNative Cloud, you can’t integrate EventBridge into Pulsar data flow. Because the native ",[55,39080,39083],{"href":39081,"rel":39082},"https:\u002F\u002Fpulsar.apache.org\u002Fdocs\u002Fnext\u002Fclient-libraries-rest\u002F#producer",[264],"Pulsar REST API ","does not support configured JSON content,",[55,39086,39088],{"href":33836,"rel":39087},[264]," StreamNative Cloud REST API"," improves it.",[32,39091,39093],{"id":39092},"_1-get-streamnative-cloud-rest-config-information","1. Get StreamNative Cloud REST config information",[48,39095,39096],{},"To get StreamNative Cloud REST Config:",[1666,39098,39099,39102,39105,39108,39111,39114,39117],{},[324,39100,39101],{},"Log in to StreamNative Cloud.",[324,39103,39104],{},"In the left navigation pane, choose Pulsar Clients.",[324,39106,39107],{},"Select Rest API table.",[324,39109,39110],{},"Select a service account.",[324,39112,39113],{},"Select the authentication type, and use Oauth2 and DownloadKey. Extract type, client_id, client_secret, client_email, and issuer_url from this key.",[324,39115,39116],{},"Show Client configuration. Extract grant_type, audience, and get_token_url from this information.",[324,39118,39119,39120],{},"Select a topic and get Producer messages curl cmd. 
Extract produce_message_url from this cmd.\n",[384,39121],{"alt":18,"src":39122},"\u002Fimgs\u002Fblogs\u002F64abd1ece9eb30b8e2bff909_Rg29ocvkwqr38io1idKyupw-i9O7CCLUCzWoqEf2Sb4pWkKpH7AucomXcTHI-6Qk9LObM3cZTA65X5Sxlx6Ss-hppwb4oDKbgvIX7eY2Ec3IJxox-OpfPYs6v9aWivR6zoSBbemdTYvu9vZLJcQk1AE.png",[32,39124,39126],{"id":39125},"_2-create-a-connection-on-aws-eventbridge-api-destinations","2. Create a connection on AWS EventBridge API destinations",[916,39128,39129],{},[48,39130,39131,39132],{},"For details on how to create a connection, please refer to the ",[55,39133,39135],{"href":39058,"rel":39134},[264],"official website.",[48,39137,39138],{},"According to the StreamNative Cloud REST config obtained in step-1, please fill it in as shown in the figure.",[48,39140,24328,39141,39144,39145,33315],{},[384,39142],{"alt":18,"src":39143},"\u002Fimgs\u002Fblogs\u002F64abd1eb14bccd845e521caf_lrJ8olb9imTOyzPcCTkl6i5Szlsh_wVYKTSnL7bwum90NLawmBgdGKN6jp_bFBi-mB39uCa9Os-oMNYxK_83vNJ2FJPUaNm7Uzcu3A-XRP1mRSA14EV5KFONMq1PaYnrQYf61uPIhNg2CQVwytbHlN0.png","\nWhen you create, you can see that the connection will start to authorize, and when the authentication is passed, the instructions are configured without problems.\n",[384,39146],{"alt":18,"src":39147},"\u002Fimgs\u002Fblogs\u002F64abd1eb001d559106cbb68d_WovoIiLz3P8c1KsgRHcfN_7930qLrUiF5Vb0Iu5xqKp8Dc5tn1oI047vAv5xx_ToWV4oZ0PzhzgoT0fVlral5eNWT1E4PydwplU5ZB_7FtM-C2PziRO8baWaJnfSSAX0zbEprMyv3HW4wI913Xdkh6Y.png",[32,39149,39151],{"id":39150},"_3-create-api-destinations-on-aws-eventbridge","3. Create API destinations on AWS EventBridge",[916,39153,39154],{},[48,39155,39156,39157],{},"For details on how to create an API destination, please refer to the ",[55,39158,39135],{"href":39058,"rel":39159},[264],[48,39161,39162,39163],{},"To create API destinations, you only need to fill in the produce_message_url obtained in step-1 to the endpoint and select the connection you just created.\n",[384,39164],{"alt":18,"src":39165},"\u002Fimgs\u002Fblogs\u002F64abd1ec1d846f5e0e80f03e_k_L27I0qk-4aId-_WWvvINW8zzhLuAWD7VcPSR7oQ1aPBHHx3sPx5VYKMIm0Ka4XnEvHlq8NJIuCOtD6Chk_TzrzkBZuUSUWcp-NpjnTAzm5sSrrtvxXrTq2ybGSkl1m7Ob_P5NvaNpCaNHDbu4l7YM.png",[32,39167,39169],{"id":39168},"_4-create-a-rule-on-eventbus","4. Create a Rule on EventBus.",[916,39171,39172],{},[48,39173,39174,39175,190],{},"For more on how to create rules, please refer to the ",[55,39176,7120],{"href":39177,"rel":39178},"https:\u002F\u002Fdocs.aws.amazon.com\u002Feventbridge\u002Flatest\u002Fuserguide\u002Feb-create-rule.html",[264],[48,39180,39181],{},"Create a Rule whenever the data in the EventBus conforms to the rule pattern, and send that data to API destinations.",[48,39183,39184,39185,33315],{},"In the target step, select the API destinations you created.\n",[384,39186],{"alt":18,"src":39187},"\u002Fimgs\u002Fblogs\u002F64abd1ecdaaa7dd81979403f_Yg4bQEWToqap2zn86Lni2YmST5srPTj38XZUs00lep7pIPyUPouJNkQzySV6QxgR0NZ5CRn1gTrw79e8IlOZyOgeWDyV9JOgzCmvoIwfFbTXafE7rYagGQuZ_0O_Eeq5dB14H92vu59FiHYAT5keixk.png",[32,39189,39191],{"id":39190},"_5-send-events-to-eventbus","5. 
Send events to EventBus.",[48,39193,39194],{},"You can simulate sending an event to the event bus to trigger it to SN Cloud.",[48,39196,24328,39197,33315],{},[384,39198],{"alt":18,"src":39199},"\u002Fimgs\u002Fblogs\u002F64abd1ecd32c171d3dc87f5e_SFxI4CDmaDP_Uc08neBojLZtSTgCvAeyFDM91i9PquOBrKYM0Z46hrKGaaUifdy8Cocqkh4-XjIfE2cTnqyvAwsMwaeKLmiLwiXbmsouIYCK2k-o3rJtiC16R7JYNDubb2TMgVA3t0OSpta3O-CEP7I.png",[32,39201,39203],{"id":39202},"_6-consume-messages-from-streamnative-cloud","6. Consume messages from StreamNative Cloud",[48,39205,3931],{},[48,39207,39208],{},"You can use client lib or pulsarctl to consume that topic message.",[48,39210,39211],{},"The data that AWS delivers is a JSON, the REST API of StreamNative Cloud just uses byte schema, and you need to convert bytes to string. Its format is as follows:",[8325,39213,39216],{"className":39214,"code":39215,"language":8330},[8328],"{\n    \"version\": \"0\",\n    \"id\": \"0dda1223-b6d0-0048-d75e-3aac6b5a3cd0\",\n    \"detail-type\": \"com.test.People\",\n    \"source\": \"api-destination-pulsar-rule\",\n    \"account\": \"598203581484\",\n    \"time\": \"2023-05-18T09:46:29Z\",\n    \"region\": \"ap-northeast-1\",\n    \"resources\": [],\n    \"detail\": {\n        \"name\": \"test-sn\",\n        \"age\": 20\n    }\n}\n",[4926,39217,39215],{"__ignoreMap":18},[48,39219,3931],{},[48,39221,39222],{},"The first layer of data format is fixed (detail-type, detail, etc), and the user's data is in the detail field.",[48,39224,39225],{},"In general, detail-type should contain type information about detail, and you can deserialize the content of detail based on detail-type. It depends on how the source of your data is set.",[48,39227,3931],{},{"title":18,"searchDepth":19,"depth":19,"links":39229},[39230,39231,39232,39235],{"id":38953,"depth":279,"text":38954},{"id":38968,"depth":279,"text":38969},{"id":38980,"depth":19,"text":38981,"children":39233},[39234],{"id":39019,"depth":279,"text":39020},{"id":39051,"depth":19,"text":39052,"children":39236},[39237,39238,39239,39240,39241,39242],{"id":39092,"depth":279,"text":39093},{"id":39125,"depth":279,"text":39126},{"id":39150,"depth":279,"text":39151},{"id":39168,"depth":279,"text":39169},{"id":39190,"depth":279,"text":39191},{"id":39202,"depth":279,"text":39203},"2023-07-10","Synchronizing StreamNative Cloud data to EventBridge can help teams quickly access the AWS ecosystem, such as Lambda function, API Gateway, etc. 
Likewise, synchronizing Amazon EventBridge data to StreamNative Cloud can take advantage of more features of the Pulsar ecosystem, such as functions, data order guarantees, and flexible publish and subscribe models.","\u002Fimgs\u002Fblogs\u002F64be26f1c08eee24ded8768e_image-9.png",{},"4 min",{"title":38947,"description":39244},"blog\u002Famazon-eventbridge-connector-is-now-integrated-with-streamnative-cloud",[28572,302],"8OVad8kn9EtTfxgvCI179V6aOaGZT3G1HVm4VD-bNbc",{"id":39253,"title":32178,"authors":39254,"body":39255,"category":821,"createdAt":290,"date":39460,"description":39461,"extension":8,"featured":294,"image":39462,"isDraft":294,"link":290,"meta":39463,"navigation":7,"order":296,"path":32177,"readingTime":4475,"relatedResources":290,"seo":39464,"stem":39465,"tags":39466,"__hash__":39467},"blogs\u002Fblog\u002Freducing-total-cost-of-ownership-tco-for-enterprise-data-streaming-and-messaging.md",[37161],{"type":15,"value":39256,"toc":39452},[39257,39260,39263,39267,39270,39273,39275,39287,39292,39295,39298,39303,39306,39309,39313,39316,39319,39322,39325,39336,39352,39356,39359,39379,39382,39390,39393,39397,39400,39403,39406,39409,39412,39415,39419,39422,39425,39428,39431,39434,39436,39439,39442,39450],[48,39258,39259],{},"Modern enterprises are often faced with a conundrum: How can they leverage robust data streaming capabilities while keeping TCO within manageable bounds?",[48,39261,39262],{},"Conventional technologies such as Kafka or RabbitMQ pose significant challenges and expenses due to inefficient resource utilization resulting from their lack of elasticity, high infrastructure and operational costs from the multiplication of clusters through the organization, and the potential for downtime due to the absence of geo-replication. We introduce Apache Pulsar which natively solves those challenges with unique architecture and features and emerges as a transformative step towards a modern, scalable, and cost-effective data platform.",[40,39264,39266],{"id":39265},"the-cost-impact-of-traditional-data-streaming-technologies","The Cost Impact of Traditional Data Streaming Technologies",[48,39268,39269],{},"Enterprises are increasingly reliant on data, generating vast quantities every second. Consequently, data streaming and messaging systems have emerged as critical components within these organizations, forming the bedrock of their ability to build real-time applications and derive actionable insights from their data.",[48,39271,39272],{},"‍\nHowever, the decision to implement such systems requires an in-depth understanding of the total investment needed to produce the expected results. Those are not simply upfront expenses tied to setting up a new system. It encapsulates a wide range of factors, including the costs of installation, operation, maintenance, and upgrades. Even the potential financial implications of system downtime need to be accounted for in a comprehensive TCO analysis:",[48,39274,3931],{},[321,39276,39277,39284],{},[324,39278,39279,39280,39283],{},"Infrastructure Costs: the cost of the resources you need to buy or rent to run your systems. Not all technologies are equally efficient when it come to using them. Traditional technologies like Kafka or RabbitMQ are typically unable to dynamically scale resources based on real-time demand. To ensure uninterrupted service, these platforms are often provisioned based on peak load estimates, with an additional buffer for unexpected surges. 
Consequently, during periods of lower demand, these systems are underutilized leading to inefficient resource utilization and higher overall server costs.\n",[384,39281],{"alt":18,"src":39282},"\u002Fimgs\u002Fblogs\u002F649e8a70a66870ab680bb030_8sfN8FGv-2PAXgm9GhtLs8Oja3Sfb2Hnqy_S--uC1qZnfOAT7e-SqgpiL6iPJz6tv4K0TpBducKqLluYa4fdtxcNGkqCg92Xn9cxZynuBSqQI2CRyr9K4tuJsW3QwOK-9wcuzzhb8vaZk_j6eEccku4.png","\nAlso, more often than not, Enterprises simultaneously require messaging and streaming capabilities which lead to multiple clusters with independent resources,  piling up server costs (and operational costs), as well as data duplication, which inflates storage costs and complicates data management.",[324,39285,39286],{},"Operational Costs: The personnel, time, and financial resources devoted to tasks like ongoing monitoring, managing system updates, troubleshooting performance issues, and adjusting configurations for optimal data flow and security. For large-scale enterprises, these tasks can quickly balloon, especially when different applications or departments maintain separate clusters, which results in multiple instances of oversized hardware procurement, repeated setup, and maintenance tasks.",[48,39288,24328,39289],{},[384,39290],{"alt":18,"src":39291},"\u002Fimgs\u002Fblogs\u002F649e8a704d50f8e5910cd4e9_0EWmONI7m1uRj61Yz3GUcupx10YqfrfnKFAjRkfTfunokof-OFnMsw2FASXfnMnvp58xfTpCmOOUQC2a31gmLCtnmieuhDShEwp4CLC1MM-1lfwEq6k8yiLsmGqtZgXeTV9ErDehfq7cOWqNadCcEDc.png",[48,39293,39294],{},"A solution that provides centralized data governance and infrastructure management can drastically reduce these expenses, improving both cost-effectiveness and operational efficiency.",[48,39296,39297],{},"Similarly, managing different technologies for streaming and messaging use cases requires separate operational oversight (system updates, performance tuning, and troubleshooting) resulting in a higher expenditure of time, effort, and personnel resources with different skills.",[321,39299,39300],{},[324,39301,39302],{},"Business Costs: traditional data streaming systems are often not built with robust fault tolerance and geo-redundancy mechanisms. Without such fail-safe measures in place, system disruptions, whether they arise typically from hardware failures, network issues, or human errors, can lead to significant data loss. Recovering from such incidents involves more than just the technical tasks of data restoration and system repair but also accounts for the 'business downtime' costs. These can include the loss of business opportunities during the outage, reputational damage, potential regulatory fines for data loss, and the loss of customer trust.",[48,39304,39305],{},"Finally, enterprises that choose to manage their own data streaming systems may experience significant opportunity costs. 
By allocating highly skilled personnel to these tasks, they divert valuable resources away from core business operations and strategic initiatives.",[48,39307,39308],{},"It's clear that while traditional data streaming technologies play a crucial role in businesses, they also come with a hefty price tag.",[40,39310,39312],{"id":39311},"understanding-apache-pulsar","Understanding Apache Pulsar",[48,39314,39315],{},"Originally developed by Yahoo with the intention to unify the best features of existing messaging systems in a cloud-native architecture, Apache Pulsar is now one of the most active projects of the Apache Software Foundation and offers a compelling blend of flexibility, scalability, and reliability.",[48,39317,39318],{},"So, what makes Apache Pulsar unique?",[48,39320,39321],{},"In many traditional data streaming systems, serving and storage functions are intertwined. This architecture makes scaling a challenge as the growth in data volumes requires scaling both the serving and storage capacities concurrently, which is not optimal or cost-effective. At the opposite, Apache Pulsar's architecture fundamentally separates the serving and storage layers, which enables independent scaling of each. For example, if there's a spike in data intake, you can add more serving brokers without having to invest in additional storage capacity, and vice versa. This separation makes Pulsar highly elastic, allowing it to efficiently respond to changing data loads and throughput requirements. This architecture also forms the foundation for tiered storage, which allows for older data that are infrequently accessed to be offloaded from the serving brokers to cheaper, long-term storage, like Amazon S3. This significantly reduces storage costs and enables efficient utilization of more expensive primary storage.",[48,39323,39324],{},"On top of this great design, the community added great features, including:",[321,39326,39327,39330,39333],{},[324,39328,39329],{},"Multi-tenancy support,  which allows multiple teams or applications to share a single Pulsar cluster, effectively isolating their data and traffic. This means you can run multiple applications on a single Pulsar cluster with a high degree of security and isolation, thus maximizing the utilization of your resources.",[324,39331,39332],{},"built-in support for geo-replication. Data can be served from local brokers for low latency access while being stored across different geographical regions for higher fault tolerance. In the event of a failure, Pulsar can seamlessly switch to brokers in a different region, minimizing downtime and potential data loss.",[324,39334,39335],{},"Support of queuing and streaming use cases. 
This means you don't need separate systems for handling real-time and delayed data, simplifying your infrastructure and reducing maintenance overhead.",[48,39337,39338,39339,5422,39342,1186,39347,190],{},"Furthermore, internal optimizations lead to ",[55,39340,39341],{"href":27690},"better performance",[55,39343,39346],{"href":39344,"rel":39345},"https:\u002F\u002Fpandio.com\u002Fzero-data-loss-a-reality-in-apache-pulsar-not-true-with-kafka\u002F",[264],"guaranteed message delivery",[55,39348,39351],{"href":39349,"rel":39350},"https:\u002F\u002Fjack-vanlightly.com\u002Fblog\u002F2018\u002F10\u002F21\u002Fhow-to-not-lose-messages-on-an-apache-pulsar-cluster",[264],"even in the face of network splits or server crashes",[40,39353,39355],{"id":39354},"why-apache-pulsar-is-cheaper","Why Apache Pulsar is cheaper",[48,39357,39358],{},"In a context of higher cost scrutiny, saving costs free up resources for other strategic initiatives, enhancing the overall efficiency and competitive edge of the organization. Apache Pulsar offers several key advantages to save costs:",[1666,39360,39361,39364,39367,39370,39373,39376],{},[324,39362,39363],{},"More efficient use of the infrastructure: Benchmarks show that Apache Pulsar has better raw performance and needs less hardware than Apache Kafka for a given throughput, partly due to the separation of the serving and storage layers that allows for separate cost optimization. The difference can go from 20% to 70% on a single cluster for the most demanding applications.",[324,39365,39366],{},"Being elastic avoids oversizing: Apache Pulsar's elastic architecture allows you to scale your system dynamically according to your needs. Rather than requiring a significant upfront investment in high-specification machines, you have the flexibility to add more servers to your existing cluster as and when the data load increases. This flexibility ensures efficient resource allocation, reduces initial costs, and allows for better adaptability to changing business requirements. It also reduces the need for frequent maintenance and tuning when workload spikes, which in turn lowers operational costs.",[324,39368,39369],{},"Reduced costs for long retention: With Pulsar's tiered storage feature, old data can be offloaded from expensive primary storage to cheaper, long-term storage, reducing storage costs.",[324,39371,39372],{},"Reduced downtime: Pulsar's built-in fault-tolerance and geo-replication features ensure data is securely backed up in multiple locations. This minimizes the risk of data loss or system outages, saving costs associated with downtime and disaster recovery. Its ability to seamlessly handle hardware failures or network splits without interrupting service is a critical advantage for businesses where every second of downtime translates into substantial financial losses.",[324,39374,39375],{},"Elimination of redundant systems: By offering a unified platform for both queuing and streaming, Apache Pulsar eliminates the need for maintaining separate systems for different types of data processing. This simplification not only reduces the costs associated with managing multiple systems but also lowers the risk of data inconsistencies and redundancies that can arise from using disparate systems. 
Moreover, the community and StreamNative are working on protocol handlers that allow existing applications, which currently rely on technologies like Kafka or RabbitMQ, to operate on Apache Pulsar without any modifications..",[324,39377,39378],{},"Resources and operational mutualization, thanks to multi-tenancy: As we discussed earlier, Apache Pulsar's native multi-tenancy support allows multiple applications or departments to share a single Pulsar cluster while maintaining strict data isolation. This leads to better resource utilization and lower hardware and maintenance costs, as the need for separate infrastructure for each application or department is eliminated.",[48,39380,39381],{},"In summary,",[321,39383,39384,39387],{},[324,39385,39386],{},"Apache Pulsar minimizes expenses on individual clusters through its superior performance and elasticity, eliminating the need for over-provisioning resources, and thereby avoiding underutilized infrastructure.",[324,39388,39389],{},"When deployed as a shared platform across various teams within an organization, Apache Pulsar can lead to massive cost savings, as it allows efficient resource sharing and management.",[48,39391,39392],{},"Apache Pulsar presents a highly compelling option for enterprises seeking to enhance the efficiency and cost-effectiveness of their data streaming and messaging systems.",[40,39394,39396],{"id":39395},"case-study-orange-financial","Case Study: Orange Financial",[48,39398,39399],{},"Orange Financial is a major player in the mobile payment market with over 500 million registered users and 41.9 million active users, processing over 50 million transactions daily​​.",[48,39401,39402],{},"Prior to adopting Apache Pulsar, Orange Financial utilized a Lambda Architecture to handle its data processing needs, which involved splitting business logic into many segments and duplicating data across different systems for processing. This approach proved to be complex, hard to maintain, and costly. As their business grew, maintaining the different software stacks and clusters (including Kafka, Hive, Spark, Flink, and HBase) became prohibitively expensive​​.",[48,39404,39405],{},"Apache Pulsar was chosen to streamline their data processing stack, aiming to simplify the architecture, improve production efficiency, and reduce costs. With Pulsar, Orange Financial was able to unify log storage and computation into a single system, handling both real-time event streaming and processing. This resulted in a more robust and unified data serving layer​​.",[48,39407,39408],{},"The migration to Apache Pulsar also led to notable improvements in throughput and latency, allowing Orange Financial to handle peak traffic more effectively. The Pulsar-based system demonstrated high performance, being capable of responding to a transaction within 200 milliseconds​​.",[48,39410,39411],{},"Additionally, the use of Apache Pulsar's geo-replication and disaster recovery features resulted in a significant reduction of risk. By allowing data to be stored across multiple geographical locations, Orange Financial could ensure data availability and durability, even in the event of system failures​.",[48,39413,39414],{},"Ultimately, the switch to Apache Pulsar led to significant cost reductions. 
By simplifying their data processing architecture and reducing the need for maintaining multiple software stacks and clusters, Orange Financial was able to lower their operation and maintenance costs.",[40,39416,39418],{"id":39417},"apache-pulsar-a-strategic-investment","Apache Pulsar: A Strategic Investment",[48,39420,39421],{},"Considering Apache Pulsar for your organization's data streaming and messaging requirements is more than just a cost-saving decision. It represents a strategic investment toward improved data management and future readiness.",[48,39423,39424],{},"As data volumes continue to surge in the era of big data and real-time analytics, businesses need a robust, scalable, and reliable system that can keep pace. Apache Pulsar fits this requirement perfectly. Its ability to handle high volumes of data with ease, coupled with features like multi-tenancy, fault tolerance and geo-replication, makes it a future-proof solution for businesses.",[48,39426,39427],{},"Moreover, Pulsar's unified model for both streaming and queuing can simplify your data infrastructure, eliminating the need for multiple systems to manage different data processing needs. This not only reduces the potential for data inconsistencies but also makes the system easier to manage and scale.",[48,39429,39430],{},"Apache Pulsar also encourages better resource utilization through its multi-tenancy support. This feature is especially beneficial for enterprises where multiple departments or applications may need access to data streaming and messaging services. Instead of investing in isolated clusters for each application or department, businesses can consolidate their resources, leading to substantial cost savings and improved operational efficiency.",[48,39432,39433],{},"Given these advantages, investing in Apache Pulsar can be seen as a strategic move towards modern, efficient, and cost-effective data management.",[40,39435,2125],{"id":2122},[48,39437,39438],{},"In a world where data is increasingly becoming the lifeblood of organizations, choosing the right data streaming and messaging system can significantly influence a company's efficiency, responsiveness, and competitive edge. The total cost of ownership is a crucial factor that can't be overlooked when evaluating these systems.",[48,39440,39441],{},"Case studies of global organizations show that the shift to Apache Pulsar can lead to substantial cost savings, improved operational efficiency, and enhanced system stability. By integrating Apache Pulsar into their technical infrastructure, these organizations have made a strategic investment toward future-proofing their data management systems. While the transition may entail certain costs and challenges in the short term, the long-term benefits in terms of greatly reduced costs and improved operational efficiency are well worth the effort.",[48,39443,39444,39445,38114,39448,38117],{},"StreamNative has helped engineering teams worldwide make the move to Pulsar. 
Founded by the original creators of Apache Pulsar, StreamNative is one of the leading contributors to the open-source Apache Pulsar project and the author of the ",[55,39446,38113],{"href":35495,"rel":39447},[264],[55,39449,3550],{"href":37361},[48,39451,3931],{},{"title":18,"searchDepth":19,"depth":19,"links":39453},[39454,39455,39456,39457,39458,39459],{"id":39265,"depth":19,"text":39266},{"id":39311,"depth":19,"text":39312},{"id":39354,"depth":19,"text":39355},{"id":39395,"depth":19,"text":39396},{"id":39417,"depth":19,"text":39418},{"id":2122,"depth":19,"text":2125},"2023-06-30","Discover Apache Pulsar's unique approach to efficient data streaming, enabling cost-effective, scalable solutions over Kafka and RabbitMQ","\u002Fimgs\u002Fblogs\u002F649e8f101a32886a3ae38bc7_photo_38324684.jpg",{},{"title":32178,"description":39461},"blog\u002Freducing-total-cost-of-ownership-tco-for-enterprise-data-streaming-and-messaging",[27847,799,11043,5954],"YGXXjdXtu3vIkyn_sCVza2roAtCMzjKqYumI0E5WRYA",{"id":39469,"title":39470,"authors":39471,"body":39472,"category":821,"createdAt":290,"date":39707,"description":39708,"extension":8,"featured":294,"image":39709,"isDraft":294,"link":290,"meta":39710,"navigation":7,"order":296,"path":39711,"readingTime":33691,"relatedResources":290,"seo":39712,"stem":39713,"tags":39714,"__hash__":39715},"blogs\u002Fblog\u002Fannouncing-the-amazon-eventbridge-sink-connector-for-apache-pulsar.md","Announcing the Amazon EventBridge Sink Connector for Apache Pulsar",[6969],{"type":15,"value":39473,"toc":39702},[39474,39477,39481,39489,39494,39498,39505,39508,39519,39522,39526,39529,39540,39544,39555,39557,39560,39594,39602,39606,39611,39617,39622,39628,39637,39644,39648,39651,39693,39695],[48,39475,39476],{},"StreamNative is excited to announce the general availability of the Amazon EventBridge sink connector for Apache Pulsar. This connector synchronizes Pulsar data to Amazon EventBridge in real time, enabling Amazon EventBridge to leverage Pulsar and expand the Apache Pulsar ecosystem.",[8300,39478,39480],{"id":39479},"what-is-the-amazon-eventbridge-sink-connector","What is the Amazon EventBridge sink connector?",[48,39482,39483,39484,39488],{},"The",[55,39485,39487],{"href":39044,"rel":39486},[264]," Amazon EventBridge sink connector"," pulls data from Pulsar topics and persists data to AWS EventBridge.",[48,39490,24328,39491],{},[384,39492],{"alt":18,"src":39493},"\u002Fimgs\u002Fblogs\u002F649965a38cc2c82d1ee2669f_bFZg5KpWQdcreiMvdWcjMbVHG_0hjKt4jPzTjMzkpnLw7rutlS-7jlykay_CLyny7WGjwZxKRLSoL8I-vYH59rMXtJLgtPB23Cr77iyFEJpmQe8RohssBDARI0QImiB7R-dG4DyUXEqc0vTXCJPlNcA.png",[8300,39495,39497],{"id":39496},"why-we-built-the-amazon-eventbridge-sink-connector","Why we built the Amazon EventBridge sink connector",[48,39499,39500,39501,39504],{},"AWS ",[55,39502,38961],{"href":38959,"rel":39503},[264]," is a serverless service that uses events to connect application components together, making it easier to build scalable event-driven applications.",[48,39506,39507],{},"Sending data from Apache Pulsar to AWS EventBridge can provide several benefits:",[321,39509,39510,39513,39516],{},[324,39511,39512],{},"AWS EventBridge can act as a central hub for processing and routing events from various sources, including Apache Pulsar. 
This can enable teams to build a unified event-driven architecture that can handle events from different systems and applications.",[324,39514,39515],{},"AWS EventBridge provides a wide range of event targets, such as AWS Lambda, Amazon SNS, Amazon SQS, etc., that can be used quickly to access the AWS ecosystem. This can enable teams to build event-driven workflows that can automate business processes and reduce manual intervention.",[324,39517,39518],{},"AWS EventBridge provides built-in security and compliance features, such as AWS CloudTrail integration, AWS Identity and Access Management (IAM) policies, and encryption at rest and in transit. This can help ensure the confidentiality, integrity, and availability of event data.",[48,39520,39521],{},"Therefore, StreamNative developed this connector to provide an easy way for teams to write data from Pulsar to EventBridge in real time.",[8300,39523,39525],{"id":39524},"benefits-of-the-amazon-eventbridge-sink-connector","Benefits of the Amazon EventBridge sink connector",[48,39527,39528],{},"The integration between Amazon EventBridge and Apache Pulsar provides three key benefits.",[321,39530,39531,39534,39537],{},[324,39532,39533],{},"Simplicity: Quickly move data from Apache Pulsar to Amazon EventBridge without requiring any code.",[324,39535,39536],{},"Efficiency: Reduce the time spent configuring the data layer. This means that teams can spend more time building their business use cases instead of configuring their data.",[324,39538,39539],{},"Scalability: Run the EventBridge connector in different modes (standalone or distributed). This allows teams to build reactive data pipelines to meet business and operational needs in real time.",[8300,39541,39543],{"id":39542},"get-started-with-the-amazon-eventbridge-sink-connector","Get started with the Amazon EventBridge sink connector",[916,39545,39546],{},[48,39547,39548,39549,39554],{},"For StreamNative Cloud users, refer to this ",[55,39550,39553],{"href":39551,"rel":39552},"https:\u002F\u002Fdocs.google.com\u002Fdocument\u002Fd\u002F1CBWukTz0n_STGzdSQaN1GS6QlFQH3YT2rlcFWbjzhac",[264],"blog"," to create a connector on StreamNative Cloud quickly.",[40,39556,10104],{"id":10103},[48,39558,39559],{},"First, you need a running Apache Pulsar cluster and an Amazon EventBridge service.",[1666,39561,39562,39573,39582],{},[324,39563,39564,39565,39568,39569,22220],{},"Prepare the Pulsar service. You can quickly run a Pulsar cluster anywhere by running ",[4926,39566,39567],{},"$PULSAR_HOME\u002Fbin\u002Fpulsar standalone",". Refer to the ",[55,39570,7120],{"href":39571,"rel":39572},"https:\u002F\u002Fpulsar.apache.org\u002Fdocs\u002Fstandalone\u002F",[264],[324,39574,39575,39576,39581],{},"Prepare the AWS EventBridge service. See ",[55,39577,39580],{"href":39578,"rel":39579},"https:\u002F\u002Fdocs.aws.amazon.com\u002Feventbridge\u002Flatest\u002Fuserguide\u002Feb-get-started.html",[264],"Getting Started with Amazon EventBridge"," for details. You need to create an event bus and a rule first.",[324,39583,39584,39585,39590,39591,190],{},"Set up the AWS EventBridge connector. 
Download the connector from the ",[55,39586,39589],{"href":39587,"rel":39588},"https:\u002F\u002Fgithub.com\u002Fstreamnative\u002Fpulsar-io-aws-eventbridge\u002Freleases",[264],"Releases"," page, and then move the nar package to ",[4926,39592,39593],{},"$PULSAR_HOME\u002Fconnectors",[48,39595,39596,39597,39601],{},"Apache Pulsar provides a ",[55,39598,20384],{"href":39599,"rel":39600},"https:\u002F\u002Fpulsar.apache.org\u002Fdocs\u002Fio-overview",[264]," feature to run the connector. Follow the steps below to get the connector up and running.",[40,39603,39605],{"id":39604},"configure-the-sink-connector","Configure the sink connector",[1666,39607,39608],{},[324,39609,39610],{},"Create a configuration file named aws-eventbridge-sink-config.json. The configured connector writes the message in the public\u002Fdefault\u002Faws-eventbridge-pulsar topic to the pulsar-event-bus-name event bus of EventBridge.",[8325,39612,39615],{"className":39613,"code":39614,"language":8330},[8328],"{\n    \"name\": \"eventbridge-sink\",\n    \"archive\": \"connectors\u002Fpulsar-io-aws-eventbridge-2.10.4.3.nar\",\n    \"tenant\": \"public\",\n    \"namespace\": \"default\",\n    \"inputs\": [\n        \"aws-eventbridge-pulsar\"\n    ],\n    \"parallelism\": 1,\n    \"configs\": {\n        \"accessKeyId\": \"{{Your access access key}}\",\n        \"secretAccessKey\": \"{{Your secret access key}}\",\n        \"region\": \"test-region\",\n        \"eventBusName\": \"pulsar-event-bus-name\"\n    }\n}\n",[4926,39616,39614],{"__ignoreMap":18},[1666,39618,39619],{"start":19},[324,39620,39621],{},"Run the sink connector.",[8325,39623,39626],{"className":39624,"code":39625,"language":8330},[8328],"PULSAR_HOME\u002Fbin\u002Fpulsar-admin sinks localrun --sink-config-file aws-eventbridge-sink-config.json\n",[4926,39627,39625],{"__ignoreMap":18},[1666,39629,39630],{"start":279},[324,39631,39632,39633,39636],{},"You can send messages to the ",[4926,39634,39635],{},"public\u002Fdefault\u002Faws-eventbridge-pulsar"," topic, then view it in EventBridge.",[48,39638,39639,39640,190],{},"For more information, see the ",[55,39641,39643],{"href":39044,"rel":39642},[264],"Hub doc",[8300,39645,39647],{"id":39646},"get-involved","Get involved",[48,39649,39650],{},"The Amazon EventBridge sink connector is a major step in the journey of integrating Pulsar with other big data systems. To get involved with the Amazon EventBridge sink connector for Apache Pulsar, check out the following featured resources:",[321,39652,39653,39665,39674],{},[324,39654,39655,39656,39659,39660,39664],{},"Try out the Amazon EventBridge sink connector. To get started, ",[55,39657,36195],{"href":39587,"rel":39658},[264]," the connector and refer to the ",[55,39661,39663],{"href":39026,"rel":39662},[264],"ReadMe"," that walks through the whole process.",[324,39666,39667,39668,39673],{},"Make a contribution. The Amazon EventBridge sink connector is a community-driven service, which hosts its source code on the StreamNative GitHub repository. If you have any feature requests or bug reports, do not hesitate to ",[55,39669,39672],{"href":39670,"rel":39671},"https:\u002F\u002Fgithub.com\u002Fstreamnative\u002Fpulsar-io-aws-eventbridge\u002Fissues\u002Fnew\u002Fchoose",[264],"share your feedback and ideas"," and submit a pull request.",[324,39675,39676,39677,39681,39682,39687,39688,39692],{},"Contact us. 
Feel free to create an issue on ",[55,39678,39680],{"href":39670,"rel":39679},[264],"GitHub",", send an email to the ",[55,39683,39686],{"href":39684,"rel":39685},"https:\u002F\u002Flists.apache.org\u002Flist.html?dev@pulsar.apache.org",[264],"Pulsar mailing list",", or message us on ",[55,39689,39691],{"href":33664,"rel":39690},[264],"Twitter"," to get answers from Pulsar experts.",[40,39694,10248],{"id":10247},[48,39696,39697,39698,39701],{},"StreamNative was founded by the original creators of Apache Pulsar and offers ",[55,39699,39700],{"href":37361},"a fully managed service"," to help teams accelerate time-to-production and take advantage of Pulsar’s powerful streaming and messaging technology. We work with software companies worldwide - powering the next generation of real-time\u002Fevent-streaming applications.",{"title":18,"searchDepth":19,"depth":19,"links":39703},[39704,39705,39706],{"id":10103,"depth":19,"text":10104},{"id":39604,"depth":19,"text":39605},{"id":10247,"depth":19,"text":10248},"2023-06-25","The Amazon EventBridge sink connector for Apache Pulsar synchronizes Pulsar data to Amazon EventBridge in real time, enabling Amazon EventBridge to leverage Pulsar and expanding the Apache Pulsar ecosystem.","\u002Fimgs\u002Fblogs\u002F6499abf8bc7e661fda04ea04_Announcing-the-Amazon-EventBridge-Sink-Connector-for-Apache-Pulsar.png",{},"\u002Fblog\u002Fannouncing-the-amazon-eventbridge-sink-connector-for-apache-pulsar",{"title":39470,"description":39708},"blog\u002Fannouncing-the-amazon-eventbridge-sink-connector-for-apache-pulsar",[28572,302],"8RcTZHiUva7Ag91Nr40JDrtsWExA-KEbalmSJVXjfuk",{"id":39717,"title":39718,"authors":39719,"body":39720,"category":7338,"createdAt":290,"date":39868,"description":39869,"extension":8,"featured":294,"image":36515,"isDraft":294,"link":290,"meta":39870,"navigation":7,"order":296,"path":38912,"readingTime":39247,"relatedResources":290,"seo":39871,"stem":39872,"tags":39873,"__hash__":39874},"blogs\u002Fblog\u002Fpulsar-summit-na-2023-call-for-speakers.md","Call for Speakers for Pulsar Summit North America 2023",[31718],{"type":15,"value":39721,"toc":39860},[39722,39725,39728,39732,39735,39738,39742,39745,39748,39752,39766,39769,39783,39790,39794,39808,39816,39825,39829,39832,39836,39839,39845,39847,39851],[48,39723,39724],{},"We’re excited to announce that Pulsar Summit North America 2023 will take place on Wednesday, October 25, 2023!",[48,39726,39727],{},"We welcome your participation in making the event a success, whether by submitting a talk or offering sponsorship.",[40,39729,39731],{"id":39730},"what-is-pulsar-summit","What is Pulsar Summit?",[48,39733,39734],{},"Pulsar Summit is the conference dedicated to Apache Pulsar and the messaging and event streaming community. The conference gathers an international audience of developers, data architects, data scientists, Apache Pulsar committers and contributors, and friends within the streaming and messaging ecosystem. Together, they share experiences, exchange ideas and knowledge, and receive hands-on training led by Pulsar experts.",[48,39736,39737],{},"Since 2020, seven global Pulsar Summit events have featured 170+ interactive sessions by tech leads, open-source developers, software engineers, and software architects from Google, AWS, Splunk, Tencent, Verizon Media, Iterable, Yahoo, Nutanix, BIGO, TIBCO, OVHcloud, Clever Cloud, and more. 
The conferences have garnered 2,200 attendees representing 700 companies, including individuals from leading organizations such as Google, Microsoft, AMEX, Salesforce, TikTok, Alibaba, Tencent, Disney, and Paypal.",[40,39739,39741],{"id":39740},"speak-at-pulsar-summit-north-america-2023","Speak at Pulsar Summit North America 2023!",[48,39743,39744],{},"Share your Pulsar story and speak at the summit! Pulsar Summit offers a unique opportunity to connect with your peers and raise your profile in the rapidly growing Apache Pulsar community.",[48,39746,39747],{},"Our theme this year is \"Why Pulsar?\" and we are looking for stories that are innovative, informative, or thought-provoking. Join us to speak at the summit! You will be on stage with all the top Pulsar thought-leaders. It is a great way to participate and raise your profile in the rapidly growing Apache Pulsar community.",[3933,39749,39751],{"id":39750},"as-a-speaker-you-will-receive","As a speaker, you will receive:",[321,39753,39754,39757,39760,39763],{},[324,39755,39756],{},"Free conference pass.",[324,39758,39759],{},"Your headshot, bio and session featured on the Pulsar Summit website.",[324,39761,39762],{},"Your session will be promoted on YouTube, Twitter and LinkedIn.",[324,39764,39765],{},"The opportunity to share your knowledge and engage with the vibrant Pulsar community!",[48,39767,39768],{},"To speak at the summit, please submit an abstract about your presentation. Remember to keep your proposal short, relevant, and engaging. All levels of talks (beginner, intermediate, and advanced) are welcome. We invite you to submit your talk proposals and share your unique experience in one of the following themes:",[321,39770,39771,39774,39777,39780],{},[324,39772,39773],{},"“Why Pulsar?” - Your journey is inspiring, and we want to hear it! Share your stories of why you chose Pulsar, the challenges you overcame, and the solutions you implemented. Let others learn from your experiences and gain valuable insights from your use cases.",[324,39775,39776],{},"“Learning Pulsar” - Are you an educator at heart? Submit an entry-level talk introducing Pulsar, its remarkable features, and best practices. Help beginners navigate their Pulsar journey and inspire them with your expert knowledge.",[324,39778,39779],{},"“Deep Dive” - For all the experts out there, we invite you to share deep technical knowledge about Pulsar. Whether it’s about the inner workings, optimization techniques, or advanced features, we’d love to hear it all.",[324,39781,39782],{},"“Around Pulsar” - Have you built exceptional tools to work with Pulsar? Or have you integrated Pulsar with other technologies to create a superior solution? We’d love to hear about your innovations.",[48,39784,39785],{},[55,39786,39789],{"href":39787,"rel":39788},"https:\u002F\u002Fsessionize.com\u002Fpulsar-summit-north-america-2023",[264],"Submit your session abstract",[40,39791,39793],{"id":39792},"important-dates","Important Dates:",[321,39795,39796,39799,39802,39805],{},[324,39797,39798],{},"CFP opens: Tuesday, June 21st, 2023",[324,39800,39801],{},"CFP closes: Friday, July 7th, 2023",[324,39803,39804],{},"Speaker notifications sent: Friday, July 28th, 2023",[324,39806,39807],{},"Schedule announcement: August 4th, 2023",[48,39809,39810,39811,39815],{},"Submissions are open until Friday, July 7th, 2023. 
If you want some advice or feedback on your proposal or have any questions about the summit, please do not hesitate to contact us at ",[55,39812,39814],{"href":39813},"mailto:organizers@pulsar-summit.org","organizers@pulsar-summit.org",". We are happy to help!",[48,39817,39818,39819,39824],{},"Help us make #PulsarSummit North America 2023 a success by spreading the word, submitting your proposal, and offering sponsorship! Follow us on Twitter (",[55,39820,39823],{"href":39821,"rel":39822},"https:\u002F\u002Ftwitter.com\u002FPulsarSummit",[264],"@pulsarsummit",") to receive the latest updates on the summit.",[40,39826,39828],{"id":39827},"about-apache-pulsar","About Apache Pulsar",[48,39830,39831],{},"Apache Pulsar is a cloud-native, distributed messaging and streaming platform that empowers companies around the world to manage trillions of events per day. The Pulsar community has witnessed rapid growth since it became a top-level Apache Software Foundation project in 2018. Over the past four years, the vibrant community has continued to drive innovation and improvements to the project.",[40,39833,39835],{"id":39834},"about-the-organizer","About the Organizer",[48,39837,39838],{},"StreamNative is built by the original creators of Apache Pulsar and Apache BookKeeper, and is one of the leading contributors to the open source Apache Pulsar project. As the core developers of Pulsar, the StreamNative team is deeply versed in the technology, the community, and the use cases. Today, StreamNative is focusing on growing the Apache Pulsar and BookKeeper communities and bringing its deep experience across diverse Pulsar use cases to companies across the globe.",[48,39840,39841,39842,190],{},"StreamNative offers a fully managed service to help teams accelerate time-to-production and take advantage of Pulsar’s powerful streaming and messaging technology. Learn more about managing Pulsar at scale with",[55,39843,39844],{"href":37361}," StreamNative Cloud",[48,39846,3931],{},[40,39848,39850],{"id":39849},"stay-in-touch","Stay in Touch",[48,39852,39853,39854,39859],{},"Want to stay informed of the latest developments regarding Pulsar Summit North America 2023? 
",[55,39855,39858],{"href":39856,"rel":39857},"https:\u002F\u002Fshare.hsforms.com\u002F1kAHYVhYzR6mYDzvzsXRnWA3x5r4",[264],"Sign up here"," to be the first to hear about open registration, exciting speaker announcements, and details for all things Pulsar Summit.",{"title":18,"searchDepth":19,"depth":19,"links":39861},[39862,39863,39864,39865,39866,39867],{"id":39730,"depth":19,"text":39731},{"id":39740,"depth":19,"text":39741},{"id":39792,"depth":19,"text":39793},{"id":39827,"depth":19,"text":39828},{"id":39834,"depth":19,"text":39835},{"id":39849,"depth":19,"text":39850},"2023-06-21","Pulsar Summit North America 2023 Announcement and CFP",{},{"title":39718,"description":39869},"blog\u002Fpulsar-summit-na-2023-call-for-speakers",[5376,821],"J3uhZ7WIbd3hMlhNSz-ymzHJrX0Au2aPG4O-e2yr7tI",{"id":39876,"title":39877,"authors":39878,"body":39881,"category":821,"createdAt":290,"date":40472,"description":40473,"extension":8,"featured":294,"image":40474,"isDraft":294,"link":290,"meta":40475,"navigation":7,"order":296,"path":40476,"readingTime":5505,"relatedResources":290,"seo":40477,"stem":40478,"tags":40479,"__hash__":40480},"blogs\u002Fblog\u002Fa-comparison-of-transaction-buffer-snapshot-strategies-in-apache-pulsar.md","A Comparison of Transaction Buffer Snapshot Strategies in Apache Pulsar 3.0",[39879,39880],"Xiangying Meng","Lishen Yao",{"type":15,"value":39882,"toc":40449},[39883,39885,39897,39905,39911,39914,39917,39921,39924,39928,39931,39935,39938,39942,39946,39949,39953,39956,39960,39963,39967,39970,39974,39977,39981,39984,39987,39991,40000,40003,40006,40010,40081,40085,40091,40095,40098,40104,40107,40113,40117,40134,40138,40141,40144,40150,40153,40156,40160,40163,40167,40173,40181,40185,40191,40199,40203,40206,40210,40213,40216,40226,40234,40237,40247,40255,40258,40261,40265,40268,40271,40277,40285,40288,40294,40299,40302,40305,40309,40312,40315,40321,40329,40332,40338,40343,40346,40349,40353,40356,40359,40365,40373,40376,40382,40387,40390,40393,40397,40400,40402,40405,40408,40411,40414,40420],[40,39884,46],{"id":42},[48,39886,39887,39890,39891,39896],{},[55,39888,38483],{"href":37123,"rel":39889},[264]," was released on May 2 and introduces a variety of new feature enhancements that improve the performance and stability for teams operating Pulsar at scale as well as making it more stable (and predictable) for powering messaging and data streaming services for mission critical use cases. Among these improvements is  ",[55,39892,39895],{"href":39893,"rel":39894},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fissues\u002F16913",[264],"transaction buffer segmented snapshots",". The new design incorporates multiple snapshot segments through a secondary index, with index and snapshot segments stored in different compact topics.",[48,39898,3600,39899,39904],{},[55,39900,39903],{"href":39901,"rel":39902},"https:\u002F\u002Fpulsar.apache.org\u002Fdocs\u002F3.0.x\u002Ftransactions\u002F#transaction-buffer",[264],"transaction buffer"," plays an important role in Pulsar transactions, serving as a repository for messages produced within a transaction. In Pulsar releases prior to 3.0, the transaction buffer involves handling messages sent with transactions and taking periodic snapshots to avoid replaying all messages from the original topic. 
However, when a topic has long-term data retention and many aborted transactions, a single snapshot may become a bottleneck, causing increased costs as the snapshot size grows.",[48,39906,39907,39908,39910],{},"To evaluate the effectiveness of using multiple snapshot segments, the engineering team at ",[55,39909,4496],{"href":10259},", who contributed to the release of Pulsar 3.0, performed some benchmark tests using the OpenMessaging Benchmark framework. This benchmark report juxtaposes the new transaction buffer strategy of using multiple snapshots (segmented snapshots) against the previous single snapshot approach, focusing on key performance indicators such as throughput and latency.",[48,39912,39913],{},"The objective of this report is to offer users insights to select the most suitable strategy for their specific use cases, and to inform decisions regarding future optimizations and enhancements for more efficient transaction buffer management.",[40,39915,39916],{"id":22052},"Key benchmark findings",[32,39918,39920],{"id":39919},"_75x-improvement-for-network-io-efficiency","7.5x improvement for network IO efficiency",[48,39922,39923],{},"When tested under the same transaction abort rates, the newly implemented multi-snapshot strategy consistently maintained a steady throughput, averaging at 2 MB\u002Fs, and displayed a regular, periodic oscillation. This is in stark contrast to the previous strategy, which demonstrated an increasing throughput with an average rate of 15 MB\u002Fs. The new strategy, therefore, offers a significant advantage in terms of network IO conservation.",[32,39925,39927],{"id":39926},"_20x-lower-write-latency","20x lower write latency",[48,39929,39930],{},"The multi-snapshot strategy consistently kept write latency within a narrow band of 10-20ms. This is a marked improvement over the previous strategy, which saw write latency continually growing up to 200ms, an indication of a performance bottleneck.",[32,39932,39934],{"id":39933},"_20x-shorter-garbage-collection-gc-pauses","20x shorter Garbage Collection (GC) pauses",[48,39936,39937],{},"Throughout the testing period, the GC pauses for the new multi-snapshot strategy consistently hovered around 100ms, demonstrating efficient memory management. By comparison, the previous strategy saw GC pauses that not only increased over time but reached up to 2 seconds and even exceeded 20 seconds after an hour of testing. The consistent and stable performance of the new strategy points to enhanced system stability and operational efficiency.",[40,39939,39941],{"id":39940},"test-overview","Test overview",[32,39943,39945],{"id":39944},"what-we-tested","What we tested",[48,39947,39948],{},"We chose the following five performance indicators for benchmark testing, as they provide a comprehensive evaluation of system performance and help us better understand how the system behaves under different pressures.",[3933,39950,39952],{"id":39951},"_1-throughput","1. Throughput",[48,39954,39955],{},"This metric measures the amount of data that a system can handle within a specific timeframe. It is a crucial indicator of a system’s processing power and network efficiency. We anticipated that the new implementation would utilize network IO more efficiently than the existing one (i.e., higher throughput for the single snapshot strategy).",[3933,39957,39959],{"id":39958},"_2-entry-size-and-write-latency","2. 
Entry size and write latency",[48,39961,39962],{},"Entry size refers to the size of data segments written into the system, while write latency measures the delay between the issue of a write request and the completion of the operation. Smaller entry sizes and lower write latency generally improve system responsiveness and performance. The new implementation was expected to limit the size of snapshot segments, which might reduce write latency compared to the previous one. We expected a reduction in both the entry size and write latency.",[3933,39964,39966],{"id":39965},"_3-cpu-usage","3. CPU usage",[48,39968,39969],{},"This metric quantifies the intensity of Central Processing Unit (CPU) utilization by the system. It’s a critical metric as both the new and old strategies can potentially impact CPU usage. Under the correct configurations, we anticipated that the CPU usage of the new strategy would not exceed that of the old one and would be more stable.",[3933,39971,39973],{"id":39972},"_4-gc-pauses","4. GC pauses",[48,39975,39976],{},"Garbage Collection (GC) is an automatic memory management method to free up memory no longer in use or needed. GC pauses occur when GC operations pause the program to perform memory clean-up, which could negatively impact system performance. By monitoring these pauses, we can understand how the system manages memory and maintains performance. We expected a decrease in GC pauses with the new implementation.",[3933,39978,39980],{"id":39979},"_5-heap-memory","5. Heap memory",[48,39982,39983],{},"Heap memory refers to the runtime data area from which memory for all class instances and arrays is allocated. High heap usage could signal memory leaks, inadequate sizing, or code that creates excessive temporary objects. Therefore, tracking heap memory usage is crucial to ensuring effective use of memory resources. We expected a decrease in heap memory usage.",[48,39985,39986],{},"In summary, we hoped that these tests could demonstrate the superior performance of the new strategy, including lower throughput, reduced write latency, optimized CPU usage, fewer GC pauses, and lower heap memory usage.",[32,39988,39990],{"id":39989},"how-we-set-up-the-tests","How we set up the tests",[48,39992,39993,39994,39999],{},"We conducted all tests using the ",[55,39995,39998],{"href":39996,"rel":39997},"https:\u002F\u002Fgithub.com\u002Fopenmessaging\u002Fbenchmark",[264],"OpenMessaging Benchmark framework"," with the hosted service on StreamNative Cloud. The test environments for both snapshot strategies were identical in terms of infrastructure configurations and benchmark settings. In each case, we used a hosted Pulsar cluster deployed on Kubernetes, comprising 3 broker Pods, 3 bookie Pods, and 3 ZooKeeper Pods. 
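As a rough illustration of the transaction-heavy load these tests exercise (a minimal sketch, not the OpenMessaging Benchmark driver itself; the service URL, topic name, loop count, and 10% commit ratio are illustrative assumptions), a Pulsar client with transactions enabled can publish messages inside transactions and abort most of them, which is the pattern that makes the aborted-transaction snapshots grow:

```java
import java.util.concurrent.TimeUnit;
import org.apache.pulsar.client.api.Producer;
import org.apache.pulsar.client.api.PulsarClient;
import org.apache.pulsar.client.api.Schema;
import org.apache.pulsar.client.api.transaction.Transaction;

public class AbortHeavyProducer {
    public static void main(String[] args) throws Exception {
        // Assumed service URL and topic; the cluster must have the transaction coordinator enabled.
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650")
                .enableTransaction(true)
                .build();

        Producer<byte[]> producer = client.newProducer(Schema.BYTES)
                .topic("persistent://public/default/txn-benchmark")
                .sendTimeout(0, TimeUnit.SECONDS) // transactional producers require an unlimited send timeout
                .create();

        byte[] payload = new byte[100]; // 100-byte payload, in line with the benchmark's messageSize

        for (int i = 0; i < 10_000; i++) {
            Transaction txn = client.newTransaction()
                    .withTransactionTimeout(5, TimeUnit.MINUTES)
                    .build()
                    .get();
            producer.newMessage(txn).value(payload).send();

            if (i % 10 == 0) {
                txn.commit().get(); // commit a small fraction of transactions
            } else {
                txn.abort().get();  // abort the rest to grow the aborted-transaction snapshot
            }
        }

        producer.close();
        client.close();
    }
}
```

A high abort ratio like this keeps adding aborted-transaction entries to the transaction buffer snapshot, which is why the choice of snapshot strategy matters under long retention.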
We used Grafana to provide observability for the necessary metrics.",[48,40001,40002],{},"Given that snapshots are used to store information of aborted transactions, which will be cleared when the original transactional message ledger is deleted, we tested snapshot strategies under conditions of high transaction abort frequency and long retention time.",[48,40004,40005],{},"See the following configurations for the benchmark testbed details.",[3933,40007,40009],{"id":40008},"infrastructure","Infrastructure",[321,40011,40012,40015,40018,40021,40024,40027,40030,40033,40036,40039,40042,40045,40048,40051,40054,40056,40059,40061,40064,40067,40070,40073,40075,40078],{},[324,40013,40014],{},"StreamNative Cloud: Hosted service",[324,40016,40017],{},"~Infrastructure vendor: Google Cloud",[324,40019,40020],{},"~Settings: Advanced with Transactions enabled",[324,40022,40023],{},"Image: streamnative\u002Fpulsar-cloud:3.0.0.3-SNAPSHOT",[324,40025,40026],{},"Kubernetes version: 1.24.11-gke.1000",[324,40028,40029],{},"Network speed: 30 Gbps",[324,40031,40032],{},"Pulsar cluster components:",[324,40034,40035],{},"~3 broker Pods, each with:",[324,40037,40038],{},"~~CPU request: 4 cores",[324,40040,40041],{},"~~Memory request: 4Gi",[324,40043,40044],{},"~~Heap size: 2G",[324,40046,40047],{},"~~Direct memory size: 2G",[324,40049,40050],{},"~3 bookie Pods, each with:",[324,40052,40053],{},"~~CPU request: 2 cores",[324,40055,40041],{},[324,40057,40058],{},"~~Heap size: 1G",[324,40060,40047],{},[324,40062,40063],{},"~~1 volume for the journal (default size: 128Gi) and 1 volume for the ledger (default size 1Ti)",[324,40065,40066],{},"~3 ZooKeeper Pods, each with:",[324,40068,40069],{},"~~CPU request: 500m",[324,40071,40072],{},"~~Memory request: 1Gi",[324,40074,40058],{},[324,40076,40077],{},"~~Direct memory size: 1G",[324,40079,40080],{},"Observability tool: Grafana",[3933,40082,40084],{"id":40083},"benchmark-settings","Benchmark settings",[8325,40086,40089],{"className":40087,"code":40088,"language":8330},[8328],"consumerPerSubscription: 0\nmessageSize: 100B\npartitionsPerTopic: 3\npath: workloads\u002Ftransaction-3-topic-3-partitions-100b.yaml\npayloadFile: payload\u002Fpayload-100b.data\nproducerRate: 3000\nproducersPerTopic: 1\nsubscriptionsPerTopic: 0\ntestDurationMinutes: 6000\ntopics: 3\n",[4926,40090,40088],{"__ignoreMap":18},[3933,40092,40094],{"id":40093},"transaction-buffer-snapshot-settings","Transaction buffer snapshot settings",[48,40096,40097],{},"Single snapshot strategy config:",[8325,40099,40102],{"className":40100,"code":40101,"language":8330},[8328],"PULSAR_PREFIX_transactionBufferSegmentedSnapshotEnabled: \"false\"\nPULSAR_PREFIX_transactionCoordinatorEnabled: \"true\"\nPULSAR_PREFIX_transactionBufferSnapshotSegmentSize: \"51200\"\nPULSAR_PREFIX_maxMessageSize: \"52428800\"\nPULSAR_PREFIX_maxMessagePublishBufferSizeInMB: \"52428800\"\nPULSAR_PREFIX_numIOThreads: \"8\"\nPULSAR_PREFIX_transactionBufferSnapshotMinTimeInMillis=5000\nPULSAR_PREFIX_transactionBufferSnapshotMaxTransactionCount=1000\nbookkeeper.PULSAR_PREFIX_nettyMaxFrameSizeBytes: \"52531200\"\n",[4926,40103,40101],{"__ignoreMap":18},[48,40105,40106],{},"Multi-snapshot strategy config:",[8325,40108,40111],{"className":40109,"code":40110,"language":8330},[8328],"PULSAR_PREFIX_transactionBufferSegmentedSnapshotEnabled: \"true\"\nPULSAR_PREFIX_transactionCoordinatorEnabled: \"true\"\nPULSAR_PREFIX_transactionBufferSnapshotSegmentSize: \"1024000\"\nPULSAR_PREFIX_numIOThreads: 
\"8\"\nPULSAR_PREFIX_transactionBufferSnapshotMinTimeInMillis=5000\nPULSAR_PREFIX_transactionBufferSnapshotMaxTransactionCount=1000\n",[4926,40112,40110],{"__ignoreMap":18},[32,40114,40116],{"id":40115},"test-procedures","Test procedures",[1666,40118,40119,40122,40125,40128,40131],{},[324,40120,40121],{},"Set up the test environment with the specified hardware, software, and network configurations.",[324,40123,40124],{},"Configured the Pulsar Benchmark tool with the selected parameter settings.",[324,40126,40127],{},"Conducted performance tests for each scenario and compared the multi-snapshot strategy with the single snapshot strategy.",[324,40129,40130],{},"Monitored and recorded metrics such as throughput, write latency, and entry size.",[324,40132,40133],{},"Analyzed the results and drew conclusions according to the performance of both strategies.",[40,40135,40137],{"id":40136},"benchmark-tests-and-results","Benchmark tests and results",[48,40139,40140],{},"We ran the following benchmark tests with both transaction buffer snapshot strategies.",[48,40142,40143],{},"The test using the previous strategy ran stably for 70 minutes with the same message send rate, transaction abort rate, and snapshot take rate, followed by an unstable hour (see the CPU usage section below). According to the logs, BookKeeper kept reconnecting during this hour:",[8325,40145,40148],{"className":40146,"code":40147,"language":8330},[8328],"io.netty.channel.unix.Errors$NativeIoException: recvAddress(..) failed: Connection reset by peer\n",[4926,40149,40147],{"__ignoreMap":18},[48,40151,40152],{},"The new multi-snapshot solution ran stably for 160 minutes until the maximum heap memory size was reached.",[48,40154,40155],{},"We compared the test results from the following five aspects: throughput, latency, CPU usage, GC pauses, and memory.",[32,40157,40159],{"id":40158},"throughput","Throughput",[48,40161,40162],{},"This benchmark test compared the throughput of both strategies in writing snapshots to the system topic, with the same message send rates and transaction abort rates. We anticipated that the segmented snapshots would have significantly lower throughput, resulting in substantial network IO cost savings.",[3933,40164,40166],{"id":40165},"single-snapshot-strategy","Single snapshot strategy",[48,40168,40169,40172],{},[384,40170],{"alt":18,"src":40171},"\u002Fimgs\u002Fblogs\u002F6482835369662b852dcdec0d_image11.webp","Figure 1. Single snapshot strategy - Publish rate and throughput",[321,40174,40175,40178],{},[324,40176,40177],{},"The publish rate gradually declined from 3.5msg\u002Fs to 2msg\u002Fs.",[324,40179,40180],{},"The publish throughput increased linearly, reaching up to 25 MB\u002Fs.",[3933,40182,40184],{"id":40183},"multi-snapshot-strategy","Multi-snapshot strategy",[48,40186,40187,40190],{},[384,40188],{"alt":18,"src":40189},"\u002Fimgs\u002Fblogs\u002F648283b4ff4192b3250991a8_image5.webp","Figure 2. Multi-snapshot strategy - Publish rate and throughput",[321,40192,40193,40196],{},[324,40194,40195],{},"The publish rate remained stable at approximately 3.5msg\u002Fs.",[324,40197,40198],{},"The publish throughput saw a periodic change from 0MB\u002Fs to 4MB\u002Fs.",[3933,40200,40202],{"id":40201},"analysis","Analysis",[48,40204,40205],{},"After one hour of testing, the multi-snapshot strategy showed a throughput that was an order of magnitude lower than the single snapshot strategy, and it demonstrated greater stability. 
The throughput of the single snapshot strategy eventually increased to 25MB\u002Fs, but with a decreasing message publish rate. In contrast, the multi-snapshot strategy's throughput periodically fluctuated within the range of 5MB\u002Fs, as per the configuration, while maintaining a stable message publish rate. This implies that the new strategy can conserve more network IO resources used for sending messages to the system topic, resulting in improved and more stable performance.",[32,40207,40209],{"id":40208},"entry-size-and-write-latency","Entry size and write latency",[48,40211,40212],{},"This test focused on snapshot entry size and write latency. We set the snapshot segment size to 1MB (1,024,000 bytes) and did not impose any restrictions on the entry size for the single snapshot, to avoid test interruption due to errors caused by excessively large entries. Our expectation was that the new segmented snapshot solution would maintain stability in snapshot segment size, thus ensuring consistently low latency.",[3933,40214,40166],{"id":40215},"single-snapshot-strategy-1",[48,40217,40218,40221,40222,40225],{},[384,40219],{"alt":18,"src":40220},"\u002Fimgs\u002Fblogs\u002F648284086cf53081242ecb8d_image3.png","Figure 3. Single snapshot strategy - Write latency",[384,40223],{"alt":18,"src":40224},"\u002Fimgs\u002Fblogs\u002F648284306d987582a9d57be2_image2.png","Figure 4. Single snapshot strategy - Entry size",[321,40227,40228,40231],{},[324,40229,40230],{},"With the previous single snapshot strategy, the storage write latency increased over time. After running for an hour, just before the benchmark crashed, the write latency was mostly between 100ms and 200ms.",[324,40232,40233],{},"The size of the snapshot entry in the previous strategy also increased as the test progressed. After 10 minutes, the size exceeded the observable maximum value of 1MB.",[3933,40235,40184],{"id":40236},"multi-snapshot-strategy-1",[48,40238,40239,40242,40243,40246],{},[384,40240],{"alt":18,"src":40241},"\u002Fimgs\u002Fblogs\u002F64828452452fc5df9fca521b_image4.png","Figure 5. Multi-snapshot strategy - Write latency",[384,40244],{"alt":18,"src":40245},"\u002Fimgs\u002Fblogs\u002F6482847007b33fd2344bfdbb_image9.webp","Figure 6. Multi-snapshot strategy - Entry size",[321,40248,40249,40252],{},[324,40250,40251],{},"The latency of the new approach showed periodic fluctuations but never exceeded the range of 10-20ms.",[324,40253,40254],{},"As we set the transactionBufferSnapshotSegmentSize to 1024000, the entry size was always less than 1MB.",[3933,40256,40202],{"id":40257},"analysis-1",[48,40259,40260],{},"The test results indicate that the snapshot entry size in the new strategy did not continuously increase like in the single snapshot strategy. It consistently maintained a very low and stable latency. On the other hand, the single snapshot strategy experienced increasing entry size and write latency over time.",[32,40262,40264],{"id":40263},"cpu-usage","CPU usage",[48,40266,40267],{},"This test compared the CPU utilization of the two snapshot strategies. The number of keys during compaction and the size of sent entries both impact CPU utilization. The multi-snapshot strategy stores data in multiple entries, resulting in multiple keys. By contrast, the single snapshot strategy stores all data in a single entry, leading to write amplification. 
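To see why rewriting a single ever-growing entry amplifies writes, here is a back-of-the-envelope sketch (a toy model, not benchmark code; the per-cycle data volume and cycle count are assumed numbers) comparing the cumulative bytes written when every snapshot cycle rewrites the full snapshot versus when it appends only a new segment:

```java
public class SnapshotWriteAmplification {
    public static void main(String[] args) {
        long deltaBytes = 10_000; // assumed bytes of new aborted-transaction metadata per snapshot cycle
        int cycles = 1_000;       // assumed number of snapshot cycles in the test window

        long singleSnapshotTotal = 0; // rewrites the entire (growing) snapshot every cycle
        long segmentedTotal = 0;      // writes only the newly added segment every cycle
        long accumulated = 0;

        for (int i = 0; i < cycles; i++) {
            accumulated += deltaBytes;
            singleSnapshotTotal += accumulated; // total grows quadratically with the cycle count
            segmentedTotal += deltaBytes;       // total grows linearly with the cycle count
        }

        System.out.printf("single snapshot:    %,d bytes written%n", singleSnapshotTotal);
        System.out.printf("segmented snapshot: %,d bytes written%n", segmentedTotal);
    }
}
```

Under these assumed numbers, the single snapshot path writes roughly 500 times more bytes over the same window, which matches the steadily rising publish throughput and latency reported for the old strategy, while the segmented path grows only linearly.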
We expected that, under reasonable configurations, such as a snapshot segment size of 1MB, the new strategy would exhibit lower and more stable CPU utilization.",[3933,40269,40166],{"id":40270},"single-snapshot-strategy-2",[48,40272,40273,40276],{},[384,40274],{"alt":18,"src":40275},"\u002Fimgs\u002Fblogs\u002F648284a89ed007962ff3d115_image10.webp","Figure 7. Single snapshot strategy - CPU usage",[321,40278,40279,40282],{},[324,40280,40281],{},"During the first 50 minutes of the test, the CPU usage stayed around 200%, with a slight increase.",[324,40283,40284],{},"After an hour, the CPU usage showed significant fluctuations, and the broker began to become unstable.",[3933,40286,40184],{"id":40287},"multi-snapshot-strategy-2",[48,40289,40290,40293],{},[384,40291],{"alt":18,"src":40292},"\u002Fimgs\u002Fblogs\u002F648284db402ad24821430488_image1.webp","Figure 8. Multi-snapshot strategy - CPU usage",[321,40295,40296],{},[324,40297,40298],{},"The CPU usage was stable at about 200% and the test ended due to heap memory OOM.",[3933,40300,40202],{"id":40301},"analysis-2",[48,40303,40304],{},"The test results indicate that, under normal circumstances, the CPU utilization of the new strategy was slightly lower than that of the previous strategy, and it remained more stable before reaching machine bottlenecks. This means that the new multi-snapshot strategy outperforms the previous single snapshot strategy in terms of CPU usage.",[32,40306,40308],{"id":40307},"gc-pauses","GC pauses",[48,40310,40311],{},"This test examined the GC pause behavior of both strategies. The size and quantity of temporary objects generated during the snapshot-taking process can impact GC pauses. The test was conducted with a snapshot segment size of 1MB, without any restrictions imposed on the size of messages.",[3933,40313,40166],{"id":40314},"single-snapshot-strategy-3",[48,40316,40317,40320],{},[384,40318],{"alt":18,"src":40319},"\u002Fimgs\u002Fblogs\u002F6482850d452fc5df9fcb0bfb_image8.webp","Figure 9. Single snapshot strategy - GC pauses",[321,40322,40323,40326],{},[324,40324,40325],{},"During the stable testing period, GC pauses kept increasing, peaking at around 2 seconds.",[324,40327,40328],{},"After an hour of testing, the test became unstable, and the maximum GC pauses reached approximately 20 seconds.",[3933,40330,40184],{"id":40331},"multi-snapshot-strategy-3",[48,40333,40334,40337],{},[384,40335],{"alt":18,"src":40336},"\u002Fimgs\u002Fblogs\u002F64828530fbcf75f737c6f77c_image7.webp","Figure 10. Multi-snapshot strategy - GC pauses",[321,40339,40340],{},[324,40341,40342],{},"The GC pauses of the new strategy consistently remained below 100ms throughout the test.",[3933,40344,40202],{"id":40345},"analysis-3",[48,40347,40348],{},"The test results reveal that the GC pauses of the new strategy consistently hovered around 100ms, indicating stable performance. In contrast, the GC pauses of the previous single snapshot strategy progressively increased. This is probably caused by the write amplification issue, where the size of temporary snapshot objects generated during each operation continually expanded. Furthermore, as the machine approached its performance bottleneck, GC frequency increased significantly, leading to longer GC pauses. This clearly illustrates the superior performance of the new implementation in terms of managing GC pauses.",[32,40350,40352],{"id":40351},"heap-memory","Heap memory",[48,40354,40355],{},"This test compared the heap memory growth between the two strategies. 
The size and quantity of temporary objects generated during the snapshot-taking process can impact heap memory consumption. The test was conducted with a snapshot segment size of 1MB, without any restrictions imposed on the size of messages.",[3933,40357,40166],{"id":40358},"single-snapshot-strategy-4",[48,40360,40361,40364],{},[384,40362],{"alt":18,"src":40363},"\u002Fimgs\u002Fblogs\u002F64828568452fc5df9fcb7394_image12.webp","Figure 11. Single snapshot strategy - Heap memory",[321,40366,40367,40370],{},[324,40368,40369],{},"After 15 minutes of testing, the heap memory reached 1.5 GB.",[324,40371,40372],{},"Approximately 6 hours and 50 minutes into the test, the heap memory reached the OOM threshold of 2 GB. It was at this point that the CPU usage and GC pauses began to increase sharply and continued to fluctuate significantly.",[3933,40374,40184],{"id":40375},"multi-snapshot-strategy-4",[48,40377,40378,40381],{},[384,40379],{"alt":18,"src":40380},"\u002Fimgs\u002Fblogs\u002F64828598a1ce59e9ea87d5e1_image6.webp","Figure 12. Multi-snapshot strategy - Heap memory",[321,40383,40384],{},[324,40385,40386],{},"The heap memory of the new strategy increased steadily until the test ended with an OOM.",[3933,40388,40202],{"id":40389},"analysis-4",[48,40391,40392],{},"The test results show that the new strategy exhibits a slower and more stable heap memory growth pattern. By contrast, the previous strategy experienced a faster and more fluctuating heap memory growth. This implies that the new strategy offers advantages in terms of slower and more controlled memory growth, thus ensuring stability and delaying the onset of performance bottlenecks.",[40,40394,40396],{"id":40395},"future-improvement-on-demand-snapshot-segment-loading","Future improvement: On-demand snapshot segment loading",[48,40398,40399],{},"To reduce startup time when loading transaction buffers, we recommend on-demand snapshot segment loading. The current implementation may read all snapshot segments at startup, leading to longer startup time. With on-demand loading, we can selectively read specific snapshot segments as required, thereby reducing startup time.",[40,40401,2125],{"id":2122},[48,40403,40404],{},"Our test results demonstrate that the newly implemented multi-snapshot approach significantly outperforms the previous single snapshot approach in key performance metrics. With the same snapshot frequency, the new solution resolves the write amplification issue, resulting in lower network bandwidth utilization, reduced message latency, shorter GC stop-the-world (STW) times, and a more stable memory growth that avoids frequent garbage collection.",[48,40406,40407],{},"Further optimization, such as the implementation of on-demand loading of snapshot segments and distributed caching, can enhance the performance and stability of the new strategy in transaction buffers.",[48,40409,40410],{},"In real-world applications, we strongly recommend the adoption of the new multi-snapshot strategy.",[40,40412,40413],{"id":36476},"More resources",[48,40415,38379,40416,40419],{},[55,40417,38384],{"href":38382,"rel":40418},[264]," over the past few years, with a vibrant community driving innovation and improvements to the project. 
Check out the following resources to learn more about Pulsar.",[321,40421,40422,40427,40432,40441],{},[324,40423,40424,40425,190],{},"Run fully managed Pulsar services and enable transactions with ",[55,40426,3550],{"href":37361},[324,40428,38390,40429,190],{},[55,40430,31914],{"href":31912,"rel":40431},[264],[324,40433,40434,758,40437],{},[2628,40435,40436],{},"Blog",[55,40438,40440],{"href":40439},"\u002Fblog\u002Fdeep-dive-into-transaction-buffer-apache-pulsar","A Deep Dive into Transaction Buffer in Apache Pulsar",[324,40442,40443,758,40445],{},[2628,40444,40436],{},[55,40446,40448],{"href":40447},"\u002Fblog\u002Fdeep-dive-transaction-coordinators-apache-pulsar","A Deep Dive into Transaction Coordinators in Apache Pulsar",{"title":18,"searchDepth":19,"depth":19,"links":40450},[40451,40452,40457,40462,40469,40470,40471],{"id":42,"depth":19,"text":46},{"id":22052,"depth":19,"text":39916,"children":40453},[40454,40455,40456],{"id":39919,"depth":279,"text":39920},{"id":39926,"depth":279,"text":39927},{"id":39933,"depth":279,"text":39934},{"id":39940,"depth":19,"text":39941,"children":40458},[40459,40460,40461],{"id":39944,"depth":279,"text":39945},{"id":39989,"depth":279,"text":39990},{"id":40115,"depth":279,"text":40116},{"id":40136,"depth":19,"text":40137,"children":40463},[40464,40465,40466,40467,40468],{"id":40158,"depth":279,"text":40159},{"id":40208,"depth":279,"text":40209},{"id":40263,"depth":279,"text":40264},{"id":40307,"depth":279,"text":40308},{"id":40351,"depth":279,"text":40352},{"id":40395,"depth":19,"text":40396},{"id":2122,"depth":19,"text":2125},{"id":36476,"depth":19,"text":40413},"2023-06-09","This benchmark report provides an in-depth comparison of the single snapshot strategy and the segmented snapshot strategy in the transaction buffer.","\u002Fimgs\u002Fblogs\u002F649a9195b97b0a0b29c49d2e_a-comparison-of-transaction-buffer-snapshot-strategies-in-apache-pulsar.jpg",{},"\u002Fblog\u002Fa-comparison-of-transaction-buffer-snapshot-strategies-in-apache-pulsar",{"title":39877,"description":40473},"blog\u002Fa-comparison-of-transaction-buffer-snapshot-strategies-in-apache-pulsar",[821,9144],"fNgPFh8lGbPzWEhFCw7Lxc803LmiGBqQZ10BhOwJTds",{"id":40482,"title":40483,"authors":40484,"body":40486,"category":7338,"createdAt":290,"date":40975,"description":40976,"extension":8,"featured":294,"image":40977,"isDraft":294,"link":290,"meta":40978,"navigation":7,"order":296,"path":40979,"readingTime":22989,"relatedResources":290,"seo":40980,"stem":40981,"tags":40982,"__hash__":40983},"blogs\u002Fblog\u002Fpulsar-virtual-summit-europe-2023-on-demand-videos-available-now.md","Pulsar Virtual Summit Europe 2023 On-Demand Videos Available Now",[40485,31294],"Karin Landers",{"type":15,"value":40487,"toc":40961},[40488,40492,40502,40506,40510,40513,40516,40522,40526,40529,40533,40539,40542,40547,40551,40554,40559,40563,40566,40571,40575,40578,40586,40590,40593,40597,40600,40605,40609,40612,40617,40621,40624,40629,40632,40635,40640,40644,40647,40651,40654,40659,40663,40666,40670,40673,40679,40685,40688,40693,40697,40700,40704,40707,40713,40717,40720,40725,40729,40732,40738,40742,40745,40750,40754,40757,40763,40767,40769,40775,40779,40782,40789,40793,40797,40805,40809,40815,40823,40853,40857,40879,40884,40888,40891,40905,40909,40931,40933,40939,40943,40951,40957],[40,40489,40491],{"id":40490},"the-pulsar-virtual-summit-europe-2023-videos-are-available-now","The Pulsar Virtual Summit Europe 2023 videos are available now!",[48,40493,40494,40495,40498,40499,190],{},"Check out descriptions and links 
to each session below, or get every video ",[55,40496,40497],{"href":35424},"here,"," or on the StreamNative Youtube channel ",[55,40500,267],{"href":33878,"rel":40501},[264],[48,40503,40504],{},[34077,40505],{"value":34079},[40,40507,40509],{"id":40508},"about-pulsar-virtual-summit-europe-2023","About Pulsar Virtual Summit Europe 2023",[48,40511,40512],{},"StreamNative is proud to have hosted the 2nd Pulsar Summit in Europe and we would like to thank the Apache Pulsar community for making it a huge success.",[48,40514,40515],{},"On May 23rd, nearly 400 attendees representing over 20 countries gathered online for presentations on the latest improvements in Apache Pulsar and how companies are using Pulsar. It was the largest attendance to date for a Pulsar Summit, which speaks to the growing adoption and interest in Pulsar. Presentations included content about new features, project updates to make Pulsar even more scalable and resilient, as well as stories shared by companies that are building next-generation real-time streaming applications and solving complicated use cases using Pulsar. Read on for more information about each of the presentations!",[48,40517,40518],{},[384,40519],{"alt":40520,"src":40521},"Picture of the open podium with a microphone and \"Pulsar Summit, hosted by StreamNative\" on the front of the podium.","\u002Fimgs\u002Fblogs\u002F647e52237bbf38de51085711_PulsarSummit3.jpg",[40,40523,40525],{"id":40524},"keynotes","Keynotes",[48,40527,40528],{},"Keynote speakers illuminated the path forward, showcasing Pulsar’s new features for increased scalability and reliability!",[3933,40530,40532],{"id":40531},"pulsar-the-state-of-the-wave","Pulsar: The State of the Wave ‍",[48,40534,40535,40536],{},"Sijie Guo and Matteo Merli, StreamNative\nWatch the discussion of the evolution of Apache Pulsar over the years and what to expect in the future, including the recent release of version 3.0. ",[55,40537,40538],{"href":38685},"Watch the video.",[3933,40540,38892],{"id":40541},"challenges-of-hosting-a-pulsar-as-a-service-platform-under-a-shared-responsibility-model",[48,40543,40544,40545],{},"Edgaras Petovradzius and Mathias Ravn Tversted, the LEGO Group\nExplore the challenges the LEGO Group encountered to host and manage Pulsar-as-a-Service across multiple domains, and how they successfully collaborated with StreamNative in the process. ",[55,40546,40538],{"href":38509},[3933,40548,40550],{"id":40549},"how-we-simplified-a-highly-complex-and-sensitive-data-stream-using-apache-pulsar","How We Simplified a Highly Complex and Sensitive Data Stream Using Apache Pulsar",[48,40552,40553],{},"Matt Hefford and Lloyd Chandran, Zafin",[48,40555,40556,40557],{},"Hear why Zafin originally chose Pulsar and how StreamNative is able to provide expert help to troubleshoot challenges, and provide access to core committers to create a permanent solution. Matt Hefford also shares how Zafin utilizes StreamNative’s support for hosted Pulsar to achieve a challenging financial use case. 
",[55,40558,40538],{"href":38854},[3933,40560,40562],{"id":40561},"from-an-async-api-definition-to-a-deployed-pulsar-topology-via-gitops","From an Async API Definition to a Deployed Pulsar Topology Via GitOps",[48,40564,40565],{},"Markus Falkner and Armin Woworsky, Raiffeisen Bank International",[48,40567,40568,40569],{},"Get a deeper understanding of GitOps, Kubernetes operators, Apache Pulsar, and Async API, and gain insights into how these technologies can be leveraged to build efficient CI\u002FCD pipelines that enable rapid deployment of message-driven applications. Learn how a comprehensive Continuous Integration and Continuous Delivery (CI\u002FCD) pipeline based on GitOps is used to deploy a topology built on Async API definitions using a Kubernetes operator to an Apache Pulsar cluster. ",[55,40570,40538],{"href":38513},[3933,40572,40574],{"id":40573},"oxia-scaling-pulsars-metadata-to-100x","Oxia: Scaling Pulsar’s Metadata to 100x",[48,40576,40577],{},"Matteo Merli, StreamNative",[48,40579,40580,40581,5157,40584],{},"In this session, Matteo introduces Oxia, the much-anticipated metadata store and coordination system designed to enable even more robust scaling of Pulsar clusters. He shares the design goals, architecture, and development journey of Oxia, and how its design leverages modern cloud-native environments to provide a highly flexible and dynamic operational environment. ",[55,40582,40583],{"href":21529},"Read the Oxia announcement",[55,40585,40538],{"href":38577},[40,40587,40589],{"id":40588},"pulsar-adoption-stories","Pulsar Adoption Stories",[48,40591,40592],{},"Several real-world practitioners demonstrated how Pulsar revolutionizes data streaming use cases across a variety of industries. Engineers from prominent companies showcased how their organizations are successfully adopting Pulsar. They highlighted why they chose Pulsar and how Pulsar helps them to address real-time data processing challenges while achieving remarkable levels of scalability and reliability.",[3933,40594,40596],{"id":40595},"pulsar-in-finance-a-tale-of-migration","Pulsar in Finance - A Tale of Migration",[48,40598,40599],{},"George Orban, Daiwa Capital Markets",[48,40601,40602,40603],{},"George Orban presents a “Pulsar Love Story,” sharing the experience of migrating a pricing engine and trading system from TIBCO Rendezvous and other messaging solutions to Apache Pulsar. He shares the reasons for choosing Pulsar, including its suitability for enterprise applications and finance, and how it improved their stack’s resilience, robustness, and speed. ",[55,40604,40538],{"href":38813},[3933,40606,40608],{"id":40607},"pulsar-observability-in-high-topic-cardinality-deployments-for-telco","Pulsar Observability in High-Topic Cardinality Deployments for Telco",[48,40610,40611],{},"Habip Kenan Üsküda, Axon Networks",[48,40613,40614,40615],{},"Habip Kenan Üsküda shares the experience of building an observability stack using Grafana and Prometheus for their cloud-native platform based on Apache Pulsar, enabling their monitoring stack to scale to 1 million topics. 
",[55,40616,40538],{"href":38865},[3933,40618,40620],{"id":40619},"system-level-testing-of-a-pulsar-based-microservice-application","System-level Testing of a Pulsar-based Microservice Application",[48,40622,40623],{},"Jaakko Malkki, HSL (Helsinki Region Transport)",[48,40625,40626,40627],{},"In this talk, Jaakko Malkki describes how testing applications using microservice architecture at a system-level is difficult due to the complex nature of the application and the different types of technologies used. At Helsingin Seudun Liikenne, they use Pulsar-based microservice application called Transitdata for processing realtime public transport information, such as stop time predictions, vehicle positions and service alerts. Learn how they removed tedious manual, error-prone testing, and make creating automated tests easier, enabling faster deployment of new features. ",[55,40628,40538],{"href":38877},[3933,40630,38898],{"id":40631},"documentation-as-configuration-for-management-of-apache-pulsar",[48,40633,40634],{},"Alexander Wichmann and Ulrik Boll Djurtoft, the Lego Group",[48,40636,40637,40638],{},"Hear first-hand from Alexander Wichmann and Ulrik Boll Djurtoft, as they discuss their collaboration StreamNative, who provides hosted Pulsar-as-a-Service for the Lego Group. Explore the details of how that platform is used across different domains, and the advantages and disadvantages of utilizing documentation-based configuration, including the challenges in configuring Pulsar for this use case. ",[55,40639,40538],{"href":38897},[40,40641,40643],{"id":40642},"discover-apache-pulsar","Discover Apache Pulsar",[48,40645,40646],{},"Whether a seasoned Pulsar expert or new to the community, Pulsar Summit has something for everyone. Here, watch a high-level overview of Pulsar that also addresses the underlying architecture when compared to other technologies.",[3933,40648,40650],{"id":40649},"scalable-distributed-messaging-streaming-with-apache-pulsar","Scalable Distributed Messaging & Streaming with Apache Pulsar",[48,40652,40653],{},"Julien Jakubowski, StreamNative",[48,40655,40656,40657],{},"Julien presents the fundamentals of Pulsar, including messaging and the ability to build event-driven applications. He shares how Pulsar architecture enables vast scaling and elasticity for both processing & data storage effortlessly. He describes the guarantees of message high durability Pulsar offers, and how it can be used as a unified streaming and messaging platform. Finally, he dives into how Pulsar can integrate with existing application portfolio, and why Pulsar should be top of mind when solving demanding messaging and streaming use cases! ",[55,40658,40538],{"href":38723},[40,40660,40662],{"id":40661},"reliability-observability","Reliability & Observability",[48,40664,40665],{},"These talks will help you build bullet-proof systems that are both resilient and observable.",[3933,40667,40669],{"id":40668},"error-handling-patterns-in-pulsar","Error Handling Patterns in Pulsar",[48,40671,40672],{},"David Kjerrumgaard, StreamNative",[48,40674,40675,40676],{},"David Kjerrumgaard introduces different ways to handle errors and message retries in your event streaming applications. Learn the built-in mechanisms in Apache Pulsar that handle processing failures, including negative acknowledgments, retry topics, dead-letter queues, etc. 
",[55,40677,40538],{"href":40678},"\u002Fvideos\u002Fpulsar-virtual-summit-europe-2023-error-handling-patterns-in-pulsar",[3933,40680,40682,40684],{"id":40681},"the-future-of-metrics-in-apache-pulsar",[55,40683,3931],{"href":38723},"The Future of Metrics in Apache Pulsar",[48,40686,40687],{},"Asaf Mesika, StreamNative",[48,40689,40690,40691],{},"Asaf Mesika discusses challenges of using observability metrics in Pulsar from both user and committer perspectives, including issues such as high topic count limitations, improper histogram use in Grafana, and implementation difficulties. He concludes with recommendations on how to address these common problems and offers insights for leveraging metrics in Pulsar using Open Telemetry. ",[55,40692,40538],{"href":38646},[40,40694,40696],{"id":40695},"ecosystem","Ecosystem",[48,40698,40699],{},"Watch these recordings to better understand the growth the Pulsar ecosystem is experiencing as companies forge strong connections to technologies such as Apache NiFi, RisingWave, Spring, Apache Pinot, and more.",[3933,40701,40703],{"id":40702},"introducing-spring-for-apache-pulsar-live-coding-demo","Introducing Spring for Apache Pulsar - Live Coding Demo",[48,40705,40706],{},"Chris Bono and Soby Chako, VMware",[48,40708,40709,40710],{},"Spring for Apache Pulsar is a library that makes it easy to create stand-alone, production-grade Spring based Applications using Apache Pulsar that you can “just run”. In this talk, Chris Bono and Soby Chako explore Spring for Apache Pulsar by looking at the core features that it provides including Spring Cloud Stream binder, Reactive support, GraalVM Native Image Support, and Pulsar IO\u002FFunctions support. By the end of this video, you will learn how to create a basic Spring for Apache Pulsar app and evolve it from imperative -> reactive -> native.  ",[55,40711,40538],{"href":40712},"\u002Fvideos\u002Fpulsar-virtual-summit-europe-2023-introducing-spring-for-apache-pulsar",[3933,40714,40716],{"id":40715},"build-low-code-stream-data-pipelines-with-pulsar-transformations","Build Low-code Stream Data Pipelines with Pulsar Transformations",[48,40718,40719],{},"Christophe Bornet, DataStax",[48,40721,40722,40723],{},"Christophe Bornet shares how the open-source Pulsar Transformations provide low-code manipulation of data while leveraging what is already part of Pulsar, instead of needing to deploy an advanced Stream Processing technology. ",[55,40724,40538],{"href":38773},[3933,40726,40728],{"id":40727},"how-to-choose-the-right-streaming-database-for-pulsar","How to Choose the Right Streaming Database for Pulsar",[48,40730,40731],{},"Bobur Umurzokov, Apache APISIX",[48,40733,40734,40735],{},"In today's world of real-time data processing and analytics, streaming databases have become an essential tool for businesses that want to stay ahead of the game. However, with so many options available in the market, choosing the right streaming database can be a daunting task. Watch Bobur Umurzokov as he shares what SQL streaming is, when, why and how to use the Streaming database, as well as some key factors that you should consider when choosing the right streaming database for your business. 
",[55,40736,40538],{"href":40737},"\u002Fvideos\u002Fpulsar-virtual-summit-europe-2023-how-to-choose-the-right-streaming-database-for-pulsar",[3933,40739,40741],{"id":40740},"using-apache-nifi-with-apache-pulsar-for-fast-data-on-ramp","Using Apache NiFi with Apache Pulsar for Fast Data On-Ramp",[48,40743,40744],{},"Tim Spann, Cloudera",[48,40746,40747,40748],{},"Tim Spann shares the power of joining forces between Apache NiFi and Apache Pulsar. Apache NiFi adds the benefits of ELT, ETL, data crunching, transformation, validation and batch data processing, and once data is ready to be an event, NiFi can launch it into Pulsar at light speed. ",[55,40749,40538],{"href":38781},[3933,40751,40753],{"id":40752},"building-cost-effective-stream-processing-applications-with-risingwave-and-apache-pulsar","Building Cost-Effective Stream Processing Applications with RisingWave and Apache Pulsar",[48,40755,40756],{},"Yingjun Wu, RisingWave",[48,40758,40759,40760],{},"Yingjun Wu discusses how to build cost-effective and scalable stream processing applications with RisingWave and Apache Pulsar. Learn how RisingWave's decoupled compute-storage architecture and Pulsar's tiered storage can help you reduce infrastructure and operational costs, and how to build real-time analytics and insights for use cases such as fraud detection and anomaly detection. ",[55,40761,40538],{"href":40762},"\u002Fvideos\u002Fpulsar-virtual-summit-europe-2023-building-cost-effective-stream-processing-applications-with-risingwave-and-pulsar",[3933,40764,40766],{"id":40765},"demo-build-ml-enhanced-event-streaming-apps-with-java-microservices","‍Demo: Build ML Enhanced Event Streaming Apps with Java Microservices",[48,40768,40672],{},[48,40770,40771,40772],{},"Learn the easy way to build and scale machine learning apps in this demo-focused presentation from author and Solutions Engineer David Kjerrumgaard. ",[55,40773,40538],{"href":40774},"\u002Fvideos\u002Fpulsar-virtual-summit-europe-2023-build-ml-enhanced-event-streaming-apps-with-java-microservices",[3933,40776,40778],{"id":40777},"building-a-real-time-analytics-application-with-apache-pulsar-and-apache-pinot","‍Building a Real-time Analytics Application with Apache Pulsar and Apache Pinot",[48,40780,40781],{},"Mark Needham, StarTree and Mary Grygleski, DataStax",[48,40783,40784,40785],{},"Explore the integration between Pulsar and Pinot, and learn the features that it supports, then follow along for a demonstration of how to build a real-time analytics dashboard with these technologies. Mary Grygleski and Mark Needham show how analytical queries can be run on top of Puslar's event data with Apache Pinot, a real-time distributed OLAP datastore, and then used to deliver scalable real-time analytics with low latency. ",[55,40786,40788],{"href":40787},"\u002Fvideos\u002Fpulsar-virtual-summit-europe-2023-building-a-real-time-analytics-application-with-apache-pulsar-and-apache-pinot","Watch the video. 
",[40,40790,40792],{"id":40791},"pulsar-virtual-summit-europe-2023-event-highlights","Pulsar Virtual Summit Europe 2023 Event Highlights",[3933,40794,40796],{"id":40795},"by-the-numbers","By the Numbers:",[321,40798,40799,40802],{},[324,40800,40801],{},"387 attendees representing 20+ countries",[324,40803,40804],{},"24 speakers from companies including The Lego Group, Zafin, VMware, Axon, HSL, and more",[3933,40806,40808],{"id":40807},"open-qa-session","Open Q&A Session:",[48,40810,40811,40812,190],{},"Another highlight of the Summit was the open Q&A session, where a panel of facilitators and engineers answered unstructured questions live. You can watch the entire Q&A session along with the extended responses ",[55,40813,267],{"href":40814},"\u002Fvideos\u002Fpulsar-virtual-summit-europe-2023-open-q-a-session",[48,40816,40817,40818,40822],{},"‍\nIn addition to answering several questions live, the panel also announced the release of the new ",[55,40819,40821],{"href":23526,"rel":40820},[264],"Pulsar website",". We'd like to also take a moment to thank the many Pulsar community members who worked together to design and implement a more engaging and professional looking website that captures our thriving open source community. Special thanks to Asaf Mesika, Emidio Cardeira, Tison Chen, all from StreamNative, and Kiryl Valkovich of Teal Tools, for their major effort creating the new design and implementation!",[48,40824,40825,40826,40831,40832,40831,40837,40842,40843,40842,40848],{},"@",[55,40827,40830],{"href":40828,"rel":40829},"https:\u002F\u002Ftwitter.com\u002FCardeiraEmidio",[264],"CardeiraEmidio",", @",[55,40833,40836],{"href":40834,"rel":40835},"https:\u002F\u002Ftwitter.com\u002Fasafmesika",[264],"asafmesika",[55,40838,40841],{"href":40839,"rel":40840},"https:\u002F\u002Ftwitter.com\u002Ftison1096",[264],"tison1096"," @",[55,40844,40847],{"href":40845,"rel":40846},"https:\u002F\u002Ftwitter.com\u002FTealTools",[264],"TealTools",[55,40849,40852],{"href":40850,"rel":40851},"https:\u002F\u002Ftwitter.com\u002Fvisortelle",[264],"visortelle",[3933,40854,40856],{"id":40855},"what-attendees-have-to-say-about-pulsar-summit","What Attendees Have to Say about Pulsar Summit:",[916,40858,40859],{},[48,40860,40861,40862,10259,40867,40872,40873,40878],{},"“I enjoyed these presentations because they were applicable to my use cases. I currently use",[55,40863,40866],{"href":40864,"rel":40865},"https:\u002F\u002Ftwitter.com\u002Fhashtag\u002FKafka?src=hashtag_click",[264]," #Kafka",[55,40868,40871],{"href":40869,"rel":40870},"https:\u002F\u002Ftwitter.com\u002Fhashtag\u002FConfluent?src=hashtag_click",[264],"#Confluent"," so learning some of the technologies in",[55,40874,40877],{"href":40875,"rel":40876},"https:\u002F\u002Ftwitter.com\u002Fhashtag\u002FApachePulsar?src=hashtag_click",[264]," #ApachePulsar"," helped me better understand what Pulsar is capable of.” -Anonymous survey response",[916,40880,40881],{},[48,40882,40883],{},"“Collaboration thrives as attendees connect and forge new alliances. 
The Pulsar community grows stronger, generating ideas and partnerships that will propel the ecosystem forward into 2023 and beyond.” -David Kjerrumgaard, StreamNative Dev Rel and Solution Engineer",[32,40885,40887],{"id":40886},"post-event-survey-responses","Post Event Survey Responses",[48,40889,40890],{},"The attendee survey added additional insight from the attendee experience:",[321,40892,40893,40896,40899,40902],{},[324,40894,40895],{},"94% of attendees who completed the survey ranked the Pulsar Virtual Summit as “Excellent: 5\u002F5”",[324,40897,40898],{},"100% said the event fulfilled their expectations",[324,40900,40901],{},"85% were “incredibly satisfied” by the agenda and speakers, with several open feedback responses asking for the Summit to be stretched into“two days, as there is so much good content”",[324,40903,40904],{},"94% responded that they “will attend future Pulsar Summits”",[40,40906,40908],{"id":40907},"it-takes-a-village","It Takes a Village…",[48,40910,40911,40912,1186,40917,1186,40922,40927,40928,190],{},"As the hosts of Pulsar Virtual Summit Europe, StreamNative wants to thank all of the ",[55,40913,40916],{"href":40914,"rel":40915},"https:\u002F\u002Fpulsar-summit.org\u002Fevent\u002Feurope-2023\u002Fpartners",[264],"Community Sponsors",[55,40918,40921],{"href":40919,"rel":40920},"https:\u002F\u002Fpulsar-summit.org\u002Fevent\u002Feurope-2023\u002Fschedule",[264],"Speakers",[55,40923,40926],{"href":40924,"rel":40925},"https:\u002F\u002Fpulsar-summit.org\u002Fevent\u002Feurope-2023\u002Fcommittee",[264],"Program Committee",", Moderators, and Panelists that helped to make this event a success. Are you interested in becoming a future sponsor or Program Committee member? Please reach out to event staff ",[55,40929,267],{"href":40930},"mailto:events@pulsar-summit.org?subject=I'm%20interested%20in%20sponsoring%20and%2For%20being%20on%20the%20next%20Pulsar%20Summit%20Program%20Committee!",[48,40932,3931],{},[48,40934,40935],{},[384,40936],{"alt":40937,"src":40938},"The in-person stage from Pulsar Summit San Fran 2022. ","\u002Fimgs\u002Fblogs\u002F647e529f4ea15738904858fc_PulsarSummit2.jpg",[40,40940,40942],{"id":40941},"coming-up-next-pulsar-summit-returns-to-the-live-stage","Coming Up Next: Pulsar Summit Returns to the Live Stage!",[48,40944,40945,40946],{},"Bay Area, October 2023. ",[55,40947,40950],{"href":40948,"rel":40949},"https:\u002F\u002Fshare.hsforms.com\u002F14jc1JA3MTVqiAaMKL55W-g3x5r4",[264],"Be the first to hear more.",[48,40952,40953],{},[384,40954],{"alt":40955,"src":40956},"Pulsar Summit, Hosted by StreamNative projection.","\u002Fimgs\u002Fblogs\u002F647e4f54a6fe5505a656ab18_PulsarSummitFloor.jpg",[48,40958,40959],{},[34077,40960],{"value":34079},{"title":18,"searchDepth":19,"depth":19,"links":40962},[40963,40964,40965,40966,40967,40968,40969,40970,40973,40974],{"id":40490,"depth":19,"text":40491},{"id":40508,"depth":19,"text":40509},{"id":40524,"depth":19,"text":40525},{"id":40588,"depth":19,"text":40589},{"id":40642,"depth":19,"text":40643},{"id":40661,"depth":19,"text":40662},{"id":40695,"depth":19,"text":40696},{"id":40791,"depth":19,"text":40792,"children":40971},[40972],{"id":40886,"depth":279,"text":40887},{"id":40907,"depth":19,"text":40908},{"id":40941,"depth":19,"text":40942},"2023-06-06","Pulsar Virtual Summit Europe Videos available now. 
Watch the playlist of Apache Pulsar and StreamNative content recordings from the Summit.","\u002Fimgs\u002Fblogs\u002F649a9231bf02ed04f2575c92_Pulsar_summitEMEA_2023_Videos.jpg",{},"\u002Fblog\u002Fpulsar-virtual-summit-europe-2023-on-demand-videos-available-now",{"title":40483,"description":40976},"blog\u002Fpulsar-virtual-summit-europe-2023-on-demand-videos-available-now",[5376,821],"OBBijTn1ZPv7xJVbU06VOdWrQGUVeqsHXJhJ1eYSZOg",{"id":40985,"title":40986,"authors":40987,"body":40988,"category":3550,"createdAt":290,"date":41172,"description":41173,"extension":8,"featured":294,"image":41174,"isDraft":294,"link":290,"meta":41175,"navigation":7,"order":296,"path":41176,"readingTime":39247,"relatedResources":290,"seo":41177,"stem":41178,"tags":41179,"__hash__":41180},"blogs\u002Fblog\u002Fproduct-updates-june-2023-improved-onboarding-for-cluster-set-up-and-kafka-clients.md","Product Updates [June 2023]:  Improved Onboarding for Cluster Set Up and Kafka Clients",[32707],{"type":15,"value":40989,"toc":41165},[40990,40993,40996,41000,41007,41013,41045,41053,41057,41060,41086,41092,41098,41102,41105,41120,41123,41129,41133,41136,41141,41145,41154,41160,41163],[48,40991,40992],{},"See what's new this month for StreamNative Cloud!",[48,40994,40995],{},"We’ve been working on a better console experience to improve time to value for new users to start using Pulsar - both for initial cluster creation and set up as well as connecting to an existing cluster using Kafka clients and tools.  AND…Pulsar 3.0 is also available on StreamNative Cloud so that organizations can take advantage of improved performance at scale with the new load balancer and other enhancements.  Read on for all the details.",[40,40997,40999],{"id":40998},"new-onboarding-guide-for-managed-clusters","New Onboarding Guide for Managed Clusters",[48,41001,41002,41003,41006],{},"Pulsar is a powerful and sophisticated tool and StreamNative Cloud enables teams to set up and connect to their organization’s Pulsar clusters in a matter of minutes - and without needing to understand all of the underlying configurations.  In this release, we’ve focused on making this flow even easier with a step by step onboarding guide ",[2628,41004,41005],{},"with videos"," to enable new users to quickly set up, authenticate and connect to a Pulsar cluster on our hosted\u002Fmanaged service.",[48,41008,41009],{},[384,41010],{"alt":41011,"src":41012},"Onboarding Guide for StreamNative Cloud - Pulsar Cluster setup","https:\u002F\u002Fuploads-ssl.webflow.com\u002F639226d67b0d723af8e7ca56\u002F647e19ddaa9459ae8f919d63_Untitled%20(1).png",[321,41014,41015,41021,41027,41033,41039],{},[324,41016,41017,41018],{},"Create a service account: service accounts let teams connect their applications to Pulsar clusters. ****",[55,41019,36306],{"href":36304,"rel":41020},[264],[324,41022,41023,41024],{},"Create an instance and deploying a Pulsar cluster: instances let teams specify settings, and create\u002Fdeploy a Pulsar cluster. ****",[55,41025,36306],{"href":36318,"rel":41026},[264],[324,41028,41029,41030],{},"View the default tenants and namespaces, and learning about multi-tenancy: tenants let multiple teams securely use the same Pulsar cluster without interfering with each other. ****",[55,41031,36306],{"href":36331,"rel":41032},[264],[324,41034,41035,41036],{},"Create a topic: topics are how messages are grouped within Pulsar clusters, and . 
****",[55,41037,36306],{"href":36344,"rel":41038},[264],[324,41040,41041,41042],{},"Set up a Pulsar client and send the first message to it: send messages to a Pulsar cluster via a client library or CLI tool. ",[55,41043,36306],{"href":36357,"rel":41044},[264],[48,41046,41047,41048,41052],{},"With streamlined onboarding, teams can unlock the full potential of StreamNative Cloud from day one. The guide can be viewed by ",[55,41049,41051],{"href":29174,"rel":41050},[264],"creating an account",", or reaching out to the StreamNative team.",[40,41054,41056],{"id":41055},"improved-experience-for-running-kafka-applications-with-streamnative-cloud","Improved experience for running Kafka Applications with StreamNative Cloud",[48,41058,41059],{},"As teams move to centralize their messaging and streaming services to Pulsar from legacy technologies - such as Kafka - we’re making it easier for developers to connect to their Pulsar clusters on the StreamNative Console.",[321,41061,41062,41071,41074,41077],{},[324,41063,41064,41065,41070],{},"Kafka developers can now easily connect to their Pulsar cluster with Kafka clients via ",[55,41066,41069],{"href":41067,"rel":41068},"https:\u002F\u002Fdocs.streamnative.io\u002Fdocs\u002Fcloud-connect-kafka#kafka-client-page-wizard",[264],"a new Kafka Clients page",". This page walks a user through a step by step guide to connect to the Organization’s Pulsar cluster using authentication tokens or OAuth2 and setting up",[324,41072,41073],{},"Client libraries (Java, Go, Python, and Node.js),",[324,41075,41076],{},"Apache Kafka CLI tools, KStream applications, KSQL",[324,41078,41079,41080,41085],{},"Also, the ",[55,41081,41084],{"href":41082,"rel":41083},"https:\u002F\u002Fdocs.streamnative.io\u002Fdocs\u002Fkop-concepts",[264],"Kafka protocol"," is now enabled by default on all new managed Pulsar clusters (Hosted + BYOC), so teams can get right to work.",[48,41087,41088],{},[384,41089],{"alt":41090,"src":41091},"Connect Kafka clients to your Pulsar Cluster on StreamNative Cloud Console","\u002Fimgs\u002Fblogs\u002F647e1a23a57cc93cf3e8a0bf_Untitled.gif",[48,41093,41094,41095,190],{},"Lean more about how the Kafka protocol on StreamNative Cloud enables teams to ",[55,41096,41097],{"href":33995},"run legacy Kafka applications while taking advantage of the superior infrastructure and messaging technology that Pulsar offers",[40,41099,41101],{"id":41100},"improved-cluster-usage-visibility","Improved Cluster Usage Visibility",[48,41103,41104],{},"Teams can now visually monitor cluster usage on the console so that they can have more visibility into usage for managing costs and to scale up cluster resources as needed.",[48,41106,41107,41108,41113,41114,41119],{},"View real-time and historical usage data on the ",[55,41109,41112],{"href":41110,"rel":41111},"https:\u002F\u002Fdocs.streamnative.io\u002Fdocs\u002Fview-usage-console#view-usage-details",[264],"new Cluster Usage page",". 
Cluster ",[55,41115,41118],{"href":41116,"rel":41117},"https:\u002F\u002Fdocs.streamnative.io\u002Fdocs\u002Fbilling-overview#usage-dimensions",[264],"usage dimensions"," (Compute Units, Storage Units, Throughput, and Storage Size) are displayed in graphical format with an option to export the data to a csv file.",[48,41121,41122],{},"View usage by instance or in aggregate across an organization and update the date range to view a specific time period.",[48,41124,41125],{},[384,41126],{"alt":41127,"src":41128},"Pulsar Cluster Usage page for StreamNative Cloud","https:\u002F\u002Fuploads-ssl.webflow.com\u002F639226d67b0d723af8e7ca56\u002F647e1a30baa6926648310fed_Untitled%20(1).gif",[40,41130,41132],{"id":41131},"pulsar-30-available-to-test-on-streamnative-cloud","Pulsar 3.0 available to test on StreamNative Cloud",[48,41134,41135],{},"Pulsar 3.0, the next evolution of Pulsar, is now available for testing on StreamNative Cloud! Pulsar 3.0 allows teams to run even bigger workloads, and is easier for developers to work with locally. Updates include LTS support, an improved load balancer, several performance improvements, and Docker images for M1\u002FM2 Macs.",[48,41137,41138,41139,190],{},"StreamNative Cloud Customers can try Pulsar 3.0 on a new cluster or reach out to support about upgrading existing clusters. Learn more about Pulsar 3.0 ",[55,41140,267],{"href":38482},[40,41142,41144],{"id":41143},"producer-and-consumer-events-added-to-streamnative-audit-log","Producer and Consumer events added to StreamNative Audit Log",[48,41146,41147,41148,41153],{},"We’ve enhanced the amount of information available in the ",[55,41149,41152],{"href":41150,"rel":41151},"https:\u002F\u002Fdocs.streamnative.io\u002Fdocs\u002Faudit-log",[264],"audit log"," to include Pulsar producer and consumer events.  Audit log options can be configured on the cluster settings page on the StreamNative Console.",[48,41155,41156],{},[384,41157],{"alt":41158,"src":41159},"Pulsar Audit Log on Streamnative Cloud","\u002Fimgs\u002Fblogs\u002F647e1a42c8599263ce334952_Untitled.png",[48,41161,41162],{},"StreamNative Cloud is a fully managed Pulsar service that allows teams to leverage the  power of Apache Pulsar for their streaming and messaging workloads, without having to deal with the complexity of managing it themselves.",[48,41164,3931],{},{"title":18,"searchDepth":19,"depth":19,"links":41166},[41167,41168,41169,41170,41171],{"id":40998,"depth":19,"text":40999},{"id":41055,"depth":19,"text":41056},{"id":41100,"depth":19,"text":41101},{"id":41131,"depth":19,"text":41132},{"id":41143,"depth":19,"text":41144},"2023-06-05","We’ve been working on a better console experience to improve time to value for new users to start using Pulsar - both for initial cluster creation and set up as well as connecting to an existing cluster using Kafka clients and tools. 
AND…Pulsar 3.0 is also available on StreamNative Cloud so that organizations can take advantage of improved performance at scale with the new load balancer and other enhancements.","\u002Fimgs\u002Fblogs\u002F649a92b45164803be341b59b_647e418ce8eef8782846eeb3_Illustration.jpg",{},"\u002Fblog\u002Fproduct-updates-june-2023-improved-onboarding-for-cluster-set-up-and-kafka-clients",{"title":40986,"description":41173},"blog\u002Fproduct-updates-june-2023-improved-onboarding-for-cluster-set-up-and-kafka-clients",[3550,799,821],"pKwXIq6jYJNSSrI-KL5CBylSi_hzLJk92aZTwps2kMg",{"id":41182,"title":33996,"authors":41183,"body":41186,"category":821,"createdAt":290,"date":41465,"description":41466,"extension":8,"featured":294,"image":41467,"isDraft":294,"link":290,"meta":41468,"navigation":7,"order":296,"path":33995,"readingTime":33204,"relatedResources":290,"seo":41469,"stem":41470,"tags":41471,"__hash__":41472},"blogs\u002Fblog\u002Ffutureproof-kafka-applications-and-embrace-pulsar-with-streamnative-cloud.md",[41184,28,41185],"Yvonne Jouffrault","Sherlock Xu",{"type":15,"value":41187,"toc":41450},[41188,41190,41193,41202,41205,41208,41211,41215,41218,41231,41234,41237,41240,41249,41252,41256,41259,41264,41267,41278,41285,41289,41292,41295,41298,41301,41305,41308,41311,41314,41325,41329,41332,41346,41350,41368,41376,41379,41383,41386,41397,41399,41411,41413,41416,41418,41423],[40,41189,46],{"id":42},[48,41191,41192],{},"Since its inception, Apache Kafka has been widely recognized for its robust data streaming capabilities, making it the go-to solution for numerous companies handling real-time data. However, Kafka’s architecture has its own limitations, including issues with scalability, rebalancing, node failure management, cloud-native compatibility, and jitter. In light of these challenges, organizations using Kafka are exploring alternative systems in the streaming space, such as Apache Pulsar.",[48,41194,41195,41196,41201],{},"Pulsar has been making waves in the messaging and streaming domain. Although Pulsar’s creation was inspired by Kafka’s classic architecture, and it shares familiar concepts like topics and brokers, it adopts an entirely different approach to managing computing and storage. Born for the cloud-native era, Pulsar features a decoupled architecture, which allows for independent scaling of its computing and storage layers. This innovative design effectively solves ",[55,41197,41200],{"href":41198,"rel":41199},"https:\u002F\u002Fdzone.com\u002Farticles\u002Fa-deep-dive-into-the-differences-between-kafka-and",[264],"some of the key issues experienced by Kafka users",". Moreover, Pulsar is designed natively with a suite of enterprise-grade features, including geo-replication, multi-tenancy, and tiered storage, positioning Pulsar as an attractive alternative to Kafka users.",[48,41203,41204],{},"Nevertheless, Kafka has been the major solution for a long time for many organizations and their applications are already bound with it. They might be reluctant to make the migration due to different organizational, operational, or technical considerations.",[48,41206,41207],{},"This raises an interesting question: Is there a way for organizations to keep using their Kafka applications without major changes while leveraging Pulsar’s infrastructure and superior messaging and streaming technology?",[48,41209,41210],{},"Pulsar features a protocol handler mechanism that allows teams to leverage the best of both worlds. 
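To illustrate what that protocol handler makes possible, here is a hedged sketch of an ordinary Kafka Java producer pointed at a Pulsar cluster's Kafka-compatible endpoint. The bootstrap address is a placeholder, and secured clusters will additionally need the TLS/SASL settings described in the relevant documentation; nothing about the application code itself is Pulsar-specific.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class KafkaOnPulsarProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Placeholder: the Kafka-compatible endpoint exposed by a Pulsar cluster with the Kafka protocol enabled.
        props.put("bootstrap.servers", "your-cluster.example.com:9093");
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());
        // Secured clusters typically also require security.protocol / sasl.* settings; see your provider's docs.

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // An unmodified Kafka producer call; the records land on a Pulsar topic behind the scenes.
            producer.send(new ProducerRecord<>("orders", "order-123", "{\"amount\": 42}"));
            producer.flush();
        }
    }
}
```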
StreamNative has implemented the Kafka wire protocol by leveraging the existing components (for example, topic discovery, the distributed log library - ManagedLedger, and cursors) that Pulsar already has. StreamNative Cloud, which provides fully managed Pulsar services in the cloud, has a built-in Kafka protocol with enterprise features. It enables teams to take advantage of Pulsar’s distinct features such as multi-tenancy and tiered storage while continuing to use their existing Kafka applications.",[40,41212,41214],{"id":41213},"futureproof-kafka-applications-with-pulsar","Futureproof Kafka applications with Pulsar",[48,41216,41217],{},"The most important benefit of the Kafka protocol on StreamNative Cloud is that it allows organizations to harness the strengths of both systems without disrupting their legacy Kafka applications. With a unified event streaming platform, they can take advantage of the following features that Pulsar has to offer.",[321,41219,41220,41223,41225,41228],{},[324,41221,41222],{},"Unified streaming and queuing",[324,41224,32501],{},[324,41226,41227],{},"Enhanced scalability and elasticity with a rebalance-free architecture",[324,41229,41230],{},"Infinite data retention with Apache BookKeeper and tiered storage",[48,41232,41233],{},"Now, let’s take a closer look at each of them by understanding how Pulsar can help solve some of the key pain points for Kafka.",[32,41235,41222],{"id":41236},"unified-streaming-and-queuing",[48,41238,41239],{},"Pulsar can be used to handle both real-time streaming scenarios like Kafka as well as traditional message queues like RabbitMQ or ActiveMQ. With the Kafka protocol on StreamNative Cloud, organizations maintaining multiple systems for different use cases can manage streaming and messaging semantics in a single platform.",[48,41241,41242,41243,41248],{},"This ability is embodied in Pulsar's ",[55,41244,41247],{"href":41245,"rel":41246},"https:\u002F\u002Fpulsar.apache.org\u002Fdocs\u002F3.0.x\u002Fconcepts-messaging\u002F#subscriptions",[264],"four subscription types"," (Exclusive, Shared, Failover, and Key_Shared) and selective acknowledgment of messages. The former defines how messages are sent to the consumers of a topic. As a single topic can have multiple different subscriptions, that topic can be used to serve both queueing and messaging use cases. The latter means that you can use Pulsar to acknowledge messages individually. This is where Kafka falls short as it only allows you to commit a batch of messages by a given offset (Pulsar supports cumulative acknowledgment as well).",[48,41250,41251],{},"Note that Pulsar’s protocol handler mechanism allows brokers to dynamically load protocol handlers on runtime, including not just the Kafka protocol, but also the MQTT and AMQP protocols. They can be enabled at the same time while working independently of each other.",[32,41253,41255],{"id":41254},"enterprise-grade-multi-tenancy","Enterprise-grade multi-tenancy",[48,41257,41258],{},"In many organizations using Kafka, different teams are self-managing Kafka and have a decentralized structure where each application team manages its own Kafka cluster (and probably its Kubernetes cluster) with the help of the platform or data team. This might cause problems in terms of data governance, access control, data replication, as well as costs. 
For example, an organization must run a Kafka cluster for each use case or team to avoid sharing data, and each cluster needs to be overprovisioned to avoid downtime and ensure that there are enough resources; a single message might generate hundreds of events since it must be replicated for all the different clusters, applications and teams. All of these problems result from the lack of multi-tenancy in Kafka.",[916,41260,41261],{},[48,41262,41263],{},"Note: It is possible to use multi-tenancy in Kafka with a paid solution such as Conduktor Gateway, while it is more expensive and users can have a vendor lock-in issue.",[48,41265,41266],{},"Different from Kafka, Pulsar is designed as a multi-tenant system from the ground up. It features a three-level hierarchy of tenants, namespaces, and topics, offering an effective access control mechanism.",[321,41268,41269,41272,41275],{},[324,41270,41271],{},"Tenants provide a security boundary. Different teams of an organization can have their own tenant.",[324,41273,41274],{},"Namespaces allow teams to keep their data separate from each other and support custom policies, such as data retention and storage quotas.",[324,41276,41277],{},"Topics are named channels under namespaces for transmitting messages from producers to consumers.",[48,41279,41280,41281,190],{},"Pulsar’s multi-tenancy allows for the segregation and independent processing of data streams in large-scale applications. For more information, download our eBook ",[55,41282,41284],{"href":41283},"\u002Fwhitepapers\u002Fmulti-tenancy-and-isolation-with-apache-pulsar","Multi-Tenancy and Isolation: Scaling Real-Time Data Across Teams with Apache Pulsar",[32,41286,41288],{"id":41287},"unlimited-data-storage","Unlimited data storage",[48,41290,41291],{},"One of the key benefits of the Kafka protocol on StreamNative Cloud lies in Pulsar’s ability to achieve unlimited data storage.",[48,41293,41294],{},"In Kafka, the storage capacity is tied to the leader node and local disk partitions, making it difficult to scale (stateful brokers bring great difficulty in rebalancing). As a result, reaching the maximum storage capacity can hinder the acceptance of new messages. If you choose to scale up Kafka brokers or prepare a large storage cluster, you will end up with a costly infrastructure setting.",[48,41296,41297],{},"Our Kafka protocol on StreamNative Cloud allows you to persist data for longer periods by leveraging BookKeeper, which supports data persistence outside of Pulsar brokers. BookKeeper’s storage servers, also known as bookies, can be independently scaled. This means you can expand your cluster without worrying about storage limits to easily accommodate growing data workloads.",[48,41299,41300],{},"Another important feature that Pulsar natively offers to make unlimited data storage possible is tiered storage. You can store cold data in cheaper storage for extended periods based on your business needs. By contrast, Kafka does not provide tiered storage natively. Kafka brokers require high-performance disks for both writing and reading, which can be expensive. Vendors like Confluent do provide tiered storage for Kafka, while it also means you are vendor-locked.",[32,41302,41304],{"id":41303},"enhanced-scalability-and-elasticity","Enhanced scalability and elasticity",[48,41306,41307],{},"The biggest pain point in Kafka might be its scaling difficulty. When a Kafka broker goes down, a newly added broker cannot immediately serve the requests sent to the failed broker. 
You need to manually migrate the old partition and this process can be a nightmare.",[48,41309,41310],{},"By contrast, Pulsar separates computing from storage, allowing for vertical and horizontal scaling of both its processing and storage nodes. With a more flexible architecture than Kafka, Pulsar allows brokers and bookies to be scaled independently, and each layer does not even need to know what happens on either side.",[48,41312,41313],{},"In terms of elasticity, Kafka requires you to carefully plan in advance how many partitions and broker nodes are required in the cluster. With the Kafka protocol on StreamNative Cloud, you can enjoy better elasticity at the following three levels:",[321,41315,41316,41319,41322],{},[324,41317,41318],{},"Consumer: You don’t need to perform topic repartitioning to add a consumer.",[324,41320,41321],{},"Processing: Pulsar brokers are stateless and you can add a broker as needed.",[324,41323,41324],{},"Storage: Bookies can handle requests immediately after added and you can offload data to external storage without adding nodes.",[32,41326,41328],{"id":41327},"cost-effectiveness","Cost-effectiveness",[48,41330,41331],{},"The above-mentioned key benefits speak volumes about the cost-effectiveness of the Kafka protocol on StreamNative Cloud. Specifically, it helps save costs in the following ways:",[321,41333,41334,41337,41340,41343],{},[324,41335,41336],{},"Unified streaming and messaging. As a two-in-one system, Pulsar frees you from maintaining another queueing system, with less infrastructure overhead. StreamNative Cloud also supports other protocols such as MQTT and AMQP, which can be used with the Kafka protocol at the same time.",[324,41338,41339],{},"Multi-tenancy. Multiple users or applications can share the same Pulsar cluster while being isolated from each other. Cluster operators do not need to create separate clusters for each tenant as they can share the same resources, thus reducing infrastructure costs.",[324,41341,41342],{},"Scalability and elasticity. As Pulsar supports independent scaling of both brokers and bookies, you only need to pay for the nodes that are working to accommodate the real-time workloads. Pulsar’s great scalability and elasticity also mean better resource utilization.",[324,41344,41345],{},"Tiered storage. As mentioned above, Pulsar’s architecture allows for more efficient storage of messages than Kafka. With tiered storage, less frequently accessed data can be offloaded to cheaper storage systems, such as AWS S3 and Google Cloud Storage. This greatly reduces storage costs for large data sets.",[32,41347,41349],{"id":41348},"smooth-ecosystem-integration","Smooth ecosystem integration",[48,41351,41352,41353,1186,41358,5422,41362,41367],{},"One of the advantages that Kafka provides over Pulsar is its rich ecosystem of tools. Kafka has established itself as an industry standard, offering a wide range of connectors and client libraries that facilitate seamless integration with popular data streaming, processing, and database frameworks. 
Currently, Pulsar supports connectors for systems like ",[55,41354,41357],{"href":41355,"rel":41356},"https:\u002F\u002Fhub.streamnative.io\u002Fdata-processing\u002Fpulsar-spark\u002F3.1.1\u002F",[264],"Spark",[55,41359,8057],{"href":41360,"rel":41361},"https:\u002F\u002Fhub.streamnative.io\u002Fdata-processing\u002Fpulsar-flink\u002F1.15.1.1\u002F",[264],[55,41363,41366],{"href":41364,"rel":41365},"https:\u002F\u002Fhub.streamnative.io\u002Fconnectors\u002Felasticsearch-sink\u002F2.9.2\u002F",[264],"Elasticsearch",", while there are more available connectors in the Kafka ecosystem.",[48,41369,41370,41371,41375],{},"The Kafka protocol on StreamNative Cloud enables you to use Kafka connectors to implement more Pulsar connectors (for example, ",[55,41372,41374],{"href":41373},"\u002Fblog\u002Fstreaming-war-and-how-apache-pulsar-is-acing-the-battle#connectors-how-to-make-the-best-use-of-them","the Pulsar-Druid connector used by engineers at Nutanix","). This way, organizations can continue to use their existing Kafka connectors, and other ecosystem tools without major modifications. This compatibility allows for a more smooth transition, minimizing the learning curve and empowering organizations to leverage their existing investments in Kafka.",[48,41377,41378],{},"The above-mentioned benefits are only part of the major differentiators. As organizations grow Pulsar adoption across different teams and use cases, they can use the Kafka protocol on StreamNative Cloud as a gateway to drive Pulsar adoption and embrace more native benefits and features that Pulsar has to offer.",[40,41380,41382],{"id":41381},"real-world-use-cases","Real-world use cases",[48,41384,41385],{},"By integrating two popular event-streaming ecosystems, teams can harness the unique benefits of each ecosystem and build a unified event streaming platform with Pulsar to accelerate the development of real-time applications and services.",[321,41387,41388,41391,41394],{},[324,41389,41390],{},"Real-time Fraud Detection: In the financial industry, teams can use data from legacy Kafka applications to detect fraudulent activities in real time. Transaction data from multiple sources, such as credit card transactions and online payments, can be ingested into Pulsar. Stream processing applications can analyze the data in real time, identify suspicious patterns, and trigger alerts or take actions to prevent fraud.",[324,41392,41393],{},"Supply Chain Optimization: By streaming data from different stages of the supply chain, such as inventory systems, logistics providers, and point-of-sale systems, organizations can gain real-time visibility into their supply chain operations. This allows them to proactively identify bottlenecks, optimize inventory levels, and improve overall efficiency.",[324,41395,41396],{},"Gaming Telemetry and Analytics: In the gaming industry, the Kafka protocol on Pulsar can be utilized for collecting and processing telemetry data from games and game servers. This data can include player actions, game events, and performance metrics. Real-time analytics can be performed to monitor player behavior, identify cheating or hacking attempts, and optimize game balancing and monetization strategies.",[40,41398,22668],{"id":2146},[48,41400,41401,41402,41404,41405,41410],{},"The Kafka protocol is now available on ",[55,41403,3550],{"href":37361},", which delivers fully managed Apache Pulsar in the cloud of your choice. 
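As a small illustration of the unified streaming-and-queuing model and per-message acknowledgment discussed earlier in this post, the sketch below uses the Pulsar Java client with a Shared subscription. The service URL and topic are placeholders, and authentication is omitted for brevity.

```java
import org.apache.pulsar.client.api.Consumer;
import org.apache.pulsar.client.api.Message;
import org.apache.pulsar.client.api.PulsarClient;
import org.apache.pulsar.client.api.Schema;
import org.apache.pulsar.client.api.SubscriptionType;

public class QueueStyleConsumer {
    public static void main(String[] args) throws Exception {
        // Placeholder service URL; add authentication for a real cluster.
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650")
                .build();

        // A Shared subscription spreads messages across consumers, queue-style,
        // and each message is acknowledged individually rather than by offset.
        Consumer<String> consumer = client.newConsumer(Schema.STRING)
                .topic("persistent://public/default/orders")
                .subscriptionName("order-workers")
                .subscriptionType(SubscriptionType.Shared)
                .subscribe();

        Message<String> msg = consumer.receive();
        try {
            // ... process msg.getValue() ...
            consumer.acknowledge(msg);         // selective (per-message) acknowledgment
        } catch (Exception e) {
            consumer.negativeAcknowledge(msg); // redeliver only this message
        }

        consumer.close();
        client.close();
    }
}
```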
It offers three deployment options to easily and safely connect to your existing tech stack with StreamNative’s reliable, turnkey service. To get started, follow the ",[55,41406,41409],{"href":41407,"rel":41408},"https:\u002F\u002Fdocs.streamnative.io\u002Fdocs\u002Fquickstart-kafka",[264],"instructions"," to use the Kafka protocol on StreamNative Cloud.",[40,41412,2125],{"id":2122},[48,41414,41415],{},"The Kafka protocol on StreamNative Cloud brings together the best of both Kafka and Pulsar, providing a powerful solution for modern data streaming needs. It unlocks new opportunities for data-driven innovation as organizations can continue to use legacy Kafka applications with the benefits that Pulsar has to offer. Furthermore, they can use it as a gateway for more Pulsar adoption without disrupting their Kafka applications.",[40,41417,40413],{"id":36476},[48,41419,38379,41420,40419],{},[55,41421,38384],{"href":38382,"rel":41422},[264],[321,41424,41425,41430,41435,41443],{},[324,41426,41427,41428,190],{},"Run fully managed Pulsar services and enable the Kafka protocol with ",[55,41429,3550],{"href":37361},[324,41431,38390,41432,190],{},[55,41433,31914],{"href":31912,"rel":41434},[264],[324,41436,41437,758,41439],{},[2628,41438,40436],{},[55,41440,41442],{"href":41198,"rel":41441},[264],"A Deep Dive Into the Differences Between Kafka and Pulsar",[324,41444,41445,758,41447],{},[2628,41446,40436],{},[55,41448,41449],{"href":32263},"Understanding Pulsar in 10 Minutes: A Guide for Kafka Users",{"title":18,"searchDepth":19,"depth":19,"links":41451},[41452,41453,41461,41462,41463,41464],{"id":42,"depth":19,"text":46},{"id":41213,"depth":19,"text":41214,"children":41454},[41455,41456,41457,41458,41459,41460],{"id":41236,"depth":279,"text":41222},{"id":41254,"depth":279,"text":41255},{"id":41287,"depth":279,"text":41288},{"id":41303,"depth":279,"text":41304},{"id":41327,"depth":279,"text":41328},{"id":41348,"depth":279,"text":41349},{"id":41381,"depth":19,"text":41382},{"id":2146,"depth":19,"text":22668},{"id":2122,"depth":19,"text":2125},{"id":36476,"depth":19,"text":40413},"2023-06-02","Leverage the Kafka protocol on StreamNative Cloud to easily run your legacy Kafka applications with it and use it as a gateway to drive Pulsar adoption.","\u002Fimgs\u002Fblogs\u002F64792e4610b5ca18fb84d54f_futureproof-kafka-applications-and-embrace-pulsar-with-streamnative-cloud.png",{},{"title":33996,"description":41466},"blog\u002Ffutureproof-kafka-applications-and-embrace-pulsar-with-streamnative-cloud",[799,3550,821,303],"B57kTqNXRPfyyY4yMjh69lyO12oZfr1jaMeswMa81-A",{"id":41474,"title":41475,"authors":41476,"body":41477,"category":821,"createdAt":290,"date":41683,"description":41684,"extension":8,"featured":294,"image":41685,"isDraft":294,"link":290,"meta":41686,"navigation":7,"order":296,"path":38482,"readingTime":23092,"relatedResources":290,"seo":41687,"stem":41688,"tags":41689,"__hash__":41690},"blogs\u002Fblog\u002Fpulsar-3-0-is-available-for-testing-on-streamnative-cloud.md","How Pulsar 3.0 will help teams run Pulsar faster and more reliably at scale",[807,41184],{"type":15,"value":41478,"toc":41672},[41479,41486,41490,41493,41495,41499,41502,41510,41514,41517,41525,41529,41532,41540,41543,41546,41549,41553,41556,41559,41563,41566,41569,41572,41576,41581,41584,41587,41590,41594,41597,41605,41613,41616,41620,41623,41627,41632,41635,41638,41641,41644,41648,41653,41656,41660,41668],[48,41480,41481,41482,41485],{},"As one of the more prolific open-source contributors to Apache Pulsar, StreamNative was 
excited to lead the recent efforts to release ",[55,41483,38483],{"href":37123,"rel":41484},[264],". It is now available for StreamNative Cloud customers to test on new clusters or upgrade existing ones.",[3933,41487,41489],{"id":41488},"what-weve-learned-from-operating-pulsar-at-scale","What We’ve Learned from Operating Pulsar at Scale",[48,41491,41492],{},"Many of the improvements in Pulsar 3.0 allow teams to run Pulsar faster and more reliably at scale. A large part of StreamNative’s contribution came from the input and feedback we’ve gathered from our customers - many of whom are teams pushing Pulsar to new limits and use cases - as well as our own experience managing more Pulsar clusters than any other organization in the world.",[48,41494,3931],{},[3933,41496,41498],{"id":41497},"a-big-milestone-for-the-community","A Big Milestone for the Community…",[48,41500,41501],{},"Pulsar 3.0 reflects the growth and evolution the Apache Pulsar project has made over the last few years.",[321,41503,41504,41507],{},[324,41505,41506],{},"The community is getting bigger! Over 140 contributors submitted about 1500 commits to the Pulsar 3.0 release, which is the largest contribution yet for a project that is fast becoming one of the biggest open-source projects.",[324,41508,41509],{},"It includes support for LTS, which delivers the predictability and stability that larger enterprise teams need to deliver a stable and reliable messaging and streaming service.",[40,41511,41513],{"id":41512},"whats-new-in-pulsar-30","What's new in Pulsar 3.0",[48,41515,41516],{},"This release improves performance for teams operating Pulsar at scale and makes the platform more stable and predictable for powering messaging and data streaming services for mission-critical use cases.",[48,41518,41519,41520,41524],{},"Here are a few highlights of what's included in 3.0. You can get the full list in the [official announcement. 
](",[55,41521,41522],{"href":41522,"rel":41523},"http:\u002F\u002Fmission-critical",[264]," use cases)",[40,41526,41528],{"id":41527},"introducing-lts-for-pulsar","Introducing LTS for Pulsar:",[48,41530,41531],{},"As the Pulsar community has matured, more companies are adopting Pulsar for mission critical workloads and want to minimize the risk around version upgrades in production.",[48,41533,41534,41535,41539],{},"3.0 is the first version that introduces long term support and the community has committed to ",[55,41536,41538],{"href":35391,"rel":41537},[264],"releasing long term versions with feature releases"," between them, allowing teams who want a more stable release to use versions 3.0.x, while those seeking new features can use versions 3.x.",[48,41541,41542],{},"For risk-adverse teams running Pulsar for mission critical workloads, this will allow them to run the last ‘stable’ major release and upgrade less frequently.",[48,41544,41545],{},"For those of us who are contributing to building Pulsar, it enables the balance between moving fast and introducing exciting new features while balancing the need for stable releases for the many mission critical use cases that it supports.",[48,41547,41548],{},"This brings two main benefits to teams with organization wide implementations of Pulsar:",[32,41550,41552],{"id":41551},"stability-in-production","Stability in production:",[48,41554,41555],{},"Previously, a new Pulsar version was available every 3-4 months and teams were forced to upgrade to the next version to take advantage of needed bug fixes and security patches, while risking the introduction of new features and functionality into their production environments.",[48,41557,41558],{},"Now, with the long-term support version, teams can remain on a stable version of Pulsar and choose to update only bug fixes and maintain security patches while allowing them to try and experiment with the new features and functionality in the latest version in a testing environment, since these versions will be supported for a shorter amount of time.",[32,41560,41562],{"id":41561},"predictable-bug-fixes-and-updates","Predictable bug fixes and updates:",[48,41564,41565],{},"Previously new versions were released on a quarterly-ish schedule but releases were frequently delayed which created uncertainty around when new capabilities and improvements would be available, making it hard for teams to plan development.",[48,41567,41568],{},"Going forward, there will be a code freeze on each release 3 weeks prior, allowing teams to have certainty of what will be included in the next update and when it will be released.",[48,41570,41571],{},"StreamNative will continue to release a weekly update of our Pulsar distribution with bug fixes and patches so that StreamNative Cloud customers can benefit from these updates in advance of the Pulsar releases.",[40,41573,41575],{"id":41574},"new-updated-load-balancer","NEW Updated Load Balancer",[916,41577,41578],{},[48,41579,41580],{},"The new load balancer delivers on the promise of Pulsar’s horizontal scalability and enables teams with ‘spikey’ workloads to rest easy knowing that traffic will be quickly distributed as they scale up their brokers to meet the demands of their business.",[48,41582,41583],{},"One of Pulsar’s key differentiators from Kafka is its ability to scale horizontally without downtime or performance impact as workloads increase. This is achieved by Pulsar load balancer that equalizes traffic across all of the brokers in a cluster.  
It ensures that some brokers do not become overloaded as traffic scales up and that idle brokers take on their fair share of the workload.",[48,41585,41586],{},"However, in situations where traffic was not well distributed - such as higher workloads for certain topics or sudden spikes in traffic - it took anywhere between a few minutes to a few hours for the load balancer to propagate the traffic correctly, leading to some brokers becoming overloaded while others sat idle. This was difficult to avoid or prevent since the cluster appeared to have ample capacity to handle the traffic, but because of the imbalance in traffic distribution, there would be performance degradation in some of the brokers (where the load was higher).",[48,41588,41589],{},"Note that because Pulsar is a stateful system and the data has locality, when traffic is changing rapidly, there can be topics with differing amounts of traffic and a spike for any specific topic.",[3933,41591,41593],{"id":41592},"enter-the-new-and-improved-load-balancer-in-pulsar-30","Enter the New and Improved Load Balancer in Pulsar 3.0!",[48,41595,41596],{},"The new load balancer available in Pulsar 3.0 equalizes the traffic much faster to the ideal state (ie. each broker is serving an equal amount of traffic)",[321,41598,41599,41602],{},[324,41600,41601],{},"when traffic spikes and there are idle brokers, or",[324,41603,41604],{},"when new brokers have been added",[48,41606,41607,41608,190],{},"This enables teams to manage their excess cluster capacity better and delivers a more uniform and predictable pattern for scaling up.  Learn more about it in the ",[55,41609,41612],{"href":41610,"rel":41611},"https:\u002F\u002Fpulsar.apache.org\u002Fdocs\u002Fnext\u002Fconcepts-architecture-overview\u002F#managed-ledgers",[264],"Pulsar docs",[48,41614,41615],{},"How much faster is it? We are currently running performance tests to show how quickly a system can rebalance when adding more brokers and will publish those in a follow-up blog post.",[40,41617,41619],{"id":41618},"performance-improvements","Performance Improvements",[48,41621,41622],{},"Pulsar 3.0 brings a large number of performance related improvements. While some of these improvements were introduced in Pulsar itself, the bulk of the changes come from the new Apache BookKeeper 4.16 release. We have concentrated the efforts in making the handling of a lot of small messages in BookKeeper more efficient. This can happen in situations where the load is spread over a large number of topics, or where for some reason, message batching cannot be applied. At the same time, BookKeeper 4.16 brings a new storage option to use DirectIO for completely bypassing the OS page cache mechanism, relying instead on the in-process caches. We have seen great improvements in both CPU usage, latency and overall maximum throughput for Pulsar. We will provide an in-depth analysis of all these changes and their impact in future blog posts.",[40,41624,41626],{"id":41625},"optimizations-for-scheduled-messages","Optimizations for Scheduled Messages",[916,41628,41629],{},[48,41630,41631],{},"Teams using Pulsar scheduled messages can track hundreds of millions of delayed (ie. scheduled) messages without having to worry about memory overloads or slow restarting\u002Fre-indexing time.  
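For context on the scheduled (delayed) message feature itself, here is a minimal, illustrative sketch using the Pulsar Java client; the service URL and topic are placeholders. Delayed delivery applies to consumers on Shared and Key_Shared subscriptions.

```java
import java.util.concurrent.TimeUnit;
import org.apache.pulsar.client.api.Producer;
import org.apache.pulsar.client.api.PulsarClient;
import org.apache.pulsar.client.api.Schema;

public class DelayedMessageExample {
    public static void main(String[] args) throws Exception {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650") // placeholder
                .build();

        Producer<String> producer = client.newProducer(Schema.STRING)
                .topic("persistent://public/default/reminders")
                .create();

        // Ask the broker to deliver this message to Shared/Key_Shared subscribers one hour from now.
        producer.newMessage()
                .value("send-renewal-reminder")
                .deliverAfter(1, TimeUnit.HOURS)
                .send();

        producer.close();
        client.close();
    }
}
```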
The improved tracking reduces delays as well as the resources needed to store delayed messages.",[48,41633,41634],{},"A key messaging feature in Pulsar is the ability to schedule millions of messages to be delivered or retried at a future time, which creates delayed messages that are stored in memory until the time comes to deliver them.",[48,41636,41637],{},"Note that this is an important differentiator for Pulsar compared to other messaging services, as Kafka does not support delayed messages at all and RabbitMQ does not easily support delayed messages at high volumes.",[48,41639,41640],{},"Pulsar 3.0 includes some significant improvements to how delayed messages are tracked so that they take up significantly less memory, eliminating the possibility of a memory overload, and do not require expensive index rebuilding.",[48,41642,41643],{},"This change makes the indexing of delayed messages more scalable and allows it to be broken down into micro-segments. These segments do not need to be stored in memory at the same time, and they do not need to be re-indexed when the broker restarts.",[40,41645,41647],{"id":41646},"docker-images-for-arm64-deliver-improved-local-performance","Docker Images for Arm64 Deliver Improved Local Performance",[916,41649,41650],{},[48,41651,41652],{},"Mac users rejoice! Pulsar is about to become a lot more stable to run in local Mac environments! This small but significant improvement in 3.0 comes from the change that Pulsar now publishes Docker images for both Intel x86-64 and Arm64 architectures.",[48,41654,41655],{},"Previously, Pulsar only published Intel-based images, which meant that when run on Arm64 architectures (such as Macs) it could run very slowly or even crash. Now, users can run Pulsar standalone or TestContainer tests on a Mac M1\u002FM2 laptop with improved performance and avoid the issues with the Docker container engine when it emulates an x86-64 CPU within an Arm64 host. 
At the same time, this image will make it possible to run Pulsar in a Docker\u002FKubernetes production environment on Arm64 machines.",[3933,41657,41659],{"id":41658},"ready-to-try-pulsar-30","Ready to Try Pulsar 3.0?",[48,41661,41662,41663,41667],{},"StreamNative Cloud Customers can try Pulsar 3.0 on a new cluster or ",[55,41664,41666],{"href":41665},"mailto:support@streamnative.io?subject=I'm%20ready%20to%20try%20Pulsar%203.0%20on%20StreamNative%20Cloud","reach out to support"," about upgrading existing clusters.",[48,41669,41670],{},[34077,41671],{"value":34079},{"title":18,"searchDepth":19,"depth":19,"links":41673},[41674,41675,41679,41680,41681,41682],{"id":41512,"depth":19,"text":41513},{"id":41527,"depth":19,"text":41528,"children":41676},[41677,41678],{"id":41551,"depth":279,"text":41552},{"id":41561,"depth":279,"text":41562},{"id":41574,"depth":19,"text":41575},{"id":41618,"depth":19,"text":41619},{"id":41625,"depth":19,"text":41626},{"id":41646,"depth":19,"text":41647},"2023-05-22","Pulsar 3.0 delivers performance improvements and optimizations for teams operating Pulsar at scale as well as improved stability and predictability.","\u002Fimgs\u002Fblogs\u002F6480cca4043f9231bc9f05b3_Illustration.webp",{},{"title":41475,"description":41684},"blog\u002Fpulsar-3-0-is-available-for-testing-on-streamnative-cloud",[3550,821],"9PvmzALiBMTpGA1_LvSHqUdyCmdGACW4gwHwHIK5Sr4",{"id":41692,"title":41693,"authors":41694,"body":41696,"category":3550,"createdAt":290,"date":41858,"description":41859,"extension":8,"featured":294,"image":41860,"isDraft":294,"link":290,"meta":41861,"navigation":7,"order":296,"path":41862,"readingTime":11508,"relatedResources":290,"seo":41863,"stem":41864,"tags":41865,"__hash__":41866},"blogs\u002Fblog\u002Fnew-to-streamnative-cloud-apr-2023-google-cloud-marketplace-new-client-page-and-more.md","Product updates for StreamNative Cloud [Apr 2023]: Google Cloud Marketplace, and more!",[41695],"Jihyun Tornow",{"type":15,"value":41697,"toc":41850},[41698,41701,41705,41711,41714,41728,41732,41735,41742,41755,41759,41762,41776,41783,41787,41790,41801,41814,41818,41829,41832,41836,41842,41846],[48,41699,41700],{},"StreamNative delivers enterprise-grade tooling that enables teams to create advanced data streaming and event-driven architecture with Apache Pulsar. These new capabilities enable teams to provide an improved user experience with increased flexibility and control over Pulsar clusters.",[40,41702,41704],{"id":41703},"google-cloud-marketplace","Google Cloud Marketplace",[48,41706,41707,41708,190],{},"We are excited to announce that StreamNative Cloud's Pulsar-as-a-Service solution is now available on Google Cloud Marketplace. This new offering provides a turn-key solution that is easy to procure and deploy, with enterprise-grade security and SLA. Built on top of Apache Pulsar, StreamNative Cloud delivers a high-performance, low-latency platform capable of processing, analyzing, and acting on billions of events per second. Read the complete announcement ",[55,41709,267],{"href":41710},"\u002Fblog\u002Fstreamnative-clouds-pulsar-as-a-service-now-available-on-google-cloud-marketplace",[48,41712,41713],{},"Customers benefit from a simplified procurement process, streamlined purchasing and deployment, and unified billing included in the Google Cloud invoice. 
Additionally, the solution is fully integrated with other Google Cloud services, such as BigQuery, Dataflow, and Pub\u002FSub, providing customers with a complete end-to-end data processing solution.",[48,41715,41716,41717,41722,41723,41727],{},"Learn more on our ",[55,41718,41721],{"href":41719,"rel":41720},"https:\u002F\u002Fdocs.streamnative.io\u002Fdocs\u002Fbilling-gcp",[264],"documentation page"," or head over to",[55,41724,41726],{"href":24192,"rel":41725},[264]," Google Cloud Marketplace"," and check out our solution.",[40,41729,41731],{"id":41730},"new-cloud-console-client-page","New Cloud Console Client Page",[48,41733,41734],{},"Hosted and Bring-Your-Own-Cloud (BYOC) customers can now benefit from the new client page for Pulsar clusters in the Cloud. This update offers a range of improvements to the Cloud Console, including additional troubleshooting tools and detailed information on topics, namespaces, and cluster resources.",[48,41736,41737,41738,190],{},"The new client page provides step-by-step guides for configuring popular programming languages and frameworks, such as Spring, Rust, C#, Node.JS, Python, and Rest API. It provides a better user experience to quickly find and configure Pulsar client or Pulsar CLI to Pulsar clusters. Log in to your console and explore the new layout or learn more from our ",[55,41739,41721],{"href":41740,"rel":41741},"https:\u002F\u002Fdocs.streamnative.io\u002Fdocs\u002Fqs-connect#jumpstart-for-beginners",[264],[48,41743,41744,41745,41750,41751,41754],{},"Have questions about the new cloud console client page? Watch ",[55,41746,41749],{"href":41747,"rel":41748},"https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=zwlXFIWdhQo",[264],"Getting Started on StreamNative Cloud video",".\n",[384,41752],{"alt":18,"src":41753},"\u002Fimgs\u002Fblogs\u002F644c36fadd08f8c56b9270e5_ImageClientPage.png","Cloud Console Client Page",[40,41756,41758],{"id":41757},"pulsar-transactions-on-streamnative-cloud-in-beta","​​Pulsar Transactions on StreamNative Cloud in Beta",[48,41760,41761],{},"The beta release of Pulsar Transactions is now available on Hosted Cloud and BYOC. This feature allows event-streaming applications to consume, process, and produce messages in one atomic operation. This feature supports:",[321,41763,41764,41767,41770,41773],{},[324,41765,41766],{},"Atomic writes across multiple topic partitions.",[324,41768,41769],{},"Atomic acknowledgments across multiple topic partitions.",[324,41771,41772],{},"All the operations made within one transaction either all succeed or all fail.",[324,41774,41775],{},"Consumers are ONLY allowed to read committed messages",[48,41777,10256,41778,190],{},[55,41779,41782],{"href":41780,"rel":41781},"https:\u002F\u002Fdocs.streamnative.io\u002Fdocs\u002Ftransactions-overview#enable-transactions-on-your-pulsar-cluster",[264],"Pulsar Transactions",[40,41784,41786],{"id":41785},"resources-operator-for-geo-replication","Resources Operator for Geo-Replication",[48,41788,41789],{},"The Pulsar Resources Operator has been updated to enable geo-replication support for StreamNative Private Cloud customers. 
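As a hedged illustration of the consume-process-produce pattern that the Transactions beta described above enables, the following Java sketch uses the Pulsar transaction API. The service URL and topic names are placeholders, and transactions must be enabled on the cluster for this to run.

```java
import java.util.concurrent.TimeUnit;
import org.apache.pulsar.client.api.Consumer;
import org.apache.pulsar.client.api.Message;
import org.apache.pulsar.client.api.Producer;
import org.apache.pulsar.client.api.PulsarClient;
import org.apache.pulsar.client.api.Schema;
import org.apache.pulsar.client.api.SubscriptionType;
import org.apache.pulsar.client.api.transaction.Transaction;

public class ConsumeProcessProduceTxn {
    public static void main(String[] args) throws Exception {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650") // placeholder; cluster must have transactions enabled
                .enableTransaction(true)
                .build();

        Consumer<String> consumer = client.newConsumer(Schema.STRING)
                .topic("persistent://public/default/input")
                .subscriptionName("txn-demo")
                .subscriptionType(SubscriptionType.Shared)
                .subscribe();

        Producer<String> producer = client.newProducer(Schema.STRING)
                .topic("persistent://public/default/output")
                .sendTimeout(0, TimeUnit.SECONDS) // transactional producers disable the send timeout
                .create();

        Transaction txn = client.newTransaction()
                .withTransactionTimeout(1, TimeUnit.MINUTES)
                .build().get();

        Message<String> msg = consumer.receive();
        // The produce and the acknowledgment either both commit or both abort.
        producer.newMessage(txn).value(msg.getValue().toUpperCase()).send();
        consumer.acknowledgeAsync(msg.getMessageId(), txn).get();
        txn.commit().get();

        producer.close();
        consumer.close();
        client.close();
    }
}
```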
These updates include:",[321,41791,41792,41795,41798],{},[324,41793,41794],{},"new CRD PulsarGeoReplication allowing a one-way geo-replication,",[324,41796,41797],{},"new field BrokerServiceURL to establish connections to remote clusters,",[324,41799,41800],{},"new fields ClusterName and GeoReplicationRefs to facilitate the management of geo-replication between clusters.",[48,41802,41803,41804,41808,41809,41813],{},"The Pulsar Resources Operator is the recommended tool for managing Pulsar resources on a Kubernetes cluster. Although CLI tools like pulsar-admin and pulsarctl can be used, the Pulsar Resources Operator provides a more streamlined and simplified approach to configure and manage multi-cluster environments. Learn more about how Resource Operator works in our",[55,41805,41807],{"href":20667,"rel":41806},[264]," GitHub repo",". Take a look at our ",[55,41810,41812],{"href":41811},"\u002Fblog\u002Fan-operators-guide-configuring-geo-replication-with-the-pulsar-resources-operator","Operator's guide"," for configuring geo-replication using the Pulsar Resources Operator.",[40,41815,41817],{"id":41816},"pulsar-admin-library-for-go","Pulsar Admin Library for Go",[48,41819,41820,41821,41825,41826,190],{},"The Pulsar Admin Go library has been released as an open-source REST API, enabling users to perform administrative tasks on Pulsar clusters. With its user-friendly Go interface, the library enables administrative tasks such as namespace creation, permission granting, and geo-replication changes. Try out the library and visit our ",[55,41822,7120],{"href":41823,"rel":41824},"https:\u002F\u002Fpkg.go.dev\u002Fgithub.com\u002Fstreamnative\u002Fpulsar-admin-go@v0.1.0\u002Fpkg\u002Fadmin\u002Fauth",[264]," page and read the ",[55,41827,39553],{"href":41828},"\u002Fblog\u002Fintroducing-pulsar-admin-go-library-for-go-developers",[48,41830,41831],{},"Our team is committed to providing you with the best possible experience on StreamNative Cloud, and we can’t wait to see the innovative solutions you’ll create with these new capabilities!",[40,41833,41835],{"id":41834},"connect-with-us","Connect with us!",[48,41837,41838,41839],{},"Learn more about StreamNative Cloud features and enhancements at ",[55,41840,41841],{"href":37361},"streamnative.io.",[48,41843,41844],{},[34077,41845],{"value":34079},[48,41847,3931,41848],{},[55,41849],{"href":37361},{"title":18,"searchDepth":19,"depth":19,"links":41851},[41852,41853,41854,41855,41856,41857],{"id":41703,"depth":19,"text":41704},{"id":41730,"depth":19,"text":41731},{"id":41757,"depth":19,"text":41758},{"id":41785,"depth":19,"text":41786},{"id":41816,"depth":19,"text":41817},{"id":41834,"depth":19,"text":41835},"2023-04-28","New features and benefits of StreamNative Cloud, Google Cloud Market 
Place","\u002Fimgs\u002Fblogs\u002F644c3bdb7cee1e7875ca33aa_StreamNative-Cloud.jpg",{},"\u002Fblog\u002Fnew-to-streamnative-cloud-apr-2023-google-cloud-marketplace-new-client-page-and-more",{"title":41693,"description":41859},"blog\u002Fnew-to-streamnative-cloud-apr-2023-google-cloud-marketplace-new-client-page-and-more",[3550,821,8058],"I_JD1uewW9JPL6UmRpfQsC-Mn4KaHS2Y7AnymHIG87U",{"id":41868,"title":27743,"authors":41869,"body":41870,"category":821,"createdAt":290,"date":42191,"description":42192,"extension":8,"featured":294,"image":42193,"isDraft":294,"link":290,"meta":42194,"navigation":7,"order":296,"path":21529,"readingTime":33204,"relatedResources":290,"seo":42195,"stem":42196,"tags":42197,"__hash__":42198},"blogs\u002Fblog\u002Fintroducing-oxia-scalable-metadata-and-coordination.md",[807],{"type":15,"value":41871,"toc":42179},[41872,41875,41887,41891,41894,41902,41905,41908,41931,41935,41938,41947,41950,41961,41964,41975,41978,41986,41990,41999,42002,42005,42009,42012,42015,42018,42021,42024,42032,42039,42042,42045,42048,42055,42059,42062,42065,42069,42077,42080,42084,42098,42101,42109,42113,42120,42123,42126,42130,42160,42163,42166,42175],[48,41873,41874],{},"We are excited to announce that StreamNative has open-sourced Oxia: a scalable metadata store and coordination system that can be used as the core infrastructure to build large-scale distributed systems.",[48,41876,41877,41878,41881,41882,190],{},"Oxia is available on ",[55,41879,39680],{"href":22142,"rel":41880},[264]," and released under ",[55,41883,41886],{"href":41884,"rel":41885},"https:\u002F\u002Fwww.apache.org\u002Flicenses\u002FLICENSE-2.0",[264],"Apache License, Version 2.0",[40,41888,41890],{"id":41889},"what-is-oxia","What is Oxia",[48,41892,41893],{},"To provide better clarity, here is some helpful context:",[321,41895,41896,41899],{},[324,41897,41898],{},"Coordination: Building a distributed system often involves having multiple nodes\u002Fmachines\u002Fprocesses to discover each other or to understand who’s serving a particular resource. In this context, “coordination” refers to service discovery, leader election, and operations on distributed locks.‍",[324,41900,41901],{},"Metadata: When building a stateful system whose purpose is to store data, it is often helpful to keep “metadata.” An example of metadata is a pointer to the actual data, such as pointing to the correct server and the filename\u002Foffset where the data is located.",[48,41903,41904],{},"Oxia takes a fresh approach to address the problem space typically addressed by systems like Apache ZooKeeper, Etcd, and others.",[48,41906,41907],{},"The principal design traits for Oxia are:",[321,41909,41910,41913,41916,41919,41922,41925,41928],{},[324,41911,41912],{},"Optimized for Kubernetes environment: Simplified architecture using Kubernetes primitives.",[324,41914,41915],{},"Linearizable per-key operations: The state is replicated and sharded across multiple nodes. 
Atomic operations are allowed over individual keys.",[324,41917,41918],{},"Transparent horizontal scalability: Trivial operations to add and remove capacity in the cluster.",[324,41920,41921],{},"Optimized data plane: Supports millions of read\u002Fwrite operations per second.",[324,41923,41924],{},"Large data storage capacity: Able to store hundreds of GBs (several orders of magnitudes more than current systems).",[324,41926,41927],{},"Ephemeral records: Records whose lifecycle is tied to a particular client instance, and they are automatically deleted when the client instance is closed.",[324,41929,41930],{},"Namespaces support: Improved control and visibility by isolating different use cases.",[40,41932,41934],{"id":41933},"motivations","Motivations",[48,41936,41937],{},"Apache Pulsar has traditionally relied on Apache ZooKeeper as the foundation for all coordination and metadata.",[48,41939,41940,41941,41946],{},"Over the past year, through the efforts of ",[55,41942,41945],{"href":41943,"rel":41944},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fwiki\u002FPIP-45:-Pluggable-metadata-interface",[264],"PIP-45",", the coordination and metadata system has been placed behind a pluggable interface, enabling Pulsar to support additional backends, such as Etcd.",[48,41948,41949],{},"However, there remained a need to design a suitable system that could effectively address the limitations of existing solutions like ZooKeeper and Etcd:",[321,41951,41952,41955,41958],{},[324,41953,41954],{},"Fundamental Limitation: These systems are not horizontally scalable. An operator cannot add more nodes and expand the cluster capacity since each node must store the entire data set for the cluster.",[324,41956,41957],{},"Ineffective Vertical Scaling: Since the max data set and the throughput are capped, the next best alternative is to scale vertically (e.g., increasing CPU and IO resources to the same nodes). However, scaling vertically is a stop-gap solution that does not ultimately resolve the problem.",[324,41959,41960],{},"Inefficient Storage: Storing more than 1 GB of data in these systems is highly inefficient because of their periodic snapshots. This snapshot process repeatedly writes the same data, stealing all the IO resources and slowing down the write operations.",[48,41962,41963],{},"Today, Pulsar can support clusters with up to 1 million topics, which is already impressive, especially when compared to what similar systems can support. However, there are a few considerations to make:",[1666,41965,41966,41969,41972],{},[324,41967,41968],{},"This represents the upper limit, and there is no practical way to exceed that.",[324,41970,41971],{},"Reaching this amount of metadata in ZooKeeper\u002FEtcd requires careful hardware sizing, tuning, and constant monitoring.",[324,41973,41974],{},"Even before reaching the limits, ZooKeeper performance degrades as the metadata size grows, resulting in longer topic failover times and higher long-tail latencies for some Pulsar operations.",[48,41976,41977],{},"Ultimately, the goal is for Pulsar to reach a point where a cluster with hundreds of millions of topics is something ordinary that everyone can deploy without a lot of hardware or advanced skills. This will eventually change how developers approach messaging and simplify the architecture of their applications.",[48,41979,41980,41981,41985],{},"Oxia is a step towards this goal, though not the only one. 
Multiple changes are already happening within Pulsar, such as a new ",[55,41982,41984],{"href":36548,"rel":41983},[264],"load manager implementation",", an overhauled metric collection component, and more updates to come.",[40,41987,41989],{"id":41988},"comparison-with-other-approaches","Comparison with other approaches",[48,41991,41992,41993,41998],{},"Other systems, such as Apache Kafka, have followed a different approach in addressing the limitations of ZooKeeper: KRaft, introduced in ",[55,41994,41997],{"href":41995,"rel":41996},"https:\u002F\u002Fcwiki.apache.org\u002Fconfluence\u002Fdisplay\u002FKAFKA\u002FKIP-500%3A+Replace+ZooKeeper+with+a+Self-Managed+Metadata+Quorum",[264],"KIP-500",", provides an option to remove the dependency on ZooKeeper.",[48,42000,42001],{},"We feel that this approach replicates the same ZooKeeper\u002FEtcd architecture without significant improvements and does not remove any complexity from the system. Instead, the existing complexity of ZooKeeper has been transferred to the Kafka brokers, replacing the existing battle-tested code with new, unproven code that would just do the same job.",[48,42003,42004],{},"Designing a new system or component provides a good opportunity to examine the problem and past approaches and focus on designing a solution for the current operating environment.",[40,42006,42008],{"id":42007},"building-oxia","Building Oxia",[48,42010,42011],{},"When designing Oxia, the architecture was adapted to take advantage of the primitives available in a Kubernetes environment rather than designing solely for a bare-metal environment.",[48,42013,42014],{},"One clear aspect since the beginning was that we didn’t want to reimplement a Paxos\u002FRaft consensus protocol for data replication.",[48,42016,42017],{},"Instead, we bootstrap the cluster by using Kubernetes ConfigMaps as a source of cluster status checkpoint. This checkpoint is used to have a single consistent view of the Oxia cluster, its shards, and assignments.",[48,42019,42020],{},"This status is minimal in size and infrequently updated, and it enormously simplifies the task of consistent data replication.",[48,42022,42023],{},"Instead of implementing a full-blown Paxos\u002FRaft consensus algorithm, we can decouple the problem into two parts:",[1666,42025,42026,42029],{},[324,42027,42028],{},"Log-replication, without fault-recovery",[324,42030,42031],{},"The fault-recovery process",[48,42033,42034,42035,42038],{},"This is a similar approach to what is employed by ",[55,42036,862],{"href":23555,"rel":42037},[264]," for its data replication mechanism.",[48,42040,42041],{},"Log replication becomes more straightforward and approachable if we strip out the fault-recovery aspect, making it easier to implement and easier to optimize for speed.",[48,42043,42044],{},"On the other hand, fault recovery is generally more complex to understand and implement. However, it only needs to be optimized for “readability” rather than speed. 
Furthermore, using the cluster status checkpoint makes fault recovery easier because we can assume one single process to perform the recovery and have a monotonically increasing sequencer.",[48,42046,42047],{},"In Oxia, the leader election and fault-recovery tasks are assigned to the “Coordinator” process, while multiple storage pods serve client requests and perform log replication.",[48,42049,42050,42051,42054],{},"Figure 1 shows the architectural diagram of an Oxia cluster running in a Kubernetes environment.\n",[384,42052],{"alt":18,"src":42053},"\u002Fimgs\u002Fblogs\u002F6449cf2b67cd484ac4e5c05c_image2.webp","Figure 1. Oxia architecture",[40,42056,42058],{"id":42057},"verifying-correctness","Verifying correctness",[48,42060,42061],{},"Given the goal of Oxia being a critical component of Apache Pulsar and, in general, sitting at the core of distributed systems infrastructure, testing its correctness under all conditions is of paramount importance.",[48,42063,42064],{},"We have employed three approaches to validate the correctness of Oxia:",[32,42066,42068],{"id":42067},"tla-model","TLA+ model",[48,42070,42071,42076],{},[55,42072,42075],{"href":42073,"rel":42074},"https:\u002F\u002Flamport.azurewebsites.net\u002Ftla\u002Ftla.html",[264],"TLA+"," is a high-level language for modeling distributed and concurrent systems.",[48,42078,42079],{},"We have started by defining a TLA+ model of the Oxia replication protocol. Using the TLA+ tools, we ran the Oxia model and explored all the possible states and transitions, validating that the guarantees are not violated (e.g., all the updates are replicated across all the nodes in the correct order, with no missing or duplicated entries).",[32,42081,42083],{"id":42082},"maelstrom-jepsen-test","Maelstrom \u002F Jepsen test",[48,42085,42086,42091,42092,42097],{},[55,42087,42090],{"href":42088,"rel":42089},"https:\u002F\u002Fgithub.com\u002Fjepsen-io\u002Fmaelstrom",[264],"Maelstrom"," is a tool that makes it easy to run a ",[55,42093,42096],{"href":42094,"rel":42095},"https:\u002F\u002Fjepsen.io\u002F",[264],"Jepsen"," simulation to verify the correctness of a system.",[48,42099,42100],{},"Unlike TLA+, Maelstrom works by running the actual production code, injecting different kinds of failures, and verifying that the external properties are not violated using the Jepsen library.",[48,42102,42103,42104,190],{},"For Oxia, we run a multi-node Oxia cluster as a set of multiple processes running in a single physical machine. Instead of TCP networking through gRPC, we run Oxia nodes that use stdin\u002Fstdout to communicate using the JSON-based ",[55,42105,42108],{"href":42106,"rel":42107},"https:\u002F\u002Fgithub.com\u002Fjepsen-io\u002Fmaelstrom\u002Fblob\u002Fmain\u002Fdoc\u002Fprotocol.md",[264],"Maelstrom protocol",[32,42110,42112],{"id":42111},"chaos-mesh","Chaos Mesh",[48,42114,42115,42119],{},[55,42116,42112],{"href":42117,"rel":42118},"https:\u002F\u002Fchaos-mesh.org\u002F",[264]," is a tool that helps to define a testing plan and generate different classes of failure in a system.",[48,42121,42122],{},"In Oxia, we use ChaosMesh to validate how the system responds to the injected failures, whether the semantic guarantees are respected, and whether the degraded performance is appropriate with respect to the injected failures.",[48,42124,42125],{},"We continuously test Oxia’s correctness as a critical component of Apache Pulsar and distributed systems infrastructure. 
The testing with Chaos Mesh and Maelstrom is ongoing and aims to ensure that the system's correctness is not violated, that it functions as expected, and that its performance meets expectations.",[40,42127,42129],{"id":42128},"conclusion-and-future-work","Conclusion and future work",[48,42131,42132,42133,1186,42137,1186,42142,1186,42147,1186,42151,5422,42156,190],{},"A huge thanks to the team that created Oxia, including ",[55,42134,42136],{"href":42135},"mailto:davidamaughan@gmail.com","Dave Maughan",[55,42138,42141],{"href":42139,"rel":42140},"https:\u002F\u002Fgithub.com\u002Fandrasbeni",[264],"Andras Beni",[55,42143,42146],{"href":42144,"rel":42145},"https:\u002F\u002Fgithub.com\u002Fteabot",[264],"Elliot West",[55,42148,42150],{"href":42149},"mailto:qiang.zhao@streamnative.io","Qiang Zhao",[55,42152,42155],{"href":42153,"rel":42154},"https:\u002F\u002Fgithub.com\u002Fnodece",[264],"Zixuan Liu",[55,42157,42159],{"href":42158},"mailto:cong.zhao@streamnative.io","Cong Zhao",[48,42161,42162],{},"Replacing ZooKeeper usage in Apache Pulsar is just the tip of the iceberg for the versatile and powerful Oxia. Our team is confident that Oxia will prove to be a valuable solution in a wide range of applications, not just limited to Pulsar but also for other distributed systems experiencing similar problems and constraints with existing solutions.",[48,42164,42165],{},"There are numerous possibilities for further enhancing Oxia, such as augmenting the Oxia operator with more intelligence or introducing automatic shard splitting and merging to adapt to changing load conditions.",[48,42167,42168,42169,42174],{},"We invite everyone to try Oxia and ",[55,42170,42173],{"href":42171,"rel":42172},"https:\u002F\u002Fgithub.com\u002Fstreamnative\u002Foxia\u002Fdiscussions",[264],"reach out"," with any questions, feedback, or ideas for improvement. 
As an open-source project, we rely on community contributions to continue advancing the technology, and your involvement will help make Oxia more widely beneficial.",[48,42176,42177],{},[34077,42178],{"value":34079},{"title":18,"searchDepth":19,"depth":19,"links":42180},[42181,42182,42183,42184,42185,42190],{"id":41889,"depth":19,"text":41890},{"id":41933,"depth":19,"text":41934},{"id":41988,"depth":19,"text":41989},{"id":42007,"depth":19,"text":42008},{"id":42057,"depth":19,"text":42058,"children":42186},[42187,42188,42189],{"id":42067,"depth":279,"text":42068},{"id":42082,"depth":279,"text":42083},{"id":42111,"depth":279,"text":42112},{"id":42128,"depth":19,"text":42129},"2023-04-27","StreamNative open-sourced Oxia, a scalable metadata store and coordination system that can be used as the core infrastructure to build large-scale distributed systems.","\u002Fimgs\u002Fblogs\u002F644aab256956c6403d7e919a_oxia.jpg",{},{"title":27743,"description":42192},"blog\u002Fintroducing-oxia-scalable-metadata-and-coordination",[302],"U4rcpLHNvIiry0I6HaqAl2aDJ7GGKcS6Wee1riWYAIU",{"id":42200,"title":42201,"authors":42202,"body":42203,"category":821,"createdAt":290,"date":42788,"description":42789,"extension":8,"featured":294,"image":42790,"isDraft":294,"link":290,"meta":42791,"navigation":7,"order":296,"path":42792,"readingTime":42793,"relatedResources":290,"seo":42794,"stem":42795,"tags":42796,"__hash__":42797},"blogs\u002Fblog\u002Fusing-pulsar-functions-in-a-cloud-native-way-with-function-mesh.md","Using Pulsar Functions in a Cloud-native Way with Function Mesh",[6500],{"type":15,"value":42204,"toc":42767},[42205,42208,42225,42229,42232,42258,42261,42266,42269,42273,42276,42287,42290,42301,42304,42309,42312,42316,42325,42328,42331,42336,42341,42344,42347,42352,42356,42359,42362,42365,42368,42379,42382,42385,42389,42392,42401,42404,42415,42418,42421,42424,42428,42437,42443,42452,42458,42461,42466,42470,42473,42476,42481,42485,42488,42497,42500,42514,42518,42525,42527,42532,42535,42538,42543,42547,42578,42586,42591,42595,42598,42602,42610,42613,42618,42620,42629,42635,42644,42647,42661,42664,42668,42683,42687,42690,42693,42707,42709,42715,42758,42762],[48,42206,42207],{},"Apache Pulsar, a distributed streaming and messaging platform, is inherently designed to excel in cloud-native environments. It offers Pulsar Functions, a serverless computing framework that enables users to create functions that utilize one topic as an input and another topic as an output. However, leveraging Pulsar Functions in a cloud-native setting may present challenges for users. In this blog post, I will discuss the following topics:",[321,42209,42210,42213,42216,42219,42222],{},[324,42211,42212],{},"Why individuals and organizations use Pulsar;",[324,42214,42215],{},"The challenges of running Pulsar Functions;",[324,42217,42218],{},"What is Function Mesh and how does it deal with the challenges;",[324,42220,42221],{},"New capabilities and extensions brought to Pulsar Functions on Kubernetes with Function Mesh;",[324,42223,42224],{},"Future plans for Function Mesh.",[40,42226,42228],{"id":42227},"pulsar-overview","Pulsar overview",[48,42230,42231],{},"In recent years, an increasing number of individuals and organizations have chosen to use Pulsar for various reasons, such as:",[321,42233,42234,42245,42248,42251],{},[324,42235,42236,42237,42240,42241,42244],{},"High throughput and low latency: According to the ",[55,42238,42239],{"href":27690},"Apache Pulsar vs. 
Apache Kafka 2022 Benchmark"," report and the ",[55,42242,42243],{"href":33988},"2023 Messaging Benchmark Report: Apache Pulsar vs. RabbitMQ vs. NATS JetStream",", Pulsar can achieve high throughput and low latency even with tens of thousands of topics or partitions in a cluster, while ensuring message persistence. It outperformed other messaging systems in the reports.",[324,42246,42247],{},"Excellent scalability: Pulsar’s exceptional scalability is another attractive feature. When users scale a cluster by adding new nodes, both Pulsar brokers and BookKeeper can immediately allocate new workloads to them without waiting for existing data to be redistributed. This operator-friendly feature significantly reduces the complexity and risks of scaling.",[324,42249,42250],{},"High availability for large-scale distributed data storage: Pulsar natively supports features like multi-tenancy, asynchronous geo-replication, and tiered storage. It is suitable for long-term persistent storage of large-scale streaming messages.",[324,42252,42253,42254,42257],{},"A thriving ecosystem: The ",[55,42255,38697],{"href":35258,"rel":42256},[264]," lists a variety of tools integrated into Pulsar’s ecosystem, such as IO connectors, protocol handlers, and offloaders. They allow for easy integration of Pulsar with other systems for data migration and processing.",[48,42259,42260],{},"Although Pulsar offers these open-source features natively, deploying a production-grade Pulsar cluster in a private environment and fully utilizing its capabilities is still a challenging task. As shown in Figure 1, a minimal Pulsar cluster includes a ZooKeeper cluster for metadata storage, a BookKeeper cluster as a distributed storage system, and a broker cluster for messaging and streaming capabilities. If you want to expose Pulsar externally, you need an additional proxy layer to route traffic.",[48,42262,42263],{},[384,42264],{"alt":18,"src":42265},"\u002Fimgs\u002Fblogs\u002F6447bfd55239d1749d93cfc2_image10.webp",[48,42267,42268],{},"For easier deployment and operation, more users may opt to use Pulsar in cloud-native environments such as Kubernetes. In this connection, the ability to efficiently utilize Pulsar's native features on Kubernetes is a crucial factor when making containerization decisions, with Pulsar Functions being a prominent example.",[40,42270,42272],{"id":42271},"understanding-pulsar-functions","Understanding Pulsar Functions",[48,42274,42275],{},"Before I talk about using Pulsar Functions on Kubernetes, let me briefly explain its concept. We know that big data computing typically falls into three categories:",[1666,42277,42278,42281,42284],{},[324,42279,42280],{},"Interactive queries: Common computing scenarios are based on Presto.",[324,42282,42283],{},"Batch\u002Fstream processing: Frequently used systems include Apache Flink and Apache Spark.",[324,42285,42286],{},"IO connectors: Pulsar provides sink and source connectors, allowing different engines to understand Pulsar schemas and treat Pulsar topics as tables to read data.",[48,42288,42289],{},"Different from the above-mentioned tools, which are used for complex computing scenarios, Pulsar Functions are lightweight computing processes that",[321,42291,42292,42295,42298],{},[324,42293,42294],{},"Consume messages from Pulsar topics;",[324,42296,42297],{},"Apply a user-supplied processing approach to each message;",[324,42299,42300],{},"Publish results to another Pulsar topic.",[48,42302,42303],{},"Figure 2 illustrates this process. 
Internally, Pulsar Functions offer simplified message processing with a function abstraction, which allows users to use basic features like creation, management, and replica scheduling.",[48,42305,42306],{},[384,42307],{"alt":18,"src":42308},"\u002Fimgs\u002Fblogs\u002F6447bff992f03d019741fa23_image6.webp",[48,42310,42311],{},"Pulsar Functions are not designed to provide a complex computing engine but to integrate serverless technologies with Pulsar. Common use cases such as ETL and real-time aggregation account for approximately 60%-70% of overall scenarios and about 80%-90% of IoT scenarios. With Pulsar Functions, users can perform basic data processing at Pulsar’s messaging end without building complex clusters, saving on data transmission and computing resources.",[32,42313,42315],{"id":42314},"function-workers","Function workers",[48,42317,42318,42319,42324],{},"We know that Pulsar brokers provide messaging and streaming services, but how do they schedule and manage functions and offer the corresponding APIs? Pulsar relies on function workers to monitor, orchestrate, and execute individual functions in the ",[55,42320,42323],{"href":42321,"rel":42322},"https:\u002F\u002Fpulsar.apache.org\u002Fdocs\u002F2.11.x\u002Ffunctions-deploy-cluster\u002F",[264],"cluster-mode"," deployment. Function workers provide a complete set of RESTful APIs for full lifecycle management of functions, which are integrated into tools like pulsar-admin.",[48,42326,42327],{},"When using Pulsar Functions, function workers run together with brokers, which is the default behavior set in the Helm chart provided by the Pulsar community. This is easy for deployment and management, suitable for scenarios with limited resources and non-intensive function usage.",[48,42329,42330],{},"If you require higher isolation and want to prevent function workers from impacting your cluster (intensive function usage), you can choose to run function workers as a separate cluster for functions. That said, this approach still needs more best practices for cloud-native deployment and management, so you may need to invest more time and effort in configuration and maintenance.",[48,42332,42333],{},[384,42334],{"alt":18,"src":42335},"\u002Fimgs\u002Fblogs\u002F6447c0266ca2bec699077837_image1.webp",[916,42337,42338],{},[48,42339,42340],{},"Note: The StreamNative team has validated this mode, and we will share our experience later in the article.",[48,42342,42343],{},"Function workers support running functions in different ways. Generally, they can run functions on their own or together with brokers; in other words, you can invoke function threads in function workers or in processes forked by function workers. As function workers include a Kubernetes runtime implementation, you can package functions as StatefulSets and deploy them on Kubernetes.",[48,42345,42346],{},"In Figure 4, functions are not running within broker or function worker Pods. 
They are deployed in a separate StatefulSet to avoid the security risks of running together with brokers or function workers.",[48,42348,42349],{},[384,42350],{"alt":18,"src":42351},"\u002Fimgs\u002Fblogs\u002F6447c049641c0c81b4d12aa9_image8.webp",[40,42353,42355],{"id":42354},"challenges-of-running-pulsar-functions-on-kubernetes","Challenges of running Pulsar Functions on Kubernetes",[48,42357,42358],{},"As the number of Pulsar Functions users increases, we have started to see the limitations of running Pulsar Functions on Kubernetes.",[48,42360,42361],{},"One major issue is the potential crash loop when launching functions on Kubernetes. As each broker has a function worker, all management and maintenance interfaces are aggregated for the corresponding function. When you submit a function to a function worker, its metadata information and related resources are stored in a topic. During scheduling, Kubernetes must access the topic to retrieve the function’s metadata (for example, replica count) before deploying it as a StatefulSet. If the broker is not started or is unavailable, a crash loop may occur. The function will not begin to run until the broker is back online.",[48,42363,42364],{},"Another challenge in the process is metadata management. This process contains metadata in two separate places: the function’s metadata stored in a Pulsar topic and the StatefulSet submitted to Kubernetes. This complicates metadata management. For example, when you use kubectl to manage a function StatefulSet, there is no mechanism to synchronize the data stored in the Pulsar topic, leaving the change unknown to the function worker.",[48,42366,42367],{},"In addition to the two major issues, Pulsar Functions have the following problems when running on Kubernetes:",[321,42369,42370,42373,42376],{},[324,42371,42372],{},"Non-cloud-native: Kubernetes provides powerful capabilities like dynamic scaling and management. However, it is very difficult to leverage these cloud-native features for Pulsar Functions.",[324,42374,42375],{},"Token expiration: Due to the limitations of the current Kubernetes runtime implementation, tokens are the only available method for authentication and authorization with Pulsar brokers when submitting functions. As a result, function instances may fail to start once the token expires. To address this issue, the Pulsar community added the --update-auth-data option for pulsar-admin to help update tokens. However, it requires you to manually run the command to maintain token validity.",[324,42377,42378],{},"Complex task handling: In many scenarios, you may need to use multiple functions for a single task, or even combine functions with source and sink tools as a whole. Additionally, you need to use multiple commands to operate each function with different topics. 
All of these contribute to higher management and operation pressure.",[48,42380,42381],{},"In light of these challenges, the community was looking for a more efficient and compatible way to bring Pulsar Functions to cloud-native environments, enabling users to better leverage Kubernetes capabilities to manage and use Pulsar Functions for complex use cases.",[48,42383,42384],{},"This is where Function Mesh comes into play.",[40,42386,42388],{"id":42387},"function-mesh-rising-to-the-challenges","Function Mesh: Rising to the challenges",[48,42390,42391],{},"The primary goal of Function Mesh is not to support more complex, universally applicable computing frameworks, but to help users manage and use Pulsar Functions in a cloud-native way.",[48,42393,42394,42395,42400],{},"In 2020, the StreamNative team ",[55,42396,42399],{"href":42397,"rel":42398},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fwiki\u002FPIP-66%3A-Pulsar-Function-Mesh",[264],"submitted a PIP"," in the Pulsar community, as we looked to provide a unified component allowing users to easily describe the relations between functions (like which function serves as the input\u002Foutput of another function). By combining this mindset with features such as scheduling and scaling in Kubernetes, we might be able to provide a better user experience for Pulsar Functions. As such, StreamNative proposed the open-source Function Mesh Operator built with the Kubernetes operator framework.",[48,42402,42403],{},"Function Mesh is an open-source Kubernetes operator for:",[321,42405,42406,42409,42412],{},[324,42407,42408],{},"Running Pulsar Functions natively on Kubernetes;",[324,42410,42411],{},"Utilizing Kubernetes native resources and scheduling capabilities;",[324,42413,42414],{},"Integrating separate functions together to process data.",[48,42416,42417],{},"Let’s look at some core concepts of Function Mesh.",[32,42419,35497],{"id":42420},"kubernetes-operator",[48,42422,42423],{},"Generally, deploying a Kubernetes operator involves creating the associated custom resource definition (CRD) and the custom controller. I will explain these two concepts in more detail in the context of Function Mesh.",[3933,42425,42427],{"id":42426},"custom-resource-definitions","Custom resource definitions",[48,42429,42430,42431,42436],{},"With CRDs, the Kubernetes operator can solve two major problems when using Pulsar Functions: describing and submitting functions, and scheduling functions. All function, sink, and source configurations can be described using CRDs, such as parallelism, input and output topics, autoscaling, and resource quotas. 
The following code snippet displays some ",[55,42432,42435],{"href":42433,"rel":42434},"https:\u002F\u002Fgithub.com\u002Fstreamnative\u002Ffunction-mesh\u002Fblob\u002Fmaster\u002Fconfig\u002Fcrd\u002Fbases\u002Fcompute.functionmesh.io_functions.yaml",[264],"function CRD"," specifications.",[8325,42438,42441],{"className":42439,"code":42440,"language":8330},[8328],"type FunctionSpec struct {\n  \u002F\u002F INSERT ADDITIONAL SPEC FIELDS - desired state of cluster\n  \u002F\u002F Important: Run \"make\" to regenerate code after modifying this file\n\n  Name         string                      `json:\"name,omitempty\"`\n  ClassName    string                      `json:\"className,omitempty\"`\n  Tenant       string                      `json:\"tenant,omitempty\"`\n  ClusterName  string                      `json:\"clusterName,omitempty\"`\n  Replicas     *int32                      `json:\"replicas,omitempty\"`\n  MaxReplicas  *int32                      `json:\"maxReplicas,omitempty\"`\n  Input        InputConf                   `json:\"input,omitempty\"`\n  Output       OutputConf                  `json:\"output,omitempty\"`\n  LogTopic     string                      `json:\"logTopic,omitempty\"`\n  FuncConfig   map[string]string           `json:\"funcConfig,omitempty\"`\n  Resources    corev1.ResourceRequirements `json:\"resources,omitempty\"`\n  SecretsMap   map[string]SecretRef        `json:\"secretsMap,omitempty\"`\n  VolumeMounts []corev1.VolumeMount        `json:\"volumeMounts,omitempty\"`\n}\n",[4926,42442,42440],{"__ignoreMap":18},[48,42444,42445,42446,42451],{},"Additionally, we provide the ",[55,42447,42450],{"href":42448,"rel":42449},"https:\u002F\u002Fgithub.com\u002Fstreamnative\u002Ffunction-mesh\u002Fblob\u002Fmaster\u002Fconfig\u002Fcrd\u002Fbases\u002Fcompute.functionmesh.io_functionmeshes.yaml",[264],"FunctionMesh CRD"," that allows users to configure functions and sources\u002Fsinks in complex computing scenarios. See the Function Mesh specifications below.",[8325,42453,42456],{"className":42454,"code":42455,"language":8330},[8328],"type FunctionMeshSpec struct {\n  \u002F\u002F INSERT ADDITIONAL SPEC FIELDS - desired state of cluster\n  \u002F\u002F Important: Run \"make\" to regenerate code after modifying this file\n\n  Sources    []SourceSpec    `json:\"sources,omitempty\"`\n  Sinks      []SinkSpec      `json:\"sinks,omitempty\"`\n  Functions  []FunctionSpec  `json:\"functions,omitempty\"`\n}\n",[4926,42457,42455],{"__ignoreMap":18},[48,42459,42460],{},"Figure 5 depicts a typical use case of Function Mesh and let’s assume this is a CDC scenario. You may want to use a source connector to ingest data from MongoDB, configure ETL, filtering, and routing, and then deliver messages to MySQL through a sink connector. With the Function Mesh CRD, you can describe the entire process in YAML and run the corresponding custom resource (CR) on Kubernetes.",[48,42462,42463],{},[384,42464],{"alt":18,"src":42465},"\u002Fimgs\u002Fblogs\u002F6447c08a8ec6ec368ede52cf_image3.webp",[3933,42467,42469],{"id":42468},"custom-controller","Custom controller",[48,42471,42472],{},"After you create the CR on Kubernetes, the custom controller enables and manages functions. The controller is an extension of the Kubernetes control plane and interacts directly with the Kubernetes API. It maps CRD configurations to corresponding Kubernetes resources and manages them throughout their lifecycle. 
The controller converts operational knowledge into a program that performs certain operations on the Kubernetes cluster when needed (making sure resources are in their desired state). It acts like an engineer but with greater efficiency and speed.",[48,42474,42475],{},"As shown in Figure 6, you can create CRs using kubectl based on their associated CRDs. With the help of custom controllers, the Kubernetes API schedules internal resources and monitors the status of the CRs. If CRDs are updated, CRs will be changed accordingly. Note that the Pulsar cluster only provides data pipeline services and that it does not store any function metadata.",[48,42477,42478],{},[384,42479],{"alt":18,"src":42480},"\u002Fimgs\u002Fblogs\u002F6447c0bbb1c6b7616e6bdbd6_image9.webp",[32,42482,42484],{"id":42483},"function-runner","Function Runner",[48,42486,42487],{},"The second concept you need to know is the runtime, or the containers that run functions submitted by users (also known as the Function Runner). Pulsar Functions support multiple programming languages for runtime, including Java, Python, and Go. Generally, they are packaged together with Pulsar images. However, it is not practical to use Pulsar’s image for each function container in Function Mesh. Additionally, as functions may come from third-party programs, there are security risks in Pulsar images as root privileges are used by default prior to version 2.10.",[48,42489,42490,42491,42496],{},"For a more secure experience of using Pulsar Functions, the StreamNative team provides separate ",[55,42492,42495],{"href":42493,"rel":42494},"https:\u002F\u002Ffunctionmesh.io\u002Fdocs\u002Ffunctions\u002Ffunction-crd#runner-images",[264],"runner images for different languages",", including Java, Python, and Go. The Java runner image is integrated with StreamNative Sink and Source Connectors, which can be used directly in Function Mesh.",[48,42498,42499],{},"With runner images, you can choose either of the following ways to submit functions.",[1666,42501,42502,42505],{},[324,42503,42504],{},"Use a runner image to package the function and dependencies into a new image and submit it to Function Mesh;",[324,42506,42507,42508,42513],{},"Interact with Pulsar’s ",[55,42509,42512],{"href":42510,"rel":42511},"https:\u002F\u002Fpulsar.apache.org\u002Fdocs\u002Fnext\u002Fadmin-api-packages\u002F",[264],"package management service"," by uploading the package to Pulsar. Function Mesh will schedule and download functions, and then run them in runner Pods.",[32,42515,42517],{"id":42516},"function-mesh-worker-service","Function Mesh Worker Service",[48,42519,42520,42521,42524],{},"The Function Mesh Worker Service",[2628,42522,42523],{},"1"," is similar to the Pulsar Function Worker Service, while it uses the Function Mesh Operator to schedule and run functions. When using Pulsar Functions, you use Function Worker REST APIs to access data. In the case of Function Mesh, the Function Mesh Worker Service also allows you to manage functions on Kubernetes with CLI tools like pulsar-admin and pulsarctl, providing a consistent user experience. See Figure 7 for details.",[48,42526,3931],{},[48,42528,42529],{},[384,42530],{"alt":18,"src":42531},"\u002Fimgs\u002Fblogs\u002F6447c0e55e16201ad48c407e_image5.webp",[48,42533,42534],{},"The StreamNative team proposed a plan to abstract the Function Mesh Worker Service as an interface in Pulsar 2.8. Based on the interface, the Kubernetes API can be used as an independent Worker Service implementation. 
This way, users only need to deploy the new Worker Service to the cluster in the same way as the Function Worker, and they can continue using pulsar-admin and pulsarctl to manage the functions in Function Mesh. To allow users to better utilize Kubernetes’ native capabilities, we added some customizable configurations in the Mesh Worker Service.",[48,42536,42537],{},"The following table lists the existing differences between the Pulsar Functions and Function Mesh Worker Service interfaces. The Function Mesh Worker Service has implemented most of the basic management interfaces, such as Create, Delete, and Update.",[48,42539,42540],{},[384,42541],{"alt":18,"src":42542},"\u002Fimgs\u002Fblogs\u002F6447c10b0c4f5ccc806b4490_image4.webp",[32,42544,42546],{"id":42545},"getting-started-with-function-mesh","Getting started with Function Mesh",[48,42548,42549,42550,1154,42555,42560,42561,42564,42565,5422,42570,4003,42573,42577],{},"To install the Function Mesh Operator, you can use ",[55,42551,42554],{"href":42552,"rel":42553},"https:\u002F\u002Foperatorhub.io\u002Foperator\u002Ffunction-mesh",[264],"Operator Lifecycle Manager (OLM)",[55,42556,42559],{"href":42557,"rel":42558},"https:\u002F\u002Fartifacthub.io\u002Fpackages\u002Fhelm\u002Ffunction-mesh\u002Ffunction-mesh-operator",[264],"the Helm chart",". As the Function Mesh Operator has been ",[55,42562,34359],{"href":42563},"\u002Fblog\u002Fstreamnatives-function-mesh-operator-certified-red-hat-openshift-operator",", you can also deploy it on OpenShift. I will not demonstrate the installation steps in this post, as the deployment deserves a separate article to explain the details. For more information, see ",[55,42566,42569],{"href":42567,"rel":42568},"https:\u002F\u002Ffunctionmesh.io\u002Fdocs\u002F",[264],"the Function Mesh documentation",[55,42571,29463],{"href":34283,"rel":42572},[264],[55,42574,42517],{"href":42575,"rel":42576},"https:\u002F\u002Fgithub.com\u002Fstreamnative\u002Ffunction-mesh-worker-service",[264]," GitHub repositories.",[48,42579,42580,42581,22220],{},"Note that all the key features in Pulsar Functions are now supported by Function Mesh as shown in Table 2, including end-to-end encryption, secret management, and stateful functions. You can see ",[55,42582,42585],{"href":42583,"rel":42584},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fwiki\u002FPIP-108:-Pulsar-Feature-Matrix-(Client-and-Function)",[264],"PIP 108",[48,42587,42588],{},[384,42589],{"alt":18,"src":42590},"\u002Fimgs\u002Fblogs\u002F6447c14230a3e7527293e8ce_image2.webp",[40,42592,42594],{"id":42593},"using-pulsar-functions-in-cloud-native-environments","Using Pulsar Functions in cloud-native environments",[48,42596,42597],{},"Now that we have a basic understanding of Function Mesh, let’s explore what we can do with it for Pulsar Functions in cloud-native environments.",[32,42599,42601],{"id":42600},"automatic-scaling","Automatic scaling",[48,42603,42604,42605,42609],{},"In Kubernetes, a ",[55,42606,42608],{"href":34630,"rel":42607},[264],"HorizontalPodAutoscaler (HPA)"," supports scaling based on CPU, memory, or custom metrics. Function Mesh allows users to define CRD-level autoscaling policies. Using tools like Prometheus and Prometheus Metrics Adapter, we can use Pulsar topic metrics or function metrics for HPA references in response to varying workloads.",[48,42611,42612],{},"As shown in Figure 7, a single-copy function saw increasing workloads, and the HPA immediately scaled the number of replicas to 10 according to the corresponding metric. 
After the load decreased, the HPA instructed the resource to scale back down. Previously, implementing load-based autoscaling was challenging in Pulsar Functions, but it becomes much easier with Function Mesh.",[48,42614,42615],{},[384,42616],{"alt":18,"src":42617},"\u002Fimgs\u002Fblogs\u002F6447c16230a3e7eac194186f_image7.webp",[32,42619,4301],{"id":4298},[48,42621,42622,42623,42628],{},"In a Kubernetes cluster, a Pod often needs to communicate with another Pod or even an external entity. When using Pulsar Functions, you need to run external code in many cases. If you can impose some network restrictions, you can greatly enhance cluster security. Therefore, we integrated Function Mesh with Istio, allowing users to leverage Istio’s capabilities to define Pod network rules through ",[55,42624,42627],{"href":42625,"rel":42626},"https:\u002F\u002Fistio.io\u002Flatest\u002Fdocs\u002Freference\u002Fconfig\u002Fsecurity\u002Fauthorization-policy\u002F",[264],"Istio Authorization Policy",". As shown below, you can allow the function to only talk to the broker Pods, preventing it from accessing other services like BookKeeper.",[8325,42630,42633],{"className":42631,"code":42632,"language":8330},[8328],"apiVersion: security.istio.io\u002Fv1beta1\nkind: AuthorizationPolicy\n...\nspec:\n  rules:\n    - from:\n        - source:\n            principals:\n              - demo\u002Fns\u002Fdemo\u002Fsa\u002Fcluster-broker\n  selector:\n    matchLabels:\n      cloud.streamnative.io\u002Fpulsar-cluster: cluster\n      cloud.streamnative.io\u002Frole: pulsar-function\n",[4926,42634,42632],{"__ignoreMap":18},[48,42636,42637,42638,42643],{},"To further enhance security for functions, you can also use built-in security policies in Kubernetes. For example, you can define privileges and access control settings for files and users using a ",[55,42639,42642],{"href":42640,"rel":42641},"https:\u002F\u002Fkubernetes.io\u002Fdocs\u002Ftasks\u002Fconfigure-pod-container\u002Fsecurity-context\u002F",[264],"Security Context",", or create a Secret and configure a Service Account for each function to prevent unauthorized access to the Kubernetes API.",[48,42645,42646],{},"Security has always been a top priority in our work, and we have put great effort into ensuring the security of each and every component. For example:",[321,42648,42649,42652,42655,42658],{},[324,42650,42651],{},"Non-root images are introduced for runner images;",[324,42653,42654],{},"The controller ensures that functions run with non-root privileges;",[324,42656,42657],{},"Separate Service Accounts are used for functions;",[324,42659,42660],{},"Users are able to configure authorization with the broker for each function.",[48,42662,42663],{},"Function Mesh allows users to run Pulsar Functions in a more cloud-native way, and more importantly, it is more secure, manageable, and controllable. This is another part of the StreamNative team's vision for Function Mesh.",[32,42665,42667],{"id":42666},"ecosystem-integration","Ecosystem integration",[48,42669,42670,42671,42676,42677,42682],{},"Function Mesh integrates Pulsar Functions into the Kubernetes ecosystem, which means users can take advantage of Kubernetes’ powerful capabilities and use Pulsar Functions with more ecosystem tools. 
For example, ",[55,42672,42675],{"href":42673,"rel":42674},"https:\u002F\u002Fkeda.sh\u002F",[264],"KEDA"," can help Pulsar Functions scale more efficiently; ",[55,42678,42681],{"href":42679,"rel":42680},"https:\u002F\u002Fbuildpacks.io\u002F",[264],"Buildpacks"," allow Function Mesh to build function images at runtime, letting users upload code directly or submit Pulsar Functions through GitHub repositories; it is even possible to integrate WebAssembly and Rust into Pulsar Functions using Krustlet. The Kubernetes ecosystem offers more possibilities, enabling Pulsar users to leverage functions in a wider range of use cases.",[40,42684,42686],{"id":42685},"conclusion-and-future-plans","Conclusion and future plans",[48,42688,42689],{},"Function Mesh simplifies the management of Pulsar Functions and enables users to leverage more powerful features in Kubernetes like autoscaling. By bringing Pulsar Functions into the cloud-native world, functions can run as first-class citizens and benefit from the Kubernetes ecosystem. With Function Mesh, Pulsar Functions can run in a separate cluster (not in a Pulsar cluster, but in a compute-intensive cluster), which greatly improves resource scheduling and utilization.",[48,42691,42692],{},"Here are StreamNative’s future plans for Function Mesh:",[321,42694,42695,42698,42701,42704],{},[324,42696,42697],{},"Improve the Function Mesh Operator, especially in terms of observability and autoscaling;",[324,42699,42700],{},"Feature parity with Pulsar Functions to ensure a consistent user experience;",[324,42702,42703],{},"Provide better tools to help users orchestrate Function Mesh resources and easily build complete workflows;",[324,42705,42706],{},"Support package integration with cloud storage providers.",[40,42708,38376],{"id":38375},[48,42710,38379,42711,42714],{},[55,42712,38384],{"href":38382,"rel":42713},[264]," over the past few years, with a vibrant community driving innovation and improvements to the project. Check out the following resources to learn more about Pulsar, Pulsar Functions, and Function Mesh.",[321,42716,42717,42729,42734,42742,42749],{},[324,42718,42719,42720,4003,42724,20076],{},"Pulsar Virtual Summit Europe 2023 will take place on Tuesday, May 23rd, 2023! See the ",[55,42721,42723],{"href":42722},"\u002Fblog\u002Fspeakers-and-agenda-announced-for-pulsar-virtual-summit-europe-2023","schedule",[55,42725,42728],{"href":42726,"rel":42727},"https:\u002F\u002Fevents.zoom.us\u002Fev\u002FAp6rsDg9LeVfmdajJ_eB13HH026J1d_o8OoTKkQnl_jzVl-srhwB~AggLXsr32QYFjq8BlYLZ5I06Dg",[264],"register now for free",[324,42730,38390,42731,190],{},[55,42732,31914],{"href":31912,"rel":42733},[264],[324,42735,42736,758,42738],{},[2628,42737,40436],{},[55,42739,42741],{"href":42740},"\u002Fblog\u002Fusing-cloud-native-buildpacks-improve-function-image-building-capability-function-mesh","Using Cloud Native Buildpacks to Improve the Function Image Building Capability of Function Mesh",[324,42743,42744,758,42746],{},[2628,42745,40436],{},[55,42747,42748],{"href":42563},"StreamNative’s Function Mesh Operator Certified as a Red Hat OpenShift Operator",[324,42750,42751,758,42754],{},[2628,42752,42753],{},"Doc",[55,42755,42757],{"href":42567,"rel":42756},[264],"What is Function Mesh?",[40,42759,42761],{"id":42760},"notes","Notes",[48,42763,42764,42766],{},[2628,42765,42523],{}," The Function Mesh Worker Service is part of the StreamNative Cloud offering. 
It provides compatibility with the Pulsar Functions admin API, allowing you to submit functions using Pulsar's admin tools without altering your existing function deployment workflow.",{"title":18,"searchDepth":19,"depth":19,"links":42768},[42769,42770,42773,42774,42780,42785,42786,42787],{"id":42227,"depth":19,"text":42228},{"id":42271,"depth":19,"text":42272,"children":42771},[42772],{"id":42314,"depth":279,"text":42315},{"id":42354,"depth":19,"text":42355},{"id":42387,"depth":19,"text":42388,"children":42775},[42776,42777,42778,42779],{"id":42420,"depth":279,"text":35497},{"id":42483,"depth":279,"text":42484},{"id":42516,"depth":279,"text":42517},{"id":42545,"depth":279,"text":42546},{"id":42593,"depth":19,"text":42594,"children":42781},[42782,42783,42784],{"id":42600,"depth":279,"text":42601},{"id":4298,"depth":279,"text":4301},{"id":42666,"depth":279,"text":42667},{"id":42685,"depth":19,"text":42686},{"id":38375,"depth":19,"text":38376},{"id":42760,"depth":19,"text":42761},"2023-04-25","Understand the limitations of using Pulsar Functions on Kubernetes and explore a more cloud-native way with Function Mesh.","\u002Fimgs\u002Fblogs\u002F6447bd1fd3c0ee995ef99e07_using-pulsar-functions-in-a-cloud-native-way-with-function-mesh.png",{},"\u002Fblog\u002Fusing-pulsar-functions-in-a-cloud-native-way-with-function-mesh","9 min read",{"title":42201,"description":42789},"blog\u002Fusing-pulsar-functions-in-a-cloud-native-way-with-function-mesh",[9636,821,4839,16985],"9ELjlGNTdGvWVOoKuuhhMfZqBj0kvukkrqo4RR6x6jw",{"id":42799,"title":42800,"authors":42801,"body":42802,"category":821,"createdAt":290,"date":43286,"description":42800,"extension":8,"featured":294,"image":43287,"isDraft":294,"link":290,"meta":43288,"navigation":7,"order":296,"path":41811,"readingTime":5505,"relatedResources":290,"seo":43289,"stem":43290,"tags":43291,"__hash__":43292},"blogs\u002Fblog\u002Fan-operators-guide-configuring-geo-replication-with-the-pulsar-resources-operator.md","An Operator’s Guide: Configuring Geo-Replication with the Pulsar Resources Operator",[24776,41695],{"type":15,"value":42803,"toc":43268},[42804,42806,42809,42812,42824,42861,42870,42874,42877,42879,42909,42913,42916,42930,42934,42967,43052,43056,43059,43063,43069,43072,43109,43113,43122,43126,43142,43146,43153,43157,43160,43186,43189,43193,43196,43199,43201,43204,43207,43209,43225,43266],[40,42805,46],{"id":42},[48,42807,42808],{},"Apache Pulsar is a messaging and streaming platform that provides various features for businesses to manage their data and ensure its availability and reliability. 
One of the most powerful features of Pulsar is its built-in geo-replication functionality, which enables businesses to replicate data across multiple data centers, ensuring that data remains accessible and dependable even in the face of network failures or degraded performance between data centers.",[48,42810,42811],{},"By following best practices, such as monitoring the replication backlog, planning for capacity, and throttling replication, businesses can maximize the benefits of geo-replication in Pulsar and maintain a high-performing and dependable messaging system.",[48,42813,42814,42815,4003,42819,42823],{},"While configuring geo-replication using CLI tools such as ",[55,42816,38169],{"href":42817,"rel":42818},"https:\u002F\u002Fpulsar.apache.org\u002Fdocs\u002Fpulsar-admin\u002F",[264],[55,42820,34522],{"href":42821,"rel":42822},"https:\u002F\u002Fgithub.com\u002Fstreamnative\u002Fpulsarctl",[264]," is possible, there are better practices than this when running a Pulsar cluster on Kubernetes. These methods can be complex and time-consuming, requiring in-depth knowledge of the Pulsar ecosystem. To make this process more straightforward and efficient, the Pulsar Resources Operator provides a streamlined approach to configuring and managing Pulsar resources on a Kubernetes cluster. The Pulsar Resources Operator is a controller that automates the management of Pulsar resources through manifest files, enabling full lifecycle management for the resources necessary for geo-replication, including creation, update, and deletion for the following resources:",[321,42825,42826,42833,42840,42847,42854],{},[324,42827,42828],{},[55,42829,42832],{"href":42830,"rel":42831},"https:\u002F\u002Fgithub.com\u002Fstreamnative\u002Fpulsar-resources-operator\u002Fblob\u002Fmain\u002Fdocs\u002Fpulsar_connection.md",[264],"PulsarConnection",[324,42834,42835],{},[55,42836,42839],{"href":42837,"rel":42838},"https:\u002F\u002Fgithub.com\u002Fstreamnative\u002Fpulsar-resources-operator\u002Fblob\u002Fmain\u002Fdocs\u002Fpulsar_tenant.md",[264],"Tenants",[324,42841,42842],{},[55,42843,42846],{"href":42844,"rel":42845},"https:\u002F\u002Fgithub.com\u002Fstreamnative\u002Fpulsar-resources-operator\u002Fblob\u002Fmain\u002Fdocs\u002Fpulsar_namespace.md",[264],"Namespaces",[324,42848,42849],{},[55,42850,42853],{"href":42851,"rel":42852},"https:\u002F\u002Fgithub.com\u002Fstreamnative\u002Fpulsar-resources-operator\u002Fblob\u002Fmain\u002Fdocs\u002Fpulsar_topic.md",[264],"Topics",[324,42855,42856],{},[55,42857,42860],{"href":42858,"rel":42859},"https:\u002F\u002Fgithub.com\u002Fstreamnative\u002Fpulsar-resources-operator\u002Fblob\u002Fmain\u002Fdocs\u002Fpulsar_permission.md",[264],"Permissions",[48,42862,42863,42864,42869],{},"With the Pulsar Resources Operator, configuring and managing multi-cluster environments no longer requires expertise in command-line tools. In the latest release of the Pulsar Resources Operator, version 0.3.0, we introduced a new Custom Resource named ",[55,42865,42868],{"href":42866,"rel":42867},"https:\u002F\u002Fgithub.com\u002Fstreamnative\u002Fpulsar-resources-operator\u002Fblob\u002Fmain\u002Fdocs\u002Fpulsar_geo_replication.md",[264],"PulsarGeoReplication",". 
This controller provides a declarative way to configure and manage geo-replication efficiently.",[40,42871,42873],{"id":42872},"get-started-with-the-pulsar-resources-operator","Get started with the Pulsar Resources Operator",[48,42875,42876],{},"You can install the Pulsar Resources Operator using the officially supported pulsar-resources-operator Helm chart. It provides Custom Resource Definitions (CRDs) and Controllers to manage Pulsar resources.",[32,42878,10104],{"id":10103},[321,42880,42881,42890,42898,42901],{},[324,42882,42883,42884,42889],{},"Install ",[55,42885,42888],{"href":42886,"rel":42887},"https:\u002F\u002Fkubernetes.io\u002Fdocs\u002Ftasks\u002Ftools\u002F#kubectl",[264],"kubectl"," (v1.16 - v1.25), compatible with your cluster (+\u002F- 1 minor release from your cluster).",[324,42891,42883,42892,42897],{},[55,42893,42896],{"href":42894,"rel":42895},"https:\u002F\u002Fhelm.sh\u002Fdocs\u002Fintro\u002Finstall\u002F",[264],"Helm"," (v3.0.2 or higher).",[324,42899,42900],{},"Prepare a Kubernetes cluster (v1.16 - v1.25).",[324,42902,42903,42904,190],{},"Prepare a ",[55,42905,42908],{"href":42906,"rel":42907},"https:\u002F\u002Fdocs.streamnative.io\u002Foperators\u002Fpulsar-operator\u002Ftutorial\u002Fdeploy-pulsar",[264],"Pulsar cluster",[32,42910,42912],{"id":42911},"steps","Steps",[48,42914,42915],{},"To install the Pulsar Resources Operator, perform the following steps.",[1666,42917,42918,42924],{},[324,42919,42920,42921],{},"Add the StreamNative chart repository.\n",[384,42922],{"alt":18,"src":42923},"\u002Fimgs\u002Fblogs\u002F6441adf1bcc40476119378ee_Screen-Shot-2023-04-20-at-2.25.31-PM.jpg",[324,42925,42926,42927],{},"Install the operator using the pulsar-resources-operator Helm chart.\n",[384,42928],{"alt":18,"src":42929},"\u002Fimgs\u002Fblogs\u002F6441ae328724936591b91f1d_Screen-Shot-2023-04-20-at-2.26.47-PM.webp",[40,42931,42933],{"id":42932},"configuring-geo-replication-using-the-pulsar-resources-operator","Configuring Geo-Replication using the Pulsar Resources Operator",[48,42935,42936,42937,42940,42941,42943,42944,42947,42948,42951,42952,1186,42955,4003,42958,42961,42962,42966],{},"We added a new Custom Resource ",[55,42938,42868],{"href":42866,"rel":42939},[264],", which describes the unidirectional replication between Pulsar clusters. Below, the ",[4926,42942,42868],{}," CR shows the replication from the us-east-cluster to its destination cluster.\n",[384,42945],{"alt":18,"src":42946},"\u002Fimgs\u002Fblogs\u002F6441ae6cae4dd548f20e4e9a_Screen-Shot-2023-04-20-at-2.27.40-PM.webp","\nWe added a new field, ",[4926,42949,42950],{},"geoReplicationRefs",", to the ",[4926,42953,42954],{},"PulsarTenant",[4926,42956,42957],{},"PulsarNamespace",[4926,42959,42960],{},"PulsarTopic"," resources to enable geo-replication at the tenant, namespace or topic level. 
However, note enabling geo-replication at the topic level requires an additional step of enabling the ",[55,42963,42965],{"href":42964},"\u002Fblog\u002Fwhats-new-in-apache-pulsar-2-7-0#topic-level-policy","topic-level-policy"," on the Pulsar cluster.",[48,42968,42969,42970,4003,42973,42976,42977,42980,42981,42983,42984,42986,42987,42990,42991,42993,42994,42996,42997,29496,42999,41750,43001,43004,43005,1186,43007,4003,43009,43011,43012,43014,43015,43018,42980,43021,43023,43024,1186,43026,5422,43028,43011,43030,43032,43033,43036,43039,43042,43045,43046,43048,43049,43051],{},"Here we use two Pulsar clusters, ",[4926,42971,42972],{},"us-east",[4926,42974,42975],{},"us-west",", running on two separate Kubernetes platforms, to demonstrate the geo-replication setup based on the Pulsar Resources Operator.\n",[384,42978],{"alt":18,"src":42979},"\u002Fimgs\u002Fblogs\u002F6441ad6be5b6d329b632431e_Screen-Shot-2023-04-20-at-2.22.35-PM.webp","\nOn the ",[4926,42982,42972],{}," cluster, we create a ",[4926,42985,42832],{}," for the local cluster and the destination cluster to establish a connection between the two clusters.\n",[384,42988],{"alt":18,"src":42989},"\u002Fimgs\u002Fblogs\u002F6441aeb48e41170d579eea1d_Screen-Shot-2023-04-20-at-2.28.59-PM.webp","\nOnce the ",[4926,42992,42832],{}," is established, we use the ",[4926,42995,42868],{}," Custom Resource to configure the replication from ",[4926,42998,42972],{},[4926,43000,42975],{},[384,43002],{"alt":18,"src":43003},"\u002Fimgs\u002Fblogs\u002F6441aee079fc3d2b78aead52_Screen-Shot-2023-04-20-at-2.29.47-PM.webp","\nWe can now create the ",[4926,43006,42954],{},[4926,43008,42957],{},[4926,43010,42960],{}," resources with the ",[4926,43013,42950],{}," configuration, which enables geo-replication between the source and destination clusters.\n",[384,43016],{"alt":18,"src":43017},"\u002Fimgs\u002Fblogs\u002F6441af2d37fca63dccf5ecb5_Screen-Shot-2023-04-20-at-2.31.03-PM.webp",[384,43019],{"alt":18,"src":43020},"\u002Fimgs\u002Fblogs\u002F6441b274309f7228d1d978e4_Screen-Shot-2023-04-20-at-2.44.59-PM.webp",[4926,43022,42975],{}," cluster, we need to create the corresponding resources in reverse as follows. We then create ",[4926,43025,42954],{},[4926,43027,42957],{},[4926,43029,42960],{},[4926,43031,42950],{}," configuration, as before, but with the source and destination clusters reversed.\n",[384,43034],{"alt":18,"src":43035},"\u002Fimgs\u002Fblogs\u002F6441b355bbc0f51c70e5df14_Screen-Shot-2023-04-20-at-2.48.42-PM.webp",[384,43037],{"alt":18,"src":43038},"\u002Fimgs\u002Fblogs\u002F6441b37fbbc0f5aa50e60dc1_Screen-Shot-2023-04-20-at-2.49.31-PM.webp",[384,43040],{"alt":18,"src":43041},"https:\u002F\u002Fuploads-ssl.webflow.com\u002F639226d67b0d723af8e7ca56\u002F6441b3dcc52c6863971826ea_Screen%20Shot%202023-04-20%20at%202.50.40%20PM%20(1).webp",[384,43043],{"alt":18,"src":43044},"\u002Fimgs\u002Fblogs\u002F6441b412efd26a0bcd105ecc_Screen-Shot-2023-04-20-at-2.51.51-PM.webp","\nThis sets up a unidirectional replication of messages from the ",[4926,43047,42975],{}," cluster to the ",[4926,43050,42972],{}," cluster, providing a complete geo-replication solution between the two clusters.",[40,43053,43055],{"id":43054},"tips-and-best-practices-for-geo-replication","Tips and best practices for geo-replication",[48,43057,43058],{},"After setting up the geo-replication through the Pulsar Resources Operator, you can manage and monitor the replication to ensure it’s working properly. 
Here are some tips and best practices to help you do that.",[32,43060,43062],{"id":43061},"modifying-resource-configuration","‍Modifying resource configuration",[48,43064,43065,43066,190],{},"The Resource Operator provides a declarative API to manage the resource objects, which means you can modify your Custom Resource YAML configuration and apply the changes using ",[4926,43067,43068],{},"kubectl apply",[48,43070,43071],{},"However, it’s important to be aware of the implicit dependencies between different resource objects when modifying the configuration.",[321,43073,43074,43082,43095],{},[324,43075,43076,43078,43079,43081],{},[4926,43077,42868],{}," CR depends on the ",[4926,43080,42832],{}," object",[324,43083,43084,1186,43086,1186,43088,43078,43090,4003,43092,43094],{},[4926,43085,42954],{},[4926,43087,42957],{},[4926,43089,42960],{},[4926,43091,42832],{},[4926,43093,42868],{}," objects.",[324,43096,43097,1186,43099,43101,43102,43104,43105,1773,43107,190],{},[4926,43098,42957],{},[4926,43100,42960],{}," CR with ",[4926,43103,42950],{}," field depends on the target ",[4926,43106,42954],{},[4926,43108,42950],{},[32,43110,43112],{"id":43111},"monitoring-replication-status","Monitoring replication status",[48,43114,43115,43116,43121],{},"To monitor the status of the geo-replication, you can use the ",[55,43117,43120],{"href":43118,"rel":43119},"https:\u002F\u002Fpulsar.apache.org\u002Fdocs\u002F2.11.x\u002Freference-metrics\u002F#replication-metrics-1",[264],"Pulsar Broker Replication metrics",". These metrics allow you to track the performance and health of the replication, with detailed information on the replication status, including the number of messages replicated, the lag between the primary and secondary clusters, and the replication rate.",[32,43123,43125],{"id":43124},"upgrading-from-01x-or-02x-resources-operator","Upgrading from 0.1.x or 0.2.x Resources Operator",[48,43127,43128,43129,43131,43132,4003,43135,43138,43139,43141],{},"If you have existing ",[4926,43130,42832],{}," objects running with the 0.1.x or 0.2.x Resource Operator and want to configure the geo-replication, you need to add the ",[4926,43133,43134],{},"brokerServiceURL",[4926,43136,43137],{},"clusterName"," fields for the ",[4926,43140,42832],{}," objects after upgrading to 0.3.0 version. These fields are required for geo-replication to work properly.",[32,43143,43145],{"id":43144},"integrating-with-gitops-for-version-control-and-approval","Integrating with GitOps for version control and approval",[48,43147,43148,43149,38617],{},"You can integrate the Pulsar Resources Operator with GitOps workflows for version control and approval of your Kubernetes configurations. With tools like ArgoCD and GitHub, you can control Pulsar CR changes in an elegant way. This integration ensures that all changes to the Pulsar cluster are properly tracked, reviewed, and deployed in a controlled manner. 
Check out ",[55,43150,43152],{"href":43151},"Managing Pulsar CRs with ArgoCD",[32,43154,43156],{"id":43155},"cleaning-up-geo-replication","Cleaning up geo-replication",[48,43158,43159],{},"If you want to clean up geo-replication on resource objects, it’s best to follow these steps:",[1666,43161,43162,43175,43181],{},[324,43163,43164,43165,43167,43168,1186,43170,1186,43172,43174],{},"Remove the ",[4926,43166,42950],{}," field on the ",[4926,43169,42954],{},[4926,43171,42957],{},[4926,43173,42960],{}," CRs.",[324,43176,43177,43178,43180],{},"Delete the ",[4926,43179,42868],{}," object.",[324,43182,43183,43184,43180],{},"Delete the destination ",[4926,43185,42832],{},[48,43187,43188],{},"This will ensure that all geo-replication resources are properly removed and that there are no lingering dependencies.",[32,43190,43192],{"id":43191},"testing-failover","Testing failover",[48,43194,43195],{},"It's important to test your geo-replication failover setup regularly to ensure that it works as expected in case of a disaster. You can do this by simulating a network or hardware failure on one of your Pulsar clusters and verifying that the replication fails over to the other cluster seamlessly.",[48,43197,43198],{},"By following these tips and best practices, you can ensure that your geo-replication setup is reliable, efficient, and easy to manage.",[40,43200,2125],{"id":2122},[48,43202,43203],{},"StreamNative’s Pulsar Resources Operator provides a reliable and efficient way to set up geo-replication for disaster recovery. With geo-replication, data can be replicated across multiple clusters in different geographical locations. In this guide, we have covered how to set up geo-replication using the Pulsar Resources Operator. We have also discussed some tips and best practices to follow when configuring geo-replication on resource objects.",[48,43205,43206],{},"StreamNative offers a comprehensive solution for managing Pulsar clusters on Kubernetes. In addition to the Pulsar Resources Operator, StreamNative provides a set of Kubernetes controllers called Pulsar Operators that simplify the deployment and management of Pulsar clusters on Kubernetes. With StreamNative Pulsar Operators, SREs can bridge the gap between complex operations of distributed systems and the declarative nature of GitOps, allowing them to deploy changes with confidence.",[40,43208,36477],{"id":36476},[48,43210,43211,43212,5422,43216,5422,43220,43224],{},"Join the Apache Pulsar community today and take part in shaping the future of messaging and streaming. 
Check out the ",[55,43213,43215],{"href":20667,"rel":43214},[264],"GitHub repos",[55,43217,7120],{"href":43218,"rel":43219},"https:\u002F\u002Fdocs.streamnative.io\u002Fdocs\u002Fgeo-replication-concepts",[264],[55,43221,43223],{"href":36222,"rel":43222},[264],"contribute"," to building an exciting project.",[321,43226,43227,43235,43243,43250,43258],{},[324,43228,43229,758,43231],{},[2628,43230,40436],{},[55,43232,43234],{"href":43233},"\u002Fblog\u002Fintroducing-pulsar-resources-operator-kubernetes","Introducing Pulsar Resources Operator for Kubernetes",[324,43236,43237,758,43239],{},[2628,43238,40436],{},[55,43240,43242],{"href":43241},"\u002Fblog\u002Fpulsar-operators-tutorial-part-1-create-apache-pulsar-cluster-kubernetes","Pulsar Operators Tutorial Part 1: Create an Apache Pulsar Cluster on Kubernetes",[324,43244,43245,758,43247],{},[2628,43246,40436],{},[55,43248,43249],{"href":43151},"Pulsar Operators Tutorial Part 2: Manage Pulsar Custom Resources with ArgoCD",[324,43251,43252,758,43254],{},[2628,43253,40436],{},[55,43255,43257],{"href":43256},"\u002Fblog\u002Fpulsar-operators-tutorial-part-3-create-and-deploy-a-containerized-pulsar-client","Pulsar Operators Tutorial Part 3: Create and Deploy a Containerized Pulsar Client",[324,43259,43260,758,43262],{},[2628,43261,40436],{},[55,43263,43265],{"href":43264},"\u002Fblog\u002Fpulsar-operators-tutorial-part-4-use-kpack-to-streamline-the-build-process","Pulsar Operators Tutorial Part 4: Use kpack to Streamline the Build Process",[48,43267,3931],{},{"title":18,"searchDepth":19,"depth":19,"links":43269},[43270,43271,43275,43276,43284,43285],{"id":42,"depth":19,"text":46},{"id":42872,"depth":19,"text":42873,"children":43272},[43273,43274],{"id":10103,"depth":279,"text":10104},{"id":42911,"depth":279,"text":42912},{"id":42932,"depth":19,"text":42933},{"id":43054,"depth":19,"text":43055,"children":43277},[43278,43279,43280,43281,43282,43283],{"id":43061,"depth":279,"text":43062},{"id":43111,"depth":279,"text":43112},{"id":43124,"depth":279,"text":43125},{"id":43144,"depth":279,"text":43145},{"id":43155,"depth":279,"text":43156},{"id":43191,"depth":279,"text":43192},{"id":2122,"depth":19,"text":2125},{"id":36476,"depth":19,"text":36477},"2023-04-20","\u002Fimgs\u002Fblogs\u002F6441b53c7518a6ab97ce970d_Geo-replication-Resources-Operator.png",{},{"title":42800,"description":42800},"blog\u002Fan-operators-guide-configuring-geo-replication-with-the-pulsar-resources-operator",[11899,38442],"tKXSLqwPTfQxoGFss5kq67MoWtWtuezcifTGDEgtiVI",{"id":43294,"title":43295,"authors":43296,"body":43298,"category":821,"createdAt":290,"date":43611,"description":43612,"extension":8,"featured":294,"image":43613,"isDraft":294,"link":290,"meta":43614,"navigation":7,"order":296,"path":43615,"readingTime":5505,"relatedResources":290,"seo":43616,"stem":43617,"tags":43618,"__hash__":43619},"blogs\u002Fblog\u002Fmigrating-tenants-across-clusters-with-pulsars-geo-replication.md","Migrating Tenants across Clusters with Pulsar’s Geo-replication",[43297],"Mingze Han",{"type":15,"value":43299,"toc":43593},[43300,43303,43317,43321,43328,43332,43339,43343,43346,43349,43356,43360,43363,43366,43373,43379,43382,43386,43393,43397,43400,43407,43413,43416,43422,43425,43431,43442,43446,43449,43453,43462,43466,43469,43473,43476,43480,43483,43486,43497,43500,43504,43507,43514,43519,43526,43543,43545,43548,43550,43555],[48,43301,43302],{},"Apache Pulsar is a distributed messaging system that offers robust features such as geo-replication, which allows for the replication of data across 
multiple data centers or geographical regions. In this blog, I will discuss the following topics:",[321,43304,43305,43308,43311,43314],{},[324,43306,43307],{},"How geo-replication works in Pulsar;",[324,43309,43310],{},"How Pulsar synchronizes consumption progress across clusters;",[324,43312,43313],{},"The problems during consumption progress synchronization in Pulsar and how we optimized the existing logic for our use case at Tencent Cloud;",[324,43315,43316],{},"How to migrate Pulsar tenants across clusters using geo-replication.",[40,43318,43320],{"id":43319},"understanding-geo-replication-in-pulsar","Understanding geo-replication in Pulsar",[48,43322,43323,43324,43327],{},"Geo-replication in Pulsar enables the replication of messages across multiple data centers or geographical regions, providing data redundancy and disaster recovery for Pulsar topics. This ensures that your entire system remains available and resilient to failures or regional outages, maintaining data consistency and enabling low-latency access to data for consumers in different locations.\n",[384,43325],{"alt":18,"src":43326},"\u002Fimgs\u002Fblogs\u002F643ca460774f257b85b2f556_image3.webp","Figure 1. Geo-replication in Pulsar\nA typical use case of geo-replication is that producers and consumers can be located in separate regions. For example, producers can be located in San Francisco while consumers may be in Houston. This can happen in cases where latency requirements between message production and consumption are low. The benefit is that it ensures all writes occur in the same place with a low write latency. After data is replicated to different locations, consumers can all read messages no matter where they are.",[40,43329,43331],{"id":43330},"how-does-geo-replication-work","How does geo-replication work?",[48,43333,43334,43335,43338],{},"The logic behind Pulsar's geo-replication is quite straightforward. Typically, if you want to replicate data across regions (without using Pulsar’s geo-replication),  you may want to create a service that includes both a consumer and a producer. The consumer retrieves data from the source cluster, and the producer sends the data to the target cluster. Pulsar’s geo-replication feature follows a similar pattern as depicted in Figure 2.\n",[384,43336],{"alt":18,"src":43337},"\u002Fimgs\u002Fblogs\u002F643ca48387ca32e4c05dfe9b_image11.webp","Figure 2. How geo-replication works in Pulsar\nIf you enable geo-replication, Pulsar creates a Replication Cursor and a Replication Producer for each topic. The Replication Producer retrieves messages from the local cluster and dispatches them to the target cluster. The Replication Cursor is used to track the data replication process using an internal subscription. Similarly, if a producer sends messages to the target cluster, it can also create its own Replication Cursor and Producer to dispatch the messages back to the source cluster. The replication process does not impact message reads and writes in the local cluster.",[40,43340,43342],{"id":43341},"understanding-consumption-progress-synchronization","Understanding consumption progress synchronization",[48,43344,43345],{},"In some use cases, it is necessary to synchronize the consumption progress of subscriptions between clusters located in different regions. In a disaster recovery scenario, for example, if the primary data center in San Francisco experiences an outage, you must switch to a backup cluster in Houston to continue your service. 
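Before getting into how consumption progress stays in sync, it may help to see what enabling replication for such a namespace looks like programmatically. Here is a minimal sketch using the pulsar-admin-go library; the `sf` and `houston` cluster names and the tenant/namespace names are placeholders for the example above, and both clusters are assumed to already be registered with each other.

```go
package main

import (
	"log"

	"github.com/streamnative/pulsar-admin-go"
	"github.com/streamnative/pulsar-admin-go/pkg/utils"
)

func main() {
	admin, err := pulsaradmin.NewClient(&pulsaradmin.Config{})
	if err != nil {
		log.Fatal(err)
	}

	// Placeholder cluster names for the San Francisco / Houston example.
	clusters := []string{"sf", "houston"}

	// Allow the tenant on both clusters, then mark the namespace as replicated.
	if err := admin.Tenants().Create(utils.TenantData{
		Name:            "dr-tenant",
		AllowedClusters: clusters,
	}); err != nil {
		log.Fatal(err)
	}
	if err := admin.Namespaces().CreateNamespace("dr-tenant/dr-ns"); err != nil {
		log.Fatal(err)
	}
	if err := admin.Namespaces().SetNamespaceReplicationClusters("dr-tenant/dr-ns", clusters); err != nil {
		log.Fatal(err)
	}
}
```

With replication enabled like this, both clusters hold the data; the open question is where consumers should resume after a failover.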
In this case, clients should be able to continue consuming messages from the Houston cluster from where they left off in the primary cluster.",[48,43347,43348],{},"If the consumption progress is not synchronized, it would be difficult to know which messages have already been consumed in the primary data center. If a client starts consuming messages from the latest position, it might lead to message loss; if it starts from the earliest message, it could result in duplicate consumption. Both ways are usually unacceptable to the client. A possible compromise is to rewind topics to a specific message and begin reading from there. However, this approach still can’t guarantee messages are not lost or repeatedly consumed.",[48,43350,43351,43352,43355],{},"To solve this issue, Pulsar supports consumption progress synchronization for subscriptions so that users can smoothly transition to a backup cluster during disaster recovery without worrying about message duplication or loss. Figure 3 shows an example where both messages and consumption progress are synchronized between Cluster-A and Cluster-B.\n",[384,43353],{"alt":18,"src":43354},"\u002Fimgs\u002Fblogs\u002F643ca4a8eab04a00762959b0_image8.webp","Figure 3. Consumption progress synchronization between Cluster-A and Cluster-B",[32,43357,43359],{"id":43358},"how-does-pulsar-track-consumption-progress","How does Pulsar track consumption progress?",[48,43361,43362],{},"Before I explain how consumption progress is synchronized between clusters, let’s first understand the consumption tracking mechanism in Pulsar, which leverages two important attributes - markDeletePosition and individuallyDeletedMessages.",[48,43364,43365],{},"markDeletePosition is similar to the consumer offset in Kafka. The message marked by markDeletePosition and all the preceding messages have been acknowledged, which means they are ready for deletion.",[48,43367,43368,43369,43372],{},"individuallyDeletedMessages is what sets Pulsar apart from most streaming and messaging systems. Unlike them, Pulsar supports both selective and cumulative acknowledgments. The former allows consumers to individually acknowledge entries, the information of which is stored in individuallyDeletedMessages.\n",[384,43370],{"alt":18,"src":43371},"\u002Fimgs\u002Fblogs\u002F643ca501511bd424727b0bae_image4.webp","Figure 4. markDeletePosition and individuallyDeletedMessages\nAs illustrated in Figure 4, let’s consider a shared subscription with multiple consumer instances. Messages from 0 to 9 are distributed to all of them. Each consumer may consume messages at different speeds, so the order of message delivery and acknowledgment may vary. Suppose messages 0, 1, 2, 3, 4, 6, and 9 have been acknowledged, while messages 5, 7, and 8 have not. The markDeletePosition marker, which represents the consumption progress, points to message 4, indicating that all messages before 4 (inclusive) have been successfully consumed. If you check the statistics of the topic (pulsar-admin topics stats), you can see that markDeletePosition and individuallyDeletedMessages have the following values:",[8325,43374,43377],{"className":43375,"code":43376,"language":8330},[8328],"\"markDeletePosition\": \"1:4\",\n\"individuallyDeletedMessages\": \"[(1:5‥1:6], (1:8‥1:9]]\",\n",[4926,43378,43376],{"__ignoreMap":18},[48,43380,43381],{},"These values are essentially message IDs and intervals. A message ID consists of a ledger ID and an entry ID. 
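To watch these two values move in practice, you can reproduce the Figure 4 pattern with a small consumer that acknowledges messages selectively. Below is a rough sketch using the Apache Pulsar Go client; the topic and subscription names are placeholders, and skipping by receive index is only illustrative since delivery order on a shared subscription is not guaranteed.

```go
package main

import (
	"context"
	"fmt"
	"log"

	"github.com/apache/pulsar-client-go/pulsar"
)

func main() {
	client, err := pulsar.NewClient(pulsar.ClientOptions{URL: "pulsar://localhost:6650"})
	if err != nil {
		log.Fatal(err)
	}
	defer client.Close()

	consumer, err := client.Subscribe(pulsar.ConsumerOptions{
		Topic:            "persistent://public/default/ack-holes-demo",
		SubscriptionName: "shared-sub",
		Type:             pulsar.Shared,
	})
	if err != nil {
		log.Fatal(err)
	}
	defer consumer.Close()

	for i := 0; i < 10; i++ {
		msg, err := consumer.Receive(context.Background())
		if err != nil {
			log.Fatal(err)
		}
		// Skip a few acknowledgments to leave holes behind markDeletePosition,
		// mirroring the unacknowledged messages 5, 7, and 8 in Figure 4.
		if i == 5 || i == 7 || i == 8 {
			continue
		}
		consumer.Ack(msg)
		fmt.Println("acked", msg.ID())
	}
}
```

Checking `pulsar-admin topics stats` on the topic afterwards should show the skipped messages as interval ranges in individuallyDeletedMessages, much like the output above.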
A left-open and right-closed interval means the message at the beginning of this interval has not been acknowledged while the message at the end has.",[32,43383,43385],{"id":43384},"message-id-inconsistency-across-clusters","Message ID inconsistency across clusters",[48,43387,43388,43389,43392],{},"The complexity of consumption progress synchronization lies in the ID inconsistency of the same message across different clusters. It’s impossible to ensure that the ledger ID and the entry ID of the same message are consistent. In Figure 5, for example, the ID of message A is 1:0 in cluster A while it is 3:0 in cluster B.\n",[384,43390],{"alt":18,"src":43391},"\u002Fimgs\u002Fblogs\u002F643ca592511bd44e7b7b62d4_image10.webp","Figure 5. Message ID inconsistency across clusters\nIf the message IDs for the same message were consistent across both clusters, synchronizing consumption progress would be very simple. For instance, if a message with ID 1:2 is consumed in cluster A, cluster B could simply acknowledge message 1:2. However, messages IDs can hardly be the same across clusters and without knowing the relation of different message IDs across clusters, how can we synchronize the consumption progress?",[32,43394,43396],{"id":43395},"cursor-snapshots","Cursor snapshots",[48,43398,43399],{},"Pulsar uses cursor snapshots to let clusters know how different message IDs are related to each other.",[48,43401,43402,43403,43406],{},"As shown in Figure 6 and the code snippet below, when acknowledging a message, Cluster A immediately creates a snapshot and sends a ReplicatedSubscriptionsSnapshotRequest to both Cluster B and Cluster C. It requires them to tell it the respective IDs of this message in Cluster B and Cluster C.\n",[384,43404],{"alt":18,"src":43405},"\u002Fimgs\u002Fblogs\u002F643ca5bd511bd4f5647b7f31_image9.webp","Figure 6. Cluster A sends a ReplicatedSubscriptionsSnapshotRequest to both clusters",[8325,43408,43411],{"className":43409,"code":43410,"language":8330},[8328],"\"ReplicatedSubscriptionsSnapshotRequest\" : {\n    \"snapshot_id\" : \"444D3632-F96C-48D7-83DB-041C32164EC1\",\n    \"source_cluster\" : \"a\"\n}\n",[4926,43412,43410],{"__ignoreMap":18},[48,43414,43415],{},"Upon receiving the request from Cluster A, Cluster B (and Cluster C) responds with the ID of this message in its cluster. See the code snippet below for details.",[8325,43417,43420],{"className":43418,"code":43419,"language":8330},[8328],"\"ReplicatedSubscriptionSnapshotResponse\" : {\n    \"snapshotid\" : \"444D3632-F96C-48D7-83DB-041C32164EC1\",\n    \"cluster\" : {\n        \"cluster\" : \"b\",\n        \"message_id\" : {\n            \"ledger_id\" : 1234,\n            \"entry_id\" : 45678\n            }\n    }\n}\n",[4926,43421,43419],{"__ignoreMap":18},[48,43423,43424],{},"After receiving the message IDs from Cluster B and Cluster C, Cluster A stores them in the cursor snapshot as below. 
This allows Pulsar to know which messages should be acknowledged in Cluster B and Cluster C when the same message is acknowledged in Cluster A.",[8325,43426,43429],{"className":43427,"code":43428,"language":8330},[8328],"{\n    \"snapshot_id\" : \"444D3632-F96C-48D7-83DB-041C32164EC1\",\n    \"local_message_id\" : {\n         \"ledger_id\" : 192,\n         \"entry_id\" : 123123\n    },\n    \"clusters\" : [\n        {\n            \"cluster\" : \"b\",\n            \"message_id\" : {\n                \"ledger_id\" : 1234, \n                \"entry_id\" : 45678\n            }\n        },\n        {\n            \"cluster\" : \"c\",\n            \"message_id\" : {\n                \"ledger_id\" : 7655,\n                \"entry_id\" : 13421\n            }\n        }\n    ],\n}\n",[4926,43430,43428],{"__ignoreMap":18},[48,43432,43433,43434,43437,43438,43441],{},"Let’s look at the implementation in more detail. Based on cursor snapshots, Pulsar creates the corresponding snapshot markers and puts them between messages within the original topic. When the consumer reaches the snapshot marker, it will be loaded into memory. With message 3 acknowledged in Cluster A (i.e. markDeletePosition moves to message 3), the markDeletePosition of the same messages in Cluster B and Cluster C will also be updated.\n",[384,43435],{"alt":18,"src":43436},"\u002Fimgs\u002Fblogs\u002F643ca6a402179a228df19428_image1.webp","Figure 7. Snapshot marker\nIn the example in Figure 8, Cluster A has two snapshots on message 1:2 and message 1:6 respectively. When the markDeletePosition of Cluster A points to message 1:4, the markDeletePosition of Cluster B can move to message 3:4 as it knows the same message has already been acknowledged according to the snapshot.\n",[384,43439],{"alt":18,"src":43440},"\u002Fimgs\u002Fblogs\u002F643ca6e40e200542af29ce6d_image7.webp","Figure 8. How cursor snapshots work in Pulsar\nNote that Figure 8 is a very simple illustration of how Pulsar synchronizes consumption progress across clusters. This process includes many details and explaining all of them requires a separate blog post. If you are interested in this topic, I am willing to share more in the Pulsar community.",[40,43443,43445],{"id":43444},"problems-in-consumption-progress-synchronization","Problems in consumption progress synchronization",[48,43447,43448],{},"Before diving into tenant migration across clusters, I would like to analyze three major problems during consumption progress synchronization. These issues are the primary obstacles in tenant migration.",[32,43450,43452],{"id":43451},"no-synchronization-for-individuallydeletedmessages","No synchronization for individuallyDeletedMessages",[48,43454,43455,43456,43461],{},"The current implementation ensures that markDeletePosition is synchronized across different clusters but individuallyDeletedMessages is not. This can lead to a large number of unacknowledged messages (namely acknowledgment holes), particularly impacting scenarios with ",[55,43457,43460],{"href":43458,"rel":43459},"https:\u002F\u002Fpulsar.apache.org\u002Fdocs\u002F2.11.x\u002Fconcepts-messaging\u002F#delayed-message-delivery",[264],"delayed messages",". If a topic contains a delayed message set to be delivered one day later, the acknowledgment of it will be postponed by a day. In this case, markDeletePosition can only point to the latest acknowledged message before the delayed message; if you switch to a new cluster, it will result in duplicate message consumption. 
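For reference, this is roughly how such a delayed message is produced with the Go client; the topic name is a placeholder and the one-day delay mirrors the example above. Note that delayed delivery takes effect for shared and key_shared subscriptions.

```go
package main

import (
	"context"
	"log"
	"time"

	"github.com/apache/pulsar-client-go/pulsar"
)

func main() {
	client, err := pulsar.NewClient(pulsar.ClientOptions{URL: "pulsar://localhost:6650"})
	if err != nil {
		log.Fatal(err)
	}
	defer client.Close()

	producer, err := client.CreateProducer(pulsar.ProducerOptions{
		Topic: "persistent://public/default/delayed-demo",
	})
	if err != nil {
		log.Fatal(err)
	}
	defer producer.Close()

	// DeliverAfter postpones delivery, and therefore acknowledgment, by a day,
	// which keeps markDeletePosition pinned behind this message until then.
	_, err = producer.Send(context.Background(), &pulsar.ProducerMessage{
		Payload:      []byte("delayed payload"),
		DeliverAfter: 24 * time.Hour,
	})
	if err != nil {
		log.Fatal(err)
	}
}
```

After a cluster switch, a pending delayed message like this is exactly where the duplicate consumption shows up.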
This is because the new cluster does not know which individual messages after markDeletePosition have already been acknowledged in the primary cluster (in other words, individuallyDeletedMessages is not synchronized).",[32,43463,43465],{"id":43464},"synchronization-blocked-by-message-backlogs","Synchronization blocked by message backlogs",[48,43467,43468],{},"In the previous examples (Figure 6 and Figure 7), Cluster A doesn’t send requests through an RPC interface. Instead, snapshot markers are written into the topic alongside other messages. If the target cluster (Cluster B) has a large message backlog, requests sent by the primary cluster (Cluster A) may remain unprocessed for a long time (there is an internal timeout mechanism waiting for the target cluster's response for 30 seconds). As a result, the snapshot cannot be successfully created, preventing synchronization of consumption progress and markDeletePosition.",[32,43470,43472],{"id":43471},"periodic-creation-of-cursor-snapshots","Periodic creation of cursor snapshots",[48,43474,43475],{},"Pulsar does not create a cursor snapshot for every message. Instead, snapshots are created periodically. In Figure 8, only message 1:2 and message 1:6 have snapshots; it is impossible for Cluster B to know markDeletePosition points to 1:4 in Cluster A, so it cannot acknowledge the same message in its own cluster.",[40,43477,43479],{"id":43478},"optimizing-the-consumption-progress-synchronization-logic","Optimizing the consumption progress synchronization logic",[48,43481,43482],{},"The issues mentioned above can cause duplicate consumption of messages. For our online business, a small amount of short-term and controllable duplicate consumption may be acceptable, but it makes no sense if clients need to consume an excessive number of duplicate messages.",[48,43484,43485],{},"As such, we optimized the existing logic by synchronizing both markDeletePosition and individuallyDeletedMessages before migration. However, establishing the connections of message IDs for the same messages between different clusters still remained the most challenging part.",[48,43487,43488,43489,43492,43493,43496],{},"To solve this issue, we added the originalClusterPosition and the entry position to the message’s metadata when sending a message from the original cluster to the target cluster. originalClusterPosition is used to store the message ID in the original cluster. See Figure 9 for details.\n",[384,43490],{"alt":18,"src":43491},"\u002Fimgs\u002Fblogs\u002F643ca753774f250cedb44d07_image2.webp","Figure 9. Introducing originalClusterPosition in the message metadata\nThe updated logic allows us to easily retrieve the ID of a message in the primary cluster according to originalClusterPosition and compare it with the information of individuallyDeletedMessages synchronized to the target cluster. This way, messages that have already been acknowledged in the primary cluster will not be sent to the consumers of the target cluster.\n",[384,43494],{"alt":18,"src":43495},"\u002Fimgs\u002Fblogs\u002F643ca77d02179a02e6f21786_image6.webp","Figure 10. How acknowledged messages are filtered out with the updated logic\nFigure 10 shows the implementation logic in more detail. Before migration, we need to synchronize individuallyDeletedMessages from the primary cluster (cluster-1) to the target cluster (cluster-2). 
Before sending messages to consumers, we use the filterEntriesForConsumer method to filter out messages already consumed in cluster-1 and only push unacknowledged messages to the consumers of cluster-2.",[48,43498,43499],{},"The updated logic above represents “a shift in thinking”. In the original implementation, the primary cluster periodically creates snapshots to figure out the relations of messages between clusters. After messages are acknowledged in the primary cluster, they can be acknowledged in target clusters based on the snapshots. By contrast, our implementation puts message position information directly into the metadata instead of using a separate entity to synchronize the consumption progress. This approach keeps duplicate consumption within an acceptable range.",[40,43501,43503],{"id":43502},"migrating-tenants-across-pulsar-clusters","Migrating tenants across Pulsar clusters",[48,43505,43506],{},"Previously, we were using shared physical clusters at Tencent Cloud to support different business scenarios. However, this could lead to mutual interference between users. Additionally, different users may have different service SLA requirements. For those who demand higher service quality, we may need to set up a dedicated cluster to physically isolate resources to reduce the impact on other users. In such cases, we need a smooth migration plan.",[48,43508,43509,43510,43513],{},"Figure 11 shows the diagram of our internal implementation for tenant migration across Pulsar clusters. The core module, LookupService, handles clients’ lookup requests. It stores the map of each tenant to the corresponding physical cluster. When a client’s lookup request arrives, we forward it to the associated physical cluster, allowing the client to establish connections with the broker. Note that LookupService also acts as the proxy for getPartitionState, getPartitionMetadata, and getSchema requests. However, it does not proxy data stream requests, which are sent directly to the cluster via CLB or VIP without going through LookupService.\n",[384,43511],{"alt":18,"src":43512},"\u002Fimgs\u002Fblogs\u002F643ca7c60d0f4475895db2c7_image12.webp","Figure 11. Tenant migration",[916,43515,43516],{},[48,43517,43518],{},"Note: LookupService is not designed specifically for cross-cluster migration. Its primary purpose is to provide centralized processing of different network service routes for cloud clusters. During cross-cluster migration, we used LookupService to ensure a smooth cluster switch while utilizing Pulsar’s geo-replication feature to synchronize data.",[48,43520,43521,43522,43525],{},"Now, let’s look at the five steps during migration:\n",[384,43523],{"alt":18,"src":43524},"\u002Fimgs\u002Fblogs\u002F643ca7e3e4b0fa2ceb6756b3_image5.webp","Figure 12. Data migration process",[1666,43527,43528,43531,43534,43537,43540],{},[324,43529,43530],{},"Synchronize metadata: Create the corresponding resources on the target cluster, such as tenants, namespaces, topics, subscriptions, and roles.",[324,43532,43533],{},"Synchronize topic data: Enable geo-replication to migrate the topic data of each tenant.",[324,43535,43536],{},"Synchronize consumption progress: Enable consumption progress synchronization to synchronize each subscription’s individuallyDeletedMessages and markDeleteMessages to the target cluster.",[324,43538,43539],{},"Switch to the new cluster: Modify the tenant-to-physical cluster map in LookupService and trigger topic unload so that clients can renew the server’s IP address. 
LookupService will return the address of the new cluster based on the new map.",[324,43541,43542],{},"Clean up resources: Delete unnecessary resources in the original cluster after the migration is complete.",[40,43544,2125],{"id":2122},[48,43546,43547],{},"There are many ways to migrate your data across clusters. In this article, I shared a method with low implementation costs, less complexity, and high reliability on the public cloud. This approach allows for a smooth migration without modifying Pulsar’s protocol on the client and server sides.",[40,43549,38376],{"id":38375},[48,43551,38379,43552,40419],{},[55,43553,38384],{"href":38382,"rel":43554},[264],[321,43556,43557,43562,43568,43577,43585],{},[324,43558,38390,43559],{},[55,43560,31914],{"href":31912,"rel":43561},[264],[324,43563,43564],{},[55,43565,43567],{"href":43566},"\u002Fuse-cases\u002Fgeo-replication","Geo-replication use cases",[324,43569,43570,758,43572],{},[2628,43571,42753],{},[55,43573,43576],{"href":43574,"rel":43575},"https:\u002F\u002Fpulsar.apache.org\u002Fdocs\u002F2.11.x\u002Fconcepts-replication\u002F",[264],"Geo-replication",[324,43578,43579,758,43581],{},[2628,43580,40436],{},[55,43582,43584],{"href":43583},"\u002Fblog\u002Fclient-optimization-how-tencent-maintains-apache-pulsar-clusters-100-billion-messages-daily","Client Optimization: How Tencent Maintains Apache Pulsar Clusters with over 100 Billion Messages Daily",[324,43586,43587,758,43589],{},[2628,43588,40436],{},[55,43590,43592],{"href":43591},"\u002Fblog\u002F600k-topics-per-cluster-stability-optimization-apache-pulsar-tencent-cloud","600K Topics Per Cluster: Stability Optimization of Apache Pulsar at Tencent Cloud",{"title":18,"searchDepth":19,"depth":19,"links":43594},[43595,43596,43597,43602,43607,43608,43609,43610],{"id":43319,"depth":19,"text":43320},{"id":43330,"depth":19,"text":43331},{"id":43341,"depth":19,"text":43342,"children":43598},[43599,43600,43601],{"id":43358,"depth":279,"text":43359},{"id":43384,"depth":279,"text":43385},{"id":43395,"depth":279,"text":43396},{"id":43444,"depth":19,"text":43445,"children":43603},[43604,43605,43606],{"id":43451,"depth":279,"text":43452},{"id":43464,"depth":279,"text":43465},{"id":43471,"depth":279,"text":43472},{"id":43478,"depth":19,"text":43479},{"id":43502,"depth":19,"text":43503},{"id":2122,"depth":19,"text":2125},{"id":38375,"depth":19,"text":38376},"2023-04-17","Learn how geo-replication works in Apache Pulsar and how Tencent Cloud migrated tenants across clusters using the feature.","\u002Fimgs\u002Fblogs\u002F644009e24b93f00ba2e70cfe_Blog-Migrating-Tenants-across-Clusters-with-Pulsar's-Geo-replication.png",{},"\u002Fblog\u002Fmigrating-tenants-across-clusters-with-pulsars-geo-replication",{"title":43295,"description":43612},"blog\u002Fmigrating-tenants-across-clusters-with-pulsars-geo-replication",[11899,38442],"tDGZ3qlCcIOe7cHplvnj-v-SEkk60ANl79WvDfap8DQ",{"id":43621,"title":43622,"authors":43623,"body":43624,"category":821,"createdAt":290,"date":43761,"description":43762,"extension":8,"featured":294,"image":43763,"isDraft":294,"link":290,"meta":43764,"navigation":7,"order":296,"path":43765,"readingTime":43766,"relatedResources":290,"seo":43767,"stem":43768,"tags":43769,"__hash__":43770},"blogs\u002Fblog\u002Fsharpen-your-apache-pulsar-skills-with-streamnatives-hands-on-self-paced-courses.md","Sharpen Your Apache Pulsar Skills with StreamNative’s Hands-On Self-Paced 
Courses",[32291,40485],{"type":15,"value":43625,"toc":43755},[43626,43629,43632,43635,43642,43645,43649,43656,43660,43663,43670,43673,43682,43685,43689,43692,43696,43699,43714,43717,43721,43724,43728,43735,43744,43748,43751],[8300,43627,43622],{"id":43628},"sharpen-your-apache-pulsar-skills-with-streamnatives-hands-on-self-paced-courses",[48,43630,43631],{},"Interested in learning about Apache Pulsar, the cloud-native, distributed messaging and streaming platform? StreamNative Academy is your one-stop shop for everything you need to know about Pulsar.",[48,43633,43634],{},"StreamNative training for Apache Pulsar comes in two flavors:",[40,43636,43638],{"id":43637},"i-instructor-led-training",[55,43639,43641],{"href":43640},"\u002Ftraining","I. Instructor-led training",[48,43643,43644],{},"StreamNative’s instructor-led classes are now self-paced, complete with coding demonstrations and hands-on challenges in a dedicated training environment. StreamNative’s full-time technical trainer is available to answer questions during office hours and can troubleshoot any issues remotely in your training environment. Two classes are offered:",[3933,43646,43648],{"id":43647},"practical-apache-pulsar-application-development-3-days","Practical Apache Pulsar Application Development (3 days):",[48,43650,43651,43652,190],{},"Aimed at developers new to Apache Pulsar or needing to broaden their understanding of Apache Pulsar clients, this course focuses on many of the key features of the Apache Pulsar Java client for producing and consuming messages to a Pulsar cluster. Coding exercises include publishing messages with synchronous and asynchronous Java code, consuming messages using all four subscription types, using keyed messages with partitioned topics and table views, and much more. You will receive a certificate of completion after meeting course requirements. To demonstrate mastery, developers can consider certification by completing ",[55,43653,43655],{"href":33854,"rel":43654},[264],"Apache Pulsar Developer Certification Level 1: Fundamentals (~12 hours)",[3933,43657,43659],{"id":43658},"foundations-of-apache-pulsar-operations-coming-soon-3-days","Foundations of Apache Pulsar Operations (coming soon, 3 days):",[48,43661,43662],{},"In this course, operators will get hands-on experience installing and configuring a single-node Apache Pulsar cluster in a dedicated StreamNative training environment. Hands-on exercises include installing a Pulsar cluster using helm, monitoring the cluster with Prometheus\u002FGrafana, completing a server upgrade, using pulsar-admin to configure policies, and using pulsar-perf to test the cluster. This course is a great hands-on way to better understand how various components work together to create a resilient Pulsar cluster. You will receive a certificate of completion after meeting course requirements.",[40,43664,43666],{"id":43665},"ii-free-on-demand-courses",[55,43667,43669],{"href":36485,"rel":43668},[264],"II. Free on-demand courses",[48,43671,43672],{},"Not sure if you want to commit to a three-day course? StreamNative offers Pulsar learning content curated for different roles: business leaders, developers, and operators. 
These learning paths were developed based on valuable feedback from learners who shared that different jobs require different learning pathways.",[48,43674,43675,43676,43681],{},"All users should start with ",[55,43677,43680],{"href":43678,"rel":43679},"https:\u002F\u002Fwww.academy.streamnative.io\u002Fcourses\u002Fcourse-v1:streamnative+Beg-001+2022\u002Fabout",[264],"Introduction to Apache Pulsar"," to learn the basics of Apache Pulsar use cases, the Apache Pulsar messaging model, and Apache Pulsar architecture and design.",[48,43683,43684],{},"From here trainees for each role can focus on their specialized content:",[225,43686,43688],{"id":43687},"business-leaders","Business Leaders",[48,43690,43691],{},"Learn more about why companies adopt Apache Pulsar and typical application and data architectures.",[225,43693,43695],{"id":43694},"developers","Developers",[48,43697,43698],{},"‍Continue on your developer learning journey with one of our API essentials courses. We currently have the following languages. These courses will introduce you to the basics of the respective Apache Pulsar client and come with a dedicated training environment.",[321,43700,43701,43703,43706,43709,43711],{},[324,43702,11285],{},[324,43704,43705],{},"C++",[324,43707,43708],{},"C#",[324,43710,11288],{},[324,43712,43713],{},"Go",[48,43715,43716],{},"After completing API essentials, continue on the developer track to learn about specific features of Apache Pulsar or consider taking the full instructor-led training Practical Apache Pulsar Application Development to dive deep into the Apache Pulsar Java client.",[225,43718,43720],{"id":43719},"operators","Operators",[48,43722,43723],{},"‍Continue on your operator learning journey with additional video content on topics such as understanding the BookKeeper storage model, scaling Apache Pulsar, and troubleshooting performance issues. To get hands-on experience installing a single node Pulsar cluster using helm, consider taking the full instructor-led training Foundations of Apache Pulsar Operations.",[40,43725,43727],{"id":43726},"need-help-were-here-for-you","Need help? We’re here for you!",[48,43729,43730,43731,43734],{},"You can contact ",[55,43732,32462],{"href":43733},"mailto:training@streamnative.io"," anytime during your training. Our training staff are committed to your success, and we are happy to help out if you have any questions.",[48,43736,43737,43738,43743],{},"Additionally, you can join ",[55,43739,43742],{"href":43740,"rel":43741},"https:\u002F\u002Fjoin.slack.com\u002Ft\u002Fstreamnativecommunity\u002Fshared_invite\u002Fzt-yyhbi2w7-_siL0C__aZHn24Ivt2DPIg",[264],"StreamNative Community on Slack"," and add yourself to the training channel. Post questions related to the training and one of our expert Pulsar instructors will be happy to answer.",[40,43745,43747],{"id":43746},"we-welcome-your-feedback","We welcome your feedback",[48,43749,43750],{},"Feedback is critical as we continually update and improve our training. There is a survey at the end of each course where you can provide your thoughts on the completed course. 
We deeply appreciate you taking the time to help us make StreamNative Academy the best Pulsar training available!",[48,43752,43753],{},[34077,43754],{"value":34079},{"title":18,"searchDepth":19,"depth":19,"links":43756},[43757,43758,43759,43760],{"id":43637,"depth":19,"text":43641},{"id":43665,"depth":19,"text":43669},{"id":43726,"depth":19,"text":43727},{"id":43746,"depth":19,"text":43747},"2023-04-13","Sharpen your Apache Pulsar skills with StreamNative's Hand-on, Self-Paced Training Courses. Get started with instructor-led or free on-demand training.","\u002Fimgs\u002Fblogs\u002F643896e22543f3724800be2d_SNAcademyBlog.jpg",{},"\u002Fblog\u002Fsharpen-your-apache-pulsar-skills-with-streamnatives-hands-on-self-paced-courses","4 minute read",{"title":43622,"description":43762},"blog\u002Fsharpen-your-apache-pulsar-skills-with-streamnatives-hands-on-self-paced-courses",[7347,821],"TuoOv90eJPvYV7pcM8LWW-UHd3tv8aRRk3miFq02uZY",{"id":43772,"title":43773,"authors":43774,"body":43776,"category":821,"createdAt":290,"date":43942,"description":43943,"extension":8,"featured":294,"image":43944,"isDraft":294,"link":290,"meta":43945,"navigation":7,"order":296,"path":41828,"readingTime":17439,"relatedResources":290,"seo":43946,"stem":43947,"tags":43948,"__hash__":43949},"blogs\u002Fblog\u002Fintroducing-pulsar-admin-go-library-for-go-developers.md","Introducing the Pulsar Admin Library for Go",[24776,41695,43775],"Max Xu",{"type":15,"value":43777,"toc":43934},[43778,43781,43784,43787,43791,43794,43811,43814,43825,43829,43832,43838,43842,43845,43848,43851,43857,43860,43866,43869,43875,43879,43882,43885,43891,43894,43900,43902,43905,43911,43915,43918,43922],[48,43779,43780],{},"Apache Pulsar is a highly scalable and reliable messaging system that is gaining popularity among developers. Pulsar provides a wide range of features and benefits that make it a popular choice for modern data streaming applications. However, managing a Pulsar cluster can be a complex task, which is why StreamNative has created the pulsar-admin-go library.",[48,43782,43783],{},"The pulsar-admin-go is a Go library that provides developers with a unified set of APIs to programmatically manage Pulsar clusters. It allows for easy automation of tasks and seamless integration of Pulsar management into your applications.",[48,43785,43786],{},"In this blog post, we'll take a closer look at the pulsar-admin-go library, its features, benefits, and advanced usage. We'll also provide step-by-step instructions on how to install and use the library.",[40,43788,43790],{"id":43789},"i-overview-of-pulsar-admin-go-library","I. 
Overview of pulsar-admin-go Library",[48,43792,43793],{},"The pulsar-admin-go library offers a range of useful management functionalities working with topics, partitions, and subscriptions.",[321,43795,43796,43799,43802,43805,43808],{},[324,43797,43798],{},"Topics: create, delete, get topic metadata, and list topics.",[324,43800,43801],{},"Partitions: add and remove partitions from a topic.",[324,43803,43804],{},"Subscriptions: create and delete subscriptions, get metadata about existing subscriptions, and list subscriptions.",[324,43806,43807],{},"Topic stats: list producers and consumers for a topic through topic stats.",[324,43809,43810],{},"Clusters: get cluster metadata, list clusters, as well as set and update cluster properties.",[48,43812,43813],{},"The pulsar-admin-go library provides several benefits for developers:",[1666,43815,43816,43819,43822],{},[324,43817,43818],{},"Unified Go API: developers can operate Pulsar resources using a unified Go API. This simplifies Pulsar management tasks by abstracting the underlying Pulsar admin HTTP operations.",[324,43820,43821],{},"Simplified Development: seamlessly integrates with other management tools like terraform-provider-pulsar, pulsar-resources-operator, and pulsarctl with the pulsar-admin-go library.",[324,43823,43824],{},"Improved Dependency Management: easier to control Go module dependencies and software releases.",[40,43826,43828],{"id":43827},"ii-installing-pulsar-admin-go-library","II. Installing pulsar-admin-go Library",[48,43830,43831],{},"To use the pulsar-admin-go library, you need Go version 1.18 or higher and Go Modules enabled. To install the library, run the following command:",[8325,43833,43836],{"className":43834,"code":43835,"language":8330},[8328],"go get github.com\u002Fstreamnative\u002Fpulsar-admin-go\n",[4926,43837,43835],{"__ignoreMap":18},[40,43839,43841],{"id":43840},"iii-basic-usage","III. 
Basic Usage",[48,43843,43844],{},"Here are some basic examples of how to use the pulsar-admin-go library.",[48,43846,43847],{},"To connect to a Pulsar cluster, you can create an Admin client with ServiceURL or Auth Token.",[48,43849,43850],{},"Create an Admin client with ServiceURL.",[8325,43852,43855],{"className":43853,"code":43854,"language":8330},[8328],"import (\n    \"github.com\u002Fstreamnative\u002Fpulsar-admin-go\"\n)\n\nfunc main() {\n    cfg := &pulsaradmin.Config{\n        WebServiceURL: \"http:\u002F\u002Flocalhost:8080\",\n    }\n    admin, err := pulsaradmin.NewClient(cfg)\n    if err != nil {\n        panic(err)\n    }\n}\n}\n",[4926,43856,43854],{"__ignoreMap":18},[48,43858,43859],{},"Create an Admin client with Auth Token.",[8325,43861,43864],{"className":43862,"code":43863,"language":8330},[8328],"import (\n    \"github.com\u002Fstreamnative\u002Fpulsar-admin-go\"\n)\n\nfunc main() {\n    cfg := &pulsaradmin.Config{\n        \u002F\u002F Use JWT Token for authentication and please note to allocate a token with Pulsar admin role\n        \u002F\u002F https:\u002F\u002Fpulsar.apache.org\u002Fdocs\u002F2.11.x\u002Fsecurity-jwt\u002F#configure-jwt-authentication-in-pulsar-clients\n        Token: \"eyJhbGciOiJIUzI1NiJ9.eyJzdWIiOiJKb2UifQ.ipevRNuRP6HflG8cFKnmUPtypruRC4fb1DWtoLL62SY\",\n    }\n    admin, err := pulsaradmin.NewClient(cfg)\n    if err != nil {\n        panic(err)\n    }\n}\n",[4926,43865,43863],{"__ignoreMap":18},[48,43867,43868],{},"Create a tenant, namespace, and topic.",[8325,43870,43873],{"className":43871,"code":43872,"language":8330},[8328],"import (\n    \"github.com\u002Fstreamnative\u002Fpulsar-admin-go\"\n    \"github.com\u002Fstreamnative\u002Fpulsar-admin-go\u002Fpkg\u002Futils\"\n)\n\nfunc main() {\n    cfg := &pulsaradmin.Config{}\n    admin, err := pulsaradmin.NewClient(cfg)\n    if err != nil {\n        panic(err)\n    }\n\n    \u002F\u002F Create a new tenant\n    admin.Tenants().Create(utils.TenantData{\n        Name: \"new-tenant\",\n    })\n\n    \u002F\u002F Create a new namespace\n    admin.Namespaces().CreateNamespace(\"new-tenant\u002Fnew-namespace\")\n\n    \u002F\u002F Create a new namespace with 3 partitions\n    topic, _ := utils.GetTopicName(\"new-tenant\u002Fnew-namespace\u002Fnew-topic\")\n    admin.Topics().Create(*topic, 3)\n}\n",[4926,43874,43872],{"__ignoreMap":18},[40,43876,43878],{"id":43877},"iv-advanced-usage","IV. 
Advanced Usage",[48,43880,43881],{},"Here are some advanced examples of how to use the pulsar-admin-go library.",[48,43883,43884],{},"Configure geo-replication.",[8325,43886,43889],{"className":43887,"code":43888,"language":8330},[8328],"import (\n    \"github.com\u002Fstreamnative\u002Fpulsar-admin-go\"\n    \"github.com\u002Fstreamnative\u002Fpulsar-admin-go\u002Fpkg\u002Futils\"\n)\n\nfunc main() {\n    cfg := &pulsaradmin.Config{}\n    admin, err := pulsaradmin.NewClient(cfg)\n    if err != nil {\n        panic(err)\n    }\n\n    \u002F\u002F Deploy two pulsar clusters, and set the clusters array as your deployed pulsar clusters name\n    \u002F\u002F https:\u002F\u002Fpulsar.apache.org\u002Fdocs\u002F2.11.x\u002Finstall-deploy-upgrade-landing\u002F\n    \u002F\u002F https:\u002F\u002Fpulsar.apache.org\u002Fdocs\u002F2.11.x\u002Fadministration-geo\u002F\n    clusters := []string{\"us-west\", \"us-east\"}\n\n    admin.Tenants().Create(utils.TenantData{\n        Name:            \"geo-tenant\",\n        AllowedClusters: clusters,\n    })\n\n    admin.Namespaces().CreateNamespace(\"geo-tenant\u002Fgeo-ns\")\n    admin.Namespaces().SetNamespaceReplicationClusters(\"geo-tenant\u002Fgeo-ns\", clusters)\n}\n",[4926,43890,43888],{"__ignoreMap":18},[48,43892,43893],{},"Configure permissions for namespace and topic.",[8325,43895,43898],{"className":43896,"code":43897,"language":8330},[8328],"import (\n   \"github.com\u002Fstreamnative\u002Fpulsar-admin-go\"\n   \"github.com\u002Fstreamnative\u002Fpulsar-admin-go\u002Fpkg\u002Futils\"\n)\n\nfunc main() {\n   cfg := &pulsaradmin.Config{}\n\n   admin, err := pulsaradmin.NewClient(cfg)\n   if err != nil {\n      panic(err)\n   }\n\n   \u002F\u002F Configure admin permission for namespace public\u002Fdefault\n   ns, _ := utils.GetNamespaceName(\"public\u002Fdefault\")\n   admin.Namespaces().GrantNamespacePermission(*ns, \"admin\", []utils.AuthAction{\"produce\", \"consume\"})\n\n   \u002F\u002F Configure admin permission for topic public\u002Fdefault\u002Fadmin\n   tp, _ := utils.GetTopicName(\"public\u002Fdefault\u002Fadmin\")\n   admin.Topics().GrantPermission(*tp, \"admin\", []utils.AuthAction{\"produce\", \"consume\"})\n",[4926,43899,43897],{"__ignoreMap":18},[48,43901,3931],{},[48,43903,43904],{},"Configure retention policy for namespace.",[8325,43906,43909],{"className":43907,"code":43908,"language":8330},[8328],"import (\n    \"fmt\"\n\n    \"github.com\u002Fstreamnative\u002Fpulsar-admin-go\"\n    \"github.com\u002Fstreamnative\u002Fpulsar-admin-go\u002Fpkg\u002Futils\"\n)\n\nfunc main() {\n    cfg := &pulsaradmin.Config{}\n    admin, err := pulsaradmin.NewClient(cfg)\n    if err != nil {\n        panic(err)\n    }\n\n    \u002F\u002F Create a new tenant\n    admin.Tenants().Create(utils.TenantData{\n        Name: \"new-tenant\",\n    })\n\n    \u002F\u002F Create a new namespace\n    admin.Namespaces().CreateNamespace(\"new-tenant\u002Fnew-namespace\")\n\n    \u002F\u002F Set the Retention Policy for this namespace\n    admin.Namespaces().SetRetention(\"new-tenant\u002Fnew-namespace\", utils.RetentionPolicies{RetentionSizeInMB: 10240, RetentionTimeInMinutes: 180})\n\n    \u002F\u002F Get the Retention Policy for this namespace\n    fmt.Println(admin.Namespaces().GetRetention(\"new-tenant\u002Fnew-namespace\"))\n}\n",[4926,43910,43908],{"__ignoreMap":18},[40,43912,43914],{"id":43913},"vi-conclusion","VI. Conclusion",[48,43916,43917],{},"The pulsar-admin-go library is a convenient way to manage Apache Pulsar clusters using Go. 
The library provides a set of intuitive interfaces that allow you to perform a wide range of tasks with ease. This library allows you to automate Pulsar management tasks, and integrate them into your applications. By using pulsar-admin-go, managing Pulsar clusters becomes easier and more efficient, allowing you to get the most out of this powerful messaging system.\n‍",[40,43919,43921],{"id":43920},"vii-more-resources","VII. More Resources",[48,43923,43211,43924,5422,43928,5422,43931,43224],{},[55,43925,43215],{"href":43926,"rel":43927},"https:\u002F\u002Fgithub.com\u002Fstreamnative\u002Fpulsar-admin-go",[264],[55,43929,7120],{"href":41823,"rel":43930},[264],[55,43932,43223],{"href":36222,"rel":43933},[264],{"title":18,"searchDepth":19,"depth":19,"links":43935},[43936,43937,43938,43939,43940,43941],{"id":43789,"depth":19,"text":43790},{"id":43827,"depth":19,"text":43828},{"id":43840,"depth":19,"text":43841},{"id":43877,"depth":19,"text":43878},{"id":43913,"depth":19,"text":43914},{"id":43920,"depth":19,"text":43921},"2023-04-11","Created by StreamNative, the Pulsar-admin-go is a Go library that provides developers with a unified set of APIs to programmatically manage Pulsar clusters.","\u002Fimgs\u002Fblogs\u002F6435d00f3e8670998834c3bb_PulsarAdminGo.jpg",{},{"title":43773,"description":43943},"blog\u002Fintroducing-pulsar-admin-go-library-for-go-developers",[38442,821],"cAQXEk6HTHSMMXjNKBhGfHP7UuAbQVx42rlmVGN8VCY",{"id":43951,"title":43952,"authors":43953,"body":43954,"category":7338,"createdAt":290,"date":44065,"description":44066,"extension":8,"featured":294,"image":44067,"isDraft":294,"link":290,"meta":44068,"navigation":7,"order":296,"path":42722,"readingTime":11508,"relatedResources":290,"seo":44069,"stem":44070,"tags":44071,"__hash__":44072},"blogs\u002Fblog\u002Fspeakers-and-agenda-announced-for-pulsar-virtual-summit-europe-2023.md","Speakers and Agenda Announced for Pulsar Virtual Summit Europe 2023!",[41185,40485],{"type":15,"value":43955,"toc":44062},[43956,43960,43963,43966,43970,43973,43977,43980,43983,43986,43990,43993,43996,44000,44003,44006,44009,44012,44014,44018,44021,44024,44026,44030,44033,44036,44038,44042,44045,44048,44050,44058],[40,43957,43959],{"id":43958},"were-excited-to-invite-you-to-pulsar-virtual-summit-europe-2023","We’re excited to invite you to Pulsar Virtual Summit Europe 2023!",[48,43961,43962],{},"Join the Apache Pulsar community online on Tuesday, May 23rd for this exciting one-day event. Don’t miss 5 keynotes and 14 breakout sessions, and the opportunity to network with fellow attendees in this free online event. Not in the timezone? No problem! Register today to ensure you get the recorded sessions a week before general public release.",[48,43964,43965],{},"Global Pulsar Summits gather developers, architects, and data engineers to discuss the latest in real-time data streaming and message queuing. Past Pulsar Summits have featured more than 200 interactive sessions presented by tech leaders from WeChat, Blizzard, Intuit, Micro Focus, Salesforce, Splunk, Verizon Media, Tencent, Uber, and more. The Summits garnered 2,200+ global attendees representing top technology, fintech, and media companies, such as Google, Amazon, eBay, Microsoft, American Express, the LEGO Group, Athena Health, Paypal, just to name a few.",[48,43967,43968],{},[34077,43969],{"value":34079},[48,43971,43972],{},"This year, Pulsar Virtual Summit Europe will include tech deep dives, use cases, best practices, and insights into Pulsar’s global adoption and thriving community. 
Take a sneak peek below at a few of the featured sessions:\n‍",[3933,43974,43976],{"id":43975},"_1-challenges-of-hosting-a-pulsar-as-a-service-platform-under-a-shared-responsibility-model","1. Challenges of Hosting a Pulsar-as-a-Service Platform Under a Shared Responsibility Model",[48,43978,43979],{},"Edgaras Petovradzius, Senior Engineer, the LEGO Group",[48,43981,43982],{},"Mathias Ravn Tversted, Engineer, the LEGO Group",[48,43984,43985],{},"This talk will explore the challenges the LEGO Group encountered to host and manage Pulsar-as-a-Service across multiple domains, and how they collaborated with StreamNative in the process. It will highlight how the organization used OAuth2 for authentication and its self-service management platform for access control of Pulsar resources. The speakers will also discuss their observability tooling, incorporating cloud-native tools like Prometheus and OTel collectors, as well as internal tools for metrics and an ELK stack. Finally, the presentation will share best practices for using Pulsar clients in various programming languages and explore alternative methods of producing and consuming messages.",[3933,43987,43989],{"id":43988},"_2-pulsar-observability-in-high-topic-cardinality-deployments-for-telco","2. Pulsar Observability in High-Topic Cardinality Deployments for Telco",[48,43991,43992],{},"Habip Kenan Üsküdar, DevGitOps Engineer, Axon Networks",[48,43994,43995],{},"Don't miss speaker Habip Kenan Üsküda from Axon Networks, as he shares the experience of building an observability stack using Grafana and Prometheus for their cloud-native platform based on Apache Pulsar. As the number of topic pairings scaled beyond 50,000, with the goal of exceeding 1 million, the team encountered major challenges in scaling their monitoring stack. The presentation will cover their innovative approach to partitioning labels and tenancy management, as well as their efforts to extend Grafana Agent Operators’ Helm Charts to overcome these bottlenecks.",[3933,43997,43999],{"id":43998},"_3-building-a-full-lifecycle-streaming-data-pipeline","3. Building a Full Lifecycle Streaming Data Pipeline",[48,44001,44002],{},"Timothy Spann, Principal Developer Advocate for Data in Motion, Cloudera",[48,44004,44005],{},"David Kjerrumgaard, Systems Engineer, Developer Advocate and Author of \"Pulsar in Action\", StreamNative",[48,44007,44008],{},"Julien Jakubowski, Developer Advocate, StreamNative\n‍",[48,44010,44011],{},"Join Tim, David, and Julien as they delve into the process of building a full lifecycle streaming data pipeline using Apache Pulsar, Spring, Java, Apache Pinot, Trino, and Apache Iceberg. This session will provide an overview of each tool’s key features and capabilities, demonstrating their integration for a robust and efficient real-time streaming data pipeline. The talk will also cover best practices for using these tools together, as well as case studies and real-world examples of successful pipeline implementations.",[48,44013,3931],{},[3933,44015,44017],{"id":44016},"_4-the-future-of-metrics-in-pulsar","4. The Future of Metrics in Pulsar",[48,44019,44020],{},"‍Asaf Mesika, Principal Engineer, StreamNative",[48,44022,44023],{},"In this session, Asaf Mesika will discuss the challenges of using observability metrics in Pulsar from both user and committer perspectives. He will highlight issues such as high topic count limitations, improper histogram use in Grafana, and implementation difficulties. 
The talk will also present a proposal to address these problems by adopting the OpenTelemetry Java SDK, offering insights for leveraging metrics in Pulsar.",[48,44025,3931],{},[3933,44027,44029],{"id":44028},"_5-pulsar-in-finance-a-tale-of-a-migration","5. Pulsar in Finance - A Tale of a Migration",[48,44031,44032],{},"George Orban, Tech Lead, Senior Architect, Quant Developer, Daiwa Capital Markets",[48,44034,44035],{},"In this talk, George Orban will share the experience of migrating a pricing engine and trading system from TIBCO Rendezvous and other messaging solutions to Apache Pulsar. He will discuss the reasons for choosing Pulsar, its suitability for enterprise applications and finance, and how it improved their stack’s resilience, robustness, and speed. The presentation will also cover new functionalities gained through Pulsar’s richer semantics, lessons learned, tools developed and open-sourced during the migration, and the future of Pulsar at Daiwa.‍",[48,44037,3931],{},[3933,44039,44041],{"id":44040},"_6-oxia-scaling-pulsars-metadata-to-100x","6. Oxia: Scaling Pulsar’s Metadata to 100x‍",[48,44043,44044],{},"Matteo Merli, Apache Pulsar PMC Chair, CTO, StreamNative",[48,44046,44047],{},"Join Apache Pulsar PMC Chair and StreamNative CTO, Matteo Merli, for the much-anticipated introduction of Oxia, a metadata store and coordination system designed to overcome the limitations of ZooKeeper in scaling Pulsar clusters. Learn first-hand the design goals, architecture, and development journey of Oxia. Matteo will also explain how Oxia’s design leverages modern cloud-native environments to provide a highly flexible and dynamic operational environment. Watch this keynote to learn more about scaling Pulsar’s metadata with Oxia and how it can improve the performance and scalability of Pulsar clusters.",[48,44049,3931],{},[48,44051,44052,44053],{},"Check out the full agenda and details at ",[55,44054,44057],{"href":44055,"rel":44056},"http:\u002F\u002Fpulsar-summit.org",[264],"pulsar-summit.org.",[48,44059,44060],{},[34077,44061],{"value":34079},{"title":18,"searchDepth":19,"depth":19,"links":44063},[44064],{"id":43958,"depth":19,"text":43959},"2023-04-06","Pulsar Virtual Summit Europe Speakers, Agenda, and Featured Sessions Announcement","\u002Fimgs\u002Fblogs\u002F642e04a1f23853f272098988_OpenGraph.png",{},{"title":43952,"description":44066},"blog\u002Fspeakers-and-agenda-announced-for-pulsar-virtual-summit-europe-2023",[5376,821],"NWIkJn2maGl62wge4sIuIDPdCPunVp-vgG6wywqaV90",{"id":44074,"title":44075,"authors":44076,"body":44078,"category":3550,"createdAt":290,"date":44433,"description":44434,"extension":8,"featured":294,"image":44435,"isDraft":294,"link":290,"meta":44436,"navigation":7,"order":296,"path":44437,"readingTime":11508,"relatedResources":290,"seo":44438,"stem":44439,"tags":44440,"__hash__":44441},"blogs\u002Fblog\u002Finstall-streamnative-platform-on-minikube.md","Install StreamNative Platform on minikube",[44077],"Vikas 
Dadhich",{"type":15,"value":44079,"toc":44422},[44080,44088,44097,44101,44104,44109,44115,44126,44130,44133,44139,44143,44146,44154,44160,44165,44170,44176,44181,44187,44192,44198,44203,44209,44217,44221,44224,44232,44236,44241,44247,44252,44258,44263,44269,44273,44278,44284,44289,44295,44300,44305,44310,44314,44319,44325,44330,44336,44341,44346,44351,44357,44362,44368,44374,44377,44379,44384,44390,44395,44397,44400],[48,44081,44082,44087],{},[55,44083,44086],{"href":44084,"rel":44085},"https:\u002F\u002Fdocs.streamnative.io\u002Fdocs\u002Fplatform-overview",[264],"StreamNative Platform"," is a cloud-native messaging and event-streaming platform that enables you to build a real-time application and data infrastructure for both real-time and historical events. In this article, I will demonstrate how to install StreamNative Platform on minikube on a local machine to explore it with the lightweight Kubernetes distribution, and deploy Pulsar on it. I will perform all the steps on a Mac computer with Apple silicon.",[48,44089,44090,44091,44096],{},"This setup mainly required DFD (Docker for Desktop). You can download ",[55,44092,44095],{"href":44093,"rel":44094},"https:\u002F\u002Fdocs.docker.com\u002Fdesktop\u002Finstall\u002Fmac-install\u002F",[264],"the dmg file"," for Mac.",[40,44098,44100],{"id":44099},"install-minikube","Install minikube",[48,44102,44103],{},"minikube is a lightweight Kubernetes distribution that allows you to quickly create a local Kubernetes cluster. For educational purposes, we will use minikube in this blog to set up our environment with Kubernetes 1.23 as an example.",[1666,44105,44106],{},[324,44107,44108],{},"Download and install minikube using the following commands.",[8325,44110,44113],{"className":44111,"code":44112,"language":8330},[8328],"curl -LO https:\u002F\u002Fstorage.googleapis.com\u002Fminikube\u002Freleases\u002Flatest\u002Fminikube-darwin-arm64\nsudo install minikube-darwin-arm64 \u002Fusr\u002Flocal\u002Fbin\u002Fminikube\n",[4926,44114,44112],{"__ignoreMap":18},[1666,44116,44117],{"start":19},[324,44118,44119,44120,44125],{},"Start ",[55,44121,44124],{"href":44122,"rel":44123},"https:\u002F\u002Fwww.docker.com\u002Fproducts\u002Fdocker-desktop\u002F",[264],"Docker Desktop",". You can test minikube by running minikube start --kubernetes-version=v1.23.0. Once it is started, you can stop it.",[40,44127,44129],{"id":44128},"install-helm","Install Helm",[48,44131,44132],{},"Use the following command to install Helm 3 with brew. 
We will use Helm to install StreamNative Platform later.",[8325,44134,44137],{"className":44135,"code":44136,"language":8330},[8328],"brew install helm\n",[4926,44138,44136],{"__ignoreMap":18},[40,44140,44142],{"id":44141},"install-streamnative-platform","Install StreamNative Platform",[48,44144,44145],{},"Once all the above dependencies are installed and configured, follow the steps below to install StreamNative Platform on minikube.",[1666,44147,44148,44151],{},[324,44149,44150],{},"Start Docker Desktop.",[324,44152,44153],{},"Start minikube with Kubernetes version 1.23.",[8325,44155,44158],{"className":44156,"code":44157,"language":8330},[8328],"minikube start -p minikube --kubernetes-version=v1.23.0\n",[4926,44159,44157],{"__ignoreMap":18},[916,44161,44162],{},[48,44163,44164],{},"Pulsar Operators will be installed later, which support Kubernetes versions between v1.16 (inclusive) and v1.26 (exclusive).",[1666,44166,44167],{"start":279},[324,44168,44169],{},"Set the default context to minikube.",[8325,44171,44174],{"className":44172,"code":44173,"language":8330},[8328],"kubectl config use-context minikube\n",[4926,44175,44173],{"__ignoreMap":18},[1666,44177,44178],{"start":20920},[324,44179,44180],{},"Create a Kubernetes namespace called pulsar. As we are going to install everything under the same namespace, we can set the default context to the pulsar namespace. This allows us to perform operations in the pulsar namespace by default without using the -n option every time.",[8325,44182,44185],{"className":44183,"code":44184,"language":8330},[8328],"kubectl create ns pulsar\nkubectl config set-context --current --namespace=pulsar\n",[4926,44186,44184],{"__ignoreMap":18},[1666,44188,44189],{"start":20934},[324,44190,44191],{},"Add the streamnative repository using helm.",[8325,44193,44196],{"className":44194,"code":44195,"language":8330},[8328],"helm repo add streamnative https:\u002F\u002Fcharts.streamnative.io\nhelm repo update\n",[4926,44197,44195],{"__ignoreMap":18},[1666,44199,44200],{"start":20948},[324,44201,44202],{},"Install Pulsar Operators.",[8325,44204,44207],{"className":44205,"code":44206,"language":8330},[8328],"helm upgrade --install pulsar-operator streamnative\u002Fpulsar-operator\n",[4926,44208,44206],{"__ignoreMap":18},[48,44210,44211,44212,190],{},"To fully leverage the power of StreamNative Platform, you can choose to install the Vault operator, cert-manager, and Function Mesh operator. For more information, see ",[55,44213,44216],{"href":44214,"rel":44215},"https:\u002F\u002Fdocs.streamnative.io\u002Fdocs\u002Fplatform-quickstart",[264],"the StreamNative Platform documentation",[40,44218,44220],{"id":44219},"install-apache-pulsar","Install Apache Pulsar",[48,44222,44223],{},"Now that we have installed Pulsar Operators, we can deployed the custom resources (CRs) of ZooKeeper, BookKeeper, and brokers on minikube.",[48,44225,44226,44227,44231],{},"To deploy these CRs, download the YAML files from this ",[55,44228,39680],{"href":44229,"rel":44230},"https:\u002F\u002Fgithub.com\u002Fyuweisung\u002Fpulsar-ops\u002Ftree\u002Fmain\u002Fdefault",[264]," repository and apply them. Alternatively, follow the examples below to install and create Pods for each component. 
Install ZooKeeper first, then BookKeeper, and finally the Pulsar broker.",[32,44233,44235],{"id":44234},"create-the-zookeeper-custom-resource","Create the ZooKeeper custom resource",[1666,44237,44238],{},[324,44239,44240],{},"Create a zookeeper.yaml manifest file.",[8325,44242,44245],{"className":44243,"code":44244,"language":8330},[8328],"apiVersion: zookeeper.streamnative.io\u002Fv1alpha1\nkind: ZooKeeperCluster\nmetadata:\n  name: my\n  namespace: pulsar\nspec:\n  image: streamnative\u002Fpulsar:2.11.0.1\n  replicas: 1\n  pod:\n    resources:\n      requests:\n        cpu: \"200m\"\n        memory: \"512Mi\"\n  persistence:\n    reclaimPolicy: Retain\n    data:\n      accessModes:\n      - ReadWriteOnce\n      resources:\n        requests:\n          storage: \"10Gi\"\n    dataLog:\n      accessModes:\n      - ReadWriteOnce\n      resources:\n        requests:\n          storage: \"20Gi\"\n",[4926,44246,44244],{"__ignoreMap":18},[1666,44248,44249],{"start":19},[324,44250,44251],{},"Run the following command to create the ZooKeeper Pod.",[8325,44253,44256],{"className":44254,"code":44255,"language":8330},[8328],"kubectl apply -f zookeeper.yaml\n",[4926,44257,44255],{"__ignoreMap":18},[1666,44259,44260],{"start":279},[324,44261,44262],{},"Verify if the ZooKeeper Pod has been created. Once it is up and running, you can install BookKeeper.",[8325,44264,44267],{"className":44265,"code":44266,"language":8330},[8328],"kubectl get pods\n",[4926,44268,44266],{"__ignoreMap":18},[32,44270,44272],{"id":44271},"install-the-bookkeeper-custom-resource","Install the BookKeeper custom resource",[1666,44274,44275],{},[324,44276,44277],{},"Create a bookkeeper.yaml manifest file.",[8325,44279,44282],{"className":44280,"code":44281,"language":8330},[8328],"apiVersion: bookkeeper.streamnative.io\u002Fv1alpha1\nkind: BookKeeperCluster\nmetadata:\n  name: my\n  namespace: pulsar\nspec:\n  image: streamnative\u002Fpulsar:2.11.0.1\n  replicas: 1\n  pod:\n    resources:\n      requests:\n        cpu: \"200m\"\n        memory: \"512Mi\"\n  storage:\n    reclaimPolicy: Retain\n    journal:\n      numDirsPerVolume: 1\n      numVolumes: 1\n      volumeClaimTemplate:\n        accessModes:\n        - ReadWriteOnce\n        resources:\n          requests:\n            storage: \"8Gi\"\n    ledger:\n      numDirsPerVolume: 1\n      numVolumes: 1\n      volumeClaimTemplate:\n        accessModes:\n        - ReadWriteOnce\n        resources:\n          requests:\n            storage: \"16Gi\"\n  zkServers: my-zk-headless:2181\n",[4926,44283,44281],{"__ignoreMap":18},[1666,44285,44286],{"start":19},[324,44287,44288],{},"Run the following command to create the BookKeeper Pod.",[8325,44290,44293],{"className":44291,"code":44292,"language":8330},[8328],"kubectl apply -f bookkeeper.yaml\n",[4926,44294,44292],{"__ignoreMap":18},[1666,44296,44297],{"start":279},[324,44298,44299],{},"Verify if the BookKeeper Pod has been created. 
Once it is up and running, you can install the broker.",[8325,44301,44303],{"className":44302,"code":44266,"language":8330},[8328],[4926,44304,44266],{"__ignoreMap":18},[916,44306,44307],{},[48,44308,44309],{},"If you are running three bookie Pods, you must set anti-affinity to false.",[32,44311,44313],{"id":44312},"install-the-pulsar-broker-custom-resource","Install the Pulsar broker custom resource",[1666,44315,44316],{},[324,44317,44318],{},"Create a broker.yaml manifest file.",[8325,44320,44323],{"className":44321,"code":44322,"language":8330},[8328],"apiVersion: pulsar.streamnative.io\u002Fv1alpha1\nkind: PulsarBroker\nmetadata:\n  name: my\n  namespace: pulsar\nspec:\n  image: streamnative\u002Fpulsar:2.11.0.1\n  pod:\n    resources:\n      requests:\n        cpu: 200m\n        memory: 512Mi\n    terminationGracePeriodSeconds: 30\n  config:\n    custom:\n  replicas: 1\n  zkServers: my-zk-headless:2181\n",[4926,44324,44322],{"__ignoreMap":18},[1666,44326,44327],{"start":19},[324,44328,44329],{},"Run the following command to create the broker Pod.",[8325,44331,44334],{"className":44332,"code":44333,"language":8330},[8328],"kubectl apply -f broker.yaml\n",[4926,44335,44333],{"__ignoreMap":18},[1666,44337,44338],{"start":279},[324,44339,44340],{},"Verify if the broker Pod has been created.",[8325,44342,44344],{"className":44343,"code":44266,"language":8330},[8328],[4926,44345,44266],{"__ignoreMap":18},[1666,44347,44348],{"start":20920},[324,44349,44350],{},"Expected output:",[8325,44352,44355],{"className":44353,"code":44354,"language":8330},[8328],"NAME                                                             READY   STATUS             RESTARTS      AGE\nmy-bk-0                                                          1\u002F1     Running            0             2m30s\nmy-bk-auto-recovery-0                                            1\u002F1     Running            0             70s\nmy-broker-0                                                      1\u002F1     Running            0             30s\nmy-zk-0                                                          1\u002F1     Running            0             3m23s\npulsar-operator-bookkeeper-controller-manager-76948555c6-xcw4r   1\u002F1     Running            0             5m9s\npulsar-operator-pulsar-controller-manager-694c8974-kcgfq         1\u002F1     Running            0             5m9s\npulsar-operator-zookeeper-controller-manager-69c6d-2qjml         1\u002F1     Running            0             5m9s\n",[4926,44356,44354],{"__ignoreMap":18},[1666,44358,44359],{"start":20934},[324,44360,44361],{},"To access the cluster, you can exec into the broker Pod and run the producer or consumer client.",[8325,44363,44366],{"className":44364,"code":44365,"language":8330},[8328],"kubectl exec my-broker-0 -it bash\n",[4926,44367,44365],{"__ignoreMap":18},[8325,44369,44372],{"className":44370,"code":44371,"language":8330},[8328],"I have no name!@my-broker-0:\u002Fpulsar\u002Fbin$ .\u002Fpulsar-admin --admin-url http:\u002F\u002Flocalhost:8080 tenants list\nSLF4J: Class path contains multiple SLF4J bindings.\nSLF4J: Found binding in [jar:file:\u002Fpulsar\u002Flib\u002Forg.slf4j-slf4j-log4j12-1.7.25.jar!\u002Forg\u002Fslf4j\u002Fimpl\u002FStaticLoggerBinder.class]\nSLF4J: Found binding in [jar:file:\u002Fpulsar\u002Flib\u002Forg.apache.logging.log4j-log4j-slf4j-impl-2.18.0.jar!\u002Forg\u002Fslf4j\u002Fimpl\u002FStaticLoggerBinder.class]\nSLF4J: See http:\u002F\u002Fwww.slf4j.org\u002Fcodes.html#multiple_bindings for an explanation.\nSLF4J: 
Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]\nlog4j:WARN No appenders could be found for logger (io.netty.util.internal.logging.InternalLoggerFactory).\nlog4j:WARN Please initialize the log4j system properly.\nlog4j:WARN See http:\u002F\u002Flogging.apache.org\u002Flog4j\u002F1.2\u002Ffaq.html#noconfig for more info.\npublic\npulsar\nsn\n",[4926,44373,44371],{"__ignoreMap":18},[48,44375,44376],{},"For the MacBook Pro M1 chip, make sure IPV6 is disabled. Always use the latest version of Docker Desktop and ensure the following settings are enabled before creating CRs.",[48,44378,3931],{},[48,44380,44381,33315],{},[384,44382],{"alt":18,"src":44383},"\u002Fimgs\u002Fblogs\u002F642a3cb91115fd6197d6b299_docker-system-settings.webp",[48,44385,44386,44389],{},[384,44387],{"alt":18,"src":44388},"\u002Fimgs\u002Fblogs\u002F642a3cd4f0fb297f26e38af5_docker-rosetta.webp","\nNow, you can use pulsar-admin to manage clusters, tenants, namespaces, topics, and more.",[48,44391,44392],{},[36,44393,44394],{},"☁️ Happy Learning ☁️",[40,44396,40413],{"id":36476},[48,44398,44399],{},"Powered by Apache Pulsar, StreamNative Platform makes it easy to build mission-critical messaging and streaming applications and real-time data pipelines by integrating data from multiple sources into a single, central messaging and event streaming platform for your company. See the following resources for more details:",[321,44401,44402,44408,44415],{},[324,44403,44404],{},[55,44405,44407],{"href":44084,"rel":44406},[264],"What is StreamNative Platform",[324,44409,44410],{},[55,44411,44414],{"href":44412,"rel":44413},"https:\u002F\u002Fdocs.streamnative.io\u002Fdocs\u002Fpub-sub-concepts",[264],"Key concepts in StreamNative Platform",[324,44416,44417],{},[55,44418,44421],{"href":44419,"rel":44420},"https:\u002F\u002Fdocs.streamnative.io\u002Fdocs\u002Fdeploy-snp-aws",[264],"Deploy StreamNative Platform on AWS",{"title":18,"searchDepth":19,"depth":19,"links":44423},[44424,44425,44426,44427,44432],{"id":44099,"depth":19,"text":44100},{"id":44128,"depth":19,"text":44129},{"id":44141,"depth":19,"text":44142},{"id":44219,"depth":19,"text":44220,"children":44428},[44429,44430,44431],{"id":44234,"depth":279,"text":44235},{"id":44271,"depth":279,"text":44272},{"id":44312,"depth":279,"text":44313},{"id":36476,"depth":19,"text":40413},"2023-04-03","Learn how to deploy StreamNative Platform on minikube for tests and development.","\u002Fimgs\u002Fblogs\u002F642a3791be4c9ab95ec754d4_install-streamnative-platform-on-minikube.png",{},"\u002Fblog\u002Finstall-streamnative-platform-on-minikube",{"title":44075,"description":44434},"blog\u002Finstall-streamnative-platform-on-minikube",[3550,821,16985,303],"5YHy_G2Jn0I-MAYceXq-_WSG3hr_bNhZbcdfT9_IEAM",{"id":44443,"title":44444,"authors":44445,"body":44447,"category":3550,"createdAt":290,"date":44532,"description":44533,"extension":8,"featured":294,"image":44534,"isDraft":294,"link":290,"meta":44535,"navigation":7,"order":296,"path":41710,"readingTime":7986,"relatedResources":290,"seo":44536,"stem":44537,"tags":44538,"__hash__":44539},"blogs\u002Fblog\u002Fstreamnative-clouds-pulsar-as-a-service-now-available-on-google-cloud-marketplace.md","StreamNative Cloud's Pulsar-as-a-Service now available on Google Cloud Marketplace",[41695,44446],"Benjamin Nelson",{"type":15,"value":44448,"toc":44527},[44449,44452,44456,44459,44470,44474,44477,44480,44482,44485,44487,44515,44523,44525],[48,44450,44451],{},"StreamNative Cloud’s Pulsar-as-a-Service solution is now available on Google Cloud 
Marketplace, delivering a turn-key solution with enterprise-grade security and SLA. Built on top of Apache Pulsar, StreamNative Cloud provides a high-performance, low-latency platform capable of processing, analyzing, and acting on billions of events per second.",[32,44453,44455],{"id":44454},"streamnative-clouds-pulsar-as-a-service","StreamNative Cloud’s Pulsar-as-a-Service",[48,44457,44458],{},"StreamNative Cloud provides a versatile set of flexible tools for engineering teams of all sizes to run and manage Pulsar at scale. With protocol handlers supporting Kafka, RabbitMQ, and MQTT, StreamNative Cloud is an excellent choice for those looking to streamline their event-driven applications while having access to the full capabilities of Apache Pulsar, including Pulsar Functions, Pulsar I\u002FO, and Tiered Storage.",[321,44460,44461,44464,44467],{},[324,44462,44463],{},"Scalability without Downtime - StreamNative manages Pulsar clusters without requiring manual partition rebalancing or long maintenance periods.",[324,44465,44466],{},"Built-in Enterprise-Grade Security - Manage Pulsar clusters with governance and full audit control while maintaining global compliance standards.",[324,44468,44469],{},"Backed by the original creators of Apache Pulsar - Unparalleled expertise with enterprise-grade support from StreamNative.",[32,44471,44473],{"id":44472},"streamnative-cloud-on-google-cloud-marketplace","StreamNative Cloud on Google Cloud Marketplace",[48,44475,44476],{},"StreamNative Cloud's Pulsar-as-a-Service solution on Google Cloud Marketplace provides a simplified procurement process, with streamlined purchasing and deployment and unified billing included in the Google Cloud invoice. Additionally, it is fully integrated with other Google Cloud  services, such as BigQuery, Dataflow, and Pub\u002FSub, providing customers with a complete end-to-end data processing solution.",[48,44478,44479],{},"“Streamlined enterprise data processing is essential to businesses today,” Dai Vu, Managing Director, Marketplace & ISV GTM Programs, Google Cloud. 
“With StreamNative Cloud’s Pulsar-as-a-Service now available on Google Cloud Marketplace, organizations can strengthen their data processing and analysis capabilities, ultimately driving stronger insights.\"",[32,44481,22668],{"id":2146},[48,44483,44484],{},"Get started with StreamNative Cloud on Google Cloud Marketplace with three easy steps:",[48,44486,3931],{},[1666,44488,44489,44499,44505],{},[324,44490,44491,44492,44496],{},"Subscribe via your ",[55,44493,44495],{"href":24192,"rel":44494},[264],"Google Cloud account.",[384,44497],{"alt":18,"src":44498},"\u002Fimgs\u002Fblogs\u002F641cc8622f66a6e916e5e87e_Google-Edits-StreamNative-Cloud%E2%80%99s-Pulsar-as-a-Service-now-available-on-GCP-Marketplace.webp",[324,44500,44501,44502],{},"Select Manage on Provider to sign-up on the StreamNative Cloud Console.\n",[384,44503],{"alt":18,"src":44504},"https:\u002F\u002Fuploads-ssl.webflow.com\u002F639226d67b0d723af8e7ca56\u002F641cd17902b2d6131aea9c54_Google%20Edits%20-%20StreamNative%20Cloud%E2%80%99s%20Pulsar-as-a-Service%20now%20available%20on%20GCP%20Marketplace%20(1).webp",[324,44506,44507,44508,44511],{},"Follow the sign-up flow to create your organization and your first Pulsar Cluster.\n",[384,44509],{"alt":18,"src":44510},"\u002Fimgs\u002Fblogs\u002F641cbd1648e13b99294cfd7a_zEmwhWY1JtCKsS_0KJOJy9LpIXh0BpiG0rKtwRjN1pur-4p1hKEvOWCgk_TSwU3Wce_ZfYjrnG65Y15u6r4WNCEMugGhxra8t-sUuHQhPiVYEV441VkfwPbdCEbk2EnJbygOYOFABK8l7pf6xO8gRHM.png",[55,44512,44513],{"href":38403},[34077,44514],{"value":34079},[48,44516,44517,44518],{},"Already have a PoC and you’re ready to move to a fully-managed or fully-hosted cluster with 24\u002F7 support and enterprise SLAs? ",[55,44519,44522],{"href":44520,"rel":44521},"https:\u002F\u002Fconsole.streamnative.cloud\u002F?defaultMethod=login",[264],"Sign up here.",[48,44524,3931],{},[48,44526,3931],{},{"title":18,"searchDepth":19,"depth":19,"links":44528},[44529,44530,44531],{"id":44454,"depth":279,"text":44455},{"id":44472,"depth":279,"text":44473},{"id":2146,"depth":279,"text":22668},"2023-03-28","StreamNative Cloud's Pulsar-as-a-Service solution on Google Cloud Marketplace provides a simplified procurement process, with streamlined purchasing and deployment and unified billing included in the Google Cloud invoice. 
Additionally, it is fully integrated with other Google Cloud services, such as BigQuery, Dataflow, and Pub\u002FSub, providing customers with a complete end-to-end data processing solution.","\u002Fimgs\u002Fblogs\u002F641cbcceb504dd1d0a53ed92_GCP-Marketplace.png",{},{"title":44444,"description":44533},"blog\u002Fstreamnative-clouds-pulsar-as-a-service-now-available-on-google-cloud-marketplace",[3550,821,4301],"Da9JwIvBggDH8b_vPR8q8FMrbyBLnVp2tHiP9tv1PQI",{"id":44541,"title":44542,"authors":44543,"body":44545,"category":290,"createdAt":290,"date":44829,"description":44830,"extension":8,"featured":294,"image":44831,"isDraft":294,"link":290,"meta":44832,"navigation":7,"order":296,"path":44833,"readingTime":31039,"relatedResources":290,"seo":44834,"stem":44835,"tags":44836,"__hash__":44837},"blogs\u002Fblog\u002Fopentelemetry-metrics-primer-for-java-developers.md","OpenTelemetry Metrics Primer for Java Developers",[44544],"Asaf Mesika",{"type":15,"value":44546,"toc":44816},[44547,44563,44566,44570,44573,44576,44590,44593,44596,44604,44607,44611,44614,44618,44621,44624,44630,44633,44636,44640,44643,44674,44677,44683,44687,44690,44693,44697,44700,44706,44715,44718,44721,44725,44728,44731,44739,44742,44745,44759,44762,44765,44771,44774,44777,44780,44786,44789,44795,44798,44802,44805,44808,44810,44813],[48,44548,44549,44550,44555,44556,44559,44560,44562],{},"I spent the last months learning about ",[55,44551,44554],{"href":44552,"rel":44553},"https:\u002F\u002Fopentelemetry.io\u002F",[264],"OpenTelemetry"," and its Java SDK while researching how to integrate it into ",[55,44557,821],{"href":23526,"rel":44558},[264]," at my work at ",[55,44561,4496],{"href":10259},", which provides flexible Pulsar-as-a-Service that can run in the cloud. If you don’t know Pulsar, you should — it’s a game-changer technology.",[48,44564,44565],{},"OpenTelemetry is a project that is gaining traction these days. Understanding what it is, its features, and how it works requires quite a substantial amount of time (days), even if you try getting some help from Google by using the articles or videos that appear there. In this blog post, I’ll try to summarize the key information to save you a lot of time.",[40,44567,44569],{"id":44568},"super-short-intro-to-opentelemetry","Super short intro to OpenTelemetry",[48,44571,44572],{},"Before diving into the Metrics part of OpenTelemetry, we need a basic understanding of the project.",[48,44574,44575],{},"OpenTelemetry’s goal is to provide a complete solution for telemetry applications. Telemetry means Metrics, Traces, and Logs. Complete means:",[321,44577,44578,44581,44584,44587],{},[324,44579,44580],{},"Defining an API, meaning a library containing interfaces for you to use to define metrics, report their values, define loggers, report logs, and define traces and report spans for them.",[324,44582,44583],{},"Creating implementation for those APIs — called the SDK — which also contains additional functionality for manipulating the telemetry and exporting it in various formats.",[324,44585,44586],{},"Creating an efficient protocol for relaying this telemetry data. The protocol here mainly means schema for the data (i.e., Protobuf schema), its encoding (Protobuf), and the protocol to use to carry it on the wire (gRPC or HTTP).",[324,44588,44589],{},"A Telemetry Collector, a lightweight process written in Go, which allows you to configure multiple ways to receive the data (protocols, push\u002Fpull), transform it, and then send it to various destinations. 
The latter includes some open-source formats and databases and some proprietary vendors. You can extend it easily by writing a plugin to either of the 3: source, transform, or sink. Most chances, you won’t need to since there are so many community contributions already. You can bundle all the plugins you need yourself or just use a binary distribution (Docker image primarily) by a specific vendor containing their specific plugins.",[48,44591,44592],{},"The novelty of OpenTelemetry (a.k.a., OTel for short) is that they wanted it to look the same way in every language, so they created specifications for both the API and the SDK. If you understand the basic entities of the SDK and the API in one language, switching between different languages using its respective SDK should feel almost the same.",[48,44594,44595],{},"Their end goal is that every library will use OTel API. Today, library owners have two ways to expose their metrics to your application:",[1666,44597,44598,44601],{},[324,44599,44600],{},"Write an extension for each metric framework (Dropwizard, Prometheus Client, Micrometer, etc.) to expose the metrics to it. Application developers using your library will also use the extension, matching their metrics framework.",[324,44602,44603],{},"Not everybody uses the popular metrics frameworks, so library developers are forced to create a bespoke interface (since there aren’t standards yet for this) for supplying the metrics, and you implement this interface to connect it to your custom metrics framework.",[48,44605,44606],{},"OTel aims to be the interface through which the library reports logs and traces as well. In Java, the logging bit feels like that today due to SLF4J, as most libraries are using it and most logging frameworks support a bridge from SLF4J to them. The key difference in OpenTelemetry is that they don’t want to rely on static variables, so they encourage library maintainers to receive the OpenTelemetry interface via a parameter at the library initialization and use that to report metrics, logs, and traces.",[40,44608,44610],{"id":44609},"the-api","The API",[48,44612,44613],{},"Before I explain what the API is used for and what it offers, let’s see a few concepts used in OTel.",[32,44615,44617],{"id":44616},"concepts","Concepts",[48,44619,44620],{},"In OTel Instruments are the entities through which you report measurements. An instrument is very much like in real life, a device, but since this is a programming language, it’s in the form of an object you use through its methods. The instrument methods allow you to report Measurements. For example, add 5 to http.request.body.lines, add -1 to processing.jobs.executing, and report 32 (milliseconds) to http.server.response.latency. The numbers are the measurements.",[48,44622,44623],{},"When you report a measurement to an instrument, you are most likely doing it for specific Attributes. 
For example, if you have an instrument named http.server.response.latency, you would report a specific response latency together with several attributes of the request, such as response status code and request method:",[8325,44625,44628],{"className":44626,"code":44627,"language":8330},[8328],"httpResponseLatency.record(32,\n Attributes.of(\n   AttributeKey.longKey(\"statusCode\"), 404L,\n   AttributeKey.stringKey(\"method\"), \"GET\"));\n",[4926,44629,44627],{"__ignoreMap":18},[48,44631,44632],{},"Attributes are key-value pairs of attribute name and attribute value.",[48,44634,44635],{},"Instruments are grouped into Meters, each having a name and a version. All instrument creation is done through a Meter. In your microservice, you will use a meter for its metrics, while your connection pool library will have its Meter and its instruments defined using it.",[32,44637,44639],{"id":44638},"instruments","Instruments",[48,44641,44642],{},"Instruments have a name, like http.request.count, a description (will show up in UIs like Grafana), and a unit. The instruments offered by the API are:",[321,44644,44645,44648,44651,44654],{},[324,44646,44647],{},"Counter — An instrument that only increases and never decreases: DoubleCounter, LongCounter. Examples: HTTP request count, number of logins, etc.",[324,44649,44650],{},"UpDownCounter — An instrument that can increase or decrease: DoubleUpDownCounter, LongUpDownCounter. Examples: Number of concurrently running background jobs, number of active connections, etc. It’s a number that you can aggregate across attributes. This is very different from a Gauge.",[324,44652,44653],{},"Gauge — An instrument only registered via a callback - a function returning the gauge value. A gauge value cannot be aggregated across attributes. Gauge examples are temperature and CPU usage.",[324,44655,44656,44657,44662,44663,44667,44668,44673],{},"Histogram — used to collect measurements that are aggregated to statistically meaningful numbers. OTel supports Explicit Bucket Histograms and Exponential Bucket Histograms, while Summary is not supported (There is an ",[55,44658,44661],{"href":44659,"rel":44660},"https:\u002F\u002Fgithub.com\u002Fopen-telemetry\u002Fopentelemetry-specification\u002Fissues\u002F2704",[264],"issue"," addressing that). As opposed to the known metric libraries, in OTel, there isn’t a specific interface for an explicit bucket or exponential histogram (In the Prometheus client, you have ",[55,44664,319],{"href":44665,"rel":44666},"https:\u002F\u002Fgithub.com\u002Fprometheus\u002Fclient_java#summary",[264]," for summary and ",[55,44669,44672],{"href":44670,"rel":44671},"https:\u002F\u002Fgithub.com\u002Fprometheus\u002Fclient_java#histogram",[264],"Histogram"," for explicit bucket histogram). There is a way to configure OpenTelemetry (the SDK — implementation), upon initialization, instructing what histograms would be by default and deciding that also for specific histograms — i.e., decide whether it will be an Explicit Bucket or Exponential Bucket and specify the bucket list. I’ll describe that in the SDK section. 
The interfaces are DoubleHistogram and LongHistogram.",[48,44675,44676],{},"Here is a code example for defining instruments using the API only.",[8325,44678,44681],{"className":44679,"code":44680,"language":8330},[8328],"LongCounter bytesOutCounter = meter.counterBuilder(\"pulsar_bytes_out\")\n       .setDescription(\"Size of messages dispatched from this broker to consumers\")\n       .setUnit(\"bytes\")\n       .build();\n    \nmeter.gaugeBuilder(\"room_temperature\")\n       .setUnit(\"celsius\")\n       .buildWithCallback(observableDoubleMeasurement ->\n               observableDoubleMeasurement.record(\n                     RoomManager.currentRoom().getTemperature(),\n                     Attributes.of(\n                             AttributeKey.stringKey(\"room\"),\n                             RoomManager.currentRoom().getName())));\n\nmeter.histogramBuilder(\"http.response.latency\")\n       .setUnit(\"seconds\")\n       .setDescription(\"HTTP Response Latency\")\n       .build();\n",[4926,44682,44680],{"__ignoreMap":18},[40,44684,44686],{"id":44685},"the-sdk","The SDK",[48,44688,44689],{},"As we explained before, the SDK is the implementation of the interfaces contained within the API: MeterProvider, Meter, and all the instruments described above. It also contains several other entities used for reading and exporting the metrics and configuring instruments further (override).",[48,44691,44692],{},"Before we explain Metric Reader, Metric Exporter, and Views, we first need to learn an important concept in OTel called Aggregations.",[32,44694,44696],{"id":44695},"aggregations","Aggregations",[48,44698,44699],{},"When you learn OTel for the first time by reading its API or just trying out its API, you stumble across the following scenario ending up with a question: “I just defined a histogram, but I can’t find a way to define its buckets — how can it be?!”",[8325,44701,44704],{"className":44702,"code":44703,"language":8330},[8328],"meter.histogramBuilder(\"http.response.latency\")\n       .setUnit(\"seconds\")\n       .setDescription(\"HTTP Response Latency\")\n       .build();\n",[4926,44705,44703],{"__ignoreMap":18},[48,44707,44708,44709,44714],{},"You expected to have setBuckets(10, 100, 1000, 5000), but this method doesn’t exist. There is a logic behind it which is actually pretty amazing, yet there is also ",[55,44710,44713],{"href":44711,"rel":44712},"https:\u002F\u002Fgithub.com\u002Fopen-telemetry\u002Fopentelemetry-specification\u002Fissues\u002F2229",[264],"ongoing work"," to add such a method.",[48,44716,44717],{},"The basic idea in the SDK is that an instrument has an associated aggregation, which is an object through which you feed the measurements, and it’s the one deciding how it aggregates those measurements and what it outputs. For example, when you define a Counter, you normally have a Sum aggregation associated with it, adding the measurements you report (those +1, +3) into a sum counter variable. Upon collection, it emits the counter sum so far. Another example is Explicit Bucket aggregation: When you report the measurement, it finds the matching bucket counter, increases it by 1, and increases a sum counter by the measurement. It emits a sum of the values, a count of the values, and a bucket counter counting each value reported matching the bucket boundaries.",[48,44719,44720],{},"There are sensible default aggregations per instrument, like Sum for Counter or Explicit Buckets for a histogram. The latter also comes with a default bucket boundaries list. 
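To make the Explicit Bucket behavior concrete, here is a small worked illustration (the numbers are invented): report the latencies 7, 42, and 480 to a histogram whose bucket boundaries are 10, 100, and 1000.

```text
measurements: 7, 42, 480
boundaries:   (10, 100, 1000)

emitted on collection:
  count = 3
  sum   = 529
  bucket counts: (-inf, 10] = 1, (10, 100] = 1, (100, 1000] = 1, (1000, +inf) = 0
```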
OTel allows you to override the default aggregation and configure it per instrument using another concept called Views which you configure upon initialization. The last part is exactly why people created the GitHub issue above since, in some cases, it doesn’t make sense to split the definition of a histogram into two separate places in your code.",[32,44722,44724],{"id":44723},"views","Views",[48,44726,44727],{},"Views are the most powerful tool OTel SDK offers, and it is a unique feature compared to all other metric libraries.",[48,44729,44730],{},"You can configure multiple views for an instrument. A view allows you to define an aggregation, configure it, and override the name, description, and units. In essence, you create multiple instruments from the same original instrument. Think of it as such: When you defined an instrument with a name, you defined a way to report many numbers (measurements). A view takes all those measurements as input and uses the aggregation defined to create a metric, using the name, units, and description defined in the view (if not defined, take the defaults from the instrument definition). So you can decide, for example, to take http.response.latency which was defined as a histogram, and create 2 views for it:",[1666,44732,44733,44736],{},[324,44734,44735],{},"An explicit bucket histogram, using buckets (1, 10, 1000) named http.response.latency.",[324,44737,44738],{},"A metric showing the last latency collected named http.response.latency.last where you defined a Last aggregation (which only keeps the last measurement reported and emits it as gauge)",[48,44740,44741],{},"If you only define a single view for an instrument, you just override the original definition and perhaps override the default aggregation and its default configuration.",[48,44743,44744],{},"The second strong part about views is that you can also define them to be applied to multiple instruments. For example, you can say that all instruments with histogram type named “*latency” should have their aggregation set to Explicit Histogram and have their buckets be 10, 200, 3000. 
It is done by something called an Instrument Selector, allowing you to choose multiple instruments based on the following:",[321,44746,44747,44750,44753,44756],{},[324,44748,44749],{},"name wildcard",[324,44751,44752],{},"instrument type",[324,44754,44755],{},"instrumentation scope (I will explain it later)",[324,44757,44758],{},"…",[48,44760,44761],{},"For each instrument selected, the view defined will be added.",[48,44763,44764],{},"Here’s a code example:",[8325,44766,44769],{"className":44767,"code":44768,"language":8330},[8328],"SdkMeterProvider meterProvider = SdkMeterProvider.builder()\n       .registerView(\n               InstrumentSelector.builder()\n                       .setName(\"*latency\")\n                       .build(),\n               View.builder()\n                       .setAggregation(Aggregation.explicitBucketHistogram(List.of(10.0, 20.0, 100.0)))\n                       .build())\n       .registerView(\n               InstrumentSelector.builder()\n                       .setMeterName(\"hikari\")\n                       .setType(InstrumentType.HISTOGRAM)\n                       .build(),\n               View.builder()\n                       .setAggregation(Aggregation.explicitBucketHistogram(List.of(2.0, 10.0, 50.0, 200.0)))\n                       .build())\n       .build();\n",[4926,44770,44768],{"__ignoreMap":18},[48,44772,44773],{},"Views provide a brilliant way to manipulate metrics you didn’t code yourself — coming from the libraries you use. You can decide whether a latency reported in the Hikari Connection Pool library will have buckets as you wish it to be (something you can’t do in other metric frameworks) or even drop it by setting the Drop aggregation for certain instruments of that library.",[48,44775,44776],{},"Finally, views also allow you to select only a subset of the reported attributes, thus achieving less cardinality without losing data since the measurements will be rolled up to your defined attributes.",[48,44778,44779],{},"Your HTTP client may have the following in its code:",[8325,44781,44784],{"className":44782,"code":44783,"language":8330},[8328],"var attr = Attributes.of(AttributeKey.longKey(\"statusCode\"), requestStatusCode,\n             AttributeKey.stringKey(\"method\"), requestMethod);\nhttpRequestLatency.record(requestLatency, attr)\n",[4926,44785,44783],{"__ignoreMap":18},[48,44787,44788],{},"You can decide to modify it only to include the attribute statusCode:",[8325,44790,44793],{"className":44791,"code":44792,"language":8330},[8328],".registerView(\n       InstrumentSelector.builder()\n               .setMeterName(\"http-commons\")\n               .setName(\"http.request.latency\")\n               .build(),\n       View.builder()\n               .setAttributeFilter(attrName -> attrName.equals(\"statusCode\"))\n               .build())\n",[4926,44794,44792],{"__ignoreMap":18},[48,44796,44797],{},"In the implementation, when you report the value 30 associated with the attributes (statusCode=500, method=GET), it will modify the attributes to be (statusCode=500) and report the value 30 for it; thus, you achieve a roll-up of the (statusCode, method) to statusCode for the instrument the view is configured for. It means that the roll-up is only in the scope of a single instrument, not multiple.",[32,44799,44801],{"id":44800},"metric-reader-and-exporter","Metric Reader and Exporter",[48,44803,44804],{},"When you initialize the SDK, you can (should) provide a Metric Reader. 
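Jumping ahead slightly, here is a minimal sketch of that initialization. It assumes the opentelemetry-sdk and opentelemetry-exporter-otlp artifacts are on the classpath and that an OpenTelemetry Collector is listening on the default OTLP/gRPC endpoint; treat it as an example shape rather than a recommended production setup.

```java
import io.opentelemetry.api.metrics.Meter;
import io.opentelemetry.exporter.otlp.metrics.OtlpGrpcMetricExporter;
import io.opentelemetry.sdk.metrics.SdkMeterProvider;
import io.opentelemetry.sdk.metrics.export.PeriodicMetricReader;

import java.time.Duration;

public final class MetricsBootstrap {
    public static void main(String[] args) {
        // Push model: a periodic reader collects from the SDK and hands each batch to an OTLP/gRPC exporter.
        PeriodicMetricReader reader = PeriodicMetricReader.builder(
                        OtlpGrpcMetricExporter.builder()
                                .setEndpoint("http://localhost:4317") // assumed Collector endpoint
                                .build())
                .setInterval(Duration.ofSeconds(30))
                .build();

        SdkMeterProvider meterProvider = SdkMeterProvider.builder()
                .registerMetricReader(reader)
                .build();

        // Instruments created from this provider are now exported every 30 seconds.
        Meter meter = meterProvider.get("demo-service");
        meter.counterBuilder("demo.started").build().add(1);

        // Flush pending metrics and stop the reader on shutdown.
        meterProvider.close();
    }
}
```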
It’s the component that reads the metrics from the SDK and uses a Metric Exporter to expose them out — either via a pull mechanism (like exposing a REST endpoint that responds with the metrics in a certain format) or a push mechanism which periodically pushes the metrics to the exporter (writing it in OTLP protocol to Open Telemetry Collector).",[48,44806,44807],{},"Some Metric Readers have a bundled exporter like Prometheus Metric Exporter. Others, like the Periodic Metric Reader, require you to pass an exporter when creating them. Exporters can be OTLP gRPC exporters or HTTP OTLP Exporters.",[40,44809,319],{"id":316},[48,44811,44812],{},"OTel is, in my opinion, the best metric library created for the JVM. They literally thought of everything and managed to design it with elegance. Using specifications to make all SDKs look the same is brilliant, as it makes moving between languages a breeze, and packing it with an external collector capable of modifying, keeping state, and exporting to all the destinations needed. The only downside OTel has is the documentation, as it requires you to take a few days at the very least to understand how it works and how to use it, and I hope in time, it will improve. This blog post's goal was to try to explain it “shortly,” so in 10–20 minutes of reading, you’ll understand the basic workings of it.",[48,44814,44815],{},"I haven’t touched all the aspects of OTel Metrics — I will leave them to future blog posts. I believe OTel will revolutionize the Metrics JVM frameworks, just like Docker and Maven were in their respective terms.",{"title":18,"searchDepth":19,"depth":19,"links":44817},[44818,44819,44823,44828],{"id":44568,"depth":19,"text":44569},{"id":44609,"depth":19,"text":44610,"children":44820},[44821,44822],{"id":44616,"depth":279,"text":44617},{"id":44638,"depth":279,"text":44639},{"id":44685,"depth":19,"text":44686,"children":44824},[44825,44826,44827],{"id":44695,"depth":279,"text":44696},{"id":44723,"depth":279,"text":44724},{"id":44800,"depth":279,"text":44801},{"id":316,"depth":19,"text":319},"2023-03-13","Learn the basics of OpenTelemetry.","\u002Fimgs\u002Fblogs\u002F640e7f87b006224ae0cc2adb_OpenTelemetry-Metrics-Primer-for-Java-Developers.png",{},"\u002Fblog\u002Fopentelemetry-metrics-primer-for-java-developers",{"title":44542,"description":44830},"blog\u002Fopentelemetry-metrics-primer-for-java-developers",[821,26747],"4HKQtKzke5FRJfSKhzRF7k0gujJ6LxufOfJJvmpYxys",{"id":44839,"title":44840,"authors":44841,"body":44844,"category":3550,"createdAt":290,"date":45239,"description":45240,"extension":8,"featured":294,"image":45241,"isDraft":294,"link":290,"meta":45242,"navigation":7,"order":296,"path":45243,"readingTime":3556,"relatedResources":290,"seo":45244,"stem":45245,"tags":45246,"__hash__":45247},"blogs\u002Fblog\u002Fannouncing-the-snowflake-sink-connector-for-apache-pulsar.md","Announcing the Snowflake Sink Connector for Apache Pulsar",[44842,44843],"Bonan Hou","Alice Bi",{"type":15,"value":44845,"toc":45228},[44846,44849,44853,44856,44862,44866,44869,44872,44875,44879,44882,44905,44909,44912,44942,44946,44959,44963,44965,44968,44993,44997,45004,45016,45019,45024,45030,45035,45041,45046,45052,45056,45063,45066,45069,45075,45078,45084,45088,45091,45112,45115,45129,45132,45138,45141,45153,45156,45162,45164],[48,44847,44848],{},"We are excited to share that the Snowflake Sink Connector for Apache Pulsar is now generally available. 
This connector enables you to utilize Pulsar to preprocess data from a variety of sources and seamlessly offload the processed data into Snowflake in real-time. The Snowflake sink connector allows you to leverage Pulsar and Snowflake to develop high-performance data applications and perform advanced analytics.",[40,44850,44852],{"id":44851},"what-is-the-snowflake-sink-connector-for-apache-pulsar","What is the Snowflake Sink Connector for Apache Pulsar?",[48,44854,44855],{},"The Snowflake sink connector for Apache Pulsar is a tool that pulls data from Pulsar topics and securely stores it in Snowflake. This connector provides a seamless and efficient way to persist data to Snowflake.",[48,44857,44858],{},[384,44859],{"alt":44860,"src":44861},"offload data from Pulsar topics to Snowflake ","\u002Fimgs\u002Fblogs\u002F640a310276d8341b843cf287_image1.png",[40,44863,44865],{"id":44864},"why-snowflake-apache-pulsar","Why Snowflake + Apache Pulsar?",[48,44867,44868],{},"Snowflake is a global cloud data warehouse that provides organizations with the tools to transform their data into valuable, real-time, and predictive insights. By uniting isolated data silos, Snowflake enables users to build data applications, models, and pipelines directly where the data is stored. Snowflake can handle a wide range of workloads, of varying types and scales, and can operate efficiently across multiple clouds. However, to fully leverage the capabilities of Snowflake, it is critical to ingest data in real-time.",[48,44870,44871],{},"Apache Pulsar is a real-time data platform designed to simplify the complexities of messaging and streaming workloads, making it easier to build end-to-end data pipelines. With its extensive range of connectors and serverless functions, Pulsar is ideal for integrating many different data sources and loading the data into Snowflake.",[48,44873,44874],{},"StreamNative, a company that provides a cloud-native data streaming platform powered by Apache Pulsar, developed the Snowflake Sink Connector for Apache Pulsar. This connector makes it simple for Snowflake users to utilize the full capabilities of Pulsar, enabling them to streamline data integration and ingestion into Snowflake. With this connector, Snowflake users can easily leverage Pulsar’s rich set of features to build high-performance data applications and perform advanced analytics.",[40,44876,44878],{"id":44877},"what-are-the-benefits-of-using-the-snowflake-connector","What are the benefits of using the Snowflake connector?",[48,44880,44881],{},"The Snowflake sink connector provides several key benefits:",[321,44883,44884,44887,44890],{},[324,44885,44886],{},"Simplicity: Easily load data from Pulsar to Snowflake in real-time without the need to write user code.",[324,44888,44889],{},"Efficiency: Reduce your time in configuring the data layer. This means you have more time to discover the maximum business value from real-time data in an effective manner.",[324,44891,44892,44893,44898,44899,44904],{},"Flexible configuration: Configure the connector using a JSON or YAML file when running connectors in a cluster with ",[55,44894,44897],{"href":44895,"rel":44896},"https:\u002F\u002Fpulsar.apache.org\u002Fdocs\u002Fen\u002Ffunctions-worker\u002F",[264],"Pulsar Function Worker",". 
For those running connectors with Function Mesh, ",[55,44900,44903],{"href":44901,"rel":44902},"https:\u002F\u002Fkubernetes.io\u002Fdocs\u002Fconcepts\u002Fextend-kubernetes\u002Fapi-extension\u002Fcustom-resources\u002F",[264],"CustomResourceDefinitions (CRD)"," can be created to create a Snowflake sink connector.",[40,44906,44908],{"id":44907},"what-are-the-features-of-the-snowflake-connector","What are the features of the Snowflake connector?",[48,44910,44911],{},"The Snowflake sink connector offers a rich set of features:",[1666,44913,44914,44917,44920,44936,44939],{},[324,44915,44916],{},"Delivery guarantees: The connector supports the at-least-once delivery guarantees to ensure zero message loss.",[324,44918,44919],{},"Auto table creation: Configure the connector such that tables are automatically created when they do not exist. Mapping relationships between topics and tables can also be specified.",[324,44921,44922,44923,1186,44926,1186,44929,44931,44932,44935],{},"Metadata fields mapping: The connector allows you to map the metadata of a Pulsar message. Metadata fields including ",[44,44924,44925],{},"message_id",[44,44927,44928],{},"partition",[44,44930,9857],{},",  and ",[44,44933,44934],{},"event_time"," are automatically created. Other supported fields include schema_version, event_time, publish_time, sequence_id, and producer_name.",[324,44937,44938],{},"Schema conversion: The connector supports Pulsar schema conversions for JSON, AVRO, and PRIMITIVE.",[324,44940,44941],{},"Batch sending: Configure the buffer size and latency for the Snowflake connector to increase write throughput and enable batch sending.",[40,44943,44945],{"id":44944},"how-to-get-started-with-the-snowflake-connector","How to get started with the Snowflake connector?",[48,44947,44948,44949,44954,44955,44958],{},"In this section, we walk through how to deploy the Snowflake connector depending on where you run it. In self-managed open-source Pulsar, ",[55,44950,44953],{"href":44951,"rel":44952},"https:\u002F\u002Fpulsar.apache.org\u002Fdocs\u002F2.10.x\u002Ffunctions-worker\u002F",[264],"function workers"," must be used to set up the connector. In StreamNative Cloud, you can leverage our built-in, cloud-native Kubernetes operator – ",[55,44956,29463],{"href":44957},"\u002Fblog\u002Frelease\u002F2021-05-03-function-mesh-open-source\u002F"," – to deploy the connector.",[32,44960,44962],{"id":44961},"start-the-connector-in-open-source-pulsar-using-functions-workers","Start the connector in open-source Pulsar using Functions Workers",[3933,44964,10104],{"id":10103},[48,44966,44967],{},"You should have an Apache Pulsar cluster and a Snowflake service set up:",[321,44969,44970,44978],{},[324,44971,44972,44973,44977],{},"If you are using the self-managed open-source version, you can run Pulsar in standalone mode on your machine. Refer to the ",[55,44974,7120],{"href":44975,"rel":44976},"https:\u002F\u002Fpulsar.apache.org\u002Fdocs\u002F2.9.x\u002Fgetting-started-standalone\u002F",[264]," for information on how to set this up.",[324,44979,44980,44981,44986,44987,44992],{},"Ensure that your Snowflake service is properly configured. Refer to the Snowflake ",[55,44982,44985],{"href":44983,"rel":44984},"https:\u002F\u002Fdocs.snowflake.com\u002Fen\u002Fuser-guide-getting-started.html",[264],"Quickstarts"," for detailed instructions. 
It is important to note that ",[55,44988,44991],{"href":44989,"rel":44990},"https:\u002F\u002Fdocs.snowflake.com\u002Fen\u002Fuser-guide\u002Fdata-load-snowpipe-rest-gs.html#step-3-configure-security-per-user",[264],"security settings must be configured"," to access Snowflake.",[3933,44994,44996],{"id":44995},"get-the-connector","Get the connector",[48,44998,44999,45000,45003],{},"If you plan to run the Snowflake sink connector in a cluster using ",[55,45001,44897],{"href":44895,"rel":45002},[264],", you can obtain it using one of the following methods.",[321,45005,45006,45013],{},[324,45007,45008,190],{},[55,45009,45012],{"href":45010,"rel":45011},"https:\u002F\u002Fgithub.com\u002Fstreamnative\u002Fpulsar-io-snowflake\u002Freleases\u002Fdownload\u002Fv2.9.4.3\u002Fpulsar-io-snowflake-2.9.4.3.nar",[264],"Download the NAR package",[324,45014,45015],{},"Build it from the source code.",[48,45017,45018],{},"To build the Snowflake sink connector from the source code, follow these steps.",[1666,45020,45021],{},[324,45022,45023],{},"Clone the source code to your machine.",[8325,45025,45028],{"className":45026,"code":45027,"language":8330},[8328],"git clone https:\u002F\u002Fgithub.com\u002Fstreamnative\u002Fpulsar-io-snowflake\n",[4926,45029,45027],{"__ignoreMap":18},[1666,45031,45032],{},[324,45033,45034],{},"Build the connector in the pulsar-io-snowflake directory.",[8325,45036,45039],{"className":45037,"code":45038,"language":8330},[8328],"mvn clean install -DskipTests\n",[4926,45040,45038],{"__ignoreMap":18},[1666,45042,45043],{},[324,45044,45045],{},"After the connector is successfully built, a NAR package is generated under the target directory.",[8325,45047,45050],{"className":45048,"code":45049,"language":8330},[8328],"ls target\nPulsar-io-snowflake-{{connector:version}}.nar\n",[4926,45051,45049],{"__ignoreMap":18},[3933,45053,45055],{"id":45054},"configure-the-connector","Configure the connector",[48,45057,45058,45059,45062],{},"You can create a configuration file (JSON or YAML) to set the properties if you use ",[55,45060,44897],{"href":44895,"rel":45061},[264]," to run connectors in a cluster.",[48,45064,45065],{},"Here is an example of how to set the properties in JSON and YAML formats.",[48,45067,45068],{},"JSON",[8325,45070,45073],{"className":45071,"code":45072,"language":8330},[8328],"{\n     \"tenant\": \"public\",\n     \"namespace\": \"default\",\n     \"name\": \"snowflake-sink\",\n     \"archive\": \"connectors\u002Fpulsar-io-snowflake-{{connector:version}}.nar\",\n     \"inputs\": [\n       \"test-snowflake-pulsar\"\n     ],\n     \"parallelism\": 1,\n     \"retainOrdering\": true,\n     \"processingGuarantees\": \"ATLEAST_ONCE\",\n     \"sourceSubscriptionName\": \"sf_sink_sub\",\n     \"configs\": {\n       \"user\": \"TEST\",\n       \"host\": \"ry77682.us-central1.gcp.snowflakecomputing.com:443\",\n       \"schema\": \"DEMO\",\n       \"warehouse\": \"SNDEV\",\n       \"database\": \"TESTDB\",\n       \"privateKey\": \"SECRETS\"\n   }\n }\n",[4926,45074,45072],{"__ignoreMap":18},[48,45076,45077],{},"YAML",[8325,45079,45082],{"className":45080,"code":45081,"language":8330},[8328],"tenant: public\nnamespace: default\nname: snowflake-sink\nparallelism: 1\ninputs:\n  - test-snowflake-pulsar\narchive: connectors\u002Fpulsar-io-snowflake-{{connector:version}}.nar\nsourceSubscriptionName: sf_sink_sub\nretainOrdering: true\nprocessingGuarantees: ATLEAST_ONCE\nconfigs:\n  user: TEST\n  host: ry77682.us-central1.gcp.snowflakecomputing.com:443\n  schema: DEMO\n  warehouse: SNDEV\n 
 database: TESTDB\n  privateKey: SECRETS\n",[4926,45083,45081],{"__ignoreMap":18},[32,45085,45087],{"id":45086},"start-the-connector-in-streamnative-cloud-using-function-mesh","Start the connector in StreamNative Cloud using Function Mesh",[3933,45089,10104],{"id":45090},"prerequisites-1",[321,45092,45093,45101],{},[324,45094,45095,45096,3931],{},"Deploy one Pulsar cluster in StreamNative Cloud. For instructions, see ",[55,45097,45100],{"href":45098,"rel":45099},"https:\u002F\u002Fdocs.streamnative.io\u002Fdocs\u002Fcluster",[264],"create clusters through StreamNative Cloud Console.",[324,45102,45103,45104,1154,45107,190],{},"Log in to the ",[55,45105,3911],{"href":24460,"rel":45106},[264],[55,45108,45111],{"href":45109,"rel":45110},"https:\u002F\u002Fdocs.streamnative.io\u002Fdocs\u002Fsnctl-reference",[264],"snctl CLI tool",[3933,45113,44996],{"id":45114},"get-the-connector-1",[48,45116,45117,45118,45123,45124,45128],{},"You can pull the Snowflake sink connector Docker image from the ",[55,45119,45122],{"href":45120,"rel":45121},"https:\u002F\u002Fhub.docker.com\u002Fr\u002Fstreamnative\u002Fpulsar-io-snowflake",[264],"Docker Hub"," if you use ",[55,45125,29463],{"href":45126,"rel":45127},"https:\u002F\u002Ffunctionmesh.io\u002Fdocs\u002Fconnectors\u002Frun-connector",[264]," to run the connector.",[48,45130,45131],{},"If you use SN Console UI to create the Snowflake connector, you can follow the steps in the screenshot: 1. Connectors > 2. Create a Sink > 3. Select Snowflake",[48,45133,45134],{},[384,45135],{"alt":45136,"src":45137},"Get Snowflake sink connector in StreamNative Console","\u002Fimgs\u002Fblogs\u002F640a313c5e3bdc5de2dac2d6_image2.png",[3933,45139,45055],{"id":45140},"configure-the-connector-1",[48,45142,45143,45144,45147,45148,190],{},"To create a Snowflake sink connector using Function Mesh, you can define a ",[55,45145,44903],{"href":44901,"rel":45146},[264]," file (YAML) with the desired properties. This approach enables seamless integration with the Kubernetes ecosystem. 
For more information on Pulsar sink CRD configurations, check out our ",[55,45149,45152],{"href":45150,"rel":45151},"https:\u002F\u002Ffunctionmesh.io\u002Fdocs\u002Fconnectors\u002Fio-crd-config\u002Fsink-crd-config",[264],"resource documentation",[48,45154,45155],{},"Here is an example of how to set the properties in the CRD file (YAML).",[8325,45157,45160],{"className":45158,"code":45159,"language":8330},[8328],"apiVersion: compute.functionmesh.io\u002Fv1alpha1\nkind: Sink\nmetadata:\n  name: snowflake-sink-sample\nspec:\n  image: streamnative\u002Fpulsar-io-snowflake:{{connector:version}}\n  replicas: 1\n  maxReplicas: 1\n  retainOrdering: true\n  input:\n    topics: \n      - persistent:\u002F\u002Fpublic\u002Fdefault\u002Ftest-snowflake-pulsar\n  sinkConfig:\n    user: TEST\n    host: ry77682.us-central1.gcp.snowflakecomputing.com:443\n    schema: DEMO\n    warehouse: SNDEV\n    database: TESTDB\n    privateKey: SECRETS\n  pulsar:\n    pulsarConfig: \"test-pulsar-sink-config\"\n  resources:\n    limits:\n      cpu: \"0.2\"\n      memory: 1.1G\n    requests:\n      cpu: \"0.1\"\n      memory: 1G\n  java:\n    jar: connectors\u002Fpulsar-io-snowflake-{{connector:version}}.nar\n  clusterName: test-pulsar\n  autoAck: false\n",[4926,45161,45159],{"__ignoreMap":18},[40,45163,40413],{"id":36476},[1666,45165,45166,45178,45197,45214,45221],{},[324,45167,45168,45169,4003,45173,190],{},"Learn about the Snowflake Sink Connector for Apache Pulsar by exploring the ",[55,45170,7120],{"href":45171,"rel":45172},"https:\u002F\u002Fhub.streamnative.io\u002Fconnectors\u002Fsnowflake-sink\u002F2.9.4",[264],[55,45174,45177],{"href":45175,"rel":45176},"https:\u002F\u002Fyoutu.be\u002FK2dxHlXajpo",[264],"video tutorial",[324,45179,45180,45181,1186,45185,1186,45189,5422,45193,190],{},"Pulsar also offers connectors for other data warehouse and lakehouse technologies: ",[55,45182,45184],{"href":45183},"\u002Fblog\u002Fannouncing-google-cloud-bigquery-sink-connector-apache-pulsar","Google Cloud BigQuery Sink Connector",[55,45186,45188],{"href":45187},"\u002Fblog\u002Fannouncing-delta-lake-sink-connector-apache-pulsar","Delta Lake Sink Connector",[55,45190,45192],{"href":45191},"\u002Fblog\u002Fannouncing-hudi-sink-connector-for-pulsar","Hudi Sink Connector",[55,45194,45196],{"href":45195},"\u002Fblog\u002Fannouncing-iceberg-sink-connector-apache-pulsar","Iceberg Sink Connector",[324,45198,45199,45200,1154,45204,45209,45210,3931],{},"Pulsar Summit Europe 2023 is taking place virtually on May 23rd. ",[55,45201,45203],{"href":35357,"rel":45202},[264],"Register today",[55,45205,45208],{"href":45206,"rel":45207},"https:\u002F\u002F6585952.fs1.hubspotusercontent-na1.net\u002Fhubfs\u002F6585952\u002FSponsorship%20Prospectus%20Pulsar%20Virtual%20Summit%20Europe%202023.pdf",[264],"become a community sponsor"," (no fee required).",[55,45211,3931],{"href":45212,"rel":45213},"https:\u002F\u002Fhubs.ly\u002FQ016_Wgd0",[264],[324,45215,45216,45217,45220],{},"Make an inquiry: Interested in a fully-managed Pulsar offering built by the original creators of Pulsar? 
",[55,45218,38404],{"href":45219},"\u002Fcontact\u002F"," now.‍",[324,45222,45223,45224,45227],{},"Learn the Pulsar Fundamentals: Sign up for ",[55,45225,31914],{"href":31912,"rel":45226},[264],", developed by the original creators of Pulsar, and learn at your own pace with on-demand courses and hands-on labs.",{"title":18,"searchDepth":19,"depth":19,"links":45229},[45230,45231,45232,45233,45234,45238],{"id":44851,"depth":19,"text":44852},{"id":44864,"depth":19,"text":44865},{"id":44877,"depth":19,"text":44878},{"id":44907,"depth":19,"text":44908},{"id":44944,"depth":19,"text":44945,"children":45235},[45236,45237],{"id":44961,"depth":279,"text":44962},{"id":45086,"depth":279,"text":45087},{"id":36476,"depth":19,"text":40413},"2023-03-09","Use Pulsar to preprocess data from a variety of sources and seamlessly offload the processed data into Snowflake in real-time. This connector allows you to leverage Pulsar and Snowflake to develop high-performance data applications and perform advanced analytics.","\u002Fimgs\u002Fblogs\u002F640a294443273a641ab2af39_Snowflake-sink-connector.png",{},"\u002Fblog\u002Fannouncing-the-snowflake-sink-connector-for-apache-pulsar",{"title":44840,"description":45240},"blog\u002Fannouncing-the-snowflake-sink-connector-for-apache-pulsar",[28572,18653],"StHK23ZhTWO7rzOYC0eh0GITBBGYRFTukFGf95nfCH8",{"id":45249,"title":45250,"authors":45251,"body":45252,"category":821,"createdAt":290,"date":45514,"description":45515,"extension":8,"featured":294,"image":45516,"isDraft":294,"link":290,"meta":45517,"navigation":7,"order":296,"path":45518,"readingTime":33204,"relatedResources":290,"seo":45519,"stem":45520,"tags":45521,"__hash__":45522},"blogs\u002Fblog\u002Funderstanding-and-configuring-mtls-in-apache-pulsar.md","Understanding and Configuring mTLS in Apache Pulsar",[42155],{"type":15,"value":45253,"toc":45504},[45254,45260,45267,45271,45274,45281,45285,45288,45291,45295,45298,45301,45304,45308,45311,45322,45331,45335,45338,45344,45348,45351,45357,45361,45364,45370,45374,45380,45384,45387,45395,45398,45401,45403,45409,45412,45414,45420,45423,45426,45432,45435,45441,45443,45446,45448,45453],[48,45255,45256,45259],{},[55,45257,821],{"href":23526,"rel":45258},[264]," is an open-source messaging and streaming system that provides high throughput and low latency for enterprises. To power use cases requiring strict security controls, it supports a variety of popular security frameworks, like TLS, mTLS, Athenz, Kerberos, JWT, and OAuth2.0. In this blog, I will introduce how to configure mTLS encryption and authentication in Apache Pulsar.",[916,45261,45262],{},[48,45263,45264,45265,190],{},"I will not dive deep into each security mechanism in Pulsar as this blog is focused on mTLS configuration in Pulsar. That said, I do think gaining a basic understanding of them is very important for you to choose the right security policy for your organization. If you want to know more about available security combinations in Pulsar, read the blog ",[55,45266,34047],{"href":34046},[40,45268,45270],{"id":45269},"what-is-mtls","What is mTLS?",[48,45272,45273],{},"Before I talk about mTLS, let me explain TLS and how it works at a cursory level. Transport Layer Security (TLS), formerly known as SSL, is a cryptographic protocol to secure communications between two entities over a network. It guarantees data integrity and confidentiality with a public key, a private key, and a TLS certificate. Only the private key can decrypt the data encrypted by the public key. 
The certificate, which contains the public key, is used to verify the identity of the server.",[48,45275,45276,45277,45280],{},"Compared with TLS, Mutual TLS (mTLS) is a more secure protocol as it uses two-way authentication to make sure both entities are who they claim to be. As an extension of TLS, mTLS allows both the client and the server to use the certificates of each side to confirm their identities. This way, only trusted entities can have data access.\n",[384,45278],{"alt":18,"src":45279},"\u002Fimgs\u002Fblogs\u002F64055e6ba0f2b88023e538ee_what-is-mtls.webp","Figure 1. Mutual TLS",[40,45282,45284],{"id":45283},"why-do-i-need-mtls-in-pulsar","Why do I need mTLS in Pulsar?",[48,45286,45287],{},"Typically, mTLS is the preferred solution for configuring encryption and authentication on the cloud. As it ensures both the client and the server can verify their identities, it provides an extra layer of security.",[48,45289,45290],{},"By default, there is no security policy applied in Pulsar. Clients communicate with Pulsar in plain text. To protect sensitive information against attackers and eavesdroppers, you can use mTLS to encrypt your data streams in transit. Additionally, Pulsar provides a built-in TLS authentication plugin that can identify the client through the common name in the certificate.",[40,45292,45294],{"id":45293},"configuring-mtls-in-pulsar","Configuring mTLS in Pulsar",[48,45296,45297],{},"To use mTLS, you need some configurations both on the client and on the server. When configuring the server, you may need to set some parameters for Pulsar proxies as well depending on your deployment. Before I introduce how to configure mTLS connections, let me briefly explain why you may need the proxy layer. I believe this contributes to your understanding of how encryption and authentication work in Pulsar.",[48,45299,45300],{},"There are two typical ways to connect to Pulsar depending on how your cluster is deployed. You can either connect to Pulsar brokers directly or send requests to the proxy layer, which routes traffic to brokers. The proxy layer is optional and is commonly used in environments where external requests cannot be directly sent to brokers. For example, you can use proxy Pods to serve as the gateway if your Pulsar cluster is deployed on Kubernetes.",[48,45302,45303],{},"Now that you have a basic understanding of how clients can connect to Pulsar, let’s see how to configure mTLS for transport encryption and identity authentication respectively.",[32,45305,45307],{"id":45306},"mtls-encryption","mTLS encryption",[48,45309,45310],{},"Transport encryption means that you encrypt your data before it travels across the network. As it gets decrypted on the server side, your data remains secure during transmission. To use mTLS encryption, you can set related parameters on the broker, the proxy, and the client. Note that you must create the following certificates and keys beforehand:",[321,45312,45313,45316,45319],{},[324,45314,45315],{},"CA certificate",[324,45317,45318],{},"Server’s certificate and private key",[324,45320,45321],{},"Client’s certificate and private key",[48,45323,45324,45325,45330],{},"For more information on certificates and keys, see ",[55,45326,45329],{"href":45327,"rel":45328},"https:\u002F\u002Fpulsar.apache.org\u002Fdocs\u002F2.11.x\u002Fsecurity-tls-transport\u002F#configure-mtls-encryption-with-pem",[264],"the Pulsar documentation",". 
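If you only need something to experiment with, a self-signed setup along these lines is enough. This is an illustrative sketch, not the authoritative procedure (follow the linked documentation for production), and the subject names are placeholders.

```bash
# 1. CA key and self-signed CA certificate.
openssl req -x509 -newkey rsa:4096 -nodes -days 365 \
  -keyout ca.key.pem -out ca.cert.pem -subj "/CN=pulsar-ca"

# 2. Server (broker) key and a certificate signed by the CA.
openssl req -newkey rsa:4096 -nodes \
  -keyout broker.key.pem -out broker.csr.pem -subj "/CN=broker.example.com"
openssl x509 -req -in broker.csr.pem -CA ca.cert.pem -CAkey ca.key.pem \
  -CAcreateserial -days 365 -out broker.cert.pem

# 3. Convert the broker key to PKCS#8, the format referenced by the configurations below.
openssl pkcs8 -topk8 -inform PEM -outform PEM -nocrypt \
  -in broker.key.pem -out broker.key-pk8.pem

# Repeat steps 2-3 for the client (client.cert.pem / client.key-pk8.pem); for mTLS
# authentication, the client certificate's common name is the identity Pulsar sees.
```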
When you have them ready, refer to the configurations below for each component.",[3933,45332,45334],{"id":45333},"broker-configurations","Broker configurations",[48,45336,45337],{},"Add the following configurations to broker.conf to enable mTLS.",[8325,45339,45342],{"className":45340,"code":45341,"language":8330},[8328],"# TLS ports\nbrokerServicePortTls=6651\nwebServicePortTls=8081\n\n# CA certificate\ntlsTrustCertsFilePath=\u002Fpath\u002Fto\u002Fca.cert.pem\n# Server certificate\ntlsCertificateFilePath=\u002Fpath\u002Fto\u002Fbroker.cert.pem\n# Private key for the server\ntlsKeyFilePath=\u002Fpath\u002Fto\u002Fbroker.key-pk8.pem\n\n# Enable mTLS\ntlsRequireTrustedClientCertOnConnect=true\n\n# Configure TLS for the internal client to connect to the broker\nbrokerClientTlsEnabled=true\nbrokerClientTrustCertsFilePath=\u002Fpath\u002Fto\u002Fca.cert.pem\nbrokerClientCertificateFilePath=\u002Fpath\u002Fto\u002Fclient.cert.pem\nbrokerClientKeyFilePath=\u002Fpath\u002Fto\u002Fclient.key-pk8.pem\n",[4926,45343,45341],{"__ignoreMap":18},[3933,45345,45347],{"id":45346},"proxy-configurations","Proxy configurations",[48,45349,45350],{},"Add the following configurations to proxy.conf to enable mTLS.",[8325,45352,45355],{"className":45353,"code":45354,"language":8330},[8328],"# TLS ports\nservicePortTls=6651\nwebServicePortTls=8081\n\n# CA certificate\ntlsTrustCertsFilePath=\u002Fpath\u002Fto\u002Fca.cert.pem\n# Server certificate\ntlsCertificateFilePath=\u002Fpath\u002Fto\u002Fproxy.cert.pem\n# Private key for the server\ntlsKeyFilePath=\u002Fpath\u002Fto\u002Fproxy.key-pk8.pem\n\n# Enable mTLS\ntlsRequireTrustedClientCertOnConnect=true\n\n# Configure TLS for the internal client to connect to the broker\ntlsEnabledWithBroker=true\nbrokerClientTrustCertsFilePath=\u002Fpath\u002Fto\u002Fca.cert.pem\nbrokerClientCertificateFilePath=\u002Fpath\u002Fto\u002Fclient.cert.pem\nbrokerClientKeyFilePath=\u002Fpath\u002Fto\u002Fclient.key-pk8.pem\n",[4926,45356,45354],{"__ignoreMap":18},[3933,45358,45360],{"id":45359},"cli-tools","CLI tools",[48,45362,45363],{},"Add the following configurations to client.conf to enable mTLS.",[8325,45365,45368],{"className":45366,"code":45367,"language":8330},[8328],"webServiceUrl=https:\u002F\u002Flocalhost:8081\u002F\nbrokerServiceUrl=pulsar+ssl:\u002F\u002Flocalhost:6651\u002F\nauthPlugin=org.apache.pulsar.client.impl.auth.AuthenticationTls\nauthParams=tlsCertFile:\u002Fpath\u002Fto\u002Fclient.cert.pem,tlsKeyFile:\u002Fpath\u002Fto\u002Fclient.key-pk8.pem\n",[4926,45369,45367],{"__ignoreMap":18},[3933,45371,45373],{"id":45372},"client-code-examples-java","Client code examples (Java)",[8325,45375,45378],{"className":45376,"code":45377,"language":8330},[8328],"PulsarAdmin admin = PulsarAdmin.builder().serviceHttpUrl(\"https:\u002F\u002Flocalhost:8081\")\n       .tlsTrustCertsFilePath(\"\u002Fpath\u002Fto\u002Fca.cert.pem\")\n       .tlsKeyFilePath(\"\u002Fpath\u002Fto\u002Fclient.key-pk8.pem\")\n       .tlsCertificateFilePath(\"\u002Fpath\u002Fto\u002Fclient.cert.pem\")\n       .build();\n\nPulsarClient client = PulsarClient.builder().serviceUrl(\"pulsar+ssl:\u002F\u002Flocalhost:6651\")\n       .tlsTrustCertsFilePath(\"\u002Fpath\u002Fto\u002Fca.cert.pem\")\n       .tlsKeyFilePath(\"\u002Fpath\u002Fto\u002Fclient.key-pk8.pem\")\n       .tlsCertificateFilePath(\"\u002Fpath\u002Fto\u002Fclient.cert.pem\")\n       .build();\n",[4926,45379,45377],{"__ignoreMap":18},[32,45381,45383],{"id":45382},"mtls-authentication","mTLS authentication",[48,45385,45386],{},"Authentication 
refers to the process of verifying the identity of requesters using their credentials. To use mTLS authentication, you can set related parameters on the broker, the proxy, and the client. Note that you must create the following certificates and keys beforehand:",[321,45388,45389,45391,45393],{},[324,45390,45315],{},[324,45392,45318],{},[324,45394,45321],{},[48,45396,45397],{},"When you have them ready, refer to the configurations below for each component.",[3933,45399,45334],{"id":45400},"broker-configurations-1",[48,45402,45337],{},[8325,45404,45407],{"className":45405,"code":45406,"language":8330},[8328],"# TLS ports\nbrokerServicePortTls=6651\nwebServicePortTls=8081\n\n# CA certificate\ntlsTrustCertsFilePath=\u002Fpath\u002Fto\u002Fca.cert.pem\n# Server certificate\ntlsCertificateFilePath=\u002Fpath\u002Fto\u002Fbroker.cert.pem\n# Private key for the server\ntlsKeyFilePath=\u002Fpath\u002Fto\u002Fbroker.key-pk8.pem\n\n# Enable mTLS\ntlsRequireTrustedClientCertOnConnect=true\n\n# Enable authentication\nauthenticationEnabled=true\n# Set the TLS authentication plugin\nauthenticationProviders=org.apache.pulsar.broker.authentication.AuthenticationProviderTls\n\n# Configure TLS for the internal client to connect to the broker\nbrokerClientTlsEnabled=true\nbrokerClientTrustCertsFilePath=\u002Fpath\u002Fto\u002Fca.cert.pem\nbrokerClientAuthenticationPlugin=org.apache.pulsar.client.impl.auth.AuthenticationTls\nbrokerClientAuthenticationParameters={\"tlsCertFile\":\"\u002Fpath\u002Fto\u002Fclient.cert.pem\",\"tlsKeyFile\":\"\u002Fpath\u002Fto\u002Fclient.key-pk8.pem\"}\n",[4926,45408,45406],{"__ignoreMap":18},[3933,45410,45347],{"id":45411},"proxy-configurations-1",[48,45413,45350],{},[8325,45415,45418],{"className":45416,"code":45417,"language":8330},[8328],"# TLS ports\nservicePortTls=6651\nwebServicePortTls=8081\n\n# CA certificate\ntlsTrustCertsFilePath=\u002Fpath\u002Fto\u002Fca.cert.pem\n# Server certificate\ntlsCertificateFilePath=\u002Fpath\u002Fto\u002Fproxy.cert.pem\n# Private key for the server\ntlsKeyFilePath=\u002Fpath\u002Fto\u002Fproxy.key-pk8.pem\n\n# Enable mTLS\ntlsRequireTrustedClientCertOnConnect=true\n\n# Enable authentication\nauthenticationEnabled=true\n# Set the TLS authentication plugin\nauthenticationProviders=org.apache.pulsar.broker.authentication.AuthenticationProviderTls\n\n# Configure TLS for the internal client to connect to the broker\ntlsEnabledWithBroker=true\nbrokerClientTrustCertsFilePath=\u002Fpath\u002Fto\u002Fca.cert.pem\nbrokerClientAuthenticationPlugin=org.apache.pulsar.client.impl.auth.AuthenticationTls\nbrokerClientAuthenticationParameters={\"tlsCertFile\":\"\u002Fpath\u002Fto\u002Fclient.cert.pem\",\"tlsKeyFile\":\"\u002Fpath\u002Fto\u002Fclient.key-pk8.pem\"}\n",[4926,45419,45417],{"__ignoreMap":18},[3933,45421,45360],{"id":45422},"cli-tools-1",[48,45424,45425],{},"Add the following configurations to client.conf to use mTLS.",[8325,45427,45430],{"className":45428,"code":45429,"language":8330},[8328],"authPlugin=org.apache.pulsar.client.impl.auth.AuthenticationTls\nauthParams=tlsCertFile:\u002Fpath\u002Fto\u002Fclient.cert.pem,tlsKeyFile:\u002Fpath\u002Fto\u002Fclient.key-pk8.pem\n",[4926,45431,45429],{"__ignoreMap":18},[3933,45433,45373],{"id":45434},"client-code-examples-java-1",[8325,45436,45439],{"className":45437,"code":45438,"language":8330},[8328],"Authentication tlsAuth = new AuthenticationTls(\"\u002Fpath\u002Fto\u002Fclient.cert.pem\", \"\u002Fpath\u002Fto\u002Fclient.key-pk8.pem\");\n\nPulsarAdmin admin = 
PulsarAdmin.builder().serviceHttpUrl(\"pulsar+ssl:\u002F\u002Flocalhost:6651\")\n       .tlsTrustCertsFilePath(\"\u002Fpath\u002Fto\u002Fca.cert.pem\")\n       .authentication(tlsAuth)\n       .build();\n\nPulsarClient client = PulsarClient.builder().serviceUrl(\"https:\u002F\u002Flocalhost:8081\")\n       .tlsTrustCertsFilePath(\"\u002Fpath\u002Fto\u002Fca.cert.pem\")\n       .authentication(tlsAuth)\n       .build();\n",[4926,45440,45438],{"__ignoreMap":18},[40,45442,2125],{"id":2122},[48,45444,45445],{},"mTLS is an ideal solution for securing service-to-service communications for modern applications. That said, the authentication process itself requires some CPU resources, which means your cluster performance can be impacted. In the ever-changing cybersecurity field, I believe there is no one-size-fits-all solution and we need to have a flexible security policy based on the actual needs. This also means we need to make some trade-offs under certain circumstances. For example, in a “Zero Trust” environment, I suggest you enable mTLS for all connections so that every component on the network needs authentication to gain data access. When the connection between the proxy and the broker is trusted, you can only configure mTLS on the proxy layer for better CPU utilization.",[40,45447,38376],{"id":38375},[48,45449,38379,45450,40419],{},[55,45451,38384],{"href":38382,"rel":45452},[264],[321,45454,45455,45469,45474,45481,45487,45495],{},[324,45456,45457,45458,29496,45461,1154,45466,45209],{},"Pulsar Virtual Summit Europe 2023 will take place on Tuesday, May 23rd, 2023! See this ",[55,45459,39553],{"href":45460},"\u002Fblog\u002Fannouncing-pulsar-virtual-summit-europe-2023-cfp-is-now-open",[55,45462,45465],{"href":45463,"rel":45464},"https:\u002F\u002Fsessionize.com\u002Fpulsar-virtual-summit-europe-2023\u002F",[264],"submit your session",[55,45467,45208],{"href":45206,"rel":45468},[264],[324,45470,38390,45471,190],{},[55,45472,31914],{"href":31912,"rel":45473},[264],[324,45475,45476,45477,45480],{},"Spin up a Pulsar cluster in minutes with ",[55,45478,3550],{"href":45479},"\u002Fstreamnativecloud\u002F",". 
StreamNative Cloud provides a simple, fast, and cost-effective way to run Pulsar in the public cloud.",[324,45482,45483,758,45485],{},[2628,45484,40436],{},[55,45486,34047],{"href":34046},[324,45488,45489,758,45491],{},[2628,45490,42753],{},[55,45492,45494],{"href":45327,"rel":45493},[264],"Configure mTLS encryption with PEM",[324,45496,45497,758,45499],{},[2628,45498,42753],{},[55,45500,45503],{"href":45501,"rel":45502},"https:\u002F\u002Fpulsar.apache.org\u002Fdocs\u002F2.11.x\u002Fsecurity-tls-authentication\u002F",[264],"Authentication using mTLS",{"title":18,"searchDepth":19,"depth":19,"links":45505},[45506,45507,45508,45512,45513],{"id":45269,"depth":19,"text":45270},{"id":45283,"depth":19,"text":45284},{"id":45293,"depth":19,"text":45294,"children":45509},[45510,45511],{"id":45306,"depth":279,"text":45307},{"id":45382,"depth":279,"text":45383},{"id":2122,"depth":19,"text":2125},{"id":38375,"depth":19,"text":38376},"2023-03-06","Learn how to configure mutual TLS in Apache Pulsar.","\u002Fimgs\u002Fblogs\u002F64055980141ee681bdbb07b1_understanding-and-configuring-mtls-in-apache-pulsar.png",{},"\u002Fblog\u002Funderstanding-and-configuring-mtls-in-apache-pulsar",{"title":45250,"description":45515},"blog\u002Funderstanding-and-configuring-mtls-in-apache-pulsar",[38442,821,4301,303],"WCWkwqD6xEr_mgL_wY_kSngDkVD5vqLulqHqq56uh00",{"id":45524,"title":33989,"authors":45525,"body":45526,"category":821,"createdAt":290,"date":46110,"description":46111,"extension":8,"featured":294,"image":46112,"isDraft":294,"link":290,"meta":46113,"navigation":7,"order":296,"path":33988,"readingTime":46114,"relatedResources":290,"seo":46115,"stem":46116,"tags":46117,"__hash__":46118},"blogs\u002Fblog\u002Fcomparison-of-messaging-platforms-apache-pulsar-vs-rabbitmq-vs-nats-jetstream.md",[41695,42146,807],{"type":15,"value":45527,"toc":46086},[45528,45532,45535,45547,45550,45552,45558,45590,45592,45595,45598,45601,45604,45607,45610,45613,45617,45620,45623,45627,45630,45633,45637,45640,45644,45647,45651,45654,45658,45661,45664,45671,45673,45676,45680,45683,45686,45690,45693,45696,45699,45703,45706,45709,45712,45716,45719,45722,45726,45729,45732,45735,45737,45743,45749,45755,45761,45764,45767,45770,45772,45775,45778,45781,45784,45787,45790,45796,45802,45808,45814,45817,45820,45828,45831,45834,45836,45839,45842,45845,45848,45851,45854,45860,45866,45872,45878,45884,45890,45893,45901,45904,45907,45909,45912,45915,45918,45921,45924,45927,45933,45939,45945,45951,45957,45963,45966,45969,45972,45975,45978,45980,45983,45986,45989,45991,45994,45997,46000,46014,46017,46019,46021,46041,46043,46048,46053,46064,46073,46079,46084],[40,45529,45531],{"id":45530},"executive-summary","Executive Summary",[48,45533,45534],{},"When building scalable, reliable, and efficient applications, choosing the right messaging and streaming platform is critical. In this benchmark report, we compare the technical performances of three of the most popular messaging platforms: Apache PulsarTM, RabbitMQTM, and NATS JetStream.",[48,45536,45537,45538,45543,45544,190],{},"The tests assessed each messaging platform’s throughput and latency under varying workloads, node failures, and backlogs. Please note that Apache Kafka was not included in our benchmark as ",[55,45539,45542],{"href":45540,"rel":45541},"https:\u002F\u002Fwww.splunk.com\u002Fen_us\u002Fblog\u002Fit\u002Fcomparing-pulsar-and-kafka-unified-queuing-and-streaming.html",[264],"Kafka does not support queuing scenarios",". 
For more information on Kafka, please refer to the ",[55,45545,45546],{"href":21458},"Pulsar vs. Kafka 2022 Benchmark report",[48,45548,45549],{},"Our objective was to provide guidance on each platform’s capabilities and reliability, and help potential users choose the right technology for their specific needs. The results of these tests provide valuable insights into the performance characteristics of each platform and will be helpful for those considering using these technologies.",[32,45551,22053],{"id":22052},[48,45553,45554],{},[384,45555],{"alt":45556,"src":45557},"Figure 1  - Apache Pulsar, RabbitMQ, and NATS JetStream Comparison","\u002Fimgs\u002Fblogs\u002F63ff8c703c54b76ef8085574_Figure-1-key-findings.png",[321,45559,45560,45563,45566,45569,45572,45575,45578,45581,45584,45587],{},[324,45561,45562],{},"Throughput:",[324,45564,45565],{},"~ Pulsar showed a higher peak consumer throughput of 2.6M msgs\u002Fs compared to RabbitMQ’s 48K msgs\u002Fs and NATS JetStream’s 160K msgs\u002Fs.",[324,45567,45568],{},"~ Pulsar was able to support a producer rate of 1M msgs\u002Fs — 33x faster than RabbitMQ and 20x faster than NATS JetStream.",[324,45570,45571],{},"Backlog:",[324,45573,45574],{},"~ Pulsar outperformed RabbitMQ during the backlog drain with a stable publish rate of 100K msgs\u002Fs, while RabbitMQ's publish rate dropped by more than 50%.",[324,45576,45577],{},"Latency:",[324,45579,45580],{},"~ Pulsar's p99 latency was 300x better than RabbitMQ and 40x better than NATS JetStream at a topic count of 50.",[324,45582,45583],{},"Scalability:",[324,45585,45586],{},"~ Pulsar achieved 1M msgs\u002Fs up to 50 topics and provided a publish rate above 200K msgs\u002Fs for up to 20K topics.",[324,45588,45589],{},"~ RabbitMQ was able to process 20K msgs\u002Fs, and NATS was able to support 30K msgs\u002Fs for topic counts up to 500.",[40,45591,19156],{"id":19155},[48,45593,45594],{},"Before we dive into the benchmark tests, let’s start with a brief overview of the architecture, features, and ideal applications for each messaging platform.",[32,45596,821],{"id":45597},"apache-pulsar",[48,45599,45600],{},"Apache Pulsar is an open-source, cloud-native messaging and streaming platform designed for building scalable, reliable applications in elastic cloud environments. Its multi-layer architecture includes multi-tenancy with resource separation and access control, geo-replication across regions, tiered storage, and support for five official client languages. These capabilities make Pulsar an ideal choice for building applications that require scalability and reliability.",[48,45602,45603],{},"One of the standout features of Pulsar is its shared subscription, which is handy for queuing applications and natively supports delayed and scheduled messages. Additionally, Pulsar simplifies application architecture by supporting up to 1M unique topics, making it widely used for high-performance data pipelines, event-driven microservices, real-time analytics, and other real-time workloads. Originally developed at Yahoo! and committed to open source in 2016, Pulsar has become popular among developers and leading organizations.",[32,45605,11043],{"id":45606},"rabbitmq",[48,45608,45609],{},"RabbitMQ is a popular and mature open-source distributed messaging platform that implements the Advanced Message Queuing Protocol (AMQP) — often used for asynchronous communication between services using the pub\u002Fsub model. 
The core of RabbitMQ’s architecture is the message exchange, which includes direct, topic, headers, and fanout exchanges. RabbitMQ is designed to be flexible, scalable, and reliable, making it an effective tool for building distributed systems that require asynchronous message exchange.",[48,45611,45612],{},"RabbitMQ is a good choice if you have simple applications where message durability, ordering, replay, and retention are not critical factors. However, RabbitMQ has limitations in dealing with massive data distribution and may not be suitable for applications with heavy messaging traffic. In addition, the platform does not support other messaging patterns such as request\u002Fresponse or event-driven applications.",[32,45614,45616],{"id":45615},"nats-jetstream","NATS JetStream",[48,45618,45619],{},"NATS is an open-source messaging platform optimized for cloud-native and microservices applications. Its lightweight, high-performance design supports pub\u002Fsub and queue-based messaging and stream data processing. NATS JetStream is a second-generation streaming platform that integrates directly into NATS. NATS JetStream replaces the older NATS streaming platform and addresses its limitations, such as the lack of message replay, retention policies, persistent storage, stream replication, stream mirroring, and exactly-once semantics.",[48,45621,45622],{},"NATS utilizes a single-server architecture, which makes it easy to deploy and manage, particularly in resource-constrained environments. However, NATS does not support message durability and may not be suitable for applications that require this or complex message routing and transformations. Despite this, NATS offers an asynchronous, event-driven model that is well-suited for simple pub\u002Fsub and queue-based messaging patterns due to its high performance and low latencies.",[40,45624,45626],{"id":45625},"overview-of-tests","Overview of Tests",[32,45628,45629],{"id":39944},"What We Tested",[48,45631,45632],{},"We conducted four benchmark tests to evaluate each platform’s performance under various conditions, such as workload variations, node failure, and backlogs. The aim was to assess each platform’s responses to these conditions and to provide insights into their capabilities in a given environment.",[3933,45634,45636],{"id":45635},"_1-node-failure","1. Node failure",[48,45638,45639],{},"Failures will inevitably occur in any platform, so it’s vital to understand how each platform will respond to and recover when such events occur. This test aimed to evaluate the performance of each platform in response to a single node failure and subsequent recovery. To simulate a node failure, we performed broker terminations and resumptions via systemctl stop on the node. We then monitored the performance of the remaining nodes as they took on the workload of the failed node. We anticipated a decrease in producer throughput and an increase in producer latency upon failure due to the overall reduction in the cluster’s resources.",[3933,45641,45643],{"id":45642},"_2-topic-counts","2. Topic counts",[48,45645,45646],{},"This test examined the relationship between peak throughput and latency and the number of topics within a platform. We measured the performance of each platform at various topic counts, from very small to very large, to understand how the platform’s performance changed as the number of topics grew. 
We expected that for very small topic counts, the platform would exhibit sub-par performance due to its inability to utilize available concurrency effectively. On the other hand, for very large topic counts, we expected performance to degrade as resource contention became more pronounced. This test aimed to determine the maximum number of topics each messaging platform could support while maintaining acceptable performance levels.",[3933,45648,45650],{"id":45649},"_3-subscription-counts","3. Subscription counts",[48,45652,45653],{},"Scaling a messaging platform can be a challenging task. As the number of subscribers per topic increases, changes in peak throughput and latency are expected due to the read-amplification effect. The imbalance between writes and reads occurs because each message is read multiple times. Despite this, we would expect the tail reads to be relatively lightweight compared to the producer’s writes, which are most likely coming from a cache. Increased competition among consumers to access each topic may also lead to a drop in performance. This test aimed to determine the maximum number of subscriptions per topic that could be achieved on each messaging platform while maintaining acceptable performance levels. However, scaling complexity increases non-linearly and potential bottlenecks arise from shared resources.",[3933,45655,45657],{"id":45656},"_4-backlog-draining","4. Backlog draining",[48,45659,45660],{},"One of the essential roles of a messaging bus is to act as a buffer between different applications or platforms. When consumers are unavailable or not enough, the platform accumulates the data for later processing. In these situations, it is vital that consumers can quickly drain the backlog of accumulated data and catch up with the newly produced data. During this catch-up process, it is crucial that the performance of existing producers is not impacted in terms of throughput and latency, either on the same topic or on other topics within the cluster. This test aimed to evaluate the ability of each messaging bus to effectively support consumers in catching up with backlog data while minimizing the impact on the producer performance.",[32,45662,45663],{"id":39989},"How We Set Up the Tests",[48,45665,39993,45666,45670],{},[55,45667,45669],{"href":39996,"rel":45668},[264],"OpenMessaging Benchmark tool"," on AWS EC2 instances. For consistency, we utilized similar instances to test each messaging platform. Our workloads used 1KB messages with randomized payloads and a single partition per topic. We had 16 producers, and 16 consumers per subscription, with one subscription in total. To ensure durability, we configured topics to have two guaranteed copies of each message, resulting in a replica count of three. We documented any deviations from the protocol in the individual tests.",[48,45672,3931],{},[48,45674,45675],{},"We conducted these tests at each platform’s “maximum producer rate” for the outlined hardware and workload configuration. Although the OMB tool includes an adaptive producer throughput mode, this was not found to be reliable and would often undershoot or behave erratically. Instead, we adopted a manual protocol to determine appropriate producer throughput rates. For each workload, we ran multiple test instances at different rates, narrowing down the maximum attainable producer rate that resulted in no producer errors and no accumulating producer or consumer backlog. 
In this scenario, we could be confident that the platform would be in a steady state of near maximum end-to-end throughput. Given the discrete nature of this protocol, it is possible that real-world maximum producer rates could be slightly higher and have greater variability than those determined for the tests.",[3933,45677,45679],{"id":45678},"infrastructure-topology","Infrastructure topology",[48,45681,45682],{},"Client instances:\t\t4 × m5n.8xlarge",[48,45684,45685],{},"Broker instances:\t\t3 × i3en.6xlarge",[3933,45687,45689],{"id":45688},"platform-versions","Platform versions",[48,45691,45692],{},"Apache Pulsar:\t\t2.11.0",[48,45694,45695],{},"RabbitMQ:\t\t\t3.10.7",[48,45697,45698],{},"NATS JetStream:\t\t2.9.6",[3933,45700,45702],{"id":45701},"platform-specific-caveats","Platform-specific caveats",[48,45704,45705],{},"Pulsar – Our Pulsar setup had the broker and bookies co-located on the same VM, 3 × i3en.6xlarge topology. The ZooKeeper instance was set up separately with 3 × i3en.2xlarge topology.",[48,45707,45708],{},"RabbitMQ – We conducted tests using Quorum Queues, the recommended method for implementing durable and replicated messaging. While the results indicated that this operating mode in RabbitMQ has slightly lower performance than the “classic” mode, it offers better resilience against single-node failures.",[48,45710,45711],{},"‍NATS JetStream – During our tests, we attempted to follow the recommended practices for deliverGroups and deliverSubjects in NATS, but encountered difficulties. Our NATS subscriptions failed to act in a shared mode and instead exhibited a fan-out behavior, resulting in a significant read amplification of 16 times. This likely significantly impacted the overall publisher performance in the subscription count test. Despite our best efforts, we were unable to resolve this issue.",[40,45713,45715],{"id":45714},"benchmark-parameters-results","Benchmark Parameters & Results",[48,45717,45718],{},"All reported message rates are platform aggregates, not for individual topics, producers, subscriptions, or consumers.",[32,45720,45636],{"id":45721},"_1-node-failure-1",[3933,45723,45725],{"id":45724},"test-parameters","Test Parameters",[48,45727,45728],{},"In a departure from the standard test parameters, in this test we employed five broker nodes instead of three and five client nodes instead of three — two producing and three consuming. We made this change to satisfy the requirement for three replicas of a topic, even when one cluster node is absent.",[48,45730,45731],{},"In each case, messages were produced onto 100 topics by 16 producers per topic. Messages were consumed using a single subscription per topic, shared between 16 consumers.",[48,45733,45734],{},"We adopted the following test protocol: five minutes of warm-up traffic, clean termination of a single broker node, five minutes of reduced capacity operation, resumption of the terminated broker, and five minutes of normal operation. 
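To make this protocol concrete, the sketch below shows a hypothetical OpenMessaging Benchmark workload file approximating the node-failure setup just described (100 topics, one partition each, 1 KB messages, 16 producers and 16 consumers per topic, one shared subscription per topic). The file name, producer rate, and driver paths are illustrative assumptions, not the exact files used for this report:

```bash
# Hypothetical OMB workload; field names follow the OpenMessaging Benchmark format.
cat > workloads/node-failure-100-topics-1kb.yaml <<'EOF'
name: node-failure-100-topics-1kb
topics: 100
partitionsPerTopic: 1
messageSize: 1024
payloadFile: "payload/payload-1Kb.data"
subscriptionsPerTopic: 1
consumerPerSubscription: 16
producersPerTopic: 16
producerRate: 100000          # tuned per platform to the maximum sustainable rate
consumerBacklogSizeGB: 0
warmupDurationMinutes: 5
testDurationMinutes: 15
EOF

# Run the workload against one platform's driver, for example Pulsar
# (substitute the RabbitMQ or NATS driver config to test the other platforms).
bin/benchmark --drivers driver-pulsar/pulsar.yaml \
  workloads/node-failure-100-topics-1kb.yaml
```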
The broker was intentionally terminated and resumed using the systemctl stop command on the node to simulate a failure.",[3933,45736,36878],{"id":36877},[48,45738,45739],{},[384,45740],{"alt":45741,"src":45742},"Figure 2 - Node Failure and Recovery - Producer Throughput (msgs\u002Fs)","\u002Fimgs\u002Fblogs\u002F63ff8c221baa29f4bc1cd25c_Figure-2-node-failure-and-recovery-producer-throughput.png",[48,45744,45745],{},[384,45746],{"alt":45747,"src":45748},"Results for average producer throughput before, during, and after node failure","\u002Fimgs\u002Fblogs\u002F63ff8bc1f5f5f52071333290_Screen-Shot-2023-03-01-at-9.30.13-AM.png",[48,45750,45751],{},[384,45752],{"alt":45753,"src":45754},"Figure 3 - Node Failure and Recovery - Producer P99 Latency (ms)","\u002Fimgs\u002Fblogs\u002F63ff8d07f5f5f5b22033ba0f_Figure-3-node-failure-and-recovery-producer-p99-latency.png",[48,45756,45757],{},[384,45758],{"alt":45759,"src":45760},"Table showing results for producer p99 latency before, during, and after node failure","\u002Fimgs\u002Fblogs\u002F63ff8d40045efd6e074a1c73_Screen-Shot-2023-03-01-at-9.36.50-AM.png",[48,45762,45763],{},"Pulsar – Given that Pulsar separates computing and storage, we ran two experiments to test the behavior in the event of a failed broker and a failed bookie. We consistently observed the expected publisher failover in both cases, with an average publish rate of 260K msgs\u002Fs. There was no noticeable decline in publish rate and an increase in latency from 113 milliseconds to 147 milliseconds when running on fewer nodes. Our results for both broker and bookie termination scenarios were very similar.",[48,45765,45766],{},"RabbitMQ – In the test with RabbitMQ, we noted a successful failover of producers from the terminated node, maintaining an average publish rate of 45K msgs\u002Fs. At the time of node failure, the publish latency increased from 6.6 seconds to 7.6 seconds. However, upon restart, RabbitMQ did not rebalance traffic back onto the restarted node, resulting in a degraded publish latency of 8.2 seconds. We suspect this behavior is attributed to the absence of a load balancer in the default configuration used. Nevertheless, it should be possible to implement an external load-balancing mechanism.",[48,45768,45769],{},"NATS JetStream – During the test with NATS, we observed successful failover of producers from the terminated node, with an average publish rate of 45K msgs\u002Fs. When we attempted to reach higher publish rates, however, the failover did not always occur, resulting in a corresponding increase in publish errors. The producers switched over to the alternate node within approximately 20 seconds of the broker termination. The publisher rates remained stable with minimal disruptions throughout the test. Despite this, there was an increase in p99 publish latency (as seen in Figure 3), rising from 15 milliseconds to 40 milliseconds. This latency increase persisted for the test's duration, even after the terminated broker was resumed.",[3933,45771,40202],{"id":40201},[48,45773,45774],{},"All platforms successfully transferred the work of a failed broker to other nodes and maintained the target publisher rate. It’s important to note that NATS JetStream did not achieve this consistently. Both RabbitMQ and NATS JetStream showed an increase in p99 publish latency, which was expected, but they did not recover after the reintroduction of the terminated broker. 
This suggests that the platforms did not effectively redistribute the work to the resumed broker.",[48,45776,45777],{},"In contrast, Pulsar was the only platform that consistently and successfully transferred the work to other nodes and maintained an unaffected publish rate with a slight increase in p99 latency. Moreover, Pulsar was able to achieve an average publish rate of 260K msgs\u002Fs when running on fewer nodes, demonstrating its ability to scale efficiently even in the face of node failures.",[32,45779,45643],{"id":45780},"_2-topic-counts-1",[3933,45782,45725],{"id":45783},"test-parameters-1",[48,45785,45786],{},"In this test, we ran multiple rounds on each platform, varying the number of independent topics in each round and measuring the publish throughput and latency.",[3933,45788,36878],{"id":45789},"test-results-1",[48,45791,45792],{},[384,45793],{"alt":45794,"src":45795},"Figure 4 - Maximum Producer Throughput (msgs\u002Fs) by Number of Topics","\u002Fimgs\u002Fblogs\u002F63ff8d9d4c5dcf0387288ace_Figure-4-maximum-producer-throughput.png",[48,45797,45798],{},[384,45799],{"alt":45800,"src":45801},"Table showing maximum producer throughput by number of topics","\u002Fimgs\u002Fblogs\u002F63ff8dc88c4561a15cda7c31_Screen-Shot-2023-03-01-at-9.39.12-AM.png",[48,45803,45804],{},[384,45805],{"alt":45806,"src":45807},"Figure 5 - Producer P99 Latency (ms) by Number of Topics","\u002Fimgs\u002Fblogs\u002F63ff8e0984f06281f056c7b3_Figure-5-producer-p99-latency.png",[48,45809,45810],{},[384,45811],{"alt":45812,"src":45813},"Table showing producer p99 latency by number of topics","\u002Fimgs\u002Fblogs\u002F63ff8e3675c3e91338887177_Screen-Shot-2023-03-01-at-9.41.00-AM.png",[48,45815,45816],{},"Pulsar – The platform achieved an aggregate publisher throughput of 1M msgs\u002Fs with a topic count between 10 and 50. Across thousands of topics, Pulsar maintained low publisher p99 latency, ranging from single-digit milliseconds to low hundreds of milliseconds (~7 ms to 300 ms).",[48,45818,45819],{},"We can see from the chart that there was a negative inflection point in the throughput when the number of topics exceeded 100. This variation can be attributed to the effectiveness of batching at different topic counts:",[321,45821,45822,45825],{},[324,45823,45824],{},"With fewer topics, the throughput per topic is relatively high, which makes for a very high batching ratio (messages\u002Fbatch). This means that it’s very efficient to move a large number of messages through the platform in a small number of batches. In these conditions, the bottleneck is typically the I\u002FO system.",[324,45826,45827],{},"With more topics, we are spreading the throughput over a larger number of them. The per-topic throughput is therefore lower and the batching ratio decreases, until we end up with just one message per batch. At this point, the bottleneck has shifted from the I\u002FO system to the CPU.",[48,45829,45830],{},"RabbitMQ – The publisher throughput fluctuated between 20K and 40K msgs\u002Fs across the range of topics. Meanwhile, the p99 publish latency rose significantly, often reaching multiple seconds and ranging from 344 milliseconds to nearly 14 seconds. Testing was stopped after 500 topics as it became challenging to construct the topics in a reasonable amount of time.",[48,45832,45833],{},"NATS JetStream – The best performance was observed when using 10 to 50 topics, with a rate of 50K msgs\u002Fs. 
As the number of topics increased beyond 50, the throughput gradually decreased. The p99 publisher latencies also started to increase, starting from 75 milliseconds at 10 topics to over one second at 100 topics. The testing was stopped at 500 topics due to the difficulty in constructing additional topics, but the system could still handle 30K msgs\u002Fs at this configuration.",[3933,45835,40202],{"id":40257},[48,45837,45838],{},"The results suggest that all of the platforms tested could handle larger topic counts in real-world scenarios where topics accumulate gradually over time, rather than the time-consuming process of generating test topics. Despite this, RabbitMQ and NATS JetStream demonstrated a performance decline when concurrently publishing a very large number of topics.",[48,45840,45841],{},"On the other hand, Pulsar outperformed RabbitMQ and NATS JetStream in the number of topics, publish rate, and latency. The results show that Pulsar could handle 10 times more topics. Pulsar achieved up to 1M msgs\u002Fs, surpassing RabbitMQ by 33 times and NATS JetStream by 20 times. Pulsar also demonstrated exceptional latency performance, with p99 latency 300 times better than RabbitMQ and 40 times better than NATS JetStream at 50 topics. Pulsar was able to maintain producer throughput of 200K msgs\u002Fs at 20K topics.",[32,45843,45650],{"id":45844},"_3-subscription-counts-1",[3933,45846,45725],{"id":45847},"test-parameters-2",[48,45849,45850],{},"In this test, we expected a significant boost in reads with a larger number of subscribers. To achieve this, we limited the number of concurrent topics to 50, assigned a single consumer to each subscription, and set a minimum aggregate publish rate of 1K msgs\u002Fs for the platform.",[3933,45852,36878],{"id":45853},"test-results-2",[48,45855,45856],{},[384,45857],{"alt":45858,"src":45859},"Figure 6 - Maximum Producer Throughput (msgs\u002Fs) by Number of Subscriptions","\u002Fimgs\u002Fblogs\u002F63ff8e7dc3b31433fb68acbf_Figure-6-maximum-producer-throughput-by-subscriptions.png",[48,45861,45862],{},[384,45863],{"alt":45864,"src":45865},"Table showing maximum producer throughput by number of subscriptions","\u002Fimgs\u002Fblogs\u002F63ff8ecfdff9895f71dc2cf6_Screen-Shot-2023-03-01-at-9.43.28-AM.png",[48,45867,45868],{},[384,45869],{"alt":45870,"src":45871},"Figure 7 - Maximum Consumer Throughput (msgs\u002Fs) by Number of Subscriptions","\u002Fimgs\u002Fblogs\u002F63ff8f1fdff98968d3dc5988_Figure-7-maximum-consumer-throughput-by-subscriptions.png",[48,45873,45874],{},[384,45875],{"alt":45876,"src":45877},"Table showing maximum consumer throughput by number of subscriptions","\u002Fimgs\u002Fblogs\u002F63ff8f66f3768148dfb0469d_Screen-Shot-2023-03-01-at-9.46.04-AM.png",[48,45879,45880],{},[384,45881],{"alt":45882,"src":45883},"Figure 8 - Producer P99 Latency (ms) by Number of Subscriptions","\u002Fimgs\u002Fblogs\u002F63ff8fba7516a596b978ecbd_Figure-8-producer-p99-latency.png",[48,45885,45886],{},[384,45887],{"alt":45888,"src":45889},"Table showing producer p99 latency by number of subscriptions","\u002Fimgs\u002Fblogs\u002F63ff8fe31baa296fba221c94_Screen-Shot-2023-03-01-at-9.48.10-AM.png",[48,45891,45892],{},"Pulsar – We were able to achieve a maximum of 5K subscriptions per topic before consumers started to fall behind. However, even with higher subscription numbers, the publish latency remained low. 
In fact, we measured peak consumer throughput at an impressive 2.6M msgs\u002Fs.",[48,45894,45895,45896,190],{},"During our test, we identified an issue with many concurrent I\u002FO threads competing for the same resource. However, we were able to address this in Pulsar version 2.11.1. For more information on this issue, please refer to the ",[55,45897,45900],{"href":45898,"rel":45899},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F19341",[264],"GitHub PR #19341",[48,45902,45903],{},"RabbitMQ – The maximum number of successful subscriptions per topic achieved was 64. Beyond that, the publish rate dropped to around 500 msgs\u002Fs, and the p99 publish latency increased significantly to tens of seconds. Additionally, the clients became unresponsive beyond 64 subscriptions. However, the aggregate consumer throughput remained around 35K msgs\u002Fs and reached a peak of 48K msgs\u002Fs when there were eight subscriptions.",[48,45905,45906],{},"NATS JetStream – We achieved a maximum of 128 subscriptions per topic. As the number of subscriptions increased, there was an increase in publisher errors and lagging consumers. Despite this, the publish latency remained consistently low, ranging from 3 milliseconds to 34 milliseconds across all subscriptions. The highest consumer throughput was recorded at 160K msgs\u002Fs during eight to 32 subscriptions.",[3933,45908,40202],{"id":40301},[48,45910,45911],{},"As expected in this test case, end-to-end throughput became limited by the consumer. Pulsar was able to support hundreds of subscriptions per topic while maintaining very low publish latency. RabbitMQ and NATS JetStream achieved fewer subscriptions, and RabbitMQ experienced a significant increase in publish latency as the number of subscriptions increased. Pulsar stood out as the most efficient platform, demonstrating a publish rate and an aggregate consumer throughput that were both an order of magnitude higher than the other platforms.",[32,45913,45657],{"id":45914},"_4-backlog-draining-1",[3933,45916,45725],{"id":45917},"test-parameters-3",[48,45919,45920],{},"In this test, the conditions were set to generate a backlog of messages before consumer activity began. Once the desired backlog size was reached, consumers were started, and messages continued to be produced at the specified rate. The backlog size was set to 300GB, larger than the available RAM of the brokers, simulating a scenario in which reads would need to come from slower disks rather than memory-resident caches. This was done to evaluate the platform's ability to handle catch-up reads, a common challenge in real-world scenarios.",[48,45922,45923],{},"During the tests, messages were produced on 100 topics, with 16 producers per topic. 
Messages were consumed using a single subscription per topic, shared between 16 consumers.",[3933,45925,36878],{"id":45926},"test-results-3",[48,45928,45929],{},[384,45930],{"alt":45931,"src":45932},"Figure 9 - Queue Backlog and Recovery - Producer Throughput (msgs\u002Fs)","\u002Fimgs\u002Fblogs\u002F63ff9029d2771029701f5d6a_Figure-9-queue-backlog-and-recovery.png",[48,45934,45935],{},[384,45936],{"alt":45937,"src":45938},"Table showing average producer throughput before, during, and after backlog drain","\u002Fimgs\u002Fblogs\u002F63ff904a2e4e1f9932eaa005_Screen-Shot-2023-03-01-at-9.49.53-AM.png",[48,45940,45941],{},[384,45942],{"alt":45943,"src":45944},"Figure 10 - Queue Backlog and Recovery - Consumer Throughput (msgs\u002Fs)","\u002Fimgs\u002Fblogs\u002F63ff90957fa1cc8947942ec2_Figure-10-queue-backlog-and-recovery.png",[48,45946,45947],{},[384,45948],{"alt":45949,"src":45950},"Table showing average consumer throughput before, during, and after backlog drain","\u002Fimgs\u002Fblogs\u002F63ff90bdca4b648ff37d4eb5_Screen-Shot-2023-03-01-at-9.51.47-AM.png",[48,45952,45953],{},[384,45954],{"alt":45955,"src":45956},"Figure 11  - Queue Backlog and Recovery - Producer P99 Latency (ms)","\u002Fimgs\u002Fblogs\u002F63ff91007223588f223a0a4e_Figure-11-backlog-drain-p99-latency.png",[48,45958,45959],{},[384,45960],{"alt":45961,"src":45962},"Figure showing average producer p99 latency before, during, and after backlog drain","\u002Fimgs\u002Fblogs\u002F63ff9126fffc706c6af719ef_Screen-Shot-2023-03-01-at-9.53.32-AM.png",[48,45964,45965],{},"Pulsar – In this test, Pulsar delivered impressive results in terms of producer and catch-up read rates. The producer rate remained stable at 100K msgs\u002Fs before, during, and after the drain, and catch-up reads averaged 200K msgs\u002Fs. The drain itself was completed in approximately 45 minutes.",[48,45967,45968],{},"During the backlog drain phase, a slight increase in p99 publish latency from 4.7 milliseconds to 5.3 milliseconds was observed. However, this was expected due to the increased contention between producers and consumers.",[48,45970,45971],{},"One of the most noteworthy findings of the test was that Pulsar’s consumer throughput returned to its pre-drain level after the drain was complete. This showcased Pulsar’s ability to handle high volumes of data without compromising performance.",[48,45973,45974],{},"RabbitMQ –RabbitMQ was able to achieve its target producer rate of 30K msgs\u002Fs, but the platform faced a challenge when reads dominated during backlog production, leading to a steal of IOPS and hindering message production. This resulted in a reduction of the producer rate to 12.5K msgs\u002Fs, with a latency increase of three times from 11 to 34 seconds. However, the catch-up reads were swift, starting at 80K msgs\u002Fs and steadily rising to 200K msgs\u002Fs. After 50 minutes, most of the backlog had been drained, and the producer throughput was regained, with the latency returning to approximately 13 seconds. Despite a consistent yet small consumer backlog, the platform remained stable.",[48,45976,45977],{},"NATS JetStream – Unfortunately, NATS could not produce any results in this test. The clients encountered OOM errors while building the backlog, which we suspect might be due to a potential issue in the jnats library.",[3933,45979,40202],{"id":40345},[48,45981,45982],{},"Pulsar demonstrated impressive producer and catch-up read rates during the test, with stable performance before, during, and after the drain. 
Pulsar's consumer throughput returned to its pre-drain level, showcasing its ability to handle high volumes of data without compromising performance. Pulsar also outperformed RabbitMQ by being 3.3 times faster in producing and consuming, and the drain would have been completed even faster if Pulsar had been set to a 30K msgs\u002Fs producer rate.",[48,45984,45985],{},"RabbitMQ demonstrated some impressive consumer rates when reading the backlog. However, this came at the cost of message production, as the consumers had clear priority. In a real-world scenario, applications would be unable to produce during the catch-up read and would have to either drop messages or take other mitigating actions.",[48,45987,45988],{},"It would have been interesting to see how NATS JetStream performed in this area, but further work will be needed to investigate and resolve the suspected client issue.",[40,45990,2125],{"id":2122},[48,45992,45993],{},"The benchmark tests showed that Pulsar can handle significantly larger workloads than RabbitMQ and NATS JetStream and remain highly performant in various scenarios. Pulsar proved its reliability in the presence of node failure and its high scalability for both topics and subscriptions. Conversely, RabbitMQ and NATS JetStream both showed a decline in performance when concurrently publishing a large number of topics.",[48,45995,45996],{},"The results suggest that while all three platforms are suitable for real-world scenarios, it is crucial to carefully evaluate and choose the technology that best aligns with the specific needs and priorities of the application.",[48,45998,45999],{},"Key findings summarizing Pulsar’s performance:",[1666,46001,46002,46005,46008,46011],{},[324,46003,46004],{},"Pulsar maintained high publish rates despite broker or bookie failure. No degradation in rates occurred when running on fewer nodes, with 5 times greater maximum publish rates than RabbitMQ and NATS JetStream.",[324,46006,46007],{},"Pulsar achieved high performance with 1M msgs\u002Fs, surpassing RabbitMQ by 33 times and NATS JestStream by 20 times. With a topic count of 50, p99 latency was 300 times better than RabbitMQ and 40 times better than NATS JetStream. Pulsar was able to maintain a producer throughput of 200K msgs\u002Fs at 20K topics. In contrast, RabbitMQ and NATS JetStream failed to construct topics beyond 500 counts.",[324,46009,46010],{},"Pulsar supported 1,024 subscriptions per topic without impacting consumer performance, while maintaining low publish latency and achieving a peak consumer throughput of 2.6M msgs\u002Fs. This was 54 times faster than RabbitMQ and 43 times faster than NATS JetStream.",[324,46012,46013],{},"Pulsar achieved stable publish rates and an average catch-up read throughput of 200K msgs\u002Fs during the backlog drain test case. In comparison, RabbitMQ’s publish rate dropped by over 50 percent during draining and resulted in an increase in publish latency by three times.",[48,46015,46016],{},"RabbitMQ may be a suitable option for applications with a small number of topics and a consistent publisher throughput, as the platform struggles to deal with node failures and large backlogs. NATS may be a good choice for applications with lower message rates and a limited number of topics (less than 50). 
Overall, the results show Pulsar outperforms RabbitMQ and NATS JetStream in terms of throughput, latency, and scalability, making Pulsar a strong candidate for large-scale messaging applications.",[32,46018,33331],{"id":32196},[48,46020,33334],{},[1666,46022,46023,46027,46031,46036],{},[324,46024,31889,46025,190],{},[55,46026,31893],{"href":31892},[324,46028,31896,46029,190],{},[55,46030,31899],{"href":27773},[324,46032,31902,46033,190],{},[55,46034,31906],{"href":31692,"rel":46035},[264],[324,46037,31909,46038,190],{},[55,46039,31914],{"href":31912,"rel":46040},[264],[32,46042,22673],{"id":22672},[48,46044,46045,46047],{},[2628,46046,42523],{}," Comparing Pulsar and Kafka: Unified Queuing and Streaming:",[48,46049,46050],{},[55,46051,45540],{"href":45540,"rel":46052},[264],[48,46054,46055,46058,46059],{},[2628,46056,46057],{},"2"," The Linux Foundation Open Messaging Benchmark suite: ",[55,46060,46063],{"href":46061,"rel":46062},"http:\u002F\u002Fopenmessaging.cloud\u002Fdocs\u002Fbenchmarks\u002F",[264],"http:\u002F\u002Fopenmessaging.cloud\u002Fdocs\u002Fbenchmarks",[48,46065,46066,46069,46070],{},[2628,46067,46068],{},"3"," The Open Messaging Benchmark Github repo: ",[55,46071,39996],{"href":39996,"rel":46072},[264],[48,46074,46075,46078],{},[2628,46076,46077],{},"4"," GitHub Pull Request #19341:",[48,46080,46081],{},[55,46082,45898],{"href":45898,"rel":46083},[264],[48,46085,3931],{},{"title":18,"searchDepth":19,"depth":19,"links":46087},[46088,46091,46096,46100,46106],{"id":45530,"depth":19,"text":45531,"children":46089},[46090],{"id":22052,"depth":279,"text":22053},{"id":19155,"depth":19,"text":19156,"children":46092},[46093,46094,46095],{"id":45597,"depth":279,"text":821},{"id":45606,"depth":279,"text":11043},{"id":45615,"depth":279,"text":45616},{"id":45625,"depth":19,"text":45626,"children":46097},[46098,46099],{"id":39944,"depth":279,"text":45629},{"id":39989,"depth":279,"text":45663},{"id":45714,"depth":19,"text":45715,"children":46101},[46102,46103,46104,46105],{"id":45721,"depth":279,"text":45636},{"id":45780,"depth":279,"text":45643},{"id":45844,"depth":279,"text":45650},{"id":45914,"depth":279,"text":45657},{"id":2122,"depth":19,"text":2125,"children":46107},[46108,46109],{"id":32196,"depth":279,"text":33331},{"id":22672,"depth":279,"text":22673},"2023-03-01","Our comparison of messaging platforms looks at the performance, architecture, features, and ideal applications for Apache Pulsar, RabbitMQ, and NATS JetStream.","\u002Fimgs\u002Fblogs\u002F63ff931387c5e89f84e91fb0_Pulsar-Rabbitmq-benchmark.png",{},"14 min read",{"title":33989,"description":46111},"blog\u002Fcomparison-of-messaging-platforms-apache-pulsar-vs-rabbitmq-vs-nats-jetstream",[11043,799,821,10503],"-fBdQzc6VEazpJwZvZpe1PmbRdTzoJGD9_qpfRoGvmI",{"id":46120,"title":43265,"authors":46121,"body":46123,"category":821,"createdAt":290,"date":46345,"description":46346,"extension":8,"featured":294,"image":46347,"isDraft":294,"link":290,"meta":46348,"navigation":7,"order":296,"path":43264,"readingTime":33204,"relatedResources":290,"seo":46349,"stem":46350,"tags":46351,"__hash__":46352},"blogs\u002Fblog\u002Fpulsar-operators-tutorial-part-4-use-kpack-to-streamline-the-build-process.md",[46122],"Yuwei Sung",{"type":15,"value":46124,"toc":46339},[46125,46130,46137,46140,46144,46160,46165,46171,46176,46182,46187,46193,46196,46202,46205,46209,46212,46217,46223,46226,46232,46241,46247,46254,46256,46259,46262,46278,46280,46285],[916,46126,46127],{},[48,46128,46129],{},"Note: StreamNative now offers a unified approach to 
managing Pulsar clusters on Kubernetes systems, transitioning from two distinct versions of operators—Pulsar Operators (Basic Version) and StreamNative Operator (Advanced Version)—to a single, consolidated operator, StreamNative Operator, effective from the start of 2024. As part of this change, we will cease the release of new versions of Pulsar Operators, with future updates and enhancements being exclusively available through the StreamNative Operator, accessible only via StreamNative's paid services.",[48,46131,46132,46133,46136],{},"In the ",[55,46134,46135],{"href":43256},"previous blog",", I demonstrated how to containerize Pulsar client apps (producer and consumer) using Dockerfiles in VS Code. This is probably the most common way for the cloud-native build process. However, as Pulsar supports many languages, maintaining different Dockerfiles for Pulsar consumer\u002Fproducer\u002Ffunction apps can be difficult as your system grows. For example, specifying dependency versions, changing base build and run images, mounting new ConfigMaps and Secrets (externalizing configurations), and adding TLS certificates can become more complicated. Using Dockerfiles forces developers to maintain those items while writing cloud-native apps.",[48,46138,46139],{},"In this blog, I will demonstrate how to streamline this process using kpack so that developers can focus on writing Pulsar producers, consumers, or functions with different languages.",[40,46141,46143],{"id":46142},"install-and-configure-kpack","Install and configure kpack",[48,46145,46146,46151,46152,4003,46156,190],{},[55,46147,46150],{"href":46148,"rel":46149},"https:\u002F\u002Fgithub.com\u002Fpivotal\u002Fkpack",[264],"kpack"," is a Kubernetes operator implementing Cloud Native Buildpacks. If you like Google Cloud Build and want to implement it in your Kubernetes clusters, kpack is an ideal tool. For kpack\u002Fbuildpacks details, you can find their concepts ",[55,46153,267],{"href":46154,"rel":46155},"https:\u002F\u002Fbuildpacks.io\u002Fdocs\u002F",[264],[55,46157,267],{"href":46158,"rel":46159},"https:\u002F\u002Fbuildpacks.io\u002Fdocs\u002Ftools\u002Fkpack\u002F",[264],[1666,46161,46162],{},[324,46163,46164],{},"kpack provides a Kubernetes operator. First, you must install the kpack operator in the Kubernetes namespace kpack.",[8325,46166,46169],{"className":46167,"code":46168,"language":8330},[8328],"kubectl create namespace kpack\nkubectl apply -n kpack -f https:\u002F\u002Fgithub.com\u002Fpivotal\u002Fkpack\u002Freleases\u002Fdownload\u002Fv0.5.4\u002Frelease-0.5.4.yaml\n",[4926,46170,46168],{"__ignoreMap":18},[1666,46172,46173],{"start":19},[324,46174,46175],{},"Once the operator is installed, store the pull Secret of the Docker registry so the kpack operator can store the images. You can create a Secret to store your Docker registry pull credential or robot token.",[8325,46177,46180],{"className":46178,"code":46179,"language":8330},[8328],"kubectl create secret -n kpack docker-registry mydocker \\\n                 --docker-username= \\\n                 --docker-password= \\\n                 --docker-server=https:\u002F\u002Findex.docker.io\u002Fv1\u002F\n",[4926,46181,46179],{"__ignoreMap":18},[1666,46183,46184],{"start":279},[324,46185,46186],{},"Create a service account in the kpack namespace and associate the Secret with this service account. Note that you need secrets and imagePullSecrets in this service account.",[8325,46188,46191],{"className":46189,"code":46190,"language":8330},[8328],"kubectl apply -f - \n4. 
Create a custom resource ClusterStore to store the necessary buildpacks. Here, I list some buildpacks for Python (cpython, python-start, pip-install, pip, procfile and ca-certificates). Refer to the [kpack doc](https:\u002F\u002Fgithub.com\u002Fpivotal\u002Fkpack) for more details. \n\n",[4926,46192,46190],{"__ignoreMap":18},[48,46194,46195],{},"kubectl apply -f -\n5. Create a cluster stack which defines the build and run images. From this custom resource, you may find it is similar to “multi-stage build” in a Dockerfile.",[8325,46197,46200],{"className":46198,"code":46199,"language":8330},[8328],"kubectl apply -f - \n6. Define a builder. A kpack builder is similar to “docker build, tag, push”.\n\n",[4926,46201,46199],{"__ignoreMap":18},[48,46203,46204],{},"kubectl apply -f -",[40,46206,46208],{"id":46207},"build-the-client-app","Build the client app",[48,46210,46211],{},"After you deploy a ClusterStore, a ClusterStack and a Builder, you are ready to build some images. These images are defined as Custom Resources too.",[1666,46213,46214],{},[324,46215,46216],{},"Create the producer image.",[8325,46218,46221],{"className":46219,"code":46220,"language":8330},[8328],"kubectl apply -f - \n2. Create the consumer image.\n\n",[4926,46222,46220],{"__ignoreMap":18},[48,46224,46225],{},"kubectl apply -f -\n3. Once those two image CRs are applied, you can use kp (kpack cli) or kubectl to check the build status. After “Steps Completed” reaches “export”, you can find that the image is pushed to the Docker registry you specified in the “image.spec.tag”.",[8325,46227,46230],{"className":46228,"code":46229,"language":8330},[8328],"kubectl describe -n kpack build pulsar-consumer-image-build-1\n  …\n  Steps Completed:\n    prepare\n    analyze\n    detect\n    restore\n    build\n    export\n   …\n",[4926,46231,46229],{"__ignoreMap":18},[1666,46233,46234],{"start":20920},[324,46235,46236,46237,46240],{},"You can reuse the ConfigMap and Deployment in ",[55,46238,46239],{"href":43256},"Part 3"," to test the container images. The following code is the same as the one in Part 3.",[8325,46242,46245],{"className":46243,"code":46244,"language":8330},[8328],"kubectl apply -f - \n5. Create a Deployment and a ConfigMap for the consumer.\n\n",[4926,46246,46244],{"__ignoreMap":18},[48,46248,46249,46250,46253],{},"kubectl apply -f -\nOnce these two containers are deployed, you should find that the messages have been delivered. Then, you can follow the same ArgoCD project in ",[55,46251,46252],{"href":43151},"Part 2",". You can git push the kpack CRs to a GitHub repository and create an ArgoCD app to automate the image build process.",[40,46255,2125],{"id":2122},[48,46257,46258],{},"This blog shows how we can automate the container build process with two Pulsar Python client apps. As you can see, the Python code is just a GitHub repository tag in this tutorial. 
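That repository tag is wired to the builder through a kpack Image resource. As a rough, hypothetical sketch only (the image tag, Git URL, service account, and builder name below are placeholders, not the exact manifest from this tutorial):

```bash
# Hypothetical kpack Image resource (kpack.io/v1alpha2); every name below is a placeholder.
kubectl apply -n kpack -f - <<'EOF'
apiVersion: kpack.io/v1alpha2
kind: Image
metadata:
  name: pulsar-producer-image
  namespace: kpack
spec:
  tag: index.docker.io/myuser/pulsar-producer   # image pushed here after each build
  serviceAccountName: kpack-service-account     # must reference the registry Secret
  builder:
    name: my-builder
    kind: Builder
  source:
    git:
      url: https://github.com/myuser/pulsar-python
      revision: kpack
EOF

# kpack creates a Build for each source or builder change; watch progress with:
kubectl get builds -n kpack
```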
Whenever developers push their code to GitHub, the kpack build process will kick in and rebase the image.",[48,46260,46261],{},"You can find the example in my GitHub repositories.",[321,46263,46264,46271],{},[324,46265,46266],{},[55,46267,46270],{"href":46268,"rel":46269},"https:\u002F\u002Fgithub.com\u002Fyuweisung\u002Fpulsar-python\u002Ftree\u002Fkpack",[264],"Pulsar Python client code",[324,46272,46273],{},[55,46274,46277],{"href":46275,"rel":46276},"https:\u002F\u002Fgithub.com\u002Fyuweisung\u002Fkpack-pulsar",[264],"kpack Python example",[40,46279,38376],{"id":38375},[48,46281,38379,46282,40419],{},[55,46283,38384],{"href":38382,"rel":46284},[264],[321,46286,46287,46297,46302,46306,46313,46319,46325,46333],{},[324,46288,45457,46289,29496,46291,1154,46294,45209],{},[55,46290,39553],{"href":45460},[55,46292,45465],{"href":45463,"rel":46293},[264],[55,46295,45208],{"href":45206,"rel":46296},[264],[324,46298,38390,46299,190],{},[55,46300,31914],{"href":31912,"rel":46301},[264],[324,46303,45476,46304,45480],{},[55,46305,3550],{"href":45479},[324,46307,46308,758,46311],{},[2628,46309,46310],{},"﻿Blog",[55,46312,43242],{"href":43241},[324,46314,46315,758,46317],{},[2628,46316,40436],{},[55,46318,43249],{"href":43151},[324,46320,46321,758,46323],{},[2628,46322,40436],{},[55,46324,43257],{"href":43256},[324,46326,46327,758,46329],{},[2628,46328,46310],{},[55,46330,46332],{"href":46331},"\u002Fblog\u002Fstreamnatives-pulsar-operators-certified-red-hat-openshift-operators","StreamNative’s Pulsar Operators Certified as Red Hat OpenShift Operators",[324,46334,46335,758,46337],{},[2628,46336,46310],{},[55,46338,43234],{"href":43233},{"title":18,"searchDepth":19,"depth":19,"links":46340},[46341,46342,46343,46344],{"id":46142,"depth":19,"text":46143},{"id":46207,"depth":19,"text":46208},{"id":2122,"depth":19,"text":2125},{"id":38375,"depth":19,"text":38376},"2023-02-28","Learn how to streamline the build process for your Pulsar apps with kpack.","\u002Fimgs\u002Fblogs\u002F640639e5e725e073193db07d_pulsar-operators-tutorial-part-4-use-kpack-to-streamline-the-build-process.jpg",{},{"title":43265,"description":46346},"blog\u002Fpulsar-operators-tutorial-part-4-use-kpack-to-streamline-the-build-process",[38442,821,16985],"nx9Bsw9wDupkB3PdCB2js2iUhgunDn8ujPvWIHHZO44",{"id":46354,"title":46355,"authors":46356,"body":46358,"category":821,"createdAt":290,"date":46750,"description":46751,"extension":8,"featured":294,"image":46752,"isDraft":294,"link":290,"meta":46753,"navigation":7,"order":296,"path":46754,"readingTime":4475,"relatedResources":290,"seo":46755,"stem":46756,"tags":46757,"__hash__":46758},"blogs\u002Fblog\u002Fspring-into-pulsar-part-3-building-an-application-with-the-new-spring-library-for-apache-pulsar.md","Spring into Pulsar Part 3: Building An Application with the New Spring Library for Apache Pulsar",[46357],"Tim Spann",{"type":15,"value":46359,"toc":46741},[46360,46364,46372,46393,46397,46410,46416,46419,46425,46428,46434,46437,46443,46446,46452,46460,46463,46466,46472,46478,46487,46491,46499,46505,46508,46520,46523,46529,46532,46535,46590,46592,46595,46597,46671,46673,46706],[40,46361,46363],{"id":46362},"introduction-to-spring-with-pulsar","Introduction to Spring with Pulsar",[48,46365,46366,46367,46371],{},"In the first ",[55,46368,46370],{"href":46369},"\u002Fblog\u002Fspring-into-pulsar","article"," I discussed a way to use Spring with Apache Pulsar via the standard Java framework. 
In this blog, I will show you how to build a simple Spring Pulsar application utilizing the new official Spring Pulsar library.",[916,46373,46374],{},[48,46375,46376,46377,46382,46383,46388,46389,190],{},"The Spring-Pulsar library is currently available in ",[55,46378,46381],{"href":46379,"rel":46380},"https:\u002F\u002Fgithub.com\u002Fspring-projects-experimental\u002Fspring-pulsar",[264],"this GitHub repo",". You can watch a talk on this library at ",[55,46384,46387],{"href":46385,"rel":46386},"https:\u002F\u002Ftanzu.vmware.com\u002Fdeveloper\u002Ftv\u002Fgolden-path\u002F6\u002F",[264],"The Golden Path to Spring One",". The slides are available ",[55,46390,267],{"href":46391,"rel":46392},"https:\u002F\u002Fwww.slideshare.net\u002Fbunkertor\u002Fliving-the-stream-dream-with-pulsar-and-spring-boot",[264],[40,46394,46396],{"id":46395},"building-an-air-quality-application-with-spring-and-pulsar","Building an Air Quality Application with Spring and Pulsar",[48,46398,46399,46400,46403,46404,46409],{},"Below is a diagram of my example application that I will build. As you can see, Apache Pulsar is the lynchpin of this design. It acts as a router, gateway, messaging bus, and data distribution channel.\n",[384,46401],{"alt":18,"src":46402},"\u002Fimgs\u002Fblogs\u002F63f4261f07a375216781ced3_air-quality-app.webp","Figure 1. Building an air quality application with Spring and Pulsar\nFirst, we set the version of Pulsar to build against. For this example, I chose Pulsar ",[55,46405,46408],{"href":46406,"rel":46407},"https:\u002F\u002Fpulsar.apache.org\u002Frelease-notes\u002Fversioned\u002Fpulsar-2.10.1\u002F",[264],"2.10.1",". I am also using JDK 17. You can build a new Spring Boot Maven project with start.spring.io and choose Maven & Java 17. Add these properties in the properties section of your POM file:",[8325,46411,46414],{"className":46412,"code":46413,"language":8330},[8328],"\n    17\n    2.10.1\n\n",[4926,46415,46413],{"__ignoreMap":18},[48,46417,46418],{},"Next, let’s add the Pulsar client dependencies.",[8325,46420,46423],{"className":46421,"code":46422,"language":8330},[8328],"\n    org.springframework.pulsar\n    spring-pulsar-spring-boot-starter\n    0.1.0\n\n",[4926,46424,46422],{"__ignoreMap":18},[48,46426,46427],{},"Now we can compile with the following:",[8325,46429,46432],{"className":46430,"code":46431,"language":8330},[8328],"mvn clean package\n",[4926,46433,46431],{"__ignoreMap":18},[48,46435,46436],{},"To run the application, type:",[8325,46438,46441],{"className":46439,"code":46440,"language":8330},[8328],"mvn spring-boot:run\n",[4926,46442,46440],{"__ignoreMap":18},[48,46444,46445],{},"We need to populate our configuration file (application.yml) with the necessary values to connect to our cluster and ingest data. 
This file is typically in src\u002Fmain\u002Fresources.",[8325,46447,46450],{"className":46448,"code":46449,"language":8330},[8328],"spring:\n    pulsar:\n      client:\n#        service-url: pulsar+ssl:\u002F\u002Fsn-academy.sndevadvocate.snio.cloud:6651\n#        auth-plugin-class-name: org.apache.pulsar.client.impl.auth.oauth2.AuthenticationOAuth2\n#        authentication:\n#          issuer-url: https:\u002F\u002Fauth.streamnative.cloud\u002F\n#          private-key: file:\u002F\u002F\u002Fsndevadvocate-tspann.json\n#          audience: urn:sn:pulsar:sndevadvocate:my-instance\n        service-url: pulsar:\u002F\u002Flocalhost:6650\n      producer:\n        send-timeout-ms: 20000\n        producer-name: airqualityspringbootm1\n        topic-name: persistent:\u002F\u002Fpublic\u002Fdefault\u002Fairquality\nairnowapi:\n  base-url: https:\u002F\u002Fwww.airnowapi.org\n  airquality-uri: \u002Faq\u002Fobservation\u002FzipCode\u002Fcurrent\u002F?format=application\u002Fjson&distance=250&zipCode={zipCode}&API_KEY={apiKey}\n  api-key: ${API_KEY:}\n  zip-codes:\n    - 78701\n    - 08520\n    - 94027\n",[4926,46451,46449],{"__ignoreMap":18},[48,46453,46454,46455,46459],{},"The security.mode and pulsar.service.url are commented out. This allows me to switch between my unsecured development environment and my production StreamNative hosted cloud version. We could automate this or use environment variables to make this more production quality. The airnowapi.url variable is set by the environment and includes a custom token to access Air Now REST feeds. You will need to ",[55,46456,29176],{"href":46457,"rel":46458},"https:\u002F\u002Fdocs.airnowapi.org\u002F",[264]," and get your own if you wish to use this data stream.",[48,46461,46462],{},"We can now start building our application. First, we will need to configure our connection to our Pulsar cluster.",[48,46464,46465],{},"We can now configure a template to use in our service.",[8325,46467,46470],{"className":46468,"code":46469,"language":8330},[8328],"@Autowired\nprivate PulsarTemplate pulsarTemplate;\n",[4926,46471,46469],{"__ignoreMap":18},[8325,46473,46476],{"className":46474,"code":46475,"language":8330},[8328],"this.pulsarTemplate.setSchema(Schema.JSON(Observation.class));\n",[4926,46477,46475],{"__ignoreMap":18},[48,46479,46480,46481,46486],{},"In the above configuration code, we are building a Pulsar producer that will use a JSON Schema from the ",[55,46482,46485],{"href":46483,"rel":46484},"https:\u002F\u002Fgithub.com\u002Ftspannhw\u002Fairquality\u002Fblob\u002Fmain\u002Fsrc\u002Fmain\u002Fjava\u002Fdev\u002Fdatainmotion\u002Fairquality\u002Fmodel\u002FObservation.java",[264],"Observation"," class we built for our data. The Observation class has some FasterXML Jackson annotations, but is basically a Java bean with fields for date observed, hour observed, state code, latitude and longitude, and all the fields from the REST data feed.",[40,46488,46490],{"id":46489},"producer","Producer",[48,46492,46493,46494,190],{},"Let’s add our business logic and start sending events to our infinite messaging platform. 
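For reference, here is a rough sketch of what the Observation bean described above might look like. This is an assumption based only on the field list mentioned in this post and the AirNow feed; the real class is linked above and carries additional fields from the REST response.

```java
import com.fasterxml.jackson.annotation.JsonIgnoreProperties;
import com.fasterxml.jackson.annotation.JsonProperty;

// Hypothetical sketch of the Observation bean; field names are assumptions
// based on the description above, and the real class contains more fields.
@JsonIgnoreProperties(ignoreUnknown = true)
public class Observation {

    @JsonProperty("DateObserved")
    private String dateObserved;

    @JsonProperty("HourObserved")
    private int hourObserved;

    @JsonProperty("StateCode")
    private String stateCode;

    @JsonProperty("Latitude")
    private double latitude;

    @JsonProperty("Longitude")
    private double longitude;

    // Pulsar's JSON schema and Jackson both rely on standard bean accessors;
    // the remaining getters and setters follow the same pattern.
    public String getDateObserved() { return dateObserved; }
    public void setDateObserved(String dateObserved) { this.dateObserved = dateObserved; }

    public int getHourObserved() { return hourObserved; }
    public void setHourObserved(int hourObserved) { this.hourObserved = hourObserved; }

    public String getStateCode() { return stateCode; }
    public void setStateCode(String stateCode) { this.stateCode = stateCode; }

    public double getLatitude() { return latitude; }
    public void setLatitude(double latitude) { this.latitude = latitude; }

    public double getLongitude() { return longitude; }
    public void setLongitude(double longitude) { this.longitude = longitude; }
}
```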
The full source code is available ",[55,46495,46498],{"href":46496,"rel":46497},"https:\u002F\u002Fgithub.com\u002Ftspannhw\u002Fspring-pulsar-airquality",[264],"in this GitHub repo",[8325,46500,46503],{"className":46501,"code":46502,"language":8330},[8328],"List observations = airQualityService.fetchCurrentObservation();\nMessageId msgid = pulsarTemplate.newMessage(observation)\n    .withMessageCustomizer((mb) -> mb.key(uuidKey.toString()))\n    .send();\n",[4926,46504,46502],{"__ignoreMap":18},[40,46506,24840],{"id":46507},"consumer",[48,46509,46510,46511,46514,46515,46519],{},"Now that we have sent messages, we can also read them with Spring. In this section, we will build a consumer application to test ingesting the data. If we want to add logic, routing, or transformations to the events in one or more topics, we could use a Pulsar Function that we can write in Java, Python, or Go to achieve this instead of Spring Boot microservices. I chose to do both.\n",[384,46512],{"alt":18,"src":46513},"\u002Fimgs\u002Fblogs\u002F63f4266c80124e38c9c950af_real-time-data-pipeline.webp","Figure 2. Real-time data pipeline\nAn example Java Pulsar Function for processing air quality data is available ",[55,46516,46498],{"href":46517,"rel":46518},"https:\u002F\u002Fgithub.com\u002Ftspannhw\u002Fpulsar-airquality-function",[264],". As you can see in the architecture diagram, Functions, microservices, Spark jobs and Flink jobs can all collaborate as part of real-time data pipelines with ease.",[48,46521,46522],{},"We can reuse the connection configuration that we have from the Producer, but we need a configuration to produce our Consumer. The configuration class for the Consumer will need the consumer name, subscription name and topic name from the application.properties file. In the code we set the subscription type and starting point to Shared. We are also using the JSON Schema for Observation as used in the Pulsar Producer.",[8325,46524,46527],{"className":46525,"code":46526,"language":8330},[8328],"@PulsarListener(subscriptionName = \"pm25-spring-reader\", subscriptionType = Shared, schemaType = SchemaType.JSON, topics = \"persistent:\u002F\u002Fpublic\u002Fdefault\u002Faq-pm25\")\n    public void echoObservation(Observation message) {\n        this.log.info(\"PM2.5 Message received: {}\", message);\n    }\n",[4926,46528,46526],{"__ignoreMap":18},[48,46530,46531],{},"As we can see, it is very easy to run the consumer. After we receive the event as a plain old Java object (POJO), we can do whatever we want with the data. For example, you could use another Spring library to store to a database, send to a REST service, or store to a file.",[48,46533,46534],{},"You can also use other protocols or tools for the application. 
See the following examples for details.",[321,46536,46537,46544,46551,46558,46564,46571,46578,46584],{},[324,46538,46539],{},[55,46540,46543],{"href":46541,"rel":46542},"https:\u002F\u002Fgithub.com\u002Ftspannhw\u002Fairquality-mqtt-consumer",[264],"MQTT to MoP",[324,46545,46546],{},[55,46547,46550],{"href":46548,"rel":46549},"https:\u002F\u002Fgithub.com\u002Ftspannhw\u002Fairquality-amqp-consumer",[264],"AMQP to AoP",[324,46552,46553],{},[55,46554,46557],{"href":46555,"rel":46556},"https:\u002F\u002Fgithub.com\u002Ftspannhw\u002Fairquality-kafka-consumer",[264],"Kafka to KoP",[324,46559,46560],{},[55,46561,562],{"href":46562,"rel":46563},"https:\u002F\u002Fgithub.com\u002Ftspannhw\u002Fpulsar-airquality-timeplus",[264],[324,46565,46566],{},[55,46567,46570],{"href":46568,"rel":46569},"https:\u002F\u002Fgithub.com\u002Ftspannhw\u002Fairquality-datastore",[264],"ScyllaDB",[324,46572,46573],{},[55,46574,46577],{"href":46575,"rel":46576},"https:\u002F\u002Fgithub.com\u002Ftspannhw\u002FFLiPN-AirQuality-REST",[264],"Apache NiFi",[324,46579,46580],{},[55,46581,46583],{"href":46575,"rel":46582},[264],"Websockets \u002F JQuery \u002F HTML",[324,46585,46586],{},[55,46587,46589],{"href":46517,"rel":46588},[264],"Function to Process Data",[40,46591,2125],{"id":2122},[48,46593,46594],{},"The key takeaways are the rich, diverse support given to Spring applications for interacting with Apache Pulsar. Java is a first-class client for Apache Pulsar and this shows its power and flexibility by building your Pulsar applications this way. Let’s Spring into action!",[40,46596,4135],{"id":4132},[321,46598,46599,46609,46617,46626,46635,46643,46653,46662],{},[324,46600,46601,758,46604],{},[2628,46602,46603],{},"Source Code",[55,46605,46608],{"href":46606,"rel":46607},"https:\u002F\u002Fgithub.com\u002Ftspannhw\u002Fairquality",[264],"Air quality example code",[324,46610,46611,758,46613],{},[2628,46612,46603],{},[55,46614,46616],{"href":46517,"rel":46615},[264],"Pulsar air quality function",[324,46618,46619,758,46621],{},[2628,46620,46603],{},[55,46622,46625],{"href":46623,"rel":46624},"https:\u002F\u002Fgithub.com\u002Ftspannhw\u002Fairquality-consumer",[264],"Air quality consumer",[324,46627,46628,758,46630],{},[2628,46629,46603],{},[55,46631,46634],{"href":46632,"rel":46633},"https:\u002F\u002Fgithub.com\u002Ftspannhw\u002FFLiPN-AirQuality-Checks",[264],"FLiPN air quality checks",[324,46636,46637,758,46639],{},[2628,46638,46603],{},[55,46640,46642],{"href":46575,"rel":46641},[264],"FLiPN air quality REST",[324,46644,46645,758,46648],{},[2628,46646,46647],{},"GitHub Repo",[55,46649,46652],{"href":46650,"rel":46651},"https:\u002F\u002Fgithub.com\u002Fmajusko\u002Fpulsar-java-spring-boot-starter",[264],"Spring Boot Starter for Apache Pulsar",[324,46654,46655,758,46657],{},[2628,46656,46647],{},[55,46658,46661],{"href":46659,"rel":46660},"https:\u002F\u002Fgithub.com\u002Fdatastax\u002Freactive-pulsar",[264],"Reactive Pulsar Adapter",[324,46663,46664,758,46666],{},[2628,46665,42753],{},[55,46667,46670],{"href":46668,"rel":46669},"https:\u002F\u002Fpulsar.apache.org\u002Fdocs\u002F2.11.x\u002Fclient-libraries-java\u002F",[264],"Pulsar Java Client",[40,46672,38376],{"id":38375},[1666,46674,46675,46684,46691,46703],{},[324,46676,46677,46678,1154,46681,46683],{},"Learn Pulsar fundamentals. While this blog did not cover Pulsar fundamentals, there are great resources available to help you learn more. 
If you are new to Pulsar, you can take the ",[55,46679,36487],{"href":36485,"rel":46680},[264],[55,46682,36491],{"href":36490}," developed by the original creators of Pulsar. This will get you started with Pulsar and help accelerate your streaming.‍",[324,46685,46686,46687,46690],{},"Spin up a Pulsar cluster in minutes. If you want to try building microservices without having to set up a Pulsar cluster yourself, sign up for ",[55,46688,3550],{"href":17075,"rel":46689},[264]," today. StreamNative Cloud provides a simple, fast, and cost-effective way to run Pulsar in the public cloud.‍",[324,46692,46693,46694,29496,46696,1154,46699,46702],{},"Submit your session at Pulsar Virtual Summit Europe 2023. Pulsar Virtual Summit Europe 2023 will take place on Tuesday, May 23rd, 2023! See this ",[55,46695,39553],{"href":45460},[55,46697,45465],{"href":45463,"rel":46698},[264],[55,46700,45208],{"href":45206,"rel":46701},[264]," (no fee required).‍",[324,46704,46705],{},"Build microservices with Pulsar. If you are interested in learning more about microservices and Pulsar, take a look at the following resources:",[321,46707,46708,46718,46725,46733],{},[324,46709,46710,46711,46714,46715,190],{},"~",[2628,46712,46713],{},"3-Part Webinar Series"," Building Event-Driven Microservices with Apache Pulsar. Watch the webinars ",[55,46716,267],{"href":46717},"\u002Fwebinars\u002Fbuild-an-event-driven-architecture-with-pulsar-01-13-22",[324,46719,46710,46720,758,46722],{},[2628,46721,40436],{},[55,46723,46724],{"href":46369},"Spring into Pulsar",[324,46726,46710,46727,758,46729],{},[2628,46728,40436],{},[55,46730,46732],{"href":46731},"\u002Fblog\u002Fannouncing-spring-for-apache-pulsar","Announcing Spring for Apache Pulsar",[324,46734,46710,46735,758,46737],{},[2628,46736,40436],{},[55,46738,46740],{"href":46739},"\u002Fblog\u002Fspring-into-pulsar-part-2-spring-based-microservices-multiple-protocols-apache-pulsar","Spring into Pulsar Part 2: Spring-based Microservices for Multiple Protocols with Apache Pulsar",{"title":18,"searchDepth":19,"depth":19,"links":46742},[46743,46744,46745,46746,46747,46748,46749],{"id":46362,"depth":19,"text":46363},{"id":46395,"depth":19,"text":46396},{"id":46489,"depth":19,"text":46490},{"id":46507,"depth":19,"text":24840},{"id":2122,"depth":19,"text":2125},{"id":4132,"depth":19,"text":4135},{"id":38375,"depth":19,"text":38376},"2023-02-21","Learn how to build a simple Spring Pulsar application with the new official Spring Pulsar library.","\u002Fimgs\u002Fblogs\u002F6406944ce2dfd0dc96ea7ba3_spring-into-pulsar-part-3-building-an-application-with-the-new-spring-library-for-apache-pulsar.png",{},"\u002Fblog\u002Fspring-into-pulsar-part-3-building-an-application-with-the-new-spring-library-for-apache-pulsar",{"title":46355,"description":46751},"blog\u002Fspring-into-pulsar-part-3-building-an-application-with-the-new-spring-library-for-apache-pulsar",[38442,821],"s0Onyq01sf39oRxjkPKYRIcZ7nRQB0PX5L1IMH-BQyc",{"id":46760,"title":34047,"authors":46761,"body":46763,"category":821,"createdAt":290,"date":47350,"description":47351,"extension":8,"featured":294,"image":47352,"isDraft":294,"link":290,"meta":47353,"navigation":7,"order":296,"path":34046,"readingTime":31039,"relatedResources":290,"seo":47354,"stem":47355,"tags":47356,"__hash__":47357},"blogs\u002Fblog\u002Fa-practical-guide-to-enterprise-grade-security-in-apache-pulsar.md",[46762],"Teng 
Fu",{"type":15,"value":46764,"toc":47329},[46765,46768,46771,46775,46778,46792,46795,46799,46806,46810,46813,46817,46824,46828,46835,46839,46846,46850,46857,46861,46873,46877,46880,46883,46891,46897,46900,46905,46911,46916,46922,46925,46929,46932,46936,46943,46954,46957,46968,46973,46977,46984,46995,46998,47012,47015,47019,47022,47026,47043,47054,47057,47060,47079,47085,47093,47099,47104,47110,47118,47122,47125,47131,47134,47139,47145,47155,47158,47172,47175,47179,47188,47191,47194,47200,47203,47209,47212,47218,47221,47227,47234,47238,47241,47244,47246,47249,47252,47254,47257,47260,47262,47265,47267,47272],[48,46766,46767],{},"Data security represents an important part for modern data infrastructure. Applying best practices for authentication and authorization ensures enterprise data is accessible only to the right tenants or components. As a messaging and streaming system powering enterprises spanning multiple industries, Apache Pulsar deals with business-critical data. It supports a wide variety of security mechanisms (for example, TLS, Athenz, Kerberos, JWT, and OAuth2.0) for organizations based on their needs.",[48,46769,46770],{},"In this blog, I will introduce available security combinations in Pulsar and then give some best practices for implementing authentication and authorization.",[40,46772,46774],{"id":46773},"understanding-security-mechanisms-in-pulsar","Understanding security mechanisms in Pulsar",[48,46776,46777],{},"By default, all encryption, authentication, and authorization configurations in Pulsar are disabled. This means any client can access the cluster, leaving your sensitive information vulnerable to external eavesdroppers. For enterprises that require strict security controls and safeguards, they can use different security strategies that Pulsar provides.",[321,46779,46780,46783,46786,46789],{},[324,46781,46782],{},"Authentication: Validate the credentials for an entity to establish the connection.",[324,46784,46785],{},"Authorization: Grant permissions (support ACL) to an entity to perform actions on different resources in the cluster.",[324,46787,46788],{},"Transport encryption: Support TLS and mTLS for data security in transit.",[324,46790,46791],{},"End-to-end encryption: Only allow producers and consumers to encrypt and decrypt data.",[48,46793,46794],{},"Now, let’s take a look at how to use different combinations of these frameworks to achieve different levels of security in Pulsar.",[32,46796,46798],{"id":46797},"level-0-full-link-trusted","Level 0: Full Link Trusted",[48,46800,46801,46802,46805],{},"Figure 1 shows an example of Full Link Trusted. In this setting, all components within the cluster have direct access to others and no encryption mechanism is applied. For example, producers can send messages to Pulsar in plain text without any authentication. This configuration is suitable for internal tests or feature validations.\n",[384,46803],{"alt":18,"src":46804},"\u002Fimgs\u002Fblogs\u002F63ed93e5cb82ae2d4f4668a4_nWKhgxL9fYkjjo7nX_MHBBIJhLv7fwr79icbEKj6qxyD5yW815egEvcc1O8LGL7wqYEcsUJi6ipApNsF4O9r7gFwg4DAWfVRBeBO5Rp3bPLvm2nz_565xpbF8suSMGouG06Vs8OeQ4JWeKWJarqAqOM.png","Figure 1. Full Link Trusted",[32,46807,46809],{"id":46808},"level-1-intranet-trusted","Level 1: Intranet Trusted",[48,46811,46812],{},"In Intranet Trusted scenarios, the interactions between the components within the cluster are not restricted, while data transmission outside the cluster is encrypted. 
In these cases, you can enable TLS connections to brokers, proxies, or a load balancer.",[3933,46814,46816],{"id":46815},"tls-connections-to-brokers","TLS connections to brokers",[48,46818,46819,46820,46823],{},"Producers can send data to Pulsar brokers with SSL encryption. The decryption process may require some cluster resources and, to some extent, can impact cluster performance, which is usually acceptable. For interactions between brokers, bookies and ZooKeeper, as no authorization or authentication policy is applied, performance, latency, and throughput within the cluster are not affected.\n",[384,46821],{"alt":18,"src":46822},"\u002Fimgs\u002Fblogs\u002F63ed93e5f18a475a44bd1d81_XaCZ97zxgkhlN7BDedaZaFL8N9URNnGFjK1P5MWCUwbLiS6HG8ETB2Il3x5WqBPSK7Isz__cM8Mclx90sT7kKfjNn916n1_DOrUc8rR5P-pQZ_8GbZet0OebAgVG9MCv226hrAOJguEoQedCk6FNNB4.png","Figure 2. Intranet Trusted - TLS connections to brokers",[3933,46825,46827],{"id":46826},"tls-connections-to-proxies","TLS connections to proxies",[48,46829,46830,46831,46834],{},"Using a proxy layer for your Pulsar cluster, you can expose the address of proxy servers instead of brokers. This is especially useful when Pulsar is deployed in a private environment. In this case, you can configure authentication for data streams on the proxy layer. Communications between brokers, bookies, and ZooKeeper do not have security restrictions.\n",[384,46832],{"alt":18,"src":46833},"\u002Fimgs\u002Fblogs\u002F63ed93e5f6c87d861ff86c79__qqjy_9niK51kLEkMCU0aSTp9UrFeZS_Sat7JsQKDgRt2d8ZbjyF4-tJzilc12iNewPQQkjrHNA7QSM5KCx9lPe4NNWqfKCUgf7v5X3Y9a_q-H_adS8Ikzv-c3B-FXxm33TWIOv4gsxwhn0_Ea6n_sw.png","Figure 3. Intranet Trusted - TLS connections to proxies",[3933,46836,46838],{"id":46837},"tls-connections-to-a-load-balancer","TLS connections to a load balancer",[48,46840,46841,46842,46845],{},"In scenarios requiring a load balancer between clients and proxies, producers first send encrypted data to the load balancer. Then, the proxy validates the data from the load balancer and sends it to brokers, which is similar to the previous use case.\n",[384,46843],{"alt":18,"src":46844},"\u002Fimgs\u002Fblogs\u002F63ed93e5a20e31f7abaa6589_fz999cs3RN8yav-C3O8NWvJugPo-8os0GJh7HE8lt5oAPVTfvBUy6pxUDFtcNrQdi3fGXcl3Vs56Mp-QBcnb_u6chRNqYm-SR2Bsf-YK2SPoC0d6HnrYXaRn-dBlW8cH8VqZz09vIrzYJNuiEzJ5Py8.png","Figure 4. Intranet Trusted - TLS connections to a load balancer",[32,46847,46849],{"id":46848},"level-2-intranet-untrusted","Level 2: Intranet Untrusted",[48,46851,46852,46853,46856],{},"In some cases, external teams may need to access your cluster data, which could lead to extra security concerns. As such, I recommend enabling authentication for all components. Configuring encryption and authentication on all layers\u002Fnodes effectively avoids security issues. Additionally, you can enable authorization on the broker side to verify the permissions. This setting is commonly used for cross-departmental collaboration.\n",[384,46854],{"alt":18,"src":46855},"\u002Fimgs\u002Fblogs\u002F63ed93e560aed53726ebc5c4_SA6WZpxlT-F0_QBOJ-Lihy4m2SRcATt46X8ioDx79nhSOPjAj6fFVyoIrPndSyjfeMc56lv2reyGTiQX6poPI6Q8tadRXjmZxW2C1RmIgXL1XjVzSiy9lIjNK441n1JXloEs0SEKN_ICl3vXT_ct84w.png","Figure 5. Intranet Untrusted",[32,46858,46860],{"id":46859},"level-3-service-untrusted-end-to-end-encryption","Level 3: Service Untrusted (End-to-end encryption)",[48,46862,46863,46864,41750,46869,46872],{},"Some cloud providers may offer Pulsar as a service with high security requirements. 
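To make the TLS options above concrete, here is a minimal sketch (not taken from this post) of a Java client connecting over TLS. The service URL and certificate path are placeholder values; point them at whichever endpoint terminates TLS in your deployment, whether that is the brokers, the proxies, or a load balancer in front of them.

```java
import org.apache.pulsar.client.api.PulsarClient;

public class TlsClientExample {
    public static void main(String[] args) throws Exception {
        // Placeholder endpoint: brokers, proxies, or a load balancer can all sit
        // behind this URL, as long as the presented certificate is trusted below.
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar+ssl://pulsar.example.com:6651")
                // CA certificate used to verify the server's certificate chain.
                .tlsTrustCertsFilePath("/path/to/ca.cert.pem")
                // Keep hostname verification on outside of test environments.
                .enableTlsHostnameVerification(true)
                .allowTlsInsecureConnection(false)
                .build();

        client.close();
    }
}
```

The same client-side settings carry over to the Intranet Untrusted level; there you would additionally pass an authentication plugin to the client builder so the server can establish the client's identity.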
In this setting that adopts end-to-end encryption, data transmission between all components are encrypted. This full-stack security policy only allows producers and consumers to access the original data. This is different from Level 2, where data can still be decrypted on the broker side. For more information, see ",[55,46865,46868],{"href":46866,"rel":46867},"https:\u002F\u002Fpulsar.apache.org\u002Fdocs\u002F2.11.x\u002Fsecurity-encryption\u002F",[264],"End-to-End Encryption in Pulsar",[384,46870],{"alt":18,"src":46871},"\u002Fimgs\u002Fblogs\u002F63ed93e5a81a51c07d67c2a8_uCm8nnBe2-6NntN0vDjK1Z-Zb6yp-MmjsJGSKgJwO9yEGGliW4miK4CTdVWzJ13VOeXCUywy9-VLdb5g_rsSxfljMP6dsF1V0LLTV7JE4PGoJ8cz2WMTN6VpH9a80FIETZ3lzIxqg0AXzZNVVePn9vE.png","Figure 6. Service Untrusted",[40,46874,46876],{"id":46875},"extensible-security-framework","Extensible security framework",[48,46878,46879],{},"Pulsar has a simple and scalable security framework. Enterprises can easily customize authentication and authorization plugins.",[48,46881,46882],{},"On the server side (for example, brokers and proxies), Pulsar validates the identity of clients and records their roles. More specifically, it uses an authentication provider or a “provider chain” to establish the identity of a client and then assign a role token to that client. You can consider the role as the identifier of the client. Available authentication options are listed below:",[321,46884,46885,46888],{},[324,46886,46887],{},"Built-in authentication plugins: TLS, Athenz, Kerberos, JWT, OAuth2.0, and Basic.",[324,46889,46890],{},"Authentication provider chain: You can configure multiple authentication providers at the same time. Pulsar caches all providers locally on the server side and initializes them. For each passed authentication type from the client, Pulsar checks the corresponding provider. It considers a client as valid if it is authenticated via at least one of the configured authentication providers. For example, you can use JWT and OAuth2.0 for authenticationProviders in broker.conf with a comma separating them.",[8325,46892,46895],{"className":46893,"code":46894,"language":8330},[8328],"authenticationProviders=org.apache.pulsar.broker.authentication.AuthenticationProviderToken,org.apache.pulsar.broker.authentication.AuthenticationProviderBasic\n",[4926,46896,46894],{"__ignoreMap":18},[48,46898,46899],{},"Once Pulsar successfully authenticates a client through the authentication provider, it checks whether it has the permissions to perform certain operations. Pulsar offers two authorization plugins, which can be configured through authorizationProvider in broker.conf.",[321,46901,46902],{},[324,46903,46904],{},"AuthorizationProvider: This is the default authorization provider.",[8325,46906,46909],{"className":46907,"code":46908,"language":8330},[8328],"authorizationProvider=org.apache.pulsar.broker.authorization.PulsarAuthorizationProvider\n",[4926,46910,46908],{"__ignoreMap":18},[321,46912,46913],{},[324,46914,46915],{},"MultiRolesTokenAuthorizationProvider: If a client is identified with multiple roles in the token, Pulsar can check all of its roles. The authorization will be successful as long as one of the roles has the required permissions. 
This method is only applicable to JWT authentication.",[8325,46917,46920],{"className":46918,"code":46919,"language":8330},[8328],"authorizationProvider=org.apache.pulsar.broker.authorization.MultiRolesTokenAuthorizationProvider\n",[4926,46921,46919],{"__ignoreMap":18},[48,46923,46924],{},"Note that if you only configure authentication without authorization enabled, any authenticated client will be able to perform any action in your Pulsar cluster.",[40,46926,46928],{"id":46927},"understanding-the-authentication-and-authorization-process","Understanding the authentication and authorization process",[48,46930,46931],{},"Now that we have learned some basics about authentication and authorization in Pulsar, let’s explore how each of them works in more depth.",[32,46933,46935],{"id":46934},"authentication","Authentication",[48,46937,46938,46939,46942],{},"Figure 7 depicts how authentication configurations are initialized.\n",[384,46940],{"alt":18,"src":46941},"\u002Fimgs\u002Fblogs\u002F63ed93e6a20e3127bcaa6594_lPjhPmaP0E1E4CANWGBvOsa35P4XwWZhfl7_QFw4ouanYhAOWoXWd8MOt4ozuKcdR_Dbw-abgdCPGB4Jm-4ZTjCyPgeD5mLkbf0uxYy0441kE1Osj0c7YWN3O1l0p0DF56XVv2vnSWMqwnH1OTSAuIE.png","Figure 7. Initializing authentication configurations",[1666,46944,46945,46948,46951],{},[324,46946,46947],{},"When the broker starts, it runs the BrokerService, which creates the AuthenticationService.",[324,46949,46950],{},"The AuthenticationService obtains authentication providers from broker.conf.",[324,46952,46953],{},"The AuthenticationService initializes and caches authentication plugins.",[48,46955,46956],{},"Authentication occurs when a client connects to Pulsar. The authentication process is as follows:",[1666,46958,46959,46962,46965],{},[324,46960,46961],{},"The client sends a CommandConnect command to the broker, which contains related authentication information.",[324,46963,46964],{},"The AuthenticationService obtains the authentication type (AuthMethodName) through the CommandConnect.",[324,46966,46967],{},"The AuthenticationService calls the authenticate method to authenticate the client.",[916,46969,46970],{},[48,46971,46972],{},"The broker caches the credentials used for authentication and periodically checks whether the credentials have expired. You can customize the interval through authenticationRefreshCheckSeconds in broker.conf, which defaults to 60 seconds.",[32,46974,46976],{"id":46975},"authorization","Authorization",[48,46978,46979,46980,46983],{},"Figure 8 depicts how authorization configurations are initialized.\n",[384,46981],{"alt":18,"src":46982},"\u002Fimgs\u002Fblogs\u002F63ed93e66b83150d889b9894_8ZT5tuve-TxK70oB3FVbmvgVMSsq_soCvh-iKbK4Hwt1efMMMKkbDVsaEG5P_X3eCkkEubnKhBz05FiljLAgbDdketpv1Cu3aWILEvvw3E_uTuY8VzzoyO6AJHp2XERfNoNoVOoTt0xSavJ4GGRc8P0.png","Figure 8. Initializing authorization configurations",[1666,46985,46986,46989,46992],{},[324,46987,46988],{},"When the broker starts, it runs the BrokerService, which creates the AuthorizationService.",[324,46990,46991],{},"The AuthorizationService obtains authorization providers from broker.conf.",[324,46993,46994],{},"The AuthorizationService initializes and caches authorization plugins.",[48,46996,46997],{},"Pulsar’s authorization framework contains different roles that can perform tasks at different levels.",[321,46999,47000,47003,47006,47009],{},[324,47001,47002],{},"Brokers: Superusers are administrators of the Pulsar cluster who have access to all resources. 
They create tenant administrators, who can help them manage tenant resources.",[324,47004,47005],{},"Tenants: Tenant administrators manage tenants and grant permissions to clients.",[324,47007,47008],{},"Namespaces: Tenant administrators set different policies for namespaces, such as retention, backlogs, functions, and resource quotas.",[324,47010,47011],{},"Topics: Clients can produce and consume messages.",[48,47013,47014],{},"Note that only superusers and tenant administrators can grant permissions to users.",[40,47016,47018],{"id":47017},"authentication-best-practices","Authentication best practices",[48,47020,47021],{},"In this section, I will introduce some best practices and tips for configuring authentication using JWT and Kerberos.",[32,47023,47025],{"id":47024},"jwt-authentication","JWT authentication",[48,47027,47028,25379,47033,47038,47039,47042],{},[55,47029,47032],{"href":47030,"rel":47031},"https:\u002F\u002Fjwt.io\u002Fintroduction",[264],"JSON Web Token",[55,47034,47037],{"href":47035,"rel":47036},"https:\u002F\u002Fwww.rfc-editor.org\u002Frfc\u002Frfc7519",[264],"RFC-7519",") is a common authentication method in web services, also known as JWT authentication. It identifies clients through a token string, which consists of three parts separated by dots. See an example in Figure 9.\n",[384,47040],{"alt":18,"src":47041},"\u002Fimgs\u002Fblogs\u002F63ed93e6a900a70550890271_5LHYgyLOvKL4ap1CfUyaFfqZA4TzyyDuumtSk_b2sRKsuKUpdsYY7sJRybplNBzWzpbd6OiSRgfTOrumdVRcLAfeRoImaw_9lqRKb9H47xP5iGD_eCSW2mhJsjMngwOptREhTM8Xan9RhZ6XErp9mIw.png","Figure 9. JWT authentication",[321,47044,47045,47048,47051],{},[324,47046,47047],{},"Header: Specifies the signature algorithm in JSON, encoded in base64url. In Pulsar, HS256 is used by default.",[324,47049,47050],{},"Payload: Specifies the claims such as subject and expiration time in JSON, encoded in base64url.",[324,47052,47053],{},"Signature: Specifies the algorithm to ensure the token is not changed, encoded by the header, the payload, and a secret.",[48,47055,47056],{},"Note that the header and the payload are decodable.",[48,47058,47059],{},"Here are some tips for using JWT authentication in Pulsar:",[1666,47061,47062,47065,47068,47076],{},[324,47063,47064],{},"You can use JWT for authentication and authorization but your data is still exposed. I recommend enabling TLS to encrypt data in transit especially when you have strict security requirements, though the performance may be compromised.",[324,47066,47067],{},"JWT is independent of third-party services. Once a token is signed, you cannot revoke it during the validity period. Therefore, it is a good practice to set a short validity period for tokens of important operations.",[324,47069,47070,47071,190],{},"You can create tokens for JWT authentication using two types of keys in Pulsar: a secret key (symmetric) and a private\u002Fpublic key pair (asymmetric). You only need to select one of them. For more information, see ",[55,47072,47075],{"href":47073,"rel":47074},"https:\u002F\u002Fpulsar.apache.org\u002Fdocs\u002F2.11.x\u002Fsecurity-jwt\u002F#create-client-certificates",[264],"Create client certificates",[324,47077,47078],{},"Validate your token after it is created to avoid misconfiguration. For example, you need to assign the right token if the subject needs to perform certain operations like pulsar-admin. 
To validate a token, use bin\u002Fpulsar tokens validate.",[8325,47080,47083],{"className":47081,"code":47082,"language":8330},[8328],"bin\u002Fpulsar tokens validate -pk  my-public.key -i \"eyJhbGciOiJSUzI1NiJ9.eyJzdWIiOiJhZG1pbiJ9.ijp-Qw4JDn1aOQbYy4g4YGBbXYIgLA9lCVrnP-heEtPCdDq11_c-9pQdQwc6RdphvlSfoj50qwL5OtmFPysDuF2caSYzSV1kWRWN-tFzrt-04_LRN-vlgb6D06aWubVFJQBC4DyS-INrYqbXETuxpO4PI9lB6lLXo6px-SD5YJzQmcYwi2hmQedEWszlGPDYi_hDG9SeDYmnMpXTtPU3BcjaDcg9fO6PlHdbnLwq2MfByeIj-VS6EVhKUdaG4kU2EJf5uq2591JJAL5HHiuTZRSFD6YbRXuYqQriw4RtnYWSvSeVMMbcL-JzcSJblNbMmIOdiez43MPYFPTB7TMr8g\"\n\n{sub=admin}\n",[4926,47084,47082],{"__ignoreMap":18},[1666,47086,47087,47090],{"start":20934},[324,47088,47089],{},"As mentioned above, Pulsar brokers cache the authentication information of the client and check its validity periodically (60 seconds by default). You can customize the time interval through authenticationRefreshCheckSeconds in broker.conf.",[324,47091,47092],{},"You can configure the token through brokerClientAuthenticationParameters as a string or from a file.",[8325,47094,47097],{"className":47095,"code":47096,"language":8330},[8328],"# Use it as a string:\nbrokerClientAuthenticationParameters={\"token\":\"eyJhbGciOiJIUzI1NiJ9.eyJzdWIiOiJ0ZXN0LXVzZXIfQ.9OHgE9ZUDeBTZs7nSMEFIuGNEX18FLR3qvy8mqxSxXw\"}\n\n# Read it from a file:\nbrokerClientAuthenticationParameters={\"file\":\"\u002F\u002F\u002Fpath\u002Fto\u002Fproxy-token.txt\"}\n",[4926,47098,47096],{"__ignoreMap":18},[1666,47100,47101],{"start":25801},[324,47102,47103],{},"You can check the header and payload of the token using bin\u002Fpulsar tokens show.",[8325,47105,47108],{"className":47106,"code":47107,"language":8330},[8328],"bin\u002Fpulsar tokens show -i eyJhbGciOiJIUzI1NiJ9.eyJzdWIiOiJ0ZXN0LXVzZXIiLCJleHAiOjE2NTY3NzYwOTh9.awbp6DreQwUyV8UCkYyOGXCFbfo4ZoV-dofXYTnFXO8\n\n{\"alg\":\"HS256\"}\n---\n{\"sub\":\"test-user\",\"exp\":1656776098}\n",[4926,47109,47107],{"__ignoreMap":18},[48,47111,47112,47113,190],{},"For more information, see ",[55,47114,47117],{"href":47115,"rel":47116},"https:\u002F\u002Fpulsar.apache.org\u002Fdocs\u002F2.11.x\u002Fsecurity-jwt\u002F",[264],"Authentication using tokens based on JWT",[32,47119,47121],{"id":47120},"kerberos-authentication","Kerberos authentication",[48,47123,47124],{},"Kerberos is a popular solution for authentication in the big data field for its simplicity and stability. Pulsar supports Kerberos authentication through the Java Authentication and Authorization Service (JAAS) for SASL configuration. The information of a user in JAAS is saved in a section. For authentication using Kerberos, the most important user information is principal and keytab, which can be easily wrapped into one section. You can store all the information in a jaas.conf file as below.",[8325,47126,47129],{"className":47127,"code":47128,"language":8330},[8328],"SectionName {\n   com.sun.security.auth.module.Krb5LoginModule required\n   useKeyTab=true\n   storeKey=true\n   useTicketCache=false\n   keyTab=\"\u002Fetc\u002Fsecurity\u002Fkeytabs\u002Fpulsarbroker.keytab\"\n   principal=\"broker\u002Flocalhost@EXAMPLE.COM\";\n};\n AnotherSectionName {\n  ...\n};\n",[4926,47130,47128],{"__ignoreMap":18},[48,47132,47133],{},"In the above code snippet, SectionName encapsulates the information of a Kerberos user and uses the username as the unique identifier. 
After you create the JAAS file, you need to do the following:",[1666,47135,47136],{},[324,47137,47138],{},"Set the file path as a JVM parameter as below:",[8325,47140,47143],{"className":47141,"code":47142,"language":8330},[8328],"-Djava.security.auth.login.config=\u002Fetc\u002Fpulsar\u002Fjaas.conf\n",[4926,47144,47142],{"__ignoreMap":18},[1666,47146,47147],{"start":19},[324,47148,47149,47150,190],{},"Specify the section in broker.conf. For more information, see ",[55,47151,47154],{"href":47152,"rel":47153},"https:\u002F\u002Fpulsar.apache.org\u002Fdocs\u002F2.11.x\u002Fsecurity-kerberos\u002F#configure-brokers",[264],"Configure brokers",[48,47156,47157],{},"When using Kerberos authentication in Pulsar, the principal field can be easily misconfigured. The following is the naming convention of principal on the server side.",[321,47159,47160,47163,47166,47169],{},[324,47161,47162],{},"It should contain three parts - service\u002F{hostname}@{REALM}, like broker\u002Fhost1@MY.REALM.",[324,47164,47165],{},"The service field refers to the service type of each host. You can use keywords like broker and proxy for it. Other names might return a warning.",[324,47167,47168],{},"{hostname} should be consistent with advertisedAddress. For example, if the principal of the broker service is broker\u002F172.17.0.7@SNIO, the advertisedAddress should also be set to 172.17.0.7. I suggest you use the hostname directly to avoid IP configuration problems for multiple NICs. If you don’t want to configure DNS, you can use the IP address but make sure it is the same as advertisedAddress.",[324,47170,47171],{},"For {REALM}, I suggest you use uppercase letters.",[48,47173,47174],{},"Note that Kerberos requires that all your hosts be resolved with their FQDNs. You can add your machines with their DNS information to \u002Fetc\u002Fhosts.",[40,47176,47178],{"id":47177},"customizing-an-authorization-plugin-using-ranger","Customizing an authorization plugin using Ranger",[48,47180,47181,47182,47187],{},"Pulsar features a flexible authorization mechanism that allows you to easily customize your own authorization plugin. In this section, I will briefly explain how to create a custom authorization plugin using ",[55,47183,47186],{"href":47184,"rel":47185},"https:\u002F\u002Franger.apache.org\u002F",[264],"Apache Ranger"," for visualized permission management with some code examples. Ranger is a popular open-source project for data access governance in the big data area.",[48,47189,47190],{},"To create this plugin, you need to register Pulsar as a service in Ranger, and then implement a Pulsar authorization interface in the plugin with some methods. In the initialization method, you need to create a Ranger Client to connect to Ranger.",[48,47192,47193],{},"Additionally, you need to define Ranger resources, such as tenants, namespaces, and topics, and access types, and load them into Ranger. 
I will not explain related Range concepts in detail, but I provide an example here for your reference:",[8325,47195,47198],{"className":47196,"code":47197,"language":8330},[8328],"\"resources\": [\n   {\n           \"itemId\":1,\n           \"name\":\"tenant\",\n           \"type\":\"string\",\n           \"level\":1,\n           \"parent\":\"\",\n           \"mandatory\":true,\n           \"lookupSupported\":true,\n           \"recursiveSupported\":false,\n           \"excludesSupported\":true,\n      \"matcher\":\"org.apache.ranger.plugin.resourcematcher.RangerDefaultResourceMatcher\",\n           \"matcherOptions\":{\n               \"wildCard\":true,\n               \"ignoreCase\":true\n           },\n           \"validationRegEx\":\"\",\n           \"validationMessage\":\"\",\n           \"uiHint\":\"\",\n           \"label\":\"tenant\",\n           \"description\":\"tenant\",\n   },\n],\n",[4926,47199,47197],{"__ignoreMap":18},[48,47201,47202],{},"Define access types:",[8325,47204,47207],{"className":47205,"code":47206,"language":8330},[8328],"\"accessTypes\": [\n   {\n       \"itemId\": 1,\n       \"name\": \"produce\",\n       \"label\": \"Produce\"\n   },\n   {\n       \"itemId\": 2,\n       \"name\": \"consume\",\n       \"label\": \"Consume\"\n   },\n],\n",[4926,47208,47206],{"__ignoreMap":18},[48,47210,47211],{},"The following is a code example of authorization implementation in Pulsar.",[8325,47213,47216],{"className":47214,"code":47215,"language":8330},[8328],"@Override\npublic CompletableFuture allowTopicOperationAsync(TopicName topicName,\n                                                           String role,\n                                                           TopicOperation operation,\n                                                           AuthenticationDataSource authData) {\n    if (log.isDebugEnabled()) {\n        log.debug (\"Check allowTopicOperationAsync [{}] on [()].\", operation.name(), topicName);\n    }\n\n    return validateTenantAdminAccess(topicName.getTenant(), role, authData)\n            .thenCompose(isSuperUserOrAdmin -> {\n                if (log.isDebugEnabled()) {\n                    log.debug(\"Verify if role (} is allowed to {} to topic {}: isSuperUserOrAdmin={}\",\n                            role, operation, topicName, isSuperUserOrAdmin);\n                }\n                if (isSuperUserOrAdmin) {\n                    return CompletableFuture.completedFuture(true);\n                } else {\n                    switch (operation) {\n                        case LOOKUP:\n                        case GET_STATS:\n                        case GET_METADATA:\n                            return canLookupAsync(topicName, role, authData);\n                        case PRODUCE:\n                            return canProduceAsync(topicName, role, authData);\n                        case GET_SUBSCRIPTIONS:\n                        case CONSUME:\n                        case SUBSCRIBE:\n                        case UNSUBSCRIBE:\n                        case SKIP:\n                        case EXPIRE_MESSAGES:\n                        case PEEK_MESSAGES:\n                        case RESET_CURSOR:\n                          case GET_BACKLOG_SIZE:\n                          case SET_REPLICATED_SUBSCRIPTION_STATUS:\n                          case GET_REPLICATED_SUBSCRIPTION_STATUS:\n                            return canConsumeAsync(topicName, role, authData, authData.getSubscription());\n                        case TERMINATE:\n                    
    case COMPACT:\n                        case OFFLOAD:\n                        case UNLOAD:\n                        case ADD_BUNDLE_RANGE:\n                        case GET_BUNDLE_RANGE:\n                        case DELETE_BUNDLE_RANGE:\n                            return CompletableFuture.completedFuture(false);\n                        default:\n                            return FutureUtil.failedFuture(new IllegalStateException(\n                                    \"TopicOperation [\" + operation.name() + \"] is not supported.\")) ;\n                    }\n                }\n           });\n}\n\n",[4926,47217,47215],{"__ignoreMap":18},[48,47219,47220],{},"An example of the canProduceAsync method:",[8325,47222,47225],{"className":47223,"code":47224,"language":8330},[8328],"@Override\npublic CompletableFuture canProduceAsync(TopicName topicName, String role,\n        AuthenticationDataSource authenticationData) {\n    CompletableFuture future = new CompletableFuture\u003C>();\n    \n    RangerAccessResourceImpl resource = new RangerAccessResourceImpl();\n    resource.setValue(KEY_TENANT, topicName.getTenant());\n    resource.setValue(KEY_NAMESPACE, topicName.getNamespacePortion());\n    resource.setValue(KEY_TOPIC, topicName.getLocalName().split(\"-partition-\") [0]);\n    \u002F\u002Fresource.setValue(KEY_TAG, \"*\");\n    \n    RangerAccessRequestImpl request = new RangerAccessRequestImpl();\n    \n    request.setAccessType(AuthAction.produce.name());\n    request.setUser(role);\n    request.setResource(resource);\n    request.setAction(AuthAction.produce.name());\n    \n    try {\n        RangerAccessResult result = rangerPlugin.isAccessAllowed(request);\n\n        log.info(\"request--->{}\", request);\n        log.info(\"result--->{}\", result);\n        \n        if (result.getIsAllowed()) {\n            future.complete (true);\n        } else {\n            String errMsg = String\n                    .format (\"User '%s' doesn't have produce access to %s, matched policy id = %d\",\n                             request.getUser(), topicName.toString(), result.getPolicyId());\n            log.error(errMsg);\n            future.completeExceptionally(new Exception(errMsg));\n        }\n    } catch (Exception e) {\n        \u002F\u002F access allowed in abnormal situation\n        log.error(\"User {} encounter exception in {} produce authorization step.\",\n                request.getUser(), topicName.toString(), e);\n        future. complete(true);\n    }\n    return future;\n}\n\n",[4926,47226,47224],{"__ignoreMap":18},[48,47228,47229,47230,47233],{},"Expected result in Ranger:\n",[384,47231],{"alt":18,"src":47232},"\u002Fimgs\u002Fblogs\u002F63ed93e7f18a47375abd1f8b_CRXEnmbASdhwYR0IIqFIaJ1EapBVrk9vR4wz1u-QiWliSdMd_dnITZY_ClbTWiWhs1MA4tpMso-NQVH-71deuAtSWQT7m3BkB2NIJzWDd0VxqH2pldULcRFQ0os3hTASWAS6NdRQQ22J7xo13jnnnvQ.png","Figure 10. Visualized permission management in Ranger",[40,47235,47237],{"id":47236},"frequently-asked-questions-about-authentication-and-authorization-in-pulsar","Frequently asked questions about authentication and authorization in Pulsar",[48,47239,47240],{},"Q: For JWT authentication, is the local token file read in real time? 
Do I need to restart proxies or brokers?",[48,47242,47243],{},"A: The change of the client token file has no impact on the server side, so you don’t need to restart the servers.",[48,47245,3931],{},[48,47247,47248],{},"Q: For JWT authentication, how do brokers cache the authentication information of the client?",[48,47250,47251],{},"A: A thread periodically checks whether the cached authentication information of the client has expired. If so, it sends the AuthChallenge command to the client. The client then sends the token file to the broker. After receiving the updated token, the broker validates the authentication and re-caches the information. If the client fails to send back valid information within the interval, the connection will be closed.",[48,47253,3931],{},[48,47255,47256],{},"Q: For external authorization, should I configure proxies or brokers?",[48,47258,47259],{},"A: Brokers are responsible for the authorization process, so you only need to configure it on the broker side.",[40,47261,2125],{"id":2122},[48,47263,47264],{},"For enterprises using Pulsar, adopting a proper security policy is essential to making their data safe and secure. I can imagine new Pulsar users may easily get overwhelmed by authentication, authorization, encryption and other security concepts. I hope this blog can help those new to Pulsar understand its pluggable security mechanism and benefit from some of the best practices I mentioned.",[40,47266,38376],{"id":38375},[48,47268,38379,47269,40419],{},[55,47270,38384],{"href":38382,"rel":47271},[264],[321,47273,47274,47284,47289,47293,47302,47311,47320],{},[324,47275,45457,47276,29496,47278,1154,47281,45209],{},[55,47277,39553],{"href":45460},[55,47279,45465],{"href":45463,"rel":47280},[264],[55,47282,45208],{"href":45206,"rel":47283},[264],[324,47285,38390,47286,190],{},[55,47287,31914],{"href":31912,"rel":47288},[264],[324,47290,45476,47291,45480],{},[55,47292,3550],{"href":45479},[324,47294,47295,758,47297],{},[2628,47296,42753],{},[55,47298,47301],{"href":47299,"rel":47300},"https:\u002F\u002Fpulsar.apache.org\u002Fdocs\u002F2.11.x\u002Fsecurity-authorization\u002F",[264],"Authentication and authorization in Pulsar",[324,47303,47304,758,47306],{},[2628,47305,42753],{},[55,47307,47310],{"href":47308,"rel":47309},"https:\u002F\u002Fpulsar.apache.org\u002Fdocs\u002F2.11.x\u002Fsecurity-kerberos\u002F",[264],"Authentication using Kerberos",[324,47312,47313,758,47316],{},[2628,47314,47315],{},"﻿Doc",[55,47317,47319],{"href":47115,"rel":47318},[264],"Authentication using JWT",[324,47321,47322,758,47324],{},[2628,47323,42753],{},[55,47325,47328],{"href":47326,"rel":47327},"https:\u002F\u002Fpulsar.apache.org\u002Fdocs\u002F2.11.x\u002Fsecurity-extending\u002F",[264],"Extend authentication and authorization in 
Pulsar",{"title":18,"searchDepth":19,"depth":19,"links":47330},[47331,47337,47338,47342,47346,47347,47348,47349],{"id":46773,"depth":19,"text":46774,"children":47332},[47333,47334,47335,47336],{"id":46797,"depth":279,"text":46798},{"id":46808,"depth":279,"text":46809},{"id":46848,"depth":279,"text":46849},{"id":46859,"depth":279,"text":46860},{"id":46875,"depth":19,"text":46876},{"id":46927,"depth":19,"text":46928,"children":47339},[47340,47341],{"id":46934,"depth":279,"text":46935},{"id":46975,"depth":279,"text":46976},{"id":47017,"depth":19,"text":47018,"children":47343},[47344,47345],{"id":47024,"depth":279,"text":47025},{"id":47120,"depth":279,"text":47121},{"id":47177,"depth":19,"text":47178},{"id":47236,"depth":19,"text":47237},{"id":2122,"depth":19,"text":2125},{"id":38375,"depth":19,"text":38376},"2023-02-16","This blog introduces available security combinations in Pulsar and gives some best practices for implementing authorization and authentication.","\u002Fimgs\u002Fblogs\u002F640694ace2dfd06eaceb14a3_a-practical-guide-to-enterprise-grade-security-in-apache-pulsar.png",{},{"title":34047,"description":47351},"blog\u002Fa-practical-guide-to-enterprise-grade-security-in-apache-pulsar",[821,4301],"YlYsbz6YVC7ZSntQyJEMF2CUf8pqKam37ze67qXhyiU",{"id":47359,"title":47360,"authors":47361,"body":47362,"category":3550,"createdAt":290,"date":47350,"description":47519,"extension":8,"featured":294,"image":47520,"isDraft":294,"link":290,"meta":47521,"navigation":7,"order":296,"path":47522,"readingTime":7986,"relatedResources":290,"seo":47523,"stem":47524,"tags":47525,"__hash__":47526},"blogs\u002Fblog\u002Fnew-streamnative-cloud-feb-2023-audit-log-cluster-metrics.md","New to StreamNative Cloud [Feb 2023]: Audit Log, Cluster Metrics, and More",[41695],{"type":15,"value":47363,"toc":47511},[47364,47367,47370,47373,47380,47383,47394,47397,47405,47409,47421,47424,47426,47437,47445,47448,47452,47459,47462,47465,47468,47472,47480,47487,47489,47492,47494,47496,47504,47506,47509],[48,47365,47366],{},"We’re committed to providing enterprise-grade tooling that empowers teams to deliver sophisticated data streaming and event-driven architecture with Apache Pulsar. Whether you’re building new applications or future-proofing existing applications, StreamNative provides a versatile set of tools that are resilient and flexible for engineering teams of all sizes.",[48,47368,47369],{},"We’re rolling out some new capabilities in the next month that will help teams improve security and observability, and enable teams to operate Pulsar at scale.",[40,47371,33830],{"id":47372},"rest-api",[48,47374,47375,47376,47379],{},"We recently announced the ",[55,47377,33830],{"href":47378},"\u002Fblog\u002Fannouncing-the-streamnative-rest-api"," for Pulsar clusters on StreamNative Cloud. Connect to your Pulsar clusters using a RESTful interface, eliminating the dependency on specific client libraries.",[48,47381,47382],{},"Our Rest API allows you to:",[321,47384,47385,47388,47391],{},[324,47386,47387],{},"Effortlessly produce, consume, and acknowledge messages",[324,47389,47390],{},"Monitor the state of your Pulsar clusters",[324,47392,47393],{},"Perform a wide range of administrative actions",[48,47395,47396],{},"You can use the Rest API for use cases such as sending data to Pulsar from any application built in any language or ingesting messages into a stream processing framework that may not support Pulsar.",[48,47398,47399,47400,47404],{},"This feature is automatically enabled for Cloud console users. 
Private Cloud users can enable this feature by editing the PulsarBroker CR configuration. Learn more about the StreamNative Rest API in our ",[55,47401,41721],{"href":47402,"rel":47403},"https:\u002F\u002Fdocs.streamnative.io\u002Fdocs\u002Fconnect-restapi",[264],". You can also check out this demo of the StreamNative Rest API in Action:",[40,47406,47408],{"id":47407},"pulsar-functions-on-cloud-in-beta","Pulsar Functions on Cloud in Beta",[48,47410,47411,47412,47416,47417,47420],{},"Leverage the full power of ",[55,47413,15627],{"href":47414,"rel":47415},"https:\u002F\u002Fpulsar.apache.org\u002Fdocs\u002Fnext\u002Ffunctions-overview\u002F",[264]," with StreamNative Cloud. ",[55,47418,47419],{"href":11302},"Pulsar Functions on StreamNative Cloud"," enable you to build real-time data pipelines for ETL jobs, event-driven applications, and simple data analytics applications.",[48,47422,47423],{},"By using Pulsar’s built-in framework instead of a separate stream processing engine, you can reduce architectural complexity and quickly deploy pipelines within StreamNative Cloud.",[48,47425,34330],{},[321,47427,47428,47431,47434],{},[324,47429,47430],{},"Real-time data analytics based on incoming Pulsar data such as fraud detection",[324,47432,47433],{},"Real-time data integration and transformation for AI feature extraction and machine learning model scoring",[324,47435,47436],{},"Event-driven pipelines based on Pulsar messages for Processing, verification, and notifications",[48,47438,10256,47439,47444],{},[55,47440,47443],{"href":47441,"rel":47442},"https:\u002F\u002Fdocs.streamnative.io\u002Fdocs\u002Ffunctions-overview#functions-on-cloud-overview",[264],"Functions on Cloud"," in our documentation page.",[48,47446,47447],{},"Watch our demo video on how to get started with Functions on StreamNative Cloud:",[40,47449,47451],{"id":47450},"audit-log","Audit Log",[48,47453,47454,47455,47458],{},"Now available: The StreamNative ",[55,47456,47451],{"href":41150,"rel":47457},[264]," lets you track and monitor administrative activity within your Pulsar clusters, tenants, namespaces, and topics. Identify which users or services performed specific actions, which resources were affected, and when the event occurred. You can also view the user's permission status for a comprehensive overview of your system’s activity.",[48,47460,47461],{},"Designed with security and compliance in mind, the Audit Log is particularly useful for customers with strict requirements. The Audit Log gives you the ability to track user and application access, identify abnormal behavior, and monitor for potential security risks.",[48,47463,47464],{},"The Audit Log supports a wide variety of audit event types, including the creation, updating, and deletion of clusters, tenants, namespaces, and topics. Each log entry contains detailed information about the event, event time, and permission status.",[48,47466,47467],{},"You can easily process and analyze the audit events stored in Pulsar topics using Pulsar clients, Pulsar CLI, Rest API, and sink connectors. This allows you to gain valuable insights into your system’s activity and take proactive measures to ensure the security and compliance of your organization.",[40,47469,47471],{"id":47470},"cluster-metrics-in-beta","Cluster Metrics in Beta",[48,47473,47474,47475,47479],{},"We’re excited to introduce our new Prometheus endpoint, which provides you with ",[55,47476,47478],{"href":33818,"rel":47477},[264],"real-time metrics of your Pulsar clusters",". 
By configuring your preferred observability tool (such as Grafana or NewRelic) to scrape this endpoint, you gain real-time visibility into your clusters and the ability to track performance over time.",[48,47481,47482,47483,47486],{},"Cluster Metrics enable you to collect metrics like the number of producers, consumers, throughput, message delivery, storage, and size of the backlog, among others. By analyzing trends and proactively maintaining your applications, you can ensure optimal performance and address potential issues.\n",[384,47484],{"alt":18,"src":47485},"\u002Fimgs\u002Fblogs\u002F63ee694f26a887cff9125d9b_Screen-Shot-2023-02-15-at-2.40.42-PM.png","The Cluster Metrics dashboard shares key cluster metrics at a glance.",[40,47488,2125],{"id":2122},[48,47490,47491],{},"With these new capabilities and enhancements, StreamNative Cloud provides more flexibility and control over your data, empowering you to make informed decisions. These features are a valuable addition to your toolkit, allowing you to streamline your workflows, boost productivity, and ultimately achieve greater business success.",[48,47493,41831],{},[40,47495,41835],{"id":41834},[48,47497,47498,47499,47503],{},"Learn more about upcoming StreamNative Cloud features and enhancements on our ",[55,47500,47502],{"href":20695,"rel":47501},[264],"docs site",", which also includes helpful tutorials and resources.",[48,47505,3931],{},[48,47507,47508],{},"The preceding information serves as a general guide for our product direction and should not be interpreted as a binding commitment to deliver any specific materials, code, or functionality. Please note that the development, release, timing, and pricing of any features or functionality described may be subject to change. We recommend that customers base their purchasing decisions on the services, features, and functions that are currently available.",[48,47510,3931],{},{"title":18,"searchDepth":19,"depth":19,"links":47512},[47513,47514,47515,47516,47517,47518],{"id":47372,"depth":19,"text":33830},{"id":47407,"depth":19,"text":47408},{"id":47450,"depth":19,"text":47451},{"id":47470,"depth":19,"text":47471},{"id":2122,"depth":19,"text":2125},{"id":41834,"depth":19,"text":41835},"Check out the latest features coming to StreamNative Cloud that will help teams improve security and observability, and enable teams to operate Pulsar at scale.","\u002Fimgs\u002Fblogs\u002F63ee66cb3a4755166c77a694_Feb-2023-Features.png",{},"\u002Fblog\u002Fnew-streamnative-cloud-feb-2023-audit-log-cluster-metrics",{"title":47360,"description":47519},"blog\u002Fnew-streamnative-cloud-feb-2023-audit-log-cluster-metrics",[3550,821,4301,8058,26747],"gI9OUPRfeANvZA64wMwZyrBajo93--hzl6exKPAox9s",{"id":47528,"title":47529,"authors":47530,"body":47532,"category":3550,"createdAt":290,"date":47800,"description":47801,"extension":8,"featured":294,"image":47802,"isDraft":294,"link":290,"meta":47803,"navigation":7,"order":296,"path":11302,"readingTime":47804,"relatedResources":290,"seo":47805,"stem":47806,"tags":47807,"__hash__":47808},"blogs\u002Fblog\u002Fintroducing-pulsar-functions-on-streamnative-cloud.md","Introducing Pulsar Functions on StreamNative Cloud",[41695,47531,810,44843],"Thor 
Sigurjonsson",{"type":15,"value":47533,"toc":47786},[47534,47536,47539,47542,47545,47549,47561,47572,47575,47579,47582,47585,47588,47591,47602,47605,47609,47616,47620,47627,47631,47638,47642,47649,47653,47660,47664,47671,47675,47682,47685,47699,47705,47708,47722,47725,47739,47743,47746,47749,47751,47780,47782,47784],[40,47535,46],{"id":42},[48,47537,47538],{},"Pulsar Functions are now available on StreamNative Cloud. Leverage the full power of Pulsar Functions to build real-time data pipelines. This means you can quickly cover various messaging and streaming use cases, such as ETL pipelines, event-driven applications, and simple data analytics applications.",[48,47540,47541],{},"Simplify the creation and deployment of real-time data pipelines on StreamNative Cloud by using Pulsar's built-in framework, which requires less expertise and complexity to using an external stream processing engine.",[48,47543,47544],{},"In this blog, we look at what Pulsar Functions on StreamNative Cloud are, why you should use them, and how to get started.",[40,47546,47548],{"id":47547},"what-are-pulsar-functions","What are Pulsar Functions?",[48,47550,47551,47555,47556,47560],{},[55,47552,15627],{"href":47553,"rel":47554},"https:\u002F\u002Fpulsar.apache.org\u002Fdocs\u002F2.10.x\u002Ffunctions-overview\u002F",[264]," are the computing infrastructure of the Pulsar messaging system. They are lightweight, Pulsar-native, ",[55,47557,47559],{"href":38992,"rel":47558},[264],"Lambda","-style functions that can:",[321,47562,47563,47566,47569],{},[324,47564,47565],{},"consume messages from one or more Topics.",[324,47567,47568],{},"apply user-defined processing logic to each message.",[324,47570,47571],{},"publish computation results to other Topics.",[48,47573,47574],{},"Pulsar Functions provide three delivery semantics, including at-most-once, at-least-once, and effectively-once. And Pulsar Functions support multiple programming languages, including Java, Python, and Go.",[40,47576,47578],{"id":47577},"what-are-the-use-cases-and-patterns","What are the use cases and patterns?",[48,47580,47581],{},"Pulsar Functions provide a highly flexible and modular architecture, similar to that of microservices. By breaking down your pipeline into smaller, more manageable pieces, teams can rapidly and frequently deliver new functionality, making it easier to evolve and adapt your technology stack.",[48,47583,47584],{},"Additionally, Pulsar Functions enable teams to focus on developing core business logic by abstracting away the interactions between the Pulsar cluster and user applications. You can achieve seamless integration of applications without requiring any changes to the applications themselves, by creating pipelines to adapt data flow between them. 
This also enables teams to move logic into these pipelines, which can then generate more valuable insights from the data.",[48,47586,47587],{},"In terms of use cases, Functions can be used to implement simple to complex messaging\u002Fstreaming patterns, real-time data pipelines, IoT systems, and analytics.",[48,47589,47590],{},"For example:",[321,47592,47593,47596,47599],{},[324,47594,47595],{},"Real-time data analytics based on incoming Pulsar data: Fraud detection, data aggregation, monitoring, and robot recommendation.",[324,47597,47598],{},"Real-time data integration and transformation: AI feature extraction and machine learning model scoring.",[324,47600,47601],{},"Event-driven pipelines based on Pulsar messages: Processing, verification, and notifications.",[48,47603,47604],{},"Here are some example patterns of how functions (f) can be implemented to perform basic yet essential message processing tasks. As depicted in these diagrams, functions are shown to have subscriptions (s) to input topics. To increase scalability, a shared subscription may also be used, allowing for an increase in the number of instances.",[32,47606,47608],{"id":47607},"content-filtering-pattern","Content Filtering Pattern",[48,47610,47611,47612,47615],{},"In a content filtering pattern, messages are received from an input topic, and then processed by a function. The function filters out any unwanted data or extracts relevant features and only passes on the relevant data to the output topic.\n",[384,47613],{"alt":18,"src":47614},"\u002Fimgs\u002Fblogs\u002F63e3c805748e304bac1c5646_content-filter.png","Figure1: Content filtering pattern, where (F) is a function and (s) is a subscription.",[32,47617,47619],{"id":47618},"message-filtering-pattern","Message Filtering Pattern",[48,47621,47622,47623,47626],{},"In a message filtering pattern, messages are received from an input topic, and then evaluated by a function. Only messages that meet certain criteria are passed on to the output topic.\n",[384,47624],{"alt":18,"src":47625},"\u002Fimgs\u002Fblogs\u002F63e3c8326cf0c3925fba30f1_message-filter.png","Figure 2. Message filtering pattern, where (f) is a function and (s) is a subscription.",[3933,47628,47630],{"id":47629},"enrichment-pattern","Enrichment Pattern",[48,47632,47633,47634,47637],{},"In an enrichment pattern, messages are received from an input topic, and then processed by a function. The function sends requests to an external service to augment the data in each message, and then passes the enriched messages on to the output topic.\n",[384,47635],{"alt":18,"src":47636},"\u002Fimgs\u002Fblogs\u002F63e3c88c2e9523926dc71e5d_encrich.png","Figure 3. Enrichment pattern, where (f) is a function and (s) is a subscription.",[32,47639,47641],{"id":47640},"routing-pattern","Routing Pattern",[48,47643,47644,47645,47648],{},"In a routing pattern, a function processes messages received from an input topic and routes them to other topics based on the specified routing criteria. The routing pattern is utilized in event auditing to record the processing and routing of events for monitoring system behavior.\n",[384,47646],{"alt":18,"src":47647},"\u002Fimgs\u002Fblogs\u002F63e3c8c2fa89c4332b62649d_router.png","Figure 4. 
Routing pattern, where (f) is a function and (s) is a subscription.",[32,47650,47652],{"id":47651},"gathering-pattern","Gathering Pattern",[48,47654,47655,47656,47659],{},"In a gathering pattern, a function merges messages received from multiple input topics and publishes them to a common output topic for downstream processing. This pattern is a critical component in machine learning workflows, as it collects and prepares data for model scoring.\n",[384,47657],{"alt":18,"src":47658},"\u002Fimgs\u002Fblogs\u002F63e3c9266fe91352be538cdb_gather.png","Figure 5. Gathering Pattern, where (f) is a function and (s) is a subscription.",[32,47661,47663],{"id":47662},"transformation-pattern","Transformation Pattern",[48,47665,47666,47667,47670],{},"In a transformation pattern, a function receives messages from an input topic, converts them into a different format, and then publishes them to an output topic.\n",[384,47668],{"alt":18,"src":47669},"\u002Fimgs\u002Fblogs\u002F63e3c95cfa815340fd3240ce_transform.png","Figure 6. Transformation Pattern, where (f) is a function and (s) is a subscription.",[40,47672,47674],{"id":47673},"comparing-functions-in-open-source-pulsar-vs-streamnative-cloud","Comparing Functions in Open-Source Pulsar vs StreamNative Cloud",[48,47676,47677,47678,47681],{},"When self-managing Pulsar Functions in open-source, ",[55,47679,44953],{"href":44951,"rel":47680},[264]," must be used for scheduling and running Pulsar Functions in production.",[48,47683,47684],{},"However, this approach has drawbacks:",[321,47686,47687,47690,47693,47696],{},[324,47688,47689],{},"Function workers are embedded in brokers, and function metadata is also stored in brokers. This can lead to a “noisy neighbor” effect on the brokers, and if the brokers become unavailable, the functions will fail to start. The recovery process will be manual, with a risk of losing function metadata.",[324,47691,47692],{},"To ensure reliable deployment of open-source Pulsar Functions, a visual tool to match topics and confirm connections must be built. Functions can then be deployed and managed individually through CLI or API.",[324,47694,47695],{},"There is no built-in autoscaling capability, meaning that managing loads of many Functions can be demanding as it requires manual workload distribution.",[324,47697,47698],{},"While it is possible to run Function Workers on Kubernetes, the Kubernetes runtime for Function Workers does not fully utilize cloud-native capabilities. All provisioning and scheduling responsibilities remain within the Function Worker, and the created Kubernetes resources such as StatefulSets, Services, and Secrets are not managed under a Kubernetes native abstraction. Additionally, this approach poses the risk of losing function metadata.",[48,47700,47701,47702,47704],{},"StreamNative Cloud addresses the limitations of using function workers in open-source Pulsar with its built-in, cloud-native Kubernetes operator, ",[55,47703,29463],{"href":44957},". 
This allows you to easily submit and manage Pulsar Functions using Kubernetes’ powerful deployment, scaling, and management capabilities.",[48,47706,47707],{},"The benefits of this improved workflow include:",[321,47709,47710,47713,47719],{},[324,47711,47712],{},"Enhanced reliability and stability: Function Mesh-based scheduling eliminates the function workers’ noisy impact on brokers, ensuring broker availability at all times and increasing the availability and security of metadata for all Pulsar Functions.",[324,47714,47715,47716],{},"Autoscaling: Function Mesh can support horizontally or vertically scaling the Function pods to meet the use case requirements with very few configurations, ",[36,47717,47718],{},"coming soon.",[324,47720,47721],{},"Streamlined management: Easily manage user-submitted functions across all namespaces.",[48,47723,47724],{},"In addition to the above benefits, using Pulsar Functions on StreamNative Cloud also offers:",[321,47726,47727,47730,47733,47736],{},[324,47728,47729],{},"Flexible user experience: Use all the familiar tools such as CLI (pulsar-admin, pulsarct), RESTful API, and UI to manage and update Pulsar Functions.",[324,47731,47732],{},"Reduced infrastructure and operations burden: Manage and implement lightweight computing operations with a standardized interface; avoid adding complexity with a larger overhead solution like Flink and Spark.",[324,47734,47735],{},"Enhanced security: with built-in OAuth2 authentication\u002Fauthorization to run Functions in a cloud environment.",[324,47737,47738],{},"Enhanced visibility: The UI allows you to quickly identify and address issues, with the option to leverage StreamNative’s expertise to accelerate debugging.",[40,47740,47742],{"id":47741},"get-started-with-functions-on-streamnative-cloud","Get started with Functions on StreamNative Cloud",[48,47744,47745],{},"StreamNative Cloud simplifies running Pulsar Functions in the cloud. Deploy your functions using the command line on your StreamNative Cloud cluster just as you would normally. Monitor all your functions in one place using the StreamNative Cloud Console, which provides a visual display of the current state of all of your functions.",[48,47747,47748],{},"Check out what you can do with Functions:",[40,47750,40413],{"id":36476},[321,47752,47753,47758,47769,47774],{},[324,47754,45216,47755,47757],{},[55,47756,38404],{"href":45219}," now.",[324,47759,47760,47761,1154,47765,45209],{},"Pulsar Summit Europe 2023 is taking place virtually on May 23rd. 
Engage with the community by ",[55,47762,47764],{"href":45463,"rel":47763},[264],"submitting a CFP",[55,47766,47768],{"href":45206,"rel":47767},[264],"becoming a community sponsor",[324,47770,45223,47771,45227],{},[55,47772,31914],{"href":31912,"rel":47773},[264],[324,47775,47776,47777,190],{},"Documentation: Read more about Pulsar Functions on StreamNative Cloud ",[55,47778,267],{"href":33752,"rel":47779},[264],[48,47781,3931],{},[48,47783,3931],{},[48,47785,3931],{},{"title":18,"searchDepth":19,"depth":19,"links":47787},[47788,47789,47790,47797,47798,47799],{"id":42,"depth":19,"text":46},{"id":47547,"depth":19,"text":47548},{"id":47577,"depth":19,"text":47578,"children":47791},[47792,47793,47794,47795,47796],{"id":47607,"depth":279,"text":47608},{"id":47618,"depth":279,"text":47619},{"id":47640,"depth":279,"text":47641},{"id":47651,"depth":279,"text":47652},{"id":47662,"depth":279,"text":47663},{"id":47673,"depth":19,"text":47674},{"id":47741,"depth":19,"text":47742},{"id":36476,"depth":19,"text":40413},"2023-02-08","Learn how to easily build real-time data pipelines and quickly unlock messaging and streaming use cases such as ETL pipelines, event-driven applications, and simple data analytics applications.","\u002Fimgs\u002Fblogs\u002F63e682d80feecb4284d34e9b_Functions-on-Cloud.png",{},"15 min read",{"title":47529,"description":47801},"blog\u002Fintroducing-pulsar-functions-on-streamnative-cloud",[9636,3550,821,8058,303],"XwuvqRS2Csh-e0ZP0oxdUEMKDnhP2zwoNGms6mJXsZY",{"id":47810,"title":43257,"authors":47811,"body":47812,"category":821,"createdAt":290,"date":48129,"description":48130,"extension":8,"featured":294,"image":48131,"isDraft":294,"link":290,"meta":48132,"navigation":7,"order":296,"path":43256,"readingTime":4475,"relatedResources":290,"seo":48133,"stem":48134,"tags":48135,"__hash__":48136},"blogs\u002Fblog\u002Fpulsar-operators-tutorial-part-3-create-and-deploy-a-containerized-pulsar-client.md",[46122],{"type":15,"value":47813,"toc":48122},[47814,47818,47821,47825,47828,47834,47837,47842,47848,47851,47855,47860,47865,47870,47875,47880,47886,47891,47897,47902,47908,47913,47919,47924,47930,47935,47939,47944,47949,47954,47959,47964,47970,47975,47981,47986,47989,47993,47998,48004,48009,48015,48020,48026,48031,48037,48042,48048,48053,48060,48072,48074,48079],[916,47815,47816],{},[48,47817,46129],{},[48,47819,47820],{},"In this Part 3 blog, I will demonstrate how to containerize Pulsar client applications (producer and consumer) using Dockerfiles in VS Code. With Dockerfiles, we can build the container image in the local Docker daemon, test the image using docker run, tag the image and push it to the Docker registry. This is probably the most common approach for the cloud-native build process.",[40,47822,47824],{"id":47823},"preparation","Preparation",[48,47826,47827],{},"In this demo, I used python venv to control the Python version. The following code snippet shows how I created the Python environment and Python library before opening the folder using code .",[8325,47829,47832],{"className":47830,"code":47831,"language":8330},[8328],"mkdir cloudnative-pulsar\ncd cloudnative-pulsar\npython3 -m venv .py39\nsource .py39\u002Fbin\u002Factivate\npython -m pip install --upgrade pip\npip install pulsar-client==2.9.2\ngit init\ncode .\n",[4926,47833,47831],{"__ignoreMap":18},[48,47835,47836],{},"My initial VS Code interface looks like this. 
I also opened a terminal window to run kubectl or docker build in the same interface.",[48,47838,47839],{},[384,47840],{"alt":18,"src":47841},"\u002Fimgs\u002Fblogs\u002F63e0a6c3948a809295217670_HWCXgcmhkr22OCp_YD6e-Od5_ZGsVXe5RCmbQy9SGQjyIInq7ML2i6Ja9qf0ux5I9JzJqJigoWrBHUh9IfQ3ob-8qHKuHZ2yeJoP3hgwMQdKtSxYTv7VoMXMd_9sWOBy1KDoOSyhuLNgzbAqbToJwQ.png",[48,47843,38720,47844,47847],{},[55,47845,47846],{"href":43241},"Part 1",", I exposed the proxy Service as a load balancer (external IP). This way, I can connect to the broker on Kubernetes directly from my home network. I will use that same Pulsar cluster on Kubernetes for this demo.",[48,47849,47850],{},"Let’s get started with the Python producer and consumer.",[40,47852,47854],{"id":47853},"create-a-python-client","Create a Python client",[1666,47856,47857],{},[324,47858,47859],{},"Use ⇧⌘P to bring up the VS Code Command Palette, type “new,” and select “File: New Folder.” Add a folder called “producer.”",[48,47861,47862],{},[384,47863],{"alt":18,"src":47864},"\u002Fimgs\u002Fblogs\u002F63e0a6c34922abc447a7bda5_LVP2qT2n9ZTx-lrb3Hs-8gjokwc3apfKEPO099dTwQOhGnS1FZIhCDRjsKhMczItk8eGQNkVMMbxQzR2V0QrnSk-GvPX5PbODECKPlxwufIkjQCyX9OV2eJApPDmXxsuY8co8uXEQfmj7x89-b4XTg.png",[1666,47866,47867],{"start":19},[324,47868,47869],{},"You can use the icon in the project explorer to create a new Python file, like test_producer.py.",[48,47871,47872],{},[384,47873],{"alt":18,"src":47874},"\u002Fimgs\u002Fblogs\u002F63e0a6c3be9963b2d212d8bf_e486BMcOd5HEYhQOYjHIBSb6HZjJY2N6pAnodc988wznBXo3y3Z-d6_4oiPEYu_PVInOAjm6bQL5-XatthSpHXF8ZUliWYCAbLQw5ejn97wQSabpn2A7BRVKlrasKW3hHC6nqQixJHfBv4D_FmUHsQ.png",[1666,47876,47877],{"start":279},[324,47878,47879],{},"Before you create a client (either producer or consumer), you need to create a topic first. You can use either pulsar-admin or admin restful API to create topics. I used kubectl exec to run the pulsar-admin command in the broker container and created a topic using the VS Code terminal window.",[8325,47881,47884],{"className":47882,"code":47883,"language":8330},[8328],"kubectl exec -n sn-platform my-broker-0 -- bin\u002Fpulsar-admin topics create persistent:\u002F\u002Fpublic\u002Fdefault\u002Fmy-topic1\n",[4926,47885,47883],{"__ignoreMap":18},[1666,47887,47888],{"start":20920},[324,47889,47890],{},"You should be able to see the topic created using the following command.",[8325,47892,47895],{"className":47893,"code":47894,"language":8330},[8328],"kubectl exec -n sn-platform my-broker-0 -- bin\u002Fpulsar-admin topics list public\u002Fdefault\n\"persistent:\u002F\u002Fpublic\u002Fdefault\u002Fmy-topic1\"\n",[4926,47896,47894],{"__ignoreMap":18},[1666,47898,47899],{"start":20934},[324,47900,47901],{},"Now we are ready to type some Python codes. Copy the following snippet to test_producer.py and save the file.",[8325,47903,47906],{"className":47904,"code":47905,"language":8330},[8328],"import pulsar\nclient = pulsar.Client('pulsar:\u002F\u002F10.0.0.36:6650')\nproducer = client.create_producer(\n   'persistent:\u002F\u002Fpublic\u002Fdefault\u002Fmy-topic1',\n   block_if_queue_full=True,\n   batching_enabled=True,\n   batching_max_publish_delay_ms=10)\ndef producer_callback(res, msg_id):\n   print(f\"message published {msg_id}\")\ni = 0\nwhile i \n",[4926,47907,47905],{"__ignoreMap":18},[1666,47909,47910],{"start":20948},[324,47911,47912],{},"In the terminal, you can run the producer code like this. 
Note that my laptop can access the Pulsar proxy IP directly.",[8325,47914,47917],{"className":47915,"code":47916,"language":8330},[8328],"source .py39\u002Fbin\u002Factivate\npython producer\u002Ftest_producer.py\n2022-05-15 21:49:46.917 INFO  [0x104948580] ClientConnection:182 | [ -> pulsar:\u002F\u002F10.0.0.36:6650] Create ClientConnection, timeout=10000\n2022-05-15 21:49:46.918 INFO  [0x104948580] ConnectionPool:96 | Created connection for pulsar:\u002F\u002F10.0.0.36:6650\n2022-05-15 21:49:46.942 INFO  [0x16b81b000] ClientConnection:368 | [10.0.0.7:59912 -> 10.0.0.36:6650] Connected to broker\n2022-05-15 21:49:46.979 INFO  [0x16b81b000] HandlerBase:64 | [persistent:\u002F\u002Fpublic\u002Fdefault\u002Fmy-topic1, ] Getting connection from pool\n2022-05-15 21:49:46.986 INFO  [0x16b81b000] ClientConnection:182 | [ -> pulsar:\u002F\u002F10.0.0.36:6650] Create ClientConnection, timeout=10000\n2022-05-15 21:49:46.986 INFO  [0x16b81b000] ConnectionPool:96 | Created connection for pulsar:\u002F\u002Fmy-broker-0.my-broker-headless.sn-platform.svc.cluster.local:6650\n2022-05-15 21:49:46.993 INFO  [0x16b81b000] ClientConnection:370 | [10.0.0.7:59913 -> 10.0.0.36:6650] Connected to broker through proxy. Logical broker: pulsar:\u002F\u002Fmy-broker-0.my-broker-headless.sn-platform.svc.cluster.local:6650\n2022-05-15 21:49:47.040 INFO  [0x16b81b000] ProducerImpl:188 | [persistent:\u002F\u002Fpublic\u002Fdefault\u002Fmy-topic1, ] Created producer on broker [10.0.0.7:59913 -> 10.0.0.36:6650]\n\n",[4926,47918,47916],{"__ignoreMap":18},[1666,47920,47921],{"start":25801},[324,47922,47923],{},"The code returned without an error, but how do we know it published 1000 messages on the topic? Let’s use pulsar-admin to check the topic stats. Alternatively, you can use the restful endpoint to access the admin port 8080.",[8325,47925,47928],{"className":47926,"code":47927,"language":8330},[8328],"kubectl exec -n sn-platform my-broker-0 -- bin\u002Fpulsar-admin topics stats public\u002Fdefault\u002Fmy-topic1\nDefaulted container \"pulsar-broker\" out of: pulsar-broker, init-sysctl (init)\n{\n \"msgRateIn\" : 0.0,\n \"msgThroughputIn\" : 0.0,\n \"msgRateOut\" : 0.0,\n \"msgThroughputOut\" : 0.0,\n \"bytesInCounter\" : 14921,\n \"msgInCounter\" : 1000,\n \"bytesOutCounter\" : 0,\n \"msgOutCounter\" : 0,\n \"averageMsgSize\" : 0.0,\n \"msgChunkPublished\" : false,\n \"storageSize\" : 14921,\n \"backlogSize\" : 0,\n \"offloadedStorageSize\" : 0,\n \"lastOffloadLedgerId\" : 0,\n \"lastOffloadSuccessTimeStamp\" : 0,\n \"lastOffloadFailureTimeStamp\" : 0,\n \"publishers\" : [ ],\n \"waitingPublishers\" : 0,\n \"subscriptions\" : { },\n \"replication\" : { },\n \"deduplicationStatus\" : \"Disabled\",\n \"nonContiguousDeletedMessagesRanges\" : 0,\n \"nonContiguousDeletedMessagesRangesSerializedSize\" : 0,\n \"compaction\" : {\n   \"lastCompactionRemovedEventCount\" : 0,\n   \"lastCompactionSucceedTimestamp\" : 0,\n   \"lastCompactionFailedTimestamp\" : 0,\n   \"lastCompactionDurationTimeInMills\" : 0\n }\n}\n",[4926,47929,47927],{"__ignoreMap":18},[1666,47931,47932],{"start":25806},[324,47933,47934],{},"From the output, you can see that there are 1000 msgIn. 
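The `while i` line in the producer snippet above appears truncated, most likely because the comparison operator was stripped during publishing. Given the 1000 messages reported by the topic stats, the tail of the script presumably resembled the following; treat this as a hedged reconstruction rather than the author's exact code.

```python
# Reconstruction (assumed): publish 1000 messages asynchronously, then clean up.
while i < 1000:
    producer.send_async(f"hello-pulsar-{i}".encode("utf-8"), producer_callback)
    i += 1

producer.flush()
client.close()
```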
With the client code ready, we can build the image and make this producer a Kubernetes Deployment in the next steps.",[40,47936,47938],{"id":47937},"containerize-and-deploy-the-producer-application","Containerize and deploy the producer application",[1666,47940,47941],{},[324,47942,47943],{},"Use the command palette to create a Dockerfile.",[48,47945,47946],{},[384,47947],{"alt":18,"src":47948},"\u002Fimgs\u002Fblogs\u002F63e0a6c3d96519218d795c1d_qzK6elwZaL1MIzg74r38LKTsIii6fVGbKqYrlkl_ByYjZpYxUBNNUhs8dUPEcz-kEBj-rLULmayYIDbHZZ7uYvzwMZ5zFHwHu7CaC_F79PHx1iF2yP1VF0gzQXg4zmhtc6S8XpNo5-tWcCXk9QI97A.png",[1666,47950,47951],{"start":19},[324,47952,47953],{},"You can find the Dockerfile and requirements.txt created in your project folder. I moved the files to the producer folder and modified the content to fit the folder because I wanted to create two images for the producer and the consumer, respectively. Don’t forget to put the Python dependency (pulsar-client==2.9.2) in the requirements.txt file.",[48,47955,47956],{},[384,47957],{"alt":18,"src":47958},"\u002Fimgs\u002Fblogs\u002F63e0a6c3b2bc1e48902270dc_OeGayIlORajgPngAeYJ_uLlNBo0xFVraEEMtunDstb9gOtmp7BmzXdWEpc7Xx-75qq6t5NxtM_6hfIHeg9yzDfTOTORUnYEcEAHgqGeLPkBfFGph0if4eVQ8T9dJYGN69aB-N17aVGZqciltYdYPHg.png",[1666,47960,47961],{"start":279},[324,47962,47963],{},"Now, we are ready to build the Docker image. Run the following command to build the image. Note that if you are using Mac M1, you need to specify the image platform to fit your Kubernetes worker OS (mine is Ubuntu). Also, remember to log in to your Docker Hub account. In my case, it is yuwsung1.",[8325,47965,47968],{"className":47966,"code":47967,"language":8330},[8328],"docker buildx build --platform linux\u002Famd64 . -t yuwsung1\u002Fpulsar-python-producer:v0.1\ndocker push yuwsung1\u002Fpulsar-python-producer:v0.1\n",[4926,47969,47967],{"__ignoreMap":18},[1666,47971,47972],{"start":20920},[324,47973,47974],{},"Once the image is pushed to Docker Hub, you can use kubectl to run the image as a container and check the topic stats for new messages.",[8325,47976,47979],{"className":47977,"code":47978,"language":8330},[8328],"kubectl run prod-test --image=yuwsung1\u002Fpulsar-python-producer:v0.1\nkubectl exec -n sn-platform my-broker-0 -- bin\u002Fpulsar-admin topics stats public\u002Fdefault\u002Fmy-topic1\nDefaulted container \"pulsar-broker\" out of: pulsar-broker, init-sysctl (init)\n{\n \"msgRateIn\" : 0.0,\n \"msgThroughputIn\" : 0.0,\n \"msgRateOut\" : 0.0,\n \"msgThroughputOut\" : 0.0,\n \"bytesInCounter\" : 29482,\n \"msgInCounter\" : 2000,\n \"bytesOutCounter\" : 0,\n \"msgOutCounter\" : 0,\n \"averageMsgSize\" : 0.0,\n \"msgChunkPublished\" : false,\n \"storageSize\" : 14921,\n \"backlogSize\" : 0,\n \"offloadedStorageSize\" : 0,\n \"lastOffloadLedgerId\" : 0,\n \"lastOffloadSuccessTimeStamp\" : 0,\n \"lastOffloadFailureTimeStamp\" : 0,\n \"publishers\" : [ ],\n \"waitingPublishers\" : 0,\n \"subscriptions\" : { },\n \"replication\" : { },\n \"deduplicationStatus\" : \"Disabled\",\n \"nonContiguousDeletedMessagesRanges\" : 0,\n \"nonContiguousDeletedMessagesRangesSerializedSize\" : 0,\n \"compaction\" : {\n   \"lastCompactionRemovedEventCount\" : 0,\n   \"lastCompactionSucceedTimestamp\" : 0,\n   \"lastCompactionFailedTimestamp\" : 0,\n   \"lastCompactionDurationTimeInMills\" : 0\n }\n}\n",[4926,47980,47978],{"__ignoreMap":18},[1666,47982,47983],{"start":20934},[324,47984,47985],{},"From the output above, you can see that the containerized producer published another 1000 
messages on the topic.",[48,47987,47988],{},"However, this image is useless. The URL, topic name, and other producer properties are hard-coded in the Python code. Therefore, we need to set those properties to a Kubernetes ConfigMap and use a Deployment to mount the ConfigMap as container environment variables. Then in the Python code, we can import the OS module and read the environment variables to replace those properties.",[40,47990,47992],{"id":47991},"use-a-configmap-to-manage-producer-properties","Use a ConfigMap to manage producer properties",[1666,47994,47995],{},[324,47996,47997],{},"We can change the producer code by using environment variables:",[8325,47999,48002],{"className":48000,"code":48001,"language":8330},[8328],"import pulsar\nimport os\npulsar_url = os.environ.get('PULSAR_URL')\ntopic = os.environ.get('PULSAR_TOPIC')\nclient = pulsar.Client(pulsar_url)\nproducer = client.create_producer(\n   topic,\n   block_if_queue_full=True,\n   batching_enabled=True,\n   batching_max_publish_delay_ms=10)\ndef producer_callback(res, msg_id):\n   print(f\"message published {msg_id}\")\ni = 0\nwhile i \n",[4926,48003,48001],{"__ignoreMap":18},[1666,48005,48006],{"start":19},[324,48007,48008],{},"To test the code locally, I exported PULSAR_URL and TOPIC in my current local environment.",[8325,48010,48013],{"className":48011,"code":48012,"language":8330},[8328],"export PULSAR_URL='pulsar:\u002F\u002F10.0.0.36:6650'\nexport PULSAR_TOPIC='persistent:\u002F\u002Fpublic\u002Fdefault\u002Fmy-topic1'\npython test_producer.py\nkubectl exec -n sn-platform my-broker-0 -- bin\u002Fpulsar-admin topics stats public\u002Fdefault\u002Fmy-topic1\nDefaulted container \"pulsar-broker\" out of: pulsar-broker, init-sysctl (init)\n{\n \"msgRateIn\" : 0.0,\n \"msgThroughputIn\" : 0.0,\n \"msgRateOut\" : 0.0,\n \"msgThroughputOut\" : 0.0,\n \"bytesInCounter\" : 44763,\n \"msgInCounter\" : 3000,\n \"bytesOutCounter\" : 0,\n \"msgOutCounter\" : 0,\n \"averageMsgSize\" : 0.0,\n \"msgChunkPublished\" : false,\n \"storageSize\" : 14921,\n \"backlogSize\" : 0,\n \"offloadedStorageSize\" : 0,\n \"lastOffloadLedgerId\" : 0,\n \"lastOffloadSuccessTimeStamp\" : 0,\n \"lastOffloadFailureTimeStamp\" : 0,\n \"publishers\" : [ ],\n \"waitingPublishers\" : 0,\n \"subscriptions\" : { },\n \"replication\" : { },\n \"deduplicationStatus\" : \"Disabled\",\n \"nonContiguousDeletedMessagesRanges\" : 0,\n \"nonContiguousDeletedMessagesRangesSerializedSize\" : 0,\n \"compaction\" : {\n   \"lastCompactionRemovedEventCount\" : 0,\n   \"lastCompactionSucceedTimestamp\" : 0,\n   \"lastCompactionFailedTimestamp\" : 0,\n   \"lastCompactionDurationTimeInMills\" : 0\n }\n}\n",[4926,48014,48012],{"__ignoreMap":18},[1666,48016,48017],{"start":279},[324,48018,48019],{},"Let’s rebuild the image with a new tag.",[8325,48021,48024],{"className":48022,"code":48023,"language":8330},[8328],"docker buildx build --platform linux\u002Famd64 . -t yuwsung1\u002Fpulsar-python-producer:v0.2\ndocker push yuwsung1\u002Fpulsar-python-producer:v0.2\n",[4926,48025,48023],{"__ignoreMap":18},[1666,48027,48028],{"start":20920},[324,48029,48030],{},"Note that the v2 needs PULSAR_URL and PULSAR_TOPIC ingested into the producer Pod. 
The followings are my ConfigMap and Deployment manifests for your reference.",[8325,48032,48035],{"className":48033,"code":48034,"language":8330},[8328],"apiVersion: v1\nkind: ConfigMap\nmetadata:\n name: pulsar-producer-config\ndata:\n pulsar_url: \"pulsar:\u002F\u002F10.0.0.36:6650\"\n topic: \"my-topic1\"\n---\napiVersion: apps\u002Fv1\nkind: Deployment\nmetadata:\n name: my-producer\nspec:\n selector:\n   matchLabels:\n     app: my-producer\n replicas: 1\n template:\n   metadata:\n     labels:\n       app: my-producer\n   spec:\n     containers:\n     - name: pulsar-producer\n       image: yuwsung1\u002Fpulsar-python-producer:v0.2\n       resources:\n         limits:\n           cpu: \"500m\"\n           memory: \"128Mi\"\n       env:\n         - name: PULSAR_URL\n           valueFrom:\n             configMapKeyRef:\n               name: pulsar-producer-config\n               key: pulsar_url\n         - name: PULSAR_TOPIC\n           valueFrom:\n             configMapKeyRef:\n               name: pulsar-producer-config\n               key: topic\n",[4926,48036,48034],{"__ignoreMap":18},[1666,48038,48039],{"start":20934},[324,48040,48041],{},"Use kubectl to deploy the ConfigMap and the Deployment, then check the topic stats.",[8325,48043,48046],{"className":48044,"code":48045,"language":8330},[8328],"kubectl apply -f pulsar-producer.yaml\nkubectl exec -n sn-platform my-broker-0 -- bin\u002Fpulsar-admin topics stats public\u002Fdefault\u002Fmy-topic1\nDefaulted container \"pulsar-broker\" out of: pulsar-broker, init-sysctl (init)\n{\n \"msgRateIn\" : 0.0,\n \"msgThroughputIn\" : 0.0,\n \"msgRateOut\" : 0.0,\n \"msgThroughputOut\" : 0.0,\n \"bytesInCounter\" : 59684,\n \"msgInCounter\" : 4000,\n \"bytesOutCounter\" : 0,\n \"msgOutCounter\" : 0,\n \"averageMsgSize\" : 0.0,\n \"msgChunkPublished\" : false,\n \"storageSize\" : 14921,\n \"backlogSize\" : 0,\n \"offloadedStorageSize\" : 0,\n \"lastOffloadLedgerId\" : 0,\n \"lastOffloadSuccessTimeStamp\" : 0,\n \"lastOffloadFailureTimeStamp\" : 0,\n \"publishers\" : [ ],\n \"waitingPublishers\" : 0,\n \"subscriptions\" : { },\n \"replication\" : { },\n \"deduplicationStatus\" : \"Disabled\",\n \"nonContiguousDeletedMessagesRanges\" : 0,\n \"nonContiguousDeletedMessagesRangesSerializedSize\" : 0,\n \"compaction\" : {\n   \"lastCompactionRemovedEventCount\" : 0,\n   \"lastCompactionSucceedTimestamp\" : 0,\n   \"lastCompactionFailedTimestamp\" : 0,\n   \"lastCompactionDurationTimeInMills\" : 0\n }\n}\n",[4926,48047,48045],{"__ignoreMap":18},[1666,48049,48050],{"start":20948},[324,48051,48052],{},"From the above output, you can see that the new containerized producer mounted the Pulsar URL and topic name from the ConfigMap and produced another 1000 messages (4000 in total) to the topic.",[48,48054,48055,48056,190],{},"Now we can follow the same steps to create a consumer container image and the corresponding Deployment. 
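For orientation, a consumer that mirrors the producer's environment-variable approach might look roughly like the sketch below. The subscription name is illustrative, and the authoritative version is the consumer code in the repository linked just after this.

```python
import os
import pulsar

# Connection details injected from the same kind of ConfigMap used for the producer.
pulsar_url = os.environ.get('PULSAR_URL')
topic = os.environ.get('PULSAR_TOPIC')

client = pulsar.Client(pulsar_url)
consumer = client.subscribe(topic, subscription_name='my-sub')

while True:
    msg = consumer.receive()
    try:
        print(f"received: {msg.data()}")
        consumer.acknowledge(msg)
    except Exception:
        # Ask the broker to redeliver the message later if processing failed.
        consumer.negative_acknowledge(msg)
```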
I will skip the steps in this tutorial, but you can find the consumer code in my",[55,48057,41807],{"href":48058,"rel":48059},"https:\u002F\u002Fgithub.com\u002Fyuweisung\u002Fpulsa-python",[264],[48,48061,48062,48063,5422,48066,48071],{},"In the next blog, we will discuss using a Cloud Native Builder, ",[55,48064,46150],{"href":46148,"rel":48065},[264],[55,48067,48070],{"href":48068,"rel":48069},"https:\u002F\u002Fargo-cd.readthedocs.io\u002Fen\u002Fstable\u002F",[264],"ArgoCD"," to auto-build the container images.",[40,48073,38376],{"id":38375},[48,48075,38379,48076,40419],{},[55,48077,38384],{"href":38382,"rel":48078},[264],[321,48080,48081,48089,48094,48098,48104,48110,48116],{},[324,48082,45457,48083,48085,48086,20076],{},[55,48084,39553],{"href":45460}," to learn more and ",[55,48087,45465],{"href":45463,"rel":48088},[264],[324,48090,38390,48091,190],{},[55,48092,31914],{"href":31912,"rel":48093},[264],[324,48095,45476,48096,45480],{},[55,48097,3550],{"href":45479},[324,48099,48100,758,48102],{},[2628,48101,46310],{},[55,48103,43242],{"href":43241},[324,48105,48106,758,48108],{},[2628,48107,40436],{},[55,48109,43249],{"href":43151},[324,48111,48112,758,48114],{},[2628,48113,46310],{},[55,48115,46332],{"href":46331},[324,48117,48118,758,48120],{},[2628,48119,46310],{},[55,48121,43234],{"href":43233},{"title":18,"searchDepth":19,"depth":19,"links":48123},[48124,48125,48126,48127,48128],{"id":47823,"depth":19,"text":47824},{"id":47853,"depth":19,"text":47854},{"id":47937,"depth":19,"text":47938},{"id":47991,"depth":19,"text":47992},{"id":38375,"depth":19,"text":38376},"2023-02-06","Learn how to containerize Pulsar client apps using Dockerfiles in VS Code.","\u002Fimgs\u002Fblogs\u002F64069586fef8738367d22309_pulsar-operators-tutorial-part-3-create-and-deploy-a-containerized-pulsar-client.jpg",{},{"title":43257,"description":48130},"blog\u002Fpulsar-operators-tutorial-part-3-create-and-deploy-a-containerized-pulsar-client",[38442,821,16985],"T44vKRxy_FlCNOcaFroc7BtKNR2Vv4uwNF3X6z5xVFk",{"id":48138,"title":48139,"authors":48140,"body":48141,"category":821,"createdAt":290,"date":48293,"description":48294,"extension":8,"featured":294,"image":48295,"isDraft":294,"link":290,"meta":48296,"navigation":7,"order":296,"path":48297,"readingTime":11180,"relatedResources":290,"seo":48298,"stem":48299,"tags":48300,"__hash__":48301},"blogs\u002Fblog\u002Fapache-pulsar-hits-its-600th-contributor.md","Apache Pulsar Hits Its 600th Contributor",[41185],{"type":15,"value":48142,"toc":48284},[48143,48151,48154,48158,48162,48165,48171,48175,48178,48184,48188,48195,48199,48202,48227,48241,48243,48249,48267,48282],[48,48144,48145,48146,48150],{},"The Apache Pulsar community embraced a significant milestone last month as the project witnessed its 600th contributor to the ",[55,48147,48149],{"href":36230,"rel":48148},[264],"Pulsar main GitHub repository",". We would like to thank everyone in the Pulsar community who contributed to this remarkable achievement.",[48,48152,48153],{},"Since Pulsar’s graduation as a Top-Level Project (TLP) in September 2018, it has been driven by an active global community, with 160+ releases, 11K+ commits from 600 contributors, 12.2K+ stars, 3.2K+ forks, and 9600+ Slack users.",[40,48155,48157],{"id":48156},"strong-community-growth","Strong community growth",[32,48159,48161],{"id":48160},"_600-contributors","600 contributors",[48,48163,48164],{},"The number of contributors is an important metric to measure the health of an open-source project. 
In the last year alone, we added almost 130 contributors to the project, a 28% increase from the previous year. The image below shows the number of Pulsar contributors over the past 6 years.",[48,48166,24328,48167,48170],{},[384,48168],{"alt":18,"src":48169},"\u002Fimgs\u002Fblogs\u002F63dc6777daa1f9db453e4601_4eJf5r7QQv9EgmvM59AZvoeyC46IFjfW5WI8siz-erlxpk31v4LrWbau3aI03KEwWeJQyORz8h3Pt_gCx0jIuYbU-4C6pnJXY98UzV7qnhw9Q00Gt969LR-hzx7V9Asjs_f0AbQnDvNVFpo0g8T8GFE.jpeg","Figure 1. Pulsar GitHub contributors",[32,48172,48174],{"id":48173},"monthly-active-contributors-of-pulsar-and-kafka","Monthly active contributors of Pulsar and Kafka",[48,48176,48177],{},"Both Pulsar and Kafka are popular streaming systems with contributors across the globe and are adopted by organizations spanning different industries. Although Kafka outnumbers Pulsar in the total number of contributors, the latter surpassed the former in terms of monthly active contributors about 2 years ago and has maintained a strong momentum since then.",[48,48179,24328,48180,48183],{},[384,48181],{"alt":18,"src":48182},"\u002Fimgs\u002Fblogs\u002F63dc6777644aa9839308104f_1dQG-dv9hqelKA5sJpc3u3tQSwwXnHa9RunSPUQqdc5TYlT4zIQ4fbHnecw8tVMc1Z9zr7VloCiE5getBFmUppf2ONbehRI7PA9MopzNkhGSMOvugrsxxKMVp0SnCDlQHG-yYBCioc2CjBhGMI0StZE.jpeg","Figure 2. Pulsar vs. Kafka - Monthly active contributors",[32,48185,48187],{"id":48186},"_12k-github-stars","12K+ GitHub stars",[48,48189,48190,48191,48194],{},"GitHub stars are another key metric for open-source projects. Figure 3 displays the star history of Pulsar since its inception.\n",[384,48192],{"alt":18,"src":48193},"\u002Fimgs\u002Fblogs\u002F63dc677743a1ee58c92b91b8_evmGFF7_e4_sZcQFwQD_q3ezZ9EvnwYDC2HCHi23Wqkwe8yj2D7BF60qW6aUjqEDCWdCR994mu_1LFBN9kfAbCfMa9NRYDd1Dc-BLA2rdCE8PFDpDJLnBImlF8xJykKJeJ_Xa0rQxc9NSzMyltjCL08.png","Figure 3. Pulsar GitHub stars",[40,48196,48198],{"id":48197},"pulsar-adoption","Pulsar adoption",[48,48200,48201],{},"As the project achieves strong growth in contributors, it also sees widespread adoption by companies across industries. Their success stories speak volumes about a more stable and secure project capable of powering different use cases in the messaging and streaming space.",[48,48203,48204,48205,4003,48208,48211,48212,48216,48217,4003,48221,48226],{},"Pulsar has played an essential role in handling mission-critical workloads for both existing and new users. Tencent, one of the earliest companies to adopt Pulsar in production, has been consistently working to ",[55,48206,48207],{"href":43591},"improve the project for better stability",[55,48209,48210],{"href":43583},"shared their experience of handling 100 billion messages per day",". New adopters include Nippon Telegraph and Telephone Corporation (NTT) Software Innovation Center, which ",[55,48213,48215],{"href":48214},"\u002Fblog\u002Fhandling-100k-consumers-with-one-pulsar-topic","uses a single Pulsar topic to handle 100K consumers for its IoT use case",". Some organizations migrated from systems like Kafka to Pulsar, such as ",[55,48218,48220],{"href":48219},"\u002Fsuccess-stories\u002Fsina-weibo","Sina Weibo",[55,48222,48225],{"href":48223,"rel":48224},"https:\u002F\u002Fwww.mparticle.com\u002Fblog\u002Fapache-pulsar-migration\u002F",[264],"mParticle",". They select Pulsar not just for its flexibility, scalability, high availability, and unique architecture. 
More importantly, Pulsar solves the problems and pain points where other systems fall short.",[48,48228,48229,48230,48235,48236,190],{},"For more information, see this ",[55,48231,48234],{"href":48232,"rel":48233},"https:\u002F\u002Fpulsar.apache.org\u002Fpowered-by\u002F",[264],"list of companies using or contributing to Pulsar"," and check out ",[55,48237,48240],{"href":48238,"rel":48239},"https:\u002F\u002Fpulsar.apache.org\u002Fcase-studies\u002F",[264],"how different organizations are using Pulsar",[40,48242,39647],{"id":39646},[48,48244,48245,48246,190],{},"Backed by a diverse community of contributors, we believe that each and every pull request counts and would like to see more contributors join the journey. To start making your contribution to the project, see the ",[55,48247,36224],{"href":36222,"rel":48248},[264],[48,48250,48251,48252,4003,48255,48258,48259,48262,48263,190],{},"To stay up to date with community news and discuss hot topics with other members, you can subscribe to the Pulsar mailing lists for ",[55,48253,6986],{"href":48254},"mailto:users-subscribe@pulsar.apache.org",[55,48256,43694],{"href":48257},"mailto:dev-subscribe@pulsar.apache.org",", follow us on ",[55,48260,39691],{"href":36236,"rel":48261},[264],", and join the ",[55,48264,48266],{"href":31692,"rel":48265},[264],"Pulsar Slack workspace",[48,48268,48269,48270,48275,48276,1154,48279,45209],{},"The Pulsar community hosts events, meetups, and webinars for Pulsar users of all experience levels. Check out the ",[55,48271,48274],{"href":48272,"rel":48273},"https:\u002F\u002Fpulsar.apache.org\u002Fevents\u002F",[264],"Events"," page and join different user groups to stay tuned. Pulsar Summit Europe 2023 is taking place virtually on May 23rd. Engage with the community by ",[55,48277,47764],{"href":45463,"rel":48278},[264],[55,48280,47768],{"href":45206,"rel":48281},[264],[48,48283,3931],{},{"title":18,"searchDepth":19,"depth":19,"links":48285},[48286,48291,48292],{"id":48156,"depth":19,"text":48157,"children":48287},[48288,48289,48290],{"id":48160,"depth":279,"text":48161},{"id":48173,"depth":279,"text":48174},{"id":48186,"depth":279,"text":48187},{"id":48197,"depth":19,"text":48198},{"id":39646,"depth":19,"text":39647},"2023-02-03","The Apache Pulsar community embraced a significant milestone last month as the project witnessed its 600th contributor to the Pulsar main GitHub repository.","\u002Fimgs\u002Fblogs\u002F6406960d420530c041d4df3f_apache-pulsar-hits-its-600th-contributor.png",{},"\u002Fblog\u002Fapache-pulsar-hits-its-600th-contributor",{"title":48139,"description":48294},"blog\u002Fapache-pulsar-hits-its-600th-contributor",[302,821],"Ssd41_9jqz14hGVFoORyFDRPGqMV0zbyRyeK-sVWQ70",{"id":48303,"title":48304,"authors":48305,"body":48306,"category":7338,"createdAt":290,"date":48434,"description":48435,"extension":8,"featured":294,"image":48436,"isDraft":294,"link":290,"meta":48437,"navigation":7,"order":296,"path":45460,"readingTime":11508,"relatedResources":290,"seo":48438,"stem":48439,"tags":48440,"__hash__":48441},"blogs\u002Fblog\u002Fannouncing-pulsar-virtual-summit-europe-2023-cfp-is-now-open.md","Announcing Pulsar Virtual Summit Europe 2023: CFP Is Now Open!",[40485],{"type":15,"value":48307,"toc":48426},[48308,48311,48313,48315,48317,48321,48323,48326,48343,48346,48350,48353,48361,48364,48368,48385,48389,48392,48395,48405,48411,48413,48415,48417,48420],[48,48309,48310],{},"We’re excited to announce that Pulsar Virtual Summit Europe 2023 will take place on Tuesday, May 23rd, 2023! 
We welcome your participation to help make the event a success, by submitting a talk for the event or offering sponsorship. Learn more about Pulsar Summit and the opportunities available to speak and sponsor the summit below.",[40,48312,39731],{"id":39730},[48,48314,39734],{},[48,48316,39737],{},[40,48318,48320],{"id":48319},"join-us-and-speak-at-pulsar-virtual-summit-europe-2023","Join us and speak at Pulsar Virtual Summit Europe 2023",[48,48322,39744],{},[48,48324,48325],{},"We’re looking for Pulsar stories that are innovative, informative, or thought-provoking. Here are some suggestions based on previous events:",[321,48327,48328,48331,48334,48337,48340],{},[324,48329,48330],{},"A Pulsar success story or case study",[324,48332,48333],{},"Use cases, operations, tools, techniques, or the Pulsar ecosystem",[324,48335,48336],{},"A deep dive into technologies",[324,48338,48339],{},"Best practices and lessons learned",[324,48341,48342],{},"Anything else related to Pulsar that inspires the audience",[48,48344,48345],{},"Interested in speaking at the summit?",[48,48347,48348],{},[34077,48349],{"value":34079},[48,48351,48352],{},"All levels of talks (beginner, intermediate, and advanced) are welcome. Remember to keep your proposal short, relevant, and engaging. The following session formats are acceptable:",[321,48354,48355,48358],{},[324,48356,48357],{},"Session Presentation: 30-minute presentation, can include demo and\u002For Q&A session",[324,48359,48360],{},"Lighting Talk:  20-minute presentation, no Q&A session",[48,48362,48363],{},"All accepted submissions will be pre-recorded. Due to time zone and network limitations, we do not recommend speakers present their talk live.",[40,48365,48367],{"id":48366},"event-dates-to-remember","Event dates to remember",[321,48369,48370,48373,48376,48379,48382],{},[324,48371,48372],{},"CFP opens: February 1st, 2023",[324,48374,48375],{},"CFP closes: March 3rd, 2023",[324,48377,48378],{},"CFP notifications: March 24th, 2023",[324,48380,48381],{},"Schedule announcement: March 31st, 2023",[324,48383,48384],{},"Event date: May 23rd, 2023",[40,48386,48388],{"id":48387},"community-sponsorships-available","Community sponsorships available",[48,48390,48391],{},"We invite you to participate as a Community Sponsor for Pulsar Virtual Summit Europe 2023. Sponsoring this event provides an excellent opportunity for your organization to further engage and connect with the quickly growing Pulsar and streaming communities.",[48,48393,48394],{},"Community Sponsorships for Pulsar Virtual Summit Europe 2023 include your company logo on the pulsarsummit.org website and on-screen during the welcome introduction presentation, as well as opportunities for you to help promote the event.",[48,48396,48397,48398,48402,48403,190],{},"To secure your no-fee Community Sponsorship, please submit your response ",[55,48399,267],{"href":48400,"rel":48401},"https:\u002F\u002Fshare.hsforms.com\u002F1q-62tijTRQORni4ngLBe3A3x5r4",[264],". The full Sponsorship Prospectus will be made available with your inquiry. For more information on becoming a sponsor, please contact the Pulsar Summit event organizers at ",[55,48404,39814],{"href":39813},[48,48406,48407,48408,39824],{},"Help us make #PulsarSummit Europe 2023 successful by spreading the word and submitting your proposal and sponsorship! 
Follow us on Twitter (",[55,48409,39823],{"href":39821,"rel":48410},[264],[40,48412,39828],{"id":39827},[48,48414,39831],{},[40,48416,39835],{"id":39834},[48,48418,48419],{},"StreamNative is proud to host Pulsar Virtual Summit Europe 2023. Founded by the original developers of Apache Pulsar and Apache BookKeeper, StreamNative builds a cloud-native event streaming platform that enables enterprises to easily access data as real-time event streams. As the core developers of Pulsar, the StreamNative team is deeply versed in the technology, the community, and the use cases. Today, StreamNative is focusing on growing the Apache Pulsar and BookKeeper communities and bringing its deep experience across diverse Pulsar use cases to companies across the globe.",[48,48421,48422,48423,39859],{},"Want to stay informed of the latest developments regarding Pulsar Virtual Summit Europe 2023? ",[55,48424,39858],{"href":39856,"rel":48425},[264],{"title":18,"searchDepth":19,"depth":19,"links":48427},[48428,48429,48430,48431,48432,48433],{"id":39730,"depth":19,"text":39731},{"id":48319,"depth":19,"text":48320},{"id":48366,"depth":19,"text":48367},{"id":48387,"depth":19,"text":48388},{"id":39827,"depth":19,"text":39828},{"id":39834,"depth":19,"text":39835},"2023-02-01","Join the Apache Pulsar community for this exciting one-day virtual event! Call for papers is now open, or join as a Community Sponsor.","\u002Fimgs\u002Fblogs\u002F63ea8327dae10eb7e1e24bd8_OpenGraph.png",{},{"title":48304,"description":48435},"blog\u002Fannouncing-pulsar-virtual-summit-europe-2023-cfp-is-now-open",[5376,821],"8ljBTDJC7ISeGDedC7vyIg4sLD26Q1HyEcgHVIZeH9I",{"id":48443,"title":48444,"authors":48445,"body":48446,"category":3550,"createdAt":290,"date":48563,"description":48564,"extension":8,"featured":294,"image":48565,"isDraft":294,"link":290,"meta":48566,"navigation":7,"order":296,"path":47378,"readingTime":47804,"relatedResources":290,"seo":48567,"stem":48568,"tags":48569,"__hash__":48570},"blogs\u002Fblog\u002Fannouncing-the-streamnative-rest-api.md","Announcing the StreamNative Rest API",[41695,44843],{"type":15,"value":48447,"toc":48557},[48448,48451,48454,48462,48466,48469,48472,48475,48478,48482,48485,48496,48498,48515,48519,48521,48553,48555],[48,48449,48450],{},"We are excited to announce StreamNative Cloud now supports a RESTful interface to Pulsar clusters.",[48,48452,48453],{},"Seamlessly connect to your Pulsar clusters using a straightforward Rest API, eliminating the dependency on any specific client libraries:",[321,48455,48456,48458,48460],{},[324,48457,47387],{},[324,48459,47390],{},[324,48461,47393],{},[40,48463,48465],{"id":48464},"why-develop-the-streamnative-rest-api","Why develop the StreamNative Rest API",[48,48467,48468],{},"Previously, developers were required to use the Pulsar TCP protocol and a client library to move data in and out of Pulsar. However, if a suitable client library was unsupported, developers were restricted. The Rest API offers a straightforward alternative, particularly for those with simple use cases, where basic API to Pulsar is preferable.",[48,48470,48471],{},"REST is a widely used, user-friendly API for web services with the advantage of format-agonistic and client-server separation. 
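As a shape-of-the-thing illustration only: from any language with an HTTP client, producing a message becomes a single authenticated POST. The endpoint path, payload schema, and authentication below are placeholders, not the documented StreamNative Rest API contract; consult the linked documentation for the actual request format.

```python
import requests

# All values below are illustrative placeholders -- see the StreamNative
# Rest API documentation for the real endpoint path, payload format, and auth.
BASE_URL = "https://example.streamnative.cloud"             # hypothetical service URL
TOPIC_PATH = "/topics/persistent/public/default/my-topic"  # hypothetical path
TOKEN = "YOUR_API_TOKEN"                                    # hypothetical token

resp = requests.post(
    BASE_URL + TOPIC_PATH,
    headers={
        "Authorization": f"Bearer {TOKEN}",
        "Content-Type": "application/json",
    },
    json={"messages": [{"payload": "hello from plain HTTP"}]},  # illustrative body
    timeout=10,
)
print(resp.status_code, resp.text)
```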
The HTTP protocol-based Rest API does not require library installations.",[48,48473,48474],{},"By implementing the Rest API, StreamNative enables customers to access Pulsar clusters in a flexible, scalable manner, while also enjoying the simplicity of REST.",[48,48476,48477],{},"REST interfaces also allow for automation with DevOps and automation tools, making it easy to integrate Pulsar for event messaging, alerting and data distribution.   The decoupled design offers an integration solution for Pulsar access.",[40,48479,48481],{"id":48480},"supported-features-and-use-cases","Supported features and use cases",[48,48483,48484],{},"Rest API supports:",[321,48486,48487,48490,48493],{},[324,48488,48489],{},"Support on both non-partitioned and partitioned topics.",[324,48491,48492],{},"Produce, consume, and acknowledge messages through a RESTful interface without using the native Pulsar protocol or clients.",[324,48494,48495],{},"Support for basic and Avro base struct schema.",[48,48497,34330],{},[321,48499,48500,48503,48506,48509,48512],{},[324,48501,48502],{},"Send data to Pulsar from any frontend application built in any language.",[324,48504,48505],{},"Integrate Pulsar with existing automation tools.",[324,48507,48508],{},"Ingest Pulsar data into corporate dashboards and monitoring systems.",[324,48510,48511],{},"Provide instant access to data in motion for data scientist notebooks.",[324,48513,48514],{},"Ingest messages into a stream processing framework that may not support Pulsar.",[40,48516,48518],{"id":48517},"see-rest-api-in-action","See Rest API in action",[40,48520,40413],{"id":36476},[321,48522,48523,48534,48540,48545],{},[324,48524,48525,48526,48529,48530,190],{},"Documentation: Get started ",[55,48527,267],{"href":47402,"rel":48528},[264],". 
Learn more about our new ",[55,48531,48533],{"href":33836,"rel":48532},[264],"REST API here",[324,48535,48536,48537,190],{},"Learn the Pulsar Fundamentals: Developed by the original creators of Pulsar, sign-up with ",[55,48538,31914],{"href":31912,"rel":48539},[264],[324,48541,48542,48543,190],{},"Make an inquiry: We’re here to ensure our Pulsar deployment is a success, ",[55,48544,38404],{"href":45219},[324,48546,47760,48547,1154,48550,45209],{},[55,48548,47764],{"href":45463,"rel":48549},[264],[55,48551,47768],{"href":45206,"rel":48552},[264],[48,48554,3931],{},[48,48556,3931],{},{"title":18,"searchDepth":19,"depth":19,"links":48558},[48559,48560,48561,48562],{"id":48464,"depth":19,"text":48465},{"id":48480,"depth":19,"text":48481},{"id":48517,"depth":19,"text":48518},{"id":36476,"depth":19,"text":40413},"2023-01-31","Seamlessly connect to your Pulsar clusters using a straightforward Rest API, eliminating the dependency on any specific client libraries.","\u002Fimgs\u002Fblogs\u002F63d875070441433f3a1dc715_REST-API.png",{},{"title":48444,"description":48564},"blog\u002Fannouncing-the-streamnative-rest-api",[3550,821,4301],"J9cMJ6oubDwPSRIBI3fDnZUIS_ule5hwJ940E94qzA8",{"id":48572,"title":48573,"authors":48574,"body":48576,"category":821,"createdAt":290,"date":48917,"description":48918,"extension":8,"featured":294,"image":48919,"isDraft":294,"link":290,"meta":48920,"navigation":7,"order":296,"path":45191,"readingTime":11508,"relatedResources":290,"seo":48921,"stem":48922,"tags":48923,"__hash__":48924},"blogs\u002Fblog\u002Fannouncing-hudi-sink-connector-for-pulsar.md","Announcing the Hudi Sink Connector for Apache Pulsar",[48575],"Yong Zhang",{"type":15,"value":48577,"toc":48908},[48578,48581,48587,48595,48599,48607,48613,48617,48620,48623,48626,48630,48633,48644,48648,48651,48653,48656,48678,48681,48686,48692,48697,48703,48709,48713,48718,48724,48729,48735,48740,48746,48751,48757,48762,48768,48773,48779,48782,48788,48791,48796,48802,48807,48812,48814,48820,48823,48828,48834,48839,48845,48848,48854,48858,48861,48903],[48,48579,48580],{},"We’re excited to announce the general availability of the Hudi Sink connector for Apache Pulsar. The connector enables seamless integration between Apache Hudi and Apache Pulsar, improving the diversity of the Apache Pulsar ecosystem. 
The Hudi + Pulsar connector offers a convenient, efficient, and flexible approach to moving data from Pulsar to Hudi without requiring user code.",[48,48582,48583,48584,190],{},"For more information on why lakehouse technologies are growing in popularity, check out ",[55,48585,48586],{"href":45187},"this blog",[48,48588,48589,48590,48594],{},"See the Hudi Sink connector in action during the Pulsar Summit SF 2022 talk ",[55,48591,48593],{"href":48592},"\u002Fvideos\u002Fpulsar-summit-san-francisco-2022-ecosystem-unlocking-the-power-of-lakehouse-architectures-with-apache-pulsar-and-apache-hudi","Unlocking the Power of Lakehouse Architectures with Pulsar and Hudi"," from Addison Higham (Chief Architect, StreamNative) and Alexey Kudinkin (Founding Engineer, Onehouse).",[40,48596,48598],{"id":48597},"what-is-the-hudi-sink-connector","What is the Hudi Sink connector?",[48,48600,3600,48601,48606],{},[55,48602,48605],{"href":48603,"rel":48604},"https:\u002F\u002Fhub.streamnative.io\u002Fconnectors\u002Flakehouse-sink\u002F2.9.2",[264],"Hudi Sink connector"," is a Pulsar IO connector that pulls data from Apache Pulsar topics and persists data to Hudi tables.",[48,48608,48609],{},[384,48610],{"alt":48611,"src":48612},"Diagram of how Pulsar connects to Hudi","\u002Fimgs\u002Fblogs\u002F63d2bf09f597090b196ee0a8_Pulsar-Hudi-sink-connector.png",[40,48614,48616],{"id":48615},"why-develop-the-hudi-sink-connector","Why develop the Hudi Sink connector?",[48,48618,48619],{},"In the last 5 years, the rise of streaming data and the need for lower data latency have pushed data lakes to their limits. As a result, lakehouse technologies such as Apache Hudi have seen rapid adoption. Apache Pulsar, a distributed, open-source pub-sub messaging and streaming platform for real-time workloads, is a natural fit for lakehouse architectures. Integrating Apache Pulsar with Lakehouse streamlines data lifecycle management and data analysis.",[48,48621,48622],{},"StreamNative built the Hudi Sink Connector to provide Hudi users with a way to connect the flow of messages from Pulsar and use more powerful features, while avoiding problems with connectivity that can appear when there are intrinsic differences1 between systems or privacy requirements.",[48,48624,48625],{},"The connector solves this problem by fully integrating with Pulsar (including its serverless functions, per-message processing, and event-stream processing). The connector presents a low-code solution with out-of-the-box capabilities such as multi-tenant connectivity, geo-replication, protocols for direct connection to end-user mobile clients or IoT clients, and more.",[40,48627,48629],{"id":48628},"what-are-the-benefits-of-using-the-hudi-sink-connector","What are the benefits of using the Hudi Sink connector?",[48,48631,48632],{},"The integration between Hudi and Apache Pulsar provides three key benefits:",[321,48634,48635,48638,48641],{},[324,48636,48637],{},"Simplicity: Quickly move data from Apache Pulsar to Hudi without any user code.",[324,48639,48640],{},"Efficiency: Reduce your time spent configuring the data layer. This means you more time to discover the maximum business value from real-time data in an effective way.",[324,48642,48643],{},"Scalability: Run in different modes (standalone or distributed). 
This allows you to build reactive data pipelines to meet business and operational needs in real time.",[40,48645,48647],{"id":48646},"how-do-i-get-started-with-the-hudi-sink-connector","How do I get started with the Hudi Sink connector?",[48,48649,48650],{},"The following example shows how to configure the connector running in a standalone Pulsar service.",[40,48652,10104],{"id":10103},[48,48654,48655],{},"First, you need to prepare these components:",[321,48657,48658,48665,48672,48675],{},[324,48659,48660],{},[55,48661,48664],{"href":48662,"rel":48663},"https:\u002F\u002Fwww.apache.org\u002Fdyn\u002Fmirrors\u002Fmirrors.cgi?action=download&filename=pulsar\u002Fpulsar-2.10.1\u002Fapache-pulsar-2.10.1-bin.tar.gz",[264],"Pulsar 2.10.1",[324,48666,48667],{},[55,48668,48671],{"href":48669,"rel":48670},"https:\u002F\u002Fgithub.com\u002Fstreamnative\u002Fpulsar-io-lakehouse\u002Freleases\u002Ftag\u002Fv2.10.1.1",[264],"Lakehouse connector 2.10.1.1",[324,48673,48674],{},"Python3",[324,48676,48677],{},"pyspark 3.2.1",[48,48679,48680],{},"Initialize the pyspark environment.",[1666,48682,48683],{},[324,48684,48685],{},"Create a virtualenv with python3.",[8325,48687,48690],{"className":48688,"code":48689,"language":8330},[8328],"python3 -m venv .hudi-pyspark && source .hudi-pyspark\u002Fbin\u002Factivate\n",[4926,48691,48689],{"__ignoreMap":18},[1666,48693,48694],{"start":19},[324,48695,48696],{},"Download pyspark.",[8325,48698,48701],{"className":48699,"code":48700,"language":8330},[8328],"pip install pyspark==3.2.1 && export PYSPARK_PYTHON=$(which python3)\n",[4926,48702,48700],{"__ignoreMap":18},[48,48704,39596,48705,48708],{},[55,48706,20384],{"href":39599,"rel":48707},[264]," feature to run the connector. Follow the steps below to quickly get the connector up and running.",[40,48710,48712],{"id":48711},"configure-the-sink-connector-with-local-filesystem","Configure the sink connector with local Filesystem",[1666,48714,48715],{},[324,48716,48717],{},"Decompress the Pulsar package and go to its root directory.",[8325,48719,48722],{"className":48720,"code":48721,"language":8330},[8328],"tar -xvf apache-pulsar-2.10.1-bin.tar.gz && cd apache-pulsar-2.10.1\n",[4926,48723,48721],{"__ignoreMap":18},[1666,48725,48726],{"start":19},[324,48727,48728],{},"Start the Pulsar service with the daemon client tool.",[8325,48730,48733],{"className":48731,"code":48732,"language":8330},[8328],"bin\u002Fpulsar-daemon start standalone\n",[4926,48734,48732],{"__ignoreMap":18},[1666,48736,48737],{"start":279},[324,48738,48739],{},"Create a directory for storing table data.",[8325,48741,48744],{"className":48742,"code":48743,"language":8330},[8328],"mkdir hudi-sink\n",[4926,48745,48743],{"__ignoreMap":18},[1666,48747,48748],{"start":20920},[324,48749,48750],{},"Create the sink configuration file hudi-sink.json. 
Note that you need to update archive and hoodie.base.path to the correct path.",[8325,48752,48755],{"className":48753,"code":48754,"language":8330},[8328],"{\n         \"tenant\": \"public\",\n         \"namespace\": \"default\",\n         \"name\": \"hudi-sink\",\n         \"inputs\": [\n           \"test-hudi-pulsar\"\n         ],\n         \"archive\": \"\u002Fpath\u002Fto\u002Fpulsar-io-lakehouse-2.10.1.1.nar\",\n         \"parallelism\": 1,\n         \"processingGuarantees\": \"EFFECTIVELY_ONCE\",\n         \"configs\":   {\n             \"type\": \"hudi\",\n             \"hoodie.table.name\": \"hudi-connector-test\",\n             \"hoodie.table.type\": \"COPY_ON_WRITE\",\n             \"hoodie.base.path\": \"file:\u002F\u002F\u002Fpath\u002Fto\u002Fhudi-sink\",\n             \"hoodie.datasource.write.recordkey.field\": \"id\",\n             \"hoodie.datasource.write.partitionpath.field\": \"id\",\n                     \"maxRecordsPerCommit\": \"10\"\n         }\n     }\n",[4926,48756,48754],{"__ignoreMap":18},[1666,48758,48759],{"start":20934},[324,48760,48761],{},"Submit the Hudi sink with pulsar-admin.",[8325,48763,48766],{"className":48764,"code":48765,"language":8330},[8328],"bin\u002Fpulsar-admin sinks create --sink-config-file ${PWD}\u002Fhudi-sink.json\n",[4926,48767,48765],{"__ignoreMap":18},[1666,48769,48770],{"start":20948},[324,48771,48772],{},"Check the sink status to confirm it is running.",[8325,48774,48777],{"className":48775,"code":48776,"language":8330},[8328],"bin\u002Fpulsar-admin sinks status --name hudi-sink\n",[4926,48778,48776],{"__ignoreMap":18},[48,48780,48781],{},"The expected output:",[8325,48783,48786],{"className":48784,"code":48785,"language":8330},[8328],"{\n      \"numInstances\" : 1,\n      \"numRunning\" : 1,\n      \"instances\" : [ {\n        \"instanceId\" : 0,\n        \"status\" : {\n          \"running\" : true,\n          \"error\" : \"\",\n          \"numRestarts\" : 0,\n          \"numReadFromPulsar\" : 0,\n          \"numSystemExceptions\" : 0,\n          \"latestSystemExceptions\" : [ ],\n          \"numSinkExceptions\" : 0,\n          \"latestSinkExceptions\" : [ ],\n          \"numWrittenToSink\" : 0,\n          \"lastReceivedTime\" : 0,\n          \"workerId\" : \"c-standalone-fw-localhost-8080\"\n        }\n      } ]\n    }\n",[4926,48787,48785],{"__ignoreMap":18},[48,48789,48790],{},"numRunning shows 1 and running shows true mean that the sink is started successfully.",[1666,48792,48793],{"start":25801},[324,48794,48795],{},"Produce 100 messages to the topic test-hudi-pulsar to make Hudi flush records to the table hudi-connector-test.",[8325,48797,48800],{"className":48798,"code":48799,"language":8330},[8328],"for i in {1..10}; do bin\u002Fpulsar-client produce -vs 'json:{\"type\":\"record\",\"name\":\"data\",\"fields\":[{\"name\":\"id\",\"type\":\"int\"}]}' -m \"{\\\"id\\\":$i}\" test-hudi-pulsar; done\n",[4926,48801,48799],{"__ignoreMap":18},[1666,48803,48804],{"start":25806},[324,48805,48806],{},"Check the sink status to confirm the message consumed.",[8325,48808,48810],{"className":48809,"code":48776,"language":8330},[8328],[4926,48811,48776],{"__ignoreMap":18},[48,48813,48781],{},[8325,48815,48818],{"className":48816,"code":48817,"language":8330},[8328],"{\n      \"numInstances\" : 1,\n      \"numRunning\" : 1,\n      \"instances\" : [ {\n        \"instanceId\" : 0,\n        \"status\" : {\n          \"running\" : true,\n          \"error\" : \"\",\n          \"numRestarts\" : 0,\n          \"numReadFromPulsar\" : 10,\n          
\"numSystemExceptions\" : 0,\n          \"latestSystemExceptions\" : [ ],\n          \"numSinkExceptions\" : 0,\n          \"latestSinkExceptions\" : [ ],\n          \"numWrittenToSink\" : 10,\n          \"lastReceivedTime\" : 1657637475669,\n          \"workerId\" : \"c-standalone-fw-localhost-8080\"\n        }\n      } ]\n    }\n",[4926,48819,48817],{"__ignoreMap":18},[48,48821,48822],{},"numReadFromPulsar shows 10 and numWrittenToSink shows 10 mean that the messages are written into the sink.",[1666,48824,48825],{"start":25812},[324,48826,48827],{},"Start pyspark with Hudi.",[8325,48829,48832],{"className":48830,"code":48831,"language":8330},[8328],"pyspark \\\n    --packages org.apache.hudi:hudi-spark3.2-bundle_2.12:0.11.1 \\\n    --conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer' \\\n    --conf 'spark.sql.catalog.spark_catalog=org.apache.spark.sql.hudi.catalog.HoodieCatalog' \\\n    --conf 'spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension'\n",[4926,48833,48831],{"__ignoreMap":18},[1666,48835,48836],{"start":25817},[324,48837,48838],{},"Execute the following code in the pyspark.",[8325,48840,48843],{"className":48841,"code":48842,"language":8330},[8328],"tablename=\"hudi-connector-test\"\nbasepath=\"file:\u002F\u002F\u002Fpath\u002Fto\u002Fhudi-sink\"\nval tripsSnapshotDF = spark.read.format(\"hudi\").load(basepath)\ntripsSnapshotDF.createOrReplaceTempView(\"pulsar\")\nspark.sql(\"select id from pulsar\").show()\n",[4926,48844,48842],{"__ignoreMap":18},[48,48846,48847],{},"Then it will show the table hudi-connector-test content, which is produced from the Pulsar topic test-hudi-pulsar.",[8325,48849,48852],{"className":48850,"code":48851,"language":8330},[8328],"+---+\n| id|\n+---+\n| 10|\n|  9|\n|  1|\n|  7|\n|  6|\n|  5|\n|  3|\n|  8|\n|  4|\n|  2|\n+---+\n",[4926,48853,48851],{"__ignoreMap":18},[40,48855,48857],{"id":48856},"how-can-i-get-involved","How can I get involved?",[48,48859,48860],{},"The Hudi Sink connector is a major step in the journey of integrating lakehouse systems into the Pulsar ecosystem. To get involved with the Hudi Sink connector for Apache Pulsar, check out the following featured resources:",[321,48862,48863,48875,48882,48895],{},[324,48864,48865,48866,39659,48870,48874],{},"Try out the Hudi Sink connector. To get started, ",[55,48867,36195],{"href":48868,"rel":48869},"https:\u002F\u002Fgithub.com\u002Fstreamnative\u002Fpulsar-io-lakehouse\u002Freleases",[264],[55,48871,39663],{"href":48872,"rel":48873},"https:\u002F\u002Fgithub.com\u002Fstreamnative\u002Fpulsar-io-lakehouse\u002Fblob\u002Fmaster\u002Fdocs\u002Flakehouse-sink.md",[264]," that walks you through the whole process.",[324,48876,48877,48878,39673],{},"Make a contribution. The Hudi Sink connector is a community-driven service, which hosts its source code on the StreamNative GitHub repository. If you have any feature requests or bug reports, do not hesitate to ",[55,48879,39672],{"href":48880,"rel":48881},"https:\u002F\u002Fgithub.com\u002Fstreamnative\u002Fpulsar-io-lakehouse\u002Fissues\u002Fnew\u002Fchoose",[264],[324,48883,48884,48885,48888,48889,39687,48892,39692],{},"‍Contact us. 
Feel free to create an issue on ",[55,48886,39680],{"href":48880,"rel":48887},[264],", send emails to the ",[55,48890,39686],{"href":39684,"rel":48891},[264],[55,48893,39691],{"href":33664,"rel":48894},[264],[324,48896,47760,48897,1154,48900,45209],{},[55,48898,47764],{"href":45463,"rel":48899},[264],[55,48901,47768],{"href":45206,"rel":48902},[264],[1666,48904,48905],{},[324,48906,48907],{},"Intrinsic differences exist between platforms that have no notion of schema and the ones that have sophisticated schema capabilities because there is no simple way to translate between them. These platform differences range from traditional messaging like Amazon SQS to multi-level hierarchical Avro schema written to a data lake. Distinctions also exist between platforms relying on different data representations, such as Pandas DataFrames and simple messages.",{"title":18,"searchDepth":19,"depth":19,"links":48909},[48910,48911,48912,48913,48914,48915,48916],{"id":48597,"depth":19,"text":48598},{"id":48615,"depth":19,"text":48616},{"id":48628,"depth":19,"text":48629},{"id":48646,"depth":19,"text":48647},{"id":10103,"depth":19,"text":10104},{"id":48711,"depth":19,"text":48712},{"id":48856,"depth":19,"text":48857},"2023-01-26","We’re excited to announce the general availability of the Hudi Sink connector for Apache Pulsar. The Hudi + Pulsar connector offers a convenient, efficient, and flexible approach to moving data from Pulsar to Hudi without requiring user code.","\u002Fimgs\u002Fblogs\u002F63d47416f3473a0387a03015_Hudi-sink-connector.png",{},{"title":48573,"description":48918},"blog\u002Fannouncing-hudi-sink-connector-for-pulsar",[302,28572],"-lk6eKwZgfOlcWyqSx3Qn_lqjfGVXV4ztITVC7-xuwM",{"id":48926,"title":48927,"authors":48928,"body":48931,"category":821,"createdAt":290,"date":49479,"description":49480,"extension":8,"featured":294,"image":49481,"isDraft":294,"link":290,"meta":49482,"navigation":7,"order":296,"path":49483,"readingTime":38438,"relatedResources":290,"seo":49484,"stem":49485,"tags":49486,"__hash__":49487},"blogs\u002Fblog\u002Fstreaming-war-and-how-apache-pulsar-is-acing-the-battle.md","Streaming War and How Apache Pulsar is Acing the Battle",[48929,48930],"Shivji Kumar Jha","Sachidananda Maharana",{"type":15,"value":48932,"toc":49456},[48933,48936,48939,48943,48946,48949,48981,48985,49004,49006,49009,49012,49015,49022,49025,49028,49031,49035,49044,49067,49076,49079,49082,49085,49088,49091,49094,49097,49120,49123,49126,49135,49138,49141,49149,49152,49156,49159,49176,49180,49183,49195,49201,49204,49207,49213,49216,49226,49230,49237,49243,49246,49251,49264,49268,49271,49292,49296,49299,49303,49309,49320,49330,49334,49343,49347,49354,49358,49371,49382,49384,49387,49389,49398],[48,48934,48935],{},"Over the past few years, we have used different streaming solutions for a variety of use cases. They have helped us meet different requirements for scalability, high availability, disaster recovery, load balancing, low costs, multi-tenancy, and many more. With so many tools in the market, streaming becomes more like a battlefield where each character spares no effort to survive and thrive.",[48,48937,48938],{},"In this blog, we will first talk about the streaming war facing different streaming systems and what tools they should have in their arsenal. Next, we will compare each of them and explain why we think Apache Pulsar will win the battle. 
Lastly, we will demonstrate how to migrate to Pulsar from other platforms.",[40,48940,48942],{"id":48941},"game-and-arsenal","Game and arsenal",[48,48944,48945],{},"Simply put, the game has producers that publish messages to the middleware solution, which are then consumed by applications or microservices. They can either keep them to themselves or sink them into a big data store.",[48,48947,48948],{},"If you want to win a battle, you need to have sharpened tools and sufficient ammunition. Similarly, when choosing a streaming framework that backs your production workloads, you want it to have the best features so you can be well-prepared to ace any production use cases. Here are some of the weapons that are required for a modern streaming arsenal.",[321,48950,48951,48954,48957,48960,48963,48966,48969,48978],{},[324,48952,48953],{},"Real-time messaging. Today, speed plays an important role in a variety of use cases. For an e-commerce application, for example, if there's something wrong on the checkout page, you want to detect it and fix it as soon as possible. Otherwise, you may lose business for the delayed time, as your customers have poor user experiences like payment failure.",[324,48955,48956],{},"Scalability. Mobile phones, IoT devices, and microservices are producing large amounts of data every day. Therefore, your application needs to be able to work with a framework that can scale flexibly to handle the traffic if required.",[324,48958,48959],{},"High availability. If your website or application is not highly available, you will effectively lose revenue during the downtime (for example, due to single points of failure).",[324,48961,48962],{},"Disaster recovery. If your data is synchronized and replicated across regions, you need to have the redundancy to recover from any disaster situation.",[324,48964,48965],{},"Load balancing. Load balancing allows you to distribute data across your storage and computing clusters judiciously. It prevents nodes from being more loaded or less loaded. It is also one of Pulsar’s distinguishing features.",[324,48967,48968],{},"Low cost of operations. Cost is undoubtedly very important for the infrastructure you are using. Currently, we are running Pulsar at a low operation cost. We will compare the costs of using Kafka, Pulsar, and Kinesis later.",[324,48970,48971,48972,48977],{},"Multi-tenancy. A multi-tenant system allows you to ",[55,48973,48976],{"href":48974,"rel":48975},"https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=xjKNcKLuDZI&list=PLA7KYGkuAD071myyg4X5ShsDHsOaIpHOq&index=7",[264],"separate out different use cases"," and run them in isolation. If there are heavy loads in one of the use cases, only that one is stressed while other parts of the cluster work properly.",[324,48979,48980],{},"Flexibility. For a given use case, you may need high availability while another may require strong consistency. 
A flexible system should support configurations for varied use cases that allow you to perform operations at different tenancy levels.",[40,48982,48984],{"id":48983},"game-characters","Game characters",[48,48986,48987,48988,1186,48991,1186,48994,5422,48999,190],{},"Next, we will be looking at the pros and cons of popular streaming frameworks including ",[55,48989,821],{"href":23526,"rel":48990},[264],[55,48992,799],{"href":31428,"rel":48993},[264],[55,48995,48998],{"href":48996,"rel":48997},"https:\u002F\u002Faws.amazon.com\u002Fkinesis\u002Fdata-streams\u002F",[264],"Amazon Kinesis Data Streams",[55,49000,49003],{"href":49001,"rel":49002},"https:\u002F\u002Fnats.io\u002F",[264],"NATS",[32,49005,821],{"id":45597},[48,49007,49008],{},"As an open-source project, Pulsar is completely driven by the community. If there are any major improvements, like API changes, the community will send out a voting request so that community members can discuss how Pulsar moves forward.",[48,49010,49011],{},"Pulsar has a multi-layered architecture where storage is separated from computing. This means that you can scale either part independently. For us, as we deploy our Pulsar cluster on AWS, this loosely-coupled structure makes it very easy to select the type of nodes for scaling if required.",[48,49013,49014],{},"Multi-tenancy achieves different levels of resource isolation. It is one of Pulsar’s enterprise features from day one. All other features that came after were well integrated with all of its enterprise features.",[48,49016,49017,49018,49021],{},"On the flip side, since Pulsar is a modular system, you may feel intimidated when you start installing it for the first time. The deployment time is longer than Kafka and Kinesis. That said, if you are already using Kubernetes, you can use ",[55,49019,49020],{"href":43241},"operators to quickly deploy Pulsar",". Another thing that may cause trouble to Pulsar users is its ecosystem (like connectors), which is relatively small compared to Kafka. However, this is by no means a problem as we do have workarounds, which will be covered later.",[32,49023,799],{"id":49024},"apache-kafka",[48,49026,49027],{},"Kafka is also an open-source Apache project. It is battle-tested for the longest time in the streaming space with a mature community and an amazing ecosystem of connectors.",[48,49029,49030],{},"Unlike Pulsar, Kafka has a monolithic architecture, which means storage and computing are bundled together. When scaling your Kafka cluster, you may find it complicated and tricky to select the right types of nodes (for example, on AWS). Therefore, it is less flexible compared with Pulsar in terms of scaling.",[32,49032,49034],{"id":49033},"pulsar-vs-kafka","Pulsar vs. Kafka",[48,49036,49037,49038,49043],{},"Before we introduce the next character, let’s look at how Pulsar performs compared with Kafka. We carried out extensive performance tests on both Pulsar and Kafka and ",[55,49039,49042],{"href":49040,"rel":49041},"https:\u002F\u002Fmedium.com\u002F@yuvarajl\u002Fwhy-nutanix-beam-went-ahead-with-apache-pulsar-instead-of-apache-kafka-1415f592dbbb",[264],"ultimately chose Pulsar",". 
The following is a summary of our understanding based on the tests.",[321,49045,49046,49049,49052,49055,49058,49061,49064],{},[324,49047,49048],{},"2.5x maximum throughput compared to Kafka",[324,49050,49051],{},"Single-digit publish latency, up to 100x lower than Kafka",[324,49053,49054],{},"1.5x faster historical read rate than Kafka",[324,49056,49057],{},"Preinstalled schema registry in Pulsar",[324,49059,49060],{},"Pulsar is enterprise-ready from day one",[324,49062,49063],{},"Local disk (Kafka) vs. Tiered\u002Fdecoupled storage (Pulsar)",[324,49065,49066],{},"Kafka wins on community support and ecosystem",[48,49068,49069,49070,49075],{},"Our tests show that Pulsar outperforms Kafka in terms of throughput, latency, and historical read rate. Pulsar features a segment-oriented architecture for every topic (partition), so you don’t have the problem of some topics being hot but others being cold. ",[55,49071,49074],{"href":49072,"rel":49073},"https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=xIibbB5Y0MM&list=PLA7KYGkuAD071myyg4X5ShsDHsOaIpHOq&index=5",[264],"The data is spread wonderfully across the storage nodes of Pulsar",", which are managed by BookKeeper. This is also the reason why Pulsar has faster historical reads.",[48,49077,49078],{},"Pulsar has almost everything open-sourced. Its schema registry is already integrated with Pulsar's broker. Kafka, on the other hand, does not have it in the open-source version.",[48,49080,49081],{},"As mentioned above, Pulsar is enterprise-ready with an arsenal of effective weapons. Geo-replication has been one of them since day one. As opposed to Pulsar, Kafka implements geo-replication through MirrorMaker, which still has some rough edges; some features may not work that well with it.",[48,49083,49084],{},"There are also many other blogs comparing Pulsar and Kafka. For more information, see the Reference section below.",[32,49086,48998],{"id":49087},"amazon-kinesis-data-streams",[48,49089,49090],{},"Amazon Kinesis Data Streams is a serverless streaming data service provided by AWS. You can use it to collect and process large streams of data in real time. It’s available as a service so you can easily get started with only a few clicks.",[48,49092,49093],{},"The benefit of using Amazon Kinesis Data Streams is that it requires less maintenance work as AWS manages the infrastructure for you. Additionally, it works seamlessly with other AWS services like S3, DynamoDB, and Lambda.",[48,49095,49096],{},"However, it is not an ideal solution for us due to the following reasons:",[321,49098,49099,49102,49105,49108,49111,49114,49117],{},[324,49100,49101],{},"It is more costly compared to Pulsar, which will be explained later.",[324,49103,49104],{},"It is a closed-source system, which means we cannot customize it based on our needs.",[324,49106,49107],{},"As AWS manages everything for you, it is less flexible.",[324,49109,49110],{},"The retention period ranges from 24 hours up to 365 days. Storing data for longer than you need means unnecessary overhead.",[324,49112,49113],{},"The average record size cannot be more than 1 MB.",[324,49115,49116],{},"Data is always replicated to 3 availability zones (AZs). If your use case does not require more than 2 AZs, it comes with more cost.",[324,49118,49119],{},"Vendor lock-in.",[32,49121,49003],{"id":49122},"nats",[48,49124,49125],{},"Another character in the game is NATS, a CNCF incubating open-source project. 
It is a connective technology that powers modern distributed systems.",[48,49127,49128,49129,49134],{},"The design philosophy behind NATS is simple, agile, performant, secure, and resilient. Users can easily get started with NATS as it supports various flexible deployments. You can literally deploy it anywhere (on-premises, IoT, edge, and hybrid use cases). Additionally, it provides a base set of functionalities and qualities, also known as ",[55,49130,49133],{"href":49131,"rel":49132},"https:\u002F\u002Fdocs.nats.io\u002Fnats-concepts\u002Fcore-nats",[264],"Core NATS",", which supports models like Publish-Subscribe, Request-Reply, and Queue Groups.",[48,49136,49137],{},"NATS has a built-in distributed persistence system called JetStream. It enables new functionalities and higher qualities of service on top of the base Core NATS. However, it's not as mature as Kafka or Pulsar, and it still has much room for improvement, especially in disaster recovery.",[48,49139,49140],{},"For the consumer metadata like offsets, we want to duplicate and store them on different servers for better redundancy. As NATS makes a raft group for every consumer application, millions of consumers mean millions of Raft groups. This will cause considerable pressure on the network.",[48,49142,49143,49144,190],{},"To learn more about NATS, see ",[55,49145,49148],{"href":49146,"rel":49147},"https:\u002F\u002Fcontent.red-badger.com\u002Fwe-love-tech\u002Fnats\u002Fthe-power-of-nats",[264],"The power of NATS: Modernising communications on a global scale",[48,49150,49151],{},"With a basic understanding of these systems, we think Pulsar is a better choice for cloud-native applications. For IoT, edge, or hybrid use cases, we recommend NATS because of its lightweight server. Users can deploy NATS anywhere and use one cluster that spreads across different environments like cloud, IoT devices, edge, and even on-premises.",[40,49153,49155],{"id":49154},"let-the-battle-begin","Let the battle begin",[48,49157,49158],{},"To make a more comprehensive evaluation, we conducted some performance tests in one of our use cases and compared their respective costs. Here are the rules of the game:",[321,49160,49161,49164,49167,49170,49173],{},[324,49162,49163],{},"Ingest 12 TB\u002Fday: about 5 million messages daily with a size of 2.5 MB per message",[324,49165,49166],{},"Retention period: 24 hours",[324,49168,49169],{},"Replication factor: 2 (24 TB\u002Fday)",[324,49171,49172],{},"Data stored in multiple availability zones",[324,49174,49175],{},"Producers\u002Fconsumers preferably in the same availability zone as stream data",[32,49177,49179],{"id":49178},"pulsar-cost","Pulsar cost",[48,49181,49182],{},"Figure 1 depicts the schematic architecture of our Pulsar cluster.",[48,49184,24328,49185,49188,49189,49194],{},[384,49186],{"alt":18,"src":49187},"\u002Fimgs\u002Fblogs\u002F63c5fb42fcd1374b6f39edfb_AznQ_SCue_kY3nfaSt2uqFPEIP4Nj25qutwFyOathaoXlrysn7cR9oQDS8Rm08YfAPbYPkm35zgy6NYbdD3a1w5Mh0zSpg1V29na4DWXkMOQNKXKajpAYzoLUgPy4OXzYpDnKv8zxOveBqPR2Z4zxKgiHeHGLK8prDJNm5IILbkGGHPT8SAFVBqbpl5IXw.png","Figure 1. Pulsar cluster schematic architecture\nWe used the following commands to configure bookies and ",[55,49190,49193],{"href":49191,"rel":49192},"https:\u002F\u002Fpulsar.apache.org\u002Fdocs\u002Fnext\u002Fadministration-isolation-bookie\u002F#configure-bookie-affinity-groups",[264],"affinity groups"," in different AZs, with group-bookie1 (bk1) being the primary group and group-bookie2 (bk2 and bk3) being the secondary group. 
Because Pulsar brokers are stateless, we put all the brokers in the same AZ (az1).",[8325,49196,49199],{"className":49197,"code":49198,"language":8330},[8328],"bin\u002Fpulsar-admin bookies set-bookie-rack --bookie pulsar-bookie-1:3181 --hostname pulsar-bookie-1:3181 --groupgroup-bookie1 --rack rack1\n\nbin\u002Fpulsar-admin bookies set-bookie-rack --bookie pulsar-bookie-2:3181,pulsar-bookie-3:3181 --hostname pulsar-bookie-2:3181,pulsar-bookie-3:3181 --group group-bookie2 --rack rack2\n\nbin\u002Fpulsar-admin namespaces set-bookie-affinity-group public\u002Fdefault --primary-group group-bookie1 --secondary-group group-bookie2\n",[4926,49200,49198],{"__ignoreMap":18},[48,49202,49203],{},"These configurations mean that whenever a broker is trying to write data on bookies, it will put one copy on bk1 and the other on either bk2 or bk3. This helps reduce data transfer costs between the AZs.",[48,49205,49206],{},"If brokers need to be deployed in different AZs, you can use the following command to set the primary brokers for the namespace.",[8325,49208,49211],{"className":49209,"code":49210,"language":8330},[8328],"bin\u002Fpulsar-admin ns-isolation-policy set --auto-failover-policy-type min_available --auto-failover-policy-params min_limit=1,usage_threshold=80 --namespaces public\u002Fdefault --primary pulsar-broker-node-1 --secondary pulsar-broker-node-2\n",[4926,49212,49210],{"__ignoreMap":18},[48,49214,49215],{},"Our Pulsar cluster was deployed on AWS instances. Table 1 and Table 2 show the breakdown of the cost.",[48,49217,49218,49221,49222,49225],{},[384,49219],{"alt":18,"src":49220},"\u002Fimgs\u002Fblogs\u002F63c603ef94c96da8983df22f_pulsar-cluster-cost-breakdown.png","Table 1. Pulsar cluster cost breakdown",[384,49223],{"alt":18,"src":49224},"\u002Fimgs\u002Fblogs\u002F63c6044c6de9b7d4f20f2c69_bookkeeper-ebs-monthly-cost.png","Table 2. BookKeeper EBS monthly cost\nWe deployed bookies on m5.4xlarge instances (16 vCPUs and 64 GB of memory), with 10 TB of storage attached to each of them for ledgers. We chose r5.2xlarge (8 vCPUs and 64 GB of memory) instances for brokers and t3.large (2 vCPUs and 8 GB of memory) instances for ZooKeeper. The total monthly cost was about $7.4K without data transfer. If we included the data transfer cost for 360 TB (about 12 TB of data per day as mentioned above), it would add another $7K, so the total monthly cost would be about $14K.",[32,49227,49229],{"id":49228},"kafka-cost","Kafka cost",[48,49231,49232,49233,49236],{},"Figure 2 depicts the schematic architecture of our Kafka cluster. Kafka has the same options available in replica assignment which uses a comma-separated list of preferred replicas.\n",[384,49234],{"alt":18,"src":49235},"\u002Fimgs\u002Fblogs\u002F63c5fb42ce3bfd413f67cf27_lR0C5TAm81dRvwH8p7MvA3VlwUt6K4H6LF6nP1vFMzxRAwskusarBt9RuYnneXb4pwRPQqF02HF5jr808I9bpU3Cz146lU3M_MTfRDOM0cJAf7r8x95A7VoeTiMMlJjy1W7iF8tw2fGfM8KzarIIr-FmtlpwFXDGvIb2HtrUUbiK9TebqFmRif9aCraXfg.png","Figure 2. 
Kafka cluster schematic architecture\nWe used the following commands to assign replicas.",[8325,49238,49241],{"className":49239,"code":49240,"language":8330},[8328],"bin\u002Fkafka-topics.sh --create --zookeeper localhost:2181 --topic topicA --replica-assignment 0:1,0:1,0:2 --partitions 3\n\nbin\u002Fkafka-topics.sh --alter --zookeeper localhost:2181 --topic topicA --replica-assignment 0:1,0:1,0:2,0:2 --partitions 4\n",[4926,49242,49240],{"__ignoreMap":18},[48,49244,49245],{},"In order to have all the data available in one AZ, broker 1 should have the leaders of all partitions. In Kafka, writes only go to the leaders. In our case, all the data was available in one broker for less data transfer costs between the AZs.",[916,49247,49248],{},[48,49249,49250],{},"Whenever a broker goes down, Kafka will restore the leadership to the broker that comes first in the list with the preferred replicas. This is the default behavior enabled in the latest version of Kafka with auto.leader.rebalance.enable=true.",[48,49252,49253,49254,49259,49260,49263],{},"In our performance test, the Kafka cluster was managed by ",[55,49255,49258],{"href":49256,"rel":49257},"https:\u002F\u002Faws.amazon.com\u002Fmsk\u002F",[264],"Amazon MSK",", using 5 m5.4xlarge broker instances, each having 16 vCPUs, 64 GB of memory, and 10 TB of storage. There is 50 TB in total, enough for 24 TB of data (replication factor set to 2) per month. In addition to the infrastructure, we had to pay for the intra-region data transfer cost between the AZs. The total monthly cost was almost $20K.\n",[384,49261],{"alt":18,"src":49262},"\u002Fimgs\u002Fblogs\u002F63c5fb4272078a700dd6500c_V1ZtwxejOwL3R8XKouOT0b6KHg8DBgH9uVjmG9pmLojBqO3IKK0Ssztsg-vlLH5hDH2dpAjxLuRj1nkajGoEqYRoAU11GUqHOpHrguH-_xYsfQRyAQ1Jly5sjqYQRB8p7ujSM_bkNToXaOPr0OEatUsRliQkDyMEBk-syfTmKGCbOCYss6R1tpr3WARQOA.png","Figure 3. Kafka cluster monthly cost",[32,49265,49267],{"id":49266},"kinesis-data-streams-cost","Kinesis Data Streams cost",[48,49269,49270],{},"As Amazon manages the Kinesis Data Streams service for us, we will only focus on the cost in this section.",[48,49272,49273,49274,49277,49278,49283,49284,49287,49288,49291],{},"As mentioned above, the average message size cannot be more than 1 MB for Kinesis Data Streams. Our internal team came up with a workaround that they put messages in S3 and provided the S3 path inside Kinesis. This means that the solution had extra S3 costs. To make up for that, I put 180 baseline records per second as per 1 MB of data.\n",[384,49275],{"alt":18,"src":49276},"\u002Fimgs\u002Fblogs\u002F63c5fb43a029e046b6034826_a9H910EDPDkjrhj3l3sIS442ggw4eyZiNM-s-2hAnjqbz873j6cyleayp9JARa_1M07VwBfh4nLxFmeEIVJVyAc3SNM448KGjMi4TfEqJci535QIQrpFCo3quvZsqnvGXJWp0YRPlPhbzP9Y9Bs9t3YEJKHNtIxMD-c2u8wKhyGrKJcVg-1vOxXVeOTXsg.png","Figure 4. 180 baseline records per second\nWe set the buffer for growth to 20% and the number of ",[55,49279,49282],{"href":49280,"rel":49281},"https:\u002F\u002Fdocs.aws.amazon.com\u002Fstreams\u002Flatest\u002Fdev\u002Fbuilding-enhanced-consumers-api.html",[264],"enhanced fan-out consumers"," to 3. Each consumer had dedicated throughput. If you increase the number of fan-out consumers, the cost will increase linearly.\n",[384,49285],{"alt":18,"src":49286},"\u002Fimgs\u002Fblogs\u002F63c5fb43f33b34b630d70b78_XcsCgqkD_rR0YObXPwsWh-8LiE-LIFhF4roXB641WDILFNLcAzrhR3bijZ5OCqZe-eYADp57SdKqTs3NVnYgY_NxKkok3DvnKzBdS0BW4r76wdlwNeq931wqwKd6wEIu-W-VG5IxyrveAFzqTb_L9I5oYjJqcVW1w89-HhKvgShJLdGju8CJPeeEeUerQA.png","Figure 5. 
The growth buffer and the number of enhanced fan-out consumers\nThe total monthly cost reached about 28K. Figure 6 shows the cost details.\n",[384,49289],{"alt":18,"src":49290},"\u002Fimgs\u002Fblogs\u002F63c5fb428028b3659e5b1536_iTWOoutFGftfiIzux2YtQSQ5FxysmrbBHFf-zY8z4gRakx2UKxfZ2Tiv3OB--ePqD6qYN9d96SpHnLCQvApipx66WfeyvCcE5B7GkouMbxYryx2yxFpHPNRZF1EVxafmd04sgNv2rtdn1wMJOBbcu60YLeIcwwouJswbGitxfctC8uNs_E9SdWbqkM0axw.png","Figure 6. Amazon Kinesis Data Streams monthly cost\nThe results show that Pulsar stands out as the most cost-effective option with only $14K per month compared to Kafka ($20K) and Kinesis ($28K). That's why our primary selection for streaming data is Pulsar.",[40,49293,49295],{"id":49294},"change-the-game-migrate-to-pulsar","Change the game: Migrate to Pulsar",[48,49297,49298],{},"Now that we know how Pulsar aces the game, let’s look at how to migrate to Pulsar from other systems like Kafka.",[32,49300,49302],{"id":49301},"kafka-on-pulsar-kop","Kafka-on-Pulsar (KoP)",[48,49304,49305,49308],{},[55,49306,35093],{"href":29592,"rel":49307},[264]," leverages a Kafka protocol handler on Pulsar brokers, which processes Kafka messages. It allows you to easily migrate your existing Kafka applications and services to Pulsar without modifying the code. You only need to make minor changes on the Pulsar server side. Clients do not even need to know whether they are connected to Kafka or Pulsar. Specifically, to get started with KoP, you need to do the following:",[1666,49310,49311,49314,49317],{},[324,49312,49313],{},"Add KoP configurations (for example, messagingProtocols and entryFormat) to broker.conf or standalone.conf.",[324,49315,49316],{},"Add the protocol handler (a nar file) to your server.",[324,49318,49319],{},"Remember to change the Kafka cluster URL to the Pulsar cluster URL in your client code.",[48,49321,24328,49322,49325,49326,190],{},[384,49323],{"alt":18,"src":49324},"\u002Fimgs\u002Fblogs\u002F63c5fb43c09588c381dedfa6_X05HC_VQd6Zr0N2hZacvKcL0NR7jhXQ6GttSnYvQlmnc17JhX_oIyE6T-Uh_SiEZvhEuEJRuMXdaiw1V7zsIMLZToxdGDH3KHCaPsjVxj3eF3nd7cTvKi-CiS-lzf1eiCkI7xkLjxomyDiDFpBxetxHUnmAD1EZeGY2W6qJGDvIf3nNEJdj57PYuEucVsQ.png","Figure 7. KoP architecture\nFor more information, see the ",[55,49327,3897],{"href":49328,"rel":49329},"https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=v7qYBQVFz_k#t=29m33s",[264],[32,49331,49333],{"id":49332},"pulsar-adaptor-for-apache-kafka","Pulsar adaptor for Apache Kafka",[48,49335,49336,49337,49342],{},"This ",[55,49338,49341],{"href":49339,"rel":49340},"https:\u002F\u002Fpulsar.apache.org\u002Fdocs\u002F2.10.x\u002Fadaptors-kafka\u002F",[264],"tool"," was developed before KoP to help users migrate to Pulsar. To use it, you need to change the regular Kafka client dependency and replace it with the Pulsar Kafka wrapper. We don’t recommend it as it is only applicable to Java-based clients and it is not suitable for our use case.",[32,49344,49346],{"id":49345},"amqp-on-pulsar-aop","AMQP-on-Pulsar (AoP)",[48,49348,49349,49350,49353],{},"Similar to KoP, ",[55,49351,37239],{"href":37237,"rel":49352},[264]," is implemented as a Pulsar protocol handler. Messaging systems like RabbitMQ and ActiveMQ use the AMQP protocol for their messages.",[40,49355,49357],{"id":49356},"connectors-how-to-make-the-best-use-of-them","Connectors: How to make the best use of them",[48,49359,49360,49361,1186,49364,5422,49367,49370],{},"Pulsar’s ecosystem still has a long way to go compared with Kafka’s, especially for Pulsar connectors. 
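Because KoP keeps the Kafka wire protocol on the broker, an existing Kafka client typically only needs its bootstrap address pointed at the KoP-enabled Pulsar cluster; the rest of the code stays the same. Below is a minimal, illustrative sketch using the kafka-python library. It assumes KoP is enabled with a Kafka listener on port 9092, and the host and topic names are placeholders rather than values from our deployment.

```python
from kafka import KafkaProducer

# Unchanged Kafka producer code: only the bootstrap address now points at a
# Pulsar broker that has the KoP protocol handler enabled (assumed port 9092).
producer = KafkaProducer(bootstrap_servers="pulsar-broker-1:9092")
producer.send("checkout-events", b'{"order_id": 42, "status": "paid"}')
producer.flush()
producer.close()
```

The same property applies to Kafka connectors, which is what makes the approach described next possible.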
Currently, Pulsar supports connectors for popular systems like ",[55,49362,41357],{"href":41355,"rel":49363},[264],[55,49365,8057],{"href":41360,"rel":49366},[264],[55,49368,41366],{"href":41364,"rel":49369},[264],", while there are more Kafka connectors available. In this connection, you can use the KoP-enabled Pulsar cluster with Kafka connectors to implement Pulsar connectors.",[48,49372,49373,49374,41750,49378,49381],{},"One such connector we created is the Pulsar-Druid connector. As the Kafka-Druid connector is already available, you can enable KoP for your Pulsar cluster; after Kafka clients publish live events to the Pulsar cluster, end users can query the live data as they are synchronized by the Kafka-Druid connector. For more information, see the ",[55,49375,3897],{"href":49376,"rel":49377},"https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=v7qYBQVFz_k#t=34m25s",[264],[384,49379],{"alt":18,"src":49380},"\u002Fimgs\u002Fblogs\u002F63c5fb431437ea90457d1c30_-B63mPXLiCqpsxVzdxKNQCi0NA7GQQhBxDl2VPbvS8ukHY-u15AyjBkNg4H2-fCrQ_evllSISPD4QmDmJskxY9bgV0ntx63ij9f7Jb00iWxcjsvgs3Ee9Yot2nOLvLdl7i5WVTY35lR4lsr7-lpGnsyccETEy4ibOn0HRY8t-3ErSej-tessq14HOmrnFQ.png","Figure 8. Implement the Pulsar-Druid connector with the KoP-enabled Pulsar cluster",[40,49383,2125],{"id":2122},[48,49385,49386],{},"In this blog, we discussed some common requirements for data streaming and introduced popular systems with their advantages and disadvantages. We performed some tests on these systems and explained why we ultimately selected Apache Pulsar as our messaging platform. To help migrate from other tools to Pulsar, the Pulsar community provides the protocol handler plugin to support a safe transition. In addition, you can also use it to achieve new Pulsar connectors with existing Kafka connectors.",[40,49388,36477],{"id":36476},[48,49390,47760,49391,1154,49394,49397],{},[55,49392,47764],{"href":45463,"rel":49393},[264],[55,49395,47768],{"href":45206,"rel":49396},[264]," (no fee required). 
Meanwhile, check out the following resources:",[321,49399,49400,49404,49411,49417,49424,49429,49436,49443,49450],{},[324,49401,49402],{},[55,49403,42239],{"href":27690},[324,49405,49406],{},[55,49407,49410],{"href":49408,"rel":49409},"https:\u002F\u002Fhevodata.com\u002Flearn\u002Fpulsar-vs-kafka\u002F",[264],"Apache Pulsar vs Kafka: Which is Better?",[324,49412,49413],{},[55,49414,49416],{"href":49040,"rel":49415},[264],"Why Nutanix Beam went ahead with Apache Pulsar instead of Apache Kafka?",[324,49418,49419],{},[55,49420,49423],{"href":49421,"rel":49422},"https:\u002F\u002Faws.amazon.com\u002Fkinesis\u002Fdata-streams\u002Ffaqs\u002F",[264],"Amazon Kinesis Data Streams FAQs",[324,49425,49426],{},[55,49427,49148],{"href":49146,"rel":49428},[264],[324,49430,49431],{},[55,49432,49435],{"href":49433,"rel":49434},"https:\u002F\u002Fnats.io\u002Fblog\u002Fmatrix-dendrite-kafka-to-nats\u002F",[264],"The Matrix Dendrite Project move from Kafka to NATS\t",[324,49437,49438],{},[55,49439,49442],{"href":49440,"rel":49441},"https:\u002F\u002Fcwiki.apache.org\u002Fconfluence\u002Fdisplay\u002FKAFKA\u002FReplication+tools#Replicationtools-Howtousethetool?.2",[264],"How to use the Kafka replication tool",[324,49444,49445],{},[55,49446,49449],{"href":49447,"rel":49448},"https:\u002F\u002Fdocs.streamnative.io\u002Fplatform\u002Flatest\u002Fconcepts\u002Fkop-concepts",[264],"KoP documentation",[324,49451,49452],{},[55,49453,49333],{"href":49454,"rel":49455},"https:\u002F\u002Fpulsar.apache.org\u002Fdocs\u002Fadaptors-kafka\u002F",[264],{"title":18,"searchDepth":19,"depth":19,"links":49457},[49458,49459,49466,49471,49476,49477,49478],{"id":48941,"depth":19,"text":48942},{"id":48983,"depth":19,"text":48984,"children":49460},[49461,49462,49463,49464,49465],{"id":45597,"depth":279,"text":821},{"id":49024,"depth":279,"text":799},{"id":49033,"depth":279,"text":49034},{"id":49087,"depth":279,"text":48998},{"id":49122,"depth":279,"text":49003},{"id":49154,"depth":19,"text":49155,"children":49467},[49468,49469,49470],{"id":49178,"depth":279,"text":49179},{"id":49228,"depth":279,"text":49229},{"id":49266,"depth":279,"text":49267},{"id":49294,"depth":19,"text":49295,"children":49472},[49473,49474,49475],{"id":49301,"depth":279,"text":49302},{"id":49332,"depth":279,"text":49333},{"id":49345,"depth":279,"text":49346},{"id":49356,"depth":19,"text":49357},{"id":2122,"depth":19,"text":2125},{"id":36476,"depth":19,"text":36477},"2023-01-17","This blog introduces and compares different streaming systems like Kafka, Pulsar, Amazon Kinesis, and NATS, and then explains how to migrate to Pulsar from other platforms.","\u002Fimgs\u002Fblogs\u002F63c60236f3fd0cf51bccd8c6_streaming-war-top-image.jpg",{},"\u002Fblog\u002Fstreaming-war-and-how-apache-pulsar-is-acing-the-battle",{"title":48927,"description":49480},"blog\u002Fstreaming-war-and-how-apache-pulsar-is-acing-the-battle",[7347,821,27847,5954],"OvhIgNcXT5DgDsMNbj3hGkcQdwvwjPAitl7Bqoh9VsM",{"id":49489,"title":49490,"authors":49491,"body":49493,"category":821,"createdAt":290,"date":49962,"description":49963,"extension":8,"featured":294,"image":49964,"isDraft":294,"link":290,"meta":49965,"navigation":7,"order":296,"path":48214,"readingTime":46114,"relatedResources":290,"seo":49966,"stem":49967,"tags":49968,"__hash__":49969},"blogs\u002Fblog\u002Fhandling-100k-consumers-with-one-pulsar-topic.md","Handling 100K Consumers with One Pulsar Topic",[49492],"Hongjie 
Zhai",{"type":15,"value":49494,"toc":49943},[49495,49497,49506,49509,49512,49518,49521,49524,49528,49531,49534,49537,49554,49557,49568,49571,49575,49578,49581,49587,49607,49610,49616,49619,49623,49626,49632,49637,49640,49644,49647,49653,49656,49660,49663,49669,49672,49676,49679,49685,49705,49708,49711,49719,49723,49726,49739,49742,49746,49749,49755,49769,49772,49778,49782,49785,49796,49802,49805,49808,49812,49815,49818,49824,49827,49831,49834,49840,49843,49847,49850,49859,49865,49868,49871,49874,49877,49880,49886,49889,49895,49898,49904,49907,49909,49912,49915,49917,49941],[40,49496,19156],{"id":19155},[48,49498,49499,49500,49505],{},"Nippon Telegraph and Telephone Corporation (NTT) is one of the world's leading telecommunications carriers. ",[55,49501,49504],{"href":49502,"rel":49503},"https:\u002F\u002Fwww.rd.ntt\u002Fe\u002Fsic\u002F",[264],"NTT Software Innovation Center"," creates innovative platform technologies to support the ICT service for prosperous future as a professional group on IT. It works to create innovative software platforms and computing platform technologies to support the evolution of the IoT\u002FAI service as a professional group on IT. It will not only proactively contribute to the open source community but also promote research and development through open innovation. It will also contribute to the reduction of CAPEX\u002FOPEX for IT or strategic utilization of IT, using the accumulated technologies and know-how regarding software development and operation.",[48,49507,49508],{},"Before I introduce how we use Apache Pulsar to handle 100K consumers, let me first explain our use case and the challenges facing us.",[48,49510,49511],{},"In our smart city scenario, we need to collect data from a large number of devices, such as cars, sensors, and cameras, and further analyze the data for different purposes. For example, if a camera detects any road damage, we need to immediately broadcast the information to the cars nearby, thus avoiding traffic congestion. More specifically, we provide a topic for each area and all the vehicles in that area are connected to the topic. For a huge city, we expect that there are about 100K vehicles publishing data to a single topic. In addition to the large data volume, we also need to work with different protocols used by these devices, like MQTT, REST, and RTSP.",[48,49513,49514],{},[384,49515],{"alt":49516,"src":49517},"Visualization of how NTT collects data ","\u002Fimgs\u002Fblogs\u002F63be1482ae551659c9393d2c_NTT-blog-image1.png",[48,49519,49520],{},"Data persistence is another challenge in this scenario. For essential data, like key scenes from cameras or key events from IoT devices, we need to securely store them for further analysis, perhaps for a long period of time. We also have to prepare proper storage solutions in the system.",[48,49522,49523],{},"With massive devices, various protocols, and different storage systems, our data pipeline becomes extremely complicated. It is almost impossible to maintain such a huge system.",[40,49525,49527],{"id":49526},"why-did-we-choose-apache-pulsar","Why did we choose Apache Pulsar",[48,49529,49530],{},"As we worked on solutions, we were thinking about introducing a unified data hub, like a large, centralized message broker that is able to support various protocols. 
This way, all the devices only need to communicate with a single endpoint.",[48,49532,49533],{},"Nowadays, many brokers provide their own storage solutions or even support tiered storage, which guarantees persistence for any data processed by the brokers. This also means that we only need to work with brokers and their topics, which allows us to have an easier and cleaner system.",[48,49535,49536],{},"Ultimately, we chose to build our system with Apache Pulsar as the basic framework. Pulsar is a cloud-native streaming and messaging system with the following key features.",[321,49538,49539,49542,49551],{},[324,49540,49541],{},"A loosely-coupled architecture. Pulsar uses Apache BookKeeper as its storage engine. This allows us to independently scale out the storage cluster without changing the number of brokers if we need to store more data.",[324,49543,49544,49545,49550],{},"A pluggable protocol handler. Pulsar’s ",[55,49546,49549],{"href":49547,"rel":49548},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fblob\u002Fmaster\u002Fpulsar-broker\u002Fsrc\u002Fmain\u002Fjava\u002Forg\u002Fapache\u002Fpulsar\u002Fbroker\u002Fprotocol\u002FProtocolHandler.java",[264],"protocol handler"," enables us to work with multiple protocols with just one broadcaster. It supports MQTT, Kafka, and many other brokers. This makes it very convenient to ingest data from various sources into a centralized Pulsar cluster.",[324,49552,49553],{},"High performance and low latency. Pulsar shows excellent performance as we tested it using different benchmarks. We will talk about this in more detail later.",[48,49555,49556],{},"So, does Pulsar meet the performance requirements of our use case? Let’s take a look at the breakdown of our requirements.",[321,49558,49559,49562,49565],{},[324,49560,49561],{},"A large number of consumers. Brokers should be able to manage messages and broadcast them to up to 100K vehicles.",[324,49563,49564],{},"Low latency. We have tons of notifications generated against the data in real time, which need to be broadcast at an end-to-end (E2E) latency of less than 1 second. In our case, the end-to-end latency refers to the duration between the time a message is produced by cloud services and the time it is received by the vehicle. Technically, it contains two phases - producing and consuming.",[324,49566,49567],{},"Large messages. Brokers should be able to handle large messages from cameras (for example, video streams) without performance issues. Most brokers focus on handling small messages, such as event data from microservices on the cloud, which are usually about several hundred kilobytes at most. 
When messages become larger, these brokers may have performance problems.",[48,49569,49570],{},"In this blog, we will focus on the first 2 requirements, namely how to broadcast messages for 100K consumers with an end-to-end latency of less than 1 second.",[40,49572,49574],{"id":49573},"benchmark-testing","Benchmark testing",[48,49576,49577],{},"To understand how Pulsar fits into our use case, we performed some benchmark tests on Pulsar and I will introduce some of them in this section.",[48,49579,49580],{},"Figure 2 shows the general structure of our benchmark tests.",[48,49582,49583],{},[384,49584],{"alt":49585,"src":49586},"The structure of how NTT did their benchmark tests","\u002Fimgs\u002Fblogs\u002F63be152b5bc73f70a44008fa_NTT-blog-image2.png",[321,49588,49589,49592,49595,49598,49601,49604],{},[324,49590,49591],{},"Broadcast task: Only 1 publisher sending messages to 1 persistent topic with a single Pulsar broker",[324,49593,49594],{},"Consumers: 20K-100K consumers (shared subscription)",[324,49596,49597],{},"Message size: 10 KB",[324,49599,49600],{},"Message dispatch rate: 1 msg\u002Fs",[324,49602,49603],{},"Pulsar version: 2.10",[324,49605,49606],{},"Benchmark: OpenMessaging Benchmark Framework (OMB)",[48,49608,49609],{},"Figure 3 shows our client and cluster configurations.",[48,49611,49612],{},[384,49613],{"alt":49614,"src":49615},"Diagram of NTT's client and cluster configurations","\u002Fimgs\u002Fblogs\u002F63be1599f7dfe733e19dfc19_NTT-blog-image3.png",[48,49617,49618],{},"We performed the benchmark tests on Amazon Web Services (AWS), with both the broker and bookies using the same machine type (i3.4xlarge). We provided sufficient network (10 Gbit) and storage (2 SSDs) resources for each node to avoid hardware bottlenecks. This allowed us to focus on the performance of Pulsar itself. As we had too many consumers, we put them onto several servers, or clients in Figure 3.",[32,49620,49622],{"id":49621},"overall-benchmark-results","Overall benchmark results",[48,49624,49625],{},"Table 1 displays our benchmark results. We can see that Pulsar worked well with 20K consumers, recording a P99 latency of 0.68 seconds and a connection time of about 4 minutes. Both of them are acceptable in real-world usage.",[48,49627,49628],{},[384,49629],{"alt":49630,"src":49631},"Table of benchmark test results","\u002Fimgs\u002Fblogs\u002F63be1bf861fd137a60dda464_Screen-Shot-2023-01-10-at-6.16.06-PM.png",[321,49633,49634],{},[324,49635,49636],{},"Connection time: the time between the start of the connections to all consumers and the end of all the connections.",[48,49638,49639],{},"As the number of consumers increased, we noticed a decline in performance. When we had 30K consumers, the P99 latency exceeded 1 second. When 40K consumers were involved, the P99 latency even topped 4 seconds, with a connection time of nearly 20 minutes, which is too long for our use case. 
For 100K consumers, they even failed to establish the connections since they took too much time.",[32,49641,49643],{"id":49642},"a-polynomial-curve-the-connection-time-and-the-number-of-consumers","A polynomial curve: The connection time and the number of consumers",[48,49645,49646],{},"To understand how the connection time is related to consumers, we conducted further research and made a polynomial curve for the approximations of the collection time as the number of consumers increases.",[48,49648,49649],{},[384,49650],{"alt":49651,"src":49652},"Chart showing the connection time and the number of consumers ","\u002Fimgs\u002Fblogs\u002F63be1828096bb97d9e19b17d_NTT-blog-image4.png",[48,49654,49655],{},"Based on the curve, we expected the connection time to reach 8,000 seconds (about 2.2 hours) at 100K consumers, which is unacceptable for our case.",[32,49657,49659],{"id":49658},"connection-time-distribution-the-long-tail-problem","Connection time distribution: The long tail problem",[48,49661,49662],{},"In addition, for the case with 20K consumers, we measured the connection time of each consumer and created a histogram to see the time distribution across them, as depicted in Figure 5.",[48,49664,49665],{},[384,49666],{"alt":49667,"src":49668},"Histogram of connection time","\u002Fimgs\u002Fblogs\u002F63be185c61fd130442da3dac_NTT-blog-image5.png",[48,49670,49671],{},"The Y-axis represents the number of consumers that finished their connections within the time range on the X-axis. As shown in Figure 5, about 20% of connections finished in about 3 seconds, and more than half of the connections finished within one minute. The problem lay with the long tail. Some consumers even spent more than 200 seconds, which greatly affected the overall connection time.",[32,49673,49675],{"id":49674},"a-breakdown-of-p99-latency","A breakdown of P99 latency",[48,49677,49678],{},"For the P99 latency, we split it into six stages and measured their respective processing time in the 40K-consumer case.",[48,49680,49681],{},[384,49682],{"alt":49683,"src":49684},"Six stages of P99 latency for 40K consumers","\u002Fimgs\u002Fblogs\u002F63be18b2f7dfe75124a1131c_NTT-blog-image6.png",[1666,49686,49687,49690,49693,49696,49699,49702],{},[324,49688,49689],{},"Producing: Includes message production by the publisher, network communications, and protocol processing.",[324,49691,49692],{},"Broker internal process: Includes message deduplication, transformation, and other processes.",[324,49694,49695],{},"Message persistence: The communication between the broker and BookKeeper.",[324,49697,49698],{},"Notification: The broker receives an update notification from BookKeeper.",[324,49700,49701],{},"Broker internal process: The broker prepares the message for consumption.",[324,49703,49704],{},"Broadcasting: All the messages are broadcast to all the consumers.",[48,49706,49707],{},"Our results show that message persistence took up about 27% of the total latency while broadcasting accounted for about 33%. These two stages combined were responsible for most of the delay time, so we needed to focus on reducing the latency for them specifically.",[48,49709,49710],{},"Before I continue to explain how we worked out a solution, let’s review the conclusion of our benchmark results.",[1666,49712,49713,49716],{},[324,49714,49715],{},"Pulsar is already good enough for scenarios where there are no more than 20K consumers with a P99 latency requirement of less than 0.7s. 
The consumer connection time is also acceptable.",[324,49717,49718],{},"As the number of consumers increases, it takes more time for connections to finish. For 100K consumers, Pulsar still needs to be improved in terms of latency and connection time. For latency, the persistence (connections with BookKeeper) and broadcasting (connections with consumers & acks) stages take too much time.",[40,49720,49722],{"id":49721},"approaches-to-100k-consumers","Approaches to 100K consumers",[48,49724,49725],{},"There are typically two ways to improve performance: scale-up and scale-out. In our case, we can understand them in the following ways.",[321,49727,49728,49731],{},[324,49729,49730],{},"Scale-up: Improve the performance of a single broker.",[324,49732,49733,49734,18054],{},"Scale-out: Let multiple brokers handle one topic at the same time. One of the possible scale-out solutions is called “Shadow Topic”, proposed by a Pulsar PMC member. It allows us to distribute subscriptions across multiple brokers by creating \"copies\" of the original topic. See ",[55,49735,49738],{"href":49736,"rel":49737},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fissues\u002F16153",[264],"PIP-180",[48,49740,49741],{},"This blog will focus on the first approach. More specifically, we created a broadcast-specific model for better performance and resolved the task congestion issue when there are too many connections.",[32,49743,49745],{"id":49744},"four-subscription-types-in-pulsar","Four subscription types in Pulsar",[48,49747,49748],{},"First, let’s explore Pulsar’s subscription model. In fact, many brokers share similar models. In Pulsar, a topic must have at least one subscription to dispatch messages and each consumer must be linked to one subscription to receive messages. A subscription is responsible for transferring messages from topics. There are four types of subscriptions in Pulsar.",[48,49750,49751],{},[384,49752],{"alt":49753,"src":49754},"Diagram showing four subscription types in Pulsar","\u002Fimgs\u002Fblogs\u002F63be1900b949328cee8e2dcb_NTT-blog-image7.png",[321,49756,49757,49760,49763,49766],{},[324,49758,49759],{},"Exclusive. Only one consumer is allowed to be associated with the subscription. This means if the consumer crashes or disconnects, the messages in this subscription will not be processed anymore.",[324,49761,49762],{},"Failover. Supports multiple consumers, but only one of the consumers can receive messages. When the working consumer crashes or disconnects, Pulsar can switch to another consumer to make sure messages keep being processed.",[324,49764,49765],{},"Shared. Distributes messages across multiple consumers. Each consumer will only receive parts of the messages, and the number of messages will be well-balanced across every consumer.",[324,49767,49768],{},"Key_Shared. Similar to Shared subscriptions, Key_Shared subscriptions allow multiple consumers to be attached to the same subscription. Messages are delivered across consumers and the messages with the same key or same ordering key are sent to only one consumer.",[48,49770,49771],{},"A problem with the subscription types is that there is no model designed for sending the same messages to multiple consumers. This means in our broadcasting case, we must create a subscription for each consumer. 
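In client terms, broadcasting this way means giving every consumer its own subscription name on the same topic. The sketch below illustrates the pattern with the Pulsar Python client; the service URL, topic, and subscription names are placeholders, not our production values.

```python
import pulsar

client = pulsar.Client("pulsar://localhost:6650")

# One broadcast topic, one exclusive subscription per consumer: every
# subscription keeps its own cursor, so each consumer receives every message.
consumers = [
    client.subscribe(
        "persistent://public/default/area-topic",
        subscription_name=f"vehicle-{i}",
        consumer_type=pulsar.ConsumerType.Exclusive,
    )
    for i in range(4)
]

msg = consumers[0].receive()   # blocks until a broadcast message arrives
consumers[0].acknowledge(msg)
client.close()
```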
As shown in Figure 8, for example, we used 4 exclusive subscriptions, and each of them had a connected consumer, allowing us to broadcast messages to all of them.",[48,49773,49774],{},[384,49775],{"alt":49776,"src":49777},"Diagram showing each exclusive subscription has a consumer attached","\u002Fimgs\u002Fblogs\u002F63be196408486ea2e5283b2b_NTT-blog-image8.png",[32,49779,49781],{"id":49780},"using-multiple-subscriptions-for-broadcasting-messages","Using multiple subscriptions for broadcasting messages",[48,49783,49784],{},"However, creating multiple subscriptions can increase latency, especially when you have too many consumers. To understand the reason, let’s take a look at how a subscription works. Figure 9 displays the general design of a subscription, which is comprised of three components:",[1666,49786,49787,49790,49793],{},[324,49788,49789],{},"The subscription itself.",[324,49791,49792],{},"Cursor. You use a cursor to track the position of consumers. You can consider it as a message ID, or the position on the message stream. This information will also be synchronized with the metadata store, which means you can resume consumption from this position even after the broker restarts.",[324,49794,49795],{},"Dispatcher. It is the only functional part of the subscription, which communicates with BookKeeper and checks if there are any new messages written to BookKeeper. If there are new messages, it will pull them out and send them to consumers.",[48,49797,49798],{},[384,49799],{"alt":49800,"src":49801},"Diagram of subscription components","\u002Fimgs\u002Fblogs\u002F63be19d6a5aeb58699adb203_NTT-blog-image9.png",[48,49803,49804],{},"As the dispatcher communicates with BookKeeper, each dispatch has its own connection to BookKeeper. This comes with a problem when you have too many consumers. In our case, 100K consumers were attached to 100K subscriptions, requiring 100K connections to BookKeeper. This huge number of connections was clearly a performance bottleneck.",[48,49806,49807],{},"In fact, these connections were redundant and unnecessary. This is because for this broadcasting task, all the consumers used their respective subscriptions just to retrieve the same messages from the same topic. Even for the cursor, as we sent the same data at the same time, we did not expect too many differences between these cursors. Theoretically, one cursor should be enough.",[32,49809,49811],{"id":49810},"broadcast-subscription-with-virtual-cursors","Broadcast Subscription with virtual cursors",[48,49813,49814],{},"To improve performance, we redesigned the subscription model specifically for handling large volumes of consumers (see Figure 10). The new structure guarantees the message order for each consumer. It shares many functions with the existing subscription model, such as cumulative acknowledgment.",[48,49816,49817],{},"In the new model, only one subscription exists to serve multiple consumers, which means there is only one dispatcher. As only a single connection to BookKeeper is allowed, this method can greatly reduce the load on BookKeeper and lower the latency. Additionally, since the subscription only has one cursor, there is no metadata duplication.",[48,49819,49820],{},[384,49821],{"alt":49822,"src":49823},"Figure of the new subscription model for the Broadcast Subscription","\u002Fimgs\u002Fblogs\u002F63be1a22b94932465c8ec2e4_NTT-blog-image10.png",[48,49825,49826],{},"In Pulsar, when consumers fail to receive or acknowledge messages, we need to resend the messages. 
To achieve this for one subscription and multiple consumers, we introduced a lightweight “virtual cursor” for each consumer to record the incremental position of the main cursor. The virtual cursor has a lightweight design; it does not contain any other information other than the incremental position. It allowed us to identify unread messages by comparing the virtual cursors and the data stored on BookKeeper. This way, we could keep unprocessed messages and delete any acknowledged ones.",[32,49828,49830],{"id":49829},"evaluating-the-performance-of-the-new-subscription-model","Evaluating the performance of the new subscription model",[48,49832,49833],{},"With this new subscription model, we evaluated its performance using 30K, 40K, and 100K consumers. The baseline is the shared subscription, which had the best result among all four original subscription models.",[48,49835,49836],{},[384,49837],{"alt":49838,"src":49839},"Table with benchmark test results of the Broadcast Subscription","\u002Fimgs\u002Fblogs\u002F63be1a725e78e6587ac87aa5_Screen-Shot-2023-01-10-at-6.09.40-PM.png",[48,49841,49842],{},"As shown in Table 2, when we had 40K consumers, the P99 latency of the Broadcast Subscription was almost 6 times faster than the original Shared Subscription. The connection time also saw a significant decrease as we only had one subscription. Even with 100K consumers, all the connections finished in just about 77.3 seconds. Although the results were extremely impressive, we still wanted a better P99 latency of less than 1 second.",[32,49844,49846],{"id":49845},"optimizing-orderedexecutor","Optimizing OrderedExecutor",[48,49848,49849],{},"In our benchmark evaluation, we found another factor that could lead to high latency: OrderedExecutor.",[48,49851,49852,49853,49858],{},"Let’s first explore how ",[55,49854,49857],{"href":49855,"rel":49856},"https:\u002F\u002Fbookkeeper.apache.org\u002Fdocs\u002Flatest\u002Fapi\u002Fjavadoc\u002Forg\u002Fapache\u002Fbookkeeper\u002Fcommon\u002Futil\u002FOrderedExecutor.html",[264],"OrderedExecutor"," works. BookKeeper provides OrderedExecutor in org.apache.bookkeeper.common.util. It guarantees that tasks with the same key are executed in the same thread. As we can see from the code snippet below, if we provide the same ordering key, we will always return the same thread with chooseThread. It helps us keep the order of tasks. When sending messages, Pulsar can run sustaining jobs with the same key, ensuring messages are sent in the expected order. This is widely used in Pulsar.",[8325,49860,49863],{"className":49861,"code":49862,"language":8330},[8328],"public void executeOrdered(Object orderingKey, Runnable r) {\n        chooseThread(orderingKey).execute(r);\n}\n",[4926,49864,49862],{"__ignoreMap":18},[48,49866,49867],{},"We found two problems caused by OrderedExecutor according to our test results.",[48,49869,49870],{},"First, when we split 100K consumers into different Broadcast Subscriptions, the latency did not change too much. For example, we created four Broadcast Subscriptions with 25K consumers attached to each of them and hoped this approach would further reduce latency given its parallelization. In addition, dividing consumers into different groups should also help the broker have better communication with BookKeeper. However, we found that it had no noticeable effect on our benchmark results.",[48,49872,49873],{},"The reason is that Pulsar uses the topic name as the ordering key. 
This means that all the messages of the same tasks are sequentialized at the topic level. However, we know that subscriptions are independent of each other. It is unnecessary to guarantee the order across all the subscriptions. We just need to keep the message order within one subscription. A natural solution is to change the key to the subscription name.",[48,49875,49876],{},"The second one is more interesting. In terms of message acknowledgments, we noticed a very high long-tail latency. On average, acknowledgments finished in 0.5 seconds, but the slowest one took up to 7 seconds, which greatly affected the overall P99 latency. We carried out further research but did not find any problems in the network or consumers. This high latency issue could always be reproduced in every benchmark test.",[48,49878,49879],{},"Finally, we found that this issue was caused by the way Pulsar handles acknowledgments. Pulsar uses two individual tasks to complete the message-sending process - one for sending the message and the other for the ACK. For each message sent by the consumer, Pulsar generates these two tasks and pushes them to OrderedExecutor.",[48,49881,49882],{},[384,49883],{"alt":49884,"src":49885},"Figure of two tasks in the same thread","\u002Fimgs\u002Fblogs\u002F63be1b196d2f836330dc03f5_NTT-blog-image11.png",[48,49887,49888],{},"To guarantee the order of messages, Pulsar always adds them to the same thread, which is suitable for many use cases. However, things are slightly different when you have 100K consumers. As shown in Figure 12, Pulsar generates 200K tasks, all of which are inserted into a single thread. This means other tasks might also exist between a pair of SEND and ACK tasks. In these cases, Pulsar first runs the in-between tasks before the ACK task can be processed, leading to a longer latency. In a worst-case scenario, there might be 10,000 in-between tasks.",[48,49890,49891],{},[384,49892],{"alt":49893,"src":49894},"Figure shows other tasks might exist between a pair of SEND and ACK tasks.","\u002Fimgs\u002Fblogs\u002F63be1b65fffd8d8bce75a41d_NTT-blog-image12.png",[48,49896,49897],{},"For our case, we only need to send messages in order while their ACK tasks can be placed anywhere. Therefore, to solve this problem, we used a random thread for ACK tasks instead of the same thread. As shown in Table 3, our final test with the updated logic of OrderExecutor shows some promising results.",[48,49899,49900],{},[384,49901],{"alt":49902,"src":49903},"Table showing test results of Broadcast Subscription with improved OrderedExecutor","\u002Fimgs\u002Fblogs\u002F63be1b9f62fe43442351b3fe_Screen-Shot-2023-01-10-at-6.14.42-PM.png",[48,49905,49906],{},"Compared with the previous test using the original OrderedExecutor logic, the P99 latency in this test for 100K consumers was about 4 times shorter and the connection time was reduced by half. The latest design also worked well for 30K consumers, the connection time of which was about 2.5 times faster.",[40,49908,2125],{"id":2122},[48,49910,49911],{},"Pulsar has a flexible design and its performance is already good enough for many use cases. However, when you need to handle special cases where a large number of consumers exist, it may be a good idea to implement your own subscription model. This will help improve Pulsar’s performance dramatically.",[48,49913,49914],{},"Additionally, using OrderedExecutor in the right way is also important to the overall performance. 
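As a rough sketch of this idea (not the actual Pulsar patch; the builder settings and task bodies are illustrative and follow BookKeeper's OrderedExecutor API), SEND tasks stay ordered per subscription while ACK tasks are keyed randomly so they can land on any thread:

```java
import java.util.concurrent.ThreadLocalRandom;
import org.apache.bookkeeper.common.util.OrderedExecutor;

public class SendAckScheduling {
    public static void main(String[] args) {
        OrderedExecutor executor = OrderedExecutor.newBuilder()
                .name("broadcast-dispatcher")
                .numThreads(8)
                .build();

        String subscription = "broadcast-sub"; // order per subscription, not per topic

        for (int i = 0; i < 10; i++) {
            final int seq = i;
            // SEND tasks share one ordering key, so they run on one thread, in order.
            executor.executeOrdered(subscription, () ->
                    System.out.println("send message " + seq));
            // ACKs have no ordering requirement: a random key spreads them across
            // threads instead of queueing them behind unrelated in-between tasks.
            executor.executeOrdered(ThreadLocalRandom.current().nextInt(), () ->
                    System.out.println("ack message " + seq));
        }

        executor.shutdown();
    }
}
```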
When you have a large number of SEND and ACK tasks that need to be processed in a short time, you may want to optimize the original logic given the additional in-between tasks.",[40,49916,38376],{"id":38375},[321,49918,49919,49923,49931,49936],{},[324,49920,45216,49921,47757],{},[55,49922,38404],{"href":45219},[324,49924,47760,49925,1154,49928,45209],{},[55,49926,47764],{"href":45463,"rel":49927},[264],[55,49929,47768],{"href":45206,"rel":49930},[264],[324,49932,45223,49933,45227],{},[55,49934,31914],{"href":31912,"rel":49935},[264],[324,49937,36219,49938,49940],{},[55,49939,38410],{"href":27690}," for the latest performance comparison on maximum throughput, publish latency, and historical read rate.",[48,49942,3931],{},{"title":18,"searchDepth":19,"depth":19,"links":49944},[49945,49946,49947,49953,49960,49961],{"id":19155,"depth":19,"text":19156},{"id":49526,"depth":19,"text":49527},{"id":49573,"depth":19,"text":49574,"children":49948},[49949,49950,49951,49952],{"id":49621,"depth":279,"text":49622},{"id":49642,"depth":279,"text":49643},{"id":49658,"depth":279,"text":49659},{"id":49674,"depth":279,"text":49675},{"id":49721,"depth":19,"text":49722,"children":49954},[49955,49956,49957,49958,49959],{"id":49744,"depth":279,"text":49745},{"id":49780,"depth":279,"text":49781},{"id":49810,"depth":279,"text":49811},{"id":49829,"depth":279,"text":49830},{"id":49845,"depth":279,"text":49846},{"id":2122,"depth":19,"text":2125},{"id":38375,"depth":19,"text":38376},"2023-01-10","Learn why NTT chose Apache Pulsar for smart city and how Pulsar broadcasts messages for 100K consumers with an end-to-end latency of less than 1 second.","\u002Fimgs\u002Fblogs\u002F63be1ce6b7a9a44b9a5eb1de_pulsar-ntt-one-topic-100k-top.jpg",{},{"title":49490,"description":49963},"blog\u002Fhandling-100k-consumers-with-one-pulsar-topic",[821],"1HKbZk4aW0ocQ9p0WvkFNVJMAY9FyPuRpiZOTKcpmaY",{"id":49971,"title":49972,"authors":49973,"body":49974,"category":821,"createdAt":290,"date":50338,"description":49972,"extension":8,"featured":294,"image":50339,"isDraft":294,"link":290,"meta":50340,"navigation":7,"order":296,"path":50341,"readingTime":290,"relatedResources":290,"seo":50342,"stem":50343,"tags":50344,"__hash__":50345},"blogs\u002Fblog\u002Fachieving-broker-load-balancing-apache-pulsar.md","Achieving Broker Load Balancing with Apache Pulsar",[36525,36526],{"type":15,"value":49975,"toc":50315},[49976,49978,49981,49984,49988,49991,49994,49997,50001,50004,50012,50016,50019,50022,50025,50028,50031,50037,50040,50044,50047,50050,50058,50061,50065,50068,50076,50080,50083,50108,50112,50118,50121,50125,50128,50136,50140,50146,50150,50153,50159,50162,50166,50169,50172,50178,50181,50187,50193,50197,50199,50208,50214,50223,50227,50233,50236,50239,50243,50246,50252,50255,50270,50272,50275,50292,50294,50297],[40,49977,46],{"id":42},[48,49979,49980],{},"In this blog, we talk about the importance of load balancing in distributed computing systems and provide a deep dive on how Pulsar handles broker load balancing. First, we’ll cover Pulsar’s topic-bundle grouping, bundle-broker ownership, and load data models. Then, we'll walk through Pulsar’s load balancing logic with sequence diagrams that demonstrate bundle assignment, split, and shedding. 
By the end of this blog, you’ll understand how Pulsar dynamically balances brokers.",[48,49982,49983],{},"Before we dive into the details of Pulsar’s broker load balancing, we'll briefly discuss the challenges of distributed computing, and specifically, systems with monolithic architectures.",[32,49985,49987],{"id":49986},"the-challenges-of-load-balancing-in-distributed-streaming","The challenges of load balancing in distributed streaming",[48,49989,49990],{},"A key challenge of distributed computing is load balancing. Distributed systems need to evenly distribute message loads among servers to avoid overloaded servers that can malfunction and harm the performance of the cluster. Topics are naturally a good choice to partition messages because messages under the same topic (or topic partition) can be grouped and served by a single logical server. In most distributed streaming systems, including Pulsar, topics or groups of topics are considered a load-balance entity, where the systems need to evenly distribute the message load among the servers.",[48,49992,49993],{},"Topic load balancing can be challenging when topic loads are unpredictable. When there is a load increase in certain topics, these topics must offload directly or repartition to redistribute the load to other machines. Alternatively, when machines receive low traffic or become idle, the cluster needs to rebalance to avoid wasting server resources.",[48,49995,49996],{},"Dynamic rebalancing can be difficult in monolithic architectures, where messages are both served and persisted in the same stateful server. In monolithic streaming systems, rebalancing often involves copying messages from one server to another. Admins must carefully compute the initial topic distribution to avoid future rebalancing as much as possible. In many cases, they need careful orchestration to execute topic rebalancing.",[32,49998,50000],{"id":49999},"an-overview-of-load-balancing-in-pulsar","An overview of load balancing in Pulsar",[48,50002,50003],{},"By contrast, Apache Pulsar is equipped with automatic broker load balancing that requires no admin intervention. Pulsar’s architecture separates storage and compute, making the broker-topic assignment more flexible. Pulsar brokers persist messages in the storage servers, which removes the need for Pulsar to copy messages from one broker to another when rebalancing topics among brokers. In this scenario, the new broker simply looks up the metadata store to point to the correct storage servers where the topic messages are located.",[48,50005,50006,50007,190],{},"Let's briefly talk about the Pulsar storage architecture to have the complete Pulsar's scaling context here. On the storage side, topic messages are segmented into Ledgers, and these Ledgers are distributed to multiple BookKeeper servers, known as bookies. Pulsar horizontally scales its bookies to distribute as many Ledger (Segment) entities as possible. For a high write load, if all bookies are full, you could add more bookies, and the new message entries (new ledgers) will be placed on the new bookies. With this segmentation, during the storage scaling, Pulsar does not involve recopying old messages from bookies. For a high read load, because Pulsar caches messages in the brokers' memory, the read load on the bookies significantly offloads to the brokers, which are load-balanced. 
You can read more about Pulsar Storage architecture and scaling information in the blog post ",[55,50008,50011],{"href":50009,"rel":50010},"https:\u002F\u002Fwww.splunk.com\u002Fen_us\u002Fblog\u002Fit\u002Fcomparing-pulsar-and-kafka-how-a-segment-based-architecture-delivers-better-performance-scalability-and-resilience.html",[264],"Comparing Pulsar and Kafka",[40,50013,50015],{"id":50014},"topics-are-assigned-to-brokers-at-the-bundle-level","Topics are assigned to brokers at the bundle level",[48,50017,50018],{},"From the client perspective, Pulsar topics are the basic units in which clients publish and consume messages. On the broker side, a single broker will serve all the messages for a topic from all clients. A topic can be partitioned, and partitions will be distributed to multiple brokers. You could regard a topic partition as a topic and a partitioned topic as a group of topics.",[48,50020,50021],{},"Because it would be inefficient for each broker to serve only one topic, brokers need to serve multiple topics simultaneously. For this multi-topic ownership, the concept of a bundle was introduced in Pulsar to represent a middle-layer group.",[48,50023,50024],{},"Related topics are logically grouped into a namespace, which is the administrative unit. For instance, you can set configuration policies that apply to all the topics in a namespace. Internally, a namespace is divided into shards, aka the bundles. Each of these bundles becomes an assignment unit.",[48,50026,50027],{},"Pulsar uses bundles to shard topics, which will help reduce the amount of information to track. For example, Pulsar LoadManger aggregates topic load statistics, such as message rates at the bundle layer, which helps reduce the number of load samples to monitor. Also, Pulsar needs to track which broker currently serves a particular topic. With bundles, Pulsar can reduce the space needed for this ownership mapping.",[48,50029,50030],{},"Pulsar uses a hash to map topics to bundles. Here’s an example of two bundles in a namespace.",[8325,50032,50035],{"className":50033,"code":50034,"language":8330},[8328],"\nBundle_Key_Partitions : [0x00000000, 0x80000000, 0xFFFFFFFF]\nBundle1_Key_Range: [0x00000000, 0x80000000)\nBundle2_Key_Range: [0x80000000, 0xFFFFFFFF]\n\n",[4926,50036,50034],{"__ignoreMap":18},[48,50038,50039],{},"Pulsar computes the hashcode given topic name by Long hashcode = hash(topicName). Let’s say hash(“my-topic”) = 0x0000000F. Then Pulsar could do a binary search by NamespaceBundle getBundle(hashCode) to which bundle the topic belongs given the bundle key ranges. In this example, “Bundle1” is the one to which “my-topic” belongs.",[40,50041,50043],{"id":50042},"brokers-dynamically-own-bundles-on-demand","Brokers dynamically own bundles on demand",[48,50045,50046],{},"One of the advantages of Pulsar’s compute (brokers) and storage (bookies) separation is that Pulsar brokers can be stateless and horizontally scalable with dynamic bundle ownership. When brokers are overloaded, more brokers can be easily added to a cluster and redistribute bundle ownerships.",[48,50048,50049],{},"To discover the current bundle-broker ownership in a given topic, Pulsar uses a server-side discovery mechanism that redirects clients to the owner brokers’ URLs. 
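The topic-to-bundle lookup described above can be pictured as a binary search over the bundle boundary hashes. The helper below is a hypothetical illustration of that idea only (the real logic lives in Pulsar's namespace-bundle code); the hash function and boundary values are placeholders.

```java
import java.util.Arrays;

public class BundleLookup {
    // Boundary hashes of the namespace's bundles, e.g. two bundles:
    // [0x00000000, 0x80000000) and [0x80000000, 0xFFFFFFFF]
    private static final long[] BOUNDARIES = {0x00000000L, 0x80000000L, 0xFFFFFFFFL};

    // Placeholder for Pulsar's real topic hash (a 32-bit hash of the topic name).
    static long hash(String topicName) {
        return Integer.toUnsignedLong(topicName.hashCode());
    }

    // Find the index of the bundle whose range contains the topic's hash.
    static int getBundleIndex(String topicName) {
        long h = hash(topicName);
        int pos = Arrays.binarySearch(BOUNDARIES, h);
        // If not an exact boundary hit, binarySearch returns -(insertionPoint) - 1;
        // the owning bundle starts at the previous boundary.
        return pos >= 0 ? Math.min(pos, BOUNDARIES.length - 2) : -pos - 2;
    }

    public static void main(String[] args) {
        String topic = "persistent://public/default/my-topic";
        System.out.println(topic + " -> bundle " + getBundleIndex(topic)
                + " of " + (BOUNDARIES.length - 1));
    }
}
```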
This discovery logic requires:",[321,50051,50052,50055],{},[324,50053,50054],{},"Bundle key ranges for a given namespace, in order to map a topic to a bundle.",[324,50056,50057],{},"Bundle-Broker ownership mapping to direct the client to the current owner or to trigger a new ownership acquisition in case there is no broker assigned.",[48,50059,50060],{},"Pulsar stores bundle ranges and ownership mapping in the metadata store, such as ZooKeeper or etcd, and the information is also cached by each broker.",[40,50062,50064],{"id":50063},"load-data-model","Load data model",[48,50066,50067],{},"Collecting up-to-date load information from brokers is crucial to load balancing decisions. Pulsar constantly updates the following load data in the memory cache and metadata store and replicates it to the leader broker. Based on this load data, the leader broker runs topic-broker assignment, bundle split, and unload logic:",[321,50069,50070,50073],{},[324,50071,50072],{},"Bundle Load Data contains bundle-specific load information, such as bundle-specific msg in\u002Fout rates.",[324,50074,50075],{},"Broker Load Data contains broker-specific load information, such as CPU, memory, and network throughput in\u002Fout rates.",[40,50077,50079],{"id":50078},"load-balance-sequence","Load balance sequence",[48,50081,50082],{},"In this section, we’ll walk through load balancing logic with sequence diagrams:",[1666,50084,50085,50094,50101],{},[324,50086,50087,50088,50093],{},"Assigning topics to brokers dynamically (",[55,50089,50092],{"href":50090,"rel":50091},"https:\u002F\u002Fpulsar.apache.org\u002Fdocs\u002Fadministration-load-balance\u002F#assign-topics-to-brokers-dynamically",[264],"Read the complete documentation",".)",[324,50095,50096,50097,50093],{},"Splitting overloaded bundles (",[55,50098,50092],{"href":50099,"rel":50100},"https:\u002F\u002Fpulsar.apache.org\u002Fdocs\u002Fadministration-load-balance\u002F#split-namespace-bundles",[264],[324,50102,50103,50104,50093],{},"Shedding bundles from overloaded brokers (",[55,50105,50092],{"href":50106,"rel":50107},"https:\u002F\u002Fpulsar.apache.org\u002Fdocs\u002Fadministration-load-balance\u002F#shed-load-automatically",[264],[32,50109,50111],{"id":50110},"assigning-topics-to-brokers-dynamically","Assigning topics to brokers dynamically",[48,50113,50114],{},[384,50115],{"alt":50116,"src":50117},"brokers illustration","\u002Fimgs\u002Fblogs\u002F63b4023480291650b0b03b6e_lb1.png",[48,50119,50120],{},"Imagine a client trying to connect to a broker for a topic. The client connects to a random broker, and the broker first searches the matching bundle by the hash of the topic and its namespace bundle ranges. Then the broker checks if any broker already owns the bundle in the metadata store. If already owned, the broker redirects the client to the owner URL. Otherwise, the broker redirects the client to the leader for a broker assignment. For the assignment, the leader first filters out available brokers by the configured rules and then randomly selects one of the least loaded brokers to the bundle, as shown in Section 1 below, and returns its URL. The leader redirects the client to the returned URL, and the client connects to the assigned broker. 
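Section 1 below describes the default selection strategy in detail; schematically, the scoring it performs looks roughly like the sketch here. This is a simplified illustration rather than the actual ModularLoadManagerStrategy code, and the broker records are invented for readability.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;

public class LeastLoadedBrokerSketch {
    record BrokerLoad(String url, double maxResourceUsagePct,
                      double longTermMsgInRate, double longTermMsgOutRate) {}

    // Mirrors LoadBalancerBrokerOverloadedThresholdPercentage (default 85%).
    static final double OVERLOAD_THRESHOLD_PCT = 85.0;

    static double score(BrokerLoad b) {
        // Overloaded brokers are effectively excluded via an infinite score.
        if (b.maxResourceUsagePct() > OVERLOAD_THRESHOLD_PCT) {
            return Double.POSITIVE_INFINITY;
        }
        return b.longTermMsgInRate() + b.longTermMsgOutRate();
    }

    static String select(List<BrokerLoad> candidates) {
        double best = candidates.stream()
                .mapToDouble(LeastLoadedBrokerSketch::score).min().orElseThrow();
        List<String> leastLoaded = new ArrayList<>();
        for (BrokerLoad b : candidates) {
            if (score(b) == best) {
                leastLoaded.add(b.url());
            }
        }
        // Tie-break randomly among the least loaded brokers.
        return leastLoaded.get(ThreadLocalRandom.current().nextInt(leastLoaded.size()));
    }

    public static void main(String[] args) {
        List<BrokerLoad> brokers = List.of(
                new BrokerLoad("pulsar://broker-1:6650", 40, 1_000, 2_000),
                new BrokerLoad("pulsar://broker-2:6650", 90, 100, 100),   // over threshold
                new BrokerLoad("pulsar://broker-3:6650", 35, 800, 1_500));
        System.out.println("assign bundle to " + select(brokers));
    }
}
```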
This new broker-bundle ownership creates an ephemeral lock in the metadata store, and the lock is automatically released if the owner becomes unavailable.",[32,50122,50124],{"id":50123},"section-1-selecting-a-broker","Section 1: Selecting a broker",[48,50126,50127],{},"This step selects a broker from the filtered broker list. As a tie-breaker strategy, it uses ModularLoadManagerStrategy (LeastLongTermMessageRate by default). LeastLongTermMessageRate computes brokers’ load scores and randomly selects one among the minimal scores by the following logic:",[321,50129,50130,50133],{},[324,50131,50132],{},"If the maximum local usage of CPU, memory, and network is bigger than the LoadBalancerBrokerOverloadedThresholdPercentage (default 85%), then score=INF.",[324,50134,50135],{},"Otherwise, score = longTermMsgIn rate and longTermMsgOut rate.",[40,50137,50139],{"id":50138},"splitting-overloaded-bundles","Splitting overloaded bundles",[48,50141,50142,50145],{},[384,50143],{"alt":18,"src":50144},"\u002Fimgs\u002Fblogs\u002F63b402344106f46a97e93e31_lb2.png","\nWith the bundle load data, the leader broker identifies which bundles are overloaded beyond the threshold as shown in Section 2 below and asks the owner broker to split them. For the split, the owner broker first computes split positions, as shown in Section 3 below, and repartition the target bundles at them, as shown in Section 4 below. After the split, the owner broker updates the bundle ownerships and ranges in the metadata store. The newly split bundles can be automatically unloaded from the owner broker, configurable by the LoadBalancerAutoUnloadSplitBundlesEnabled flag.",[32,50147,50149],{"id":50148},"section-2-finding-target-bundles","Section 2: Finding target bundles",[48,50151,50152],{},"If the auto bundle split is enabled by loadBalancerAutoBundleSplitEnabled (default true) configuration, the leader broker checks if any bundle’s load is beyond LoadBalancerNamespaceBundle* thresholds.",[8325,50154,50157],{"className":50155,"code":50156,"language":8330},[8328],"\nDefaults\nLoadBalancerNamespaceBundleMaxTopics = 1000\nLoadBalancerNamespaceBundleMaxSessions = 1000\nLoadBalancerNamespaceBundleMaxMsgRate = 30000\nLoadBalancerNamespaceBundleMaxBandwidthMbytes = 100\nLoadBalancerNamespaceMaximumBundles = 128\n\n",[4926,50158,50156],{"__ignoreMap":18},[48,50160,50161],{},"If the number of bundles in the namespace is already larger than or equal to MaximumBundles, it skips the split logic.",[32,50163,50165],{"id":50164},"section-3-computing-bundle-split-boundaries","Section 3: Computing bundle split boundaries",[48,50167,50168],{},"Split operations compute the target bundle’s range boundaries to split. The bundle split boundary algorithm is configurable by supportedNamespaceBundleSplitAlgorithms.",[48,50170,50171],{},"If we have two bundle ranges in a namespace with range partitions (0x0000, 0X8000, 0xFFFF), and we are currently targeting the first bundle range (0x0000, 0x8000) to split:",[48,50173,50174,50175,190],{},"RANGE_EQUALLY_DIVIDE_NAME (default): This algorithm divides the bundle into two parts with the same hash range size, for example ttarget bundle to split=(0x0000, 0x8000) => bundle split boundary=",[2628,50176,50177],{},"0x4000",[48,50179,50180],{},"TOPIC_COUNT_EQUALLY_DIVIDE: It divides the bundle into two parts with the same topic count. 
Let’s say there are 6 topics in the target bundle [0x0000, 0x8000):",[8325,50182,50185],{"className":50183,"code":50184,"language":8330},[8328],"\nhash(topic1) = 0x0000\nhash(topic2) = 0x0005\nhash(topic3) = 0x0010\nhash(topic4) = 0x0015\nhash(topic5) = 0x0020\nhash(topic6) = 0x0025\n\n",[4926,50186,50184],{"__ignoreMap":18},[48,50188,50189,50190,190],{},"Here we want to split at 0x0012 to make the left and right sides of the number of topics the same. E.g. target bundle to split [0x0000, 0x8000) => bundle split boundary=",[2628,50191,50192],{},"0x0012",[32,50194,50196],{"id":50195},"section-4-splitting-bundles-by-boundaries","Section 4: Splitting bundles by boundaries",[48,50198,9186],{},[48,50200,50201,50202,50205,50206],{},"Given bundle partitions ",[2628,50203,50204],{},"0x0000, 0x8000, 0xFFFF",", splitBoundaries: ",[2628,50207,50177],{},[48,50209,50210,50211],{},"Bundle partitions after split = ",[2628,50212,50213],{},"0x0000, 0x4000, 0x8000, 0xFFFF",[48,50215,50216,50217],{},"Bundles ranges after split = [[0x0000, 0x4000),",[2628,50218,50219,50220],{},"0x4000, 0x8000), ",[2628,50221,50222],{},"0x8000, 0xFFFF",[40,50224,50226],{"id":50225},"shedding-unloading-bundles-from-overloaded-brokers","Shedding (unloading) bundles from overloaded brokers",[48,50228,50229,50232],{},[384,50230],{"alt":18,"src":50231},"\u002Fimgs\u002Fblogs\u002F63b4028d494f09abfefbc61e_lb3.png","\nWith the broker load information collected from all brokers, the leader broker identifies which brokers are overloaded and triggers bundle unload operations, with the objective of rebalancing the traffic throughout the cluster.",[48,50234,50235],{},"Using the default ThresholdShedder strategy, the leader broker computes the average of the maximal resource usage among CPU, memory, and network IO. After that, the leader finds brokers whose load is higher than the average-based threshold, as shown in Section 5 below. If identified, the leader asks the overloaded brokers to unload some bundles of topics, starting from the high throughput ones, enough to bring the broker load to below the critical threshold.",[48,50237,50238],{},"For the unloading request, the owner broker removes the target bundles’ ownerships in the metadata store and closes the client topic connections. Then the clients reinitiate the broker discovery mechanism. Eventually, the leader assigns less-loaded brokers to the unloaded bundles and the clients connect to them.",[32,50240,50242],{"id":50241},"section-5-thresholdshedder-finding-overloaded-brokers","Section 5: ThresholdShedder: finding overloaded brokers",[48,50244,50245],{},"It first computes the average resource usage of all brokers using the following formula.",[8325,50247,50250],{"className":50248,"code":50249,"language":8330},[8328],"\nFor each broker: \n    usage =  \n    max (\n    %cpu * cpuWeight\n    %memory * memoryWeight,\n    %bandwidthIn * bandwidthInWeight,\n    %bandwidthOut * bandwidthOutWeight) \u002F 100;\n\n    usage = x * prevUsage + (1 - x) * usage\n\n    avgUsage = sum(usage) \u002F numBrokers \n\n",[4926,50251,50249],{"__ignoreMap":18},[48,50253,50254],{},"If any broker’s usage is bigger than avgUsage + y, it is considered an overloaded broker.",[321,50256,50257,50264,50267],{},[324,50258,50259,50260,50263],{},"The resource usage “Weight” is by default 1.0 and configurable by ",[4926,50261,50262],{},"loadBalancerResourceWeight"," configurations.",[324,50265,50266],{},"The historical usage multiplier x is configurable by loadBalancerHistoryResourcePercentage. 
By default, it is 0.9, which weighs the previous usage more than the latest.",[324,50268,50269],{},"The avgUsage buffer y is configurable by ​​loadBalancerBrokerThresholdShedderPercentage, which is 10% by default.",[40,50271,16789],{"id":16788},[48,50273,50274],{},"In this blog, we reviewed the Pulsar broker load balance logic focusing on its sequence. Here are the broker load balance behaviors that I found important in this review.",[321,50276,50277,50280,50283,50286,50289],{},[324,50278,50279],{},"Pulsar groups topics into bundles for easier tracking, and it dynamically assigns and balances the bundles among brokers. If specific bundles are overloaded, they get automatically split to maintain the assignment units to a reasonable level of traffic.",[324,50281,50282],{},"Pulsar collects the global broker (cpu, memory, network usage) and bundle load data (msg in\u002Fout rate) to the leader broker in order to run the algorithmic load balance logic: bundle-broker assignment, bundle splitting, and unloading (shedding).",[324,50284,50285],{},"The bundle-broker assignment logic randomly selects the least loaded brokers and redirects clients to the assigned brokers’ URLs. The broker-bundle ownerships create ephemeral locks in the metadata store, which are automatically released if the owners become unavailable (lose ownership).",[324,50287,50288],{},"The bundle-split logic finds target bundles based on the LoadBalancerNamespaceBundle* configuration thresholds, and by default, the bundle ranges are split evenly. After splits, by default, the owner automatically unloads the newly split bundles.",[324,50290,50291],{},"The auto bundle-unload logic uses the default LoadSheddingStrategy, which finds overloaded brokers based on the average of the max resource usage among CPU, Memory, and Network IO. Then, the leader asks the overloaded brokers to unload some high loaded bundles of topics. Clients’ topic connections under the unloading bundles experience connection close and re-initiate the bundle-broker assignment.",[40,50293,36477],{"id":36476},[48,50295,50296],{},"Stay tuned for more operational content around Pulsar load balance, such as Admin APIs, metrics, logs, and troubleshooting tips. Meanwhile, check out more Pulsar resources:",[321,50298,50299,50307],{},[324,50300,50301,50302,1154,50305,36492],{},"Take Apache Pulsar Training: Take the ",[55,50303,36487],{"href":36485,"rel":50304},[264],[55,50306,36491],{"href":36490},[324,50308,50309,50310,50314],{},"Spin up a Pulsar Cluster in Minutes: If you want to try building microservices without having to set up a Pulsar cluster yourself, ",[55,50311,50313],{"href":17075,"rel":50312},[264],"sign up for StreamNative Cloud today",". 
StreamNative Cloud is the simple, fast, and cost-effective way to run Pulsar in the public cloud.",{"title":18,"searchDepth":19,"depth":19,"links":50316},[50317,50321,50322,50323,50324,50328,50333,50336,50337],{"id":42,"depth":19,"text":46,"children":50318},[50319,50320],{"id":49986,"depth":279,"text":49987},{"id":49999,"depth":279,"text":50000},{"id":50014,"depth":19,"text":50015},{"id":50042,"depth":19,"text":50043},{"id":50063,"depth":19,"text":50064},{"id":50078,"depth":19,"text":50079,"children":50325},[50326,50327],{"id":50110,"depth":279,"text":50111},{"id":50123,"depth":279,"text":50124},{"id":50138,"depth":19,"text":50139,"children":50329},[50330,50331,50332],{"id":50148,"depth":279,"text":50149},{"id":50164,"depth":279,"text":50165},{"id":50195,"depth":279,"text":50196},{"id":50225,"depth":19,"text":50226,"children":50334},[50335],{"id":50241,"depth":279,"text":50242},{"id":16788,"depth":19,"text":16789},{"id":36476,"depth":19,"text":36477},"2023-01-03","\u002Fimgs\u002Fblogs\u002F63c7f9d38f903d860a03ec52_63b40202fefdbc403c9fb3af_blb2.png",{},"\u002Fblog\u002Fachieving-broker-load-balancing-apache-pulsar",{"title":49972,"description":49972},"blog\u002Fachieving-broker-load-balancing-apache-pulsar",[821],"0TuP3QG-bgK0vwNebhXiPvsg24kZx90nLBJJBbJ9qhU",{"id":50347,"title":43249,"authors":50348,"body":50349,"category":821,"createdAt":290,"date":50338,"description":50613,"extension":8,"featured":294,"image":50614,"isDraft":294,"link":290,"meta":50615,"navigation":7,"order":296,"path":43151,"readingTime":5505,"relatedResources":290,"seo":50616,"stem":50617,"tags":50618,"__hash__":50619},"blogs\u002Fblog\u002Fpulsar-operators-tutorial-part-2-manage-pulsar-custom-resources-argocd.md",[46122],{"type":15,"value":50350,"toc":50609},[50351,50357,50363,50376,50379,50384,50390,50395,50401,50406,50447,50453,50458,50464,50469,50475,50483,50488,50493,50499,50504,50510,50515,50521,50526,50532,50537,50543,50549,50554,50556,50559,50562,50564,50569],[916,50352,50353],{},[48,50354,50355],{},[36,50356,46129],{},[48,50358,46132,50359,50362],{},[55,50360,46135],{"href":50361},"\u002Fblog\u002Fengineering\u002F2022-11-22-pulsar-operators-tutorial-part-1-create-an-apache-pulsar-cluster-on-kubernetes\u002F",", I demonstrated how to deploy a Pulsar cluster using operators. The operators make Pulsar deployment much easier than the installation using Terraform or Ansible.",[48,50364,50365,50366,50370,50371,50375],{},"In this blog, I will demonstrate how to use ",[55,50367,48070],{"href":50368,"rel":50369},"https:\u002F\u002Fgithub.com\u002Fargoproj\u002Fargo-cd",[264]," to control Pulsar ",[55,50372,50374],{"href":44901,"rel":50373},[264],"Custom Resources"," (CRs) by monitoring the GitHub branch\u002Ftag. In Part 1, you can see that the CRs stored in your local environment work properly in a demo\u002Fpoc environment. When dealing with a production system, you will face configuration or infrastructure drift (too many chefs in the kitchen situation). GitOps uses git mechanisms to control CR versions in GitHub repositories with branches or tags. This allows you to easily control, roll back, and upgrade the configurations of your deployment, or find related historical changes.",[48,50377,50378],{},"Let’s begin!",[1666,50380,50381],{},[324,50382,50383],{},"Create a GitHub repository (pulsar-ops in this example) to keep track of the CRs. 
ArgoCD will track the changes in a folder of a branch, so I put all the CRs under the default folder.",[8325,50385,50388],{"className":50386,"code":50387,"language":8330},[8328],"deployment git:(main) tree\n.\n├── README.md\n└── default\n   ├── bk-cluster.yaml\n   ├── br-cluster.yaml\n   ├── px-cluster.yaml\n   └── zk-cluster.yaml\n1 directory, 5 files\n",[4926,50389,50387],{"__ignoreMap":18},[1666,50391,50392],{"start":19},[324,50393,50394],{},"Once all the changes are committed, add the remote to the GitHub target repository, tag the Pulsar version, then push the CRs upstream.",[8325,50396,50399],{"className":50397,"code":50398,"language":8330},[8328],"deployment git:(main) git remote add origin git@github.com:yuweisung\u002Fpulsar-ops.git\ndeployment git:(main) git tag -a v2.9 -m \"pulsar-2.9\"\ndeployment git:(main) git push --set-upstream origin v2.9\n",[4926,50400,50398],{"__ignoreMap":18},[1666,50402,50403],{"start":279},[324,50404,50405],{},"Now I can go to the ArgoCD Web UI to create a Pulsar cluster app. Log in to ArgoCD, create a new app with the following information, then click CREATE. Note that for Revision, I used Tags for demo. You should use branches in your production environment.",[321,50407,50408,50411,50414,50417,50424,50430,50433,50440],{},[324,50409,50410],{},"‍Application Name: sn-platform",[324,50412,50413],{},"‍Project: default",[324,50415,50416],{},"‍SYNC POLICY: Automatic",[324,50418,50419,50420],{},"‍Repository URL: ",[55,50421,50422],{"href":50422,"rel":50423},"https:\u002F\u002Fgithub.com\u002Fyuweisung\u002Fpulsar-ops.git",[264],[324,50425,50426,50429],{},[55,50427,3931],{"href":50422,"rel":50428},[264],"Revision: v2.9 Tags",[324,50431,50432],{},"‍Path: default",[324,50434,50435,50436],{},"‍Destination: ",[55,50437,50438],{"href":50438,"rel":50439},"https:\u002F\u002Fkubernetes.default.svc",[264],[324,50441,50442,50446],{},[55,50443,3931],{"href":50444,"rel":50445},"https:\u002F\u002Fkubernetes.default.svc\u002F",[264],"Namespace: sn-platform",[48,50448,50449],{},[384,50450],{"alt":50451,"src":50452},"argoCD interface create account","\u002Fimgs\u002Fblogs\u002F63b5686bb14a848ce880227c_image1-230103.png",[1666,50454,50455],{"start":20920},[324,50456,50457],{},"After the app is created, ArgoCD will scan the specific tag\u002Fbranch (hashcode) and sync the CRs to deploy the Pulsar cluster as we did in Part 1 using kubectl apply -f. The app detail shows the progress of the deployment.",[48,50459,50460],{},[384,50461],{"alt":50462,"src":50463},"sn platform interface","\u002Fimgs\u002Fblogs\u002F63b5686c2a0a1d77fa562729_image2-230103.png",[1666,50465,50466],{"start":20934},[324,50467,50468],{},"A few minutes later, ArgoCD will display the synced status and you can find all the manifests and their relations. Once everything is green, ArgoCD will keep watching the tag\u002Fbranch and make sure the current cluster state matches the CRs in the tag\u002Fbranch. If someone manually modifies the CRs or other generated manifests (kubectl edit \u003Ccr.yaml>), ArgoCD will detect the drift and change the CR back to match the tag\u002Fbranch. The only way to change the configuration of the cluster is using git operations.",[48,50470,50471],{},[384,50472],{"alt":50473,"src":50474},"sn workflow","\u002Fimgs\u002Fblogs\u002F63b5686ba5395d1980afb848_image3-230103.png",[1666,50476,50477,50480],{"start":20948},[324,50478,50479],{},"Next, use git push and submit pull requests to control the CR details, such as adding JVM options and changing replicas of CRs. 
If everything works well, ArgoCD should be able to detect the changes and auto-apply the operation changes.",[324,50481,50482],{},"Change the image version from 2.9.2.15 to 2.9.2.17 and commit the changes.",[48,50484,50485],{},[384,50486],{"alt":18,"src":50487},"\u002Fimgs\u002Fblogs\u002F63b5686b886dec133880085d_image4-230103.png",[1666,50489,50490],{"start":25806},[324,50491,50492],{},"After committing the changes, retag the commit to v2.9 and push with the force option.",[8325,50494,50497],{"className":50495,"code":50496,"language":8330},[8328],"deployment git:(main) git tag -a -f v2.9 1ee752a\nUpdated tag 'v2.9' (was e40f3b6)\ndeployment git:(main) git push origin v2.9 --force\nEnumerating objects: 1, done.\nCounting objects: 100% (1\u002F1), done.\nWriting objects: 100% (1\u002F1), 165 bytes | 165.00 KiB\u002Fs, done.\nTotal 1 (delta 0), reused 0 (delta 0), pack-reused 0\nTo github.com:yuweisung\u002Fpulsar-ops.git\n+ e40f3b6...fc5fb8d v2.9 -> v2.9 (forced update)\n",[4926,50498,50496],{"__ignoreMap":18},[1666,50500,50501],{"start":25812},[324,50502,50503],{},"ArgoCD detects the commit changes and syncs the status automatically. From the Web UI, you can see that some components are in the “spinning” status as it follows the StatefulSet upgrade strategy.",[48,50505,50506],{},[384,50507],{"alt":50508,"src":50509},"sn interface","\u002Fimgs\u002Fblogs\u002F63b568f068e8e802f2aea57c_image5-230103.png",[1666,50511,50512],{"start":25817},[324,50513,50514],{},"Click a ZooKeeper Pod and you can see that the image version has been updated.",[48,50516,50517],{},[384,50518],{"alt":50519,"src":50520},"my sk interface","\u002Fimgs\u002Fblogs\u002F63b5690a652693164d643eae_image6-230103.png",[1666,50522,50523],{"start":25823},[324,50524,50525],{},"Next, I want to use the same GitOps process to scale up\u002Fdown brokers. Let’s modify the br-cluster.yaml file as shown below.",[8325,50527,50530],{"className":50528,"code":50529,"language":8330},[8328],"apiVersion: pulsar.streamnative.io\u002Fv1alpha1\nkind: PulsarBroker\nmetadata:\n name: my\n namespace: sn-platform\nspec:\n image: streamnative\u002Fpulsar:2.9.2.17\n pod:\n   resources:\n     requests:\n       cpu: 200m\n       memory: 256Mi\n   terminationGracePeriodSeconds: 30\n config:\n   custom:\n     webSocketServiceEnabled: \"true\"\n replicas: 3\n zkServers: my-zk-headless:2181\n",[4926,50531,50529],{"__ignoreMap":18},[1666,50533,50534],{"start":25828},[324,50535,50536],{},"Run the git commands to force retagging the new commit to v2.9 and to force pushing it to the remote origin branch. Note that I used --force to retag the change. 
In your production environment, you should go through the PR and review process.",[8325,50538,50541],{"className":50539,"code":50540,"language":8330},[8328],"deployment git:(main) git add default\u002Fbr-cluster.yaml\ndeployment git:(main) git commit -m 'scale up broker to 3'\n[main 32e0468] scale up broker to 3\n1 file changed, 1 insertion(+), 1 deletion(-)\ndeployment git:(main) git tag -a -f v2.9 32e0468\ndeployment git:(main) git push origin v2.9 --force\nCounting objects: 100% (8\u002F8), done.\nDelta compression using up to 10 threads\nCompressing objects: 100% (5\u002F5), done.\nWriting objects: 100% (5\u002F5), 516 bytes | 516.00 KiB\u002Fs, done.\nTotal 5 (delta 2), reused 0 (delta 0), pack-reused 0\nremote: Resolving deltas: 100% (2\u002F2), completed with 2 local objects.\nTo github.com:yuweisung\u002Fpulsar-ops.git\n+ fc5fb8d...13bf247 v2.9 -> v2.9 (forced update)\n",[4926,50542,50540],{"__ignoreMap":18},[1666,50544,50546],{"start":50545},13,[324,50547,50548],{},"Once the code is pushed to the same tag, you should find the new broker (my-broker-2) running in the ArgoCD web UI.",[48,50550,50551],{},[384,50552],{"alt":50508,"src":50553},"\u002Fimgs\u002Fblogs\u002F63b56942ebc47072c24049a7_image7-230103.png",[40,50555,2125],{"id":2122},[48,50557,50558],{},"With ArgoCD and GitHub, you can control Pulsar CR changes in an elegant way. Moreover, you can easily roll back and approve configurations, or find the change history from the GitHub repository.",[48,50560,50561],{},"In the next blog Part 3, I will demonstrate how to containerize Pulsar client apps using Dockerfiles. In Part 4, I will demonstrate how to use kpack to build Pulsar client apps without local Docker builds.",[40,50563,38376],{"id":38375},[48,50565,38379,50566,40419],{},[55,50567,38384],{"href":38382,"rel":50568},[264],[321,50570,50571,50576,50580,50589,50595,50602],{},[324,50572,38390,50573,190],{},[55,50574,31914],{"href":31912,"rel":50575},[264],[324,50577,45476,50578,45480],{},[55,50579,3550],{"href":45479},[324,50581,50582,758,50584],{},[2628,50583,47315],{},[55,50585,50588],{"href":50586,"rel":50587},"https:\u002F\u002Fdocs.streamnative.io\u002Foperators\u002Foverview",[264],"Understand Pulsar Operators",[324,50590,50591,758,50593],{},[2628,50592,46310],{},[55,50594,43242],{"href":50361},[324,50596,50597,758,50599],{},[2628,50598,46310],{},[55,50600,46332],{"href":50601},"\u002Fblog\u002Frelease\u002F2022-10-19-streamnatives-pulsar-operators-certified-as-red-hat-openshift-operators\u002F",[324,50603,50604,758,50606],{},[2628,50605,46310],{},[55,50607,43234],{"href":50608},"\u002Fblog\u002Frelease\u002F2022-08-15-introducing-pulsar-resources-operator-for-kubernetes\u002F",{"title":18,"searchDepth":19,"depth":19,"links":50610},[50611,50612],{"id":2122,"depth":19,"text":2125},{"id":38375,"depth":19,"text":38376},"This tutorial guides you through how to use ArgoCD to control Pulsar Custom Resources (CRs) by monitoring the GitHub 
branch\u002Ftag.","\u002Fimgs\u002Fblogs\u002F63c7c0730690277f0344050a_63b567fe36e2a0108f5d897e_manage-pulsar-custom-resources-with-argocd-top.jpeg",{},{"title":43249,"description":50613},"blog\u002Fpulsar-operators-tutorial-part-2-manage-pulsar-custom-resources-argocd",[48070,38442],"jGU8M995g07_7N41URir4vipTG4ItcFMmWu1ODjggLU",{"id":50621,"title":42748,"authors":50622,"body":50623,"category":3550,"createdAt":290,"date":50846,"description":50847,"extension":8,"featured":294,"image":50848,"isDraft":294,"link":290,"meta":50849,"navigation":7,"order":296,"path":42563,"readingTime":3556,"relatedResources":290,"seo":50850,"stem":50851,"tags":50852,"__hash__":50853},"blogs\u002Fblog\u002Fstreamnatives-function-mesh-operator-certified-red-hat-openshift-operator.md",[810,6500],{"type":15,"value":50624,"toc":50837},[50625,50638,50641,50645,50664,50667,50670,50684,50687,50691,50694,50697,50708,50712,50714,50723,50725,50728,50733,50739,50744,50750,50755,50760,50765,50770,50775,50780,50785,50790,50795,50800,50802],[48,50626,50627,50628,4003,50632,50637],{},"We’re excited to announce that StreamNative’s Function Mesh operator is now certified as a Red Hat OpenShift Operator. The operator allows you to easily build stream processing pipelines using ",[55,50629,15627],{"href":50630,"rel":50631},"https:\u002F\u002Ffunctionmesh.io\u002Fdocs\u002Ffunctions\u002Ffunction-overview",[264],[55,50633,50636],{"href":50634,"rel":50635},"https:\u002F\u002Ffunctionmesh.io\u002Fdocs\u002Fconnectors\u002Fpulsar-io-overview",[264],"Pulsar IO connectors"," while meeting Red Hat’s standards of security, reliability, and lifecycle management. With the Function Mesh operator, organizations can run cloud-native, scalable Pulsar Functions on private cloud, hybrid cloud, multi-cloud, and edge environments.",[48,50639,50640],{},"In this blog, we’ll introduce what the Function Mesh operator is and the benefits of the OpenShift certification, including enterprise-grade security, easy installation, and automated upgrades. We’ll also show you how to install the operator on OpenShift.",[40,50642,50644],{"id":50643},"what-is-function-mesh-operator","What is Function Mesh Operator?",[48,50646,50647,50650,50651,4003,50654,50657,50658,50663],{},[55,50648,29463],{"href":29461,"rel":50649},[264]," is a serverless framework built for stream processing applications. It orchestrates ",[55,50652,15627],{"href":50630,"rel":50653},[264],[55,50655,50636],{"href":50634,"rel":50656},[264]," with Kubernetes-native custom resource definitions (CRDs). With Function Mesh, you can avoid the complexity of setting up ",[55,50659,50662],{"href":50660,"rel":50661},"https:\u002F\u002Fpulsar.apache.org\u002Fdocs\u002F2.10.x\u002Ffunctions-worker\u002F#:~:text=Pulsar%20functions%2Dworker%20is%20a,either%20based%20on%20your%20requirements.&text=The%20%2D%2D%2D%20Service%20Urls%2D%2D%2D,connect%20to%20a%20Pulsar%20cluster.",[264],"Function Workers"," and improve the stability of the Pulsar Brokers. The framework also manages all of the related Pulsar Function or Pulsar IO connector resources as one entity. You don’t need to worry about resource cleaning or mapping while managing the lifecycle of deployed Pulsar Functions or connectors.",[48,50665,50666],{},"In essence, Function Mesh is a customized Kubernetes operator. 
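For context, the workload such a Function resource ultimately runs is an ordinary Pulsar Function. Below is a minimal Java example; the class and topic wiring are purely illustrative and would normally be referenced from the Function CR's spec rather than hard-coded.

```java
import org.apache.pulsar.functions.api.Context;
import org.apache.pulsar.functions.api.Function;

/**
 * A minimal Pulsar Function: reads a string from the input topic,
 * upper-cases it, and returns it to the output topic. The Function Mesh
 * operator deploys and manages instances of classes like this.
 */
public class UppercaseFunction implements Function<String, String> {
    @Override
    public String process(String input, Context context) {
        context.getLogger().info("processing record from {}", context.getInputTopics());
        return input.toUpperCase();
    }
}
```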
Its resources are compatible with other Kubernetes-native resources and can be managed by cluster administrators using existing Kubernetes tools.",[48,50668,50669],{},"The available Function Mesh CRDs are:",[321,50671,50672,50675,50678,50681],{},[324,50673,50674],{},"Function: The Function resource automatically manages the full lifecycle of a Pulsar Function.",[324,50676,50677],{},"Source: The Source resource automatically manages the full lifecycle of a Pulsar Source connector.",[324,50679,50680],{},"Sink: The Sink resource automatically manages the full lifecycle of a Pulsar Sink connector.",[324,50682,50683],{},"Mesh: The Function Mesh resource automatically manages the full lifecycle of your event streaming application. It controls the creation of other objects to ensure that functions and connectors defined in your mesh are running and connected via the defined streams.",[48,50685,50686],{},"The operator will continuously reconcile the user-submitted manifests for managing the lifecycle of the Pulsar Functions or Pulsar IO connectors.",[40,50688,50690],{"id":50689},"benefits-of-the-red-hat-openshift-certification","Benefits of the Red Hat OpenShift certification",[48,50692,50693],{},"Red Hat OpenShift is an enterprise-ready Kubernetes container platform built for an open hybrid cloud strategy. It provides a consistent application platform to manage hybrid cloud, multi-cloud, and edge deployments.",[48,50695,50696],{},"The certification of Function Mesh operator on OpenShift provides three key benefits for Function Mesh users:",[1666,50698,50699,50702,50705],{},[324,50700,50701],{},"Enterprise-grade security and reliability: Organizations with strict security protocols can confidently use the operator to run Pulsar Functions on OpenShift knowing the operator meets Red Hat’s standards of security and reliability.",[324,50703,50704],{},"Easy installation: Available in the Red Hat Ecosystem Catalog, Function Mesh operator can be installed in the OpenShift GUI with the click of a button.",[324,50706,50707],{},"Automated operator upgrades: You can automate upgrades for the operator through OpenShift without requiring extra effort to execute the upgrade.",[40,50709,50711],{"id":50710},"install-function-mesh-operator-on-openshift","Install Function Mesh Operator on OpenShift",[32,50713,10104],{"id":10103},[48,50715,50716,50717,50722],{},"First, install the ",[55,50718,50721],{"href":50719,"rel":50720},"https:\u002F\u002Fdocs.openshift.com\u002Fcontainer-platform\u002F4.11\u002Fsecurity\u002Fcert_manager_operator\u002Fcert-manager-operator-install.html",[264],"cert-manager Operator"," for Red Hat OpenShift.",[32,50724,42912],{"id":42911},[48,50726,50727],{},"The steps below demonstrate how to install the FunctionMesh Operators.",[1666,50729,50730],{},[324,50731,50732],{},"Open the OpenShift console and login to the cluster as Administrator role.",[48,50734,50735],{},[384,50736],{"alt":50737,"src":50738},"Red Hat OpenShift interface","\u002Fimgs\u002Fblogs\u002F63b566fef97381b9893b4fa5_pcWWkYPGKJsv1eui-Y-z4HeHaTZF-Ap648XBCHquTXKCbHCHRs211MpcP4V7G-jG5QH4He1b-5ji0wLZZc-S4ti1zj6M7NaWkOCoV7MNbZyx_5gHvXujPQuPuoFTFM17C7NSiF-UGAAEPtqRdqLawfEk5qUjNjBib2dthRzUdyglZYfi4BGFBHmHqz2EoQ.png",[1666,50740,50741],{},[324,50742,50743],{},"Create a new project or select an existing one.",[48,50745,50746],{},[384,50747],{"alt":50748,"src":50749},"Red Hat OpenShift create 
project","\u002Fimgs\u002Fblogs\u002F63b566fed225ff1bc3ef682a_ubg9--3eQOQAb3e94CMA4AMeaGafpALJWC9qGRB9666L9ybHLjB5chFtKLKP-Wm1BncjXHOUMfF6A4bD-zGQF5TUgNQt8m_zWfJDoGdmpmHsPQstWnlsb4whteFJPaMASRBvJ7l_PY8AOt33oVOe4wWi5x6WzwKiDrh5vGYVAGoTTNTU9O1cUu1uJtcV1Q.png",[1666,50751,50752],{},[324,50753,50754],{},"Find the Operators on the OperatorHub of OpenShift. You can search for the keyword “FunctionMesh” or “StreamNative”.",[48,50756,50757],{},[384,50758],{"alt":50737,"src":50759},"\u002Fimgs\u002Fblogs\u002F63b566feb1f3a8f207d6d1ab_An8nR6HGknCC3EedC_-iQPqPMgo2wpgQgHqufpF6AA7ZuUmwX2EbMaP6K6I71MTkpsk15jl-CwjZEpaTYTVrioo4elYYzz9SbDjBLly3xqDJfgz_-MIBQ5swludE4x4XxJNHjRqovt8lfb46F0haAWSzW1jt5oqRNoQJ5-OI_vXfmemc16qU5AjJD-w-Eg.png",[1666,50761,50762],{},[324,50763,50764],{},"Open the Operator’s detail page, and click the Install button.",[48,50766,50767],{},[384,50768],{"alt":50737,"src":50769},"\u002Fimgs\u002Fblogs\u002F63b566fe8c46967b9332a671_h_gRhRt60YrszbUrMZYQFkBl0nTYBDGR1h-3sMYWhT8plQhQZYseccU-gFsmEo3tSz3v0UfP6T1lJITihxXMfMtcmo8Ik46O_Nqy--NNeKvK8R8c3cAtJH5o-7ra6qL1V-en-X7SksRZkRlvSOwxuDYumszRLhx7rk4zIjsiEN3CVuvzsJMYDGWUVeqeaA.png",[1666,50771,50772],{},[324,50773,50774],{},"Set up the configuration. For Installation mode, choose All namespaces on the cluster (default). If you don't want the Operator to be installed or upgraded automatically, choose Manual for Update approval.",[48,50776,50777],{},[384,50778],{"alt":50737,"src":50779},"\u002Fimgs\u002Fblogs\u002F63b566fea8d6b1741fe86184_0jAHLDr3LTNHkCQjm2l1FK1FtHYu6h0GIgMChqzhVueg7jVTzRJ3PdDb0IwsLB6vCWF19JxC2xFxqSRh2YV4SGfSoKeQTuQ_mEvTG9R1cKCnuFWv3YoVTHpxfkXZk2BG0xxv5xoD012p9XyNt3PCOuz10-8cUWzyFTtTICFO8PGDuzZBe-70x0md7aKK9g.png",[1666,50781,50782],{},[324,50783,50784],{},"Click the Install button to install the Operator. It will display the status Installing Operator.",[48,50786,50787],{},[384,50788],{"alt":50737,"src":50789},"\u002Fimgs\u002Fblogs\u002F63b566feb1f3a852dbd6d1ad_5dXuHv3IQ8MzQQr0lEvHNPWMuRxP8ZMUfre19QtsX2V8Z2H8YQ9gNHoiCdo2kI10JIzYi8OXpIbwlSZ2tlFEkSAalT0zgovD5Fr9sj9pum1m3_aXb2saXU-b1f2cYBA0q-6hIlYRToR63j35DJKoYICctRKcGa8xtz6QQ1QHfDVDPAHCipsf4cBd1h90ww.png",[1666,50791,50792],{},[324,50793,50794],{},"After the Operator is installed, you can see the new status Installed operator - ready for use.",[48,50796,50797],{},[384,50798],{"alt":50737,"src":50799},"\u002Fimgs\u002Fblogs\u002F63b566fe86aa06c192cebca9_sRZiVmPWa1QD8d3USEtgGoxzbrOrjYgr-lIfhNS-Czxn1gVeZkOmU263pvx4-A4sGeApawCLFtPMCma91n3VCSLvUglhlLDqH_Q9aBh9zbaGlL20jPnpXtUslBpKUU6slcPJeYFrgnYSEneppUpGocb8XfP7rvxYTU7cC9e1NEZJHwI3R5akx0CmsHAFBA.png",[40,50801,40413],{"id":36476},[321,50803,50804,50810,50816,50824,50829],{},[324,50805,50806,50809],{},[55,50807,3921],{"href":42567,"rel":50808},[264]," to learn more about Function Mesh.",[324,50811,50812,50813,190],{},"Learn about ",[55,50814,50815],{"href":50601},"StreamNative’s Pulsar operators available on Openshift",[324,50817,50818,50819,1154,50822,36492],{},"Take the ",[55,50820,36487],{"href":36485,"rel":50821},[264],[55,50823,36491],{"href":36490},[324,50825,50826,50827,47757],{},"Interested in a fully-managed Pulsar offering built by the original creators of Pulsar? 
",[55,50828,38404],{"href":45219},[324,50830,47760,50831,1154,50834,45209],{},[55,50832,47764],{"href":45463,"rel":50833},[264],[55,50835,47768],{"href":45206,"rel":50836},[264],{"title":18,"searchDepth":19,"depth":19,"links":50838},[50839,50840,50841,50845],{"id":50643,"depth":19,"text":50644},{"id":50689,"depth":19,"text":50690},{"id":50710,"depth":19,"text":50711,"children":50842},[50843,50844],{"id":10103,"depth":279,"text":10104},{"id":42911,"depth":279,"text":42912},{"id":36476,"depth":19,"text":40413},"2022-12-21","StreamNative’s Function Mesh operator is now certified as a Red Hat OpenShift Operator. It allows you to run cloud-native, scalable Pulsar Functions and build stream processing pipelines on private cloud, hybrid cloud, multi-cloud, and edge environments.","\u002Fimgs\u002Fblogs\u002F63c7c083a1f7364857133007_63b566fedf27145e0f95fed5_fmot.png",{},{"title":42748,"description":50847},"blog\u002Fstreamnatives-function-mesh-operator-certified-red-hat-openshift-operator",[9636,821,28572,4301,16985],"kDAHPAj2F1ngrNBXEYusL8Aymt952koIyuMXotGWpS4",{"id":50855,"title":50856,"authors":50857,"body":50858,"category":821,"createdAt":290,"date":51025,"description":51026,"extension":8,"featured":294,"image":51027,"isDraft":294,"link":290,"meta":51028,"navigation":7,"order":296,"path":51029,"readingTime":4475,"relatedResources":290,"seo":51030,"stem":51031,"tags":51032,"__hash__":51033},"blogs\u002Fblog\u002Fsimplifying-zhaopins-event-center-apache-pulsar.md","Simplifying Zhaopin’s Event Center with Apache Pulsar",[808],{"type":15,"value":50859,"toc":51016},[50860,50869,50872,50875,50878,50892,50895,50901,50903,50907,50910,50914,50917,50921,50924,50928,50931,50934,50938,50941,50955,50967,50971,50974,50991,50997,51000,51002,51005,51008,51011,51014],[48,50861,50862,50863,50868],{},"With a user base of over 100+ million users, and a database of 200+ million resumes, ",[55,50864,50867],{"href":50865,"rel":50866},"https:\u002F\u002Fwww.zhaopin.com\u002F",[264],"Zhaopin"," is one of the largest online recruiting and career platforms in China. As a bilingual job board, it provides one of the largest selections of real-time job vacancies in China.",[48,50870,50871],{},"A key part of the software supporting Zhaopin is our event center system. This system, which is responsible for all of the intra-service messaging within Zhaopin, supports mission-critical services such as resume submission and job search. The event center system needs to handle more than 1 billion messages per day under normal conditions, scaling to multiple billion messages per day during the peak recruiting season.",[48,50873,50874],{},"Our previous technology to support the pub-sub messaging and message queuing needs for our event center consisted of two separate systems. Our work queue use cases were implemented with RabbitMQ, while our pub-sub messaging and streaming for use cases used Apache Kafka.",[48,50876,50877],{},"We use work queues throughout our microservices architecture as an intermediate layer to decouple front-end services and backend systems, allowing our systems to run reliably even in the event of traffic spikes. 
At Zhaopin, our typical work-queue use cases have the following characteristics:",[321,50879,50880,50883,50886,50889],{},[324,50881,50882],{},"Each message is consumed by multiple independent services.",[324,50884,50885],{},"Each service must consume a full replica of all messages.",[324,50887,50888],{},"Each service has multiple consumers consuming messages concurrently.",[324,50890,50891],{},"Each message should be guaranteed to be delivered at-least-once.",[48,50893,50894],{},"A tracing mechanism is required to track the lifecycle of messages for mission-critical services. For instance, when a user submits a résumé, the service handling resume submission will first enqueue a message into the messaging system. All the other services (such as database updates, sending notifications, and recommendations) will then consume the messages from the messaging system and process them.",[48,50896,50897],{},[384,50898],{"alt":50899,"src":50900},"workflow","\u002Fimgs\u002Fblogs\u002F63a1d7d1944232b88b3a7782_resume-subscription-work-flow.png",[48,50902,3931],{},[40,50904,50906],{"id":50905},"our-challenges","Our Challenges",[48,50908,50909],{},"Although we had technology in place to support our event center system, we faced a number of growing challenges and limitations with that technology as we grew.",[40,50911,50913],{"id":50912},"cost-complexity","Cost & Complexity",[48,50915,50916],{},"One of our primary problems with our previous approach was the hardware cost and administrative burden of deploying and managing multiple messaging technologies side by side just for the sake of serving different use cases. Several additional problems arose from having to publish the same event data into dual data pipelines as well. First, the same data needed to be stored in two separate systems, which doubled our storage requirements. Second, it was very difficult to keep the copies consistent across the two different systems. Lastly, having two different technologies introduced additional complexity because our developers and DevOps team had to be familiar with both messaging platforms.",[40,50918,50920],{"id":50919},"missing-important-capabilities","Missing Important Capabilities",[48,50922,50923],{},"In addition to the operational overhead of running two messaging systems, there were architectural shortcomings within each of our previous technologies, making the decision to switch even more compelling. While RabbitMQ supports work queue use cases very well, it does not integrate well with popular computing frameworks. It also has several limitations around scalability, throughput, persistence, and message replay. We attempted to overcome these limitations by implementing our own distributed message queue service based on Apache ZooKeeper and RabbitMQ. However, we found this decision also came with a large, ongoing maintenance burden that we felt was unsustainable.",[40,50925,50927],{"id":50926},"inadequate-performance-and-durability","Inadequate Performance and Durability",[48,50929,50930],{},"During our peak usage periods, we were unable to meet our service-level agreement (SLA) without increasing the number of consumers on some of our Kafka topics. However, we found that the ability to parallelize consumption in Kafka is tightly coupled to the number of partitions on a given topic. 
Therefore, increasing the number of consumers on a particular topic required us to increase the number of partitions, which was not an acceptable approach for us for these reasons: the newly created partitions did not contain any data, resulting in no real increase in consumption throughput, and in order to remedy this, we also had to force partition reassignment on the topic to distribute the backlogged data onto these new partitions.",[48,50932,50933],{},"Neither of these steps was easy to automate, forcing us to monitor the topics for backlog and manually correct the issues. Although we had deployed Kafka for our pub-sub messaging use cases, Kafka does not provide the data durability guarantees that were increasingly a must-have requirement for our mission-critical business services. For these reasons, in early 2018 we decided to simplify our entire messaging technology stack by selecting a technology that would support both our work queue and streaming use cases, thereby eliminating all of the cost and complexity of hosting separate systems.",[40,50935,50937],{"id":50936},"requirements","Requirements",[48,50939,50940],{},"With this in mind, we compiled the following list of capabilities that we required from our messaging platform:",[321,50942,50943,50946,50949,50952],{},[324,50944,50945],{},"Fault Tolerance and Strong Consistency: Events stored within our messaging system are used for mission-critical online services, therefore the data must be stored reliably and in a consistent fashion. Events cannot be lost under any circumstances, and it must be possible to guarantee at-least-once delivery.",[324,50947,50948],{},"Single Topic Consumer Scalability: We must be able to easily scale the throughput of a single topic by increasing the number of consumers of the topic on-the-fly as the traffic pattern changes.",[324,50950,50951],{},"Individual and Cumulative Message Acknowledgement: The messaging system should support BOTH acknowledgement of individual messages, which is often used for work queue use cases, and cumulative acknowledgement of messages, which is required for most streaming use cases.",[324,50953,50954],{},"Message Retention and Rewind: We need to be able to configure different topics with different retention policies, either time-based or size-based. Consumers should be able to rewind their consumption back to a certain point in time. Based on these requirements, we started investigating the open-source technologies available in the market. During our initial research, we were unable to find any open-source messaging products that satisfied all of our requirements, particularly when it came to no data loss and strong consistency.",[48,50956,50957,50958,50961,50962,50966],{},"Since we were unable to find a better solution among the technologies we knew about, our initial plan was to build our own platform on top of the strongly consistent distributed log storage platform ",[55,50959,862],{"href":23555,"rel":50960},[264],", which offers an excellent log storage API and has been deployed at internet scale by Twitter, Yahoo, and Salesforce. 
However, after we got in touch with the BookKeeper community, they pointed us to ",[55,50963,821],{"href":50964,"rel":50965},"http:\u002F\u002Fpulsar.apache.org\u002F",[264]," — the next-generation pub-sub messaging system built on BookKeeper.",[40,50968,50970],{"id":50969},"why-apache-pulsar","Why Apache Pulsar?",[48,50972,50973],{},"After working closely with the Pulsar community and diving deeper into Pulsar, we decided to adopt Pulsar as the basis for Zhaopin’s event center system for the following reasons:",[321,50975,50976,50979,50982,50985,50988],{},[324,50977,50978],{},"By using BookKeeper as its storage layer, Pulsar offers strong durability and consistency, guaranteeing zero data loss.",[324,50980,50981],{},"It provides a very flexible replication scheme, allowing users to choose different replication settings per topic to suit their requirements for throughput, latency, and data availability.",[324,50983,50984],{},"Pulsar has a segment-centric design that separates message serving from message storage, allowing each to scale independently. Such a layered architecture offers better resiliency and avoids complex data rebalancing when machines crash or a cluster expands.",[324,50986,50987],{},"Pulsar provides very good I\u002FO isolation, which is suitable for both messaging and streaming workloads.",[324,50989,50990],{},"It provides a simple, flexible messaging model that unifies queuing and streaming, so it can be used for both work queue and pub-sub messaging use cases, thereby allowing us to eliminate the need for dual messaging systems and all the associated issues.",[48,50992,50993],{},[384,50994],{"alt":50995,"src":50996},"pub-sub","\u002Fimgs\u002Fblogs\u002F63a1d7d17c48d3f55eb1356a_pub-sub-messages.png",[48,50998,50999],{},"In addition to these key architectural features, Apache Pulsar also offers many enterprise-grade features that are critical for supporting our business-critical applications, such as multi-tenancy, geo-replication, built-in schema support, and tiered storage. Recently added features such as serverless compute with Pulsar Functions and Pulsar SQL are essential for building event-driven microservices here at Zhaopin.com.",[40,51001,319],{"id":316},[48,51003,51004],{},"We are very happy with our choice of Pulsar and the performance and reliability it provides, and we are committed to contributing many great features back to the Apache Pulsar community, such as Dead Letter Topics, Client Interceptors, and Delayed Messages, just to name a few.",[48,51006,51007],{},"If you are running multiple messaging platforms just for the sake of serving different use cases, you should consider replacing them with Apache Pulsar to consolidate your messaging infrastructure into a single system capable of supporting both queuing and pub-sub messaging.",[48,51009,51010],{},"We were very pleased to host past Apache Pulsar meetups at our offices in Shanghai. Hear from our engineers at @Zhaopin_com to learn more about their experiences and best practices for running Pulsar in production and using Pulsar + @ApacheFlink to power their recommendation system. 
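To make the unified queuing-and-streaming model described above concrete, here is a minimal sketch using the Apache Pulsar Java client. The service URL, topic, and subscription names are illustrative assumptions; a Shared subscription with individual acknowledgements behaves like a work queue, a Failover subscription with cumulative acknowledgements covers the streaming case, and seek() provides the rewind capability listed in the requirements.

```java
import org.apache.pulsar.client.api.*;
import java.util.concurrent.TimeUnit;

public class UnifiedMessagingSketch {
    public static void main(String[] args) throws Exception {
        // Illustrative service URL and topic; adjust to your own cluster.
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650")
                .build();

        // Work-queue style: many consumers share one subscription,
        // each message goes to exactly one of them and is acked individually.
        Consumer<byte[]> worker = client.newConsumer()
                .topic("persistent://public/default/resume-submitted")
                .subscriptionName("notification-service")
                .subscriptionType(SubscriptionType.Shared)
                .subscribe();
        Message<byte[]> task = worker.receive(5, TimeUnit.SECONDS);
        if (task != null) {
            worker.acknowledge(task); // individual acknowledgement
        }

        // Streaming style: an ordered subscription that acks cumulatively
        // and can rewind to an earlier point in time.
        Consumer<byte[]> streamReader = client.newConsumer()
                .topic("persistent://public/default/resume-submitted")
                .subscriptionName("analytics-service")
                .subscriptionType(SubscriptionType.Failover)
                .subscribe();
        streamReader.seek(System.currentTimeMillis() - TimeUnit.HOURS.toMillis(1)); // rewind one hour
        Message<byte[]> event = streamReader.receive(5, TimeUnit.SECONDS);
        if (event != null) {
            streamReader.acknowledgeCumulative(event); // cumulative acknowledgement
        }

        client.close();
    }
}
```

Both consumers attach to the same topic, which is what allowed the RabbitMQ-plus-Kafka split described earlier to be retired in favor of a single system.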
More info: buff.ly\u002F2BFYrBy",[48,51012,51013],{},"We would like to especially thank all the committers to the Apache Pulsar project, as well as the technical support we received from members of its large and growing community.",[48,51015,3931],{},{"title":18,"searchDepth":19,"depth":19,"links":51017},[51018,51019,51020,51021,51022,51023,51024],{"id":50905,"depth":19,"text":50906},{"id":50912,"depth":19,"text":50913},{"id":50919,"depth":19,"text":50920},{"id":50926,"depth":19,"text":50927},{"id":50936,"depth":19,"text":50937},{"id":50969,"depth":19,"text":50970},{"id":316,"depth":19,"text":319},"2022-12-20","An inside look at why Zhaopin chose Apache Pulsar to unify its enterprise event bus.","\u002Fimgs\u002Fblogs\u002F63a1d86ee6ecb9a40404d0db_main-zhaopin.webp",{},"\u002Fblog\u002Fsimplifying-zhaopins-event-center-apache-pulsar",{"title":50856,"description":51026},"blog\u002Fsimplifying-zhaopins-event-center-apache-pulsar",[35559,821,303],"GvDpfAoCy0pq_lEVFPNUyn2ugnPZUhW_wl2qOtIAFKU",{"id":51035,"title":51036,"authors":51037,"body":51038,"category":821,"createdAt":290,"date":51202,"description":51203,"extension":8,"featured":294,"image":51204,"isDraft":294,"link":290,"meta":51205,"navigation":7,"order":296,"path":45195,"readingTime":11180,"relatedResources":290,"seo":51206,"stem":51207,"tags":51208,"__hash__":51209},"blogs\u002Fblog\u002Fannouncing-iceberg-sink-connector-apache-pulsar.md","Announcing the Iceberg Sink Connector for Apache Pulsar",[809],{"type":15,"value":51039,"toc":51192},[51040,51043,51048,51052,51059,51065,51069,51072,51075,51078,51082,51085,51095,51099,51101,51104,51125,51130,51132,51137,51143,51148,51154,51157,51159,51162,51190],[48,51041,51042],{},"We’re excited to announce the general availability of the Iceberg Sink connector for Apache Pulsar. The connector enables seamless integration between Iceberg and Apache Pulsar, improving the diversity of the Apache Pulsar ecosystem. The Iceberg + Pulsar connector offers a convenient, efficient, and flexible approach to moving data from Pulsar to Iceberg without requiring user code.",[48,51044,48583,51045,190],{},[55,51046,48586],{"href":51047},"\u002Fblog\u002Frelease\u002F2022-08-17-announcing-the-delta-lake-sink-connector-for-apache-pulsar\u002F",[40,51049,51051],{"id":51050},"what-is-the-iceberg-sink-connector","What is the Iceberg Sink connector?",[48,51053,3600,51054,51058],{},[55,51055,51057],{"href":48603,"rel":51056},[264],"Iceberg Sink connector"," is a Pulsar IO connector that pulls data from Apache Pulsar topics and persists data to Iceberg tables.",[48,51060,51061],{},[384,51062],{"alt":51063,"src":51064},"puslar and iceberg logo","\u002Fimgs\u002Fblogs\u002F63b5661619e386ba5ddb66fd_-EhDWhd43RtpUyS_7p8rt7YLTAYr2cj07aUGmhU91aahfV5jYr36YDWe6p2zW97_ajuBdr6QgNmKn1xj155Exp0vbok6U3a7kC8HtjaksNPl0GNUMOXanJ5NTZ_2QfRtDyPLtmuMVMh6jllIJg2_huuv6-fHbU87V-onWppdBmRp3ZHaSVevgLeEEicmrw.png",[40,51066,51068],{"id":51067},"why-develop-the-iceberg-sink-connector","Why develop the Iceberg Sink connector?",[48,51070,51071],{},"In the last 5 years, lakehouse technologies such as Apache Iceberg have seen rapid adoption. Lakehouse architectures provide streaming ingest of data, tools for dealing with schema and schema evolution, improved metadata management and open standards to ease integration across a range of data processing systems.",[48,51073,51074],{},"Apache Pulsar, a distributed, open-source pub-sub messaging and streaming platform for real-time workloads, is a natural fit for lakehouse architectures. 
Apache Pulsar provides a unified platform that enables queueing data, analytics, and streaming in one underlying system. As a result, integrating Apache Pulsar with Lakehouse streamlines data lifecycle management and data analysis.",[48,51076,51077],{},"StreamNative built the Iceberg Sink Connector in order to provide Iceberg users with a way to connect the flow of messages from Pulsar and use more powerful features, while avoiding problems with connectivity that can appear when there are intrinsic differences between systems or privacy requirements. The connector solves this problem by fully integrating with the rest of Pulsar’s system (including, serverless functions, per-message processing, and event-stream processing). It presents a low-code solution with out-of-the-box capabilities such as multi-tenant connectivity, geo-replication, protocols for direct connection to end-user mobile clients or IoT clients, and more.",[40,51079,51081],{"id":51080},"what-are-the-benefits-of-using-the-iceberg-sink-connector","What are the benefits of using the Iceberg Sink connector?",[48,51083,51084],{},"The integration between Iceberg and Apache Pulsar provides three key benefits:",[321,51086,51087,51090,51093],{},[324,51088,51089],{},"Simplicity: Quickly move data from Apache Pulsar to Apache Iceberg without any user code.",[324,51091,51092],{},"Efficiency: Reduce your time spent configuring the data layer. This means you have more time to discover the maximum business value from real-time data in an effective way.",[324,51094,48643],{},[40,51096,51098],{"id":51097},"how-do-i-get-started-with-the-iceberg-sink-connector","How do I get started with the Iceberg Sink connector?",[32,51100,10104],{"id":10103},[48,51102,51103],{},"First, you must run an Apache Pulsar cluster.",[1666,51105,51106,51118],{},[324,51107,51108,51109,51114,51115,51117],{},"Prepare the Pulsar service. You can quickly run a Pulsar cluster anywhere by running $PULSAR_HOME\u002Fbin\u002Fpulsar standalone. See ",[55,51110,51113],{"href":51111,"rel":51112},"https:\u002F\u002Fpulsar.apache.org\u002Fdocs\u002Fen\u002Fstandalone\u002F",[264],"Getting Started with Pulsar"," for details. Alternatively, get started with ",[55,51116,3550],{"href":45479},", which provides an easy-to-use and fully managed Pulsar service in the public cloud.",[324,51119,51120,51121,51124],{},"Set up the Iceberg Sink connector. 
Download the connector from the ",[55,51122,39589],{"href":34792,"rel":51123},[264]," page, and then move it to $PULSAR_HOME\u002Fconnectors.",[48,51126,39596,51127,48708],{},[55,51128,20384],{"href":39599,"rel":51129},[264],[32,51131,39605],{"id":39604},[1666,51133,51134],{},[324,51135,51136],{},"Create a configuration file named iceberg-sink-config.json to send the public\u002Fdefault\u002Ftest-iceberg-pulsar topic messages from Apache Pulsar to the Iceberg table with the location of s3a:\u002F\u002Ftest-dev-us-west-2\u002Flakehouse\u002Ficeberg_sink:",[8325,51138,51141],{"className":51139,"code":51140,"language":8330},[8328],"{\n    \"tenant\":\"public\",\n    \"namespace\":\"default\",\n    \"name\":\"iceberg_sink\",\n    \"parallelism\":1,\n    \"inputs\": [\n      \"test-iceberg-pulsar\"\n    ],\n    \"archive\": \"connectors\u002Fpulsar-io-lakehouse-{{connector:version}}-cloud.nar\",\n    \"processingGuarantees\":\"EFFECTIVELY_ONCE\",\n    \"configs\":{\n        \"type\":\"iceberg\",\n        \"maxCommitInterval\":120,\n        \"maxRecordsPerCommit\":10000000,\n        \"catalogName\":\"test_v1\",\n        \"tableNamespace\":\"iceberg_sink_test\",\n        \"tableName\":\"ice_sink_person\",\n      \"hadoop.fs.s3a.aws.credentials.provider\": \"com.amazonaws.auth.DefaultAWSCredentialsProviderChain\",\n        \"catalogProperties\":{\n            \"warehouse\":\"s3a:\u002F\u002Ftest-dev-us-west-2\u002Flakehouse\u002Ficeberg_sink\",\n            \"catalog-impl\":\"hadoopCatalog\"\n        }\n    }\n}\n",[4926,51142,51140],{"__ignoreMap":18},[1666,51144,51145],{},[324,51146,51147],{},"Run the sink connector:",[8325,51149,51152],{"className":51150,"code":51151,"language":8330},[8328],"$PULSAR_HOME\u002Fbin\u002Fpulsar-admin sinks localrun --sink-config-file \u002Fpath\u002Fto\u002Ficeberg-sink-config.json\n",[4926,51153,51151],{"__ignoreMap":18},[48,51155,51156],{},"When you send a message to the public\u002Fdefault\u002Ftest-iceberg-pulsar topic of Apache Pulsar, this message is persisted to the Iceberg table with the location of s3a:\u002F\u002Ftest-dev-us-west-2\u002Flakehouse\u002Ficeberg_sink.",[40,51158,48857],{"id":48856},[48,51160,51161],{},"The Iceberg Sink connector is a major step in the journey of integrating Lakehouse systems into the Pulsar ecosystem. To get involved with the Iceberg Sink connector for Apache Pulsar, check out the following featured resources:",[321,51163,51164,51173,51179],{},[324,51165,51166,51167,39659,51170,48874],{},"Try out the Iceberg Sink connector. To get started, ",[55,51168,36195],{"href":48868,"rel":51169},[264],[55,51171,39663],{"href":48872,"rel":51172},[264],[324,51174,51175,51176,39673],{},"Make a contribution. The Iceberg Sink connector is a community-driven service, which hosts its source code on the StreamNative GitHub repository. 
If you have any feature requests or bug reports, do not hesitate to ",[55,51177,39672],{"href":48880,"rel":51178},[264],[324,51180,39676,51181,48888,51184,39687,51187,39692],{},[55,51182,39680],{"href":48880,"rel":51183},[264],[55,51185,39686],{"href":39684,"rel":51186},[264],[55,51188,39691],{"href":33664,"rel":51189},[264],[48,51191,3931],{},{"title":18,"searchDepth":19,"depth":19,"links":51193},[51194,51195,51196,51197,51201],{"id":51050,"depth":19,"text":51051},{"id":51067,"depth":19,"text":51068},{"id":51080,"depth":19,"text":51081},{"id":51097,"depth":19,"text":51098,"children":51198},[51199,51200],{"id":10103,"depth":279,"text":10104},{"id":39604,"depth":279,"text":39605},{"id":48856,"depth":19,"text":48857},"2022-12-14","Read about the Iceberg Sink connector for Apache Pulsar that allows you to move data from Pulsar to Iceberg without requiring user code.","\u002Fimgs\u002Fblogs\u002F63c7c0915a20b4735dc52493_63b56615009cac54c8604470_iceberg-top.png",{},{"title":51036,"description":51203},"blog\u002Fannouncing-iceberg-sink-connector-apache-pulsar",[28572,302],"GywEZm71lLfYZnnzE7Y-tFFumTJZH4iik65zzIK8md8",{"id":51211,"title":51212,"authors":51213,"body":51215,"category":821,"createdAt":290,"date":51313,"description":51314,"extension":8,"featured":294,"image":51315,"isDraft":294,"link":290,"meta":51316,"navigation":7,"order":296,"path":51317,"readingTime":31039,"relatedResources":290,"seo":51318,"stem":51319,"tags":51320,"__hash__":51321},"blogs\u002Fblog\u002Fwechat-using-apache-pulsar-support-high-throughput-real-time-recommendation-service.md","WeChat: Using Apache Pulsar to Support the High Throughput Real-time Recommendation Service",[51214],"Shen Liu",{"type":15,"value":51216,"toc":51309},[51217,51226,51229,51235,51238,51260,51264,51267,51273,51277,51280,51291,51294,51300,51303],[48,51218,51219,51220,51225],{},"WeChat is a WhatsApp-like social media application developed by the Chinese tech giant Tencent. According to a recent ",[55,51221,51224],{"href":51222,"rel":51223},"https:\u002F\u002Fwww.businessofapps.com\u002Fdata\u002Fwechat-statistics\u002F",[264],"report",", WeChat provided services to 1.26 billion users in Q1 2022, with 3.5 million mini programs on its platform.",[48,51227,51228],{},"As shown in Figure 1, WeChat has multiple business scenarios, including recommendations, risk control, monitoring, and AI platform. In our service architecture, we ingest data through Software Development Kits (SDKs) or data collection tools, and then distribute them to messaging platforms such as Kafka and Pulsar. Ultimately, they are processed and stored by different downstream systems. For computing, we use Hadoop, Spark, ClickHouse, Flink, and Tensorflow; for storage, we use HDFS, HBase, Redis and self-developed key-value databases.",[48,51230,51231],{},[384,51232],{"alt":51233,"src":51234},"Figure 3. The redesigned architecture of Pulsar on Kubernetes","\u002Fimgs\u002Fblogs\u002F63b564a7d5314f4da737da7e_image3-221205.png",[48,51236,51237],{},"Apart from our efforts to remove the proxy layer for better bandwidth utilization, we also optimized our Pulsar deployment on Kubernetes in the following ways.",[321,51239,51240,51248,51257],{},[324,51241,51242,51243,22220],{},"Improved bookies’ performance by using a multi-disk and multi-directory solution with local SSDs. We contributed this enhancement to the Pulsar community. 
See ",[55,51244,51247],{"href":51245,"rel":51246},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar-helm-chart\u002Fpull\u002F113",[264],"PR-113",[324,51249,51250,51251,51256],{},"Integrated the ",[55,51252,51255],{"href":51253,"rel":51254},"https:\u002F\u002Fwww.tencentcloud.com\u002Fproducts\u002Fcls?lang=en&pg=",[264],"Tencent Cloud Log Service"," (CLS) as a unified logging mechanism to simplify log collection and query operations, as well as the use and maintenance of the whole system.",[324,51258,51259],{},"Combined Grafana, Kvass, and Thanos with a distributed Prometheus deployment for metric collection to improve performance and support horizontal scaling. Note that for the default Pulsar deployment, Prometheus is used as a standalone service but it is not applicable to our case given the high traffic volume.",[40,51261,51263],{"id":51262},"practice-2-using-non-persistent-topics","Practice 2: Using non-persistent topics",[48,51265,51266],{},"Apache Pulsar supports two types of topics: persistent topics and non-persistent topics. The former persists messages to disks whereas the latter only stores messages temporarily. Figure 4 compares how these two types of topics work in Pulsar. For persistent topics, producers publish messages to the dispatcher on the broker. These messages are sent to the managed ledger and then replicated across bookies via the bookie client. By contrast, producers and consumers working on non-persistent topics interact with the dispatcher on the broker directly, without any persistence in BookKeeper. Such straight communication has lower requirements for the bandwidth within the cluster.",[48,51268,51269],{},[384,51270],{"alt":51271,"src":51272},"Figure 7. Bandwidth occupancy of brokers with the updated logic","\u002Fimgs\u002Fblogs\u002F63b564fcd4fc090aaffb8ae2_image7-221205.png",[40,51274,51276],{"id":51275},"practice-4-increasing-the-cache-hit-ratio","Practice 4: Increasing the cache hit ratio",[48,51278,51279],{},"In Pulsar, brokers cache data to memory to improve reading performance, as consumers can retrieve data from these caches directly without going further into BookKeeper. Pulsar also allows you to set a data eviction strategy for these caches with the following configurations, among others:",[321,51281,51282,51285,51288],{},[324,51283,51284],{},"managedLedgerCacheSizeMB: The amount of memory used to cache data.",[324,51286,51287],{},"managedLedgerCursorBackloggedThreshold: The number of entries from the position where a cursor should be considered as inactive.",[324,51289,51290],{},"managedLedgerCacheEvictionTimeThresholdMillis: The time threshold of evicting all cached entries.",[48,51292,51293],{},"The following code snippet shows the original logic of cache eviction:",[8325,51295,51298],{"className":51296,"code":51297,"language":8330},[8328],"void doCacheEviction(long maxTimestamp) {\n    if (entryCache.getsize() \nAccording to this implementation, all cached entries before the inactive cursor would be evicted (managedLedgerCursorBackloggedThreshold controls whether the cursor should be considered inactive). This data eviction strategy was not applicable to our use case: we had a large number of consumers with different consumption rates and they needed to restart frequently. 
After caches were evicted, those consuming messages at lower rates had to go deeper to bookies, thus increasing the bandwidth pressure within the cluster.\n\nAn engineer from Tencent also found this issue and proposed the following solution:\n\n",[4926,51299,51297],{"__ignoreMap":18},[48,51301,51302],{},"void doCacheEviction(long maxTimestamp) {\n   if (entryCache.getSize()\nThis implementation tweaked the logic by caching any backlogged message according to markDeletePosition. However, the cache space would be filled up with cached messages, especially when consumers restarted. Therefore, we made the following changes:",[8325,51304,51307],{"className":51305,"code":51306,"language":8330},[8328],"void doCacheEviction(long maxTimestamp) {\n    if (entryCache.getSize() \nOur strategy is to exclusively cache messages within a specified period to the broker. This method has improved cache hits remarkably in our scenario, as evidenced by Figure 8. The cache hit percentage of most brokers increased from around 80% to over 95%.\n![](\u002Fimgs\u002Fblogs\u002F63b5655623197b4685eb8a1d_image8-221205.png)Figure 8. Broker entry cache hit percentage before and after optimization\n## Practice 5: Creating a COS offloader using tiered storage\n\nPulsar supports tiered storage, which allows you to migrate cold data from BookKeeper to cheaper storage systems. More importantly, such a movement of data does not impact the client when retrieving the messages. Currently, the supported storage systems include Amazon S3, Google Cloud Storage (GCS), Azure BlobStore, and Aliyun Object Storage Service (OSS).\n![](\u002Fimgs\u002Fblogs\u002F63b56558a5395d6756ad7886_image9-221205.png)Figure 9. Tiered storage in Apache Pulsar\nOur main reasons for adopting tiered storage include the following:\n\n- Cost considerations. As mentioned above, we are using SSDs for journal and ledger storage on bookies. Hence, it is a natural choice for us to use a storage solution with less hardware overhead.\n- Disaster recovery. Some of our business scenarios require large amounts of data to be stored for a long period of time. If our BookKeeper cluster failed, our data would not be lost given the redundancy stored on the external system.\n- Data replay needs. We need to run offline tests for some of the business modules, such as the recommendations service. In these cases, the ideal way is to replay topics with the original data.\n\nAs the Pulsar community does not provide a [Tencent Cloud Object Storage](https:\u002F\u002Fwww.tencentcloud.com\u002Fproducts\u002Fcos) (COS) offloader, we created a purpose-built one to move ledgers from bookies to remote storage devices. This migration has decreased our storage costs significantly, so we can store a larger amount of data with longer duration for different scenarios.\n\n## Future plans\n\nWe are pleased to make contributions to Apache Pulsar, and we would like to thank the Pulsar community for their knowledge and support. This open-source project has helped us build a fully-featured message queuing system that meets our needs for scalability, resource isolation, and high throughput. 
Going forward, we’d like to continue our journey with Pulsar mainly in the following directions:\n\n- Get more involved in feature improvements, such as new load balancer implementation (see [PIP 192](https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fissues\u002F16691)), and shadow topics to support read-only topic ownership (see [PIP 180](https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fissues\u002F16153)).\n- Integrate Pulsar with data lake solutions.\n",[4926,51308,51306],{"__ignoreMap":18},{"title":18,"searchDepth":19,"depth":19,"links":51310},[51311,51312],{"id":51262,"depth":19,"text":51263},{"id":51275,"depth":19,"text":51276},"2022-12-05","Take a closer look at how Pulsar helps WeChat build a fully-featured message queuing system that meets their needs for scalability, resource isolation, and high throughput.","\u002Fimgs\u002Fblogs\u002F63bc8a3aa859d8cf8bde5e3b_pulsar-wechat-high-throughput-recommendation-top.png",{},"\u002Fblog\u002Fwechat-using-apache-pulsar-support-high-throughput-real-time-recommendation-service",{"title":51212,"description":51314},"blog\u002Fwechat-using-apache-pulsar-support-high-throughput-real-time-recommendation-service",[35559,821,26747,303],"lTPAOrYzoOW2ydS2dMm_d9IU1NkOyvVf71stzSEDWWc",{"id":51323,"title":51324,"authors":51325,"body":51326,"category":7338,"createdAt":290,"date":51649,"description":51650,"extension":8,"featured":294,"image":51651,"isDraft":294,"link":290,"meta":51652,"navigation":7,"order":296,"path":51653,"readingTime":4475,"relatedResources":290,"seo":51654,"stem":51655,"tags":51656,"__hash__":51657},"blogs\u002Fblog\u002Fpulsar-summit-asia-2022-recap.md","Pulsar Summit Asia 2022 Recap",[41185],{"type":15,"value":51327,"toc":51639},[51328,51331,51334,51337,51357,51361,51364,51366,51377,51380,51397,51401,51418,51420,51437,51441,51444,51447,51453,51462,51568,51577,51607,51610,51614,51628,51631,51637],[48,51329,51330],{},"Since its inception in 2020, Pulsar Summit Asia has received increasing attention from both Asia and beyond. For Pulsar Summit Asia 2022, more than 1400 people from companies like Amazon, Tencent, IBM, Huawei, Dell, ByteDance, and Splunk registered for this online event to discuss the latest messaging and streaming technologies powering a wide variety of industries like education, food, gaming, e-commerce, and social media.",[48,51332,51333],{},"Before we talk about some of the Summit highlights, we would like to thank the entire Apache Pulsar community and all the friends from other open-source communities for making this event a great success! Additionally, this event would not have drawn the attention of such a broad audience without our speakers, Program Committee members, as well as community and media partners. 
Thank you all for your help and energy!",[48,51335,51336],{},"And now, let’s look at some of the highlights and a round-up of this online virtual event:",[321,51338,51339,51342,51345,51348,51351,51354],{},[324,51340,51341],{},"1400+ registrations and 40,000+ views globally",[324,51343,51344],{},"41 speakers from companies like ByteDance, Huawei, Tencent, Nippon Telegraph and Telephone Corporation (NTT), Yum China, Netease, vivo, Nutanix, and StreamNative",[324,51346,51347],{},"3 keynotes on Apache Pulsar and event-driven applications",[324,51349,51350],{},"36 sessions on use cases, technical deep dives, and ecosystem talks",[324,51352,51353],{},"14 Program Committee members",[324,51355,51356],{},"18 community and media partners",[40,51358,51360],{"id":51359},"keynotes-and-sessions-at-a-glance","Keynotes and sessions at a glance",[48,51362,51363],{},"This two-day virtual event brought together engineers, architects, and data scientists from the messaging and streaming communities. They talked about Pulsar adoption for different use cases, event-driven platforms, technical details, and even Pulsar integration with other ecosystems. The following is a quick recap of some of the keynotes and sessions.",[32,51365,40525],{"id":40524},[321,51367,51368,51371,51374],{},[324,51369,51370],{},"A Cloud-Native, Unified Messaging and Streaming System for Modern Data Infrastructure (Mandarin): Jia Zhai, an Apache Pulsar PMC member, gave a high-level overview of Apache Pulsar and explained how it meets the requirements for messaging and streaming with its cloud-native features.",[324,51372,51373],{},"What You Should Know about Apache Pulsar in 2022 (Mandarin): Penghui Li, an Apache Pulsar PMC member, talked about some of the existing problems in Pulsar and how the Pulsar community would work to solve them going forward.",[324,51375,51376],{},"Event-Driven Applications Done Right (English): Matteo Merli, Apache Pulsar PMC Chair, provided his insights on the fundamentals of modern event-driven applications.",[32,51378,51379],{"id":19169},"Use cases",[321,51381,51382,51385,51388,51391,51394],{},[324,51383,51384],{},"Pulsar + Envoy: Building an OTO Marketing Platform for Different Business Scenarios on Microservices (Mandarin): Jason Jiang from Tencent shared their experience of using Pulsar and Envoy to create an OTO marketing platform built on microservices for different business scenarios.",[324,51386,51387],{},"Pulsar in Smart Education: How NetEase Youdao Put Pulsar into Practice for Complex Business Scenarios (Mandarin): Jiaqi Shen from NetEase introduced NetEase Youdao’s practices of using Apache Pulsar to support complex scenarios in smart education.",[324,51389,51390],{},"Tens of Trillions of Messages: How Apache Pulsar Supports Big Data Business at Tencent (Mandarin): Dawei Zhang from Tencent discussed how they used Apache Pulsar for big data business to support scenarios requiring high availability and strong consistency.",[324,51392,51393],{},"Awesome Pulsar in Yum China (English): Chauncey Yan from Yum China explained why Yum China selected Pulsar for production and shared their experience of performance tuning.",[324,51395,51396],{},"Streaming Wars and How Apache Pulsar is Acing the Battle (English): Shivji Kumar Jha and Sachidananda Maharana from Nutanix talked about how they adopted Pulsar for different use cases and migrated applications from other messaging solutions to Pulsar.",[32,51398,51400],{"id":51399},"technical-deep-dives","Technical deep 
dives",[321,51402,51403,51406,51409,51412,51415],{},[324,51404,51405],{},"A New Way of Managing Pulsar with Infrastructure as Code (Mandarin): Max Xu and Fushu Wang from StreamNative discussed how to leverage the Terraform Provider for Pulsar and the Pulsar Resources Operator to help better manage Pulsar.",[324,51407,51408],{},"A Deep Dive into Pulsar's Geo-replication for High Availability (Mandarin): Jialing Wang from China Mobile talked about the asynchronous and synchronous data replication mechanisms, and explained how they deployed Pulsar across multiple regions and improved its performance at China Mobile.",[324,51410,51411],{},"Apache Pulsar in Volcano Engine E-MapReduce: Integration and Scenarios (Mandarin): Xin Liang from ByteDance introduced Volcano Engine E-MapReduce, a stateless, open-source big data platform, and how Pulsar fits into the platform’s ecosystem supporting different use cases.",[324,51413,51414],{},"Taking Jakarta JMS to New Generation Messaging Systems - Apache Pulsar (English): Enrico Olivelli and Mary Grygleski from DataStax explained how Pulsar concepts map to the Jakarta Messaging Specifications and demonstrated how to connect a Jakarta EE application to Pulsar.",[324,51416,51417],{},"Handling 100K Consumers with One Topic: Practices and Technical Details (English): Hongjie Zhai from NTT Software Innovation Center shared their practices and technical details of handling 100K consumers with a single Pulsar topic.",[32,51419,40696],{"id":40695},[321,51421,51422,51425,51428,51431,51434],{},[324,51423,51424],{},"Pulsar + Flink + Camel: How Vertice Built its CMDB-based Real-time Data Platform (Mandarin): Wei Wang from Vertice offered his insights on how to build a CMDB-based Real-time Data platform with Pulsar, Flink, and Camel.",[324,51426,51427],{},"Simplify Pulsar Functions Development with SQL (Mandarin): Rui Fu from StreamNative discussed how SQL syntax, Pulsar Functions, and Function Mesh can work together to deliver a unique user development experience for real-time data jobs in the cloud environment.",[324,51429,51430],{},"Apache Pulsar + KubeEdge: Managing Edge Devices with Low Latency and Persistent Storage (Mandarin): Ryan Zhao from Huawei Cloud introduced a management solution for edge devices implemented through the Device Management Interface of KubeEdge and Apache Pulsar.",[324,51432,51433],{},"Migrating from RabbitMQ to Apache Pulsar: Using AMQP-on-Pulsar (AoP) in E-commerce Industry (Mandarin): Yifei Ming from Access Corporate Group talked about their experience of using the AMQP-on-Pulsar project to migrate RabbitMQ workloads to AoP.",[324,51435,51436],{},"Make Querying from Pulsar Easier: Introduce Flink Pulsar SQL Connector (English): Yufei Zhang from StreamNative walked through the basic concepts and examples of using Pulsar SQL Connector and discussed PulsarCatalog’s two different modes of using Pulsar as a metadata store.",[40,51438,51440],{"id":51439},"whats-new-in-the-pulsar-community","What’s new in the Pulsar community",[48,51442,51443],{},"In addition to the keynotes and sessions, we also shared some exciting news at the Summit.",[48,51445,51446],{},"Apache Pulsar has been adopted by organizations and users across the globe since it graduated as a Top Level Project in September 2018. Recently, the project witnessed its 580th contributor, almost hitting the 600 milestone!",[48,51448,51449],{},[384,51450],{"alt":51451,"src":51452},"Figure 1. 
Pulsar GitHub repo contributors","\u002Fimgs\u002Fblogs\u002F63b5642bd1a615e06fe2e590_image1-221201.png",[48,51454,51455,51456,51461],{},"So far this year, we have welcomed 15 new Apache Pulsar ",[55,51457,51460],{"href":51458,"rel":51459},"https:\u002F\u002Fwww.apache.org\u002Ffoundation\u002Fhow-it-works.html#committers",[264],"Committers"," to the Pulsar family. They have made continuous contributions to the Pulsar community and as Pulsar Committers, they now have write access to the Pulsar repository. They are:",[321,51463,51464,51471,51478,51485,51491,51498,51505,51512,51519,51526,51533,51540,51547,51554,51561],{},[324,51465,51466],{},[55,51467,51470],{"href":51468,"rel":51469},"https:\u002F\u002Fgithub.com\u002FRobertIndie",[264],"@RobertIndie",[324,51472,51473],{},[55,51474,51477],{"href":51475,"rel":51476},"https:\u002F\u002Fgithub.com\u002Fyuruguo",[264],"@yuruguo",[324,51479,51480],{},[55,51481,51484],{"href":51482,"rel":51483},"https:\u002F\u002Fgithub.com\u002Fgaozhangmin",[264],"@gaozhangmin",[324,51486,51487],{},[55,51488,51490],{"href":42153,"rel":51489},[264],"@nodece",[324,51492,51493],{},[55,51494,51497],{"href":51495,"rel":51496},"https:\u002F\u002Fgithub.com\u002FShoothzj",[264],"@Shoothzj",[324,51499,51500],{},[55,51501,51504],{"href":51502,"rel":51503},"https:\u002F\u002Fgithub.com\u002Fhqebupt",[264],"@hqebupt",[324,51506,51507],{},[55,51508,51511],{"href":51509,"rel":51510},"https:\u002F\u002Fgithub.com\u002FStevenLuMT",[264],"@StevenLuMT",[324,51513,51514],{},[55,51515,51518],{"href":51516,"rel":51517},"https:\u002F\u002Fgithub.com\u002Flordcheng10",[264],"@lordcheng10",[324,51520,51521],{},[55,51522,51525],{"href":51523,"rel":51524},"https:\u002F\u002Fgithub.com\u002Ftisonkun",[264],"@tisonkun",[324,51527,51528],{},[55,51529,51532],{"href":51530,"rel":51531},"https:\u002F\u002Fgithub.com\u002Faloyszhang",[264],"@aloyszhang",[324,51534,51535],{},[55,51536,51539],{"href":51537,"rel":51538},"https:\u002F\u002Fgithub.com\u002Fmattisonchao",[264],"@mattisonchao",[324,51541,51542],{},[55,51543,51546],{"href":51544,"rel":51545},"https:\u002F\u002Fgithub.com\u002Furfreespace",[264],"@urfreespace",[324,51548,51549],{},[55,51550,51553],{"href":51551,"rel":51552},"https:\u002F\u002Fgithub.com\u002Fdlg99",[264],"@dlg99",[324,51555,51556],{},[55,51557,51560],{"href":51558,"rel":51559},"https:\u002F\u002Fgithub.com\u002Fnicoloboschi",[264],"@nicoloboschi",[324,51562,51563],{},[55,51564,51567],{"href":51565,"rel":51566},"https:\u002F\u002Fgithub.com\u002Fliudezhi2098",[264],"@liudezhi2098",[48,51569,51570,51571,51576],{},"We also have 4 new members joining the Apache Pulsar ",[55,51572,51575],{"href":51573,"rel":51574},"https:\u002F\u002Fwww.apache.org\u002Ffoundation\u002Fhow-it-works.html#pmc-members",[264],"Project Management Committee (PMC)"," for their merit for the evolution of the project. They are:",[321,51578,51579,51586,51593,51600],{},[324,51580,51581],{},[55,51582,51585],{"href":51583,"rel":51584},"https:\u002F\u002Fgithub.com\u002Flhotari",[264],"@lhotari",[324,51587,51588],{},[55,51589,51592],{"href":51590,"rel":51591},"https:\u002F\u002Fgithub.com\u002Fmichaeljmarshall",[264],"@michaeljmarshall",[324,51594,51595],{},[55,51596,51599],{"href":51597,"rel":51598},"https:\u002F\u002Fgithub.com\u002FTechnoboy-",[264],"@Technoboy-",[324,51601,51602],{},[55,51603,51606],{"href":51604,"rel":51605},"https:\u002F\u002Fgithub.com\u002FJason918",[264],"@Jason918",[48,51608,51609],{},"Congratulations to them all 🎉 ! 
And we are looking forward to more contributions from more friends in the broader open-source community.",[40,51611,51613],{"id":51612},"more-on-pulsar-summit-asia-2022","More on Pulsar Summit Asia 2022",[48,51615,51616,51617,51622,51623,190],{},"All the sessions in Pulsar Summit Asia 2022 were pre-recorded and they will be uploaded to this ",[55,51618,51621],{"href":51619,"rel":51620},"https:\u002F\u002Fwww.youtube.com\u002F@streamnative7594",[264],"YouTube account"," soon. You can also find the complete list of Summit sessions on this ",[55,51624,51627],{"href":51625,"rel":51626},"https:\u002F\u002Fpulsar-summit.org\u002Fevent\u002Fasia-2022\u002Fschedule",[264],"page",[48,51629,51630],{},"At the same time, we will be working with some of the speakers to convert their speeches into blogs and case studies, which will be published soon.",[48,51632,51633,51634,51636],{},"Feel free to contact us at ",[55,51635,39814],{"href":39813}," if you have any questions and see you in the next Summit!",[48,51638,3931],{},{"title":18,"searchDepth":19,"depth":19,"links":51640},[51641,51647,51648],{"id":51359,"depth":19,"text":51360,"children":51642},[51643,51644,51645,51646],{"id":40524,"depth":279,"text":40525},{"id":19169,"depth":279,"text":51379},{"id":51399,"depth":279,"text":51400},{"id":40695,"depth":279,"text":40696},{"id":51439,"depth":19,"text":51440},{"id":51612,"depth":19,"text":51613},"2022-12-01","Pulsar Summit Asia 2022 was a big success! Read about the highlights of this community event.","\u002Fimgs\u002Fblogs\u002F63c7c0c40690270f1b442787_63b5642b23197ba758ea1c7b_pulsar-summit-asia-2022-recap-top.jpeg",{},"\u002Fblog\u002Fpulsar-summit-asia-2022-recap",{"title":51324,"description":51650},"blog\u002Fpulsar-summit-asia-2022-recap",[5376,821],"-6elH9X4lZ2_moRfkvQHV4vI75VoU2j-jcg6MjzG9vQ",{"id":51659,"title":51660,"authors":51661,"body":51662,"category":821,"createdAt":290,"date":51864,"description":51865,"extension":8,"featured":294,"image":51866,"isDraft":294,"link":290,"meta":51867,"navigation":7,"order":296,"path":46739,"readingTime":31039,"relatedResources":290,"seo":51868,"stem":51869,"tags":51870,"__hash__":51872},"blogs\u002Fblog\u002Fspring-into-pulsar-part-2-spring-based-microservices-multiple-protocols-apache-pulsar.md","Spring into Pulsar Part 2: Spring-based Microservices for Multiple Protocols with Apache Pulsar AMQP",[46357],{"type":15,"value":51663,"toc":51858},[51664,51666,51679,51685,51688,51691,51739,51741,51813,51815],[40,51665,46],{"id":42},[48,51667,51668,51669,51673,51674,22220],{},"This is the second part of our on-going blog series Spring into Pulsar. In the ",[55,51670,51672],{"href":51671},"\u002Fblog\u002Fengineering\u002F2022-05-26-spring-into-pulsar\u002F","first blog",", I showed you how easy it is to build a Spring application that communicates with Pulsar using it’s native protocol. In this second blog, I will build the same application with other messaging protocols (MQTT, AMQP\u002FRabbitMQ, and Kafka) that Pulsar supports to show off its flexibility. You will see how easy it is to improve the performance of your existing applications with little code change. You can see this ",[55,51675,51678],{"href":51676,"rel":51677},"https:\u002F\u002Fyoutu.be\u002FK-I2DJYIkTg",[264],"video",[48,51680,51681],{},[384,51682],{"alt":51683,"src":51684},"schema ","\u002Fimgs\u002Fblogs\u002F63be840cc42a965cf6adfc34_image4.png",[48,51686,51687],{},"In this blog, I used other popular messaging protocols to build the same Spring application with little code change. 
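As an illustration of how little client code has to change, the sketch below points a stock Kafka Java producer at a Pulsar cluster that has the Kafka-on-Pulsar (KoP) protocol handler enabled. The bootstrap address and topic name are assumptions for this example; the point is that the application itself remains unchanged Kafka client code, with only the connection address updated.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class KafkaClientOnPulsarSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Assumed KoP listener exposed by the Pulsar broker; this address is
        // the only thing that changes when moving the app from Kafka to Pulsar.
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Publishes over the Kafka protocol; Pulsar stores and serves the message.
            producer.send(new ProducerRecord<>("sensor-events", "device-1", "{\"temp\":21.5}"));
            producer.flush();
        }
    }
}
```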
As we can see, Spring apps for MoPs, AoPs and KoPs allow you to easily leverage legacy protocols for uplifting many applications to hybrid clouds.",[40,51689,46603],{"id":51690},"source-code",[321,51692,51693,51698,51703,51708,51713,51718,51723,51728,51733],{},[324,51694,51695],{},[55,51696,46606],{"href":46606,"rel":51697},[264],[324,51699,51700],{},[55,51701,46517],{"href":46517,"rel":51702},[264],[324,51704,51705],{},[55,51706,46541],{"href":46541,"rel":51707},[264],[324,51709,51710],{},[55,51711,46548],{"href":46548,"rel":51712},[264],[324,51714,51715],{},[55,51716,46555],{"href":46555,"rel":51717},[264],[324,51719,51720],{},[55,51721,46623],{"href":46623,"rel":51722},[264],[324,51724,51725],{},[55,51726,46632],{"href":46632,"rel":51727},[264],[324,51729,51730],{},[55,51731,46575],{"href":46575,"rel":51732},[264],[324,51734,51735],{},[55,51736,51737],{"href":51737,"rel":51738},"https:\u002F\u002Fgithub.com\u002Fdavid-streamlio\u002Fmulti-protocol-pulsar",[264],[40,51740,4135],{"id":4132},[321,51742,51743,51749,51754,51759,51765,51771,51777,51783,51788,51793,51798,51803,51808],{},[324,51744,51745],{},[55,51746,51747],{"href":51747,"rel":51748},"https:\u002F\u002Fwww.flipstack.dev\u002F",[264],[324,51750,51751],{},[55,51752,46650],{"href":46650,"rel":51753},[264],[324,51755,51756],{},[55,51757,46659],{"href":46659,"rel":51758},[264],[324,51760,51761],{},[55,51762,51763],{"href":51763,"rel":51764},"https:\u002F\u002Fpulsar.apache.org\u002Fdocs\u002Fen\u002Fclient-libraries-java\u002F",[264],[324,51766,51767],{},[55,51768,51769],{"href":51769,"rel":51770},"https:\u002F\u002Fdzone.com\u002Farticles\u002Fsimple-apache-nifi-operations-dashboard-part-2-spr",[264],[324,51772,51773],{},[55,51774,51775],{"href":51775,"rel":51776},"https:\u002F\u002Fspring.io\u002Fblog\u002F2015\u002F01\u002F30\u002Fwhy-12-factor-application-patterns-microservices-and-cloudfoundry-matter",[264],[324,51778,51779],{},[55,51780,51781],{"href":51781,"rel":51782},"https:\u002F\u002Fwww.slideshare.net\u002Fbunkertor\u002Fbrownbag001-spring-ioc-from-2012",[264],[324,51784,51785],{},[55,51786,51763],{"href":51763,"rel":51787},[264],[324,51789,51790],{},[55,51791,51792],{"href":51792},"\u002Fblog\u002Fengineering\u002F2022-04-07-pulsar-vs-kafka-benchmark\u002F",[324,51794,51795],{},[55,51796,51797],{"href":51797},"\u002Fblog\u002Fengineering\u002F2022-04-14-what-the-flip-is-the-flip-stack\u002F",[324,51799,51800],{},[55,51801,51802],{"href":51802},"\u002Fblog\u002Frelease\u002F2022-03-07-failure-is-not-an-option-its-a-given\u002F",[324,51804,51805],{},[55,51806,51807],{"href":51807},"\u002Fblog\u002Fengineering\u002F2022-03-10-apache-pulsar-client-application-best-practices\u002F",[324,51809,51810],{},[55,51811,51812],{"href":51812},"\u002Fblog\u002Fengineering\u002F2021-12-14-developing-event-driven-microservices-with-apache-pulsar\u002F",[40,51814,38376],{"id":38375},[321,51816,51817,51826,51832,51842,51851],{},[324,51818,51819,51820,1154,51823,51825],{},"Learn the Pulsar Fundamentals: While this blog did not cover the Pulsar fundamentals, there are great resources available to help you learn more. If you are new to Pulsar, we recommend you to take the ",[55,51821,36487],{"href":36485,"rel":51822},[264],[55,51824,36491],{"href":36490}," developed by some of the original creators of Pulsar. 
This will get you started with Pulsar and accelerate your streaming immediately.",[324,51827,51828,51829,51831],{},"Spin up a Pulsar Cluster in Minutes: If you want to try building microservices without having to set up a Pulsar cluster yourself, sign up for ",[55,51830,3550],{"href":45479}," today. StreamNative Cloud is a simple, fast, and cost-effective way to run Pulsar in the public cloud.",[324,51833,51834,46714,51836,51839,51840,190],{},[2628,51835,46713],{},[55,51837,267],{"href":51838},"\u002Fevent\u002Fwebinar-series-building-microservices-with-pulsar\u002F"," and find the source code from the webinars ",[55,51841,267],{"href":51838},[324,51843,51844,758,51846],{},[2628,51845,42753],{},[55,51847,51850],{"href":51848,"rel":51849},"https:\u002F\u002Fpulsar.apache.org\u002Fdocs\u002F2.10.x\u002Ffunctions-develop\u002F",[264],"How to develop Pulsar Functions",[324,51852,51853,758,51855],{},[2628,51854,40436],{},[55,51856,51857],{"href":44957},"Function Mesh - Simplify Complex Streaming Jobs in Cloud",{"title":18,"searchDepth":19,"depth":19,"links":51859},[51860,51861,51862,51863],{"id":42,"depth":19,"text":46},{"id":51690,"depth":19,"text":46603},{"id":4132,"depth":19,"text":4135},{"id":38375,"depth":19,"text":38376},"2022-11-29","Learn more about the Pulsar AMQP flexibility and others by building an application with messaging protocols such as MQTT, AMQP\u002FRabbitMQ, and Kafka.","\u002Fimgs\u002Fblogs\u002F63c7bc41ff0c0c10aa45f2d3_63bf3102dba42638e2a29535_spring-into-pulsar-part-2-top.jpeg",{},{"title":51660,"description":51865},"blog\u002Fspring-into-pulsar-part-2-spring-based-microservices-multiple-protocols-apache-pulsar",[38442,799,821,11043,51871,8058],"MQTT","bS_ulYsgSy1JCQ8Lr4BfRLuHastZwy8fgLQ6vzVGcxA",{"id":51874,"title":43242,"authors":51875,"body":51876,"category":821,"createdAt":290,"date":52247,"description":52248,"extension":8,"featured":294,"image":52249,"isDraft":294,"link":290,"meta":52250,"navigation":7,"order":296,"path":43241,"readingTime":5505,"relatedResources":290,"seo":52251,"stem":52252,"tags":52253,"__hash__":52254},"blogs\u002Fblog\u002Fpulsar-operators-tutorial-part-1-create-apache-pulsar-cluster-kubernetes.md",[46122],{"type":15,"value":51877,"toc":52240},[51878,51884,51892,51895,51928,51931,51937,51945,51949,51954,51960,51965,51971,51975,51978,51988,51991,51995,52004,52010,52015,52021,52024,52030,52034,52037,52040,52048,52062,52067,52073,52078,52084,52089,52095,52103,52108,52114,52119,52125,52130,52136,52144,52149,52155,52164,52170,52178,52184,52189,52195,52198,52200,52203,52205,52210],[916,51879,51880],{},[48,51881,51882],{},[36,51883,46129],{},[48,51885,19261,51886,51891],{},[55,51887,51890],{"href":51888,"rel":51889},"https:\u002F\u002Fkubernetes.io\u002Fdocs\u002Fconcepts\u002Fextend-kubernetes\u002Foperator\u002F",[264],"operator"," is a controller that manages an application on Kubernetes. It helps the SRE team automate infrastructure changes, including deployment, updates, and scaling, as it provides full lifecycle management for the application. Starting from this blog, I will post a series of articles to talk about how to use StreamNative Pulsar Operators on Kubernetes to better manage applications. 
In the first blog, I will demonstrate how to use Pulsar Operators to deploy Pulsar on Kubernetes.",[48,51893,51894],{},"Before I introduce the specific installation steps, let’s take a look at the three sets of Operators provided by StreamNative.",[321,51896,51897,51905,51908,51911,51914,51921],{},[324,51898,51899,51904],{},[55,51900,51903],{"href":51901,"rel":51902},"https:\u002F\u002Fdocs.streamnative.io\u002Foperators\u002Fpulsar-operator\u002Fpulsar-operator-install",[264],"Pulsar Operators",". Kubernetes controllers that provide a declarative API to simplify the deployment and management of Pulsar clusters on Kubernetes. Specifically, there are three Operators:",[324,51906,51907],{},"~ Pulsar Operator. Manages the deployment of the Pulsar broker and Pulsar proxy for the Pulsar cluster.",[324,51909,51910],{},"~ BookKeeper Operator. Provides full lifecycle management for the BookKeeper cluster.",[324,51912,51913],{},"~ ZooKeeper Operator. Provides full lifecycle management for the ZooKeeper cluster.",[324,51915,51916,51920],{},[55,51917,51919],{"href":20667,"rel":51918},[264],"Pulsar Resources Operator",". An independent controller that automatically manages Pulsar resources (for example, tenants, namespace, topics, and permissions) on Kubernetes through manifest files.",[324,51922,51923,51927],{},[55,51924,51926],{"href":34283,"rel":51925},[264],"Function Mesh Operator",". Integrates different functions to process data.",[48,51929,51930],{},"You can find the three sets of Operators in the streamnative and function-mesh Helm chart repositories respectively. If you haven’t added these two repositories, you need to use the helm repo add command to add them first before you search them for the operators.",[8325,51932,51935],{"className":51933,"code":51934,"language":8330},[8328],"# helm search repo streamnative\nstreamnative\u002Fpulsar-operator            0.11.5          0.11.5      Apache Pulsar Operators Helm chart for Kubernetes\nstreamnative\u002Fpulsar-resources-operator  v0.0.8          v0.0.1      Pulsar Resources Operator Helm chart for Pulsar…\n# helm search repo function-mesh\nNAME                                    CHART VERSION   APP VERSION DESCRIPTION\nfunction-mesh\u002Ffunction-mesh-operator    0.2.1           0.3.0       function mesh operator Helm chart for Kubernetes\n",[4926,51936,51934],{"__ignoreMap":18},[48,51938,51939,51940,51944],{},"For a quick start, you can follow the ",[55,51941,51943],{"href":51901,"rel":51942},[264],"official installation documentation",". This blog explores a step-by-step way to set up a Pulsar cluster by deploying its key components separately through the Pulsar Operators.",[40,51946,51948],{"id":51947},"fetch-and-check-the-pulsar-operators-helm-chart","Fetch and check the Pulsar Operators Helm chart",[1666,51950,51951],{},[324,51952,51953],{},"Instead of using “helm install” to deploy the chart from the repository directly, I fetched the chart to check its details in this example.",[8325,51955,51958],{"className":51956,"code":51957,"language":8330},[8328],"helm repo add streamnative https:\u002F\u002Fcharts.streamnative.io\nhelm repo update\nhelm fetch streamnative\u002Fpulsar-operator --untar\ncd pulsar-operator\n",[4926,51959,51957],{"__ignoreMap":18},[1666,51961,51962],{"start":19},[324,51963,51964],{},"The command helm fetch with the --untar option downloads the chart template to your local machine. 
Let’s check the chart file.",[8325,51966,51969],{"className":51967,"code":51968,"language":8330},[8328],"# cat Chart.yaml\napiVersion: v1\nappVersion: 0.9.4\ndescription: Apache Pulsar Operators Helm chart for Kubernetes\nhome: https:\u002F\u002Fstreamnative.io\nicon: http:\u002F\u002Fpulsar.apache.org\u002Fimg\u002Fpulsar.svg\nkubeVersion: '>= 1.16.0-0 \n3. The chart file describes the basic information of the Helm chart, such as the maintainer and app version. The current chart supports Kubernetes versions 1.16 to 1.23. As my existing Kubernetes version is 1.24.0, if I run “helm install” directly from the remote chart repository, the installation will stop at the Kubernetes version check.\n\n",[4926,51970,51968],{"__ignoreMap":18},[8300,51972,51974],{"id":51973},"helm-install-sn-operator-n-test-streamnativepulsar-operator","helm install sn-operator -n test streamnative\u002Fpulsar-operator",[48,51976,51977],{},"Error: INSTALLATION FAILED: chart requires kubeVersion: >= 1.16.0-0\nTo bypass this, modify the kubeVersion range to >=1.16.0–0 \u003C 1.25.0–0. Note that the StreamNative team is working on an issue regarding pdb v1beta1 API removal in Kubernetes 1.25. Current operators won’t work in 1.25+.",[1666,51979,51980],{"start":20920},[324,51981,51982,51983,190],{},"A good chart maintainer documents all configurations in values.yaml. This values.yaml file is pretty straightforward, as it describes the operators you can install, including zookeeper-operator, bookkeeper-operator, and pulsar-operator (broker\u002Fproxy). The file also contains the image repository locations and tags, as well as operator details like cluster roles\u002Froles, service accounts, and operator resource limits and requests. Additionally, if you want to pull the images from a private repository, simply change the image repository URL to your private repository. For more information about the role of values.yaml file in Helm, see the ",[55,51984,51987],{"href":51985,"rel":51986},"https:\u002F\u002Fhelm.sh\u002Fdocs\u002Fchart_template_guide\u002Fvalues_files\u002F",[264],"Helm documentation",[48,51989,51990],{},"In this example, I kept the default values in values.yaml. I will come back to modify some configurations (for example, CRD roles and cluster role bindings) in a more restrictive environment.",[40,51992,51994],{"id":51993},"deploy-the-pulsar-operators","Deploy the Pulsar Operators",[1666,51996,51997],{},[324,51998,51999,52000,190],{},"After reviewing the values, use helm install to deploy the Pulsar Operators in the sn-operator namespace through the local chart. As I mentioned above, I fetched the chart locally to change the Kubernetes version and inspect the values.yaml file. If your Kubernetes version is compatible (1.16-1.23), you can simply use the helm install command directly as stated in the ",[55,52001,7120],{"href":52002,"rel":52003},"https:\u002F\u002Fdocs.streamnative.io\u002Foperators\u002Fpulsar-operator\u002Fpulsar-operator-install#steps",[264],[8325,52005,52008],{"className":52006,"code":52007,"language":8330},[8328],"# kubectl create namespace sn-operator\n# helm install -n sn-operator pulsar-operator .\n",[4926,52009,52007],{"__ignoreMap":18},[1666,52011,52012],{"start":19},[324,52013,52014],{},"Check all resources in the sn-operator namespace. 
You should find the following components.",[8325,52016,52019],{"className":52017,"code":52018,"language":8330},[8328],"# kubectl get all -n sn-operator\nNAME                                                                READY   STATUS    RESTARTS   AGE\npod\u002Fpulsar-operator-bookkeeper-controller-manager-9c596465-h8nbh    1\u002F1     Running   0          16h\npod\u002Fpulsar-operator-pulsar-controller-manager-6f8699ffc-gr989       1\u002F1     Running   0          16h\npod\u002Fpulsar-operator-zookeeper-controller-manager-7b54b76c79-rsm6t   1\u002F1     Running   0          16h\nNAME                                                            READY   UP-TO-DATE   AVAILABLE   AGE\ndeployment.apps\u002Fpulsar-operator-bookkeeper-controller-manager   1\u002F1     1            1           16h\ndeployment.apps\u002Fpulsar-operator-pulsar-controller-manager       1\u002F1     1            1           16h\ndeployment.apps\u002Fpulsar-operator-zookeeper-controller-manager    1\u002F1     1            1           16h\nNAME                                                                      DESIRED   CURRENT   READY   AGE\nreplicaset.apps\u002Fpulsar-operator-bookkeeper-controller-manager-9c596465    1         1         1       16h\nreplicaset.apps\u002Fpulsar-operator-pulsar-controller-manager-6f8699ffc       1         1         1       16h\nreplicaset.apps\u002Fpulsar-operator-zookeeper-controller-manager-7b54b76c79   1         1         1       16h\n",[4926,52020,52018],{"__ignoreMap":18},[48,52022,52023],{},"‍\n3. The related Kubernetes API resources have also been created.",[8325,52025,52028],{"className":52026,"code":52027,"language":8330},[8328],"# kubectl api-resources | grep pulsar\npulsarbrokers                     pb,broker         pulsar.streamnative.io\u002Fv1alpha1        true         PulsarBroker\npulsarconnections                 pconn             pulsar.streamnative.io\u002Fv1alpha1        true         PulsarConnection\npulsarnamespaces                  pns               pulsar.streamnative.io\u002Fv1alpha1        true         PulsarNamespace\npulsarpermissions                 ppermission       pulsar.streamnative.io\u002Fv1alpha1        true         PulsarPermission\npulsarproxies                     pp,proxy          pulsar.streamnative.io\u002Fv1alpha1        true         PulsarProxy\npulsartenants                     ptenant           pulsar.streamnative.io\u002Fv1alpha1        true         PulsarTenant\npulsartopics                      ptopic            pulsar.streamnative.io\u002Fv1alpha1        true         PulsarTopic\n",[4926,52029,52027],{"__ignoreMap":18},[40,52031,52033],{"id":52032},"deploy-zookeeper-bookkeeper-pulsarbroker-and-pulsarproxy","Deploy ZooKeeper, BookKeeper, PulsarBroker, and PulsarProxy",[48,52035,52036],{},"As shown in the previous section, there are seven controllers\u002Foperators, and each handles different “kinds” of custom resources, including Pulsar Operator CRDs (PulsarBroker, PulsarProxy, ZooKeeperCluster, and BookKeeperCluster) and Resources Operator CRDs (PulsarTenant, PulsarNamespace, PulsarTopic, PulsarPermission, and PulsarConnection).",[48,52038,52039],{},"Note that you need the Resource Operators to create topics, tenants, namespaces, and permissions. 
The CRDs created by the Helm chart contains both Pulsar cluster CRDs and resource CRDs.",[48,52041,52042,52043,52047],{},"Like the standard Kubernetes controllers and Deployments, we tell the controller what we want by feeding the cluster definitions in ",[55,52044,52046],{"href":44901,"rel":52045},[264],"Custom Resource (CR)",". In a regular Kubernetes Deployment manifest, you put all kinds of components in a YAML file and use kubectl apply or kubectl create to create Pods, Services, ConfigMaps, and other resources. Similarly, you can put ZooKeeper, BookKeeper, and PulsarBroker cluster definitions in a single YAML file, and then deploy them in one shot.",[48,52049,52050,52051,52056,52057,190],{},"In order to understand and troubleshoot the deployment, I will break it down into three parts - ZooKeeper, BookKeeper, and then the PulsarBroker. As this blog is focused on the installation of these Pulsar components on Kubernetes, I will not explain their concepts in detail. If you want to understand the dependencies among the three, check out Sijie Guo’s",[55,52052,52055],{"href":52053,"rel":52054},"https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=1RQSot5tTuU&t=112s",[264]," TGIP YouTube video"," or refer to the ",[55,52058,52061],{"href":52059,"rel":52060},"https:\u002F\u002Fpulsar.apache.org\u002Fdocs\u002F",[264],"Pulsar documentation",[1666,52063,52064],{},[324,52065,52066],{},"The following is the ZooKeeperCluster definition that the ZooKeeper controller\u002Foperator will use to deploy a ZooKeeper cluster. It is very similar to a Kubernetes Deployment. It defines the image location, version, replicas, resources, and persistent storage properties. There should be other properties like JVM flags for tuning. I will discuss this later as we focus on getting a running cluster, and the operator should help ensure the extra configurations are updated automatically.",[8325,52068,52071],{"className":52069,"code":52070,"language":8330},[8328],"---\napiVersion: zookeeper.streamnative.io\u002Fv1alpha1\nkind: ZooKeeperCluster\nmetadata:\n  name: my\n  namespace: sn-platform\nspec:\n  image: streamnative\u002Fpulsar:2.9.2.15\n  replicas: 3\n  pod:\n    resources:\n      requests:\n        cpu: \"50m\"\n        memory: \"256Mi\"\n      limits:\n        cpu: \"50m\"\n        memory: \"256Mi\"\n  persistence:\n    reclaimPolicy: Retain\n    data:\n      accessModes:\n      - ReadWriteOnce\n      resources:\n        requests:\n          storage: \"10Gi\"\n    dataLog:\n      accessModes:\n      - ReadWriteOnce\n      resources:\n        requests:\n          storage: \"2Gi\"\n",[4926,52072,52070],{"__ignoreMap":18},[1666,52074,52075],{"start":19},[324,52076,52077],{},"Apply this file and see what happens.",[8325,52079,52082],{"className":52080,"code":52081,"language":8330},[8328],"# kubectl apply -f zk-cluster.yaml\nzookeepercluster.zookeeper.streamnative.io\u002Fmy created\n\n# kubectl get pod -n sn-platform -w\nNAME      READY   STATUS    RESTARTS   AGE\nmy-zk-0   1\u002F1     Running   0          25s\nmy-zk-1   1\u002F1     Running   0          25s\nmy-zk-2   1\u002F1     Running   0          25s\n\n# kubectl get svc -n sn-platform\nNAME             TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)                                        AGE\nmy-zk            ClusterIP   10.104.64.179           2181\u002FTCP,8000\u002FTCP,9990\u002FTCP                     42s\nmy-zk-headless   ClusterIP   None                    2181\u002FTCP,2888\u002FTCP,3888\u002FTCP,8000\u002FTCP,9990\u002FTCP   
42s\n",[4926,52083,52081],{"__ignoreMap":18},[1666,52085,52086],{"start":279},[324,52087,52088],{},"Check the ZooKeeper controller log and see what’s going on.",[8325,52090,52093],{"className":52091,"code":52092,"language":8330},[8328],"# kubectl logs -n sn-ops pulsar-operator-zookeeper-controller-manager-7b54b76c79-rsm6t\n{\"severity\":\"info\",\"timestamp\":\"2022-05-11T02:47:07Z\",\"logger\":\"controllers.ZooKeeperCluster\",\"message\":\"Reconciling ZooKeeperCluster\",\"Request.Namespace\":\"sn-platform\",\"Request.Name\":\"my\"}\n{\"severity\":\"info\",\"timestamp\":\"2022-05-11T02:47:07Z\",\"logger\":\"controllers.ZooKeeperCluster\",\"message\":\"Updating an existing ZooKeeper StatefulSet\",\"StatefulSet.Namespace\":\"sn-platform\",\"StatefulSet.Name\":\"my-zk\"}\n{\"severity\":\"debug\",\"timestamp\":\"2022-05-11T02:47:07Z\",\"logger\":\"controller\",\"message\":\"Successfully Reconciled\",\"reconcilerGroup\":\"zookeeper.streamnative.io\",\"reconcilerKind\":\"ZooKeeperCluster\",\"controller\":\"zookeepercluster\",\"name\":\"my\",\"namespace\":\"sn-platform\"}\n{\"severity\":\"info\",\"timestamp\":\"2022-05-11T02:47:07Z\",\"logger\":\"controllers.ZooKeeperCluster\",\"message\":\"Reconciling ZooKeeperCluster\",\"Request.Namespace\":\"sn-platform\",\"Request.Name\":\"my\"}\n{\"severity\":\"info\",\"timestamp\":\"2022-05-11T02:47:07Z\",\"logger\":\"controllers.ZooKeeperCluster\",\"message\":\"Updating an existing ZooKeeper StatefulSet\",\"StatefulSet.Namespace\":\"sn-platform\",\"StatefulSet.Name\":\"my-zk\"}\n{\"severity\":\"debug\",\"timestamp\":\"2022-05-11T02:47:07Z\",\"logger\":\"controller\",\"message\":\"Successfully Reconciled\",\"reconcilerGroup\":\"zookeeper.streamnative.io\",\"reconcilerKind\":\"ZooKeeperCluster\",\"controller\":\"zookeepercluster\",\"name\":\"my\",\"namespace\":\"sn-platform\"}\n{\"severity\":\"info\",\"timestamp\":\"2022-05-11T02:47:07Z\",\"logger\":\"controllers.ZooKeeperCluster\",\"message\":\"Reconciling ZooKeeperCluster\",\"Request.Namespace\":\"sn-platform\",\"Request.Name\":\"my\"}\n{\"severity\":\"info\",\"timestamp\":\"2022-05-11T02:47:07Z\",\"logger\":\"controllers.ZooKeeperCluster\",\"message\":\"Updating an existing ZooKeeper StatefulSet\",\"StatefulSet.Namespace\":\"sn-platform\",\"StatefulSet.Name\":\"my-zk\"}\n",[4926,52094,52092],{"__ignoreMap":18},[48,52096,52097,52098,52102],{},"The ZooKeeper controller is watching and running a reconcile loop that keeps checking the my ZooKeeperCluster status in the sn-platform namespace. This is a recommended ",[55,52099,52101],{"href":51888,"rel":52100},[264],"operator design pattern",". The operator Pod log is handy to troubleshoot your Pulsar deployment.",[1666,52104,52105],{"start":20920},[324,52106,52107],{},"The next component is the BookKeeper cluster. Following the same pattern, you can define the BookKeeperCluster kind like the following. 
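(Before moving on to BookKeeper, a brief aside: the ZooKeeperCluster resource itself can be queried to confirm the operator has finished reconciling it. This is just the standard kubectl workflow applied to the CRD; the resource name is inferred from the ZooKeeperCluster kind and API group shown above.)

```bash
# The custom resource reflects the operator's view of the ZooKeeper cluster
kubectl get zookeepercluster my -n sn-platform
kubectl describe zookeepercluster my -n sn-platform
```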
Note that zkServers is required in this YAML file, and it should point to the headless Service (reaching all three zkServers) of the ZooKeeper cluster you just created.",[8325,52109,52112],{"className":52110,"code":52111,"language":8330},[8328],"---\napiVersion: bookkeeper.streamnative.io\u002Fv1alpha1\nkind: BookKeeperCluster\nmetadata:\n  name: my\n  namespace: sn-platform\nspec:\n  image: streamnative\u002Fpulsar:2.9.2.15\n  replicas: 3\n  pod:\n    resources:\n      requests:\n        cpu: \"200m\"\n        memory: \"256Mi\"\n  storage:\n    reclaimPolicy: Retain\n    journal:\n      numDirsPerVolume: 1\n      numVolumes: 1\n      volumeClaimTemplate:\n        accessModes:\n        - ReadWriteOnce\n        resources:\n          requests:\n            storage: \"8Gi\"\n    ledger:\n      numDirsPerVolume: 1\n      numVolumes: 1\n      volumeClaimTemplate:\n        accessModes:\n        - ReadWriteOnce\n        resources:\n          requests:\n            storage: \"16Gi\"\n  zkServers: my-zk-headless:2181\n",[4926,52113,52111],{"__ignoreMap":18},[1666,52115,52116],{"start":20934},[324,52117,52118],{},"Apply the BookKeeper manifest file.",[8325,52120,52123],{"className":52121,"code":52122,"language":8330},[8328],"# kubectl apply -f bk-cluster.yaml\nbookkeepercluster.bookkeeper.streamnative.io\u002Fmy created\n\n# kubectl get pod -n sn-platform\nNAME                    READY   STATUS    RESTARTS   AGE\nmy-bk-0                 1\u002F1     Running   0          90s\nmy-bk-1                 1\u002F1     Running   0          90s\nmy-bk-2                 1\u002F1     Running   0          90s\nmy-bk-auto-recovery-0   1\u002F1     Running   0          48s\nmy-zk-0                 1\u002F1     Running   0          4m51s\nmy-zk-1                 1\u002F1     Running   0          4m51s\nmy-zk-2                 1\u002F1     Running   0          4m51s\n",[4926,52124,52122],{"__ignoreMap":18},[1666,52126,52127],{"start":20948},[324,52128,52129],{},"You can use the same command to find out what the BookKeeper operator is doing behind the scenes.",[8325,52131,52134],{"className":52132,"code":52133,"language":8330},[8328],"# kubectl logs -n sn-ops pulsar-operator-bookkeeper-controller-manager-9c596465-h8nbh\nW0512 11:45:15.235940       1 warnings.go:67] policy\u002Fv1beta1 PodDisruptionBudget is deprecated in v1.21+, unavailable in v1.25+; use policy\u002Fv1 PodDisruptionBudget\n{\"severity\":\"info\",\"timestamp\":\"2022-05-12T11:47:27Z\",\"logger\":\"controllers.BookKeeperCluster\",\"message\":\"Reconciling BookKeeperCluster\",\"Request.Namespace\":\"sn-platform\",\"Request.Name\":\"my\"}\n{\"severity\":\"info\",\"timestamp\":\"2022-05-12T11:47:27Z\",\"logger\":\"controllers.BookKeeperCluster\",\"message\":\"Updating the status for the BookKeeperCluster\",\"Namespace\":\"sn-platform\",\"Name\":\"my\",\"Status\":{\"observedGeneration\":4,\"replicas\":4,\"readyReplicas\":4,\"updatedReplicas\":4,\"labelSelector\":\"cloud.streamnative.io\u002Fapp=pulsar,cloud.streamnative.io\u002Fcluster=my,cloud.streamnative.io\u002Fcomponent=bookie\",\"conditions\":[{\"type\":\"AutoRecovery\",\"status\":\"True\",\"reason\":\"Deploy\",\"message\":\"Ready\",\"lastTransitionTime\":\"2022-05-08T19:58:26Z\"},{\"type\":\"Bookie\",\"status\":\"True\",\"reason\":\"Ready\",\"message\":\"Bookies are ready\",\"lastTransitionTime\":\"2022-05-09T00:34:12Z\"},{\"type\":\"Initialization\",\"status\":\"True\",\"reason\":\"Initialization\",\"message\":\"Initialization 
succeeded\",\"lastTransitionTime\":\"2022-05-08T19:57:43Z\"},{\"type\":\"Ready\",\"status\":\"True\",\"reason\":\"Ready\",\"lastTransitionTime\":\"2022-05-09T00:34:12Z\"}]}}\n{\"severity\":\"debug\",\"timestamp\":\"2022-05-12T11:47:27Z\",\"logger\":\"controller\",\"message\":\"Successfully Reconciled\",\"reconcilerGroup\":\"bookkeeper.streamnative.io\",\"reconcilerKind\":\"BookKeeperCluster\",\"controller\":\"bookkeepercluster\",\"name\":\"my\",\"namespace\":\"sn-platform\"}\nW0512 11:49:18.419391       1 warnings.go:67] autoscaling\u002Fv2beta2 HorizontalPodAutoscaler is deprecated in v1.23+, unavailable in v1.26+; use autoscaling\u002Fv2 HorizontalPodAutoscaler\nW0512 11:52:35.237812       1 warnings.go:67] policy\u002Fv1beta1 PodDisruptionBudget is deprecated in v1.21+, unavailable in v1.25+; use policy\u002Fv1 PodDisruptionBudget\nW0512 11:57:06.421507       1 warnings.go:67] autoscaling\u002Fv2beta2 HorizontalPodAutoscaler is deprecated in v1.23+, unavailable in v1.26+; use autoscaling\u002Fv2 HorizontalPodAutoscaler\nW0512 11:58:43.240049       1 warnings.go:67] policy\u002Fv1beta1 PodDisruptionBudget is deprecated in v1.21+, unavailable in v1.25+; use policy\u002Fv1 PodDisruptionBudget\nW0512 12:04:06.423448       1 warnings.go:67] autoscaling\u002Fv2beta2 HorizontalPodAutoscaler is deprecated in v1.23+, unavailable in v1.26+; use autoscaling\u002Fv2 HorizontalPodAutoscaler\nW0512 12:05:03.242609       1 warnings.go:67] policy\u002Fv1beta1 PodDisruptionBudget is deprecated in v1.21+, unavailable in v1.25+; use policy\u002Fv1 PodDisruptionBudget\nW0512 12:09:08.425304       1 warnings.go:67] autoscaling\u002Fv2beta2 HorizontalPodAutoscaler is deprecated in v1.23+, unavailable in v1.26+; use autoscaling\u002Fv2 HorizontalPodAutoscaler\nW0512 12:14:15.245078       1 warnings.go:67] policy\u002Fv1beta1 PodDisruptionBudget is deprecated in v1.21+, unavailable in v1.25+; use policy\u002Fv1 PodDisruptionBudget\nW0512 12:18:47.427470       1 warnings.go:67] autoscaling\u002Fv2beta2 HorizontalPodAutoscaler is deprecated in v1.23+, unavailable in v1.26+; use autoscaling\u002Fv2 HorizontalPodAutoscaler\nW0512 12:19:30.247840       1 warnings.go:67] policy\u002Fv1beta1 PodDisruptionBudget is deprecated in v1.21+, unavailable in v1.25+; use policy\u002Fv1 PodDisruptionBudget\nW0512 12:25:04.249159       1 warnings.go:67] policy\u002Fv1beta1 PodDisruptionBudget is deprecated in v1.21+, unavailable in v1.25+; use policy\u002Fv1 PodDisruptionBudget\nW0512 12:25:13.430394       1 warnings.go:67] autoscaling\u002Fv2beta2 HorizontalPodAutoscaler is deprecated in v1.23+, unavailable in v1.26+; use autoscaling\u002Fv2 HorizontalPodAutoscaler\n",[4926,52135,52133],{"__ignoreMap":18},[48,52137,52138,52139,190],{},"There are a bunch of warnings complaining about API deprecations. This is normal because in this example my Kubernetes version is 1.24.0, which has ",[55,52140,52143],{"href":52141,"rel":52142},"https:\u002F\u002Fkubernetes.io\u002Fblog\u002F2022\u002F04\u002F07\u002Fupcoming-changes-in-kubernetes-1-24\u002F",[264],"many big changes",[1666,52145,52146],{"start":25801},[324,52147,52148],{},"The next component is the broker cluster. The following is the PulsarBroker YAML file, where you can see the broker also depends on zkServer. Note that I added config.custom in the descriptor. 
This will turn on broker’s WebSocket endpoint.",[8325,52150,52153],{"className":52151,"code":52152,"language":8330},[8328],"---\napiVersion: pulsar.streamnative.io\u002Fv1alpha1\nkind: PulsarBroker\nmetadata:\n  name: my\n  namespace: sn-platform\nspec:\n  image: streamnative\u002Fpulsar:2.9.2.15\n  pod:\n    resources:\n      requests:\n        cpu: 200m\n        memory: 256Mi\n    terminationGracePeriodSeconds: 30\n  config:\n    custom:\n      webSocketServiceEnabled: \"true\"\n  replicas: 2\n  zkServers: my-zk-headless:2181\n",[4926,52154,52152],{"__ignoreMap":18},[1666,52156,52157],{"start":25806},[324,52158,52159,52160],{},"Like ZooKeeper and BookKeeper clusters, creating a broker cluster is the same as creating standard Kubernetes objects. You can use kubectl get pod -n ",[52161,52162,52163],"namespace",{}," -w to watch the sequence of Pod creation. This is helpful to understand the dependencies among Pulsar components. I skipped the operator log in this blog. You can also run a similar command to trace the controller log.",[8325,52165,52168],{"className":52166,"code":52167,"language":8330},[8328],"# kubectl apply -f br-cluster.yaml\npulsarbroker.pulsar.streamnative.io\u002Fmy created\n\n# kubectl get pod -n sn-platform -w\nNAME                            READY   STATUS      RESTARTS   AGE\nmy-bk-0                         1\u002F1     Running     0          2m11s\nmy-bk-1                         1\u002F1     Running     0          2m11s\nmy-bk-2                         1\u002F1     Running     0          2m11s\nmy-bk-auto-recovery-0           1\u002F1     Running     0          89s\nmy-broker-metadata-init-gghqc   0\u002F1     Completed   0          6s\nmy-zk-0                         1\u002F1     Running     0          5m32s\nmy-zk-1                         1\u002F1     Running     0          5m32s\nmy-zk-2                         1\u002F1     Running     0          5m32s\nmy-broker-metadata-init-gghqc   0\u002F1     Completed   0          7s\nmy-broker-metadata-init-gghqc   0\u002F1     Completed   0          7s\nmy-broker-0                     0\u002F1     Pending     0          0s\nmy-broker-1                     0\u002F1     Pending     0          0s\nmy-broker-0                     0\u002F1     Pending     0          0s\nmy-broker-1                     0\u002F1     Pending     0          0s\nmy-broker-1                     0\u002F1     Init:0\u002F1    0          0s\nmy-broker-0                     0\u002F1     Init:0\u002F1    0          0s\nmy-broker-metadata-init-gghqc   0\u002F1     Terminating   0          8s\nmy-broker-metadata-init-gghqc   0\u002F1     Terminating   0          8s\nmy-broker-0                     0\u002F1     Init:0\u002F1      0          0s\nmy-broker-1                     0\u002F1     Init:0\u002F1      0          0s\nmy-broker-1                     0\u002F1     PodInitializing   0          1s\nmy-broker-0                     0\u002F1     PodInitializing   0          1s\nmy-broker-1                     0\u002F1     Running           0          2s\nmy-broker-0                     0\u002F1     Running           0          2s\nmy-broker-0                     0\u002F1     Running           0          10s\nmy-broker-1                     0\u002F1     Running           0          10s\nmy-broker-0                     1\u002F1     Running           0          40s\nmy-broker-1                     1\u002F1     Running           0          40s\n",[4926,52169,52167],{"__ignoreMap":18},[1666,52171,52172,52175],{"start":25812},[324,52173,52174],{},"When all Pods are 
up and running, check the Services and you can see that all their types are ClusterIP. This assumes that all producer and consumer workloads are inside the Kubernetes cluster. In order to test the traffic, I need a LoadBalancer from machines in my environment but external to the Kubernetes cluster.",[324,52176,52177],{},"Now let’s deploy the proxy. The proxy is a bit tricky because TLS is enabled by default, which makes sense as it is the external gateway to connect to the Pulsar cluster. In this example, I turned off TLS on all components for simplicity. I will discuss enabling TLS using the operator in the next blog.",[8325,52179,52182],{"className":52180,"code":52181,"language":8330},[8328],"---\napiVersion: pulsar.streamnative.io\u002Fv1alpha1\nkind: PulsarProxy\nmetadata:\n    name: my\n    namespace: sn-platform\nspec:\n    brokerAddress: my-broker-headless \n    dnsNames: []\n    #webSocketServiceEnabled: true\n    image: streamnative\u002Fpulsar:2.9.2.15\n    config:\n      tls:\n        enabled: false \n    issuerRef:\n      name: \"\"\n    pod:\n      resources:\n        requests:\n          cpu: 200m\n          memory: 256Mi\n    replicas: 1\n",[4926,52183,52181],{"__ignoreMap":18},[1666,52185,52186],{"start":25823},[324,52187,52188],{},"Apply the proxy manifest file and check the status of different resources.",[8325,52190,52193],{"className":52191,"code":52192,"language":8330},[8328],"# kubectl apply -f px-cluster.yaml\npulsarproxy.pulsar.streamnative.io\u002Fmy created\n\n# kubectl get pod -n sn-platform\nNAME                    READY   STATUS    RESTARTS   AGE\nmy-bk-0                 1\u002F1     Running   0          44m\nmy-bk-1                 1\u002F1     Running   0          44m\nmy-bk-2                 1\u002F1     Running   0          44m\nmy-bk-auto-recovery-0   1\u002F1     Running   0          43m\nmy-broker-0             1\u002F1     Running   0          42m\nmy-broker-1             1\u002F1     Running   0          42m\nmy-proxy-0              1\u002F1     Running   0          57s\nmy-zk-0                 1\u002F1     Running   0          47m\nmy-zk-1                 1\u002F1     Running   0          47m\nmy-zk-2                 1\u002F1     Running   0          47m\n\n# kubectl get svc -n sn-platform\nNAME                           TYPE           CLUSTER-IP      EXTERNAL-IP   PORT(S)                                        AGE\nmy-bk                          ClusterIP      10.104.212.43           3181\u002FTCP,8000\u002FTCP                              44m\nmy-bk-auto-recovery-headless   ClusterIP      None                    3181\u002FTCP,8000\u002FTCP                              44m\nmy-bk-headless                 ClusterIP      None                    3181\u002FTCP,8000\u002FTCP                              44m\nmy-broker                      ClusterIP      10.99.107.224           6650\u002FTCP,8080\u002FTCP                              41m\nmy-broker-headless             ClusterIP      None                    6650\u002FTCP,8080\u002FTCP                              41m\nmy-proxy-external              LoadBalancer   10.109.250.31   10.0.0.36     6650:32751\u002FTCP,8080:30322\u002FTCP                  33s\nmy-proxy-headless              ClusterIP      None                    6650\u002FTCP,8080\u002FTCP                              33s\nmy-zk                          ClusterIP      10.104.64.179           2181\u002FTCP,8000\u002FTCP,9990\u002FTCP                     47m\nmy-zk-headless                 ClusterIP      None                    
2181\u002FTCP,2888\u002FTCP,3888\u002FTCP,8000\u002FTCP,9990\u002FTCP   47m\n",[4926,52194,52192],{"__ignoreMap":18},[48,52196,52197],{},"As shown above, the proxy automatically obtained my external LoadBalancer IP (10.0.0.36). In this example, I used MetalLB to expose the proxy Service.",[40,52199,319],{"id":316},[48,52201,52202],{},"By now, you should have a running Pulsar cluster and an exposed Service endpoint that you can use to start producing and consuming messages. In the next blog, I will demonstrate how to write consumer and producer container images to interact with the Pulsar cluster.",[40,52204,38376],{"id":38375},[48,52206,38379,52207,40419],{},[55,52208,38384],{"href":38382,"rel":52209},[264],[321,52211,52212,52217,52221,52228,52234],{},[324,52213,38390,52214,190],{},[55,52215,31914],{"href":31912,"rel":52216},[264],[324,52218,45476,52219,45480],{},[55,52220,3550],{"href":45479},[324,52222,52223,758,52225],{},[2628,52224,47315],{},[55,52226,50588],{"href":50586,"rel":52227},[264],[324,52229,52230,758,52232],{},[2628,52231,46310],{},[55,52233,46332],{"href":50601},[324,52235,52236,758,52238],{},[2628,52237,46310],{},[55,52239,43234],{"href":50608},{"title":18,"searchDepth":19,"depth":19,"links":52241},[52242,52243,52244,52245,52246],{"id":51947,"depth":19,"text":51948},{"id":51993,"depth":19,"text":51994},{"id":52032,"depth":19,"text":52033},{"id":316,"depth":19,"text":319},{"id":38375,"depth":19,"text":38376},"2022-11-22","Learn how to create an Apache Pulsar cluster on Kubernetes with Pulsar Operators in this step-by-step tutorial. Follow along as we guide you through the process of setting up a Pulsar cluster on a Kubernetes cluster","\u002Fimgs\u002Fblogs\u002F63be6afc48236609b32ed82e_pulsar-operators-tutorial-part-1-top.jpg",{},{"title":43242,"description":52248},"blog\u002Fpulsar-operators-tutorial-part-1-create-apache-pulsar-cluster-kubernetes",[38442,821,16985],"N_FCkpjemuBCqQt2spnx9waPkN25iJbALRNow845Eoo",{"id":52256,"title":52257,"authors":52258,"body":52260,"category":821,"createdAt":290,"date":52540,"description":52541,"extension":8,"featured":294,"image":52542,"isDraft":294,"link":290,"meta":52543,"navigation":7,"order":296,"path":52544,"readingTime":5505,"relatedResources":290,"seo":52545,"stem":52546,"tags":52547,"__hash__":52548},"blogs\u002Fblog\u002Fhow-proxima-beta-implemented-cqrs-event-sourcing-ofapache-pulsar-scylladb.md","How Proxima Beta Implemented CQRS and Event Sourcing on Top of Apache Pulsar and ScyllaDB",[52259],"Lei Shi",{"type":15,"value":52261,"toc":52526},[52262,52264,52273,52284,52288,52291,52297,52305,52310,52313,52317,52320,52323,52326,52331,52334,52338,52341,52344,52349,52352,52355,52361,52364,52375,52378,52381,52385,52388,52399,52402,52406,52409,52413,52416,52419,52423,52426,52432,52435,52438,52449,52452,52456,52459,52470,52474,52481,52487,52494,52501,52503,52508],[40,52263,19156],{"id":19155},[48,52265,52266,52267,52272],{},"As a part of Tencent Interactive Entertainment Group Global (IEG Global), ",[55,52268,52271],{"href":52269,"rel":52270},"https:\u002F\u002Fwww.levelinfinite.com\u002Fabout-us\u002F",[264],"Proxima Beta"," is committed to supporting our teams and studios to bring unique, exhilarating games to millions of players around the world. At Proxima Beta, our team is responsible for managing a wide range of risks to our business. 
As such, we must build an efficient real-time analytics system to consistently monitor all kinds of activities in our business domain.",[48,52274,52275,52276,4003,52279,52283],{},"In this blog, I will talk about our experience of building a real-time analytics system on top of ",[55,52277,821],{"href":23526,"rel":52278},[264],[55,52280,46570],{"href":52281,"rel":52282},"https:\u002F\u002Fwww.scylladb.com\u002F",[264],". Before I share our practices in detail, I will introduce two major architectures for data manipulation, namely CRUD and CQRS. I will also explain our reasons for combining CQRS and Event Sourcing to implement our service architecture, as well as their advantages over CRUD-based systems. Lastly, I will dive deeper into our practices of leveraging distinguishing features of Apache Pulsar for better data governance, such as multitenancy and geo-replication.",[40,52285,52287],{"id":52286},"a-stereotypical-crud-system","A stereotypical CRUD system",[48,52289,52290],{},"CRUD is the acronym for Create, Read, Update and Delete. It is one of the most common data processing methods for microservices development. These four operations are essential for managing persistent data, often used for relational database applications.",[48,52292,52293],{},[384,52294],{"alt":52295,"src":52296},"illustration of CQRS","\u002Fimgs\u002Fblogs\u002F63b5623369f7c2c9619e2fcb_image2-221115.png",[48,52298,52299,52300,4031],{},"This is another definition of CQRS in ",[55,52301,52304],{"href":52302,"rel":52303},"https:\u002F\u002Fmedium.com\u002Fmicroservicegeeks\u002Fintroduction-to-cqrs-64f609544f4a",[264],"Amanda Bennett's blog",[916,52306,52307],{},[48,52308,52309],{},"The Command and Query Responsibility Segregation (CQRS) pattern separates read and write operations for a data store. Reads and writes may take entirely different paths through the application and may be applied to different data stores. CQRS relies on asynchronous replication to progressively apply writes to the read view, so that changes to the application state instigated by the writer are eventually observed by the reader.",[48,52311,52312],{},"The key idea of CQRS is to explicitly build data models that serve reads and writes respectively instead of doing them against the same data model. This pattern is not very interesting by itself. However, it becomes extremely interesting when working together with Event Sourcing from an architectural point of view.",[40,52314,52316],{"id":52315},"event-sourcing-handling-operations-on-data-driven-by-events","Event Sourcing: Handling operations on data driven by events",[48,52318,52319],{},"The fundamental idea of Event Sourcing is to ensure every change to the state of an application is captured in an event object. Event objects are stored in the sequence they were applied for the application state itself. For the Event Sourcing pattern, instead of storing just the current state of the data in a domain, you use an append-only store to record the full series of actions taken on that data.",[48,52321,52322],{},"This idea is simple but really powerful because the event store acts as a system of records and can be used to materialize domain objects and views. As events represent every action that has been recorded, any possible model describing the system can be built from the events.",[48,52324,52325],{},"In reality, there are many cases where Event Sourcing is applied. 
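(As a tiny illustration of the idea, an event-sourced store records the actions themselves and derives the current state by replaying them. The figures below are made up purely for illustration.)

```yaml
# Hypothetical event log: state (the balance) is derived by replaying events,
# never by overwriting a single "balance" field in place.
- { seq: 1, timestamp: "2022-11-01T09:00:00Z", event: AccountOpened,   amount: 0,    balance_after: 0 }
- { seq: 2, timestamp: "2022-11-02T10:30:00Z", event: SalaryDeposited, amount: 500,  balance_after: 500 }
- { seq: 3, timestamp: "2022-11-03T08:15:00Z", event: RentPaid,        amount: -300, balance_after: 200 }
```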
A good example of Event Sourcing is our bank statement as shown in the table below.",[48,52327,52328],{},[384,52329],{"alt":21101,"src":52330},"\u002Fimgs\u002Fblogs\u002F63b5628fdc987102242d9086_table-timestamp-comment-change-balance.webp",[48,52332,52333],{},"In short, Event Sourcing tracks changes by capturing the sequence of actions instead of overwriting states deconstructively, which is what a CRUD system usually does.",[40,52335,52337],{"id":52336},"why-cqrs-and-event-sourcing","Why CQRS and Event Sourcing",[48,52339,52340],{},"CQRS-based implementations are often used together with the Even Sourcing pattern.",[48,52342,52343],{},"On the one hand, CQRS allows you to use Event Sourcing as a data storage mechanism, which is very important when building a non-trivial CQRS-based system. Although you can maintain relational models for reading and writing respectively, this practice requires high cost, since there is an event model required to synchronize the two. As mentioned above, CQRS fundamentally separates reads and writes into different models. This means with Event Sourcing, you can leverage the event model as the persistence model on the write side.",[48,52345,52346],{},[384,52347],{"alt":52295,"src":52348},"\u002Fimgs\u002Fblogs\u002F63b562a2d1a6151ffce18cf5_image3-221115.png",[48,52350,52351],{},"On the other hand, one of the major issues of using Event Sourcing alone is that you cannot perform a query like “Give me all users whose first names are Joe” to a system. This is impossible due to the lack of a representation of the current state. The only valid query to an Event sourcing system alone is GetEventById. The responsibility of maintaining the current state is shifted to event processors. Different processors can generate different views against the same events.",[48,52353,52354],{},"Here, I would like to share a real-world example to further explain why we selected CQRS with Event Sourcing for our service architecture.",[48,52356,52357],{},[384,52358],{"alt":52359,"src":52360},"Pulsar and ScyllaDB in service architecture","\u002Fimgs\u002Fblogs\u002F63b562a29b07677ec392149a_image6-221115.png",[48,52362,52363],{},"In this architecture, we use Apache Pulsar as the event storage solution because it meets the following needs:",[1666,52365,52366,52369,52372],{},[324,52367,52368],{},"Multitenancy and workload isolation. This feature is critical to large organizations like us. As multiple teams are working on the same set of data (event streams) in parallel, you must have fine-grained access control and prevent workloads interference with each other.",[324,52370,52371],{},"Scalability and elasticity. Since all activities will be captured as events and all events will be recorded for a certain amount of time, we need the ability to scale our cluster according to the volume of incoming traffic.",[324,52373,52374],{},"Geo-replication. Running a business across the globe is a challenging task, as we need to take different factors into consideration, such as policy compliance and network latency.",[48,52376,52377],{},"I will explain how Pulsar has helped us in these aspects in more detail in the next section.",[48,52379,52380],{},"On the read side of the system, any SQL\u002FNoSQL solutions that fit your query workload could be a good candidate. It is also possible to have more than one state store and optimize each of them for a certain kind of query. 
In our use case, since we are dealing with hundreds of thousands of game-playing sessions in parallel, we finally landed our solution on ScyllaDB as the state storage (An alternative implementation of Apache Cassandra, inspired by Amazon DynamoDB).",[40,52382,52384],{"id":52383},"a-multi-cluster-solution-built-on-apache-pulsar","A multi-cluster solution built on Apache Pulsar",[48,52386,52387],{},"There are different reasons for building a multi-cluster system as shown below:",[321,52389,52390,52393,52396],{},[324,52391,52392],{},"Achieve the Recovery Time Objectives (RTOs) and the Recovery Point Objectives (RPOs) of your organization",[324,52394,52395],{},"Lower network latency for better user experience",[324,52397,52398],{},"Comply with rules and regulations",[48,52400,52401],{},"In our case, low network latency and regulation compliance are top priorities. We are trying our best to make sure data is processed and saved in the right region. Let me quickly walk you through some typical approaches to deploying a multi-cluster system.",[32,52403,52405],{"id":52404},"independent-clusters-in-different-regions","Independent clusters in different regions",[48,52407,52408],{},"This approach runs multiple independent instances in different regions with no intercommunication. In some cases, eliminating cross-region connectivity is necessary. For example, you may need to deploy a dedicated cluster in a customer's private data center. The downside is that the maintenance cost will surge as you have more clusters. However, it does give you a high level of confidence in compliance, since there is no way you can accidentally process or save data in the wrong location.",[32,52410,52412],{"id":52411},"application-level-federation","Application-level federation",[48,52414,52415],{},"This solution pushes the complexity to the application layer. The application server coordinates with other peers to make sure data is saved to the right location. If your organization only runs one application and doesn't have a heterogeneous infrastructure, this approach probably makes more sense. This is because no matter how complex the implementation is, you only have to do it once.",[48,52417,52418],{},"In reality, however, a large organization may have hundreds of applications. We think it is not reasonable to ask every application developer to deal with a multi-cluster deployment. To make our developers less worried about complicated compliance issues, we took another approach, also known as the Global Data Ring.",[32,52420,52422],{"id":52421},"global-data-ring","Global Data Ring",[48,52424,52425],{},"This solution is a combination of policies and technologies. Every application only has access to local endpoints. Every cluster contains a Pulsar instance and a ScyllaDB instance. There is no interconnectivity between applications. This ensures that no application can accidentally access a region that it should not touch. Our Platform team can enforce this implementation without involving individual application developers.",[48,52427,52428],{},[384,52429],{"alt":52430,"src":52431},"logo pulsar and scylab on a target","\u002Fimgs\u002Fblogs\u002F63b562a219e386ffb5d90006_image7-221115.png",[48,52433,52434],{},"Figure 7",[48,52436,52437],{},"In this architecture, we are using Pulsar namespaces as geofencing data containers. Currently, we have three types of namespaces:",[321,52439,52440,52443,52446],{},[324,52441,52442],{},"Global. Geo-replication is enabled for the global namespace among all clusters. 
Applications running in different regions can share the same view of the namespace. Any data written to the global namespace is automatically replicated to the rest of the regions.",[324,52444,52445],{},"Regional. Geo-replication is not enabled for the regional namespace (or local namespace). The data will be stored in the same region as the writer.",[324,52447,52448],{},"Cross-region. Geo-replication is enabled for the cross-region namespace only with selected clusters. The reason is that in accordance with some local rules and regulations, we can copy the data out of a country in some cases, as long as the original copy stays within the border. This gives us great flexibility to move our workload to nearby regions, helping optimize our overall cost.",[48,52450,52451],{},"In this practice, the compliance policy can be represented by the system configuration of Pulsar and ScyllaDB, which is under the entire control of the Platform team. Those structured configurations are much easier for auditing and visualization, which eventually help us build better governance within our organization.",[40,52453,52455],{"id":52454},"whats-next-for-your-organization","What’s next for your organization",[48,52457,52458],{},"In this blog, I explained CRUD and CQRS, and why we should combine CQRS and Event Sourcing together. I hope our experience of implementing CQRS and Event Sourcing on top of Pulsar and ScyllaDB can be helpful to those who want to build similar architectures. And here are my suggestions for you in terms of short- and long-term planning.",[1666,52460,52461,52464,52467],{},[324,52462,52463],{},"Start by trying to build materialized views for queries.",[324,52465,52466],{},"Figure out what domain events your system can produce with your client.",[324,52468,52469],{},"Implement your data model based on the domain events of your client and try to establish a global data ring with your organization.",[40,52471,52473],{"id":52472},"reference","Reference",[48,52475,52476],{},[55,52477,52480],{"href":52478,"rel":52479},"https:\u002F\u002Fcqrs.files.wordpress.com\u002F2010\u002F11\u002Fcqrs_documents.pdf",[264],"CQRS Documents | Greg Young",[48,52482,52483],{},[55,52484,52486],{"href":52302,"rel":52485},[264],"Introduction to CQRS | Amanda Bennett",[48,52488,52489],{},[55,52490,52493],{"href":52491,"rel":52492},"https:\u002F\u002Flearn.microsoft.com\u002Fen-us\u002Fazure\u002Farchitecture\u002Fpatterns\u002Fcqrs",[264],"CQRS pattern - Azure Architecture Center | Microsoft Learn",[48,52495,52496],{},[55,52497,52500],{"href":52498,"rel":52499},"https:\u002F\u002Flearn.microsoft.com\u002Fen-us\u002Fazure\u002Farchitecture\u002Fpatterns\u002Fevent-sourcing",[264],"Event Sourcing pattern - Azure Architecture Center | Microsoft Learn",[40,52502,38376],{"id":38375},[48,52504,38379,52505,40419],{},[55,52506,38384],{"href":38382,"rel":52507},[264],[321,52509,52510,52515,52519],{},[324,52511,38390,52512,190],{},[55,52513,31914],{"href":31912,"rel":52514},[264],[324,52516,45476,52517,45480],{},[55,52518,3550],{"href":45479},[324,52520,52521,52525],{},[55,52522,52524],{"href":52523},"\u002Fblog\u002Fcommunity\u002F2022-11-04-announcing-pulsar-summit-asia-2022-conference-schedule\u002F","R﻿register now for free for Pulsar Summit Asia 2022","! 
Held on November 19th and 20th, this two-day virtual event will feature 36 sessions by developers, engineers, architects, and technologists from ByteDance, Huawei, Tencent, Nippon Telegraph and Telephone Corporation (NTT) Software Innovation Center, Yum China, Netease, vivo, WeChat, Nutanix, StreamNative, and many more.",{"title":18,"searchDepth":19,"depth":19,"links":52527},[52528,52529,52530,52531,52532,52537,52538,52539],{"id":19155,"depth":19,"text":19156},{"id":52286,"depth":19,"text":52287},{"id":52315,"depth":19,"text":52316},{"id":52336,"depth":19,"text":52337},{"id":52383,"depth":19,"text":52384,"children":52533},[52534,52535,52536],{"id":52404,"depth":279,"text":52405},{"id":52411,"depth":279,"text":52412},{"id":52421,"depth":279,"text":52422},{"id":52454,"depth":19,"text":52455},{"id":52472,"depth":19,"text":52473},{"id":38375,"depth":19,"text":38376},"2022-11-15","Take a deeper look at how Tencent IEG Global builds an efficient real-time analytics system to monitor the activities in their business domain.","\u002Fimgs\u002Fblogs\u002F63d795876c84e46d0c210e5b_63b562342aa8da5e1fe7a187_pulsar-and-scylladb-top-1.webp",{},"\u002Fblog\u002Fhow-proxima-beta-implemented-cqrs-event-sourcing-ofapache-pulsar-scylladb",{"title":52257,"description":52541},"blog\u002Fhow-proxima-beta-implemented-cqrs-event-sourcing-ofapache-pulsar-scylladb",[35559,821,8058,303],"uPxt6H3zsDSN5_iVoHzyyDuCBxNkC_FJc85mb0AHLxM",{"id":52550,"title":52551,"authors":52552,"body":52553,"category":7338,"createdAt":290,"date":52727,"description":52728,"extension":8,"featured":294,"image":52729,"isDraft":294,"link":290,"meta":52730,"navigation":7,"order":296,"path":52731,"readingTime":4475,"relatedResources":290,"seo":52732,"stem":52733,"tags":52734,"__hash__":52735},"blogs\u002Fblog\u002Fannouncing-conference-schedule-pulsar-summit-asia-2022.md","Announcing Conference Schedule for Pulsar Summit Asia 2022",[41185],{"type":15,"value":52554,"toc":52713},[52555,52558,52561,52564,52567,52571,52574,52577,52581,52584,52587,52591,52594,52597,52600,52604,52607,52610,52614,52617,52620,52624,52627,52630,52633,52637,52640,52643,52651,52655,52658,52672,52678,52681,52684,52686,52692],[48,52556,52557],{},"This August, we concluded Pulsar Summit SF, our first-ever, in-person event in North America. It witnessed over 12 breakout sessions and 5 keynotes with over 200 attendees from Apple, Blizzard, IBM, Optum, Iterable, Twitter, Uber, and many more. As we could see from the conference, there is an increase in the adoption of Pulsar and growing interest in messaging and streaming. And now, we are excited to invite you to Pulsar Summit Aisa 2022 to explore the latest messaging and streaming technologies!",[48,52559,52560],{},"Held on November 19th and 20th, this two-day virtual event will feature 36 sessions by developers, engineers, architects, and technologists from ByteDance, Huawei, Tencent, Nippon Telegraph and Telephone Corporation (NTT) Software Innovation Center, Yum China, Netease, vivo, WeChat, Nutanix, StreamNative, and many more. 
It will include sessions on Pulsar use cases, its ecosystem, operations, and technology deep dives.",[40,52562,52563],{"id":36414},"Featured sessions",[48,52565,52566],{},"Let’s have a quick look at some of the featured sessions.",[32,52568,52570],{"id":52569},"handling-100k-consumers-with-one-topic-practices-and-technical-details-english","Handling 100K Consumers with One Topic: Practices and Technical Details (English)",[48,52572,52573],{},"Hongjie Zhai, Researcher, NTT Software Innovation Center",[48,52575,52576],{},"With the development of smart factories and connected vehicles, exchanging messages between a considerable number of devices is required for monitoring and controlling systems. In this connection, Apache Pulsar is one of the best solutions to keeping message pipelines simple, real-time, and safe. As the current message brokers are mainly designed for cloud services, users may face performance problems when they have too many consumers. This session will share the practices and technical details of handling 100K consumers with a single Pulsar Topic.",[32,52578,52580],{"id":52579},"awesome-pulsar-in-yum-china-english","Awesome Pulsar in Yum China (English)",[48,52582,52583],{},"Chauncey Yan, Backend Engineer, Yum China",[48,52585,52586],{},"Yum China Holdings, Inc. is the largest restaurant company in China with a vision of becoming the world’s most innovative pioneer in the restaurant industry. After its technical research on next-generation message queue solutions, it ultimately selected Pulsar as the implementation standard of message middleware for its technical middle platform. To date, Pulsar has been widely applied in different scenarios within Yum China such as business middle platform and system observability. This talk will be focused on why Yum China selected Pulsar for production use and its experience of performance tuning.",[32,52588,52590],{"id":52589},"streaming-wars-and-how-apache-pulsar-is-acing-the-battle-english","Streaming Wars and How Apache Pulsar is Acing the Battle (English)",[48,52592,52593],{},"Shivji Kumar Jha, Staff Engineer, Nutanix",[48,52595,52596],{},"Sachidananda Maharana, MTS IV, Nutanix",[48,52598,52599],{},"This session will cover the operational challenges Nutanix has faced over the past 4 years when running Pulsar and how Pulsar fits into different use cases given its multi-tenancy and configurability. It will also talk about how Nutanix has aced these challenges to stick to Pulsar and even moved applications from other messaging solutions to Pulsar. It will end with the challenges and learnings on migrating from Kafka and Kinesis to Pulsar.",[32,52601,52603],{"id":52602},"pulsar-envoy-building-an-oto-marketing-platform-for-different-business-scenarios-on-microservices-mandarin","Pulsar + Envoy: Building an OTO Marketing Platform for Different Business Scenarios on Microservices (Mandarin)",[48,52605,52606],{},"Jason Jiang, Senior Engineer, Tencent",[48,52608,52609],{},"In game marketing, the OTO (One-time Offer) model provides an effective way to improve user experience. A very common OTO scenario is to recommend certain material when the player has met the required conditions. To offer such a capability with low cost and high efficiency, you can use a customized Envoy plugin to support the Pulsar protocol. With Envoy’s flexible routing configurations and various functional plugins, you can provide solutions to different OTO business scenarios with microservices. 
In this session, Jason Jiang from Tencent will share their experience of using Pulsar and Envoy to create an OTO marketing platform built on microservices for different business scenarios.",[32,52611,52613],{"id":52612},"apache-pulsar-in-volcano-engine-e-mapreduce-mandarin","Apache Pulsar in Volcano Engine E-MapReduce (Mandarin)",[48,52615,52616],{},"Xin Liang, Senior Engineer, ByteDance",[48,52618,52619],{},"This session will introduce Volcano Engine E-MapReduce, a stateless, open-source big data platform, as well as the motivation for integrating Pulsar as a new cluster type into the Volcano Engine E-MapReduce ecosystem. It will cover some use cases of Pulsar in Volcano E-MapReduce, especially in real-time data warehouse and stream processing. It will also discuss typical real-time scenarios and common problems, and provide possible solutions powered by Pulsar with related services in Volcano Engine E-MapReduce.",[32,52621,52623],{"id":52622},"a-new-way-of-managing-pulsar-with-infrastructure-as-code-mandarin","A New Way of Managing Pulsar with Infrastructure as Code (Mandarin)",[48,52625,52626],{},"Max Xu, Software Engineer, StreamNative",[48,52628,52629],{},"Fushu Wang, Cloud Engineer, StreamNative",[48,52631,52632],{},"Infrastructure as Code (IaC) is the process of managing and provisioning infrastructure resources through code instead of manual configurations. IaC offers the benefits of understandable code, idempotence, and consistency over traditional manual configurations. StreamNative developed the Terraform Provider for Pulsar and the Pulsar Resources Operator, which utilize Terraform and Kubernetes CRDs respectively to provide declarative management of Pulsar resources, such as tenants, namespaces, and topics. In this talk, two engineers from StreamNative will discuss how to leverage these two IaC tools to help you better manage Pulsar.",[32,52634,52636],{"id":52635},"stability-optimization-of-apache-pulsar-at-huawei-mobile-services-mandarin","Stability Optimization of Apache Pulsar at Huawei Mobile Services (Mandarin)",[48,52638,52639],{},"Lin Lin, Apache Pulsar PMC Member, SDE Expert, Huawei",[48,52641,52642],{},"HUAWEI Mobile Services is dedicated to enriching users’ lives with next-level content and services that meet every conceivable need and span diverse fields, including smart home, health and fitness, mobile office, apps, smart travel, and entertainment. Currently, HUAWEI Health, AppGallery, HUAWEI Video, and HUAWEI Mobile Cloud are all running on top of HUAWEI Mobile Services. In this session, Lin from Huawei will share their practices of using Apache Pulsar in complex business scenarios at Huawei Mobile Services and propose some of their enhancements for better stability.",[48,52644,52645,52646,52650],{},"To learn more about how companies and organizations today leverage Apache Pulsar for streaming and messaging, especially for mission-critical deployments in production, see the ",[55,52647,52649],{"href":51625,"rel":52648},[264],"complete list of sessions"," in Pulsar Summit Asia 2022.",[40,52652,52654],{"id":52653},"how-to-participate","How to participate",[48,52656,52657],{},"Pulsar Summit Asia 2022 is a virtual conference gathering speakers and audiences from different regions. 
As such, we arranged the conference schedule with time zones, regions, and languages all taken into consideration to provide the best possible experience.",[48,52659,52660,52661,52666,52667,52671],{},"All sessions on Day 1 will be presented in Mandarin, and all talks on Day 2 will be in English. For the Chinese audience, you can register on ",[55,52662,52665],{"href":52663,"rel":52664},"https:\u002F\u002Fwww.huodongxing.com\u002Fevent\u002F8674136399923",[264],"Huodongxing.com"," with your WeChat account. For non-Chinese or English audiences, you can register on this ",[55,52668,51627],{"href":52669,"rel":52670},"https:\u002F\u002Fstreamnative.zoom.us\u002Fwebinar\u002Fregister\u002F9716668631084\u002FWN_qKibcbEFTxKv6-MszyFeAg",[264]," to watch the English sessions with Zoom.",[48,52673,52674,52675,52677],{},"Register now for free! Contact us at ",[55,52676,39814],{"href":39813}," if you have any questions.",[40,52679,52680],{"id":39834},"About the organizer",[48,52682,52683],{},"StreamNative is the organizer of Pulsar Summit Asia 2022. Founded by the original developers of Apache Pulsar and Apache BookKeeper, StreamNative builds a cloud-native event streaming platform that enables enterprises to easily access data as real-time event streams. As the core developers of Pulsar, the StreamNative team is deeply versed in the technology, the community, and the use cases. Today, StreamNative is focusing on growing the Apache Pulsar and BookKeeper communities and bringing its deep experience across diverse Pulsar use cases to companies across the globe.",[40,52685,40413],{"id":36476},[48,52687,52688,52689,38385],{},"As we can see from the topics submitted to Pulsar Summit Asia 2022, Apache Pulsar has become ",[55,52690,38384],{"href":38382,"rel":52691},[264],[1666,52693,52694,52699,52703],{},[324,52695,38390,52696,190],{},[55,52697,31914],{"href":31912,"rel":52698},[264],[324,52700,45476,52701,45480],{},[55,52702,3550],{"href":45479},[324,52704,52705,52706,36501,52709,52712],{},"Join the Apache Pulsar community. 
",[55,52707,36500],{"href":36498,"rel":52708},[264],[55,52710,36505],{"href":31692,"rel":52711},[264]," to ask quick questions or discuss specialized topics.",{"title":18,"searchDepth":19,"depth":19,"links":52714},[52715,52724,52725,52726],{"id":36414,"depth":19,"text":52563,"children":52716},[52717,52718,52719,52720,52721,52722,52723],{"id":52569,"depth":279,"text":52570},{"id":52579,"depth":279,"text":52580},{"id":52589,"depth":279,"text":52590},{"id":52602,"depth":279,"text":52603},{"id":52612,"depth":279,"text":52613},{"id":52622,"depth":279,"text":52623},{"id":52635,"depth":279,"text":52636},{"id":52653,"depth":19,"text":52654},{"id":39834,"depth":19,"text":52680},{"id":36476,"depth":19,"text":40413},"2022-11-04","Check out the featured sessions and the conference schedule for Pulsar Summit Asia 2022.","\u002Fimgs\u002Fblogs\u002F63c7c0e21669dcd0275f8aec_63b561cd9b076746119188ff_pulsar-summit-asia-2022-top-image.jpeg",{},"\u002Fblog\u002Fannouncing-conference-schedule-pulsar-summit-asia-2022",{"title":52551,"description":52728},"blog\u002Fannouncing-conference-schedule-pulsar-summit-asia-2022",[5376,821],"rph3tvnuzMrxIlvCihIIdNyoBYzbFZbAxGW_XTEreqc",{"id":52737,"title":42741,"authors":52738,"body":52740,"category":821,"createdAt":290,"date":53253,"description":53254,"extension":8,"featured":294,"image":53255,"isDraft":294,"link":290,"meta":53256,"navigation":7,"order":296,"path":42740,"readingTime":31039,"relatedResources":290,"seo":53257,"stem":53258,"tags":53259,"__hash__":53260},"blogs\u002Fblog\u002Fusing-cloud-native-buildpacks-improve-function-image-building-capability-function-mesh.md",[52739],"Tian Fang",{"type":15,"value":52741,"toc":53232},[52742,52750,52753,52757,52760,52763,52766,52769,52773,52786,52789,52797,52800,52814,52818,52821,52838,52841,52845,52848,52852,52855,52872,52875,52879,52882,52899,52903,52906,52910,52913,52916,52920,52922,52944,52948,52954,52957,52960,52964,52967,52970,52976,52979,52985,52989,52992,52995,52998,53004,53006,53012,53015,53018,53021,53027,53030,53036,53039,53042,53048,53051,53058,53061,53064,53075,53078,53081,53084,53090,53092,53098,53102,53105,53116,53119,53123,53129,53132,53138,53141,53147,53149,53155,53158,53166,53179,53182,53188,53191,53197,53200,53204,53207,53210,53212,53217],[48,52743,52744,52745,52749],{},"The concept of Buildpacks was first conceived by Heroku in 2011. PaaS platforms like Heroku needed to support applications in multiple languages, which were often built with very similar logic. In January 2018, Pivotal and Heroku co-launched the ",[55,52746,52748],{"href":42679,"rel":52747},[264],"Cloud Native Buildpacks (CNB) project",", which joined the CNCF in October of the same year.",[48,52751,52752],{},"In this blog, I will give an overview of the CNB project and its core components, and then use an example to demonstrate how to use it to build images for Function Mesh.",[40,52754,52756],{"id":52755},"what-does-cnb-mean-for-developers-and-operators","What does CNB mean for developers and operators?",[48,52758,52759],{},"We know that the container runtime ecosystem today has long been more than a Docker monopoly. The advent of the Open Container Initiative (OCI) has set the standard for the industry, meaning that given an OCI image, any container runtime that implements the OCI standard can use that image properly.",[48,52761,52762],{},"Buildpacks is one such image builder that is able to produce OCI-compliant images. 
It satisfies the needs of both developers and operators and solves the conflict between the two groups.",[48,52764,52765],{},"The CNB project shields developers from the details of the application building and deployment process. They don’t need to understand and write the code for the runtime environment, or worry about details, such as which operating system to use for the image, the differences in scripts under such operating system, and image size optimization. When using CNB, developers only need to select the appropriate builder image and then provide their source code directory to build the application image.",[48,52767,52768],{},"For the Ops team, they can assemble the application image builder with several Buildpacks (the minimal build unit in CNB) in a lego-like manner to meet various needs. Based on the mechanism between the base runtime environment and the application artifacts (i.e. ABI) in the CNB image, operators can replace the base runtime environment in the application image with a single command when there is a CVE in the image's base runtime environment. They don’t need to rebuild a new image and make any adaptive changes for the new base runtime environment.",[40,52770,52772],{"id":52771},"why-does-function-mesh-need-cnb","Why does Function Mesh need CNB?",[48,52774,52775,52778,52779,4003,52782,52785],{},[55,52776,29463],{"href":29461,"rel":52777},[264]," is a serverless framework purpose-built for stream processing applications. It brings powerful event-streaming capabilities to your applications by orchestrating multiple ",[55,52780,15627],{"href":50630,"rel":52781},[264],[55,52783,50636],{"href":50634,"rel":52784},[264]," for complex stream processing jobs.",[48,52787,52788],{},"A serverless framework like Function Mesh inevitably needs to provide a way for users to submit their functions when it is working. There are currently two common ways to do this.",[1666,52790,52791,52794],{},[324,52792,52793],{},"Upload the function to the package management service of the Pulsar cluster",[324,52795,52796],{},"Customize the function Docker image",[48,52798,52799],{},"Both approaches involve plenty of repetitive manual operations, including compiling, packaging, and uploading the function code to package management systems, and writing Dockerfiles.",[48,52801,52802,52803,4003,52808,52813],{},"CNB is well suited for scenarios where the build process is constant and has proven to be working on serverless frameworks such as ",[55,52804,52807],{"href":52805,"rel":52806},"https:\u002F\u002Fcloud.google.com\u002Ffunctions",[264],"Google Cloud Functions",[55,52809,52812],{"href":52810,"rel":52811},"https:\u002F\u002Fopenfunction.dev\u002F",[264],"OpenFunction",". 
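(For context, the developer-facing and operator-facing commands described above typically look like the following with CNB's pack CLI. The image, builder, and run-image names are placeholders, not values from this tutorial's later steps.)

```bash
# Developer: build an OCI image straight from source with a chosen builder
pack build my-function:v1 --builder my-registry/java-builder:v1 --path ./my-function

# Operator: swap the base run image underneath an existing application image
# without rebuilding it (for example, after a CVE in the runtime environment)
pack rebase my-function:v1 --run-image my-registry/fm-stack-java-runner-run:v2
```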
Thus, we have reason to believe that CNB will help improve the image building experience of Function Mesh.",[40,52815,52817],{"id":52816},"cnb-components","CNB components",[48,52819,52820],{},"Cloud Native Buildpacks consist of the following main components.",[321,52822,52823,52826,52829,52832,52835],{},[324,52824,52825],{},"Buildpack: The minimal build unit.",[324,52827,52828],{},"Stack: Provides the base runtime environment for the build phase and the application runtime phase.",[324,52830,52831],{},"Lifecycle: A lifecycle management interface abstracted from CNB to guide the entire build process.",[324,52833,52834],{},"Builder: A builder that integrates a Stack and several Buildpacks with a specific build purpose.",[324,52836,52837],{},"Platform: The executor of the interfaces in the lifecycle to meet the user's build requirements.",[48,52839,52840],{},"First, let’s look at these components in detail and how they can work together. Later, I will use an example to demonstrate how to create them.",[32,52842,52844],{"id":52843},"stack","Stack",[48,52846,52847],{},"A Stack entity is composed of two OCI images, namely the build image and the run image. For example, we can use Ubuntu as the base runtime environment for the build, and then run different phases for the application and install the required software.",[32,52849,52851],{"id":52850},"buildpack","Buildpack",[48,52853,52854],{},"To build a Java application, typically, the build logic is comprised of the following steps.",[1666,52856,52857,52860,52863,52866,52869],{},[324,52858,52859],{},"Check if there is a Java code file in the target directory (i.e., files with the .java suffix).",[324,52861,52862],{},"Check if there is a pom.xml file in the target directory.",[324,52864,52865],{},"Make sure the necessary compilation tools such as maven are in the PATH.",[324,52867,52868],{},"Run mvn clean install -B -DskipTests to compile and package the application.",[324,52870,52871],{},"Set the entry point for the image to start the application.",[48,52873,52874],{},"A good principle for making a simple Buildpack is to determine the contents of each Buildpack based on the build steps, so now we need to make 5 Buildpacks.",[32,52876,52878],{"id":52877},"lifecycle","Lifecycle",[48,52880,52881],{},"Lifecycle is the most important component of CNB. It is essentially an abstraction and orchestration of the build steps from the source code to the image, and its main phases are listed as follows.",[321,52883,52884,52887,52890,52893,52896],{},[324,52885,52886],{},"Detect: Checks which Buildpack is to be executed",[324,52888,52889],{},"Build: Executes the build logic in the Buildpack",[324,52891,52892],{},"Analyze: Handles the cached content of the build process",[324,52894,52895],{},"Export: Exports the OCI image",[324,52897,52898],{},"Rebase: Replaces the base runtime environment of the application image",[32,52900,52902],{"id":52901},"builder","Builder",[48,52904,52905],{},"A Builder entity is an OCI image. 
By aggregating a Stack, several Buildpacks, and a Lifecycle (which does not need to be prepared by the user), and specifying the execution order of these Buildpacks, a builder with a specific build purpose is produced.",[32,52907,52909],{"id":52908},"platform","Platform",[48,52911,52912],{},"After you have the Builder ready, you can use the Platform to apply the Builder to the given source code, complete the execution in the Lifecycle, execute Buildpacks in a given order, and finally build the source code into an image and export it.",[48,52914,52915],{},"Common Platforms include Tekton and CNB's pack-cli.",[40,52917,52919],{"id":52918},"building-a-java-function-image-with-function-mesh-buildpacks","Building a Java function image with Function Mesh Buildpacks",[32,52921,10104],{"id":10103},[321,52923,52924,52930,52936],{},[324,52925,52926],{},[55,52927,52929],{"href":23526,"rel":52928},[264],"Apache Pulsar 2.7.0 or higher",[324,52931,52932],{},[55,52933,52935],{"href":29461,"rel":52934},[264],"Function Mesh 0.1.3 or higher",[324,52937,52938,52943],{},[55,52939,52942],{"href":52940,"rel":52941},"https:\u002F\u002Fbuildpacks.io\u002Fdocs\u002Ftools\u002Fpack\u002F#install",[264],"Pack",", CLI tools for manipulating Cloud Native Buildpacks",[32,52945,52947],{"id":52946},"directory-structure","Directory structure",[8325,52949,52952],{"className":52950,"code":52951,"language":8330},[8328],".\n|-- builders\n|   `-- java-builder\n|       `-- builder.toml\n|-- buildpacks\n|   `-- java-maven\n|       |-- bin\n|       |   |-- build\n|       |   `-- detect\n|       `-- buildpack.toml\n`-- stack\n   |-- stack.build.Dockerfile\n   `-- stack.java-runner.run.Dockerfile\n",[4926,52953,52951],{"__ignoreMap":18},[32,52955,52844],{"id":52956},"stack-1",[48,52958,52959],{},"As I mentioned above, the Stack provides basic building and running environments for an application (in this case, a Java function). It is composed of a build image to construct the build environment and a run image to build application images.",[3933,52961,52963],{"id":52962},"create-the-build-image","Create the build image",[48,52965,52966],{},"The build image provides the OS environment for the application during the building phase. 
Note that the Stack ID is io.functionmesh.stack in this example.",[48,52968,52969],{},"stack.build.Dockerfile",[8325,52971,52974],{"className":52972,"code":52973,"language":8330},[8328],"FROM ubuntu:20.04\n\nARG pulsar_uid=10000\nARG pulsar_gid=10001\nARG stack_id=\"io.functionmesh.stack\"\n\nRUN apt-get update && \\\\\napt-get install -y xz-utils ca-certificates git wget jq gcc && \\\\\nrm -rf \u002Fvar\u002Flib\u002Fapt\u002Flists\u002F* && \\\\\nwget -O \u002Fusr\u002Flocal\u002Fbin\u002Fyj  && \\\\\nchmod +x \u002Fusr\u002Flocal\u002Fbin\u002Fyj\n\nLABEL io.buildpacks.stack.id=${stack_id}\n\nRUN groupadd pulsar --gid ${pulsar_gid} && \\\\\nuseradd --uid ${pulsar_uid} --gid ${pulsar_gid} -m -s \u002Fbin\u002Fbash pulsar\n\nENV CNB_USER_ID=${pulsar_uid}\nENV CNB_GROUP_ID=${pulsar_gid}\nENV CNB_STACK_ID=${stack_id}\n\nUSER ${CNB_USER_ID}:${CNB_GROUP_ID}\n\n",[4926,52975,52973],{"__ignoreMap":18},[48,52977,52978],{},"Use the following command to create it.",[8325,52980,52983],{"className":52981,"code":52982,"language":8330},[8328],"docker build -t fm-stack-build:v1 -f .\u002Fstack.build.Dockerfile .\n",[4926,52984,52982],{"__ignoreMap":18},[3933,52986,52988],{"id":52987},"create-the-run-image","Create the run image",[48,52990,52991],{},"The run image provides the OS environment and Pulsar Function runtime for the application during the running phase.",[48,52993,52994],{},"stack.run.Dockerfile",[48,52996,52997],{},"Note that this example uses streamnative\u002Fpulsar-functions-java-runner:2.9.2.23 as the base image. You can also change the version of the base image as needed.",[8325,52999,53002],{"className":53000,"code":53001,"language":8330},[8328],"FROM streamnative\u002Fpulsar-functions-java-runner:2.9.2.23\n\nARG pulsar_uid=10000\nARG pulsar_gid=10001\nARG stack_id=\"io.functionmesh.stack\"\nLABEL io.buildpacks.stack.id=${stack_id}\n\nENV CNB_USER_ID=${pulsar_uid}\nENV CNB_GROUP_ID=${pulsar_gid}\nENV CNB_STACK_ID=${stack_id}\n",[4926,53003,53001],{"__ignoreMap":18},[48,53005,52978],{},[8325,53007,53010],{"className":53008,"code":53009,"language":8330},[8328],"docker build -t fm-stack-java-runner-run:v1 -f .\u002Fstack.java-runner.run.Dockerfile .\n",[4926,53011,53009],{"__ignoreMap":18},[32,53013,42681],{"id":53014},"buildpacks",[48,53016,53017],{},"In this example, we need a Buildpack to check whether the Java files (with the suffix “.java”) and the required items (e.g. “pom.xml”) exist. If they do exist, we can build the target artifact (usually a “.jar” file) with Maven and move it to \u002Fpulsar.",[48,53019,53020],{},"Use the following command to create the Buildpack. 
Note that the Buildpack ID is functionmesh\u002Fjava-maven in this example.",[8325,53022,53025],{"className":53023,"code":53024,"language":8330},[8328],"pack buildpack new functionmesh\u002Fjava-maven \\\\\n  --api 0.7 \\\\\n  --path java-maven \\\\\n  --version 0.0.1 \\\\\n  --stacks io.functionmesh.stack\n",[4926,53026,53024],{"__ignoreMap":18},[48,53028,53029],{},"We can find that a directory named java-maven has been created.",[8325,53031,53034],{"className":53032,"code":53033,"language":8330},[8328],"`-- java-maven\n  |-- bin\n  |   |-- build\n  |   `-- detect\n  `-- buildpack.toml\n",[4926,53035,53033],{"__ignoreMap":18},[48,53037,53038],{},"buildpack.toml",[48,53040,53041],{},"buildpack.toml is the configuration file for the Buildpack, which contains the buildpack id, the stack id, and other information.",[8325,53043,53046],{"className":53044,"code":53045,"language":8330},[8328],"api = \"0.7\"\n\n[buildpack]\nid = \"functionmesh\u002Fjava-maven\"\nversion = \"0.0.1\"\n\n[[stacks]]\nid = \"io.functionmesh.stack\"\n",[4926,53047,53045],{"__ignoreMap":18},[48,53049,53050],{},"bin\u002Fdetect & bin\u002Fbuild",[48,53052,53053,53054,190],{},"Create two scripts of bin\u002Fdetect and bin\u002Fbuild. You can find them on this ",[55,53055,51627],{"href":53056,"rel":53057},"https:\u002F\u002Ffunctionmesh.io\u002Fdocs\u002Fnext\u002Ffunctions\u002Fpackage-function\u002Fpackage-function-java\u002F#create-a-buildpack",[264],[48,53059,53060],{},"The contents of bin\u002Fdetect check if the Buildpack can be applied to the source code. In this example, bin\u002Fdetect will check if the source directory includes .java files, and if so, the script will return true and let the Buildpack be applied to this source.",[48,53062,53063],{},"The contents of bin\u002Fbuild compiles the source code. 
The script is used to:",[321,53065,53066,53069,53072],{},[324,53067,53068],{},"Download mvn and jdk tools",[324,53070,53071],{},"Build the package",[324,53073,53074],{},"Clear the source code",[32,53076,52902],{"id":53077},"builder-1",[48,53079,53080],{},"A Builder is an image that contains all the necessary components to execute a build.",[48,53082,53083],{},"builder.toml",[8325,53085,53088],{"className":53086,"code":53087,"language":8330},[8328],"# Buildpacks to include in builder\n[[buildpacks]]\nuri = \"..\u002F..\u002Fbuildpacks\u002Fjava-maven\"\n\n# Order used for detection\n[[order]]\n  # This buildpack will display build-time information (as a dependency)\n  [[order.group]]\n  id = \"functionmesh\u002Fjava-maven\"\n  version = \"0.0.1\"\n\n# Stack that will be used by the builder\n[stack]\nid = \"io.functionmesh.stack\"\n# This image is used at runtime\nrun-image = \"fm-stack-java-runner-run:v1\"\n# This image is used at build-time\nbuild-image = \"fm-stack-build:v1\"\n",[4926,53089,53087],{"__ignoreMap":18},[48,53091,52978],{},[8325,53093,53096],{"className":53094,"code":53095,"language":8330},[8328],"pack builder create fm-java-maven-builder:v1 \\\\\n --config .\u002Fbuilder.toml \\\\\n --pull-policy if-not-present\n",[4926,53097,53095],{"__ignoreMap":18},[32,53099,53101],{"id":53100},"build-a-java-function-image-and-create-a-function","Build a Java function image and create a Function",[48,53103,53104],{},"So far, we have created the following images:",[321,53106,53107,53110,53113],{},[324,53108,53109],{},"A Stack build image: fm-stack-build:v1",[324,53111,53112],{},"A Stack run image: fm-stack-java-runner-run:v1",[324,53114,53115],{},"A Builder image: fm-java-maven-builder:v1",[48,53117,53118],{},"Now let's write a Java function file.",[3933,53120,53122],{"id":53121},"package-directory-structure","Package directory structure",[8325,53124,53127],{"className":53125,"code":53126,"language":8330},[8328],".\n|-- pom.xml\n`-- src\u002F\n  `-- main\u002F\n      `-- java\u002F\n          `-- io.streamnative.example\u002F\n              `-- ExclamationFunction.java\n",[4926,53128,53126],{"__ignoreMap":18},[48,53130,53131],{},"The ExclamationFunction.java file:",[8325,53133,53136],{"className":53134,"code":53135,"language":8330},[8328],"package io.streamnative.example;\n\nimport org.apache.pulsar.functions.api.Context;\nimport org.apache.pulsar.functions.api.Function;\nimport org.slf4j.Logger;\n\npublic class ExclamationFunction implements Function {\n  @Override\n  public String process(String input, Context context) {\n      Logger LOG = context.getLogger();\n      LOG.debug(\"My exclamation function\");\n      return String.format(\"%s!\", input);\n  }\n}\n\n",[4926,53137,53135],{"__ignoreMap":18},[48,53139,53140],{},"Build the function image in the current directory by running the following command.",[8325,53142,53145],{"className":53143,"code":53144,"language":8330},[8328],"pack build java-exclamation-function:v1 \\\\\n  --builder fm-java-maven-builder:v1 \\\\\n  --workspace \u002Fpulsar \\\\\n  --pull-policy if-not-present\n",[4926,53146,53144],{"__ignoreMap":18},[48,53148,44350],{},[8325,53150,53153],{"className":53151,"code":53152,"language":8330},[8328],"$ pack build java-exclamation-function:v1 \\\\\n  --builder fm-java-maven-builder:v1 \\\\\n  --workspace \u002Fpulsar \\\\\n  --pull-policy if-not-present\n===> ANALYZING\n[analyzer] Previous image with name \"java-exclamation-function:v1\" not found\n===> DETECTING\n[detector] functionmesh\u002Fjava-maven 0.0.1\n===> 
RESTORING\n===> BUILDING\n[builder] ---> Installing Maven\n[builder] ---> Running Maven\n[builder] [INFO] BUILD SUCCESS\n……\nSuccessfully built image java-exclamation-function:v1\n",[4926,53154,53152],{"__ignoreMap":18},[48,53156,53157],{},"After uploading the image java-exclamation-function:v1 to the image repository, you can use the image to create a Function object.",[48,53159,39639,53160,53165],{},[55,53161,53164],{"href":53162,"rel":53163},"https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=7Yih40Gcr-w",[264],"demo"," video.",[48,53167,53168,53169,4003,53174,190],{},"For examples of other runtimes, refer to ",[55,53170,53173],{"href":53171,"rel":53172},"https:\u002F\u002Ffunctionmesh.io\u002Fdocs\u002Fnext\u002Ffunctions\u002Fpackage-function\u002Fpackage-function-python",[264],"Package Python Functions",[55,53175,53178],{"href":53176,"rel":53177},"https:\u002F\u002Ffunctionmesh.io\u002Fdocs\u002Fnext\u002Ffunctions\u002Fpackage-function\u002Fpackage-function-go",[264],"Package Go Functions",[48,53180,53181],{},"Another amazing thing about CNB is that when the runtime-runner image needs a patch update (for example, fixing a critical CVE that requires the version number of the runtime-runner image to be changed, like streamnative\u002Fpulsar-functions-java-runner:2.9.2.23-patch), you just need to prepare a new runtime image fm-stack-java-runner-run:v1-patch as follows.",[8325,53183,53186],{"className":53184,"code":53185,"language":8330},[8328],"FROM streamnative\u002Fpulsar-functions-java-runner:2.9.2.23-patch\n\nARG pulsar_uid=10000\nARG pulsar_gid=10001\nARG stack_id=\"io.functionmesh.stack\"\nLABEL io.buildpacks.stack.id=${stack_id}\n\nENV CNB_USER_ID=${pulsar_uid}\nENV CNB_GROUP_ID=${pulsar_gid}\nENV CNB_STACK_ID=${stack_id}\n",[4926,53187,53185],{"__ignoreMap":18},[48,53189,53190],{},"Then, use the CNB rebase interface to replace the run image in the function image java-exclamation-function:v1 with the following.",[8325,53192,53195],{"className":53193,"code":53194,"language":8330},[8328],"pack rebase java-exclamation-function:v1 --run-image fm-stack-java-runner-run:v1-patch --pull-policy if-not-present\n",[4926,53196,53194],{"__ignoreMap":18},[48,53198,53199],{},"This way, you don't even need to change the function configuration. You just need to restart its workload to apply the function to the place where the function-runner has been replaced.",[40,53201,53203],{"id":53202},"future-work","Future work",[48,53205,53206],{},"I think we can already feel the changes that the CNB project has made to the serverless technology or to the Function Mesh project in terms of user experience. 
But there is still a lot of work to be done on how to seamlessly integrate CNB into a specific framework.",[48,53208,53209],{},"In the future development of Function Mesh, we plan to integrate CNB in a way that does not add complexity to the project itself, such as providing dedicated CLI tools combined with configurable builders.",[40,53211,38376],{"id":38375},[48,53213,38379,53214,40419],{},[55,53215,38384],{"href":38382,"rel":53216},[264],[321,53218,53219,53224,53228],{},[324,53220,38390,53221,190],{},[55,53222,31914],{"href":31912,"rel":53223},[264],[324,53225,45476,53226,45480],{},[55,53227,3550],{"href":45479},[324,53229,53230,52525],{},[55,53231,52524],{"href":52523},{"title":18,"searchDepth":19,"depth":19,"links":53233},[53234,53235,53236,53243,53251,53252],{"id":52755,"depth":19,"text":52756},{"id":52771,"depth":19,"text":52772},{"id":52816,"depth":19,"text":52817,"children":53237},[53238,53239,53240,53241,53242],{"id":52843,"depth":279,"text":52844},{"id":52850,"depth":279,"text":52851},{"id":52877,"depth":279,"text":52878},{"id":52901,"depth":279,"text":52902},{"id":52908,"depth":279,"text":52909},{"id":52918,"depth":19,"text":52919,"children":53244},[53245,53246,53247,53248,53249,53250],{"id":10103,"depth":279,"text":10104},{"id":52946,"depth":279,"text":52947},{"id":52956,"depth":279,"text":52844},{"id":53014,"depth":279,"text":42681},{"id":53077,"depth":279,"text":52902},{"id":53100,"depth":279,"text":53101},{"id":53202,"depth":19,"text":53203},{"id":38375,"depth":19,"text":38376},"2022-11-01","Take an in-depth look at the Cloud Native Buildpacks and how to use it to build images for Function Mesh.","\u002Fimgs\u002Fblogs\u002F63c7c0f2d4c8587ec846ae82_63b55d17cd4df794f9dbcc54_buildpacks-function-mesh-top.jpeg",{},{"title":42741,"description":53254},"blog\u002Fusing-cloud-native-buildpacks-improve-function-image-building-capability-function-mesh",[9636,16985],"sEXPcvxEr3nemcUCcgqv3KyBhrGBLRwsVfwHDLkzC6Y",{"id":53262,"title":40440,"authors":53263,"body":53264,"category":821,"createdAt":290,"date":53423,"description":53424,"extension":8,"featured":294,"image":53425,"isDraft":294,"link":290,"meta":53426,"navigation":7,"order":296,"path":40439,"readingTime":7986,"relatedResources":290,"seo":53427,"stem":53428,"tags":53429,"__hash__":53430},"blogs\u002Fblog\u002Fdeep-dive-into-transaction-buffer-apache-pulsar.md",[39879],{"type":15,"value":53265,"toc":53411},[53266,53279,53283,53286,53290,53293,53297,53304,53311,53315,53318,53321,53328,53332,53339,53343,53351,53354,53358,53361,53364,53366,53369,53371,53376,53409],[48,53267,53268,53269,53273,53274,53278],{},"In previous blog posts, we introduced the basic concept of Pulsar ",[55,53270,53272],{"href":53271},"\u002Fblog\u002Ftech\u002F2021-06-16-a-deep-dive-of-transactions-in-apache-pulsar\u002F","transactions"," as well as the design logic of ",[55,53275,53277],{"href":53276},"\u002Fblog\u002Fengineering\u002F2022-09-29-deep-dive-into-transaction-coordinators-in-apache-pulsar\u002F","the transaction coordinator",". In this blog post, we will take a closer a look at another core component of Pulsar transactions, namely the transaction buffer.",[40,53280,53282],{"id":53281},"what-is-the-transaction-buffer","What is the transaction buffer?",[48,53284,53285],{},"After you send messages to a topic partition with a transaction, the messages are stored in the transaction buffer (TB) of that partition. The transaction buffer provides committed guarantees for reads. 
All messages sent using the transaction are not visible to the consumer until the transaction is committed. If the transaction is aborted, the consumer will not be able to receive the messages.",[40,53287,53289],{"id":53288},"how-does-the-transaction-buffer-work","How does the transaction buffer work?",[48,53291,53292],{},"In Pulsar, all messages are immutable. You cannot individually delete or update a message that has been sent. You can only write particular messages to represent the status of the sent messages, just like marks for commit and abort operations. Pulsar adopts a strategy of restricted reading to implement transactions.",[32,53294,53296],{"id":53295},"transaction-marks","Transaction marks",[48,53298,53299,53300,53303],{},"When committing or aborting a transaction, a commit mark or abort mark will be appended to the topic ledger. The mark is not a real message and is not available to the client. It is only used to identify whether the transaction has been committed or aborted. These marks are stored in BookKeeper as shown below.\n",[384,53301],{"alt":18,"src":53302},"\u002Fimgs\u002Fblogs\u002F63b545ef0772b7013cfa1a53_image1-221024.png","Figure 1\nTransaction marks are mainly used for transaction buffer recovery. The transaction buffer reads entries from the topic’s managed ledger. When it detects a committed mark or an aborted mark in the ledger, it removes the transaction from ongoingTxns and updates maxReadPosition accordingly (I will explain these two attributes later).",[48,53305,53306,53307,53310],{},"If the entry detected is an aborted mark, the transaction buffer will retain this transaction in aborts (a map) to filter out aborted messages when dispatching massages.\n",[384,53308],{"alt":18,"src":53309},"\u002Fimgs\u002Fblogs\u002F63b545ef5d8ce74c0c5e23c1_image2-221024.png","Figure 2",[32,53312,53314],{"id":53313},"maxreadposition-and-aborted-transactions","maxReadPosition and aborted transactions",[48,53316,53317],{},"In order to improve message reading efficiency (especially for catch-up reads), we introduced maxReadPosition. It is mainly used for cumulative acknowledgments. Only messages before this position can be sent to consumers. With this abstraction, the broker does not need to cache the position of messages whose status is unknown.",[48,53319,53320],{},"Pulsar uses a map (aborts) to store all aborted transactions. When you send messages to consumers, you can use the map to check if the transaction messages have been aborted. After the ledger recorded by the transaction is deleted from the managed ledger, the transaction can be removed from aborts.",[48,53322,53323,53324,53327],{},"In this implementation, all the messages are appended to the topic. They are dispatched in published order instead of committed order. Since the consumer can only read messages before maxReadPosition, it increases end-to-end latency.\n",[384,53325],{"alt":18,"src":53326},"\u002Fimgs\u002Fblogs\u002F63b545efbe2e2aaa10665566_image3-221024.png","Figure 3",[32,53329,53331],{"id":53330},"ongoingtxns","OngoingTxns",[48,53333,53334,53335,53338],{},"Pulsar also maintains a map to record all ongoing transactions. It is used to help update maxReadPosition. If the transaction buffer has ongoing transactions, maxReadPosition should be the first ongoing transaction position - 1. 
If there is no ongoing transaction, maxReadPosition will be consistent with the position of the normal message published.\n",[384,53336],{"alt":18,"src":53337},"\u002Fimgs\u002Fblogs\u002F63b545efb14a84ecd9634fc1_image4-221024.png","Figure 4",[32,53340,53342],{"id":53341},"apply-lowwatermark-from-the-transaction-coordinator","Apply lowWaterMark from the transaction coordinator",[48,53344,53345,53346,53350],{},"The transaction coordinator stores the metadata information of ",[55,53347,53349],{"href":53348},"\u002Fblog\u002Fengineering\u002F2022-09-29-deep-dive-into-transaction-coordinators-in-apache-pulsar\u002F#low-watermark","lowWaterMark",". It indicates that the transactions before the lowWaterMark have either been committed or aborted.",[48,53352,53353],{},"The transaction buffer will obtain the lowWaterMark information when committing or aborting a transaction and store it in the transaction buffer. The lowWaterMark makes sure messages are not sent to the transaction buffer with ended transactions. Nevertheless, this is just a best-effort guarantee. Therefore, after each commit or abort operation, the lowWaterMark will be used again to check whether there is an ended transaction in ongoingTxns. If there is, the transaction will be aborted.",[32,53355,53357],{"id":53356},"transaction-buffer-snapshot","Transaction buffer snapshot",[48,53359,53360],{},"In order to recover maxReadPosition and aborted transactions, the transaction buffer takes a snapshot of them to persist at intervals. The snapshot is stored in a system topic __transaction_buffer_snapshot. Each namespace has a system topic to store all the transaction buffer snapshots in the namespace.",[48,53362,53363],{},"When the transaction buffer recovers maxReadPosition and aborted transactions, it reads their corresponding snapshot from the system topic. It then reads entries in the topic’s managed ledger to recover messages beginning from the recovered maxReadPosition. This way, it does not need to perform the recovery from the beginning.",[40,53365,319],{"id":316},[48,53367,53368],{},"The transaction buffer represents a key component for Pulsar transactions. When new messages arrive on a topic with a transaction, the broker will move them to the transaction buffer. 
The messages in the buffered state will not be available to consumers until the transaction is committed.",[40,53370,38376],{"id":38375},[48,53372,38379,53373,40419],{},[55,53374,38384],{"href":38382,"rel":53375},[264],[321,53377,53378,53383,53387,53394,53400],{},[324,53379,38390,53380,190],{},[55,53381,31914],{"href":31912,"rel":53382},[264],[324,53384,45476,53385,45480],{},[55,53386,3550],{"href":45479},[324,53388,53389,758,53391],{},[2628,53390,40436],{},[55,53392,53393],{"href":53271},"A Deep Dive of Transactions in Apache Pulsar",[324,53395,53396,758,53398],{},[2628,53397,40436],{},[55,53399,40448],{"href":53276},[324,53401,53402,758,53404],{},[2628,53403,42753],{},[55,53405,53408],{"href":53406,"rel":53407},"https:\u002F\u002Fpulsar.apache.org\u002Fdocs\u002Ftxn-how",[264],"How do transactions work?",[48,53410,3931],{},{"title":18,"searchDepth":19,"depth":19,"links":53412},[53413,53414,53421,53422],{"id":53281,"depth":19,"text":53282},{"id":53288,"depth":19,"text":53289,"children":53415},[53416,53417,53418,53419,53420],{"id":53295,"depth":279,"text":53296},{"id":53313,"depth":279,"text":53314},{"id":53330,"depth":279,"text":53331},{"id":53341,"depth":279,"text":53342},{"id":53356,"depth":279,"text":53357},{"id":316,"depth":19,"text":319},{"id":38375,"depth":19,"text":38376},"2022-10-24","Take a look at what transaction buffer is and how it works for Apache Pulsar transactions.","\u002Fimgs\u002Fblogs\u002F63c7c13319f6fe34ad74a9c4_63b545ef8867ad2ad4552f62_a-deep-dive-into-transaction-buffer-in-apache-pulsar-top.jpeg",{},{"title":40440,"description":53424},"blog\u002Fdeep-dive-into-transaction-buffer-apache-pulsar",[821,9144],"cZ_QRrfyyCB33RyGBLMygUAHVmxhZwTMj4enLR-klJ0",{"id":53432,"title":43592,"authors":53433,"body":53435,"category":821,"createdAt":290,"date":53616,"description":53617,"extension":8,"featured":294,"image":53618,"isDraft":294,"link":290,"meta":53619,"navigation":7,"order":296,"path":43591,"readingTime":42793,"relatedResources":290,"seo":53620,"stem":53621,"tags":53622,"__hash__":53623},"blogs\u002Fblog\u002F600k-topics-per-cluster-stability-optimization-apache-pulsar-tencent-cloud.md",[53434],"Xiaolong Ran",{"type":15,"value":53436,"toc":53610},[53437,53439,53447,53450,53454,53457,53460,53463,53469,53472,53475,53478,53481,53484,53489,53500,53503,53514,53517,53523,53526,53537,53541,53544,53547,53550,53558,53561,53569,53576,53580,53583,53586,53592,53595,53601,53604],[40,53438,19156],{"id":19155},[48,53440,53441,53446],{},[55,53442,53445],{"href":53443,"rel":53444},"https:\u002F\u002Fwww.tencentcloud.com\u002F",[264],"Tencent Cloud"," is a secure, reliable, and high-performance cloud computing service provided by Tencent, one of the largest Internet companies in China and beyond. With a worldwide network of data centers, Tencent Cloud is committed to offering industry-leading solutions that integrate its cloud computing, big data, artificial intelligence, Internet of Things, security, and other advanced technologies to support the digital transformation of enterprises around the world. Tencent Cloud has 70 availability zones in 26 geographical regions, serving millions of customers from more than 100 countries and regions.",[48,53448,53449],{},"Currently, a Pulsar cluster at Tencent Cloud can serve around 600,000 topics in production with cost controlled at a relatively low level for different use cases. 
In this blog post, I will share some of our practices of optimizing Apache Pulsar for better stability and performance over the past year.",[40,53451,53453],{"id":53452},"how-to-avoid-acknowledgment-holes","How to avoid acknowledgment holes",[48,53455,53456],{},"Different from other messaging systems, Pulsar supports both individual acknowledgments and cumulative acknowledgments (the latter is similar to Kafka offsets). Although individual message acknowledgments provide solutions to some online business scenarios, they also lead to acknowledgment holes.",[48,53458,53459],{},"Acknowledgment holes refer to the gaps between ranges, which result in fragmented acknowledgments. They are very common when you use shared subscriptions or choose to acknowledge messages individually. Pulsar uses an abstraction called individuallyDeletedMessages to track fragmented acknowledgments in the form of ranges (intervals). Essentially, this attribute is a collection of open and closed intervals. A square bracket means the message has been processed while a parenthesis indicates an acknowledgment hole.",[48,53461,53462],{},"In Figure 1, for example, in the first interval (5:1226..5:1280], 5 is the Ledger ID and 1226 and 1280 are the Entry IDs. As the interval is left-open and right-closed, it means 5:1280 is acknowledged and that 5:1226 is not.",[48,53464,53465],{},[384,53466],{"alt":53467,"src":53468},"example acknowledgment holes","\u002Fimgs\u002Fblogs\u002F63b543cad3c4c2cddd72a089_image1-221020.jpeg",[48,53470,53471],{},"There are many factors that can cause fragmented acknowledgments, such as the broker’s failure to process messages. In the early versions of Pulsar, there were no returns for acknowledgments, so we couldn’t ensure the acknowledgment request was correctly handled. In Apache Pulsar 2.8.0 and later versions, AckResponse was introduced for transaction messages to support returns. Another major cause is the client’s failure to call acknowledgments for some reason, which is very common in production.",[48,53473,53474],{},"To avoid acknowledgment holes, I listed the following two solutions that we tried at Tencent Cloud for your reference.",[48,53476,53477],{},"First, carefully configure the backlog size. In Pulsar, a message can be either a batch message or a single message. For a batch message, you don’t know the exact number of entries contained in it. Note that Pulsar parses batch messages on the consumer side instead of the broker side. In practice, however, it is rather difficult to precisely calculate the backlog size.",[48,53479,53480],{},"Second, create a broker compensatory mechanism for unacknowledged messages. As individuallyDeletedMessages contains information on unacknowledged messages, we can let the broker redeliver them to the client to fill the gaps.",[48,53482,53483],{},"Before I explain the details of the second solution, let’s take a look at the different stages in which messages can be in a topic. In Figure 2, a producer publishes messages on a topic, which are then received by a consumer. 
Messages in different states are marked in three colors.",[48,53485,53486],{},[384,53487],{"alt":53310,"src":53488},"\u002Fimgs\u002Fblogs\u002F63b543cab14a8443e161a10e_image2-221020.png",[321,53490,53491,53494,53497],{},[324,53492,53493],{},"Red: The latest messages sent to the topic.",[324,53495,53496],{},"Gray: The messages sent to the topic but not consumed by the consumer.",[324,53498,53499],{},"Blue: The messages already consumed and acknowledged.",[48,53501,53502],{},"Pulsar allows you to configure backlog policies to manage unacknowledged messages when the backlog size is exceeded.",[321,53504,53505,53508,53511],{},[324,53506,53507],{},"producer_exception: The broker disconnects from the client by throwing an exception. This tells the producer to stop sending new messages. It is the major policy that we are using in production at Tencent Cloud.",[324,53509,53510],{},"producer_request_hold: The broker holds and does not persist the producer's request payload. The producer will stop sending new messages.",[324,53512,53513],{},"consumer_backlog_eviction: The broker discards the oldest unacknowledged messages in the backlog to make sure the consumer can receive new messages. As messages are lost in this way, we haven’t used the policy in production.",[48,53515,53516],{},"So, how does Pulsar define the backlog size? In Figure 3, all messages in the stream have been consumed but not all of them have been acknowledged by the consumer.",[48,53518,53519],{},[384,53520],{"alt":53521,"src":53522},"table-zk-path-topics","\u002Fimgs\u002Fblogs\u002F63b544ec6526935728445e14_table-zk-path-topics.webp",[48,53524,53525],{},"The Pulsar community merged some code to fix the leak issue in Pulsar 2.8+. If you are using earlier versions, you might have some dirty data in your cluster. To clean up the data, we proposed the following solution.",[1666,53527,53528,53531,53534],{},[324,53529,53530],{},"Get a topic list through the ZooKeeper client (You can use it to read these paths and form topic names in a set format).",[324,53532,53533],{},"Use pulsar-admin to check whether these topics exist in the cluster. If they do not exist, the associated data must be dirty and should be deleted.",[324,53535,53536],{},"Keep in mind that you need to back up the data before the clean-up so that you can recover topics in case of any unexpected deletion.",[40,53538,53540],{"id":53539},"bookie-ledger-leaks","Bookie ledger leaks",[48,53542,53543],{},"In production, all our retention policies are no more than 15 days. Even if we add the TTL period (for example, also 15 days), the maximum message lifecycle should be 30 days. However, we found that some ledgers which were created 2 years ago still existed and could not be deleted (We are using an internal monitoring service that checks all ledger files on a regular basis).",[48,53545,53546],{},"One possible reason for orphan ledgers could be the bookie CLI commands. For example, when we use some CLI commands to check the status of a cluster, it may create a ledger on the bookie. However, the retention policy is not applicable to such ledgers.",[48,53548,53549],{},"To delete orphan ledgers, you can try the following ways:",[1666,53551,53552,53555],{},[324,53553,53554],{},"Obtain the metadata of the ledger. Each ledger has its own LedgerInfo, which stores its metadata, such as the creation time and the bookies that store the ledger data. 
If the ledger metadata are already missing, you can delete their corresponding ledgers directly.",[324,53556,53557],{},"As a Pulsar topic represents a sequence of ledgers, you can check whether a ledger still exists in the ledger list of the topic. If it does not exist, you can delete it.",[48,53559,53560],{},"When you try to delete orphan ledgers, you need to:",[321,53562,53563,53566],{},[324,53564,53565],{},"Pay special attention to the schema, which is mapped to a ledger in Pulsar. The schema ledger is stored on the bookie and the information about the schema itself is stored in ZooKeeper. If you delete the schema by accident, you need to delete the schema information on the broker first and try to recreate it from the producer side.",[324,53567,53568],{},"Back up your data first before you delete them.",[48,53570,53571,53572,190],{},"For more information about how to deal with orphan ledgers, see ",[55,53573,53575],{"href":53574},"\u002Fblog\u002Fengineering\u002F2022-09-27-a-deep-dive-into-topic-data-lifecycle-in-apache-pulsar\u002F","A Deep Dive into the Topic Data Lifecycle in Apache Pulsar",[40,53577,53579],{"id":53578},"cache-optimization","Cache optimization",[48,53581,53582],{},"Pulsar uses caches at different levels. Topics have their own caches on the broker side. Write caches and read caches in BookKeeper are allocated based on JVM direct memory (25% of direct memory). For hot data, generally, these caches can be hit and there is no need to read the actual data.",[48,53584,53585],{},"Figure 6 shows some cache metrics we observed in one of our production cases. There was a sharp decrease of read cache size, which led to the sudden increase of read cache misses. As a result, the reads on bookies saw a peak at 16:15 with the latency increasing to nearly 5 seconds. In fact, we noticed that this sudden peak happened periodically.",[48,53587,53588],{},[384,53589],{"alt":53590,"src":53591},"some graph to illustrate Cache optimization","\u002Fimgs\u002Fblogs\u002F63b544fd6526936acb446bcc_image6-221020.png",[48,53593,53594],{},"Let’s take a look at the following two source code snippets to analyze the reason for the above scenario.",[8325,53596,53599],{"className":53597,"code":53598,"language":8330},[8328],"try {\n     \u002F\u002F We need to check all the segments, starting from the current\n     \u002F\u002F backward to minimize the\n     \u002F\u002F checks for recently inserted entries\n     int size = cacheSegments.size();\n     for (int i = 0; i \nIterate message\n\n",[4926,53600,53598],{"__ignoreMap":18},[48,53602,53603],{},"try {\n   int offset = currentSegmentOffset.getAndAdd(entrySize);\n   if (offset + entrySize > segmentSize) {\n       \u002F\u002F Rollover to next segment\n       currentSegmentIdx = (currentSegmentIdx + 1) % cacheSegments.size();  \n       currentSegment0ffset.set(alignedSize);\n       cacheIndexes.get(currentSegmentIdx).clear();\n        offset = 0;\n}",[8325,53605,53608],{"className":53606,"code":53607,"language":8330},[8328],"\noffset + entrySize vs segmentSize\n\nThe first snippet uses a for loop to iterate messages for caches; in the second one, all caches will be cleared if the sum of offset and entrySize is larger than segmentSize. This explains the sudden decrease of read cache size. After that point, caches will be recreated.\n\nCurrently, we are using the LRU policy (OHC) to avoid sudden cache fluctuations. 
This is the result after our optimization:\n\n![Figure 7 graph read cache its](\u002Fimgs\u002Fblogs\u002F63b54548d4fc09382ee01ba5_image7-221020.png)\n\n## Summary\n\nIn this blog, we shared our experience of using and optimizing Apache Pulsar at Tencent Cloud for better performance and stability. Going forward, the Tencent Cloud team will continue to be an active player in the Pulsar community and work with other community members in the following aspects.\n\n- Retry policies within the client timeout period. We are thinking about creating an internal mechanism featuring multiple retries (send requests) to avoid message delivery failures.\n- Broker and bookie OOM optimization. Brokers may be out of memory when you have too many fragmented acknowledgments. For bookie OOM cases, they can be caused by different factors. For example, if one of the bookies in an ensemble has slow returns (Write Quorum =3, Ack Quorum = 2), the direct memory can never be released.\n- Bookie AutoRecovery optimization. AutoRecovery can be deployed separately or on the same machines where bookies are running. When you deploy them together, the AutoRecovery process can’t be restarted if you have a ZooKeeper session timeout. This is because there is no retry logic between AutoRecovery and ZooKeeper. Hence, we want to add an internal retry mechanism for AutoRecovery.\n",[4926,53609,53607],{"__ignoreMap":18},{"title":18,"searchDepth":19,"depth":19,"links":53611},[53612,53613,53614,53615],{"id":19155,"depth":19,"text":19156},{"id":53452,"depth":19,"text":53453},{"id":53539,"depth":19,"text":53540},{"id":53578,"depth":19,"text":53579},"2022-10-20","Learn how Tencent Cloud engineers worked to optimize Apache Pulsar for better performance and stability.","\u002Fimgs\u002Fblogs\u002F63c7c146d5ec62f566f6dc78_63b543ca652693122243ce3f_apache-pulsar-at-tencent-cloud-top.jpeg",{},{"title":43592,"description":53617},"blog\u002F600k-topics-per-cluster-stability-optimization-apache-pulsar-tencent-cloud",[35559,821,4301],"CiW_oPwNI5aoKVqR9CHs6E6pRo0YGj-Gk8sNTaq2L0M",{"id":53625,"title":53626,"authors":53627,"body":53628,"category":821,"createdAt":290,"date":53880,"description":53881,"extension":8,"featured":294,"image":53882,"isDraft":294,"link":290,"meta":53883,"navigation":7,"order":296,"path":53884,"readingTime":3556,"relatedResources":290,"seo":53885,"stem":53886,"tags":53887,"__hash__":53888},"blogs\u002Fblog\u002Fhow-to-migrate-from-rabbitmq-to-apache-pulsar.md","How to Migrate from RabbitMQ to Apache Pulsar™",[28],{"type":15,"value":53629,"toc":53873},[53630,53635,53638,53641,53644,53653,53664,53667,53671,53678,53681,53684,53698,53701,53705,53708,53711,53716,53720,53723,53749,53755,53784,53787,53791,53794,53797,53825,53828,53831,53835,53846,53849],[916,53631,53632],{},[48,53633,53634],{},"Note:",[48,53636,53637],{},"The AoP plugin supports only the 0-9-1 protocol with basic produce and consume functionalities, and does not include advanced features such as transactions. It is available as an open-source plugin and is only offered as a private preview feature in the Private Cloud distribution. It is not available on StreamNative Cloud. 
Please use it with caution.",[48,53639,53640],{},"Instead of using AoP, you are recommended to use RabbitMQ sink and source connectors to migrate data from RabbitMQ to Pulsar.",[48,53642,53643],{},"RabbitMQ is a popular, open-source messaging system that has been widely adopted for asynchronous service-to-service communication using a publisher-subscribe model.",[48,53645,53646,53647,53652],{},"RabbitMQ was built for single machines, but organizations today are dealing with more data than ever before. ",[55,53648,53651],{"href":53649,"rel":53650},"https:\u002F\u002Fwww.forbes.com\u002Fsites\u002Fgilpress\u002F2021\u002F12\u002F30\u002F54-predictions-about-the-state-of-data-in-2021\u002F?sh=3058efea397d",[264],"The amount of data created and consumed will continue to grow at a faster pace",", with no signs of slowing down. Many organizations that have developed critical business applications on older messaging technologies are now facing scalability issues.",[48,53654,53655,53656,53659,53660,53663],{},"As a result, organizations are looking to cloud-native alternatives, and Apache Pulsar is increasingly the messaging technology of choice. ",[55,53657,96],{"href":53658},"\u002Fblog\u002Fcase\u002F2021-01-05-iterable-scale-customer-engagement-platform-with-pulsar\u002F",", a marketing platform that sends large numbers of messages daily, made the jump from RabbitMQ when faced with flow control issues at high loads. ",[55,53661,50867],{"href":53662},"\u002Fsuccess-story\u002Fzhaopin\u002F",", a popular online recruiting and career platform in China, experienced challenges with managing multiple messaging technologies using RabbitMQ. These are just two examples of organizations that have adopted Pulsar to modernize their messaging infrastructure and increase the resiliency and reliability of data at scale.",[48,53665,53666],{},"In this blog, we’ll discuss why organizations are choosing Apache Pulsar for messaging and how to leverage AMQP-on-Pulsar (AoP), a protocol handler for RabbitMQ, to enable an easy migration from RabbitMQ to Pulsar.",[40,53668,53670],{"id":53669},"how-pulsar-helps-companies-scale-in-the-cloud","How Pulsar helps companies scale in the cloud",[48,53672,53673,53674,190],{},"Apache Pulsar is cloud-native, enabling organizations to build scalable and reliable messaging and streaming applications in elastic cloud environments. First released to the open source community in 2016, Pulsar graduated as a top-level Apache Software Foundation project in 2018, and its adoption has skyrocketed since then. In 2021, Apache Pulsar was ranked as a ",[55,53675,53677],{"href":36211,"rel":53676},[264],"top 5 ASF project",[48,53679,53680],{},"While many think of Pulsar as a real-time data streaming solution, it was originally designed as a global messaging platform for Yahoo and can solve for use cases across both data streaming and messaging.",[48,53682,53683],{},"Pulsar is being increasingly adopted by organizations looking to solve scalability and reliability issues for asynchronous messaging and complex message queues. 
Let’s take a look at the key attributes driving Pulsar adoption:",[1666,53685,53686,53689,53692,53695],{},[324,53687,53688],{},"Simplified operations with multi-tenancy - Pulsar’s built-in multi-tenant architecture enables organizations to securely deploy applications in a shared environment.",[324,53690,53691],{},"Elastic scalability - Pulsar’s decoupled storage and compute provides the ability to add new brokers and bookies independently, enabling seamless scalability.",[324,53693,53694],{},"Resiliency with geo-replication - Pulsar supports both asynchronous and synchronous geo-replication strategies across multiple data centers out of the box.",[324,53696,53697],{},"Cost-effective data retention - Pulsar’s tiered storage feature allows historical data to be offloaded to cloud-native storage and retain event streams for an indefinite period of time.",[48,53699,53700],{},"Next, we’ll look at a tool helping organizations make the move from RabbitMQ to Pulsar.",[40,53702,53704],{"id":53703},"introducing-aop-a-turnkey-protocol-handler-for-rabbitmq","Introducing AoP: A turnkey protocol handler for RabbitMQ",[48,53706,53707],{},"Replacing any component of your software stack can be difficult, especially when it requires the migration of one or more applications that are integral to your business. For organizations looking to migrate from RabbitMQ to Pulsar, AMQP-on-Pulsar (AoP) is a protocol handler that helps make the transition easier.",[48,53709,53710],{},"AoP enables existing applications to communicate directly with Apache Pulsar using the same RabbitMQ client library: no API changes, and no changes to your existing code base. Just add the AoP protocol handler to your existing Pulsar cluster and you can run your existing application code “as is” with a few minor configuration changes. This will allow you to easily leverage Pulsar’s powerful features, such as tiered storage and infinite event stream retention.",[48,53712,53713],{},[384,53714],{"alt":18,"src":53715},"\u002Fimgs\u002Fblogs\u002F63b542386c15365b1a2facd5_screen-shot-2022-10-12-at-1.45.42-pm.png",[40,53717,53719],{"id":53718},"how-to-get-started-with-aop","How to Get Started with AoP",[48,53721,53722],{},"Getting started with AoP is a simple process. Follow the step-by-step instructions below to configure Apache Pulsar to support RabbitMQ:",[1666,53724,53725,53734,53743,53746],{},[324,53726,53727,53728,53733],{},"Download and unzip the latest binary version of Apache Pulsar from the ",[55,53729,53732],{"href":53730,"rel":53731},"https:\u002F\u002Fpulsar.apache.org\u002Fen\u002Fdownload\u002F",[264],"downloads page",". We will refer to this unzipped folder as $PULSAR_HOME. As of this writing, the most recent stable version is 2.9.1.",[324,53735,53736,53737,53742],{},"Download the latest binary version of the AoP connector from the ",[55,53738,53741],{"href":53739,"rel":53740},"https:\u002F\u002Fgithub.com\u002Fstreamnative\u002Faop\u002Freleases",[264],"releases page",", and copy it into $PULSAR_HOME\u002Fprotocols directory. As of this writing, the most recent stable version is 2",[324,53744,53745],{},".9.1.2.",[324,53747,53748],{},"Configure the Pulsar broker to run the AoP protocol handler as a plugin. 
Add the following configs to the Pulsar broker’s configuration file, e.g., broker.conf or standalone.conf if you are planning on running the broker in standalone mode (this is most likely the case in a laptop\u002Fdeveloper environment).",[48,53750,53751],{},[384,53752],{"alt":53753,"src":53754},"table  Pulsar broker to run the AoP protocol","\u002Fimgs\u002Fblogs\u002F63b542815d8ce7b86d5bfcea_-Pulsar-broker-to-run-the-AoP-protocol.webp",[1666,53756,53757,53760,53763,53772,53775,53778,53781],{},[324,53758,53759],{},"Add messagingProtocols and protocolHandlerDirectory properties to the Pulsar Broker configuration file. For AoP, the value for messagingProtocols is amqp; the value for protocolHandlerDirectory is the directory where you downloaded the AoP NAR file.",[324,53761,53762],{},"Set AMQP service listeners in the Pulsar broker configuration file. Note that the hostname value in listeners should be the same as Pulsar broker's advertisedAddress, e.g., `` amqpListeners=amqp:\u002F\u002F127.0.0.1:5672 advertisedAddress=127.0.0.1",[324,53764,53765,53766,53771],{},"Start your Pulsar broker with the above configuration. For more details, refer to the ",[55,53767,53770],{"href":53768,"rel":53769},"https:\u002F\u002Fhub.streamnative.io\u002Fprotocol-handlers\u002Faop\u002F0.1.0\u002F",[264],"AoP guide",". If you made all of your changes to the standalone.conf file, then you will need to start Pulsar in standalone mode using the following command: “$PULSAR_HOME\u002Fbin\u002Fpulsar standalone",[324,53773,53774],{},"Create a namespace for the AMQP vhost using the following command: $PULSAR_HOME\u002Fbin\u002Fpulsar-admin namespaces create -b 1 public\u002Fvhost",[324,53776,53777],{},"Increase the data retention policy for the namespace you just created to 100 MB worth of data or 2 days using the following command: $PULSAR_HOME\u002Fbin\u002Fpulsar-admin namespaces set-retention -s 100M -t 2d public\u002Fvhost",[324,53779,53780],{},"Now you are ready to go and can test the AoP protocol handler using a RabbitMQ client (version 5.8.0 or newer is recommended) inside a Java program.",[324,53782,53783],{},"Once steps 1-9 have been completed, change the broker URL used by the RabbitMQ clients inside your application to the address you specified in the amqpListeners property, e.g., amqp:\u002F\u002F:5672. That will enable your application to work with Apache Pulsar instead of RabbitMQ.",[48,53785,53786],{},"Congratulations! After completing the steps above, you’ve configured Pulsar to support RabbitMQ. Now we’ll discuss one approach for migrating a production application from RabbitMQ to AoP.",[40,53788,53790],{"id":53789},"best-practices-for-migrating-from-rabbitmq-to-aop","Best Practices for Migrating from RabbitMQ to AoP",[48,53792,53793],{},"While there are several options to transition from RabbitMQ to Pulsar, in this example we’ll walk you through a migration process that includes establishing a parallel testing environment. A parallel testing environment helps mitigate potential migration risks for mission-critical applications.",[48,53795,53796],{},"Let’s take a closer look:",[1666,53798,53799,53807,53813,53816,53819,53822],{},[324,53800,53801,53802,53806],{},"Stand-up a new Apache Pulsar cluster using one of the documented installation ",[55,53803,53805],{"href":37237,"rel":53804},[264],"methods",". (We recommend deploying to a Kubernetes environment using the Helm charts provided with the open-source distribution.)",[324,53808,53809,53810,50093],{},"Enable AMQP support for Pulsar. 
(You can find a step-by-step for this ",[55,53811,267],{"href":53768,"rel":53812},[264],[324,53814,53815],{},"Replicate the topic structure for a single application on the Pulsar cluster so you have a place to publish the application data.",[324,53817,53818],{},"Create a new branch of the application code, and change its configuration to end messages to Pulsar instead of RabbitMQ.",[324,53820,53821],{},"Run the re-configured application instance along with an instance of the application configured to write to a RabbitMQ test environment.",[324,53823,53824],{},"Compare the contents of the topics.",[48,53826,53827],{},"This migration strategy allows you to gain a high degree of confidence that your existing application will run as expected before going live with the migration. Furthermore, your existing RabbitMQ environment can serve as a stand-by environment in the event of any unforeseen issues, allowing you to revert back to a safe environment if necessary.",[48,53829,53830],{},"Once you are confident with the AMQP setup on Pulsar, we recommend you begin the migration process on an application-by-application basis. This also allows you to identify any potential issues on a smaller scale.",[40,53832,53834],{"id":53833},"additional-resources-on-aop","Additional Resources on AoP",[48,53836,53837,53838,1154,53841,53845],{},"The RabbitMQ connectors (and several other connectors) are fully open source and supported by the Apache Pulsar community. If you are interested in testing AMQP on Pulsar, you can check out open source Pulsar ",[55,53839,267],{"href":23526,"rel":53840},[264],[55,53842,53844],{"href":17075,"rel":53843},[264],"use StreamNative Cloud"," to get a Pulsar cluster running in minutes.",[48,53847,53848],{},"To learn more about AoP, please check out the following blogs:",[321,53850,53851,53858,53865],{},[324,53852,53853,53857],{},[55,53854,53856],{"href":53855},"\u002Fen\u002Fblog\u002Ftech\u002F2021-04-26-announcing-amqp10-connector-for-apache-pulsar\u002F","Announcing AMQP 1.0 Connector for Apache Pulsar"," - Read about how the connector enables seamless integration between Pulsar and AMQP.",[324,53859,53860,53864],{},[55,53861,53863],{"href":53862},"\u002Fen\u002Fblog\u002Ftech\u002F2020-06-15-announcing-aop-on-pulsar\u002F","Announcing AMQP-on-Pulsar: Bring Native AMQP Protocol Support to Apache Pulsar"," - Learn more about AoP architecture and concepts.",[324,53866,53867,53872],{},[55,53868,53871],{"href":53869,"rel":53870},"https:\u002F\u002Fwww.infoq.com\u002Farticles\u002Fpulsar-customer-engagement-platform\u002F",[264],"How Apache Pulsar Is Helping Iterable Scale Its Customer Engagement Platform"," - Read about why Iterable migrated from RabbitMQ to Pulsar and how they are building a new messaging platform on Pulsar.",{"title":18,"searchDepth":19,"depth":19,"links":53874},[53875,53876,53877,53878,53879],{"id":53669,"depth":19,"text":53670},{"id":53703,"depth":19,"text":53704},{"id":53718,"depth":19,"text":53719},{"id":53789,"depth":19,"text":53790},{"id":53833,"depth":19,"text":53834},"2022-10-12","Learn why organizations are choosing Apache Pulsar for messaging and how to leverage AMQP-on-Pulsar (AoP), a protocol handler for RabbitMQ, to enable an easy migration from RabbitMQ to 
Pulsar.","\u002Fimgs\u002Fblogs\u002F63c7c16b54ef997e930c4d4e_63b54238d3c4c294987054a5_rabbit-to-pulsar-top.png",{},"\u002Fblog\u002Fhow-to-migrate-from-rabbitmq-to-apache-pulsar",{"title":53626,"description":53881},"blog\u002Fhow-to-migrate-from-rabbitmq-to-apache-pulsar",[11043,3550,821,28572,9144,32622],"GIEwYraD6Jhzf5yXkkN304sdXNaMVoSFbLqefrOq0xM",{"id":53890,"title":53891,"authors":53892,"body":53893,"category":821,"createdAt":290,"date":53880,"description":54005,"extension":8,"featured":294,"image":54006,"isDraft":294,"link":290,"meta":54007,"navigation":7,"order":296,"path":54008,"readingTime":33204,"relatedResources":290,"seo":54009,"stem":54010,"tags":54011,"__hash__":54012},"blogs\u002Fblog\u002Fimproving-regular-expression-based-subscriptions-pulsar-consumers.md","Improving Regular Expression-Based Subscriptions in Pulsar Consumers",[42141],{"type":15,"value":53894,"toc":53995},[53895,53899,53902,53910,53913,53919,53923,53926,53932,53936,53939,53946,53950,53953,53955,53958,53960,53965],[40,53896,53898],{"id":53897},"a-closer-look-at-regular-expression-based-subscriptions","A closer look at regular expression-based subscriptions",[48,53900,53901],{},"Currently, you have two options when creating a consumer with a Pulsar client:",[1666,53903,53904,53907],{},[324,53905,53906],{},"Specify the list of topics you want to consume. Often this list consists of only one topic name, but it does not need to.",[324,53908,53909],{},"Specify a regular expression as a topic pattern. Initially, this is equivalent to listing all topics that match the pattern. But as time goes by, new topics are created, while some may be deleted. The regular expression-based consumer has an auto-discovery mechanism. Under the hood, consumers regularly ask a broker for the current list of topics. Whenever the consumer finds a change in the set of topic names that match the pattern, it subscribes to the new topics. You can set the period at which the consumer will check for updates and refresh its list of topics at creation time. By default, this happens once every minute.",[48,53911,53912],{},"The following image shows a typical interaction between the consumer and the broker:",[48,53914,53915],{},[384,53916],{"alt":53917,"src":53918},"Figure 2. The broker sends a filtered response.","\u002Fimgs\u002Fblogs\u002F63b53f60b14a843ec35dcafb_filtered-response.png",[32,53920,53922],{"id":53921},"skipping-updates-when-nothing-has-changed","Skipping updates when nothing has changed",[48,53924,53925],{},"Often there’s no change in the response from the broker between subsequent requests. This is either because no new topic is created at all or because the new topic(s) don’t match the pattern. In such cases, there’s no value in responding with the list of topic names; the broker should indicate that nothing has changed, so that the consumer can continue without updates to its list of topics. One way to enable such a response is for brokers to track the last response to a specific consumer. However, this would put an unnecessary burden on brokers, so we decided to let consumers keep track of this state. Specifically, brokers calculate a hash from the topic list and include it in the response. The next time the consumer requests the list of topics, it adds the hash to the request. 
If the broker finds that the current hash is the same as the one in the request, it sends a response with a flag instead of topic names to show that the state the consumer knows about is still current.",[48,53927,53928],{},[384,53929],{"alt":53930,"src":53931},"Figure 3. The broker indicates that no change happened.","\u002Fimgs\u002Fblogs\u002F63b53f602a0a1d55b93591ae_broker-indicating-no-change.png",[40,53933,53935],{"id":53934},"notifications-for-faster-discovery","Notifications for faster discovery",[48,53937,53938],{},"The above features solved the issue of unnecessary network traffic but did not help discover topics earlier and avoid lags for the first messages. For that, we introduced topic list watchers.",[48,53940,53941,53942,53945],{},"As shown in Figure 4, consumers register as watchers with brokers. The initial exchange resembles what we’ve discussed in the sections above. The difference is that the broker keeps track of watchers. Brokers get notifications from the metadata store whenever a new topic is created (possibly through another broker) and immediately send a message to the consumers that registered with a pattern that the new topic’s name matches. This way, the consumers can know about newly created topics within seconds.\n",[384,53943],{"alt":18,"src":53944},"\u002Fimgs\u002Fblogs\u002F63b53f60d1a76a1effe88290_topic-list-watcher-lifecycle.png","Figure 4. The life cycle of topic list watchers",[32,53947,53949],{"id":53948},"polling-and-notifications-in-parallel","Polling and notifications in parallel",[48,53951,53952],{},"The procedure for topic list updates described above involves multiple services and steps. For example, the metadata store needs to notify every broker, and those brokers need to process the notification and then update every consumer that is interested in the newly created topic. During the process, it is possible that a broker experiences an issue right after the topic is created. In such events, it will be unable to send notifications to consumers even though the topic was successfully created. Put simply, creating a topic and sending notifications is not an atomic operation. As a result, consumers can’t rely on notifications exclusively; they still must use the polling mechanism to make up for updates they did not get. Conveniently, missing notifications do not cause errors or inconsistencies in the consumer; they merely delay message processing in new topics until the consumer polls again for matching topics.",[40,53954,2125],{"id":2122},[48,53956,53957],{},"The enhancements described above will be available in Apache Pulsar release 2.11.0. They will address issues that Pulsar users at scale face in connection with regular expression-based subscriptions. First, applying the topic pattern on the broker side and omitting updates altogether in most cases will reduce network utilization significantly. 
Second, watchers provide an efficient way of discovering topics right after creation, thus eliminating the lag in processing the first messages produced to those topics.",[40,53959,38376],{"id":38375},[48,53961,38379,53962,40419],{},[55,53963,38384],{"href":38382,"rel":53964},[264],[321,53966,53967,53972,53976,53986],{},[324,53968,38390,53969,190],{},[55,53970,31914],{"href":31912,"rel":53971},[264],[324,53973,45476,53974,45480],{},[55,53975,3550],{"href":45479},[324,53977,53978,758,53981],{},[2628,53979,53980],{},"PIP-145",[55,53982,53985],{"href":53983,"rel":53984},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fissues\u002F14505",[264],"Improve performance of regex subscriptions",[324,53987,53988,758,53990],{},[2628,53989,42753],{},[55,53991,53994],{"href":53992,"rel":53993},"https:\u002F\u002Fpulsar.apache.org\u002Fdocs\u002Fconcepts-messaging\u002F#multi-topic-subscriptions",[264],"Multi-topic subscriptions",{"title":18,"searchDepth":19,"depth":19,"links":53996},[53997,54000,54003,54004],{"id":53897,"depth":19,"text":53898,"children":53998},[53999],{"id":53921,"depth":279,"text":53922},{"id":53934,"depth":19,"text":53935,"children":54001},[54002],{"id":53948,"depth":279,"text":53949},{"id":2122,"depth":19,"text":2125},{"id":38375,"depth":19,"text":38376},"Understand the optimized regular expression-based subscriptions in Apache Pulsar.","\u002Fimgs\u002Fblogs\u002F63c7c17de754a56f53d25b76_63b53f609e09e911806ffe78_improve-regex-sub-top.jpeg",{},"\u002Fblog\u002Fimproving-regular-expression-based-subscriptions-pulsar-consumers",{"title":53891,"description":54005},"blog\u002Fimproving-regular-expression-based-subscriptions-pulsar-consumers",[821],"KlvmgjLzOnZ2qyq768xDaTJMKA6buuqfbGPt6vZA2VI",{"id":54014,"title":46332,"authors":54015,"body":54017,"category":3550,"createdAt":290,"date":53880,"description":54188,"extension":8,"featured":294,"image":54189,"isDraft":294,"link":290,"meta":54190,"navigation":7,"order":296,"path":46331,"readingTime":3556,"relatedResources":290,"seo":54191,"stem":54192,"tags":54193,"__hash__":54194},"blogs\u002Fblog\u002Fstreamnatives-pulsar-operators-certified-red-hat-openshift-operators.md",[24776,54016],"Fushu Wang",{"type":15,"value":54018,"toc":54179},[54019,54025,54032,54035,54039,54042,54045,54056,54058,54061,54064,54072,54076,54078,54084,54086,54089,54094,54100,54105,54110,54114,54119,54123,54128,54132,54137,54141,54146,54151,54153,54177],[916,54020,54021],{},[48,54022,54023],{},[36,54024,46129],{},[48,54026,54027,54028,54031],{},"We are excited to announce that StreamNative’s Pulsar Operators, available in the ",[55,54029,44086],{"href":54030},"\u002Fplatform\u002F",", are now certified as Red Hat OpenShift Operators. Using the operators, you can now easily set up and manage Pulsar clusters that meet Red Hat’s standards of security, reliability, and lifecycle management on OpenShift. The Operators enable organizations to build cloud-native, scalable streaming platforms and run containerized workloads across private cloud, hybrid cloud, multi-cloud, and edge environments with peace of mind.",[48,54033,54034],{},"In this blog, we talk about what Pulsar Operators are and the benefits of the OpenShift certification, including enterprise-grade security, easy installation, and automated upgrades. We also walk through how to install the operators on OpenShift.",[40,54036,54038],{"id":54037},"what-are-pulsar-operators","What are Pulsar Operators?",[48,54040,54041],{},"Pulsar Operators are key components of the StreamNative Platform offering. 
They are Kubernetes controllers that provide a declarative API to simplify the deployment and management of Pulsar clusters on Kubernetes.",[48,54043,54044],{},"The Pulsar Operators available on OpenShift include BookKeeper Operator, Pulsar Operator, and ZooKeeper Operator. Together, the three operators manage the key components in a Pulsar cluster:",[1666,54046,54047,54050,54053],{},[324,54048,54049],{},"BookKeeper Operator: Provides full lifecycle management for BookKeeper clusters.",[324,54051,54052],{},"Pulsar Operator: Manages the deployment of the Pulsar Broker and Pulsar Proxy to run Pulsar clusters.",[324,54054,54055],{},"ZooKeeper Operator: Provides full lifecycle management for ZooKeeper clusters.",[40,54057,50690],{"id":50689},[48,54059,54060],{},"Red Hat OpenShift is an enterprise-ready Kubernetes container platform built for an open hybrid cloud strategy. It provides a consistent application platform to manage hybrid cloud, multicloud, and edge deployments.",[48,54062,54063],{},"The certifications of Pulsar Operators on OpenShift brings three key benefits to StreamNative Platform customers:",[1666,54065,54066,54068,54070],{},[324,54067,34365],{},[324,54069,34368],{},[324,54071,34371],{},[40,54073,54075],{"id":54074},"install-streamnatives-pulsar-operators-on-openshift","Install StreamNative’s Pulsar Operators on OpenShift",[32,54077,10104],{"id":10103},[48,54079,50716,54080,50722],{},[55,54081,50721],{"href":54082,"rel":54083},"https:\u002F\u002Fdocs.openshift.com\u002Fcontainer-platform\u002F4.10\u002Fsecurity\u002Fcert_manager_operator\u002Fcert-manager-operator-install.html",[264],[32,54085,42912],{"id":42911},[48,54087,54088],{},"The steps below demonstrate how to install the BookKeeper Operators. Follow the same steps to install other Pulsar Operators.",[1666,54090,54091],{},[324,54092,54093],{},"Open the cluster console.",[48,54095,54096],{},[384,54097],{"alt":54098,"src":54099},"interface OpenShift Operators","\u002Fimgs\u002Fblogs\u002F63b543166c1536e91f30a821_5E_RalUzuxPJ6wRMIrJOz6YkR8D-8YQTEE-OunXVh3JwFxD5dUP7gtL9LXdees7u7Xgw5fCWrMuTtdBClKhx_xyFkrvStKJcCzBRVS6SFDgUGCCJiUJSau8F0HhiOus1pMizOwkyBtiqm9OHGxpntkUY8b7qGFicCCSlwrF0cdS9tF-sjPPWxYdp4w.png",[1666,54101,54102],{},[324,54103,54104],{},"Find the Operators on the OperatorHub of OpenShift. 
You can search for the keyword Pulsar or StreamNative.",[48,54106,54107],{},[384,54108],{"alt":54098,"src":54109},"\u002Fimgs\u002Fblogs\u002F63b54316c59a1505183f823d_vhL5IFmCYAJv15D4UXpeUp3MwwZGg2HU-JrcX2I3nlQVI8im1upjghrWmw99qNv-VisFIlzT7yHugDtHgTPtRa8vJV-9TwGoeXOC-zDEMGWzTV6EqMUO0dC8mDM_zQwj07DojMwJK3QRESvR70loSkcq5avzwwNE6YUitazSU-ZPnIsi3gT3NTvcXg.png",[1666,54111,54112],{},[324,54113,50764],{},[48,54115,54116],{},[384,54117],{"alt":54098,"src":54118},"\u002Fimgs\u002Fblogs\u002F63b54315d3c4c248d27197de_l-9Ph2UPldhoSAsH7-av1HL_OAwx-a17PO_ouWhTQIsmcxH5b4qHAmCNKmmgCCkbpv6jIds_ECymiJsO3DRhCO1FcRvHYCT0PL_5QUPe7O2TQHGUxcm1ZSV1faiILOLyFMSVgyiG7cfXNDiU8yRACE2BVK0ofYDATp5BTRmG8315i8vwQComCYeSJg.png",[1666,54120,54121],{},[324,54122,50774],{},[48,54124,54125],{},[384,54126],{"alt":54098,"src":54127},"\u002Fimgs\u002Fblogs\u002F63b543150e058c5cb54b5bd1_fQWnn7i0zjukkyfJ1s04_QeU1G4a4A1oEl6gk7J2AfJY1pQkGqQJTaG3pAMQYwODDDmjNJiChTzCJzOJIP4U8eDMo1ZC38AoHyMqBC_lndXWj3t70ZezXtxTFLwT2oztuAyFRTHEJn-3YjqhkpV1TGydITaOnPiHKV4j9onOkWOnfa80D5GP7fqLEg.png",[1666,54129,54130],{},[324,54131,50784],{},[48,54133,54134],{},[384,54135],{"alt":54098,"src":54136},"\u002Fimgs\u002Fblogs\u002F63b54315be2e2a019a635ada_X7YeIbjffPJ-XwouYAHrfI2dDBqq8rbXBEp03TGx8rCqfww46zHY6wQ1lUiRIf_l1VKSAW1LPTo1Shr7mjqaaRqoFizNT4hFv6-d9l607GG56PWL_RiChyH4zwc4bLqc0eVeMJR81wPfN9306ITMCRuWljHby5sgrDpeSWfOe7dS-Y0u1FHd7rmhwA.png",[1666,54138,54139],{},[324,54140,50794],{},[48,54142,54143],{},[384,54144],{"alt":54098,"src":54145},"\u002Fimgs\u002Fblogs\u002F63b543162d8e7577843309c9_84n978Ip7jfzADrjPSu6G_Y9kd6vyCVvXZ2I51PHeTe0zZKRk-8RxJgJse5qbmNaZVN3RkmAo146rwqDqjO2M8JnwqRnxdEnPSCUjwUQAMKvT1q6Im4Bbrz2JeD9-shSDWmTQMQdab4jk_o9V9dNo1tEph0wY0r1sES-nW0GroPC4XQFzSbou9aahw.png",[1666,54147,54148],{},[324,54149,54150],{},"Next step: deploy sn-platform",[40,54152,40413],{"id":36476},[321,54154,54155,54161,54169],{},[324,54156,54157,54158,190],{},"Learn more about StreamNative’s Pulsar Operators. Read the documentation ",[55,54159,267],{"href":50586,"rel":54160},[264],[324,54162,54163,54164,1154,54167,36492],{},"Start your Pulsar training today. Take the ",[55,54165,36487],{"href":36485,"rel":54166},[264],[55,54168,36491],{"href":36490},[324,54170,54171,54172,54176],{},"Try StreamNative Platform on Red Hat Openshift. StreamNative Platform is a self-managed cloud-native messaging and event-streaming platform powered by Pulsar. It enables you to build real-time applications and data infrastructure for both real-time and historical events. 
",[55,54173,3921],{"href":54174,"rel":54175},"https:\u002F\u002Fdocs.streamnative.io\u002Fplatform\u002Fv1.6.0\u002Foperator-guides\u002Fdeploy\u002Fdeploy-snp-openshift",[264]," to try StreamNative Platform on Openshift.",[48,54178,3931],{},{"title":18,"searchDepth":19,"depth":19,"links":54180},[54181,54182,54183,54187],{"id":54037,"depth":19,"text":54038},{"id":50689,"depth":19,"text":50690},{"id":54074,"depth":19,"text":54075,"children":54184},[54185,54186],{"id":10103,"depth":279,"text":10104},{"id":42911,"depth":279,"text":42912},{"id":36476,"depth":19,"text":40413},"Learn about the benefits of using StreamNative’s Pulsar Operators on Red Hat Openshift, including enterprise-grade security, easy installation, and automated upgrades, and how to get started.","\u002Fimgs\u002Fblogs\u002F63c7c157069027588a444edc_63b543152d8e75067c3308bd_social-1200x627.png",{},{"title":46332,"description":54188},"blog\u002Fstreamnatives-pulsar-operators-certified-red-hat-openshift-operators",[302,821,16985],"9KbVD-DK_CqtcJQXDfHKGespsTHo2cyqqdjv6ozatDk",{"id":54196,"title":54197,"authors":54198,"body":54200,"category":821,"createdAt":290,"date":54443,"description":54444,"extension":8,"featured":294,"image":54445,"isDraft":294,"link":290,"meta":54446,"navigation":7,"order":296,"path":54447,"readingTime":33204,"relatedResources":290,"seo":54448,"stem":54449,"tags":54450,"__hash__":54451},"blogs\u002Fblog\u002Fannouncing-flink-pulsar-sql-connector.md","Announcing the Flink-Pulsar SQL Connector",[54199],"Yufei Zhang",{"type":15,"value":54201,"toc":54432},[54202,54205,54209,54218,54229,54233,54236,54247,54251,54276,54280,54282,54290,54298,54302,54316,54321,54327,54332,54338,54343,54349,54354,54360,54363,54368,54374,54382,54385,54388,54390,54393],[48,54203,54204],{},"We are happy to announce that the Flink-Pulsar SQL Connector has been released and is available for download and use. The Flink-Pulsar SQL Connector supports querying data from and writing data to Pulsar topics with simple Flink SQL queries. With this connector, you can easily create Flink + Pulsar pipelines without writing Java or Scala. Read this blog to learn about the benefits and features of this connector and how to use it.",[40,54206,54208],{"id":54207},"what-is-the-flink-pulsar-sql-connector","What is the Flink-Pulsar SQL Connector?",[48,54210,54211,54212,54217],{},"The Flink community provides the ",[55,54213,54216],{"href":54214,"rel":54215},"https:\u002F\u002Fnightlies.apache.org\u002Fflink\u002Fflink-docs-master\u002Fdocs\u002Fdev\u002Ftable\u002Foverview\u002F",[264],"SQL and Table API"," to express Flink jobs using SQL queries. 
The Flink-Pulsar SQL Connector allows Flink SQL to read from and write to Pulsar topics with simple “SELECT FROM” and “INSERT INTO” statements.",[916,54219,54220],{},[48,54221,54222,54223,54228],{},"Note: The Flink-Pulsar SQL Connector is implemented based on the ",[55,54224,54227],{"href":54225,"rel":54226},"https:\u002F\u002Fhub.streamnative.io\u002Fdata-processing\u002Fpulsar-flink\u002F1.15.0.1\u002F#pulsar-datastream-connector",[264],"Pulsar DataStream Connector"," and inherits most of the DataStream Connector’s configurations.",[40,54230,54232],{"id":54231},"what-are-the-benefits-of-using-the-flink-pulsar-sql-connector","What are the benefits of using the Flink-Pulsar SQL Connector?",[48,54234,54235],{},"The Flink-Pulsar SQL Connector provides three key benefits:",[321,54237,54238,54241,54244],{},[324,54239,54240],{},"Ease of Use: This connector allows you to discover real-time data values in Pulsar by submitting Flink jobs via SQL queries without the need to write and deploy Java. You can start queries from Pulsar topics using SQL with native tables without writing CREATE TABLE statements.",[324,54242,54243],{},"Scalability: The Flink-Pulsar SQL Connector inherits high scalability from the underlying DataStream Connector, which is designed to be scalable by using the newest source and sink APIs.",[324,54245,54246],{},"Flexibility: The Flink-Pulsar SQL Connector gives you the flexibility to subscribe to a topic pattern before a topic matching that pattern is even created. The connector is able to discover newly added topics during runtime without restarting the job.",[40,54248,54250],{"id":54249},"features-of-the-flink-pulsar-sql-connector","Features of the Flink-Pulsar SQL Connector",[321,54252,54253,54256,54259,54268],{},[324,54254,54255],{},"Define columns from message metadata: The Flink-Pulsar SQL Connector allows you to map the metadata of a Pulsar message, such as event_time, producer_name, publish_time etc, to Flink table columns. This can be useful when defining a watermark based on the time attributes metadata or enriching the Flink record with topic names.",[324,54257,54258],{},"Dynamic topic discovery: Similar to the DataStream Connector, the Flink-Pulsar SQL Connector allows you to define topic patterns, and you can add new data by adding new topics while the Flink SQL job is running. This is useful because when a source topic needs to be scaled up, you don’t need to restart the Flink job.",[324,54260,54261,54262,54267],{},"Avro and JSON format: The Flink-Pulsar SQL Connector supports Flink’s JSON and Avro formats to read corresponding binary data stored in Pulsar topics. It can also automatically derive the Flink table schema when reading from a Pulsar topic with JSON or Avro schema. Read ",[55,54263,54266],{"href":54264,"rel":54265},"https:\u002F\u002Fhub.streamnative.io\u002Fdata-processing\u002Fpulsar-flink\u002F1.15.0.1#native-tables",[264],"this document"," to learn more about this feature.",[324,54269,54270,54271,54275],{},"PulsarCatalog: PulsarCatalog allows you to use a Pulsar cluster as metadata storage for Flink tables. It supports defining two types of tables: explicit and native. native table allows you to read from Pulsar topics without creating the Flink table explicitly, thus the name “native”. 
Read ",[55,54272,54266],{"href":54273,"rel":54274},"https:\u002F\u002Fhub.streamnative.io\u002Fdata-processing\u002Fpulsar-flink\u002F1.15.0.1#available-metadata",[264]," for a detailed description of explicit and native tables.",[40,54277,54279],{"id":54278},"get-started-with-the-flink-pulsar-sql-connector","Get started with the Flink-Pulsar SQL Connector",[32,54281,10104],{"id":10103},[321,54283,54284,54287],{},[324,54285,54286],{},"For the Table API program, add the Flink-Pulsar SQL connector to your dependencies.",[324,54288,54289],{},"For Flink SQL queries with SQL Client, download the SQL jar and add it to the classpath when starting the SQL client. For example: “.\u002Fbin\u002Fsql-client.sh --jar flink-sql-connector-pulsar-1.15.1.1.jar”",[48,54291,54292,54293],{},"MavenSQL Jario.streamnative.connectors flink-sql-connector-pulsar 1.15.1.1",[55,54294,54297],{"href":54295,"rel":54296},"https:\u002F\u002Frepo1.maven.org\u002Fmaven2\u002Fio\u002Fstreamnative\u002Fconnectors\u002Fflink-sql-connector-pulsar\u002F",[264],"SQL JAR",[32,54299,54301],{"id":54300},"how-to-use-the-flink-pulsar-sql-connector","How to use the Flink-Pulsar SQL Connector",[48,54303,54304,54305,4003,54310,54315],{},"The sample code below demonstrates how to use PulsarCatalog and the Flink-Pulsar SQL Connector. You can find all the code below in the ",[55,54306,54309],{"href":54307,"rel":54308},"https:\u002F\u002Fgithub.com\u002Fstreamnative\u002Fflink-example",[264],"flink-example",[55,54311,54314],{"href":54312,"rel":54313},"https:\u002F\u002Fgithub.com\u002Fstreamnative\u002Fflink-example\u002Fblob\u002Fmain\u002Fsql-examples\u002Fsql-example.md",[264],"sql-examples"," repositories.",[1666,54317,54318],{},[324,54319,54320],{},"Create PulsarCatalog.",[8325,54322,54325],{"className":54323,"code":54324,"language":8330},[8328],"\nCREATE CATALOG pulsar\n  WITH (\n    'type' = 'pulsar-catalog',\n    'catalog-admin-url' = 'http:\u002F\u002Fpulsar:8080',\n    'catalog-service-url' = 'pulsar:\u002F\u002Fpulsar:6650'\n  );\n\n",[4926,54326,54324],{"__ignoreMap":18},[1666,54328,54329],{},[324,54330,54331],{},"Create an explicit table with watermark strategies.",[8325,54333,54336],{"className":54334,"code":54335,"language":8330},[8328],"\nCREATE TABLE sql_user (\n    name STRING,\n    age INT,\n    income DOUBLE,\n    single BOOLEAN,\n    createTime BIGINT,\n    row_time AS cast(TO_TIMESTAMP(FROM_UNIXTIME(createTime \u002F 1000), 'yyyy-MM-dd HH:mm:ss') as timestamp(3)),\n    WATERMARK FOR row_time AS row_time - INTERVAL '5' SECOND\n) WITH (\n  'connector' = 'pulsar',\n  'topics' = 'persistent:\u002F\u002Fsample\u002Fflink\u002Fuser',\n  'format' = 'json'\n);\n\n",[4926,54337,54335],{"__ignoreMap":18},[1666,54339,54340],{},[324,54341,54342],{},"Run a window query for the table.",[8325,54344,54347],{"className":54345,"code":54346,"language":8330},[8328],"\nSELECT single,\n TUMBLE_START(row_time, INTERVAL '10' SECOND) AS sStart,\n SUM(age) as age_sum from `sql_user`\n GROUP BY TUMBLE(row_time, INTERVAL '10' SECOND), single;\n\n",[4926,54348,54346],{"__ignoreMap":18},[1666,54350,54351],{},[324,54352,54353],{},"Write into the same table.",[8325,54355,54358],{"className":54356,"code":54357,"language":8330},[8328],"\nINSERT INTO `sql_user` VALUES ('user 1', 11, 25000.0, true, 1656831003);\n\n",[4926,54359,54357],{"__ignoreMap":18},[48,54361,54362],{},"So far we covered how to create and query from an explicit table. 
Next we can query directly from the native table mapped from the topic persistent:\u002F\u002Fsample\u002Fflink\u002Fuser.",[1666,54364,54365],{},[324,54366,54367],{},"Read 10 records from the native table named user.",[8325,54369,54372],{"className":54370,"code":54371,"language":8330},[8328],"\nSELECT * FROM `user` LIMIT 10;\n\n",[4926,54373,54371],{"__ignoreMap":18},[48,54375,54376,54377,190],{},"For more information, refer to the ",[55,54378,54381],{"href":54379,"rel":54380},"https:\u002F\u002Fhub.streamnative.io\u002Fdata-processing\u002Fpulsar-flink\u002F1.15.0.1",[264],"Flink-Pulsar SQL Connector documentation",[40,54383,54384],{"id":1727},"What’s next?",[48,54386,54387],{},"Currently the Flink-Pulsar SQL Connector does not support defining PRIMARY KEY, so it cannot support Change Data Capture (CDC) formats with an upsert\u002Fdelete operation. We will improve the connector to support upsert mode for CDC scenarios.",[40,54389,39647],{"id":39646},[48,54391,54392],{},"The Flink-Pulsar SQL Connector is a community-driven initiative. To get involved with the Flink-Pulsar SQL Connector, check out the following resources:",[321,54394,54395,54407,54415],{},[324,54396,54397,54398,54402,54403,54406],{},"Try out the Flink-Pulsar SQL Connector: ",[55,54399,54401],{"href":54379,"rel":54400},[264],"Download"," the connector and read the ",[55,54404,7120],{"href":54379,"rel":54405},[264]," to learn more about it.",[324,54408,54409,54410,54414],{},"Make a contribution: If you have any feature requests or bug reports, do not hesitate to ",[55,54411,39672],{"href":54412,"rel":54413},"https:\u002F\u002Fgithub.com\u002Fstreamnative\u002Fflink",[264]," by submitting a pull request.",[324,54416,54417,54418,48888,54421,39687,54424,54427,54428],{},"Contact us: Feel free to create an issue on ",[55,54419,39680],{"href":54412,"rel":54420},[264],[55,54422,39686],{"href":39684,"rel":54423},[264],[55,54425,39691],{"href":33664,"rel":54426},[264]," to get answers from Pulsar experts",[55,54429],{"href":54430,"rel":54431},"https:\u002F\u002Fdigidop.slack.com\u002Farchives\u002FD01LQ2A9DE3\u002Fp1671608435938599",[264],{"title":18,"searchDepth":19,"depth":19,"links":54433},[54434,54435,54436,54437,54441,54442],{"id":54207,"depth":19,"text":54208},{"id":54231,"depth":19,"text":54232},{"id":54249,"depth":19,"text":54250},{"id":54278,"depth":19,"text":54279,"children":54438},[54439,54440],{"id":10103,"depth":279,"text":10104},{"id":54300,"depth":279,"text":54301},{"id":1727,"depth":19,"text":54384},{"id":39646,"depth":19,"text":39647},"2022-09-29","Learn how to use the Flink-Pulsar SQL Connector to create Flink pipelines without writing Java or Scala.","\u002Fimgs\u002Fblogs\u002F63c7c1925a20b417dac61447_63b53de5d1a76a2466e7408d_flink-sql-connector-top.png",{},"\u002Fblog\u002Fannouncing-flink-pulsar-sql-connector",{"title":54197,"description":54444},"blog\u002Fannouncing-flink-pulsar-sql-connector",[28572,821,8057],"vtg9whWICeU_9NW3vkI6tJnUih9BA6eTt6rjXqMJcZI",{"id":54453,"title":40448,"authors":54454,"body":54456,"category":821,"createdAt":290,"date":54443,"description":54652,"extension":8,"featured":294,"image":54653,"isDraft":294,"link":290,"meta":54654,"navigation":7,"order":296,"path":40447,"readingTime":42793,"relatedResources":290,"seo":54655,"stem":54656,"tags":54657,"__hash__":54658},"blogs\u002Fblog\u002Fdeep-dive-transaction-coordinators-apache-pulsar.md",[808,54455],"Bo 
Cong",{"type":15,"value":54457,"toc":54640},[54458,54466,54469,54473,54476,54479,54483,54486,54492,54495,54498,54502,54505,54508,54511,54525,54529,54532,54536,54540,54543,54549,54558,54562,54565,54571,54575,54578,54581,54587,54590,54593,54597,54600,54603,54612,54616,54623,54625,54628,54635,54638],[48,54459,54460,54461,54465],{},"In a ",[55,54462,46135],{"href":54463,"rel":54464},"https:\u002F\u002Fwww.streamnative.cn\u002Fblog\u002Ftech\u002F2021-06-16-a-deep-dive-of-transactions-in-apache-pulsar\u002F",[264],", we introduced the core components of Pulsar transactions, the transaction API, and the transaction data flow. In the subsequent blogs, we will introduce the details of each component, covering the transaction coordinator, transaction buffer, and pending ack state.",[48,54467,54468],{},"This blog gives you a comprehensive understanding of the transaction coordinator, including its design logic and transaction logs. It is a core component of Pulsar transactions and guarantees their integrity.",[40,54470,54472],{"id":54471},"what-is-the-transaction-coordinator","What is the transaction coordinator?",[48,54474,54475],{},"The transaction coordinator (TC) manages the entire lifecycle of transactions and makes sure they function as expected. The transaction coordinator handles transaction timeouts and ensures that the transaction is aborted after a timeout.",[48,54477,54478],{},"Additionally, the transaction coordinator guarantees the durability of transactions. It records all metadata changes of a transaction and interacts with the topic owner broker to complete the transaction.",[32,54480,54482],{"id":54481},"transaction-id-and-transaction-coordinator-id","Transaction ID and transaction coordinator ID",[48,54484,54485],{},"In Pulsar, each transaction is identified with a 128-bit transaction ID (TxnID). The highest 16 bits are reserved for the transaction coordinator ID and the remaining bits are used by the TC to generate monotonically increasing numbers.",[48,54487,54488],{},[384,54489],{"alt":54490,"src":54491},"illustration of transaction ID","\u002Fimgs\u002Fblogs\u002F63b53d43450e922c46971401_transaction-id.png",[48,54493,54494],{},"A Pulsar cluster can have multiple transaction coordinators. You can use their IDs to identify different transaction coordinators.",[48,54496,54497],{},"The transaction coordinator is responsible for generating transaction IDs. They persist in the transaction log, which is an internal component of the transaction coordinator.",[32,54499,54501],{"id":54500},"transaction-metadata","Transaction metadata",[48,54503,54504],{},"After a new transaction opens, the client will publish, consume, and acknowledges messages on topics\u002Fpartitions with this transaction. During the process, the TC needs to know which topics the client has interacted with (for example, publishing and acknowledging messages). This tells the TC which broker it needs to talk to when completing the transaction.",[48,54506,54507],{},"On the client side, after a new topic\u002Fpartition joins the transaction, the client will send a transaction metadata change request to the TC to add the newly created partition or acked partition. 
The TC then persists the metadata change into the transaction log, which guarantees that the transaction can be recovered in case of a failure.",[48,54509,54510],{},"The transaction metadata contains:",[321,54512,54513,54516,54519,54522],{},[324,54514,54515],{},"Transaction ID",[324,54517,54518],{},"Transaction status (for example, OPEN, COMMITTING, and COMMITTED)",[324,54520,54521],{},"Created partitions",[324,54523,54524],{},"Acknowledged partitions and subscriptions",[32,54526,54528],{"id":54527},"complete-a-transaction","Complete a transaction",[48,54530,54531],{},"The client will commit a transaction if everything goes well in the transaction or abort it if any errors occur. At this point, the transaction is coming to an end. The TC will change the transaction state to COMMITTING or ABORTING and then interact with the owner broker of related topics to ensure the transaction ends successfully. After the transaction is completed, the TC changes the transaction state to COMMITTED or ABORTED. Since the TC will retry to complete the transaction, the transaction's complete operation needs to guarantee the reentrancy.",[32,54533,54535],{"id":54534},"transaction-log","Transaction log",[3933,54537,54539],{"id":54538},"add","Add",[48,54541,54542],{},"The transaction log topic stores the metadata changes of a transaction instead of the actual messages in the transaction. The messages are stored in topic partitions. A transaction can be in various states such as OPEN, COMMITTING, and COMMITTED. It is the state and associated metadata that are stored in the transaction log.",[48,54544,54545],{},[384,54546],{"alt":54547,"src":54548},"figure of Transaction log - Add","\u002Fimgs\u002Fblogs\u002F63b53d43009cac1c70417814_transaction-log-add.png",[48,54550,54551,54552,54557],{},"Essentially, the transaction log is a Pulsar ",[55,54553,54556],{"href":54554,"rel":54555},"https:\u002F\u002Fpulsar.apache.org\u002Fdocs\u002Fnext\u002Fconcepts-messaging#system-topic",[264],"system topic"," in the pulsar\u002Fsystem namespace. Each TC has an independent transaction log.",[3933,54559,54561],{"id":54560},"delete","Delete",[48,54563,54564],{},"When a transaction operation is added to the log, the storage position of the operation will be returned. The position of all logs about a transaction is recorded in memory. When the transaction status changes to COMMITTED or ABORTED, it indicates that the transaction's lifecycle has completed. Therefore, you can delete the transaction's metadata in the log based on the log position stored in memory.",[48,54566,54567],{},[384,54568],{"alt":54569,"src":54570},"Figure 3. Transaction log - Delete","\u002Fimgs\u002Fblogs\u002F63b53d436c153684c32c469d_transaction-log-delete.png",[3933,54572,54574],{"id":54573},"recover","Recover",[48,54576,54577],{},"When the broker goes down, or the coordinator restarts due to the load balancing strategy, the coordinator needs to restore the metadata information of the transaction.",[48,54579,54580],{},"Keep in mind that if the status of a transaction is COMMITTED or ABORTED, you can delete the log directly. If the transaction is in the COMMITTING or ABORTING state, you need to perform the commit or abort operation in the transaction buffer (TB) or in the transaction pending ack (TP) after changing the transaction status to COMMITTED or ABORTED. After that, you can delete the log of this transaction.",[48,54582,54583],{},[384,54584],{"alt":54585,"src":54586},"Figure 5. 
Transaction Coordinator low watermark","\u002Fimgs\u002Fblogs\u002F63b53d435d8ce754fc581d05_low-watermark-transaction.png",[48,54588,54589],{},"As shown in Figure 5, before Txn-0 is committed, the low watermark is -1. After Txn-1 is aborted and Txn-0 is committed, the low watermark changes to Txn-1. After Txn-2 is committed, the low watermark changes to Txn-2.",[48,54591,54592],{},"The low watermark information will be carried to the original topic owner broker when the transaction is complete. Because the status of this transaction is unknown in TB and TP, the message or ack request received by TB and TP may carry completed transactions. The low watermark cleans up these useless transactions in TB or TP. If a transaction ID is less than the low watermark, you need to abort the transaction directly in TB or TP.",[40,54594,54596],{"id":54595},"transaction-coordinator-assignment","Transaction coordinator assignment",[48,54598,54599],{},"The transaction coordinator is designed as a separate module running inside a Pulsar broker. A broker can run multiple transaction coordinators. They can run outside of brokers as well, but the current implementation only allows them to run within brokers.",[48,54601,54602],{},"By default, a Pulsar cluster has 16 transaction coordinators. You can change it through the --initial-num-transaction-coordinators option when initializing the Pulsar cluster metadata.",[48,54604,54605,54606,54611],{},"Note that transaction coordinators will be available for scale-up through the Pulsar ",[55,54607,54610],{"href":54608,"rel":54609},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F15296",[264],"Admin API"," in 2.11.0.",[32,54613,54615],{"id":54614},"how-to-assign-transaction-coordinators-to-brokers","How to assign transaction coordinators to brokers",[48,54617,54618,54619,54622],{},"Pulsar uses the existing topic ownership mechanism to assign transaction coordinators. Each TC has a “virtual” topic transaction_coordinator_assign_{TCID} for the assignment. For example, If the topic transaction_coordinator_assign_1 is assigned to broker A, it means TC-1 will start on broker A.\n",[384,54620],{"alt":18,"src":54621},"\u002Fimgs\u002Fblogs\u002F63b53d438867ad765a4b4b79_tc-assignment.png","Figure 6. Transaction Coordinator assignment\nThe client finds the broker address through the topic lookup mechanism. If the client wants to commit a transaction with transaction ID (1:10), the client will first find the owner broker of the topic transaction_coordinator_assign_1 and then send the transaction commit command to the broker directly. The client will not introduce the lookup request for each transaction operation. Instead, it has a cache that only redoes the lookup after the TC topic ownership is changed.",[40,54624,319],{"id":316},[48,54626,54627],{},"This blog explains the concept of transaction coordinators and how Pulsar assigns them to brokers. It provides details about the transaction log, which stores all the transaction metadata changes for transaction durability. It also introduces transaction timeout and low watermark. The latter is a key metric that can be used for data clean-up.",[48,54629,54630,54631,31443],{},"The blog is focused on the transaction coordinator itself, not on the client and transaction coordinator interaction. 
If you are interested in the complete transaction process, see ",[55,54632,54634],{"href":54463,"rel":54633},[264],"this blog about transaction details",[48,54636,54637],{},"In future blogs, we will talk more about other transaction components, such as the transaction buffer.",[48,54639,3931],{},{"title":18,"searchDepth":19,"depth":19,"links":54641},[54642,54648,54651],{"id":54471,"depth":19,"text":54472,"children":54643},[54644,54645,54646,54647],{"id":54481,"depth":279,"text":54482},{"id":54500,"depth":279,"text":54501},{"id":54527,"depth":279,"text":54528},{"id":54534,"depth":279,"text":54535},{"id":54595,"depth":19,"text":54596,"children":54649},[54650],{"id":54614,"depth":279,"text":54615},{"id":316,"depth":19,"text":319},"Understand the basics of transaction coordinators and how Pulsar assigns them to brokers.","\u002Fimgs\u002Fblogs\u002F63c7c1a1ff82f11b72a625a3_63b53d437851daca291179b0_transaction-coordinators-top.jpeg",{},{"title":40448,"description":54652},"blog\u002Fdeep-dive-transaction-coordinators-apache-pulsar",[821,9144],"eA1Czy4UyAHzgvnQ3MTKiSXa-oyKC66HeTct0Up2H5I",{"id":54660,"title":53575,"authors":54661,"body":54662,"category":821,"createdAt":290,"date":55060,"description":55061,"extension":8,"featured":294,"image":55062,"isDraft":294,"link":290,"meta":55063,"navigation":7,"order":296,"path":38025,"readingTime":5505,"relatedResources":290,"seo":55064,"stem":55065,"tags":55066,"__hash__":55067},"blogs\u002Fblog\u002Fdeep-dive-into-topic-data-lifecycle-apache-pulsar.md",[809],{"type":15,"value":54663,"toc":55044},[54664,54667,54670,54686,54690,54693,54696,54702,54721,54724,54728,54731,54735,54741,54745,54748,54756,54759,54762,54765,54772,54776,54779,54783,54786,54793,54796,54800,54803,54806,54809,54820,54823,54826,54830,54833,54836,54852,54856,54859,54865,54871,54875,54878,54882,54885,54891,54894,54898,54907,54915,54919,54926,54937,54940,54955,54964,54968,54971,54982,54985,54997,55000,55003,55005,55008,55010,55015],[48,54665,54666],{},"The lifecycle of topic data in Apache Pulsar is managed by two key retention policies: the topic retention policy on the broker side, and the bookie data retention policy on the bookie side. All the data deletion operations can only be triggered by the broker. We shouldn’t delete ledger files from bookies directly. Otherwise, the data will be lost.",[48,54668,54669],{},"This blog focuses on the following three topics in Pulsar:",[1666,54671,54672,54680,54683],{},[324,54673,54674,54675,190],{},"Topic retention policy. We will mainly discuss the cases where the retention policy doesn’t work. For more information about retention and expiry strategies, see ",[55,54676,54679],{"href":54677,"rel":54678},"https:\u002F\u002Fpulsar.apache.org\u002Fdocs\u002Fcookbooks-retention-expiry\u002F",[264],"Message retention and expiry",[324,54681,54682],{},"Bookie data compaction policy.",[324,54684,54685],{},"How to detect and deal with orphan ledgers to fix bookie ledger files that can’t be deleted.",[40,54687,54689],{"id":54688},"overview-topic-data-lifecycle","Overview: Topic data lifecycle",[48,54691,54692],{},"In Pulsar, when a producer publishes messages on a topic, these data are written to specific ledgers managed by ManagedLedger, which is owned by the Pulsar broker. The metadata is stored in Pulsar’s meta store, such as Apache ZooKeeper. Ledgers are written to specific bookies according to the replica policies configured (i.e. the values of E, WQ, and AQ). 
For each ledger, its metadata is stored in BookKeeper’s meta store, such as Apache ZooKeeper.",[48,54694,54695],{},"When a ledger (for example, Ledger 3 in Figure 1) needs to be deleted according to the configured retention policy, it goes through the following steps.",[48,54697,54698],{},[384,54699],{"alt":54700,"src":54701},"illustration","\u002Fimgs\u002Fblogs\u002F63b53c677851da757d11535c_compaction-checker-flow.png",[1666,54703,54704,54707,54715,54718],{},[324,54705,54706],{},"Delete Ledger 3 from the ManagedLedger’s ledger list in the meta store.",[324,54708,54709,54710,190],{},"If the first step succeeds, the broker will send the deletion request of Ledger 3 to the BookKeeper cluster asynchronously. However, this does not ensure the ledger can be deleted successfully. The risk of leaving Ledger 3 an orphan ledger in the BookKeeper cluster still exists. For more information about how to solve this problem, see this ",[55,54711,54714],{"href":54712,"rel":54713},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fissues\u002F16569",[264],"proposal",[324,54716,54717],{},"Each bookie performs a regular compaction check, which is configured through minorCompactionInterval and majorCompactionInterval.",[324,54719,54720],{},"In the compaction check, the bookie checks whether the metadata of each ledger exists in the meta store. If not, the bookie will delete the data of this ledger from the log file.",[48,54722,54723],{},"From the last two steps above, we can see that the deletion of topic data, which are stored in BookKeeper, is triggered by the compaction checker, not by the Pulsar broker. This means that the ledger data are not deleted immediately when the ledger is removed from the ManagedLedger’s ledger list.",[40,54725,54727],{"id":54726},"topic-retention-policy","Topic retention policy",[48,54729,54730],{},"In this section, we will briefly talk about topic retention policies in Pulsar, and focus on the drawbacks of current ledger deletion logic as well as the cases where retention policies don't work.",[32,54732,54734],{"id":54733},"topic-retention-and-ttl","Topic retention and TTL",[48,54736,54737,54738,190],{},"When messages arrive at a broker, they are stored until they have been acknowledged on all subscriptions, at which point it is marked for deletion. You can control this behavior and retain messages that have already been acknowledged on all subscriptions by setting a retention policy for all topics in a given namespace. If messages are not acknowledged, Pulsar stores them forever by default, which can lead to heavy disk space usage. This is where TTL (Time to live) can be useful as it determines how long unacknowledged messages will be retained. For more information, see ",[55,54739,54679],{"href":54677,"rel":54740},[264],[32,54742,54744],{"id":54743},"drawbacks-of-the-current-ledger-deletion-logic","Drawbacks of the current ledger deletion logic",[48,54746,54747],{},"The current logic entails two separate steps to delete a ledger.",[1666,54749,54750,54753],{},[324,54751,54752],{},"Remove all the ledgers to be deleted from the ledger list and update the newest ledger list in the meta store.",[324,54754,54755],{},"In the meta store update callback operation, remove the ledgers to be deleted from storage asynchronously, such as BookKeeper or Tiered storage. Note that this does not ensure the deletion is successful.",[48,54757,54758],{},"As these two steps are separated, we can't ensure the ledger deletion transaction. 
If the first step succeeds and the second step fails, then the ledgers can no longer be deleted from the storage system. The second step may fail in some cases (for example, the broker restarts), resulting in orphan ledgers in the storage system.",[48,54760,54761],{},"Can we swap step 1 with step 2 to fix this problem? No, If we delete the ledger from the storage system first, and then remove the ledger from ManagedLedger’s ledger list, we still can’t ensure the ledger deletion transaction. If we delete the ledger from the storage system successfully while the ledger fails to be removed from the ManageLedger’s ledger list, consumers will fail to read data from the topic. This is because the topic still sees the deleted ledger as readable. This issue is even more serious than orphan ledgers.",[48,54763,54764],{},"Another risk in topic deletion is that when a ledger is deleted on the broker side (i.e. removed from the ManagedLedger’s ledger list), the topic metadata may remain if the ledger fails to be deleted in BookKeeper. Therefore, when a consumer fetches data according to the topic metadata, it will fail since the actual data does not exist on bookies.",[48,54766,54767,54768,54771],{},"In order to resolve the above issues, we are working on a ",[55,54769,54714],{"href":54712,"rel":54770},[264]," to introduce a two-phase deletion protocol to make sure the ledger deletion from the storage system is retriable.",[32,54773,54775],{"id":54774},"why-doesnt-the-retention-policy-work","Why doesn’t the retention policy work?",[48,54777,54778],{},"Topic retention policies may not take effect in the following two cases.",[3933,54780,54782],{"id":54781},"the-topic-is-not-loaded-into-brokers","The topic is not loaded into brokers",[48,54784,54785],{},"Each topic’s retention policy checker belongs to its own ManagedLedger. If the ManagedLedger is not loaded into the broker, the retention policy checker won’t work. Let’s see the following example.",[48,54787,54788,54789,54792],{},"We produce 100GB of data on topic-a at timestamp t0, and finish producing messages at t0 + 3 hours. The retention policy of topic-a is configured for 6 hours, which means the data won’t be expired until t0 + 6 hours later. However, topic-a may be unloaded between ",[2628,54790,54791],{},"t0 + 3, t0 +6"," due to broker restarts, its bundle unloaded by the load balancer, or related operations triggered by the pulsar-admin command. If there is no producer, consumer, or other load topic operations for it, it remains in the unloaded status. When the time reaches t0 + 6 hours, the 100GB of data on topic-a should be expired according to the retention policy. However, as topic-a is not loaded into any broker, the broker's retention policy checker cannot find topic-a. Therefore, the retention policy does not work. In this case, the 100GB of data won’t be expired until topic-a is loaded into the broker again.",[48,54794,54795],{},"We are developing a tool to fix this issue. In this tool, we will check all the long-term ledgers stored in the BookKeeper cluster, and parse out the topic names that the ledgers belong to. After that, these topics can be loaded into Pulsar brokers so that the retention policy can be applied to them.",[3933,54797,54799],{"id":54798},"the-ledger-is-in-the-open-state","The ledger is in the OPEN state",[48,54801,54802],{},"The retention policy checker applies each topic’s retention policy, while it only checks the ledgers in the CLOSED state. 
If a ledger is OPEN, the retention policy won’t take effect even though the ledger should be expired. See the following example for details.",[48,54804,54805],{},"We produce messages to a topic at the rate of 100MB\u002Fs with the minimum rollover time of the ledger set to 10 minutes. The minimum rollover time is used to prevent ledger rollovers from happening frequently, and it must be reached before a ledger rollover.",[48,54807,54808],{},"This means the ledger will remain in the OPEN state until 10 minutes are reached. 10 minutes later, the ledger size is about 60GB. If we set the retention time to 5 minutes, these data cannot be expired since the ledger is in the OPEN state. Note that a ledger rollover can be triggered after the minimum rollover time (managedLedgerMinLedgerRolloverTimeMinutes) is reached and one of the following conditions is met:",[321,54810,54811,54814,54817],{},[324,54812,54813],{},"The maximum rollover time has been reached (managedLedgerMaxLedgerRolloverTimeMinutes).",[324,54815,54816],{},"The number of entries written to the ledger has reached the maximum value (managedLedgerMaxEntriesPerLedger)",[324,54818,54819],{},"The entries written to the ledger have reached the maximum size value (managedLedgerMaxSizePerLedgerMbytes).",[48,54821,54822],{},"These parameters can be configured in broker.conf.",[48,54824,54825],{},"In this example, if the topic is unloaded at timestamp t0 + 9 minutes and remains unloaded, there will be at least about 54GB of data that cannot be expired no matter what the configured retention policy is.",[40,54827,54829],{"id":54828},"bookie-data-compaction-policy","Bookie data compaction policy",[48,54831,54832],{},"Whether a ledger still exists or not in the BookKeeper cluster is tracked based on the ledger metadata stored in the meta store, such as Zookeeper. If Pulsar has deleted the metadata from the meta store, it means the ledger data needs to be removed from all the bookies that store the ledger’s replicas. When a ledger needs to be deleted based on the topic retention policy, Pulsar only deletes the ledger’s metadata instead of the actual replica data stored on bookies. Whether the actual data is deleted depends on each bookie’s garbage collection thread.",[48,54834,54835],{},"Each bookie’s garbage collection can be triggered in the following three cases.",[1666,54837,54838,54841,54844],{},[324,54839,54840],{},"Minor compaction. You can configure it through minorCompactionThreshold=0.2 and minorCompactionInterval=3600. By default, minor compaction is triggered every hour. If an entryLogFile’s remaining data size is less than 20% of the total size, the entryLogFile will be compacted.",[324,54842,54843],{},"Major compaction. You can configure it through majorCompactionThreshold=0.5 and majorCompactionInterval=86400. By default, major compaction is triggered every day. If the remaining data size of an entry log file is less than 50% of the total size, the entry log file will be compacted.",[324,54845,54846,54847,54851],{},"Compaction triggered by REST API. curl -XPUT ",[55,54848,54849],{"href":54849,"rel":54850},"http:\u002F\u002Flocalhost:8000\u002Fapi\u002Fv1\u002Fbookie\u002Fgc",[264],". For the rest api, we should turn it on first by setting httpServerEnabled=true.",[32,54853,54855],{"id":54854},"how-bookie-gc-works","How bookie GC works",[48,54857,54858],{},"When a bookie triggers compaction, the compaction checker checks each ledger’s metadata to get the ledger list. 
For each ledger in the ledger list, it checks whether the ledger’s metadata still exists in the meta store, such as Ledger 2 in Figure 2.",[48,54860,54861],{},[384,54862],{"alt":54863,"src":54864},"illustration to explain How bookie GC works","\u002Fimgs\u002Fblogs\u002F63b53c67a48e6a060ddc5d96_ledger-deletion-process.png",[48,54866,54867,54868,53327],{},"After that, the compaction checker filters out the ledgers that still exist, and calculates the remaining data size percentage of the entry log file. If the percentage is lower than the threshold (by default, minorCompactionThreshold=0.2 and majorCompactionThreshold=0.5), it starts the compaction for the entry log file. Specifically, it reads the remaining ledger’s data from the old entry log file and writes them into the current entry log file. After all the remaining ledgers are compacted successfully, it deletes old entry log files. This frees up storage space.\n",[384,54869],{"alt":18,"src":54870},"\u002Fimgs\u002Fblogs\u002F63b53c670e058c15e846aca9_compaction-checker.png",[32,54872,54874],{"id":54873},"how-to-reduce-gc-io-impacts","How to reduce GC IO impacts",[48,54876,54877],{},"As the compaction checker reads data from old entry log files and writes them into current ones, it may cause mixed read and write IOs for disks. If we do not introduce a throttling policy, it will affect the performance of the ledger disk.",[3933,54879,54881],{"id":54880},"compaction-throttling","Compaction throttling",[48,54883,54884],{},"In bookies, there are two kinds of compaction throttle policies, namely by bytes or by entries. The related configurations are listed as follows.",[8325,54886,54889],{"className":54887,"code":54888,"language":8330},[8328],"# Throttle compaction by bytes or by entries.\nisThrottleByBytes=false\n\n# Set the rate at which compaction will re-add entries. The unit is adds per second.\ncompactionRateByEntries=1000\n\n# Set the rate at which compaction will re-add entries. The unit is bytes added per second.\ncompactionRateByBytes=1000000\n",[4926,54890,54888],{"__ignoreMap":18},[48,54892,54893],{},"By default, the bookie uses the throttle-by-entries policy. However, as the data size of each entry is not the same, we cannot control the compaction read and write throughput, and it will have a great impact on the ledger disk performance. Therefore, we recommend using the throttle-by-bytes policy.",[3933,54895,54897],{"id":54896},"pagecache-pre-reads","PageCache pre-reads",[48,54899,54900,54901,54906],{},"For an entry log file, if more than 90% of the entries have been deleted, the compactor will scan the entries' header metadata one by one. When reading one entry's metadata, it will miss the BufferedChannel read buffer cache, and it will trigger prefetch from the disk. For the following entries, the header metadata reading will also miss the BufferedChannel read buffer cache, and will continue to prefetch from the disk without throttling. This will lead to high ledger disk IO utilization. For more information, see this pull request ",[55,54902,54905],{"href":54903,"rel":54904},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fbookkeeper\u002Fpull\u002F3192",[264],"#3192"," to fix this bug.",[48,54908,54909,54910,190],{},"Moreover, each prefetch operation from the disk will also trigger OS PageCache prefetch. For the compaction model, the OS PageCache prefetch will lead to PageCache pollution, and may also affect the journal sync latency. To solve this problem, we can use the Direct IO to reduce the PageCache effect. 
For more information, see this issue ",[55,54911,54914],{"href":54912,"rel":54913},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fbookkeeper\u002Fissues\u002F2943",[264],"#2943",[32,54916,54918],{"id":54917},"why-doesnt-the-bookie-gc-work","Why doesn’t the bookie GC work",[48,54920,54921,54922,54925],{},"When one ledger disk reaches the maximum usage threshold, it suspends minor and major compaction. When we use the curl -XPUT ",[55,54923,54849],{"href":54849,"rel":54924},[264]," command to trigger compaction, it will be filtered by the suspendMajor and suspendMinor flags. Consequently:",[1666,54927,54928,54931,54934],{},[324,54929,54930],{},"The bookie doesn’t clean up deleted ledgers.",[324,54932,54933],{},"The disk space can't be freed up.",[324,54935,54936],{},"The bookie can't recover from the readOnly state to the Writable state.",[48,54938,54939],{},"In this case, we can only trigger compaction through the following steps.",[1666,54941,54942,54945,54948],{},[324,54943,54944],{},"Increase the maximum disk usage threshold.",[324,54946,54947],{},"Restart the bookie.",[324,54949,54950,54951,54954],{},"Use the command curl -XPUT ",[55,54952,54849],{"href":54849,"rel":54953},[264]," to trigger compaction.",[48,54956,54957,54958,54963],{},"There is a pull request ",[55,54959,54962],{"href":54960,"rel":54961},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fbookkeeper\u002Fpull\u002F3205",[264],"#3205"," to add a flag forceAllowCompaction=true for REST API to ignore the suspendMajor and suspendMinor flags to force trigger compaction.",[40,54965,54967],{"id":54966},"remove-entry-log-files-that-cannot-be-deleted","Remove entry log files that cannot be deleted",[48,54969,54970],{},"When a Pulsar cluster keeps running for a few months, some old entry log files on bookies may fail to be deleted. The main reasons are listed as follows.",[1666,54972,54973,54976,54979],{},[324,54974,54975],{},"Ledger deletion logic’s bug, which leads to orphan ledgers.",[324,54977,54978],{},"Inactive topics are not loaded into the broker. As a result, the topic retention policy can’t take effect on them.",[324,54980,54981],{},"Inactive cursors still exist in the cluster, and their corresponding cursor ledgers can’t be deleted.",[48,54983,54984],{},"We need a tool to detect and repair these ledgers in the above cases.",[48,54986,54987,54988,54991,54992,190],{},"For case 1, after this ",[55,54989,54714],{"href":54712,"rel":54990},[264]," is applied, it can be resolved. However, the existing orphan ledgers still can’t be deleted. We need to scan the whole BookKeeper cluster’s metadata, and check each ledger’s metadata. If the ledger related topic’s ledger list doesn’t contain the ledger, it means the ledger has been deleted. We can directly delete these ledgers safely using the bookkeeper command. For more information, see the ",[55,54993,54996],{"href":54994,"rel":54995},"https:\u002F\u002Fdocs.streamnative.io\u002Fplatform\u002Flatest\u002Foperator-guides\u002Fconfigure\u002Fsn-pulsar-tool\u002Fpck-tutorial",[264],"tool here",[48,54998,54999],{},"For case 2, we will develop a checker to detect inactive topics, which still hold ledger data. After these topics are detected, an operation will be triggered to load them into brokers, and apply a retention policy for them. 
This feature is still in development.",[48,55001,55002],{},"For case 3, we are considering directly deleting the cursors which have been inactive for a long time, such as 7 days.",[40,55004,319],{"id":316},[48,55006,55007],{},"This blog explains the topic data lifecycle in Apache Pulsar, including the topic retention policy and BookKeeper garbage collection logic. At the same time, it also discusses cases where topic data can’t be deleted, and gives some solutions.",[40,55009,38376],{"id":38375},[48,55011,38379,55012,38385],{},[55,55013,38384],{"href":38382,"rel":55014},[264],[321,55016,55017,55022,55026],{},[324,55018,38390,55019,190],{},[55,55020,31914],{"href":31912,"rel":55021},[264],[324,55023,45476,55024,45480],{},[55,55025,3550],{"href":45479},[324,55027,55028,55032,55033,55037,55038,55043],{},[55,55029,55031],{"href":35357,"rel":55030},[264],"Pulsar Summit Asia 2022"," will take place on November 19th and 20th, 2022. ",[55,55034,55036],{"href":55035},"\u002Fblog\u002Fcommunity\u002F2022-08-22-pulsar-summit-asia-2022-cfp-is-open-now\u002F","The CFP is open now","! ",[55,55039,55042],{"href":55040,"rel":55041},"https:\u002F\u002Fsessionize.com\u002Fpulsar-summit-asia-2022\u002F",[264],"Submit a proposal"," to share your Pulsar story!",{"title":18,"searchDepth":19,"depth":19,"links":55045},[55046,55047,55052,55057,55058,55059],{"id":54688,"depth":19,"text":54689},{"id":54726,"depth":19,"text":54727,"children":55048},[55049,55050,55051],{"id":54733,"depth":279,"text":54734},{"id":54743,"depth":279,"text":54744},{"id":54774,"depth":279,"text":54775},{"id":54828,"depth":19,"text":54829,"children":55053},[55054,55055,55056],{"id":54854,"depth":279,"text":54855},{"id":54873,"depth":279,"text":54874},{"id":54917,"depth":279,"text":54918},{"id":54966,"depth":19,"text":54967},{"id":316,"depth":19,"text":319},{"id":38375,"depth":19,"text":38376},"2022-09-27","Understand how ledger data are deleted in Pulsar and some problems you may have during the process.","\u002Fimgs\u002Fblogs\u002F63c7c1b47534714d7a35604d_63b53c679b07673934732295_a-deep-dive-into-the-topic-data-lifecycle-in-apache-pulsar-top.jpeg",{},{"title":53575,"description":55061},"blog\u002Fdeep-dive-into-topic-data-lifecycle-apache-pulsar",[12106,821],"93giXYBx-9XuDdAdZkSt-x4A3Z--442WLi5SxtwjuZY",{"id":55069,"title":55070,"authors":55071,"body":55072,"category":821,"createdAt":290,"date":55321,"description":55322,"extension":8,"featured":294,"image":55323,"isDraft":294,"link":290,"meta":55324,"navigation":7,"order":296,"path":55325,"readingTime":11508,"relatedResources":290,"seo":55326,"stem":55327,"tags":55328,"__hash__":55329},"blogs\u002Fblog\u002Fbeauty-apache-pulsar-individual-message-acknowledgment.md","The Beauty of Apache Pulsar: Individual Message Acknowledgment",[44544],{"type":15,"value":55073,"toc":55313},[55074,55080,55087,55091,55094,55097,55106,55110,55118,55121,55124,55132,55138,55141,55144,55147,55151,55159,55162,55165,55168,55173,55176,55180,55183,55194,55200,55203,55206,55209,55232,55235,55241,55244,55252,55255,55258,55271,55274,55277,55280,55282,55285,55287,55292],[48,55075,55076],{},[55,55077],{"href":55078,"rel":55079},"https:\u002F\u002Fwww.linkedin.com\u002Fin\u002Fasafmesika\u002F",[264],[48,55081,55082,55083,55086],{},"As a cloud-native distributed messaging and streaming platform, ",[55,55084,821],{"href":23526,"rel":55085},[264]," has a variety of features that differentiate it from other tools. One of the capabilities that got me to “fall in love” with it was individual message acknowledgment. 
When I was working with Apache Kafka, I noticed that one of the key features missing was the ability to signal which messages have been successfully processed. In this post, I will give a high-level explanation of how individual message acknowledgments work in a simple and elegant way in Pulsar.",[40,55088,55090],{"id":55089},"individual-message-acknowledgment-why-other-systems-fall-short","Individual message acknowledgment: Why other systems fall short",[48,55092,55093],{},"The classic use case for individual message acknowledgment is task execution where the message describes the task, such as triggering an alert or sending an email. If some of the tasks fail, you may want to retry them, but you don’t want to reprocess those that have already succeeded. In Kafka, the only native option you have is to signal success for the entire bulk of messages that your application has read. The naive way of implementing retries for failed individual messages is stopping to read messages, and retrying the failed ones. However, the retries may have delays between them, resulting in a “traffic jam” in the execution pipeline. A more sophisticated workaround is to write the failed messages to a retry topic and store in a different data store when each message should be retried. Nevertheless, this requires a lot of writing, wiring, and DevOps work.",[48,55095,55096],{},"As that feature is not natively baked into Apache Kafka, you search for alternatives. The popular options are Apache ActiveMQ and RabbitMQ, but neither was designed to handle the scale of data as opposed to Kafka. This means they are not horizontally scalable, fault-tolerant, or highly available like Kafka. When you search for a distributed system with the ability to acknowledge individual messages, one tool that may appear in the results is Apache Pulsar.",[48,55098,55099,55100,55105],{},"Apache Pulsar provides the individual acknowledgment capability by design. It boasts a client supporting retries, with ",[55,55101,55104],{"href":55102,"rel":55103},"https:\u002F\u002Fdzone.com\u002Farticles\u002Funderstanding-retry-pattern-with-exponential-back",[264],"exponential back-off"," delays between messages (delay increasing in duration exponentially: 2, 4, 8, 16, 32 seconds) without any additional code on the user side.",[40,55107,55109],{"id":55108},"reading-and-writing-in-pulsar","Reading and writing in Pulsar",[48,55111,55112,55113,55117],{},"As shown in Figure 1, the Pulsar clients (comprising a Producer and two Consumers) communicate with the Pulsar cluster, which consists of multiple Brokers. The producer writes messages to a topic. Reading messages from a topic, on the other hand, requires a ",[55,55114,55116],{"href":27728,"rel":55115},[264],"subscription",". It’s the actual entity stored in the broker that keeps the position of readers. It tells you which messages have been read and which remains to be read. Each subscription can have one or multiple consumers, allowing them to share the load of reading and processing the messages across multiple machines, each running its own consumer.",[48,55119,55120],{},"Different subscriptions allow you to read the same messages on the topic for different purposes, each having its own consumption position tracked by its respective cursor. 
For example, one subscription can slowly upload the messages to S3, while another can quickly compute in-memory aggregations and flush them to an analytics database.",[48,55122,55123],{},"For a given subscription, consumers have the following API:",[321,55125,55126,55129],{},[324,55127,55128],{},"consume() → Messages (each containing a MessageID)",[324,55130,55131],{},"acknowledge(messageID)",[48,55133,55134],{},[384,55135],{"alt":55136,"src":55137},"illustration Pulsar brokers","\u002Fimgs\u002Fblogs\u002F63b53b11526d5c68ae4b78c2_pulsar-cluster-reading-writing.png",[48,55139,55140],{},"For efficiency reasons, the consumer buffers the individual acknowledgment commands and sends them in bulk to the broker. This behavior can be tuned by setting the maximum time to buffer before sending, or the maximum number of acknowledgments to hold before sending. The trade-off is that if the consumer machine goes down, these acknowledgments will not be persisted. As a result, messages will be redelivered to the consumers for that subscription. Since Pulsar guarantees at-least-once delivery, this does not break that contract. It’s a trade-off many high-scale systems make.",[48,55142,55143],{},"For avid Kafka users, note that Pulsar has two types of topics: Topics (as described above) and Partitioned Topics. A single partitioned topic is composed of several partitions, each implemented as an internal topic. In short, a Kafka Topic is equivalent to a Pulsar Partitioned Topic, and a Kafka Topic Partition is effectively an internal Pulsar Topic.",[48,55145,55146],{},"Before we dive into individual message acknowledgments, we need to understand how storage works in Pulsar (how messages are stored) as it is essential to knowing other key concepts.",[40,55148,55150],{"id":55149},"how-do-messages-persist-on-pulsar-brokers","How do messages persist on Pulsar brokers?",[48,55152,55153,55154,55158],{},"Pulsar, different from many messaging systems, doesn’t store its messages on broker’s disks. It stores them in a separate system called ",[55,55155,55157],{"href":23555,"rel":55156},[264],"Apache BookKeeper."," This two-layer architecture design powers many unique features.",[48,55160,55161],{},"In Apache BookKeeper, ledgers are the basic unit of storage. You can consider them as virtual files. An entry written to a ledger is appended to the end of it. An entry is simply a container for any data you keep and the data itself is just a byte array. Pulsar converts the message produced to it into a byte array and persists it as an entry in BookKeeper.",[48,55163,55164],{},"You can open a ledger, append entries to it, and eventually close it so that it becomes immutable (read-only). A ledger can only be written by a single writer, namely a single machine.",[48,55166,55167],{},"In Pulsar, a topic is actually a list of ledger IDs. Pulsar opens a ledger and appends any message it receives for the topic to the currently active ledger for that topic. After a size or time threshold is reached, the ledger is closed with a new one opened.",[48,55169,55170],{},[384,55171],{"alt":55136,"src":55172},"\u002Fimgs\u002Fblogs\u002F63b53b11947e684b869557e2_message-persist-in-bk-cluster.png",[48,55174,55175],{},"Apache BookKeeper by itself is a distributed system, which is horizontally scalable, fault-tolerant to its data, and highly available. 
It offloads many responsibilities from Pulsar, and it deserves its own blog post, so we won’t dive into more details.",[40,55177,55179],{"id":55178},"acknowledgment-of-messages","Acknowledgment of messages",[48,55181,55182],{},"The subscription is a data structure, holding information about which messages have been acknowledged and which have not. It’s composed of the following fields:",[321,55184,55185,55188,55191],{},[324,55186,55187],{},"Delete Marker (a.k.a. markDeletePosition): The position of a message in a topic. All messages before it (inclusive) have been acknowledged. In Pulsar, the default behavior is to delete acknowledged messages, so it is named this way.",[324,55189,55190],{},"Since a topic is a list of ledgers, a position is a specific Ledger ID contained in that list, and an entry ID (the position of the entry within the ledger, the first one being 0).",[324,55192,55193],{},"Individual Acknowledgments: The messages that have been acknowledged, which are positioned after the Delete Marker. The data structure for it is a map between a Ledger ID to a bit set. A bit set at position 10 means the 11th message in that ledger has been acknowledged.",[48,55195,55196],{},[384,55197],{"alt":55198,"src":55199},"figure to illustrate Acknowledgment of messages","\u002Fimgs\u002Fblogs\u002F63b53b110e058c26a345b6cd_delete-marker-in-pulsar.png",[48,55201,55202],{},"As I mentioned earlier, Pulsar contains a cluster of brokers. Every broker has a set of topics it is responsible for, which means all reads and writes for those topics go through that broker. Hence, the broker keeps that subscription data structure in memory for all subscriptions of those topics. The resiliency to broker failures comes in the form of persisting the subscription data structure: it is written to BookKeeper at a certain frequency, and in some cases to ZooKeeper.",[48,55204,55205],{},"Each subscription has a separate designated ledger which the subscription state is persisted to. The in-memory data structure is converted into a more compact data structure called “Position Info” and then serialized into a byte array through Protocol Buffers encoding, and appended as entry to the designated subscription ledger.",[48,55207,55208],{},"Position Info contains the following fields:",[321,55210,55211,55214,55217,55220,55223,55226,55229],{},[324,55212,55213],{},"Delete Marker Ledger ID",[324,55215,55216],{},"Delete Marker Entry ID",[324,55218,55219],{},"Array of Ranges, where each Range is a:",[324,55221,55222],{},"Range Start Ledger ID",[324,55224,55225],{},"Range Start Entry ID",[324,55227,55228],{},"Range End Ledger ID",[324,55230,55231],{},"Range End Entry ID",[48,55233,55234],{},"If we take Figure 3 as an example, we can convert it into Position Info as follows:",[8325,55236,55239],{"className":55237,"code":55238,"language":8330},[8328],"Delete Marker Ledger ID = L2\nDelete Marker Entry ID = 2\nRanges[] = \n   Range 0: (L2, 0) —> (L2, 2)\n   Range 1: (L2, 4) —> (L2, 4)\n   Range 2: (L3, 1) —> (L3, 1)\n",[4926,55240,55238],{"__ignoreMap":18},[48,55242,55243],{},"The subscription has two event types triggering the persistence of the subscription in-memory state to the ledger:",[1666,55245,55246,55249],{},[324,55247,55248],{},"The consumer acknowledges a message. Since this happens at a rather high frequency, there’s a rate limiter, limiting it to 1 persistence action per second by default.",[324,55250,55251],{},"Timeout. 
If the state is “dirty” and hasn’t been changed for X seconds, it’s persisted to BookKeeper.",[48,55253,55254],{},"In a worst-case scenario, there may be a large number (e.g. millions) of acknowledged messages that are fragmented. In that case, Position Info will be extremely large (tens of megabytes). The current workaround for that is increasing the rate limiting frequency so that it is higher than the default value (1 persistence action per second). This trades off the number of acknowledged messages you lose when the broker terminates ungracefully between persistence actions.",[48,55256,55257],{},"Currently, some community members are actively working on some solutions:",[1666,55259,55260,55263],{},[324,55261,55262],{},"Compress Position Info.",[324,55264,55265,55266,190],{},"Splitting Position Info into multiple BookKeeper entries as explained in ",[55,55267,55270],{"href":55268,"rel":55269},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fwiki\u002FPIP-81%3A-Split-the-individual-acknowledgments-into-multiple-entries",[264],"PIP-81",[48,55272,55273],{},"In case of a broker failure, you can lose acknowledgments for the subscriptions handled by that broker, depending on your configuration. By default you can lose up to 1 second worth of acknowledgments, as long as the subscription is active and messages are constantly acknowledged (triggering the persistence of it). If the subscription suddenly stops consuming messages, you will lose a time-out (as you configured) worth of acknowledged messages.",[48,55275,55276],{},"The subscription ledger (where the subscription state is persisted to) ID is persisted in ZooKeeper. When a broker crashes, another broker will take ownership of the topic it serves. That broker obtains the subscription ledger ID from ZooKeeper and then reads the last subscription state from that ledger (last entry) into memory.",[48,55278,55279],{},"There is a corner case optimization: if the number of unacknowledged message ranges in the converted Position Info is less than 1000 (by default), then the subscription state will be persisted directly in ZooKeeper. This only happens when the subscription ledger is closed (rolled over due to size\u002Ftime) or a topic is moved from one broker to another (due to shutdown).",[40,55281,319],{"id":316},[48,55283,55284],{},"This blog explained the individual message acknowledgment feature of Apache Pulsar. It also covered the read and write logic, as well as message storage, to give a broader context for understanding it. I hope this post can be helpful for you to understand part of the “magic” behind Pulsar. 
Future posts will dive into other unique features of Pulsar.",[40,55286,38376],{"id":38375},[48,55288,38379,55289,38385],{},[55,55290,38384],{"href":38382,"rel":55291},[264],[321,55293,55294,55299,55303],{},[324,55295,38390,55296,190],{},[55,55297,31914],{"href":31912,"rel":55298},[264],[324,55300,45476,55301,45480],{},[55,55302,3550],{"href":45479},[324,55304,55305,55032,55308,55037,55310,55043],{},[55,55306,55031],{"href":35357,"rel":55307},[264],[55,55309,55036],{"href":55035},[55,55311,55042],{"href":55040,"rel":55312},[264],{"title":18,"searchDepth":19,"depth":19,"links":55314},[55315,55316,55317,55318,55319,55320],{"id":55089,"depth":19,"text":55090},{"id":55108,"depth":19,"text":55109},{"id":55149,"depth":19,"text":55150},{"id":55178,"depth":19,"text":55179},{"id":316,"depth":19,"text":319},{"id":38375,"depth":19,"text":38376},"2022-09-22","Individual message acknowledgment is one of the features that set Pulsar apart from other messaging and streaming systems. This blog gives a high-level explanation of how individual message acknowledgments work in a simple and elegant way in Pulsar.","\u002Fimgs\u002Fblogs\u002F63c7c206dd37cf4b9bfd92c3_63b53b1103127414b4daedf2_individual-message-acknowledgment-top.jpeg",{},"\u002Fblog\u002Fbeauty-apache-pulsar-individual-message-acknowledgment",{"title":55070,"description":55322},"blog\u002Fbeauty-apache-pulsar-individual-message-acknowledgment",[799,821],"i-OpITCxZ5egovKumKe9olqIV_U73Fr6dRxEaI6Xr5w",{"id":55331,"title":46732,"authors":55332,"body":55334,"category":821,"createdAt":290,"date":55565,"description":55566,"extension":8,"featured":294,"image":55567,"isDraft":294,"link":290,"meta":55568,"navigation":7,"order":296,"path":46731,"readingTime":33204,"relatedResources":290,"seo":55569,"stem":55570,"tags":55571,"__hash__":55572},"blogs\u002Fblog\u002Fannouncing-spring-for-apache-pulsar.md",[55333],"Alexander Preuss",{"type":15,"value":55335,"toc":55553},[55336,55345,55348,55352,55355,55358,55361,55365,55368,55374,55382,55385,55387,55390,55396,55399,55405,55409,55412,55418,55427,55431,55434,55440,55443,55447,55450,55456,55459,55462,55468,55472,55475,55478,55484,55487,55490,55496,55498,55504,55506],[48,55337,55338,55339,55344],{},"We are excited to announce the first milestone release of ",[55,55340,55343],{"href":55341,"rel":55342},"https:\u002F\u002Fdocs.spring.io\u002Fspring-pulsar\u002Fdocs\u002F0.1.0-M1\u002Freference\u002Fhtml\u002F",[264],"Spring for Apache Pulsar",". This integration enables you to leverage the power of Apache Pulsar straight from your Spring applications.",[48,55346,55347],{},"Let's take a look at the benefits of using Apache Pulsar with Spring before jumping into an example application.",[40,55349,55351],{"id":55350},"why-use-apache-pulsar-with-spring","Why use Apache Pulsar with Spring",[48,55353,55354],{},"Spring is the world's most popular Java framework that helps developers create production-ready applications quickly, safely, and easily. It is a flexible framework that offers both out-of-the-box defaults to increase development efficiency, as well customization for any arising requirements. This makes it the perfect candidate to use when building your cloud-native applications.",[48,55356,55357],{},"Apache Pulsar is a cloud-native streaming and messaging platform that enables organizations to build scalable, reliable applications in elastic cloud environments. It combines the best features of traditional messaging and pub-sub systems. 
In Pulsar’s multi-layer architecture, each layer is scalable, distributed, and decoupled from the other layers. The separation of compute and storage allows you to scale both independently.",[48,55359,55360],{},"Together, Pulsar and Spring allow you to easily build data applications that are scalable, robust, and quick to develop. Integrating Pulsar with Spring microservices further enables seamless interoperation with services written in other languages. Spring for Apache Pulsar offers a toolkit to interface with Pulsar. From Templates to Listeners and Autoconfiguration, all the Spring concepts you love can now be used with Pulsar! Fitting Pulsar into your existing architecture is especially easy if you are using the Spring for Kafka or Spring AMQP integrations. Spring for Pulsar adopts the same concepts and makes you feel right at home.",[40,55362,55364],{"id":55363},"how-to-use-spring-for-apache-pulsar","How to use Spring for Apache Pulsar",[48,55366,55367],{},"We are going to build an example application that will consume signup data to alert the customer success team about new clients. Spring will run our application and provide configuration while Pulsar is used as a messaging bus to route our data.",[48,55369,55370],{},[384,55371],{"alt":55372,"src":55373},"illustration how to Spring for Apache Pulsar","\u002Fimgs\u002Fblogs\u002F63b539ba65269364163ce165_spring-pulsar-1.png",[48,55375,55376,55377,190],{},"The full source code for our example is available in ",[55,55378,55381],{"href":55379,"rel":55380},"https:\u002F\u002Fgithub.com\u002Fstreamnative\u002Fexamples\u002Ftree\u002Fmaster\u002Fspring-pulsar",[264],"this GitHub repository",[48,55383,55384],{},"Check out this demo video to see Spring for Apache Pulsar in action:",[32,55386,10104],{"id":10103},[48,55388,55389],{},"We are using Maven and Java 17 to run our application. To start using Spring for Apache Pulsar we first need to add it as a dependency to our Spring project.",[8325,55391,55394],{"className":55392,"code":55393,"language":8330},[8328],"\n    0.1.0-M1\n\n    \n        org.springframework.pulsar\n        spring-pulsar-spring-boot-starter\n        ${spring-pulsar.version}\n    \n    \n        org.springframework.boot\n        spring-boot-configuration-processor\n    \n\n    \n        \n            org.springframework.boot\n            spring-boot-maven-plugin\n        \n    \n\n    \n        spring-milestones\n        Spring Milestones\n        https:\u002F\u002Frepo.spring.io\u002Fmilestone\n        \n            false\n        \n    \n\n    \n        spring-milestones\n        Spring Milestones\n        https:\u002F\u002Frepo.spring.io\u002Fmilestone\n        \n            false\n        \n    \n\n",[4926,55395,55393],{"__ignoreMap":18},[48,55397,55398],{},"To compile our application we can run mvn clean package. The application can be run using mvn spring-boot:run.",[48,55400,55401,55402,55404],{},"We also need a Pulsar cluster for the application to run against. We can use a local standalone Pulsar cluster, or use ",[55,55403,3550],{"href":45479}," to provide one for us.",[32,55406,55408],{"id":55407},"connecting-to-pulsar","Connecting to Pulsar",[48,55410,55411],{},"We can now configure our application to connect to Pulsar using Spring configuration. 
Let's add the following to our src\u002Fmain\u002Fresources\u002Fapplication.yml.",[8325,55413,55416],{"className":55414,"code":55415,"language":8330},[8328],"\nspring:\n  pulsar:\n    client:\n      service-url: pulsar+ssl:\u002F\u002Ffree.o-j8r1u.snio.cloud:6651\n      auth-plugin-class-name: org.apache.pulsar.client.impl.auth.oauth2.AuthenticationOAuth2\n      authentication:\n        private-key: file:\u002F\u002F\u002FUsers\u002Fuser\u002FDownloads\u002Fo-j8r1u-free.json\n        audience: urn:sn:pulsar:o-j8r1u:free\n        issuer-url: https:\u002F\u002Fauth.streamnative.cloud\u002F\n    producer:\n      batching-enabled: false\n      topic-name: persistent:\u002F\u002Fpublic\u002Fdefault\u002Fsignups-topic\n\n",[4926,55417,55415],{"__ignoreMap":18},[48,55419,55420,55421,55426],{},"This enables us to connect to StreamNative cloud by using OAuth2 authentication. To retrieve your authentication credentials, please follow the ",[55,55422,55425],{"href":55423,"rel":55424},"https:\u002F\u002Fdocs.streamnative.io\u002Fcloud\u002Fstable\u002Fconnect\u002Foverview",[264],"StreamNative Cloud documentation",". It also pre-configures a Pulsar producer to not batch messages and sets its default topic name.",[32,55428,55430],{"id":55429},"producing-data","Producing data",[48,55432,55433],{},"We can now start to send messages to our cluster. For this example we will generate fake signup data and continuously write it to our signup topic. Using Spring for Apache Pulsar, we can send messages by simply adding a PulsarTemplate to our application code.",[8325,55435,55438],{"className":55436,"code":55437,"language":8330},[8328],"\n@EnableScheduling\n@SpringBootApplication\npublic class SignupApplication {\n    private static final Logger log = LoggerFactory.getLogger(SignupApplication.class);\n    \n    @Autowired private PulsarTemplate signupTemplate;\n    \n    @Autowired private PulsarTemplate customerTemplate;\n    \n    @Scheduled(initialDelay = 5000, fixedRate = 5000)\n    void publishSignupData() throws PulsarClientException {\n        Signup signup = signupGenerator.generate();\n        signupTemplate.setSchema(JSONSchema.of(Signup.class));\n        signupTemplate.send(signup);\n    }\n   …\n}\n \n",[4926,55439,55437],{"__ignoreMap":18},[48,55441,55442],{},"The code above creates a scheduled task that generates a fake signup and sends it as a message to the default topic we configured in our application.yml. Note how easy it is to configure the PulsarTemplate to send the message with a JSON schema.",[32,55444,55446],{"id":55445},"consuming-data","Consuming data",[48,55448,55449],{},"To derive value from our data, we now want to filter the incoming signups. For any signup that uses the ENTERPRISE tier, we will create a new Customer and publish a message on our customer-success topic. 
To consume the signup topic, we need to add a PulsarListener method to our SignupApplication class.",[8325,55451,55454],{"className":55452,"code":55453,"language":8330},[8328],"\n@PulsarListener(\n    subscriptionName = \"signup-consumer\",\n    topics = \"signups-topic\",\n    schemaType = SchemaType.JSON)\nvoid filterSignups(Signup signup) throws PulsarClientException {\n    log.info(\n        \"{} {} ({}) just signed up for {} tier\",\n        signup.firstName(),\n        signup.lastName(),\n        signup.companyEmail(),\n        signup.signupTier());\n\n    if (signup.signupTier() == SignupTier.ENTERPRISE) {\n        Customer customer = Customer.from(signup);\n        customerTemplate.setSchema(JSONSchema.of(Customer.class));\n        customerTemplate.send(\"customer-success\", customer);\n    }\n}\n\n",[4926,55455,55453],{"__ignoreMap":18},[48,55457,55458],{},"Behind the scenes, the PulsarListener annotation configures a Pulsar consumer to read from the specified topic(s) with the given schema. In our filterSignups method, we are using the second PulsarTemplate we added before. This time, we don't want to send messages to the default topic, so we pass in 'customer-success' as the topic name to write to.",[48,55460,55461],{},"Finally, our customer success team can now receive an alert about any new enterprise clients. To do so, they simply need to consume the 'customer-success' topic with the Customer schema.",[8325,55463,55466],{"className":55464,"code":55465,"language":8330},[8328],"\n@PulsarListener(\n    subscriptionName = \"customer-consumer\",\n    topics = \"customer-success\",\n    schemaType = SchemaType.JSON)\nvoid alertCustomerSuccess(Customer customer) {\n    log.info(\n        \"## Start the onboarding for {} - {} {} ({}) - {} ##\",\n        customer.companyName(),\n        customer.firstName(),\n        customer.lastName(),\n        customer.phoneNumber(),\n        customer.companyEmail());\n}\n\n",[4926,55467,55465],{"__ignoreMap":18},[32,55469,55471],{"id":55470},"advanced-features","Advanced features",[48,55473,55474],{},"Spring for Apache Pulsar offers many more advanced features. As an example, we want to show how to utilize a ProducerInterceptor for debug logging our messages.",[48,55476,55477],{},"First, we create a Spring configuration class that adds a ProducerInterceptor bean. 
Our ProducerInterceptor implementation only needs to log message information upon acknowledgment by the broker.",[8325,55479,55482],{"className":55480,"code":55481,"language":8330},[8328],"\n@Configuration(proxyBeanMethods = false)\nclass SignupConfiguration {\n\n  @Bean\n  ProducerInterceptor loggingInterceptor() {\n    return new LoggingInterceptor();\n  }\n\n  static class LoggingInterceptor implements ProducerInterceptor {\n    private static final Logger log = LoggerFactory.getLogger(LoggingInterceptor.class);\n\n    @Override\n    public void close() {\n      \u002F\u002F no-op\n    }\n\n    @Override\n    public boolean eligible(Message message) {\n      return true;\n    }\n\n    @Override\n    public Message beforeSend(Producer producer, Message message) {\n      return message;\n    }\n\n    @Override\n    public void onSendAcknowledgement(\n        Producer producer, Message message, MessageId msgId, Throwable exception) {\n      log.debug(\"MessageId: {}, Value: {}\", message.getMessageId(), message.getValue());\n    }\n  }\n}\n\n",[4926,55483,55481],{"__ignoreMap":18},[48,55485,55486],{},"Thanks to Spring's auto-configuration, our application will automatically configure our LoggingInterceptor to intercept messages on all Pulsar producers.",[48,55488,55489],{},"The only thing missing to see our interceptor in action is setting the log level to debug in the application.yml.",[8325,55491,55494],{"className":55492,"code":55493,"language":8330},[8328],"\nlogging:\n  level:\n    io.streamnative.example: debug\n\n",[4926,55495,55493],{"__ignoreMap":18},[32,55497,2125],{"id":2122},[48,55499,55500,55501,190],{},"In this blog, we explored how to use Spring for Apache Pulsar to quickly build a sample application. The Spring for Apache Pulsar integration offers many more features, like subscription types, batching, and manual acknowledgment. For advanced applications that further require read or write access to data in external systems, we can add Pulsar IO connectors. 
If you are interested in learning more about Pulsar IO connectors, please visit ",[55,55502,38697],{"href":35258,"rel":55503},[264],[40,55505,39647],{"id":39646},[321,55507,55508,55516,55523,55530,55537,55544],{},[324,55509,55510,55511,55515],{},"See the ",[55,55512,55514],{"href":55379,"rel":55513},[264],"full source code"," for our signup application",[324,55517,55518,55519,55522],{},"Try out ",[55,55520,55521],{"href":45479},"StreamNative Free Cloud"," to get started with Apache Pulsar",[324,55524,55525,55526,55529],{},"Help the ",[55,55527,55343],{"href":46379,"rel":55528},[264]," open source project evolve",[324,55531,55532,55533],{},"Read through the ",[55,55534,55536],{"href":55341,"rel":55535},[264],"Spring for Apache Pulsar documentation",[324,55538,55539,55540],{},"Join the ",[55,55541,55543],{"href":36242,"rel":55542},[264],"Apache Pulsar Slack",[324,55545,55546,55547,4003,55550,20571],{},"Subscribe to the Apache Pulsar mailing list (",[55,55548,55549],{"href":48254},"users-subscribe@pulsar.apache.org",[55,55551,55552],{"href":48257},"dev-subscribe@pulsar.apache.org",{"title":18,"searchDepth":19,"depth":19,"links":55554},[55555,55556,55564],{"id":55350,"depth":19,"text":55351},{"id":55363,"depth":19,"text":55364,"children":55557},[55558,55559,55560,55561,55562,55563],{"id":10103,"depth":279,"text":10104},{"id":55407,"depth":279,"text":55408},{"id":55429,"depth":279,"text":55430},{"id":55445,"depth":279,"text":55446},{"id":55470,"depth":279,"text":55471},{"id":2122,"depth":279,"text":2125},{"id":39646,"depth":19,"text":39647},"2022-09-21","We are excited to announce the first milestone release of Spring for Apache Pulsar. This integration enables you to leverage the power of Apache Pulsar straight from your Spring applications.","\u002Fimgs\u002Fblogs\u002F63c7bc01f5e3957ca5e0a442_63b539baf973812315178f86_spring-pulsar-top.png",{},{"title":46732,"description":55566},"blog\u002Fannouncing-spring-for-apache-pulsar",[302,821],"OkkIUaU_Uaty4llq7asHu3Q1rtwESeWgoZVUHTrQQug",{"id":55574,"title":41449,"authors":55575,"body":55576,"category":821,"createdAt":290,"date":55823,"description":55824,"extension":8,"featured":294,"image":55825,"isDraft":294,"link":290,"meta":55826,"navigation":7,"order":296,"path":32263,"readingTime":3556,"relatedResources":290,"seo":55827,"stem":55828,"tags":55829,"__hash__":55830},"blogs\u002Fblog\u002Funderstanding-pulsar-10-minutes-guide-kafka-users.md",[42146],{"type":15,"value":55577,"toc":55811},[55578,55582,55589,55596,55602,55610,55613,55622,55628,55632,55640,55643,55649,55652,55658,55662,55665,55671,55677,55680,55683,55687,55690,55699,55705,55708,55717,55721,55739,55743,55752,55758,55767,55773,55775,55783,55789,55792,55809],[40,55579,55581],{"id":55580},"you-already-know-more-about-apache-pulsar-than-you-realize","You already know more about Apache Pulsar than you realize",[48,55583,55584,55585,55588],{},"Today we’ll bootstrap your ",[55,55586,821],{"href":23526,"rel":55587},[264]," knowledge by translating your existing Apache Kafka experience. We’ll show you how fundamental Apache Kafka concepts appear in Apache Pulsar. You will then be able to use your pre-existing Kafka knowledge to deliver comparable use cases with Pulsar rapidly.",[48,55590,55591,55592,55595],{},"Apache Pulsar presents an attractive alternative to the well-established ",[55,55593,799],{"href":31428,"rel":55594},[264]," ecosystem. Pulsar’s differentiating feature set and architectural differences can overcome fundamental Kafka limitations. 
While Kafka and Pulsar are both highly scalable and durable distributed event streaming platforms, they have many differences. For example, Pulsar offers flexible subscriptions, unified streaming and messaging, and disaggregated storage.",[48,55597,55598,55599,55601],{},"I’ve been fortunate enough to have worked with Apache Kafka for several years, offering it as a platform in enterprises for Data Scientists, Engineers, and Analysts. Consequently, I’ve experienced a broad range of Kafka use cases and have been exposed to many facets of its ecosystem. When I joined the team at ",[55,55600,4496],{"href":10259}," to work with Pulsar, I wanted to see how my Kafka experiences would look in a Pulsar setting. I’m going to assume you have a reasonable understanding of Kafka — I won’t describe basic concepts like topics from the ground up. I’ll avoid differentiating platform features and focus on how open-source Pulsar can perform the role of open-source Kafka from the producer and consumer perspectives.",[48,55603,55604,55605,55609],{},"Clarifying note: I will not address Pulsar’s Kafka Protocol handler (",[55,55606,35093],{"href":55607,"rel":55608},"https:\u002F\u002Fdocs.streamnative.io\u002Fplatform\u002Fv1.0.0\u002Fconcepts\u002Fkop-concepts",[264],"), which has Pulsar appear to clients as a set of Kafka brokers; using KoP will not further our knowledge of Pulsar.",[40,55611,42853],{"id":55612},"topics",[48,55614,55615,55616,55621],{},"Let’s begin with topics — a fundamental concept in Kafka and Pulsar. You can think of a Pulsar topic as a single Kafka topic partition. At this point, you might be wondering about performance, as multiple partitions are what give Kafka its horizontal scalability. However, worry not, because Pulsar builds on its topic primitive to provide an equivalent concept, the explicitly named ‘",[55,55617,55620],{"href":55618,"rel":55619},"https:\u002F\u002Fpulsar.apache.org\u002Fdocs\u002Fnext\u002Fconcepts-messaging\u002F#partitioned-topics",[264],"partitioned topic","’. In this case, a set of Pulsar topics are logically grouped and form something functionally equivalent to a Kafka partitioned topic. This distinction exists because there are additional messaging architectures — beyond the streaming use cases that Kafka targets — supported by Pulsar and use unpartitioned topics as an elementary component. As we are focusing on Kafka use cases, we will, for the rest of this article, assume that the term topic relates to a Pulsar partitioned topic or, equivalently, a Kafka topic.",[8325,55623,55626],{"className":55624,"code":55625,"language":8330},[8328],"PulsarAdmin admin = ...;\nadmin.topics().createPartitionedTopic(topic, partitions);\n",[4926,55627,55625],{"__ignoreMap":18},[40,55629,55631],{"id":55630},"retention","Retention",[48,55633,55634,55635,55639],{},"When compared to Kafka, Pulsar has greater flexibility regarding the lifecycle of messages. By default, Pulsar will keep all unacknowledged messages forever, and delete acknowledged messages immediately. We can adjust ",[55,55636,55638],{"href":54677,"rel":55637},[264],"both behaviors",", using message retention to preserve acknowledged messages and message expiry to purge unacknowledged messages.",[48,55641,55642],{},"In Kafka, retention policies are unaware of consumer activity — a message is persisted or purged irrespective of whether consumers have read it. This Kafka behavior is perhaps one that you won’t want to replicate in Pulsar because you’ll need to go out of your way to create something less useful. 
However, in the interest of learning, let’s see how this would look. First let’s retain messages that are acknowledged by all consumers:",[8325,55644,55647],{"className":55645,"code":55646,"language":8330},[8328],"PulsarAdmin admin = ...;\nadmin.topicPolicies().setRetention(\n        “topic-name”,\n        new RetentionPolicies(sizeInMins, sizeInMB)\n);\n",[4926,55648,55646],{"__ignoreMap":18},[48,55650,55651],{},"Now let’s expire unacknowledged messages:",[8325,55653,55656],{"className":55654,"code":55655,"language":8330},[8328],"admin.topicPolicies().setMessageTTL(\n        “topic-name”,\n        messageTTLInSeconds\n);\n",[4926,55657,55655],{"__ignoreMap":18},[40,55659,55661],{"id":55660},"compaction","Compaction",[48,55663,55664],{},"Like Kafka, Pulsar is able to perform compaction on topics. However, the internal implementation is slightly different as Pulsar maintains both uncompacted and compacted forms of data concurrently. A new compacted ledger is generated during compaction, and the previous one is discarded.",[48,55666,55667],{},[384,55668],{"alt":55669,"src":55670},"Consumers can choose to read compacted or uncompacted","\u002Fimgs\u002Fblogs\u002F63b537f2a5395d11e4849a4b_kp2.png",[8325,55672,55675],{"className":55673,"code":55674,"language":8330},[8328],"Consumer compactedTopicConsumer = client.newConsumer()\n       .topic(\"some-compacted-topic\")\n       ...\n       .readCompacted(true)\n       .subscribe();\n\n",[4926,55676,55674],{"__ignoreMap":18},[48,55678,55679],{},"Pulsar also has similar message deletion semantics to Kafka. The publishing of a keyed message with a null value acts as a tombstone that removes the value on the next compaction cycle. Along with key-based deletion, Kafka also allows retention to be specified on compact topics. You can for example specify unlimited retention, which would always keep at least the latest message for all keys, providing similar storage semantics to a key-value store. Alternatively, you can specify a retention period so that messages are aged out with a TTL.",[48,55681,55682],{},"Pulsar is slightly less flexible in this regard. Messages can only be removed from the compact ledger via explicit deletion by key, otherwise, you can expect to store at least the latest message for all keys. Retention can be set on the compact topic, but this only applies to the non-compact ledger — i.e. the uncompacted messages. Given this constraint, due care must be taken when considering the cardinality of keys, and, if a message TTL is required in the compacted ledger, then it would be necessary to create a manual process to mark and delete messages.",[40,55684,55686],{"id":55685},"schemas","Schemas",[48,55688,55689],{},"Within the Kafka ecosystem, schema validation of messages can be applied by using Confluent’s Schema Registry and producer\u002Fconsumer SerDes. Clients must be configured by application developers so that they integrate with the registry service and can discover and publish their schema. With a suitably configured client, published messages will be validated on the client side.",[48,55691,55692,55693,55698],{},"A similar capability exists in Pulsar — the ",[55,55694,55697],{"href":55695,"rel":55696},"https:\u002F\u002Fpulsar.apache.org\u002Fdocs\u002Fnext\u002Fschema-get-started#server-side-approach",[264],"server-side schema strategy",". But note that the ‘server-side’ reference relates to the integration of the schema registry functionality into Pulsar’s brokers instead of an uncoupled tertiary service. 
This integration is then expressed directly within Pulsar client APIs, so additional configuration of Kafka schema registry endpoints, SerDes, and subject name strategies is unneeded. Pulsar also has parity with Kafka regarding supported schema flavors (Avro, Protobuf, etc.) and evolution guarantees.",[8325,55700,55703],{"className":55701,"code":55702,"language":8330},[8328],"Producer producer = client.newProducer(JSONSchema.of(User.class))\n       .topic(topic)\n       .create();\nUser user = new User(\"Tom\", 28);\nproducer.send(user);\n\n",[4926,55704,55702],{"__ignoreMap":18},[40,55706,55707],{"id":22230},"Producers",[48,55709,55710,55711,55716],{},"Multiple producers can concurrently connect and send messages to a Kafka topic. These topic access semantics are also provided by Pulsar using the default producer ",[55,55712,55715],{"href":55713,"rel":55714},"https:\u002F\u002Fpulsar.apache.org\u002Fdocs\u002Fnext\u002Fconcepts-messaging\u002F#access-mode",[264],"Access Mode"," of Shared. Therefore we can expect that for most use cases, no effort is required to obtain equivalent behavior. Note that whereas a single Kafka producer instance can write to multiple topics, the Pulsar idiom is to have one Producer instance per topic.",[32,55718,55720],{"id":55719},"partitioning","Partitioning",[48,55722,55723,55724,55729,55730,55732,55733,55738],{},"Partitioning is the process by which Producers distribute messages across the partitions in a topic. In Kafka, we can control the distribution by setting a key on each message and also providing a ",[55,55725,55728],{"href":55726,"rel":55727},"https:\u002F\u002Fkafka.apache.org\u002F24\u002Fjavadoc\u002F?org\u002Fapache\u002Fkafka\u002Fclients\u002Fproducer\u002FPartitioner.html",[264],"Partitioner"," implementation to the producing client. Pulsar adopts a very similar approach, but its terminology is slightly different. In Pulsar, we may also guide partitioning by setting a message key — although this must be of type string whereas Kafka supports a byte",[2628,55731],{}," key. Partitioning is further controlled by providing the Producer client with a ",[55,55734,55737],{"href":55735,"rel":55736},"https:\u002F\u002Fpulsar.apache.org\u002Fapi\u002Fclient\u002Forg\u002Fapache\u002Fpulsar\u002Fclient\u002Fapi\u002FMessageRoutingMode",[264],"MessageRoutingMode",". Pulsar’s default routing mode is RoundRobinPartition, which importantly uses the message key hash to allocate messages to partitions, and round-robin allocates messages with no key. Conveniently, these are the same partitioning semantics as Kafka’s default partitioner, and so we can expect that for most use cases, no effort is required to obtain equivalent behavior. Additionally, for custom behaviors, Pulsar’s routing modes deliver the same partitioner flexibility as Kafka.",[32,55740,55742],{"id":55741},"replicas","Replicas",[48,55744,55745,55746,55751],{},"Like Kafka, Apache Pulsar can make multiple replica copies of the messages it receives and in fact offers some interesting flexibility in this area. Unlike Kafka, the storage of Pulsar messages is disaggregated from the brokers — it is not constrained by storage locality. What this means in practice is that we can independently specify the number of message replicas and the number of storage nodes. Whereas messages for a given Kafka partition replica get written on a specific node, in Pulsar messages are distributed across a set of Bookies (aka an “ensemble”). 
Jack Vanlightly describes this nicely in ",[55,55747,55750],{"href":55748,"rel":55749},"https:\u002F\u002Fjack-vanlightly.com\u002Fblog\u002F2018\u002F10\u002F2\u002Funderstanding-how-apache-pulsar-works",[264],"his blog",": “Kafka topics are like sticks of Toblerone, … Pulsar topics are like a gas expanding to fill the available space”. Consider the following example where we persist messages M1-M3 on 5 node Pulsar and Kafka clusters with a replication count of 3:",[48,55753,55754],{},[384,55755],{"alt":55756,"src":55757},"Pulsar’s Individual ACKs vs Cumulative ACK","\u002Fimgs\u002Fblogs\u002F63b538529b5ddeb5f6a45b49_kp4.png",[48,55759,55760,55761,55766],{},"Fortunately, Pulsar consumer API includes an equivalent operation that ",[55,55762,55765],{"href":55763,"rel":55764},"https:\u002F\u002Fpulsar.incubator.apache.org\u002Fapi\u002Fclient\u002Forg\u002Fapache\u002Fpulsar\u002Fclient\u002Fapi\u002FConsumer.html#acknowledgeCumulative-org.apache.pulsar.client.api.MessageId-",[264],"cumulatively acknowledges"," the referenced message and all messages before it, and you can do this synchronously or asynchronously. Ultimately, the semantics are equivalent to Kafka’s auto-commit as Pulsar’s ACKs are flushed to brokers periodically to be persisted.",[8325,55768,55771],{"className":55769,"code":55770,"language":8330},[8328],"try (PulsarClient client = PulsarClient.builder()\n      .serviceUrl(\"pulsar:\u002F\u002Flocalhost:6650\")\n      .build());\n   Consumer consumer = client.newConsumer(schema)\n           .subscriptionName(\"subscription\")\n           .subscriptionType(SubscriptionType.Failover)\n           .topic(topic)\n           .subscribe()) {\n \n  while(true) {\n      Message message = consumer.receive();\n      \u002F\u002F Process message\n      consumer.acknowledgeCumulative(message);\n  }\n}\n\n",[4926,55772,55770],{"__ignoreMap":18},[40,55774,33082],{"id":33081},[48,55776,55777,55778,55782],{},"Before we wrap up, let’s summarize the mappings between Pulsar and Kafka concepts that we’ve just covered. Notice that each Kafka concept often translates to a particular configuration of a Pulsar concept. This pattern arises because Pulsar offers highly flexible primitives that can be used to create a broad range of messaging architectures beyond the streaming use case provided by Kafka. Importantly, it shows that Pulsar can replicate Kafka’s behaviors if so desired, and this should not be surprising given that a ",[55,55779,55781],{"href":55607,"rel":55780},[264],"Kafka-compatible protocol handler"," has been created for Pulsar.",[48,55784,55785],{},[384,55786],{"alt":55787,"src":55788},"recap table kafka pulsar","\u002Fimgs\u002Fblogs\u002F63b538df3bc84572903a8975_table-recap-kafka-pulsar.webp",[8300,55790,55791],{"id":35721},"Next steps",[48,55793,55794,55795,55799,55800,55804,55805,190],{},"You’ve learned how to apply your knowledge of Kafka concepts using Apache Pulsar. If you want to put this into practice, head over to the ",[55,55796,3550],{"href":55797,"rel":55798},"http:\u002F\u002Fconsole.streamnative.cloud\u002F",[264]," where you can get a free Pulsar cluster up and running in minutes. Remember that we only covered features that exist in Kafka — we didn’t touch on the additional Pulsar capabilities or performance characteristics. 
If you’re interested in going beyond Kafka, then take a look at ",[55,55801,55803],{"href":31912,"rel":55802},[264],"free training at the StreamNative Academy"," or have a read of our ",[55,55806,55808],{"href":55807},"\u002Fdownload\u002Freport-pulsar-vs-kafka-benchmark-2022\u002F","Pulsar vs Kafka benchmark",[48,55810,3931],{},{"title":18,"searchDepth":19,"depth":19,"links":55812},[55813,55814,55815,55816,55817,55818,55822],{"id":55580,"depth":19,"text":55581},{"id":55612,"depth":19,"text":42853},{"id":55630,"depth":19,"text":55631},{"id":55660,"depth":19,"text":55661},{"id":55685,"depth":19,"text":55686},{"id":22230,"depth":19,"text":55707,"children":55819},[55820,55821],{"id":55719,"depth":279,"text":55720},{"id":55741,"depth":279,"text":55742},{"id":33081,"depth":19,"text":33082},"2022-09-15","In this blog we’ll bootstrap your Apache Pulsar knowledge by translating your existing Apache Kafka experience. Discussion of Kafa Topics and Kafka Messages","\u002Fimgs\u002Fblogs\u002F63c7c1eeff0c0c587c486628_63b537a7338d4e7215582ac8_kp-top.png",{},{"title":41449,"description":55824},"blog\u002Funderstanding-pulsar-10-minutes-guide-kafka-users",[799,7347],"VS2v12FPoYczZiI81sFtX8pOCwh0asMMPNACF4tlteo",{"id":55832,"title":55833,"authors":55834,"body":55835,"category":821,"createdAt":290,"date":55996,"description":55997,"extension":8,"featured":294,"image":55998,"isDraft":294,"link":290,"meta":55999,"navigation":7,"order":296,"path":56000,"readingTime":11508,"relatedResources":290,"seo":56001,"stem":56002,"tags":56003,"__hash__":56004},"blogs\u002Fblog\u002Fannouncing-flink-pulsar-sink-connector.md","Announcing the Flink-Pulsar Sink Connector",[54199],{"type":15,"value":55836,"toc":55990},[55837,55840,55844,55853,55857,55860,55880,55888,55892,55895,55901,55904,55910,55924,55933,55935,55938],[48,55838,55839],{},"We are excited to announce that the Flink-Pulsar Sink Connector has been released in Flink 1.15 and is available for download and use. Read this blog to learn about new features that allow for greater flexibility of topic routing, delayed messages, exactly-once delivery, and schema.",[40,55841,55843],{"id":55842},"what-is-the-flink-pulsar-sink-connector","What is the Flink-Pulsar Sink Connector?",[48,55845,55846,55847,55852],{},"The Flink-Pulsar Sink Connector is part of the Flink-Pulsar DataStream Connector. It implements Flink’s new ",[55,55848,55851],{"href":55849,"rel":55850},"https:\u002F\u002Fcwiki.apache.org\u002Fconfluence\u002Fdisplay\u002FFLINK\u002FFLIP-177%3A+Extend+Sink+API",[264],"SinkV2 API"," and allows you to write Flink job results back to Pulsar topics seamlessly. This sink connector, when used with the Flink-Pulsar Source Connector, enables you to define an end-to-end, exactly-once streaming pipeline.",[40,55854,55856],{"id":55855},"new-features-of-the-flink-pulsar-sink-connector","New features of the Flink Pulsar Sink Connector",[48,55858,55859],{},"The Flink-Pulsar Sink Connector provides many useful new features:",[321,55861,55862,55865,55868,55877],{},[324,55863,55864],{},"Flexible topic routing strategies: The Flink-Pulsar Sink Connector is not restricted by Pulsar client routing strategies. It allows you to provide a list of topics, partitions, or topic patterns as the destination. The routing strategy can be message key-hash based or implemented by a custom routing strategy. It also allows you to decide on topics during runtime, such as retrieving the topic from the message body. 
The flexible topic routing strategies can satisfy the most complicated routing use cases.For example, if you want to send processed results such as metric data to multiple downstream topics, without the topic pattern mode in the Sink Connector, you need to write a multiple sink pipeline. With the topic pattern mode, you can specify a regex pattern and all records can be sent to the topics matching the regex pattern. In case there are new target topics, you do not need to stop the pipeline to reconfigure the pipeline; you only need to add the topics and the Sink Connector will discover the dynamically added topics automatically at runtime.In more complicated use cases where individual records should be sent to different topics determined by some field values, you can choose to implement a custom routing strategy.",[324,55866,55867],{},"Delayed message delivery: You can designate a time interval so that downstream connectors will only consume the messages produced by the Flink-Pulsar Sink Connector after this time interval.",[324,55869,55870,55871,55876],{},"Exactly-once delivery guarantee: The Flink-Pulsar Sink Connector implements ",[55,55872,55875],{"href":55873,"rel":55874},"https:\u002F\u002Fpulsar.apache.org\u002Fdocs\u002Ftxn-what\u002F",[264],"Pulsar transaction"," and provides an exactly-once delivery guarantee.",[324,55878,55879],{},"Flexible schema choice: Schema is used to serialize Flink records to bytes and send them to Pulsar. You can choose a Pulsar schema that the underlying Pulsar producer would then use. In this case, the Pulsar client handles the serialization. You can also use a Flink schema that is serialized before going to the Pulsar client and then sent as pure bytes by the Pulsar producer. You can also implement your own serialization logic based on a specific use case.",[48,55881,55882,55883,190],{},"The Flink-Pulsar Sink Connector allows you to customize runtime behavior while also providing out-of-box implementations, such as record serializer, message router, and message delayer. 
For complete documentation, please refer to ",[55,55884,55887],{"href":55885,"rel":55886},"https:\u002F\u002Fnightlies.apache.org\u002Fflink\u002Fflink-docs-master\u002Fdocs\u002Fconnectors\u002Fdatastream\u002Fpulsar\u002F#pulsar-sink",[264],"Flink-Pulsar Sink Connector documentation",[40,55889,55891],{"id":55890},"using-the-pulsar-flink-sink-connector","Using the Pulsar Flink Sink Connector",[48,55893,55894],{},"To use the Flink-Pulsar Sink Connector, you need to add the jar dependency in your Flink application.",[8325,55896,55899],{"className":55897,"code":55898,"language":8330},[8328],"\n    org.apache.flink\n    flink-connector-pulsar\n    1.16-SNAPSHOT\n\n",[4926,55900,55898],{"__ignoreMap":18},[48,55902,55903],{},"The following sample code illustrates how to use a data generator and the Flink-Pulsar Sink Connector to write data into Pulsar topics.",[8325,55905,55908],{"className":55906,"code":55907,"language":8330},[8328],"\npublic class SimpleSink {\n\n    public static void main(String[] args) throws Exception {\n        \u002F\u002F Load application configs.\n        ApplicationConfigs configs = loadConfig();\n\n        \u002F\u002F Create execution environment.\n        StreamExecutionEnvironment env = createEnvironment(configs);\n\n        \u002F\u002F Create a fake source.\n        InfiniteSourceFunction sourceFunction = new InfiniteSourceFunction\u003C>(new FakerGenerator(), 20000);\n        DataStreamSource source = env.addSource(sourceFunction);\n\n        \u002F\u002F Create Pulsar sink.\n        PulsarSink sink = PulsarSink.builder()\n            .setServiceUrl(configs.serviceUrl())\n            .setAdminUrl(configs.adminUrl())\n            .setTopics(\"persistent:\u002F\u002Fsample\u002Fflink\u002Fsimple-string\")\n            .setProducerName(\"flink-sink-%s\")\n            .setSerializationSchema(flinkSchema(new SimpleStringSchema()))\n            .setDeliveryGuarantee(DeliveryGuarantee.EXACTLY_ONCE)\n            .setConfig(fromMap(configs.sinkConfigs()))\n            .build();\n\n        source.sinkTo(sink);\n\n        env.execute(\"Simple Pulsar Sink\");\n    }\n}\n \n",[4926,55909,55907],{"__ignoreMap":18},[48,55911,55912,55913,55918,55919,55923],{},"The documentation is available on the ",[55,55914,55917],{"href":55915,"rel":55916},"https:\u002F\u002Fnightlies.apache.org\u002Fflink\u002Fflink-docs-release-1.15\u002Fdocs\u002Fconnectors\u002Fdatastream\u002Fpulsar\u002F#pulsar-sink",[264],"Flink documentation page",". For complete examples and demo projects, we have created the ",[55,55920,55922],{"href":54307,"rel":55921},[264],"streamnative\u002Fflink-example"," repository that contains detailed demo projects using the Flink-Pulsar Sink Connector. This repository also includes DataStream Source Connector and SQL Connector examples. Follow the instructions on the repository readme to get up and running quickly with the examples.",[48,55925,55926,55927,55932],{},"If you are using the legacy ",[55,55928,55931],{"href":55929,"rel":55930},"https:\u002F\u002Fgithub.com\u002Fstreamnative\u002Fpulsar-flink",[264],"Pulsar-Flink Connector",", be aware that the legacy Pulsar-Flink Connector and the Flink-Pulsar Sink Connector introduced in this blog are two different implementations and the legacy connector will be deprecated soon. 
We recommend you start using the Flink-Pulsar Sink Connector.",[40,55934,39647],{"id":39646},[48,55936,55937],{},"The Flink-Pulsar Sink Connector is being actively maintained and will continue to evolve and provide a better Flink and Pulsar integration experience with the help of the community. To get involved with the Flink-Pulsar Connector, check out the following featured resources:",[321,55939,55940,55947,55953,55959,55972],{},[324,55941,55942,55943],{},"The connector in the official Flink repository : ",[55,55944,55945],{"href":55945,"rel":55946},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fflink\u002Ftree\u002Fmaster\u002Fflink-connectors\u002Fflink-connector-pulsar",[264],[324,55948,55949,55950],{},"The connector in the StreamNative fork of Flink : ",[55,55951,54412],{"href":54412,"rel":55952},[264],[324,55954,55955,55956],{},"Try out the Flink-example: ",[55,55957,54307],{"href":54307,"rel":55958},[264],[324,55960,55961,55962,55967,55968,190],{},"Report a bug: Please submit a ",[55,55963,55966],{"href":55964,"rel":55965},"https:\u002F\u002Fissues.apache.org\u002Fjira\u002Fprojects\u002FFLINK\u002Fissues\u002FFLINK-26721?filter=allopenissues",[264],"Flink JIRA"," or a issue in the ",[55,55969,55971],{"href":54412,"rel":55970},[264],"StreamNative fork",[324,55973,55974,55975,55980,55981,55985,55986,4003,55988,20571],{},"Pick up an issue and submit PR: Please follow the standard ",[55,55976,55979],{"href":55977,"rel":55978},"https:\u002F\u002Fflink.apache.org\u002Fcontributing\u002Fhow-to-contribute.html",[264],"Flink contribution guidelines"," Join the Apache Pulsar community on ",[55,55982,55984],{"href":31692,"rel":55983},[264],"Slack"," or subscribe to the Pulsar mailing list (",[55,55987,55549],{"href":48254},[55,55989,55552],{"href":48257},{"title":18,"searchDepth":19,"depth":19,"links":55991},[55992,55993,55994,55995],{"id":55842,"depth":19,"text":55843},{"id":55855,"depth":19,"text":55856},{"id":55890,"depth":19,"text":55891},{"id":39646,"depth":19,"text":39647},"2022-08-30","Read about the new features of the Flink-Pulsar Sink Connector and how to get started with it.","\u002Fimgs\u002Fblogs\u002F63c7c1c4f86b5b17893725fc_63b536a9e55a72520e695a46_fsc-top.png",{},"\u002Fblog\u002Fannouncing-flink-pulsar-sink-connector",{"title":55833,"description":55997},"blog\u002Fannouncing-flink-pulsar-sink-connector",[28572,302],"TIeLDFA8yRQmIy0d8_n2uVHTjXJEjBHG53L2gL0_GTY",{"id":56006,"title":56007,"authors":56008,"body":56010,"category":7338,"createdAt":290,"date":56287,"description":56288,"extension":8,"featured":294,"image":56289,"isDraft":294,"link":290,"meta":56290,"navigation":7,"order":296,"path":56291,"readingTime":4475,"relatedResources":290,"seo":56292,"stem":56293,"tags":56294,"__hash__":56295},"blogs\u002Fblog\u002Fpulsar-summit-sf-2022-community-event-recap.md","Pulsar Summit SF 2022: Community Event Recap",[56009],"Kristin Crosier",{"type":15,"value":56011,"toc":56278},[56012,56015,56018,56021,56035,56038,56040,56047,56049,56066,56072,56076,56096,56100,56120,56124,56127,56130,56133,56179,56209,56212,56218,56224,56227,56241,56245,56252,56255],[48,56013,56014],{},"We’re sad that Pulsar Summit San Francisco 2022 is over, but excited to share the event was a big hit! Let’s just say the venue was pulsing with energy. (See what we did there? 👀)",[48,56016,56017],{},"Pulsar Summit SF marked the first-ever, in-person event in North America, and it was incredible to see members of the data streaming community convene for great conversations and knowledge sharing. 
The turnout was impressive — the event sold out, and the team had to add tables to accommodate more people.",[48,56019,56020],{},"We want to thank the entire Apache Pulsar community and friends from other open-source communities for making the event a success. In addition to the Pulsar community, the event was well-attended by open-source partners and friends including Apache Pinot, Apache Flink, Delta Lake, Apache Beam, Apache Hudi, and ScyllaDB. Now for the highlights:",[321,56022,56023,56026,56029,56032],{},[324,56024,56025],{},"200+ attendees from Apple, Blizzard, IBM, Optum, Iterable, Twitter, Uber, and more!",[324,56027,56028],{},"20 speakers from companies like Google, AWS, and Yahoo",[324,56030,56031],{},"5 keynotes on Apache Pulsar and other streaming technologies like Apache Beam and Apache Pinot",[324,56033,56034],{},"12 breakout sessions on tech deep dives, use cases, and ecosystem talks",[48,56036,56037],{},"The talks highlighted how organizations rely on not just a single technology, but rather on an ecosystem of technologies to build out streaming use cases. The event felt truly unique because of the focus on community and representation of the broader streaming ecosystem.",[40,56039,51360],{"id":51359},[48,56041,56042,56043,190],{},"Summit speakers covered topics on building event-driven architectures, streaming at scale, and leveraging multiple streaming technologies for a unified data stack. Here’s a quick recap of the keynotes and breakout sessions, which will be available on-demand after editing is complete. Sign up for access to the recorded content ",[55,56044,267],{"href":56045,"rel":56046},"https:\u002F\u002Fshare.hsforms.com\u002F1B7c1mJy6RDaon09nNX0wdg3x5r4",[264],[32,56048,40525],{"id":40524},[321,56050,56051,56054,56057,56060,56063],{},[324,56052,56053],{},"Microservices and event-driven architectures: Sijie Guo and Matteo Merli (StreamNative) discussed the evolution of microservices and the fundamentals of modern event-driven applications.",[324,56055,56056],{},"Stream processing with Pulsar and Beam: Byron Ellis (Google) shared how to do stream processing at scale using Pulsar and Beam.",[324,56058,56059],{},"User-facing analytics with Pinot: Xiang Fu (StarTree) talked about democratizing data decision-making and using Pinot for user-facing analytical applications.",[324,56061,56062],{},"Pulsar at scale: Ignacio Alvarez (Mercado Libre) shared his company’s success with Pulsar at scale across 200M RPM and thousands of instances.",[324,56064,56065],{},"Asynchronous messaging with Python and Pulsar: Zac Bentley (Klaviyo) discussed how the company dealt with messaging scalability challenges and built an asynchronous application framework with Python and Pulsar.",[48,56067,56068],{},[384,56069],{"alt":56070,"src":56071},"image of a room with people at the pulsar summit 2022","\u002Fimgs\u002Fblogs\u002F63b535affd34ba0d0569d271_sijie-keynote.jpeg",[32,56073,56075],{"id":56074},"tech-deep-diveuse-case-breakout-sessions","Tech Deep Dive\u002FUse Case Breakout Sessions",[321,56077,56078,56081,56084,56087,56090,56093],{},[324,56079,56080],{},"How Yahoo uses Pulsar: Ludwig Pummer and Rajan Dhabalia (Yahoo) offered insights on the challenges of cloud messaging systems and selecting Pulsar for its availability, performance, and cost.",[324,56082,56083],{},"Broker rebalancing in Pulsar: Heesung Sohn (StreamNative) led a deep dive on broker rebalancing in Pulsar.",[324,56085,56086],{},"Messaging redelivery in Pulsar: David Kjerrumgaard (StreamNative) walked the audience 
through what to do in publication redelivery scenarios in Pulsar.",[324,56088,56089],{},"Pulsar failure scenarios: Lari Hotari (DataStax) discussed validating Pulsar’s behavior under failure conditions.",[324,56091,56092],{},"Running Pulsar without ZooKeeper: Matteo Merli (StreamNative) shared how to leverage alternative metadata and coordination systems to move toward a ZooKeeper-less Pulsar.",[324,56094,56095],{},"How Toast runs blue-green deploys with Pulsar and Envoy: Kai Levy and Zac Walsh (Toast) discussed Toast’s cloud-based microservice ecosystem, their journey from RabbitMQ to Pulsar, and how they use Envoy for blue-green deploys.",[32,56097,56099],{"id":56098},"ecosystem-breakout-sessions","Ecosystem Breakout Sessions",[321,56101,56102,56105,56108,56111,56114,56117],{},[324,56103,56104],{},"Building lakehouses with Delta Lake and Pulsar: Nick Karpov (Databricks) discussed the key features of Delta Lake and its expanding ecosystem.",[324,56106,56107],{},"Lakehouses with Hudi and Pulsar: Alexey Kudinkin (Onehouse) and Addison Higham (StreamNative) gave an overview of why Apache Hudi and Pulsar work well for lakehouse architectures.",[324,56109,56110],{},"Streaming pipelines with Flink and Pulsar: Caito Scherr (Ververica) talked about using Pulsar and Flink for a unified data stack and gave a live demo.",[324,56112,56113],{},"Apache Kafka on Pulsar: Ricardo Ferreira (AWS) demoed Apache Kafka on Pulsar in three scenarios: microservices built for Kafka, CDC using Debezium for MySQL, and and stream processing using ksqlDB.",[324,56115,56116],{},"Pulsar Functions with SQL: Neng Lu and Rui Fu (StreamNative) delivered a session on how SQL, Pulsar Functions, and Function Mesh work together to enable easy access to real-time data in the cloud.",[324,56118,56119],{},"Event streaming with ScyllaDB: Peter Corless (ScyllaDB) shared his insights on distributed database design for high-performance event streaming.",[40,56121,56123],{"id":56122},"more-highlights-in-person-networking-happy-hour-and-a-roaming-photo-booth","More highlights: In-person networking, happy hour, and a roaming photo booth",[48,56125,56126],{},"We love the versatility of virtual events and bringing people in different locations together. But there’s nothing quite like finally meeting your new Twitter pal in person or learning from other data streaming thought leaders. And Pulsar Summit SF had plenty of opportunities to meet and learn!",[48,56128,56129],{},"According to Tim Spann, Developer Advocate at StreamNative, “The Ecosystem breakout sessions were impressive and showed the strength of Apache Pulsar when combined with lakehouses, streaming pipelines, and fast databases. Those sessions also offered validation that Pulsar belongs squarely in the middle of all modern data architectures and excels at high-performance event streaming.”",[48,56131,56132],{},"The community was very open and welcoming, and attendees happily swapped best practices and use cases. Members of the broader ecosystem were even live-tweeting and sharing their experiences throughout the day.",[916,56134,56135],{},[48,56136,190,56137,56142,56143,758,56148,56153,56154,56157,56158,56162,56163,758,56168,56173,56174],{},[55,56138,56141],{"href":56139,"rel":56140},"https:\u002F\u002Ftwitter.com\u002FPulsarSummit?ref_src=twsrc%5Etfw",[264],"@PulsarSummit"," was an awesome event. Amazing technology. Industry leading use cases. The collective brainpower in the room was enough to light up San Francisco. 
Shoutouts to ",[55,56144,56147],{"href":56145,"rel":56146},"https:\u002F\u002Ftwitter.com\u002FPaaSDev?ref_src=twsrc%5Etfw",[264],"@PaaSDev",[55,56149,56152],{"href":56150,"rel":56151},"https:\u002F\u002Ftwitter.com\u002Friferrei?ref_src=twsrc%5Etfw",[264],"@riferrei"," \\",[2628,56155,56156],{},"pictured",", and the whole cast at ",[55,56159,36254],{"href":56160,"rel":56161},"https:\u002F\u002Ftwitter.com\u002Fstreamnativeio?ref_src=twsrc%5Etfw",[264]," who made it possible. ",[55,56164,56167],{"href":56165,"rel":56166},"https:\u002F\u002Ftwitter.com\u002Fhashtag\u002FPulsarSummitSF?src=hash&ref_src=twsrc%5Etfw",[264],"#PulsarSummitSF",[55,56169,56172],{"href":56170,"rel":56171},"https:\u002F\u002Ft.co\u002Frlj7kCAOzX",[264],"pic.twitter.com\u002Frlj7kCAOzX","— Peter Corless 🌎☮ 💛🇺🇸🇮🇪🇺🇦🌻💉 (@PeterCorless) ",[55,56175,56178],{"href":56176,"rel":56177},"https:\u002F\u002Ftwitter.com\u002FPeterCorless\u002Fstatus\u002F1561026414681305089?ref_src=twsrc%5Etfw",[264],"August 20, 2022",[916,56180,56181],{},[48,56182,56183,56184,56189,56190,56194,56195,758,56198,56203,56204],{},"Shoutout to technology specific conferences that really highlight platform partnerships- it’s been really cool getting some deep dives into software like ",[55,56185,56188],{"href":56186,"rel":56187},"https:\u002F\u002Ftwitter.com\u002Fapachehudi?ref_src=twsrc%5Etfw",[264],"@apachehudi"," in the context of other streaming tech (like ",[55,56191,36238],{"href":56192,"rel":56193},"https:\u002F\u002Ftwitter.com\u002Fapache_pulsar?ref_src=twsrc%5Etfw",[264]," ) ",[55,56196,56167],{"href":56165,"rel":56197},[264],[55,56199,56202],{"href":56200,"rel":56201},"https:\u002F\u002Ft.co\u002FxQaeoAQMwe",[264],"pic.twitter.com\u002FxQaeoAQMwe","— CAITOs (@caito_200_OK) ",[55,56205,56208],{"href":56206,"rel":56207},"https:\u002F\u002Ftwitter.com\u002Fcaito_200_OK\u002Fstatus\u002F1560366358264631296?ref_src=twsrc%5Etfw",[264],"August 18, 2022",[48,56210,56211],{},"The event wrapped up with a friendly happy hour and a fun way to capture live photos — a roaming photo booth! The photo booth captured lots of photos and GIFs of people socializing and meeting fellow attendees.",[48,56213,56214],{},[384,56215],{"alt":56216,"src":56217},"people taking a picture at the pulsar summit 2022","\u002Fimgs\u002Fblogs\u002F63b535af2834b476d45c854b_img_6444.jpeg",[48,56219,56220],{},[384,56221],{"alt":56222,"src":56223},"picture of the StreamNative team at the Pulsar Summit 2022","\u002Fimgs\u002Fblogs\u002F63b535af038b2570a39681dc_img_c4aa43ecc0c2-1.jpeg",[48,56225,56226],{},"During Happy Hour, we surveyed a few people and asked what three words they would use to describe the Summit. We were pleased to hear descriptors like “community” and “educational” come up repeatedly:",[321,56228,56229,56232,56235,56238],{},[324,56230,56231],{},"Ricardo Ferreira, Developer Advocate at AWS: “I love it!”",[324,56233,56234],{},"Jim Zucker, Solution Architect, Financial Services at Ness: “Informative, educational, and community”",[324,56236,56237],{},"Zac Walsh, Senior Software Engineer at Toast: “Educational, community, and fun”",[324,56239,56240],{},"Caito Scherr, Developer Advocate at Ververica: “Engaging, mellow, and organized”",[40,56242,56244],{"id":56243},"more-on-pulsar-summit-sf","More on Pulsar Summit SF",[48,56246,56247,56248,56251],{},"The event team recorded all keynotes and breakout sessions, and the on-demand videos will be available soon. 
",[55,56249,39858],{"href":56045,"rel":56250},[264]," to get access once the videos go live.",[48,56253,56254],{},"If you’d like to hear more about Pulsar Summit San Francisco from people who attended, check out these recap resources:",[321,56256,56257,56264,56271],{},[324,56258,56259],{},[55,56260,56263],{"href":56261,"rel":56262},"https:\u002F\u002Fwww.linkedin.com\u002Fpulse\u002Fpulsar-summit-2022-report-aug-19-tim-spann-\u002F",[264],"Tim Spann, Developer Advocate @ StreamNative, recapped the event in his LinkedIn newsletter",[324,56265,56266],{},[55,56267,56270],{"href":56268,"rel":56269},"https:\u002F\u002Ftwitter.com\u002Fi\u002Fspaces\u002F1dRKZlaBvwbJB?s=20",[264],"ScyllaDB hosted a Twitter Live recap featuring Peter Corless and Raouf Chebri",[324,56272,56273],{},[55,56274,56277],{"href":56275,"rel":56276},"https:\u002F\u002Fwww.scylladb.com\u002F2022\u002F08\u002F24\u002Foverheard-at-pulsar-summit-2022\u002F",[264],"ScyllaDB: Overheard at Pulsar Summit 2022",{"title":18,"searchDepth":19,"depth":19,"links":56279},[56280,56285,56286],{"id":51359,"depth":19,"text":51360,"children":56281},[56282,56283,56284],{"id":40524,"depth":279,"text":40525},{"id":56074,"depth":279,"text":56075},{"id":56098,"depth":279,"text":56099},{"id":56122,"depth":19,"text":56123},{"id":56243,"depth":19,"text":56244},"2022-08-24","Pulsar Summit San Francisco was a big hit! Read about the highlights of this community event.","\u002Fimgs\u002Fblogs\u002F63c7c1da5235812799594fc9_63b535af3bc8453d4137fb8f_banner-4.png",{},"\u002Fblog\u002Fpulsar-summit-sf-2022-community-event-recap",{"title":56007,"description":56288},"blog\u002Fpulsar-summit-sf-2022-community-event-recap",[5376,821],"YfBn47J3Uxae5edgn4YUp_9uwDxxXD_4N9Jskhb1wrk",{"id":56297,"title":56298,"authors":56299,"body":56300,"category":7338,"createdAt":290,"date":56409,"description":56410,"extension":8,"featured":294,"image":56411,"isDraft":294,"link":290,"meta":56412,"navigation":7,"order":296,"path":56413,"readingTime":11180,"relatedResources":290,"seo":56414,"stem":56415,"tags":56416,"__hash__":56417},"blogs\u002Fblog\u002Fpulsar-summit-asia-2022-cfp-open.md","Pulsar Summit Asia 2022: CFP Is Open Now!",[41185],{"type":15,"value":56301,"toc":56402},[56302,56305,56308,56311,56315,56318,56330,56338,56352,56355,56359,56376,56380,56385,56392,56394,56396,56398,56400],[48,56303,56304],{},"Pulsar Summit is the conference dedicated to Apache Pulsar, and the messaging and event streaming community. The conference gathers an international audience of developers, data architects, data scientists, Apache Pulsar committers and contributors, as well as the messaging and streaming community. Together, they share experiences, exchange ideas and knowledge, and receive hands-on training sessions led by Pulsar experts.",[48,56306,56307],{},"In January this year, Pulsar Summit Asia 2021 (delayed due to the COVID-19 pandemic) featured more than 25 interactive sessions by technologists, developers, software engineers, and software architects from StreamNative, BIGO, China Mobile, Nutanix, Tencent, and more. The conference drew over 1000 attendees around the world, including attendees from top technology, internet, and media companies, such as Tencent, TikTok, Alibaba, and Microsoft.",[48,56309,56310],{},"Pulsar Summit Asia 2022 will be hosted virtually on November 19th and 20th, 2022. 
It is expected to cover the pivotal topics and technologies at the core of Apache Pulsar.",[40,56312,56314],{"id":56313},"join-us-and-speak-at-pulsar-summit-asia-2022","Join us and speak at Pulsar Summit Asia 2022",[48,56316,56317],{},"Share your Pulsar story and speak at the summit! It is a great opportunity to participate and raise your profile in the rapidly growing Apache Pulsar community. We are looking for Pulsar stories that are innovative, informative, or thought-provoking. Here are some suggestions:",[321,56319,56320,56322,56324,56326,56328],{},[324,56321,48333],{},[324,56323,48336],{},[324,56325,48339],{},[324,56327,48330],{},[324,56329,48342],{},[48,56331,56332,56333,56337],{},"To speak at the summit, ",[55,56334,56336],{"href":55040,"rel":56335},[264],"submit an abstract"," about your session. All levels of talks (beginner, intermediate, and advanced) are welcome. Remember to keep your proposal short, relevant, and engaging. The following session formats are acceptable:",[321,56339,56340,56343,56346,56349],{},[324,56341,56342],{},"Presentation: 40-minute presentation, maximum of 2 speakers",[324,56344,56345],{},"Panel: 40-minute of discussion amongst 3 to 5 speakers",[324,56347,56348],{},"Workshop: 60-90 minutes, in-depth, hands-on tutorials, maximum of 2 speakers",[324,56350,56351],{},"Lightning talk: 10-minute presentation, maximum of 2 speakers",[48,56353,56354],{},"You need to pre-record your session after the proposal is approved. For time zone and network reasons, we do not recommend speakers present their talk live.",[40,56356,56358],{"id":56357},"dates-to-remember","Dates to remember",[321,56360,56361,56364,56367,56370,56373],{},[324,56362,56363],{},"CFP opens: August 22nd, 2022",[324,56365,56366],{},"CFP closes: October 9th, 2022",[324,56368,56369],{},"CFP notifications: October 19th, 2022",[324,56371,56372],{},"Schedule announcement: November 4th, 2022",[324,56374,56375],{},"Event dates: November 19th, 2022 - November 20th, 2022",[40,56377,56379],{"id":56378},"sponsor-pulsar-summit","Sponsor Pulsar Summit",[48,56381,56382,56383,190],{},"Pulsar Summit is a conference for the community and your support is needed. Sponsoring this event provides a great opportunity for your organization to further engage with the Apache Pulsar community. For more information on becoming a sponsor, contact us at ",[55,56384,39814],{"href":39813},[48,56386,56387,56388,56391],{},"Help us make #PulsarSummit Asia 2022 a big success by spreading the word and submitting your proposal! Follow us on Twitter (",[55,56389,39823],{"href":39821,"rel":56390},[264],") to receive the latest updates on the summit!",[40,56393,39828],{"id":39827},[48,56395,39831],{},[40,56397,52680],{"id":39834},[48,56399,52683],{},[48,56401,3931],{},{"title":18,"searchDepth":19,"depth":19,"links":56403},[56404,56405,56406,56407,56408],{"id":56313,"depth":19,"text":56314},{"id":56357,"depth":19,"text":56358},{"id":56378,"depth":19,"text":56379},{"id":39827,"depth":19,"text":39828},{"id":39834,"depth":19,"text":52680},"2022-08-22","Pulsar Summit Asia 2022 will be hosted virtually on November 19th and 20th, 2022. Share your Pulsar story and speak at the summit! 
It is a great opportunity to participate and raise your profile in the rapidly growing Apache Pulsar community.","\u002Fimgs\u002Fblogs\u002F63c7c222f9561b5d702bb1c4_63b535557f6d1a57f227b2f4_pulsar-summit.jpeg",{},"\u002Fblog\u002Fpulsar-summit-asia-2022-cfp-open",{"title":56298,"description":56410},"blog\u002Fpulsar-summit-asia-2022-cfp-open",[5376,821],"xaLooON-8sohSxViMPZpzp7SdECUQgeoQ2dGzlKLrZc",{"id":56419,"title":43584,"authors":56420,"body":56421,"category":821,"createdAt":290,"date":56862,"description":56863,"extension":8,"featured":294,"image":56864,"isDraft":294,"link":290,"meta":56865,"navigation":7,"order":296,"path":43583,"readingTime":3556,"relatedResources":290,"seo":56866,"stem":56867,"tags":56868,"__hash__":56869},"blogs\u002Fblog\u002Fclient-optimization-how-tencent-maintains-apache-pulsar-clusters-100-billion-messages-daily.md",[41185],{"type":15,"value":56422,"toc":56842},[56423,56427,56430,56433,56435,56438,56441,56444,56453,56457,56460,56468,56471,56474,56478,56481,56484,56488,56494,56498,56501,56507,56510,56516,56519,56522,56525,56529,56532,56538,56541,56544,56548,56551,56554,56560,56563,56566,56570,56573,56576,56580,56583,56586,56592,56595,56599,56602,56608,56611,56628,56631,56634,56638,56641,56655,56658,56662,56665,56668,56672,56675,56678,56682,56685,56688,56692,56695,56698,56702,56714,56718,56721,56738,56741,56744,56747,56750,56753,56762,56776,56779,56782,56785,56788,56791,56794,56797,56800,56803,56806,56809,56812,56815,56817,56840],[40,56424,56426],{"id":56425},"authors","Authors",[48,56428,56429],{},"Mingyu Bao, Senior Software Engineer at the Tencent TEG Data Platform Department. He is responsible for the development of projects like Apache Pulsar, Apache Inlong, and DB data collection. He is focused on big data and message middleware, with over 10 years of experience in Java development.",[48,56431,56432],{},"Dawei Zhang, Apache Pulsar Committer, Senior Software Engineer at the Tencent TEG Data Platform Department. He is responsible for the development of the Apache Pulsar project. He is focused on MQ and real-time data processing, with over 6 years of experience in big data platform development.",[40,56434,19156],{"id":19155},[48,56436,56437],{},"Tencent is a world-leading internet and technology company that develops innovative products and services for users across the globe. It provides communication and social services that connect more than one billion people around the world, helping them to keep in touch with friends and family members, pay for daily necessities, and even be entertained.",[48,56439,56440],{},"To offer a diverse product portfolio, Tencent needs to stay at the forefront of technological innovations. At Tencent, the Technology Engineering Group (TEG) is responsible for supporting the company and its business groups in technology and operational platforms, as well as the construction and operation of R&D management and data centers.",[48,56442,56443],{},"Recently, an internal team working on messaging queuing (MQ) solutions at TEG developed a performance analysis system (referred to as the “Data Project” in this blog) for maintenance metrics. This system provides general infrastructure services to the entire Tencent group. The Data Project collects performance metrics and reports data for business operations and monitoring. 
It may also be adopted for real-time analysis at both the front end and back end in the future.",[48,56445,56446,56447,56452],{},"The Data Project uses Apache Pulsar as its message system with servers deployed on ",[55,56448,56451],{"href":56449,"rel":56450},"https:\u002F\u002Fintl.cloud.tencent.com\u002Fproducts\u002Fcvm",[264],"Cloud Virtual Machines"," (CVM), and producers and consumers deployed on Kubernetes. The Pulsar clusters managed by the Data Project produce a greater number of messages than any other cluster managed by the MQ team. Due to the large cluster size, we have faced specific challenges and hope to share our learnings and best practices in this post.",[40,56454,56456],{"id":56455},"high-system-reliability-and-low-latency-why-apache-pulsar","High system reliability and low latency: Why Apache Pulsar",[48,56458,56459],{},"Our use case has the following two important characteristics:",[321,56461,56462,56465],{},[324,56463,56464],{},"Large message volumes. The Data Project is running with a large number of nodes to handle numerous messages every day with up to thousands of consumers bound to a single subscription. While we don’t have too many topics in total, each topic is connected to multiple clients. Each partition is attached to over 150 producers and more than 8000 consumers.",[324,56466,56467],{},"Strict requirements for high system reliability and low latency. With such large clusters to maintain, we must stick to higher standards of deployment, operations, and stability.",[48,56469,56470],{},"Therefore, when selecting potential message solutions, we put low latency and high throughput as key metrics for analysis. Compared with some of the common message systems in the market, ultimately Apache Pulsar stood out with its outstanding performance and capabilities.",[48,56472,56473],{},"Pulsar provides different subscription types, namely exclusive, failover, shared, and key_shared. Shared and key_shared subscriptions are able to support use cases where a large number of consumers are working at the same time. This is where other message systems like Kafka fall short. They function with rather low performance in a multi-partition scenario as consumers are restricted by the number of partitions.",[40,56475,56477],{"id":56476},"configurations-at-a-glance-large-clusters-with-over-100-billion-messages-per-day","Configurations at a glance: Large clusters with over 100 billion messages per day",[48,56479,56480],{},"Now that we know the reason behind our preference for Pulsar, let’s look at how we configured our cluster to take full advantage of it.",[48,56482,56483],{},"The business data in the Data Project are handled by two Pulsar clusters, which we will refer to as T-1 and T-2 in this blog. The client Pods (producers and consumers are deployed in different Kubernetes Pods) that connect to cluster T-1 are placed in the same server room as the Pulsar cluster. The client Pods that interact with cluster T-2 are deployed in different server rooms from the Pulsar cluster. 
Note that the latency of data transmission across different server rooms is a little bit higher than that in the same server room.",[32,56485,56487],{"id":56486},"server-side-configurations","Server-side configurations",[48,56489,56490],{},[384,56491],{"alt":56492,"src":56493},"table Server-side configurations","\u002Fimgs\u002Fblogs\u002F63b530703450ab1cb4248103_server-side-config.png",[32,56495,56497],{"id":56496},"client-side-configurations","Client-side configurations",[48,56499,56500],{},"Developed in Go, the business system of the Data Project is built on the master branch (latest version) of the Pulsar Go Client, deployed on STKE (Tencent’s internal container platform).",[48,56502,56503],{},[384,56504],{"alt":56505,"src":56506},"illustration of  Massive amounts of acknowledgment information ","\u002Fimgs\u002Fblogs\u002F63b53070789b6a3150344a30_message-acks.png",[48,56508,56509],{},"When pushing messages to the client with a shared type subscription, brokers only send a subset of the messages to each consumer in a round-robin distribution. After each consumer acknowledges their messages, you can see that there is plenty of range information as shown below stored on the broker.",[48,56511,56512],{},[384,56513],{"alt":56514,"src":56515},"figure 2 with some numbers","\u002Fimgs\u002Fblogs\u002F63b5307047daaa62c0729147_individualldeletedmessages.png",[48,56517,56518],{},"An acknowledgment hole refers to the gap between two consecutive ranges. Its information is stored through the individualDeletedMessages attribute. The number of consumers attached to the same subscription and their consumption speed can all impact acknowledgment holes. A larger number of acknowledgment holes could mean that you have plenty of acknowledgment information.",[48,56520,56521],{},"Pulsar periodically aggregates all acknowledgment information of all consumers associated with the same subscription as an entry and writes it to bookies. The process is the same as writing ordinary messages. Therefore, when you have too many acknowledgment holes and large amounts of acknowledgment information, your system might be overwhelmed. For example, you might notice a longer production time, latency spikes, or even timeouts on the client side.",[48,56523,56524],{},"In these cases, you can reduce the number of consumers, increase the consumption speed, adjust the frequency of storing acknowledgment information, and change the number of saved ranges.",[3933,56526,56528],{"id":56527},"analysis-2-the-pulsar-io-thread-gets-stuck","Analysis 2: The pulsar-io thread gets stuck",[48,56530,56531],{},"The pulsar-io thread pool is used to process client requests on Pulsar brokers. When threads are too slow or get stuck, there might be timeouts or disconnections on the client side. We can identify and analyze these problems through jstack information, which displays that there can be many connections in the CLOSE_WAIT state on brokers as shown below:",[8325,56533,56536],{"className":56534,"code":56535,"language":8330},[8328],"\n36          0 :6650           :57180         CLOSE_WAIT    20714\u002Fjava\n36          0 :6650           :48858         CLOSE_WAIT    20714\u002Fjava\n36          0 :6650           :49846         CLOSE_WAIT    20714\u002Fjava\n36          0 :6650           :55342         CLOSE_WAIT    20714\u002Fjava\n \n",[4926,56537,56535],{"__ignoreMap":18},[48,56539,56540],{},"Usually, these can be caused by server code bugs (such as deadlocks in some concurrent scenarios), while configurations could also be a reason. 
If the pulsar-io thread pool has been running for a long time, you can modify the numioThreads parameter in broker.conf to change the number of working threads in the pool on the premise that you have sufficient CPU resources. This can help improve performance in concurrent tasks.",[48,56542,56543],{},"A busy pulsar-io thread pool is essentially not going to cause problems. Nevertheless, on the broker side, there is a background thread periodically checking if each channel is receiving requests from the client within the expected threshold. If not, the broker will close the channel (similar logic also exists in the client SDK). This is why a client is disconnected when the pulsar-io thread pool gets stuck or slows down.",[3933,56545,56547],{"id":56546},"analysis-3-excessive-time-consumption-in-ledger-switching","Analysis 3: Excessive time consumption in ledger switching",[48,56549,56550],{},"As a logic storage unit in Apache BookKeeper, each ledger stores a certain number of log entries, and each entry contains one or multiple messages (if message batching is enabled). When certain conditions are met (for example, the entry number, total message size, or lifespan reaches the preset threshold), a ledger is switched (ledger rollover).",[48,56552,56553],{},"Here’s what it looks like when ledger switching takes too much time:",[8325,56555,56558],{"className":56556,"code":56557,"language":8330},[8328],"\n14:40:44.528 [bookkeeper-ml-workers-OrderedExecutor-16-0] INFO org.apache.pulsar.broker.service.Producer - Disconnecting producer: Producer{topic=PersistentTopic{topic=persistent:\u002F\u002F}, client=\u002F:51550, producerName=}\n14:59:00.398 [bookkeeper-ml-workers-OrderedExecutor-16-0] INFO org.apache.bookkeeper.mledger.impl.OpAddEntry - Closing ledger 7383265 for being full\n$ cat pulsar-broker-11-135-219-214.log-11-05-2021-2.log | grep ‘bookkeeper-ml-workers-OrderedExecutor-16-0’ | grep ‘15:0’ | head -n 10\n15:01:01:256 [bookkeeper-ml-workers-OrderedExecutor-16-0] INFO org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl - [\n] Ledger creation was initiated 120005 ms ago but it never completed and creation timeout task didn’t kick in as well. Force to fail the create ledger operation.\n15:01:01:256 [bookkeeper-ml-workers-OrderedExecutor-16-0] ERROR org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl - [] Error creating ledger rc=-23 Bookie operation timeout\n \n",[4926,56559,56557],{"__ignoreMap":18},[48,56561,56562],{},"When ledger switching happens, new messages or existing ones that are yet to be processed will go to appendingQueue. After the new ledger is created, the system can continue to process data in the queue, thus making sure no messages are lost.",[48,56564,56565],{},"When ledger switching takes a longer time, it means that messages are produced slowly or that there could be timeouts. In this case, you need to check whether this is a ZooKeeper issue (pay more attention to the performance of machines running ZooKeeper and garbage collection).",[3933,56567,56569],{"id":56568},"analysis-4-busy-bookkeeper-io-thread","Analysis 4: Busy bookkeeper-io thread",[48,56571,56572],{},"In our current Pulsar clusters, we are using a relatively stable version of BookKeeper. 
To optimize performance, you can modify the number of client threads and key configurations (for example, ensemble size, write quorum, and ack quorum).",[48,56574,56575],{},"If you notice that the bookkeeper-io thread pool of the BookKeeper client is busy or a single thread in the pool is busy, you need to first check ZooKeeper, bookie processes, and Full GC. If there is no problem, it may be a good idea to change the number of threads in the bookkeeper-io thread pool and the number of partitions.",[3933,56577,56579],{"id":56578},"analysis-5-debug-logging-impact","Analysis 5: DEBUG logging impact",[48,56581,56582],{},"If producers spend too much time producing messages, the Java client is usually where the issue exists in terms of logging levels. If Log4j is adopted in your business system with debug logging enabled, the performance of Pulsar Client SDK might be impacted. Therefore, we suggest that you use the Pulsar Java app coupled with Log4j or Log4j plus SLF4J for logging. At the same time, change your logging level to at least INFO or ERROR for your Pulsar package.",[48,56584,56585],{},"In extreme cases, DEBUG logs may affect thread performance, which could mean a longer time (second-level) in producing messages. After optimization, it can fall back to the normal level (millisecond-level). See Figure 3 below for details:",[48,56587,56588],{},[384,56589],{"alt":56590,"src":56591},"example of DEBUG logging impact","\u002Fimgs\u002Fblogs\u002F63b530e1fdbbb30d59503cb4_debug-logs.png",[48,56593,56594],{},"When you have a large number of messages, you may want to disable DEBUG level log printing on brokers and bookies for your Pulsar cluster. It is also recommended to change the logging level to INFO or ERROR.",[3933,56596,56598],{"id":56597},"analysis-6-uneven-partition-distribution","Analysis 6: Uneven partition distribution",[48,56600,56601],{},"In Pulsar, we can configure bundles in each namespace (4 bundles by default) as shown below. Topics are assigned to a particular bundle by taking the hash of the topic name and checking which bundle the hash falls into. Each bundle is independent of the others and thus is independently assigned to different brokers. When too many partitions fall into the same broker, it becomes overloaded, which impairs the efficiency of producing and consuming messages.",[48,56603,56604],{},[384,56605],{"alt":56606,"src":56607},"illustration of Uneven partition distribution","\u002Fimgs\u002Fblogs\u002F63b530e2dc98716d76fee334_bundles.png",[48,56609,56610],{},"Note that:",[321,56612,56613,56616,56619,56622,56625],{},[324,56614,56615],{},"Each namespace has a bundle list.",[324,56617,56618],{},"Partitions fall into different bundles based on the hash value.",[324,56620,56621],{},"Each bundle is bound to a single broker.",[324,56623,56624],{},"Bundles can be dynamically split, which is configurable.",[324,56626,56627],{},"Bundles and brokers are bound based on the brokers' load.",[48,56629,56630],{},"When the Data Project started, there were only a few partitions in each topic and a few bundles in each namespace. As we modified the number of partitions and bundles, we gradually achieved load balancing across brokers.",[48,56632,56633],{},"Pulsar still has room for improvement in dynamic bundle splitting and partition distribution, especially the splitting algorithm. It currently supports range_equally_divide (default) and topic_count_equally_divide (We suggest using the latter). 
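For reference, here is a minimal sketch of how the split algorithm and the initial bundle count can be tuned in broker.conf; the parameter names below reflect recent Apache Pulsar releases and the values are purely illustrative, so verify them against your own version before applying:

```
# broker.conf (illustrative values - verify against your Pulsar version)
# Prefer topic-count-based splitting over the default range_equally_divide
defaultNamespaceBundleSplitAlgorithm=topic_count_equally_divide
# Start new namespaces with more bundles so topics spread across brokers sooner
defaultNumberOfNamespaceBundles=16

# A specific bundle can also be split on demand (namespace and range are placeholders):
# bin/pulsar-admin namespaces split-bundle public/default --bundle 0x00000000_0xffffffff
```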
That said, the improvement needs to be carried out without compromising system stability and load balancing.",[32,56635,56637],{"id":56636},"optimization-2-frequent-client-disconnections-and-reconnections","Optimization 2: Frequent client disconnections and reconnections",[48,56639,56640],{},"There are various reasons for disconnections and reconnections. Based on our own use case, we have summarized the following major causes.",[321,56642,56643,56646,56649,56652],{},[324,56644,56645],{},"Client disconnection and reconnection mechanism",[324,56647,56648],{},"Go SDK exception handling",[324,56650,56651],{},"Go SDK producer sequence id inconsistency",[324,56653,56654],{},"Consumers were frequently created and deleted at scale",[48,56656,56657],{},"Now, let’s analyze each of these causes and examine some solutions.",[3933,56659,56661],{"id":56660},"analysis-1-client-disconnection-and-reconnection-mechanism","Analysis 1: Client disconnection and reconnection mechanism",[48,56663,56664],{},"The Pulsar client SDK has similar logic (see Analysis 2 in the previous section above) that periodically checks if requests from the server within the expected threshold are received. If not, the client will be disconnected from the server.",[48,56666,56667],{},"Usually, the problem could be a lack of resources on client machines that already have a high utilization rate (and let’s assume there is nothing wrong with the server). This means your application is not able to handle the data from the server. To solve this, change the business logic or the deployment method of your client.",[3933,56669,56671],{"id":56670},"analysis-2-go-sdk-exception-handling","Analysis 2: Go SDK exception handling",[48,56673,56674],{},"The Pulsar community provides integration support for clients in different languages, such as Java, Go, C++, and Python. However, besides Java and Go, the implementation of other languages still needs to be improved. Compared with the SDK for Java, the SDK for Go needs to be more detail-oriented.",[48,56676,56677],{},"When receiving an exception from the server, the Java SDK is able to identify whether the channel should be deleted or not for the exception (for example, ServerError_TooManyRequests) and recreate it if necessary. By contrast, the Go client deletes the channel directly and recreates it.",[3933,56679,56681],{"id":56680},"analysis-3-go-sdk-producer-sequence-id-inconsistency","Analysis 3: Go SDK producer sequence id inconsistency",[48,56683,56684],{},"After sending messages, a Go SDK producer (written with a relatively low version) will receive broker responses. If the sequenceID in the responses is not consistent with the sequenceID at the front of the queue on the client, it will result in a disconnect.",[48,56686,56687],{},"In higher Go SDK versions, this issue and the one mentioned in Analysis 1 have been properly handled. Therefore, it is suggested that you choose the latest version of Go SDK. If you are interested, you are welcome to make contributions to the development of Pulsar Go SDK.",[3933,56689,56691],{"id":56690},"analysis-4-consumers-were-frequently-created-and-deleted-at-scale","Analysis 4: Consumers were frequently created and deleted at scale",[48,56693,56694],{},"In cluster maintenance, we adjusted the number of partitions to meet the growing business demand. The client, which did not restart, noticed the change on the server, thus creating new consumers for new partitions. We found that this was caused by an SDK bug in Java 2.6.2. 
Because of the bug, the client could repeatedly create a great number of consumers and delete them. To solve this, we suggest you upgrade your Java client.",[48,56696,56697],{},"In addition, our client Pods once had a similar issue of frequent restarts. After troubleshooting, we found that this was a panic error. As such, we advise you to take fault tolerance into consideration for your logic implementation to avoid potential problems.",[32,56699,56701],{"id":56700},"optimization-3-upgrade-zookeeper","Optimization 3: Upgrade ZooKeeper",[48,56703,56704,56705,56708,56709,56713],{},"Initially, we were using ZooKeeper 3.4.6 while a bug as shown in the figure below continuously occurred.\n",[384,56706],{"alt":18,"src":56707},"\u002Fimgs\u002Fblogs\u002F63b530e20312746a2ed104ca_zookeeper-bug346.png","Figure 5\nThe bug was later fixed. For more information, see the Apache Zookeeper ",[55,56710,44661],{"href":56711,"rel":56712},"https:\u002F\u002Fissues.apache.org\u002Fjira\u002Fbrowse\u002FZOOKEEPER-2044.",[264],". Therefore, we suggest that you apply a patch to fix it or upgrade ZooKeeper. In the Data Project, we upgraded to 3.6.3 and the issue was resolved.",[40,56715,56717],{"id":56716},"pulsar-cluster-maintenance-guidance","Pulsar cluster maintenance guidance",[48,56719,56720],{},"In cluster maintenance, there might be issues of timeouts, slow message production and consumption, and large message backlogs. To improve troubleshooting efficiency, you can get started with the following perspectives:",[321,56722,56723,56726,56729,56732,56735],{},[324,56724,56725],{},"Cluster resource configurations",[324,56727,56728],{},"Client message consumption",[324,56730,56731],{},"Message acknowledgment information",[324,56733,56734],{},"Thread status",[324,56736,56737],{},"Log analysis",[48,56739,56740],{},"Let’s take a closer look at each of them.",[32,56742,56725],{"id":56743},"cluster-resource-configurations",[48,56745,56746],{},"First, check whether the current resource configurations are sufficient to help your cluster handle the workload. This can be analyzed by checking CPU, memory, and disk IO information of brokers, bookies, and ZooKeeper on the Pulsar cluster dashboard. Second, check the GC state of Java processes, especially those with frequent Full GC. Make timely decisions to put more resources into your cluster if necessary.",[32,56748,56728],{"id":56749},"client-message-consumption",[48,56751,56752],{},"The client may have a backpressure issue resulting from a lack of consumption activities. In this scenario, message reception is rather slow or no messages can be received even though there are consumer processes.",[48,56754,56755,56756,56761],{},"You can use the pulsar-admin CLI tool to ",[55,56757,56760],{"href":56758,"rel":56759},"https:\u002F\u002Fpulsar.apache.org\u002Fdocs\u002Fadmin-api-topics\u002F#get-stats",[264],"see detailed statistics"," of the impacted topic (pulsar-admin topics stats). 
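As a quick sketch, a minimal invocation might look like the following; the tenant, namespace, and topic name are only placeholders for whichever topic you are investigating:

```
# Illustrative command - substitute your own tenant/namespace/topic
bin/pulsar-admin topics stats persistent://public/default/data-project-metrics
```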
In the output, pay special attention to the following fields:",[321,56763,56764,56767,56770,56773],{},[324,56765,56766],{},"unackedMessages: The number of delivered messages that are yet to be acknowledged.",[324,56768,56769],{},"msgBacklog: The number of messages in the subscription backlog that are yet to be delivered.",[324,56771,56772],{},"type: The subscription type.",[324,56774,56775],{},"blockedConsumerOnUnackedMsgs: Whether or not the consumer is blocked because of too many unacknowledged messages.",[48,56777,56778],{},"If there are too many unacknowledged messages, it will affect message distribution from brokers to the client. This is usually caused by code in the business application, resulting in messages not getting acknowledged.",[32,56780,56731],{"id":56781},"message-acknowledgment-information",[48,56783,56784],{},"If producers spend too much time producing messages, check your system configurations as well as acknowledgment hole information. For the latter, use pulsar-admin topics stats-internal to see the status of a topic and check the value of the field individuallyDeletedMessages in the subscription.",[48,56786,56787],{},"Pulsar uses a ledger inside BookKeeper (also known as the “cursor ledger”) for each subscriber to track message acknowledgments. After a consumer has processed a message, it sends an acknowledgment to the broker, which then updates the cursor ledger for the consumer’s subscription. If there is too much acknowledgment information sent for storage, it will put greater pressure on bookies, increasing the time spent producing messages.",[32,56789,56734],{"id":56790},"thread-status",[48,56792,56793],{},"You can check the status of threads running on brokers, especially the pulsar-io, bookkeeper-io, and bookkeeper-ml-workers-OrderedExecutor thread pool. It is possible that some thread pools do not have sufficient resources or a thread in a certain pool has been occupied for a long time.",[48,56795,56796],{},"To do so, use the top -p PID H command to list the threads with high CPU usage and then locate the specific thread based on the jstack information.",[32,56798,56737],{"id":56799},"log-analysis",[48,56801,56802],{},"If you still cannot find the reason for your problem, check logs (for example, those in clients, brokers, and bookies) in detail. Try to find any valuable information and analyze it while also taking into account the features of your business, scenarios when the problem occurs, and recent events.",[40,56804,56805],{"id":17129},"Looking ahead",[48,56807,56808],{},"In this post, we have analyzed some common issues we faced in our deployment regarding cluster maintenance and detailed solutions that worked for us. As we continue to work on cluster optimization and make further contributions to the community, we hope these insights provide a solid foundation for future work.",[48,56810,56811],{},"In our use case, for example, we have a considerable number of clients interacting with a single topic, which means there are numerous producers and consumers. This has further raised the bar for the Pulsar SDK, which needs to be more detail-oriented. Even a trivial issue that may lead to disconnections or reconnections could affect the entire system. This is also where the Pulsar Go SDK needs to be continuously upgraded. 
In this connection, the Tencent TEG MQ team has been an active driving force to work with the community in improving the Pulsar Go SDK.",[48,56813,56814],{},"Additionally, our Data Project contains a large amount of metadata information that needs to be processed on brokers. This means that we have to keep improving broker configurations and making adjustments to further enhance reliability and stability. In addition to scaling machines with more resources, we also plan to optimize configurations in Pulsar read\u002Fwrite threads, entry caches, bookie write\u002Fwrite caches, and bookie read\u002Fwrite threads.",[40,56816,40413],{"id":36476},[321,56818,56819,56823,56831,56836],{},[324,56820,45216,56821,47757],{},[55,56822,38404],{"href":45219},[324,56824,47760,56825,1154,56828,45209],{},[55,56826,47764],{"href":45463,"rel":56827},[264],[55,56829,47768],{"href":45206,"rel":56830},[264],[324,56832,45223,56833,45227],{},[55,56834,31914],{"href":31912,"rel":56835},[264],[324,56837,36219,56838,49940],{},[55,56839,38410],{"href":27690},[48,56841,3931],{},{"title":18,"searchDepth":19,"depth":19,"links":56843},[56844,56845,56846,56847,56853,56860,56861],{"id":56425,"depth":19,"text":56426},{"id":19155,"depth":19,"text":19156},{"id":56455,"depth":19,"text":56456},{"id":56476,"depth":19,"text":56477,"children":56848},[56849,56850,56851,56852],{"id":56486,"depth":279,"text":56487},{"id":56496,"depth":279,"text":56497},{"id":56636,"depth":279,"text":56637},{"id":56700,"depth":279,"text":56701},{"id":56716,"depth":19,"text":56717,"children":56854},[56855,56856,56857,56858,56859],{"id":56743,"depth":279,"text":56725},{"id":56749,"depth":279,"text":56728},{"id":56781,"depth":279,"text":56731},{"id":56790,"depth":279,"text":56734},{"id":56799,"depth":279,"text":56737},{"id":17129,"depth":19,"text":56805},{"id":36476,"depth":19,"text":40413},"2022-08-18","Recently, an internal team working on messaging queuing (MQ) solutions at Tencent developed a performance analysis system (“Data Project”) for maintenance metrics. The Data Project uses Apache Pulsar as its message system. The Pulsar clusters managed by the Data Project produce a greater number of messages than any other cluster managed by the MQ team.","\u002Fimgs\u002Fblogs\u002F63c7c234f86b5b755c37354d_63b53070e64eae9f14d4b108_client-optimization-top-.jpeg",{},{"title":43584,"description":56863},"blog\u002Fclient-optimization-how-tencent-maintains-apache-pulsar-clusters-100-billion-messages-daily",[35559,821],"Dj87kiwafySZyRzxOWxfbqekGlAX6wJst_0Dc1tX-cQ",{"id":56871,"title":56872,"authors":56873,"body":56874,"category":821,"createdAt":290,"date":57061,"description":57062,"extension":8,"featured":294,"image":57063,"isDraft":294,"link":290,"meta":57064,"navigation":7,"order":296,"path":45187,"readingTime":3556,"relatedResources":290,"seo":57065,"stem":57066,"tags":57067,"__hash__":57068},"blogs\u002Fblog\u002Fannouncing-delta-lake-sink-connector-apache-pulsar.md","Announcing the Delta Lake Sink Connector for Apache Pulsar",[809],{"type":15,"value":56875,"toc":57050},[56876,56879,56882,56886,56889,56892,56895,56899,56907,56913,56917,56920,56922,56925,56928,56932,56935,56946,56950,56952,56954,56970,56975,56977,56982,56988,56992,56998,57001,57013,57015,57018,57047],[48,56877,56878],{},"Apache Pulsar™ and lakehouse technologies are a natural fit for their scalability and accessibility across a large range of data sets and use cases. Today we’re introducing a new Apache Pulsar + Delta Lake connector that provides one API for real-time and lakehouse systems. 
The Pulsar + Delta Lake connector enables organizations to build real-time engineering solutions for analytics and ML\u002FAI that are simple, open, and multi-cloud.",[48,56880,56881],{},"Before we dive into the new connector and why you should use it, let’s look at why lakehouse technology adoption is on the rise.",[40,56883,56885],{"id":56884},"why-lakehouse-technologies-pulsar","Why Lakehouse Technologies + Pulsar",[48,56887,56888],{},"Lakehouse technologies enable companies to turn data into actionable insights by making application data and events easy to process. A lakehouse combines data lake capabilities with transactions and high-level data management utilities, which you can integrate with existing systems to power traditional BI, batch, and AI\u002FML use cases in one platform. However, a lakehouse needs the ability to ingest and activate data in real time.",[48,56890,56891],{},"Pulsar is the real-time data platform designed to solve both complex messaging workloads and simplify building end-to-end data pipelines. Its out-of-box connectors and serverless functions in Python, Java, and Go make it a good fit for lakehouse technologies.",[48,56893,56894],{},"We’re excited to introduce the Delta Lake connector to connect these two powerful technologies (lakehouses and Pulsar). The Delta Lake connector allows companies to solve for minimal data latency and easily deliver real-time engineering to lakehouses with a seamless, single-API experience. This connector is part of our plan to create a Pulsar ecosystem that can serve as the universal and sustainable hub of computing for events, enabling new productivity and innovation.",[40,56896,56898],{"id":56897},"what-is-the-delta-lake-sink-connector","What is the Delta Lake Sink connector?",[48,56900,3600,56901,56906],{},[55,56902,56905],{"href":56903,"rel":56904},"https:\u002F\u002Fhub.streamnative.io\u002Fconnectors\u002Flakehouse-sink\u002F2.9.2\u002F",[264],"Delta Lake Sink connector"," is a Pulsar IO connector that pulls data from Apache Pulsar topics and persists data to Delta Lake.",[48,56908,56909],{},[384,56910],{"alt":56911,"src":56912},"pulsar and delta lake logo","\u002Fimgs\u002Fblogs\u002F63b4087bcbf5ed0335e0dc83_pulsar-and-delta-lake.png",[40,56914,56916],{"id":56915},"why-develop-the-delta-lake-sink-connector","Why develop the Delta Lake Sink connector?",[48,56918,56919],{},"In the last 5 years, the rise of streaming data and the need for lower data latency have pushed data lakes to their limits. As a result, lakehouse architectures, a term coined by Databricks and implemented via Delta Lake as well as other technologies such as Apache Hudi and Apache Iceberg, have seen rapid adoption. Lakehouse architectures provide streaming ingest of data, tools for dealing with schema and schema evolution, improved metadata management and open standards to ease integration across a range of data processing systems.",[48,56921,51074],{},[48,56923,56924],{},"StreamNative, a company that provides a unified messaging and streaming platform powered by Apache Pulsar, built the Delta Lake Sink Connector to provide Delta Lake users with a way to connect the flow of messages from Pulsar and use more powerful features, while avoiding problems with connectivity that can appear when there are intrinsic differences1 between systems or privacy requirements.",[48,56926,56927],{},"The connector solves this problem by fully integrating with Pulsar (including, its serverless functions, per-message processing, and event-stream processing). 
The connector presents a low-code solution with out-of-the-box capabilities such as multi-tenant connectivity, geo-replication, protocols for direct connection to end-user mobile or IoT clients, and more.",[40,56929,56931],{"id":56930},"what-are-the-benefits-of-using-the-delta-lake-sink-connector","What are the benefits of using the Delta Lake Sink connector?",[48,56933,56934],{},"The integration between Delta Lake and Apache Pulsar provides three key benefits.",[321,56936,56937,56940,56943],{},[324,56938,56939],{},"Simplicity: Quickly move data from Apache Pulsar to Delta Lake without any user code.",[324,56941,56942],{},"Efficiency: Reduce your time in configuring the data layer. This means you have more time to discover the maximum business value from real-time data in an effective way.",[324,56944,56945],{},"Flexibility: Run in different modes (standalone or distributed). This allows you to build reactive data pipelines to meet the business and operational needs in real time.",[40,56947,56949],{"id":56948},"how-do-i-get-started-with-the-delta-lake-sink-connector","How do I get started with the Delta Lake Sink connector?",[32,56951,10104],{"id":10103},[48,56953,51103],{},[1666,56955,56956,56964],{},[324,56957,51108,56958,51114,56961,56963],{},[55,56959,51113],{"href":51111,"rel":56960},[264],[55,56962,3550],{"href":45479},", which provides an easy-to-use and fully-managed Pulsar service in the public cloud.",[324,56965,56966,56967,51124],{},"Set up the Delta Lake Sink connector. Download the connector from the ",[55,56968,39589],{"href":34792,"rel":56969},[264],[48,56971,39596,56972,48708],{},[55,56973,20384],{"href":39599,"rel":56974},[264],[32,56976,39605],{"id":39604},[1666,56978,56979],{},[324,56980,56981],{},"Create a configuration file named delta-lake-sink-config.json to send the public\u002Fdefault\u002Ftest-delta-pulsar topic messages from Apache Pulsar to the Delta Lake table with the location of s3a:\u002F\u002Ftest-dev-us-west-2\u002Flakehouse\u002Fdelta_sink:",[8325,56983,56986],{"className":56984,"code":56985,"language":8330},[8328],"{\n    \"tenant\":\"public\",\n    \"namespace\":\"default\",\n    \"name\":\"delta_sink\",\n    \"parallelism\":1,\n    \"inputs\": [\n      \"test-delta-pulsar\"\n    ],\n    \"archive\": \"connectors\u002Fpulsar-io-lakehouse-{{connector:version}}.nar\",\n    \"processingGuarantees\":\"EFFECTIVELY_ONCE\",\n    \"configs\":{\n        \"type\":\"delta\",\n        \"maxCommitInterval\":120,\n        \"maxRecordsPerCommit\":10000000,\n        \"tablePath\": \"s3a:\u002F\u002Ftest-dev-us-west-2\u002Flakehouse\u002Fdelta_sink\",\n        \"hadoop.fs.s3a.aws.credentials.provider\": \"com.amazonaws.auth.DefaultAWSCredentialsProviderChain\"\n    }\n}\n",[4926,56987,56985],{"__ignoreMap":18},[1666,56989,56990],{"start":19},[324,56991,51147],{},[8325,56993,56996],{"className":56994,"code":56995,"language":8330},[8328],"$PULSAR_HOME\u002Fbin\u002Fpulsar-admin sinks localrun --sink-config-file \u002Fpath\u002Fto\u002Fdelta-lake-sink-config.json\n",[4926,56997,56995],{"__ignoreMap":18},[48,56999,57000],{},"When you send a message to the public\u002Fdefault\u002Ftest-delta-pulsar topic of Apache Pulsar, this message is persisted to the Delta Lake table with the location of s3a:\u002F\u002Ftest-dev-us-west-2\u002Flakehouse\u002Fdelta_sink.",[48,57002,54376,57003,57006,57007,57012],{},[55,57004,38697],{"href":48603,"rel":57005},[264]," and this 
",[55,57008,57011],{"href":57009,"rel":57010},"https:\u002F\u002Fgithub.com\u002Fstreamnative\u002Fpulsar-io-lakehouse\u002Fblob\u002Fmaster\u002Fdocs\u002Fdelta-lake-demo.md",[264],"document and demo video"," to see how to run the Delta Lake Sink connector.",[40,57014,48857],{"id":48856},[48,57016,57017],{},"The Delta Lake Sink connector is a major step in the journey of integrating Lakehouse systems into the Pulsar ecosystem. To get involved with the Delta Lake Sink connector for Apache Pulsar, check out the following featured resources:",[321,57019,57020,57029,57035],{},[324,57021,57022,57023,39659,57026,48874],{},"Try out the Delta Lake Sink connector. To get started, ",[55,57024,36195],{"href":48868,"rel":57025},[264],[55,57027,39663],{"href":48872,"rel":57028},[264],[324,57030,57031,57032,39673],{},"Make a contribution. The Delta Lake Sink connector is a community-driven service, which hosts its source code on the StreamNative GitHub repository. If you have any feature requests or bug reports, do not hesitate to ",[55,57033,39672],{"href":48880,"rel":57034},[264],[324,57036,39676,57037,57040,57041,39687,57044,39692],{},[55,57038,39680],{"href":48880,"rel":57039},[264],", send email to the ",[55,57042,39686],{"href":39684,"rel":57043},[264],[55,57045,39691],{"href":33664,"rel":57046},[264],[48,57048,57049],{},"1Intrinsic differences exist between platforms that have no notion of schema and the ones that have sophisticated schema capabilities because there is no simple way to translate between them. These platform differences range from traditional messaging like Amazon SQS to multi-level hierarchical Avro schema written to a data lake. Distinctions also exist between platforms relying on different data representations, such as Pandas DataFrames and simple messages.",{"title":18,"searchDepth":19,"depth":19,"links":57051},[57052,57053,57054,57055,57056,57060],{"id":56884,"depth":19,"text":56885},{"id":56897,"depth":19,"text":56898},{"id":56915,"depth":19,"text":56916},{"id":56930,"depth":19,"text":56931},{"id":56948,"depth":19,"text":56949,"children":57057},[57058,57059],{"id":10103,"depth":279,"text":10104},{"id":39604,"depth":279,"text":39605},{"id":48856,"depth":19,"text":48857},"2022-08-17","This blog introduces a new Apache Pulsar + Delta Lake connector that provides one API for real-time and lakehouse systems. 
The Pulsar + Delta Lake connector enables organizations to build real-time engineering solutions for analytics and ML\u002FAI that are simple, open, and multi-cloud.","\u002Fimgs\u002Fblogs\u002F63c7c24ea1f736aa2f1401a5_63b4087b105e27f46a62cab9_delta-lake-pulsar-top.jpeg",{},{"title":56872,"description":57062},"blog\u002Fannouncing-delta-lake-sink-connector-apache-pulsar",[302,28572],"E2yG-TJBcrvQd_I1eaeBVnceqjxXAzcwD3i_GBP6s34",{"id":57070,"title":43234,"authors":57071,"body":57072,"category":821,"createdAt":290,"date":57298,"description":57299,"extension":8,"featured":294,"image":57300,"isDraft":294,"link":290,"meta":57301,"navigation":7,"order":296,"path":43233,"readingTime":33204,"relatedResources":290,"seo":57302,"stem":57303,"tags":57304,"__hash__":57305},"blogs\u002Fblog\u002Fintroducing-pulsar-resources-operator-kubernetes.md",[24776,54016],{"type":15,"value":57073,"toc":57287},[57074,57081,57085,57088,57110,57114,57124,57126,57128,57130,57151,57153,57155,57160,57165,57170,57176,57181,57187,57196,57200,57203,57206,57211,57217,57222,57228,57232,57235,57240,57246,57251,57257,57259,57262],[48,57075,57076,57077,57080],{},"We are excited to announce the release of ",[55,57078,51919],{"href":20667,"rel":57079},[264]," as an open-source project under the Apache License V2. The Pulsar Resources Operator provides declarative management of key Pulsar resources on Kubernetes.",[40,57082,57084],{"id":57083},"what-is-the-pulsar-resources-operator","What is the Pulsar Resources Operator?",[48,57086,57087],{},"The Pulsar Resources Operator is an independent controller that automatically manages Pulsar resources on Kubernetes using manifest files. The Pulsar Resources Operator provides full lifecycle management for the following Pulsar resources, including creation, update, and deletion:",[321,57089,57090,57095,57100,57105],{},[324,57091,57092],{},[55,57093,42839],{"href":42837,"rel":57094},[264],[324,57096,57097],{},[55,57098,42846],{"href":42844,"rel":57099},[264],[324,57101,57102],{},[55,57103,42853],{"href":42851,"rel":57104},[264],[324,57106,57107],{},[55,57108,42860],{"href":42858,"rel":57109},[264],[40,57111,57113],{"id":57112},"why-do-you-need-the-pulsar-resources-operator","Why do you need the Pulsar Resources Operator?",[48,57115,57116,57117,4003,57120,57123],{},"While you can manage Pulsar resources with CLI tools such as ",[55,57118,38169],{"href":42817,"rel":57119},[264],[55,57121,34522],{"href":42821,"rel":57122},[264],", or a client SDK, these are not the best practice when you are running a Pulsar cluster on Kubernetes. 
It’s very easy and useful to create a Pulsar resource by applying its manifest files, especially if you want to initialize some basic Pulsar resources in your CI workflow when creating a new Pulsar cluster.",[40,57125,42873],{"id":42872},[48,57127,42876],{},[32,57129,10104],{"id":10103},[321,57131,57132,57138,57143,57146],{},[324,57133,42883,57134,57137],{},[55,57135,42888],{"href":42886,"rel":57136},[264]," (v1.16 - v1.24), compatible with your cluster (+\u002F- 1 minor release from your cluster).",[324,57139,42883,57140,42897],{},[55,57141,42896],{"href":42894,"rel":57142},[264],[324,57144,57145],{},"Prepare a Kubernetes cluster (v1.16 - v1.24).",[324,57147,42903,57148,190],{},[55,57149,42908],{"href":42906,"rel":57150},[264],[32,57152,42912],{"id":42911},[48,57154,42915],{},[1666,57156,57157],{},[324,57158,57159],{},"Add the StreamNative chart repository.",[8325,57161,57163],{"className":57162,"code":44195,"language":8330},[8328],[4926,57164,44195],{"__ignoreMap":18},[1666,57166,57167],{"start":19},[324,57168,57169],{},"Install the operator using the pulsar-resources-operator Helm chart.",[8325,57171,57174],{"className":57172,"code":57173,"language":8330},[8328],"helm install  streamnative\u002Fpulsar-resources-operator -n  --create-namespace\nkubectl get pods -n \n",[4926,57175,57173],{"__ignoreMap":18},[1666,57177,57178],{},[324,57179,57180],{},"When you want to upgrade the operator, use the following commands. You need to pull the chart file, decompress the tgz file, and then apply the crds.",[8325,57182,57185],{"className":57183,"code":57184,"language":8330},[8328],"helm repo update\nhelm pull streamnative\u002Fpulsar-resources-operator\ntar -zxvf pulsar-resources-operator-v0.1.0.tgz\nkubectl apply -f pulsar-resources-operator\u002Fcrds\nhelm upgrade  streamnative\u002Fpulsar-resources-operator -n \n",[4926,57186,57184],{"__ignoreMap":18},[48,57188,57189,57190,57195],{},"For more details about the installation, see the ",[55,57191,57194],{"href":57192,"rel":57193},"https:\u002F\u002Fgithub.com\u002Fstreamnative\u002Fpulsar-resources-operator#installation",[264],"Installation section"," on GitHub.",[40,57197,57199],{"id":57198},"create-the-pulsarconnection","Create the PulsarConnection",[48,57201,57202],{},"The PulsarConnection covers the address of the Pulsar cluster and the authentication information. 
The Operator uses it to access the Pulsar cluster to create other resources.",[48,57204,57205],{},"To create the PulsarConnection:",[1666,57207,57208],{},[324,57209,57210],{},"Define a connection named pulsar-connection that contains the fields shown in the file below.",[8325,57212,57215],{"className":57213,"code":57214,"language":8330},[8328],"apiVersion: pulsar.streamnative.io\u002Fv1alpha1\nkind: PulsarConnection\nmetadata:\n  name: pulsar-connection\n  namespace: \nspec:\n  adminServiceURL: http:\u002F\u002Fpulsar-sn-platform-broker.test.svc.cluster.local:8080\n",[4926,57216,57214],{"__ignoreMap":18},[1666,57218,57219],{},[324,57220,57221],{},"Apply the YAML file to create the PulsarConnection and check the status.",[8325,57223,57226],{"className":57224,"code":57225,"language":8330},[8328],"kubectl  apply -f connection.yaml\n\nkubectl get pulsarconnections -n \nNAME              ADMIN_SERVICE_URL   GENERATION   OBSERVED_GENERATION   READY\npulsar-connection   http:\u002F\u002Fpulsar-xxxx:8080 1            1                True\n",[4926,57227,57225],{"__ignoreMap":18},[40,57229,57231],{"id":57230},"create-pulsar-resources","Create Pulsar resources",[48,57233,57234],{},"The Pulsar Resources Operator allows you to quickly create Pulsar resources (for example, PulsarTenant and PulsarNamespace) on Kubernetes using YAML files. The following example demonstrates how to create a PulsarTenant object on Kubernetes.",[1666,57236,57237],{},[324,57238,57239],{},"Create a YAML file named pulsar-tenant that contains the fields shown below.",[8325,57241,57244],{"className":57242,"code":57243,"language":8330},[8328],"apiVersion: pulsar.streamnative.io\u002Fv1alpha1\nkind: PulsarTenant\nmetadata:\n  name: pulsar-tenant\n  namespace: \nspec:\n  name: pulsar-tenant\n  connectionRef:\n    name: pulsar-connection\n  adminRoles:\n  - admin\n  - ops\n",[4926,57245,57243],{"__ignoreMap":18},[1666,57247,57248],{"start":19},[324,57249,57250],{},"Apply the YAML file to create the tenant and check the status.",[8325,57252,57255],{"className":57253,"code":57254,"language":8330},[8328],"kubectl apply -f tenant.yaml\n\nkubectl get pulsartenants -n \nNAME              RESOURCE_NAME   GENERATION   OBSERVED_GENERATION   READY\npulsar-tenant      pulsar-tenant      1                1               True\n",[4926,57256,57254],{"__ignoreMap":18},[40,57258,40413],{"id":36476},[48,57260,57261],{},"Check out the following resources to learn more about the Pulsar Resources Operator.",[321,57263,57264,57276],{},[324,57265,57266,57267,4003,57271,57275],{},"Documentation. 
See the ",[55,57268,57270],{"href":20667,"rel":57269},[264],"GitHub repository",[55,57272,18312],{"href":57273,"rel":57274},"https:\u002F\u002Fgithub.com\u002Fstreamnative\u002Fpulsar-resources-operator#tutorial",[264]," to understand different configurations of the Pulsar resources.",[324,57277,39676,57278,48888,57281,39687,57284,39692],{},[55,57279,39680],{"href":20667,"rel":57280},[264],[55,57282,39686],{"href":39684,"rel":57283},[264],[55,57285,39691],{"href":33664,"rel":57286},[264],{"title":18,"searchDepth":19,"depth":19,"links":57288},[57289,57290,57291,57295,57296,57297],{"id":57083,"depth":19,"text":57084},{"id":57112,"depth":19,"text":57113},{"id":42872,"depth":19,"text":42873,"children":57292},[57293,57294],{"id":10103,"depth":279,"text":10104},{"id":42911,"depth":279,"text":42912},{"id":57198,"depth":19,"text":57199},{"id":57230,"depth":19,"text":57231},{"id":36476,"depth":19,"text":40413},"2022-08-15","We are excited to announce the release of Pulsar Resources Operator as an open-source project under the Apache License V2. The Pulsar Resources Operator provides declarative management of key Pulsar resources on Kubernetes.","\u002Fimgs\u002Fblogs\u002F63c7e6a34948ed4874da2464_63b406fafefdbc39bfa5c473_pulsar-kubernetes-top-1-.jpeg",{},{"title":43234,"description":57299},"blog\u002Fintroducing-pulsar-resources-operator-kubernetes",[302,821,16985],"-cmdR0buDlD2Uec0eyuFxtdnaZy76lHHWBjQxl0QfTk",{"id":57307,"title":57308,"authors":57309,"body":57310,"category":821,"createdAt":290,"date":57532,"description":57533,"extension":8,"featured":294,"image":57534,"isDraft":294,"link":290,"meta":57535,"navigation":7,"order":296,"path":45183,"readingTime":11180,"relatedResources":290,"seo":57536,"stem":57537,"tags":57538,"__hash__":57539},"blogs\u002Fblog\u002Fannouncing-google-cloud-bigquery-sink-connector-apache-pulsar.md","Announcing the Google Cloud BigQuery Sink Connector for Apache Pulsar",[6969],{"type":15,"value":57311,"toc":57523},[57312,57315,57319,57327,57333,57337,57345,57348,57368,57372,57375,57388,57392,57394,57397,57422,57427,57429,57434,57440,57444,57450,57455,57462,57466,57469,57498,57500],[48,57313,57314],{},"We are excited to announce the general availability of the Google Cloud BigQuery sink connector for Apache Pulsar. 
This connector seamlessly synchronizes Pulsar data to BigQuery in real time, enabling Google Cloud BigQuery to leverage Pulsar and expanding the Apache Pulsar ecosystem.",[40,57316,57318],{"id":57317},"what-is-the-google-cloud-bigquery-sink-connector","What is the Google Cloud BigQuery sink connector?",[48,57320,3600,57321,57326],{},[55,57322,57325],{"href":57323,"rel":57324},"https:\u002F\u002Fgithub.com\u002Fstreamnative\u002Fpulsar-io-bigquery",[264],"Google BigQuery sink connector"," pulls data from Pulsar topics and persists data to Google Cloud BigQuery tables.",[48,57328,57329],{},[384,57330],{"alt":57331,"src":57332},"logo pulsar and google cloud bgquery","\u002Fimgs\u002Fblogs\u002F63b405dccbf5ed5ac3dea3a7_bigquery-sink-pulsar.png",[32,57334,57336],{"id":57335},"why-develop-the-google-cloud-bigquery-sink-connector","Why develop the Google Cloud BigQuery sink connector?",[48,57338,57339,57344],{},[55,57340,57343],{"href":57341,"rel":57342},"https:\u002F\u002Fcloud.google.com\u002Fbigquery\u002Fdocs\u002Fintroduction",[264],"Google Cloud BigQuery"," is a fully managed enterprise data warehouse that enables users to manage and analyze data with built-in features like machine learning, geospatial analysis, and business intelligence.",[48,57346,57347],{},"The Google Cloud BigQuery sink connector provides you with a way to write data from Pulsar to BigQuery in real time. It presents a low-code solution with out-of-the-box capabilities like strong fault tolerance, great scalability, automatic creation and update of table schema, partitioned tables, clustered tables, and many more.",[48,57349,57350,57351,57356,57357,57361,57362,57367],{},"Before the availability of this connector, you could only use the ",[55,57352,57355],{"href":57353,"rel":57354},"https:\u002F\u002Fgithub.com\u002Fstreamnative\u002Fpulsar-io-cloud-storage",[264],"Cloud Storage Sink connector for Pulsar"," to move data to Cloud Storage. In order to perform query analysis in the form of external tables, you needed to use BigQuery (refer to ",[55,57358,57360],{"href":57359},"\u002Fblog\u002Fengineering\u002F2022-02-03-integrating-apache-pulsar-with-bigquery\u002F","Integrating Apache Pulsar with BigQuery","). However, using external tables in BigQuery has ",[55,57363,57366],{"href":57364,"rel":57365},"https:\u002F\u002Fcloud.google.com\u002Fbigquery\u002Fdocs\u002Fexternal-tables#external_table_limitations",[264],"many limitations",", such as no support for clustered tables and poor query performance. This connector enables you to write data directly from Pulsar to BigQuery and supports partitioned and aggregate tables.",[32,57369,57371],{"id":57370},"what-are-the-benefits-of-using-the-google-cloud-bigquery-sink-connector","What are the benefits of using the Google Cloud BigQuery sink connector?",[48,57373,57374],{},"The integration between Google Cloud BigQuery and Apache Pulsar provides four key benefits.",[321,57376,57377,57380,57382,57385],{},[324,57378,57379],{},"Simplicity: Quickly move data from Apache Pulsar to Google Cloud BigQuery without any user code.",[324,57381,56942],{},[324,57383,57384],{},"Scalability: Run in different modes (standalone or distributed). 
This allows you to build reactive data pipelines to meet the business and operational needs in real time.",[324,57386,57387],{},"Auto Schema: Automatically create and update a table’s schema based on the Pulsar topic schema.",[32,57389,57391],{"id":57390},"how-to-get-started-with-the-google-cloud-bigquery-sink-connector","How to get started with the Google Cloud BigQuery sink connector",[3933,57393,10104],{"id":10103},[48,57395,57396],{},"First, you must run an Apache Pulsar cluster and a Google Cloud BigQuery service.",[1666,57398,57399,57405,57414],{},[324,57400,57401,57402,22220],{},"Prepare the Pulsar service. You can quickly run a Pulsar cluster anywhere by running $PULSAR_HOME\u002Fbin\u002Fpulsar standalone. Refer to the ",[55,57403,7120],{"href":39571,"rel":57404},[264],[324,57406,57407,57408,57413],{},"Prepare the Google Cloud BigQuery service. See ",[55,57409,57412],{"href":57410,"rel":57411},"https:\u002F\u002Fcloud.google.com\u002Fbigquery\u002Fdocs\u002Fquickstarts",[264],"Google Cloud BigQuery Quickstarts"," for details. Note that you need to set up the GOOGLE_APPLICATION_CREDENTIALS environment variable to access Google BigQuery.",[324,57415,57416,57417,57421],{},"Set up the Google BigQuery connector. Download the connector from the ",[55,57418,39589],{"href":57419,"rel":57420},"https:\u002F\u002Fgithub.com\u002Fstreamnative\u002Fpulsar-io-bigquery\u002Freleases",[264]," page, and then move the jar package to $PULSAR_HOME\u002Fconnectors.",[48,57423,39596,57424,39601],{},[55,57425,20384],{"href":39599,"rel":57426},[264],[3933,57428,39605],{"id":39604},[1666,57430,57431],{},[324,57432,57433],{},"Create a configuration file named google-bigquery-sink-config.json. The configured connector writes the message in the public\u002Fdefault\u002Fgoogle-bigquery-pulsar topic to the test-pulsar table of BigQuery.",[8325,57435,57438],{"className":57436,"code":57437,"language":8330},[8328],"\n{\n     \"name\": \"google-bigquery-sink\",\n     \"archive\": \"$PULSAR_HOME\u002Fconnectors\u002Fpulsar-io-bigquery-{{connector:version}}.jar\",\n     \"className\": \"org.apache.pulsar.ecosystem.io.bigquery.BigQuerySink\",\n     \"tenant\": \"public\",\n     \"namespace\": \"default\",\n     \"inputs\": [\n       \"google-bigquery-pulsar\"\n     ],\n     \"parallelism\": 1,\n     \"configs\": {\n       \"projectId\": \"SECRETS\",\n       \"datasetName\": \"pulsar-io-google-bigquery\",\n       \"tableName\": \"test-pulsar\"\n   }\n }\n \n",[4926,57439,57437],{"__ignoreMap":18},[1666,57441,57442],{"start":19},[324,57443,39621],{},[8325,57445,57448],{"className":57446,"code":57447,"language":8330},[8328],"\n$PULSAR_HOME\u002Fbin\u002Fpulsar-admin sinks localrun \\\n--sink-config-file google-bigquery-sink-config.json\n\n",[4926,57449,57447],{"__ignoreMap":18},[1666,57451,57452],{"start":279},[324,57453,57454],{},"You can send messages to the public\u002Fdefault\u002Fgoogle-bigquery-pulsar topic, then view it in BigQuery.",[48,57456,39639,57457,190],{},[55,57458,57461],{"href":57459,"rel":57460},"https:\u002F\u002Fhub.streamnative.io\u002Fconnectors\u002Fgoogle-bigquery-sink\u002F2.10.1\u002F",[264],"Google Cloud BigQuery Sink documentation",[40,57463,57465],{"id":57464},"how-can-you-get-involved","How can you get involved?",[48,57467,57468],{},"The Google BigQuery sink connector is a major step in the journey of integrating Pulsar with other big data systems. 
To get involved with the Google Cloud BigQuery sink connector for Apache Pulsar, check out the following featured resources:",[321,57470,57471,57480,57487],{},[324,57472,57473,57474,39659,57477,48874],{},"Try out the Google BigQuery sink connector. To get started, ",[55,57475,36195],{"href":57419,"rel":57476},[264],[55,57478,39663],{"href":57323,"rel":57479},[264],[324,57481,57482,57483,39673],{},"Make a contribution. The Google BigQuery sink connector is a community-driven service, which hosts its source code on the StreamNative GitHub repository. If you have any feature requests or bug reports, do not hesitate to ",[55,57484,39672],{"href":57485,"rel":57486},"https:\u002F\u002Fgithub.com\u002Fstreamnative\u002Fpulsar-io-bigquery\u002Fissues\u002Fnew\u002Fchoose",[264],[324,57488,39676,57489,39681,57492,39687,57495,39692],{},[55,57490,39680],{"href":57485,"rel":57491},[264],[55,57493,39686],{"href":39684,"rel":57494},[264],[55,57496,39691],{"href":33664,"rel":57497},[264],[40,57499,40413],{"id":36476},[321,57501,57502,57506,57514,57519],{},[324,57503,45216,57504,47757],{},[55,57505,38404],{"href":45219},[324,57507,47760,57508,1154,57511,45209],{},[55,57509,47764],{"href":45463,"rel":57510},[264],[55,57512,47768],{"href":45206,"rel":57513},[264],[324,57515,45223,57516,45227],{},[55,57517,31914],{"href":31912,"rel":57518},[264],[324,57520,36219,57521,49940],{},[55,57522,38410],{"href":27690},{"title":18,"searchDepth":19,"depth":19,"links":57524},[57525,57530,57531],{"id":57317,"depth":19,"text":57318,"children":57526},[57527,57528,57529],{"id":57335,"depth":279,"text":57336},{"id":57370,"depth":279,"text":57371},{"id":57390,"depth":279,"text":57391},{"id":57464,"depth":19,"text":57465},{"id":36476,"depth":19,"text":40413},"2022-08-03","Read about how this connector seamlessly synchronizes Pulsar data to BigQuery in real time, enabling Google Cloud BigQuery to leverage Pulsar.","\u002Fimgs\u002Fblogs\u002F63c7f9a3eabe6e290dbe9cd8_63b405db093c6babe76356f1_google-cloud-bigquery-top.jpeg",{},{"title":57308,"description":57533},"blog\u002Fannouncing-google-cloud-bigquery-sink-connector-apache-pulsar",[302,28572],"J86M5F8RDJg-w-XcUNIbH-r9P2r0f-orWgLk-CnjM14",{"id":57541,"title":57542,"authors":57543,"body":57545,"category":821,"createdAt":290,"date":57799,"description":57800,"extension":8,"featured":294,"image":57801,"isDraft":294,"link":290,"meta":57802,"navigation":7,"order":296,"path":57803,"readingTime":11508,"relatedResources":290,"seo":57804,"stem":57805,"tags":57806,"__hash__":57807},"blogs\u002Fblog\u002Fnew-apache-pulsar-2-9-3.md","What’s New in Apache Pulsar 2.9.3",[57544,42150],"Jun Ma",{"type":15,"value":57546,"toc":57782},[57547,57550,57553,57561,57565,57574,57577,57580,57584,57587,57596,57599,57602,57610,57613,57621,57630,57633,57636,57639,57642,57651,57654,57657,57660,57663,57672,57675,57678,57682,57685,57688,57691,57723,57726,57729,57731,57739,57747,57763,57765,57780],[48,57548,57549],{},"The Apache Pulsar community releases version 2.9.3! 53 contributors provided improvements and bug fixes that delivered 200+ commits. Thanks for all your contributions.",[48,57551,57552],{},"The highlight of the 2.9.3 release is introducing 30+ transaction fixes and improvements. Earlier-adoption users of Pulsar transactions have documented long-term use in their production environments and reported valuable findings in real applications. 
This feedback gives the Pulsar community clear direction on where to improve.",[48,57554,57555,57556,190],{},"This blog walks through the most noteworthy changes. For the complete list including all feature enhancements and bug fixes, check out the ",[55,57557,57560],{"href":57558,"rel":57559},"https:\u002F\u002Fpulsar.apache.org\u002Frelease-notes\u002Fversioned\u002Fpulsar-2.9.3\u002F",[264],"Pulsar 2.9.3 Release Notes",[40,57562,57564],{"id":57563},"notable-enhancements-and-bug-fixes","Notable enhancements and bug fixes",[32,57566,57568,57569],{"id":57567},"enabled-cursor-data-compression-to-reduce-persistent-cursor-data-size-14542","Enabled cursor data compression to reduce persistent cursor data size. ",[55,57570,57573],{"href":57571,"rel":57572},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F14542",[264],"14542",[3933,57575,57576],{"id":44661},"Issue",[48,57578,57579],{},"The cursor data is managed by the ZooKeeper\u002FEtcd metadata store. When the data size increases, it may take too much time to pull the data, and brokers may end up writing large chunks of data to the ZooKeeper\u002FEtcd metadata store.",[3933,57581,57583],{"id":57582},"resolution","Resolution",[48,57585,57586],{},"Provide the ability to enable compression mechanisms to reduce cursor data size and the pulling time.",[32,57588,57590,57591],{"id":57589},"reduced-the-memory-occupied-by-metadatapositions-and-avoid-oom-15137","Reduced the memory occupied by metadataPositions and avoid OOM. ",[55,57592,57595],{"href":57593,"rel":57594},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F15137",[264],"15137",[3933,57597,57576],{"id":57598},"issue-1",[48,57600,57601],{},"The map metadataPositions in MLPendingAckStore is used to clear useless data in PendingAck, where the key is the position that is persistent in PendingAck and the value is the max position acked by an operation. It judges whether the max subscription cursor position is smaller than the subscription cursor’s markDeletePosition. If the max position is smaller, then the log cursor will mark to delete the position. It causes two main issues:",[321,57603,57604,57607],{},[324,57605,57606],{},"In normal cases, this map stores all transaction ack operations. This is a waste of memory and CPU.",[324,57608,57609],{},"If a transaction that has not been committed for a long time acks a message in a later position, the map will not be cleaned up, which finally leads to OOM (out-of-memory).",[3933,57611,57583],{"id":57612},"resolution-1",[48,57614,57615,57616,190],{},"Regularly store a small amount of data according to certain rules. For more detailed implementation, refer to ",[55,57617,57620],{"href":57618,"rel":57619},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fissues\u002F15073",[264],"PIP-153",[32,57622,57624,57625],{"id":57623},"checked-lowwatermark-before-appending-transaction-entries-to-transaction-buffer-15424","Checked lowWaterMark before appending transaction entries to Transaction Buffer. 
",[55,57626,57629],{"href":57627,"rel":57628},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F15424",[264],"15424",[3933,57631,57576],{"id":57632},"issue-2",[48,57634,57635],{},"When a client sends messages using a previously committed transaction, these messages are visible to consumers unexpectedly.",[3933,57637,57583],{"id":57638},"resolution-2",[48,57640,57641],{},"Add a map to store the lowWaterMark of Transaction Coordinator in Transaction Buffer, and check lowWaterMark before appending transaction entries to Transaction Buffer. So when sending messages using an invalid transaction, clients will receive NotAllowedException.",[32,57643,57645,57646],{"id":57644},"fixed-the-consumption-performance-regression-pr-15162","Fixed the consumption performance regression. ",[55,57647,57650],{"href":57648,"rel":57649},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F15162",[264],"PR-15162",[3933,57652,57576],{"id":57653},"issue-3",[48,57655,57656],{},"This performance regression was introduced in 2.10.0, 2.9.1, and 2.8.3. You may find a significant performance drop with message listeners while using the Java client. The root cause is that each message introduces thread switching from the external thread pool to the internal thread pool and then back to the external thread pool.",[3933,57658,57583],{"id":57659},"resolution-3",[48,57661,57662],{},"Avoid the thread switching for each message to improve consumption throughput.",[32,57664,57666,57667],{"id":57665},"fixed-a-deadlock-issue-of-topic-creation-pr-15570","Fixed a deadlock issue of topic creation. ",[55,57668,57671],{"href":57669,"rel":57670},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F15570",[264],"PR-15570",[3933,57673,57576],{"id":57674},"issue-4",[48,57676,57677],{},"This deadlock issue occurred during topic creation by trying to re-acquire the same StampedLock from the same thread when removing it. This causes the topic to stop serving for a long time and ultimately fail the deduplication or geo-replication check. The workaround is restarting the broker.",[32,57679,57681],{"id":57680},"optimized-the-memory-usage-of-brokers","Optimized the memory usage of brokers.",[3933,57683,57576],{"id":57684},"issue-5",[48,57686,57687],{},"Pulsar has some internal data structures, such as ConcurrentLongLongPairHashMap and ConcurrentLongPairHashMap, which reduce memory usage compared with using boxed types. 
However, in earlier versions, the data structures were not supported for shrinking even if the data was removed, which wasted a certain amount of memory in certain situations.",[48,57689,57690],{},"Pull requests",[321,57692,57693,57699,57705,57711,57717],{},[324,57694,57695],{},[55,57696,57697],{"href":57697,"rel":57698},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F15354",[264],[324,57700,57701],{},[55,57702,57703],{"href":57703,"rel":57704},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F15342",[264],[324,57706,57707],{},[55,57708,57709],{"href":57709,"rel":57710},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F14663",[264],[324,57712,57713],{},[55,57714,57715],{"href":57715,"rel":57716},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F14515",[264],[324,57718,57719],{},[55,57720,57721],{"href":57721,"rel":57722},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F14497",[264],[3933,57724,57583],{"id":57725},"resolution-4",[48,57727,57728],{},"Support the shrinking of the internal data structures, such as ConcurrentSortedLongPairSet, ConcurrentOpenHashMap, and so on.",[40,57730,13565],{"id":1727},[48,57732,57733,57734,57738],{},"If you are interested in learning more about Pulsar 2.9.3, you can ",[55,57735,36195],{"href":57736,"rel":57737},"https:\u002F\u002Fpulsar.apache.org\u002Fversions\u002F",[264]," and try it out now!",[48,57740,57741,57742,57746],{},"Pulsar Summit San Francisco 2022 will take place on August 18th, 2022. ",[55,57743,57745],{"href":35357,"rel":57744},[264],"Register now"," and help us make it an even bigger success by spreading the word on social media!",[48,57748,57749,57750,57753,57754,57757,57758,20076],{},"For more information about the Apache Pulsar project and current progress, visit the ",[55,57751,40821],{"href":23526,"rel":57752},[264],", follow the project on Twitter ",[55,57755,36238],{"href":36236,"rel":57756},[264],", and join ",[55,57759,57762],{"href":57760,"rel":57761},"https:\u002F\u002Fapache-pulsar.herokuapp.com\u002F",[264],"Pulsar Slack",[40,57764,39647],{"id":39646},[48,57766,57767,57768,57772,57773,57775,57776,57779],{},"To get started, you can ",[55,57769,57771],{"href":36193,"rel":57770},[264],"download Pulsar directly"," or you can spin up a Pulsar cluster with a free 30-day trial of ",[55,57774,3550],{"href":45479},"! We also offer technical consulting and expert training to help get your organization started. As always, we are highly responsive to your feedback. Feel free to ",[55,57777,24379],{"href":57778},"\u002Fen\u002Fcontact"," if you have any questions at any time. We look forward to hearing from you and stay tuned for the next Pulsar release!",[48,57781,3931],{},{"title":18,"searchDepth":19,"depth":19,"links":57783},[57784,57797,57798],{"id":57563,"depth":19,"text":57564,"children":57785},[57786,57788,57790,57792,57794,57796],{"id":57567,"depth":279,"text":57787},"Enabled cursor data compression to reduce persistent cursor data size. 14542",{"id":57589,"depth":279,"text":57789},"Reduced the memory occupied by metadataPositions and avoid OOM. 15137",{"id":57623,"depth":279,"text":57791},"Checked lowWaterMark before appending transaction entries to Transaction Buffer. 15424",{"id":57644,"depth":279,"text":57793},"Fixed the consumption performance regression. PR-15162",{"id":57665,"depth":279,"text":57795},"Fixed a deadlock issue of topic creation. 
PR-15570",{"id":57680,"depth":279,"text":57681},{"id":1727,"depth":19,"text":13565},{"id":39646,"depth":19,"text":39647},"2022-07-27","We are excited to see the Apache Pulsar community has successfully released the 2.9.3 version! 53 contributors provided improvements and bug fixes that delivered 200+ commits. Let's walk through the most noteworthy changes!","\u002Fimgs\u002Fblogs\u002F63c7f9b36c2079bacbfa4708_63b404b762b4f76a949ff65b_top.jpeg",{},"\u002Fblog\u002Fnew-apache-pulsar-2-9-3",{"title":57542,"description":57800},"blog\u002Fnew-apache-pulsar-2-9-3",[302,821,9144],"Q5G8S_FIeSC3HQflz2uin52WOAgzTnmO8GAvSRzZ8nM",{"id":57809,"title":57810,"authors":57811,"body":57812,"category":7338,"createdAt":290,"date":57980,"description":57981,"extension":8,"featured":294,"image":57982,"isDraft":294,"link":290,"meta":57983,"navigation":7,"order":296,"path":57984,"readingTime":33204,"relatedResources":290,"seo":57985,"stem":57986,"tags":57987,"__hash__":57988},"blogs\u002Fblog\u002Fapache-pulsar-sessions-apachecon-asia-2022.md","Apache Pulsar Sessions in ApacheCon Asia 2022: Join Us Now and Check the Schedule",[41185],{"type":15,"value":57813,"toc":57967},[57814,57829,57832,57834,57837,57841,57844,57847,57851,57854,57857,57861,57864,57867,57871,57874,57877,57881,57884,57887,57891,57894,57897,57901,57904,57907,57910,57924,57926,57933,57935,57941],[48,57815,57816,57817,57822,57823,57828],{},"We are excited to invite you to ",[55,57818,57821],{"href":57819,"rel":57820},"https:\u002F\u002Fapachecon.com\u002Facasia2022\u002Findex.html",[264],"ApacheCon Asia 2022"," to explore the newest tools and tips, and connect with subject-matter experts in various Apache Pulsar-related sessions. The Apache Software Foundation will be holding ApacheCon Asia 2022 online between July 29th and July 31st, 2022. ",[55,57824,57827],{"href":57825,"rel":57826},"https:\u002F\u002Fapachecon.com\u002Facasia2022\u002Fregister.html",[264],"Register now for free"," to join us for this inspiring three-day event of cutting-edge technologies.",[48,57830,57831],{},"The conference gathers adopters, developers, engineers, and technologists from some of the most influential open source communities in the world. To date, there has been a total of over 200 proposals submitted by presenters from Intel, Huawei, Tencent Cloud, StreamNative, Sina Weibo, vivo, and many more. Nearly 50 of these sessions are related to streaming and messaging, 30 of which are focused on Apache Pulsar-based technologies.",[40,57833,52563],{"id":36414},[48,57835,57836],{},"Let’s have a quick look at some of the featured sessions in messaging and streaming, ranging from technical deep dives, best practices, to tutorials, and insights.",[32,57838,57840],{"id":57839},"flipn-awesome-streaming-with-open-source-english","FLiPN Awesome Streaming with Open Source (English)",[48,57842,57843],{},"Timothy Spann, Developer Advocate, StreamNative",[48,57845,57846],{},"In this talk, Tim will walk through how to build different types of streaming applications by using Apache NiFi, Apache Flink, Apache Spark, and Apache Pulsar together. The session will demonstrate how to ingest various data and REST feeds to enrich data and send them to Apache Pulsar. 
Applications will be built on top of the live streaming data with Web socket dashboards, Apache Spark SQL ETL, and Apache Flink continuous SQL.",[32,57848,57850],{"id":57849},"introducing-tableview-pulsars-database-table-abstraction-english","Introducing TableView: Pulsar's Database Table Abstraction (English)",[48,57852,57853],{},"David Kjerrumgaard, Apache Pulsar Committer, Developer Advocate, StreamNative",[48,57855,57856],{},"In many use cases, applications use Pulsar consumers or readers to fetch all the updates from a topic and construct a map with the latest value of each key for the messages that were received. The new TableView consumer offers support for this access pattern directly in the Pulsar client API itself and encapsulates the complexities of manually constructing such local caches. This talk will demonstrate how to use the new TableView consumer in a simple application and discuss best practices and patterns for using it.",[32,57858,57860],{"id":57859},"route-to-the-next-generation-message-middleware-how-vivo-migrated-to-pulsar-mandarin","Route to the Next-Generation Message Middleware: How vivo Migrated to Pulsar (Mandarin)",[48,57862,57863],{},"Limin Quan, Jianbo Chen, Big Data Engineer, vivo",[48,57865,57866],{},"vivo has used Kafka to support its business with over 1000 billion messages per day. Now, it has migrated to Apache Pulsar as its next-generation message middleware to handle an even larger amount of data. In this talk, Quan and Chen will share the reasons behind vivo’s choice of Apache Pulsar and how vivo has worked to put Pulsar into practice (for example, migration plans and troubleshooting tips).",[32,57868,57870],{"id":57869},"practice-and-optimization-apache-pulsar-in-tencent-cloud-mandarin","Practice and Optimization: Apache Pulsar in Tencent Cloud (Mandarin)",[48,57872,57873],{},"Xiaolong Ran, Apache Pulsar Committer, Senior R&D Engineer, Tencent Cloud",[48,57875,57876],{},"As Apache Pulsar has been put into production at scale in Tencent Cloud, it is widely used in different scenarios supporting companies and organizations across industries. To bring the experience to another level, Tencent has worked out a series of strategies in terms of optimization and stability. In this talk, Ran will focus on Tencent Cloud's work on performance optimization and shed light upon some of the best practices and troubleshooting tips for using Apache Pulsar.",[32,57878,57880],{"id":57879},"apache-pulsar-as-lakehouse-introducing-the-lakehouse-tiered-storage-integration-for-apache-pulsar-mandarin","Apache Pulsar as Lakehouse: Introducing the Lakehouse Tiered Storage Integration for Apache Pulsar (Mandarin)",[48,57882,57883],{},"Hang Chen, Apache Pulsar PMC member, Software Engineer, StreamNative",[48,57885,57886],{},"Currently, tiered storage is introduced to offload cold data, but the offloaded data is managed by Apache Pulsar in a non-open format. Therefore, it is very difficult to integrate the data into other big data components, such as Flink SQL and Spark SQL. 
In this talk, Chen will explain how to use Lakehouse to manage offloaded data and integrate it with the cold data offloading mechanism.",[32,57888,57890],{"id":57889},"build-high-performance-apache-pulsar-with-intel-optane-persistent-memory-mandarin","Build High-Performance Apache Pulsar with Intel Optane Persistent Memory (Mandarin)",[48,57892,57893],{},"Fenghua Hu, Cloud Software Architect, Intel",[48,57895,57896],{},"Intel Optane persistent memory (PMem) is a revolutionary memory product, which features high performance, large capacity, storage persistence, and more. In this talk, Hu will demonstrate how to use Intel Optane PMem to bring Apache Pulsar’s ability of high throughput and low latency to another level and effectively cope with performance-demanding scenarios.",[32,57898,57900],{"id":57899},"the-evolution-of-apache-pulsar-as-a-message-queue-in-huawei-device-mandarin","The Evolution of Apache Pulsar as A Message Queue in Huawei Device (Mandarin)",[48,57902,57903],{},"Lin Lin, Apache Pulsar PMC member, SDE Expert, Huawei Device",[48,57905,57906],{},"Xiaotong Wang, Senior Engineer, Huawei Device",[48,57908,57909],{},"In the cloud-native era, Huawei Device is faced with many challenges in message queue infrastructure, such as maintenance difficulties in different message queue solutions, high overheads, and disaster tolerance ability building. This talk will cover Huawei Device’s experience in redesigning its message queue architecture and present some solutions to these problems.",[48,57911,57912,57913,4003,57918,57923],{},"To learn more about how companies and organizations today leverage Apache Pulsar for streaming and messaging, serverless computing, and mission-critical deployments in production, see other Apache Pulsar-related sessions in ",[55,57914,57917],{"href":57915,"rel":57916},"https:\u002F\u002Fapachecon.com\u002Facasia2022\u002Ftracks\u002Fstreaming.html",[264],"streaming",[55,57919,57922],{"href":57920,"rel":57921},"https:\u002F\u002Fapachecon.com\u002Facasia2022\u002Ftracks\u002Fmessaging.html",[264],"messaging"," tracks respectively.",[40,57925,52654],{"id":52653},[48,57927,57928,57932],{},[55,57929,57931],{"href":57825,"rel":57930},[264],"Register"," now for free.",[40,57934,40413],{"id":36476},[48,57936,57937,57938,38385],{},"As we can see from topics submitted to ApacheCon Asia 2022, Apache Pulsar has become ",[55,57939,38384],{"href":38382,"rel":57940},[264],[1666,57942,57943,57948,57952,57959],{},[324,57944,38390,57945,190],{},[55,57946,31914],{"href":31912,"rel":57947},[264],[324,57949,45476,57950,45480],{},[55,57951,3550],{"href":45479},[324,57953,57954,57955,57958],{},"Save your spot at the Pulsar Summit San Francisco. The first in-person Pulsar Summit is taking place this August! 
",[55,57956,25339],{"href":35357,"rel":57957},[264]," to join the Pulsar community and the messaging and event streaming community.",[324,57960,52705,57961,36501,57964,52712],{},[55,57962,36500],{"href":36498,"rel":57963},[264],[55,57965,36505],{"href":57760,"rel":57966},[264],{"title":18,"searchDepth":19,"depth":19,"links":57968},[57969,57978,57979],{"id":36414,"depth":19,"text":52563,"children":57970},[57971,57972,57973,57974,57975,57976,57977],{"id":57839,"depth":279,"text":57840},{"id":57849,"depth":279,"text":57850},{"id":57859,"depth":279,"text":57860},{"id":57869,"depth":279,"text":57870},{"id":57879,"depth":279,"text":57880},{"id":57889,"depth":279,"text":57890},{"id":57899,"depth":279,"text":57900},{"id":52653,"depth":19,"text":52654},{"id":36476,"depth":19,"text":40413},"2022-07-22","Take a quick look at some of the featured sessions related to Apache Pulsar in ApacheCon Asia 2022 and register now for free.","\u002Fimgs\u002Fblogs\u002F63c7f9c40cb4c46286b7b641_63b403e13169772ee573a85a_apachecon-asia-2022-top.jpeg",{},"\u002Fblog\u002Fapache-pulsar-sessions-apachecon-asia-2022",{"title":57810,"description":57981},"blog\u002Fapache-pulsar-sessions-apachecon-asia-2022",[5376,821],"9J085--3QofPXk35uFOjoTvOv9Jz7mq7rxlalvTS9eY",{"id":57990,"title":57991,"authors":57992,"body":57993,"category":7338,"createdAt":290,"date":58118,"description":58119,"extension":8,"featured":294,"image":58120,"isDraft":294,"link":290,"meta":58121,"navigation":7,"order":296,"path":58122,"readingTime":4475,"relatedResources":290,"seo":58123,"stem":58124,"tags":58125,"__hash__":58126},"blogs\u002Fblog\u002Fspeakers-sponsorship-prospectus-announced-pulsar-summit-san-francisco-2022.md","Speakers and Sponsorship Prospectus Announced for Pulsar Summit San Francisco 2022",[44843,40485],{"type":15,"value":57994,"toc":58107},[57995,58003,58006,58009,58011,58015,58018,58021,58025,58028,58031,58035,58038,58041,58045,58048,58051,58055,58057,58060,58062,58081,58088,58090],[48,57996,57997,57998,190],{},"We're excited to invite you to the Pulsar Summit San Francisco 2022! Join the Apache Pulsar community in-person at Hotel Nikko on August 18th for this action-packed, one-day event. Don’t miss the opportunity to join us for four keynotes and 12 breakout sessions and to network with fellow attendees at the Happy Hour event. ",[55,57999,58002],{"href":58000,"rel":58001},"https:\u002F\u002Fwww.eventbrite.com\u002Fe\u002Fpulsar-summit-san-francisco-2022-tickets-332014162297",[264],"Save your spot",[48,58004,58005],{},"The Pulsar Summit gathers developers, architects, and data engineers to discuss the latest in real-time data streaming and message queuing. Past Pulsar Summits have featured more than 200 interactive sessions presented by tech leaders from Intuit, Micro Focus, Salesforce, Splunk, Verizon Media, Tencent, and more. The Summits garnered 2,000+ global attendees representing top technology, fintech, and media companies, such as Google, Amazon, eBay, Microsoft, American Express, LEGO, Athena Health, Paypal, and many more.",[48,58007,58008],{},"This year, Pulsar Summit San Francisco will include tech deep dives, adoption stories, best practices, and insights into Pulsar’s global adoption and thriving community. Take a sneak peek below at a few of the featured sessions:",[40,58010,36415],{"id":36414},[32,58012,58014],{"id":58013},"_1-message-redelivery-an-unexpected-journey","1. 
Message Redelivery: An Unexpected Journey",[48,58016,58017],{},"David Kjerrumgaard, Developer Advocate, StreamNative",[48,58019,58020],{},"Understanding Pulsar’s redelivery semantics is critical for preventing duplicate or out-of-order processing when message acknowledgments are not received. This talk walks you through the redelivery semantics, highlights some of the mechanisms available to application developers to control this behavior, and provides best practices for configuring message redelivery to suit various use cases.",[32,58022,58024],{"id":58023},"_2-is-using-kop-kafka-on-pulsar-a-good-idea","2. Is Using KoP (Kafka-On-Pulsar) a Good Idea?",[48,58026,58027],{},"Ricardo Ferreira, Senior Developer Advocate, AWS",[48,58029,58030],{},"Learn how to unlock infinite event stream retention, a rebalance-free architecture, native support for event processing, and the multi-tenancy with Apache Pulsar for your microservices written for Apache Kafka. This talk dives into the architecture of protocol handlers and how KoP (Kafka-on-Pulsar) works.",[32,58032,58034],{"id":58033},"_3-cross-the-streams-creating-streaming-data-pipelines-with-apache-flink-apache-pulsar","3. Cross the Streams! Creating Streaming Data Pipelines with Apache Flink + Apache Pulsar",[48,58036,58037],{},"Caito Scherr, Developer Advocate, Ververica",[48,58039,58040],{},"Learn how to build a unified batch and streaming pipeline from scratch with Apache Pulsar and Flink SQL in this step-by-step demo. Leverage the power of Apache Flink, a high-speed, customizable stream processing engine, without the steep learning curve.",[32,58042,58044],{"id":58043},"_4-building-reliable-lakehouses-with-apache-pulsar-and-delta-lake","4. Building Reliable Lakehouses with Apache Pulsar and Delta Lake",[48,58046,58047],{},"Denny Lee, Sr. Staff Developer Advocate and Delta Lake Committer, Databricks",[48,58049,58050],{},"Explore the key features of Delta Lake that enable the Lakehouse architecture. Learn about the future ecosystem around Delta Lake, including supporting multiple languages and data processing systems.",[32,58052,58054],{"id":58053},"_5-towards-a-zookeeper-less-pulsar-etcd-etcd-etcd","5. Towards a ZooKeeper-less Pulsar, etcd, etcd, etcd.",[48,58056,44044],{},[48,58058,58059],{},"Apache Pulsar 2.10 eliminates Pulsar’s dependency on Apache ZooKeeper and introduces a pluggable framework to enable leveraging alternative metadata and coordination systems. Learn how to utilize the existing etcd service running inside Kubernetes to act as Pulsar's metadata store, achieving a Zookeeper-less Pulsar.",[40,58061,36452],{"id":36451},[1666,58063,58064,58069],{},[324,58065,58066],{},[55,58067,36460],{"href":58000,"rel":58068},[264],[324,58070,58071,58072,58076,58077,190],{},"Become a sponsor! Learn how your company can stand out as a thought leader in the fast-growing Apache Pulsar community by becoming a Summit Sponsor. 
",[55,58073,3533],{"href":58074,"rel":58075},"https:\u002F\u002F6585952.fs1.hubspotusercontent-na1.net\u002Fhubfs\u002F6585952\u002FPulsar%20Summit%20San%20Francisco%202022%20Sponsorship%20Prospectus%20v.2.pdf",[264]," about the limited Sponsorship opportunities available, and secure your sponsorship by emailing: ",[55,58078,58080],{"href":58079},"mailto:sponsors@pulsar-summit.org","sponsors@pulsar-summit.org",[916,58082,58083],{},[48,58084,58085],{},[55,58086,36473],{"href":58000,"rel":58087},[264],[40,58089,36477],{"id":36476},[1666,58091,58092,58099],{},[324,58093,36482,58094,1154,58097,36492],{},[55,58095,36487],{"href":36485,"rel":58096},[264],[55,58098,36491],{"href":36490},[324,58100,36495,58101,36501,58104,36506],{},[55,58102,36500],{"href":36498,"rel":58103},[264],[55,58105,36505],{"href":57760,"rel":58106},[264],{"title":18,"searchDepth":19,"depth":19,"links":58108},[58109,58116,58117],{"id":36414,"depth":19,"text":36415,"children":58110},[58111,58112,58113,58114,58115],{"id":58013,"depth":279,"text":58014},{"id":58023,"depth":279,"text":58024},{"id":58033,"depth":279,"text":58034},{"id":58043,"depth":279,"text":58044},{"id":58053,"depth":279,"text":58054},{"id":36451,"depth":19,"text":36452},{"id":36476,"depth":19,"text":36477},"2022-07-06","Take a sneak peek at a few of the featured sessions and learn about the opportunity to sponsor Pulsar Summit.","\u002Fimgs\u002Fblogs\u002F63c7fa1791fda30fe9aa2840_63b3fbcf64455022ecf79c3f_banner-02-social-2-.png",{},"\u002Fblog\u002Fspeakers-sponsorship-prospectus-announced-pulsar-summit-san-francisco-2022",{"title":57991,"description":58119},"blog\u002Fspeakers-sponsorship-prospectus-announced-pulsar-summit-san-francisco-2022",[5376,821],"Mef4vPRdgPdKBDhuA9IdCoBlwc7YCvMetb9FxIA2HV4",{"id":58128,"title":58129,"authors":58130,"body":58131,"category":821,"createdAt":290,"date":58378,"description":58135,"extension":8,"featured":294,"image":58379,"isDraft":294,"link":290,"meta":58380,"navigation":7,"order":296,"path":58381,"readingTime":3556,"relatedResources":290,"seo":58382,"stem":58383,"tags":58384,"__hash__":58385},"blogs\u002Fblog\u002Fgoogle-cloud-pub-sub-connector-apache-pulsar.md","Announcing the Google Cloud Pub\u002FSub Connector for Apache Pulsar",[42155],{"type":15,"value":58132,"toc":58364},[58133,58136,58140,58143,58145,58153,58159,58162,58165,58173,58179,58182,58186,58189,58192,58195,58199,58202,58213,58217,58220,58230,58234,58237,58268,58273,58276,58281,58287,58290,58296,58298,58303,58309,58311,58317,58320,58326,58328,58331,58362],[48,58134,58135],{},"We are excited to announce the general availability of the Google Cloud Pub\u002FSub connector for Apache Pulsar. The connector enables seamless integration between Google Cloud Pub\u002FSub and Apache Pulsar, improving the diversity of the Apache Pulsar ecosystem.",[40,58137,58139],{"id":58138},"what-is-the-google-cloud-pubsub-connector","What is the Google Cloud Pub\u002FSub connector?",[48,58141,58142],{},"The Google Cloud Pub\u002FSub connector is a Pulsar IO connector enabling data replication between Google Cloud Pub\u002FSub and Apache Pulsar. 
The connector provides two ways to import and export data between the systems: Source and Sink.",[32,58144,27049],{"id":27048},[48,58146,3600,58147,58152],{},[55,58148,58151],{"href":58149,"rel":58150},"https:\u002F\u002Fhub.streamnative.io\u002Fconnectors\u002Fgoogle-pubsub-source\u002Fv2.9.1.2\u002F",[264],"Google Cloud Pub\u002FSub source"," fetches data from Google Cloud Pub\u002FSub and writes data to Apache Pulsar topics.",[48,58154,58155],{},[384,58156],{"alt":58157,"src":58158},"google cloud logo pulsar logo","\u002Fimgs\u002Fblogs\u002F63b3f91f47f6007b64214cd8_source-google-pulsar.png",[48,58160,58161],{},"Figure 1. Google Cloud Pub\u002FSub source",[32,58163,35269],{"id":58164},"sink",[48,58166,3600,58167,58172],{},[55,58168,58171],{"href":58169,"rel":58170},"https:\u002F\u002Fhub.streamnative.io\u002Fconnectors\u002Fgoogle-pubsub-sink\u002Fv2.9.1.2\u002F",[264],"Google Cloud Pub\u002FSub sink"," pulls data from Apache Pulsar topics and persists data to Google Cloud Pub\u002FSub.",[48,58174,58175],{},[384,58176],{"alt":58177,"src":58178},"pulsar logo and google cloud pub\u002Fsub","\u002Fimgs\u002Fblogs\u002F63b3f91f20941eb8386e3396_sink-google-pulsar.png",[48,58180,58181],{},"Figure 2. Google Cloud Pub\u002FSub sink",[40,58183,58185],{"id":58184},"why-did-streamnative-develop-the-google-cloud-pubsub-connector","Why did StreamNative develop the Google Cloud Pub\u002FSub connector?",[48,58187,58188],{},"Apache Pulsar and Google Cloud Pub\u002FSub are two of the most popular and widely used messaging platforms in modern cloud environments. Apache Pulsar’s unified platform enables queueing data, analytics, and streaming in one underlying system. Google Cloud Pub\u002FSub is known for efficient performance, a powerful ecosystem in streaming analytics, and the capability of in-order delivery at scale.",[48,58190,58191],{},"Historically, however, users did not have a simple and reliable way of performing fully-featured messaging and streaming in one cloud pub\u002Fsub system, so they compensated for this by investing significant development efforts to bridge the gaps.",[48,58193,58194],{},"The new StreamNative connector provides Google Cloud Pub\u002FSub users a way to connect the flow of messages to Pulsar and use the features unavailable elsewhere, while also avoiding the connectivity problems that can appear when there are intrinsic differences between systems or privacy requirements. The connector solves this problem by fully integrating with the rest of Pulsar’s system (including serverless functions, per-message processing, and event-stream processing). It presents a low-code solution with out-of-the-box capabilities like multi-tenant connectivity, geo-replication, protocols for direct connection to end-user mobile clients or IoT clients, and more. These features are essential for two-way event traffic.",[40,58196,58198],{"id":58197},"what-are-the-benefits-of-using-the-google-cloud-pubsub-connector","What are the benefits of using the Google Cloud Pub\u002FSub connector?",[48,58200,58201],{},"The integration between Google Cloud Pub\u002FSub and Apache Pulsar results in 3 key benefits.",[321,58203,58204,58207,58210],{},[324,58205,58206],{},"Easy. You can quickly move data between Apache Pulsar and Google Cloud Pub\u002FSub without writing any code.",[324,58208,58209],{},"Efficient. You can reduce the time spent on the data layer and have more time to find the maximum business value from real-time data in an effective way.",[324,58211,58212],{},"Scalable. 
You can run this connector on any node (standalone or distributed), allowing you to build reactive data pipelines to meet your business and operational needs in real-time.",[40,58214,58216],{"id":58215},"how-do-i-start-using-the-google-cloud-pubsub-connector","How do I start using the Google Cloud Pub\u002FSub connector?",[48,58218,58219],{},"You can be up and running with the connector in 3 easy steps:",[321,58221,58222,58225,58228],{},[324,58223,58224],{},"Configure the services and download the connector",[324,58226,58227],{},"Configure the source connector",[324,58229,39605],{},[32,58231,58233],{"id":58232},"before-you-start","Before you start",[48,58235,58236],{},"First, you must run an Apache Pulsar cluster and a Google Cloud Pub\u002FSub service.",[1666,58238,58239,58246,58261],{},[324,58240,51108,58241,51114,58244,56963],{},[55,58242,51113],{"href":39571,"rel":58243},[264],[55,58245,3550],{"href":45479},[324,58247,58248,58249,58254,58255,58260],{},"Prepare the Google Cloud Pub\u002FSub service. See ",[55,58250,58253],{"href":58251,"rel":58252},"https:\u002F\u002Fconsole.cloud.google.com\u002Fcloudpubsub?tutorial=pubsub_quickstart",[264],"Getting Started with Google Cloud Pub\u002FSub"," for details. Note that you need to install ",[55,58256,58259],{"href":58257,"rel":58258},"https:\u002F\u002Fcloud.google.com\u002Fsdk\u002Fgcloud",[264],"gcloud CLI",", and set up the GOOGLE_APPLICATION_CREDENTIALS environment variable to access Google Cloud.",[324,58262,58263,58264,51124],{},"Set up the Google Cloud Pub\u002FSub connector. Download the connector from the ",[55,58265,39589],{"href":58266,"rel":58267},"https:\u002F\u002Fgithub.com\u002Fstreamnative\u002Fpulsar-io-google-pubsub\u002Freleases\u002F",[264],[48,58269,39596,58270,48708],{},[55,58271,20384],{"href":39599,"rel":58272},[264],[32,58274,58227],{"id":58275},"configure-the-source-connector",[1666,58277,58278],{},[324,58279,58280],{},"Create a configuration file named google-pubsub-source-config.json to send the pulsar-io-google-pubsub\u002Ftest-google-pubsub-source topic messages from Google Cloud Pub\u002FSub to the public\u002Fdefault\u002Ftest-google-pubsub-source topic of Apache Pulsar:",[8325,58282,58285],{"className":58283,"code":58284,"language":8330},[8328],"\n{\n    \"tenant\": \"public\",\n    \"namespace\": \"default\",\n    \"name\": \"google-pubsub-source\",\n    \"topicName\": \"test-google-pubsub-source\",\n    \"archive\": \"connectors\u002Fpulsar-io-google-pubsub-$VERSION.nar\",\n    \"parallelism\": 1,\n    \"configs\":\n    {\n    \"pubsubProjectId\": \"pulsar-io-google-pubsub\",\n    \"pubsubTopicId\": \"test-google-pubsub-source\"\n    }\n}\n\n",[4926,58286,58284],{"__ignoreMap":18},[48,58288,58289],{},"Run the source connector:",[8325,58291,58294],{"className":58292,"code":58293,"language":8330},[8328],"\n$PULSAR_HOME\u002Fbin\u002Fpulsar-admin sources localrun --source-config-file \u002Fpath\u002Fto\u002Fgoogle-pubsub-source-config.json\n\n",[4926,58295,58293],{"__ignoreMap":18},[32,58297,39605],{"id":39604},[1666,58299,58300],{},[324,58301,58302],{},"Create a configuration file named google-pubsub-sink-config.json to send the public\u002Fdefault\u002Ftest-google-pubsub-sink topic messages from Apache Pulsar to the pulsar-io-google-pubsub\u002Ftest-google-pubsub-sink topic of Google Cloud Pub\u002FSub:",[8325,58304,58307],{"className":58305,"code":58306,"language":8330},[8328],"\n{\n    \"tenant\": \"public\",\n    \"namespace\": \"default\",\n    \"name\": \"google-pubsub-sink\",\n    \"inputs\": [\n    
\"test-google-pubsub-sink\"\n    ],\n    \"archive\": \"connectors\u002Fpulsar-io-google-pubsub-$VERSION.nar\",\n    \"parallelism\": 1,\n    \"configs\": {\n    \"pubsubProjectId\": \"pulsar-io-google-pubsub\",\n    \"pubsubTopicId\": \"test-google-pubsub-sink\"\n}\n}\n\n",[4926,58308,58306],{"__ignoreMap":18},[48,58310,51147],{},[8325,58312,58315],{"className":58313,"code":58314,"language":8330},[8328],"\n$PULSAR_HOME\u002Fbin\u002Fpulsar-admin sinks localrun --sink-config-file \u002Fpath\u002Fto\u002Fgoogle-pubsub-sink-config.json\n\n",[4926,58316,58314],{"__ignoreMap":18},[48,58318,58319],{},"When you send a message to the public\u002Fdefault\u002Ftest-google-pubsub-sink topic of Apache Pulsar, this message is persisted to the pulsar-io-google-pubsub\u002Ftest-google-pubsub-sink topic of Google Cloud Pub\u002FSub.",[48,58321,39639,58322,53165],{},[55,58323,53164],{"href":58324,"rel":58325},"https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=Lc0_9WPGhow",[264],[40,58327,48857],{"id":48856},[48,58329,58330],{},"The Google Cloud Pub\u002FSub connector is a major step in the journey of integrating other messaging systems into the Pulsar ecosystem. To get involved with the Google Cloud Pub\u002FSub connector for Apache Pulsar, check out the following featured resources:",[321,58332,58333,58344,58356],{},[324,58334,58335,58336,39659,58340,48874],{},"Try out the Google Cloud Pub\u002FSub connector. To get started, ",[55,58337,36195],{"href":58338,"rel":58339},"https:\u002F\u002Fgithub.com\u002Fstreamnative\u002Fpulsar-io-google-pubsub\u002Freleases",[264],[55,58341,39663],{"href":58342,"rel":58343},"https:\u002F\u002Fgithub.com\u002Fstreamnative\u002Fpulsar-io-google-pubsub\u002Fblob\u002Fmaster\u002FREADME.md",[264],[324,58345,39676,58346,57040,58350,39687,58353,39692],{},[55,58347,39680],{"href":58348,"rel":58349},"https:\u002F\u002Fgithub.com\u002Fstreamnative\u002Fpulsar-io-google-pubsub\u002Fissues\u002Fnew\u002Fchoose",[264],[55,58351,39686],{"href":39684,"rel":58352},[264],[55,58354,39691],{"href":33664,"rel":58355},[264],[324,58357,58358,58359,39673],{},"Make a contribution. The Google Cloud Pub\u002FSub connector is a community-driven service, which hosts its source code on the StreamNative GitHub repository. We would love you to explore this new connector and contribute to its evolution. 
If you have any feature requests or bug reports, do not hesitate to ",[55,58360,39672],{"href":58348,"rel":58361},[264],[48,58363,57049],{},{"title":18,"searchDepth":19,"depth":19,"links":58365},[58366,58370,58371,58372,58377],{"id":58138,"depth":19,"text":58139,"children":58367},[58368,58369],{"id":27048,"depth":279,"text":27049},{"id":58164,"depth":279,"text":35269},{"id":58184,"depth":19,"text":58185},{"id":58197,"depth":19,"text":58198},{"id":58215,"depth":19,"text":58216,"children":58373},[58374,58375,58376],{"id":58232,"depth":279,"text":58233},{"id":58275,"depth":279,"text":58227},{"id":39604,"depth":279,"text":39605},{"id":48856,"depth":19,"text":48857},"2022-06-24","\u002Fimgs\u002Fblogs\u002F63c7fa2ff8ae5a79f2d13a72_63b3f91fc0a3814de23f44ff_topimage-google-pulsar.png",{},"\u002Fblog\u002Fgoogle-cloud-pub-sub-connector-apache-pulsar",{"title":58129,"description":58135},"blog\u002Fgoogle-cloud-pub-sub-connector-apache-pulsar",[28572,302],"mm9rbgXKc2LzlSsH94AjIOYqhiVI6p2AAPsVFFX7Mgg",{"id":58387,"title":58388,"authors":58389,"body":58390,"category":3550,"createdAt":290,"date":58508,"description":58509,"extension":8,"featured":294,"image":290,"isDraft":294,"link":290,"meta":58510,"navigation":7,"order":296,"path":58511,"readingTime":58512,"relatedResources":290,"seo":58513,"stem":58514,"tags":58515,"__hash__":58516},"blogs\u002Fblog\u002Fintroducing-streamnative-platform-1-5.md","Introducing StreamNative Platform 1.5",[24776],{"type":15,"value":58391,"toc":58503},[58392,58398,58401,58404,58415,58419,58428,58437,58446,58450,58458,58464,58468,58471,58477,58480,58485,58488,58501],[48,58393,58394],{},[384,58395],{"alt":58396,"src":58397},"streamnative platform","\u002Fimgs\u002Fblogs\u002F63b3f80d770e35f053a14609_streamnative-console.png",[48,58399,58400],{},"We are pleased to announce the release of StreamNative Platform 1.5. StreamNative Platform provides an easy way to build mission-critical messaging, streaming applications, and real-time data pipelines. It integrates data from multiple sources into a centralized messaging and event streaming platform. With release 1.5, we have further simplified management tasks for Pulsar traffic with the integration with Istio, providing a visualized way for you to create and manage connectors on the StreamNative Console.",[48,58402,58403],{},"This release features the following major enhancements:",[321,58405,58406,58409,58412],{},[324,58407,58408],{},"Provides a deeper integration with Istio",[324,58410,58411],{},"Supports deployment on OpenShift",[324,58413,58414],{},"Simplifies the use of Function Mesh and Connectors",[40,58416,58418],{"id":58417},"deeper-integration-with-istio","Deeper integration with Istio",[48,58420,58421,58422,58427],{},"StreamNative Platform began supporting integration with Istio in release ",[55,58423,58426],{"href":58424,"rel":58425},"https:\u002F\u002Fgithub.com\u002Fstreamnative\u002Fcharts\u002Freleases\u002Ftag\u002Fsn-platform-1.3.0",[264],"1.3",". It creates VirtualService and Gateway resources when Istio-related configurations are enabled on StreamNative Platform. In release 1.5, we provide more improvements to simplify the traffic proxy for Pulsar Protocol, Kafka Protocol, and MQTT Protocol.",[48,58429,58430,58431,58436],{},"Integrate cert-manager with Istio Ingress Gateway: On StreamNative Platform 1.3, we had to manually create the TLS secret in the Istio root namespace. 
With release 1.5, we have added support for ",[55,58432,58435],{"href":58433,"rel":58434},"https:\u002F\u002Fcert-manager.io\u002F",[264],"cert-manager",", which enables the Istio TLS secret to be automatically created and managed by cert-manager.",[48,58438,58439,58440,58445],{},"Expose MoP on the Istio Gateway: StreamNative Platform supported MoP in release ",[55,58441,58444],{"href":58442,"rel":58443},"https:\u002F\u002Fgithub.com\u002Fstreamnative\u002Fcharts\u002Freleases\u002Ftag\u002Fsn-platform-1.4.0",[264],"1.4",". In release 1.5, we added MoP-related VirtualService and Gateway resources support. Now, StreamNative Platform can expose Pulsar Protocol, Kafka Protocol, and Mqtt Protocol through the Istio Gateway in a unified way.",[40,58447,58449],{"id":58448},"support-deployment-on-openshift","Support deployment on OpenShift",[48,58451,58452,58457],{},[55,58453,58456],{"href":58454,"rel":58455},"https:\u002F\u002Fwww.redhat.com\u002Fen\u002Ftechnologies\u002Fcloud-computing\u002Fopenshift",[264],"OpenShift"," is one of the most popular enterprise-ready Kubernetes container platforms. As such, adding support for OpenShift gives you more options and flexibility when you deploy StreamNative Platform 1.5 on the container platform. To enable OpenShift, set the following configurations in the Helm chart values.yaml file.",[8325,58459,58462],{"className":58460,"code":58461,"language":8330},[8328],"\n# Support deployment on OpenShift\nopenshift:\n  enabled: true\n  ssc:\n    enabled: true\n\nvault:\n  securityContext:\n    runAsUser: 0\n\nzookeeper:\n  securityContext:\n    runAsUser: 0\n\nbookkeeper:\n  securityContext:\n    runAsUser: 0\n\nautorecovery:\n  securityContext:\n    runAsUser: 0\n\nbroker:\n  securityContext:\n    runAsUser: 0\n\nproxy:\n  securityContext:\n    runAsUser: 0\n\ntoolset:\n  securityContext:\n    runAsUser: 0\n\nstreamnative_console:\n  securityContext:\n    runAsUser: 0\n\n",[4926,58463,58461],{"__ignoreMap":18},[40,58465,58467],{"id":58466},"simplify-the-use-of-function-mesh-and-connectors","Simplify the use of Function Mesh and Connectors",[48,58469,58470],{},"StreamNative Platform 1.5 adds support for the Function Mesh Worker service. You can enable Function Mesh by setting functionmesh.enabled to true in the Helm chart values.yaml file.",[8325,58472,58475],{"className":58473,"code":58474,"language":8330},[8328],"\nbroker:\n  functionmesh:\n    enabled: true\n\n",[4926,58476,58474],{"__ignoreMap":18},[48,58478,58479],{},"With release 1.5, you can leverage Function Mesh while still using the pulsar-admin or pulsarctl tools to manage Pulsar functions and connectors.",[48,58481,58482],{},[384,58483],{"alt":58467,"src":58484},"\u002Fimgs\u002Fblogs\u002F63b3f8554106f4fd99e468a8_function-mesh.png",[48,58486,58487],{},"To simplify the use and management of connectors on StreamNative Platform, the Connector page is now available on the StreamNative Console. This new page enhances the user experience by providing a visualized way for you to create and manage connector-related resources. 
For example, now you can create source\u002Fsink connector jobs, update connector job configurations, and review connector job exception logs on the StreamNative Console.",[48,58489,58490,58492,58495,58496,190],{},[384,58491],{"alt":58396,"src":58397},[384,58493],{"alt":18,"src":58494},"\u002Fimgs\u002Fblogs\u002F63b3f8558fb46510cc50217b_streamnative-console2.png","\nFor more information, refer to the ",[55,58497,58500],{"href":58498,"rel":58499},"https:\u002F\u002Fgithub.com\u002Fstreamnative\u002Fcharts\u002Freleases\u002Ftag\u002Fsn-platform-1.5.0",[264],"StreamNative Platform 1.5 Release Notes",[48,58502,3931],{},{"title":18,"searchDepth":19,"depth":19,"links":58504},[58505,58506,58507],{"id":58417,"depth":19,"text":58418},{"id":58448,"depth":19,"text":58449},{"id":58466,"depth":19,"text":58467},"2022-06-15","With StreamNative Platform 1.5, we have further simplified management tasks for Pulsar traffic with the integration with Istio, providing a visualized way for you to create and manage connectors on the StreamNative Console.",{},"\u002Fblog\u002Fintroducing-streamnative-platform-1-5","6 min read",{"title":58388,"description":58509},"blog\u002Fintroducing-streamnative-platform-1-5",[302,821,303],"_4kp99sV5niKig3KhOpM4YvqspYsA-kjdysLNpKTpEk",{"id":58518,"title":58519,"authors":58520,"body":58521,"category":7338,"createdAt":290,"date":58653,"description":58654,"extension":8,"featured":294,"image":58655,"isDraft":294,"link":290,"meta":58656,"navigation":7,"order":296,"path":58657,"readingTime":4475,"relatedResources":290,"seo":58658,"stem":58659,"tags":58660,"__hash__":58661},"blogs\u002Fblog\u002Fjoin-streamnative-distributed-data-systems-masterclass.md","Join StreamNative at Distributed Data Systems Masterclass",[46357],{"type":15,"value":58522,"toc":58645},[58523,58525,58528,58531,58534,58538,58541,58544,58552,58555,58559,58565,58571,58574,58576,58595,58599,58643],[40,58524,46],{"id":42},[48,58526,58527],{},"Do you need to quickly learn how to adapt to meet ever-expanding operations across a global reach? Do you need to figure out how to build a distributed data system in a fast, modern, open way? You're not alone.",[48,58529,58530],{},"The pressure to create and implement real-time event streaming using distributed datastores is on the rise. And, adapting and creating the architecture, techniques, and technologies to do that require education and investment. That's why we're excited to present this free Distributed Data Systems Masterclass.",[48,58532,58533],{},"Join ScyllaDB and StreamNative on Tuesday, June 21st, for a half-day event on how to build modern distributed data systems using state-of-the-art event streaming and distributed databases. In this Masterclass, our panel of experts will go in-depth on how to make the impossible possible, if not easy.",[40,58535,58537],{"id":58536},"why-take-the-masterclass","Why Take the Masterclass",[48,58539,58540],{},"Now is the time to ensure you are ready to build a system that will scale and grow with your needs, going from a trickle of data to a flood of continuous data events. The days of waiting once an hour for data to arrive are long over. 
You need the latest fraud detection events, logs, change data capture events from tables, real-time sensors, REST feeds, cloud data events, and so much more.",[48,58542,58543],{},"In this Masterclass, we will show you an optimal way to combine a powerful distributed data store and a unified streaming data platform.",[32,58545,58547],{"id":58546},"register-today",[55,58548,58551],{"href":58549,"rel":58550},"https:\u002F\u002Fhopin.com\u002Fevents\u002Fdistributed-data-systems-masterclass\u002Fregistration?utm_campaign=social&utm_source=streamnative",[264],"Register today!",[48,58553,58554],{},"This is a rare opportunity to hear from developers from AWS, ScyllaDB, and StreamNative.",[3933,58556,58558],{"id":58557},"meet-the-speakers","Meet The Speakers",[48,58560,58561],{},[384,58562],{"alt":58563,"src":58564},"Figure One: AirQuality microservices architecture","\u002Fimgs\u002Fblogs\u002F63b3f7593522b253099a5d7b_scy5.png",[48,58566,58567],{},[384,58568],{"alt":58569,"src":58570},"Figure Two: ScyllaDB metrics","\u002Fimgs\u002Fblogs\u002F63b3f759c0a38182373e26c9_scy7.png",[48,58572,58573],{},"Don’t miss this opportunity to learn how to build and manage enterprise-scale distributed data systems with the latest #eventstreaming and distributed #database technologies. Save your spot to attend, win swag, and have the opportunity to earn a certificate of completion!",[40,58575,4135],{"id":4132},[321,58577,58578,58587],{},[324,58579,58580,758,58583,58586],{},[2628,58581,58582],{},"Registration",[55,58584,58002],{"href":58549,"rel":58585},[264]," for the Distributed Data System Masterclass!",[324,58588,58589,758,58591],{},[2628,58590,46603],{},[55,58592,58594],{"href":46568,"rel":58593},[264],"AirQuality DataStore",[40,58596,58598],{"id":58597},"more-on-pulsar","More on Pulsar",[321,58600,58601,58610,58617,58620,58628,58637],{},[324,58602,58603,58604,1154,58607,58609],{},"Learn the Pulsar Fundamentals: New to Pulsar? We recommend you take the ",[55,58605,36487],{"href":36485,"rel":58606},[264],[55,58608,36491],{"href":36490}," developed by some of the original creators of Pulsar to get started.",[324,58611,51828,58612,58616],{},[55,58613,3550],{"href":58614,"rel":58615},"https:\u002F\u002Fauth.streamnative.cloud\u002Fu\u002Fsignup",[264]," today. 
StreamNative Cloud is the simple, fast, and cost-effective way to run Pulsar in the public cloud.",[324,58618,58619],{},"Continued Learning: If you are interested in learning more about microservices and Pulsar, take a look at the following resources:",[324,58621,58622,46714,58624,190],{},[2628,58623,46713],{},[55,58625,58627],{"href":58626},"\u002Fevent\u002Fwebinar-series-building-microservices-with-pulsar","on the StreamNative website",[324,58629,58630,758,58633],{},[2628,58631,58632],{},"Pulsar Documentation",[55,58634,51850],{"href":58635,"rel":58636},"https:\u002F\u002Fpulsar.apache.org\u002Fdocs\u002Fen\u002Ffunctions-develop\u002F",[264],[324,58638,58639,758,58641],{},[2628,58640,40436],{},[55,58642,51857],{"href":44957},[48,58644,3931],{},{"title":18,"searchDepth":19,"depth":19,"links":58646},[58647,58648,58651,58652],{"id":42,"depth":19,"text":46},{"id":58536,"depth":19,"text":58537,"children":58649},[58650],{"id":58546,"depth":279,"text":58551},{"id":4132,"depth":19,"text":4135},{"id":58597,"depth":19,"text":58598},"2022-06-14","Learn about how to build a distributed data system from developers from AWS, ScyllaDB, and StreamNative.","\u002Fimgs\u002Fblogs\u002F63c7fa4b53f98aa7cca46861_63b3f6cc770e3569a9a0cc9b_1200x675-twitter-distributed-data-systems-masterclass-2.png",{},"\u002Fblog\u002Fjoin-streamnative-distributed-data-systems-masterclass",{"title":58519,"description":58654},"blog\u002Fjoin-streamnative-distributed-data-systems-masterclass",[303],"BZ2kXnkTmr57pc3Rz-BYrN1ctz5hWUaZja1AgFaUKgU",{"id":58663,"title":58664,"authors":58665,"body":58666,"category":821,"createdAt":290,"date":58843,"description":58844,"extension":8,"featured":294,"image":58845,"isDraft":294,"link":290,"meta":58846,"navigation":7,"order":296,"path":58847,"readingTime":11508,"relatedResources":290,"seo":58848,"stem":58849,"tags":58850,"__hash__":58851},"blogs\u002Fblog\u002Fnew-apache-pulsar-2-10-1.md","What’s New in Apache Pulsar 2.10.1",[808,57544],{"type":15,"value":58667,"toc":58828},[58668,58671,58674,58680,58682,58691,58693,58696,58705,58707,58710,58712,58715,58720,58722,58725,58727,58730,58735,58737,58739,58748,58750,58753,58755,58757,58759,58761,58788,58790,58792,58794,58801,58806,58817,58819],[48,58669,58670],{},"The Apache Pulsar community releases version 2.10.1! 50 contributors provided improvements and bug fixes that delivered 200+ commits. Thanks for all your contributions.",[48,58672,58673],{},"The highlight of the 2.10.1 release is introducing 30+ transaction fixes and improvements. Earlier-adoption users of Pulsar transactions have documented long-term use in their production environments and reported valuable findings in real applications. This provides the Pulsar community with the opportunity to make a difference.",[48,58675,57555,58676,190],{},[55,58677,58679],{"href":46406,"rel":58678},[264],"Pulsar 2.10.1 Release Notes",[40,58681,57564],{"id":57563},[32,58683,58685,58686],{"id":58684},"fixed-ineffective-load-manager-due-to-brokers-zero-resource-usage-pr-15314","Fixed ineffective load manager due to broker’s zero resource usage. ",[55,58687,58690],{"href":58688,"rel":58689},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F15314",[264],"PR-15314",[3933,58692,57576],{"id":44661},[48,58694,58695],{},"Introduced in 2.10.0, the leader broker’s resource usage (CPU, memory, direct memory…) was always 0 when performing load balance. 
The root cause is that deserializing the JSON data to ResourceUsage POJO didn’t use the constructor ResourceUsage (double usage, double limit), so the percentage was always 0.",[32,58697,58699,58700],{"id":58698},"allow-users-with-produceconsume-privileges-to-get-topic-schema-pr-15956","Allow users with produce\u002Fconsume privileges to get topic schema. ",[55,58701,58704],{"href":58702,"rel":58703},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F15956",[264],"PR-15956",[3933,58706,57576],{"id":57598},[48,58708,58709],{},"In earlier versions, only users with admin privileges were able to get the topic schema, which made schemas inconvenient to use.",[3933,58711,57583],{"id":57582},[48,58713,58714],{},"Allow users who have metadata access privileges to get the topic schema. Subscribers can be from different teams, and the producers and subscribers should be able to get the topic schema instead of asking the tenant admin to do so before publishing and consuming messages.",[32,58716,57645,58717],{"id":57644},[55,58718,57650],{"href":57648,"rel":58719},[264],[3933,58721,57576],{"id":57632},[48,58723,58724],{},"This performance regression was introduced in 2.10.0, 2.9.1, and 2.8.3. You may find a significant performance drop with message listeners when using the Java client. The root cause is that each message introduces a thread switch from the external thread pool to the internal thread pool, and then back to the external thread pool.",[3933,58726,57583],{"id":57612},[48,58728,58729],{},"2.10.1 is the first version to have this issue fixed by avoiding the thread switching for each message to improve consumption throughput.",[32,58731,57666,58732],{"id":57665},[55,58733,57671],{"href":57669,"rel":58734},[264],[3933,58736,57576],{"id":57653},[48,58738,57677],{},[32,58740,58742,58743],{"id":58741},"fixed-key-shared-delivery-of-messages-with-interleaved-delays-pr-15409","Fixed key-shared delivery of messages with interleaved delays. ",[55,58744,58747],{"href":58745,"rel":58746},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F15409",[264],"PR-15409",[3933,58749,57576],{"id":57674},[48,58751,58752],{},"This is a regression issue introduced in 2.10.0. When delayed messages with interleaved delays occurred on a shared\u002Fkey-shared subscription, many of the messages were not delivered but stayed in the backlog. 
The reason was that when peeking into getMessagesToReplayNow(), we could not discard the returned set due to untracked message IDs in the delayed message controller.",[32,58754,57681],{"id":57680},[3933,58756,57576],{"id":57684},[48,58758,57687],{},[48,58760,57690],{},[321,58762,58763,58768,58773,58778,58783],{},[324,58764,58765],{},[55,58766,57697],{"href":57697,"rel":58767},[264],[324,58769,58770],{},[55,58771,57703],{"href":57703,"rel":58772},[264],[324,58774,58775],{},[55,58776,57709],{"href":57709,"rel":58777},[264],[324,58779,58780],{},[55,58781,57715],{"href":57715,"rel":58782},[264],[324,58784,58785],{},[55,58786,57721],{"href":57721,"rel":58787},[264],[3933,58789,57583],{"id":57638},[48,58791,57728],{},[40,58793,13565],{"id":1727},[48,58795,58796,58797,57738],{},"If you are interested in learning more about Pulsar 2.10.1, you can ",[55,58798,36195],{"href":58799,"rel":58800},"https:\u002F\u002Fpulsar.apache.org\u002Fen\u002Fversions\u002F",[264],[48,58802,57741,58803,57746],{},[55,58804,57745],{"href":35357,"rel":58805},[264],[48,58807,57749,58808,57753,58811,57757,58814,20076],{},[55,58809,40821],{"href":23526,"rel":58810},[264],[55,58812,36238],{"href":36236,"rel":58813},[264],[55,58815,57762],{"href":57760,"rel":58816},[264],[40,58818,39647],{"id":39646},[48,58820,57767,58821,57772,58824,57775,58826,57779],{},[55,58822,57771],{"href":36193,"rel":58823},[264],[55,58825,3550],{"href":45479},[55,58827,24379],{"href":57778},{"title":18,"searchDepth":19,"depth":19,"links":58829},[58830,58841,58842],{"id":57563,"depth":19,"text":57564,"children":58831},[58832,58834,58836,58837,58838,58840],{"id":58684,"depth":279,"text":58833},"Fixed ineffective load manager due to broker’s zero resource usage. PR-15314",{"id":58698,"depth":279,"text":58835},"Allow users with produce\u002Fconsume privileges to get topic schema. PR-15956",{"id":57644,"depth":279,"text":57793},{"id":57665,"depth":279,"text":57795},{"id":58741,"depth":279,"text":58839},"Fixed key-shared delivery of messages with interleaved delays. PR-15409",{"id":57680,"depth":279,"text":57681},{"id":1727,"depth":19,"text":13565},{"id":39646,"depth":19,"text":39647},"2022-06-12","We are excited to see the Apache Pulsar community has successfully released the 2.10.1 version! 50 contributors provided improvements and bug fixes that delivered 200+ commits. 
Let's walk through the most noteworthy changes!","\u002Fimgs\u002Fblogs\u002F63c7fa03c443b097ff11636a_63b3fc82644550f3dbf79f52_top.jpeg",{},"\u002Fblog\u002Fnew-apache-pulsar-2-10-1",{"title":58664,"description":58844},"blog\u002Fnew-apache-pulsar-2-10-1",[302,821,9144],"uzbMknLroVoKTecomKVx14bp1XXHfDSJLZ7gqPT2Lqw",{"id":58853,"title":38154,"authors":58854,"body":58856,"category":821,"createdAt":290,"date":59781,"description":59782,"extension":8,"featured":294,"image":290,"isDraft":294,"link":290,"meta":59783,"navigation":7,"order":296,"path":59784,"readingTime":59785,"relatedResources":290,"seo":59786,"stem":59787,"tags":59788,"__hash__":59789},"blogs\u002Fblog\u002Fpulsar-isolation-part-iv-single-cluster-isolation.md",[58855],"Ran Gao",{"type":15,"value":58857,"toc":59767},[58858,58860,58863,58895,58898,58900,58908,58913,58918,58924,58929,58935,58940,58946,58949,58953,58958,58964,58972,58978,58983,58989,58994,59000,59005,59011,59016,59022,59028,59033,59039,59044,59047,59053,59059,59064,59070,59074,59077,59083,59088,59094,59098,59101,59106,59111,59114,59119,59124,59129,59135,59140,59146,59151,59157,59161,59164,59170,59174,59178,59183,59186,59192,59194,59200,59206,59211,59217,59222,59228,59232,59238,59243,59248,59252,59255,59261,59265,59270,59275,59279,59284,59289,59295,59299,59305,59309,59314,59318,59323,59329,59334,59337,59343,59348,59354,59359,59365,59370,59376,59381,59387,59392,59398,59406,59412,59417,59423,59427,59432,59437,59443,59450,59456,59459,59465,59469,59474,59479,59484,59490,59494,59500,59504,59509,59514,59519,59524,59530,59537,59543,59547,59550,59555,59561,59566,59572,59577,59583,59589,59594,59600,59604,59610,59614,59619,59624,59629,59634,59640,59643,59649,59652,59657,59662,59667,59673,59678,59684,59689,59695,59700,59706,59711,59717,59722,59728,59730],[40,58859,46],{"id":42},[48,58861,58862],{},"This is the fourth blog in our four-part blog series on how to achieve resource isolation in Apache Pulsar. Before we dive in, let’s review what was covered in Parts I, II, and III.",[321,58864,58865,58872,58875,58878,58881,58888],{},[324,58866,58867,58871],{},[55,58868,58870],{"href":58869},"\u002Fen\u002Fblog\u002Ftech\u002F2021-03-02-taking-an-in-depth-look-at-how-to-achieve-isolation-in-pulsar\u002F","Pulsar Isolation Part I: Taking an In-Depth Look at How to Achieve Isolation in Pulsar"," This blog provides an introduction to three approaches to implement isolation in Pulsar. These include:",[324,58873,58874],{},"leveraging separate Pulsar clusters that use separate BookKeeper clusters,",[324,58876,58877],{},"leveraging separate Pulsar clusters that share one BookKeeper cluster, and",[324,58879,58880],{},"using a single Pulsar cluster with a single BookKeeper cluster. Each of these approaches and their specific use cases are discussed at length in the subsequent blogs.",[324,58882,58883,58887],{},[55,58884,58886],{"href":58885},"\u002Fblog\u002Ftech\u002F2021-06-03-pulsar-isolation-for-dummies-separate-pulsar-clusters\u002F","Pulsar Isolation Part II: Separate Pulsar Clusters"," shows you how to achieve isolation between separate Pulsar clusters that use separate BookKeeper clusters. 
This shared-nothing approach offers the highest level of isolation and is suitable for storing highly sensitive data, such as personally identifiable information or financial records.",[324,58889,58890,58894],{},[55,58891,58893],{"href":58892},"\u002Fblog\u002Fengineering\u002F2022-01-12-pulsar-isolation-part-3-separate-pulsar-clusters-sharing-a-single-bookkeeper-cluster\u002F","Pulsar Isolation Part III: Separate Pulsar Clusters Sharing a Single BookKeeper Cluster"," demonstrates how to achieve Pulsar isolation using separate Pulsar clusters that share one BookKeeper cluster. This approach uses separate Pulsar broker clusters in order to isolate the end-users from one another and allows you to use different authentication methods based on the use case. As a result, you gain the benefits of using a shared storage layer, such as a reduced hardware footprint and the associated hardware and maintenance costs.",[48,58896,58897],{},"In this fourth and final blog of the series, we provide a step-by-step tutorial on how to use a single cluster to achieve broker and bookie isolation. This more traditional approach takes advantage of Pulsar’s built-in multi-tenancy and removes the need to manage multiple broker and bookie clusters.",[40,58899,47824],{"id":47823},[48,58901,58902,58903,190],{},"In this tutorial we use the docker-compose to establish a Pulsar cluster. First, we need to ",[55,58904,58907],{"href":58905,"rel":58906},"https:\u002F\u002Fdocs.docker.com\u002Fget-docker\u002F",[264],"install the docker environment",[916,58909,58910],{},[48,58911,58912],{},"This tutorial is based on docker 20.10.10, docker-compose 1.29.2, and MacOS 12.3.1.",[1666,58914,58915],{},[324,58916,58917],{},"Get the docker-compose configuration files.",[8325,58919,58922],{"className":58920,"code":58921,"language":8330},[8328],"\ngit clone https:\u002F\u002Fgithub.com\u002Fgaoran10\u002Fpulsar-docker-compose\ncd pulsar-docker-compose\n\n",[4926,58923,58921],{"__ignoreMap":18},[1666,58925,58926],{},[324,58927,58928],{},"Start the cluster.",[8325,58930,58933],{"className":58931,"code":58932,"language":8330},[8328],"\ndocker-compose up\n\n",[4926,58934,58932],{"__ignoreMap":18},[1666,58936,58937],{},[324,58938,58939],{},"Check the pods.",[8325,58941,58944],{"className":58942,"code":58943,"language":8330},[8328],"\ndocker-compose ps\n   Name                  Command                State                         Ports\n--------------------------------------------------------------------------------------------------------\nbk1           bash -c export dbStorage_w ...   Up\nbk2           bash -c export dbStorage_w ...   Up\nbk3           bash -c export dbStorage_w ...   Up\nbk4           bash -c export dbStorage_w ...   Up\nbroker1       bash -c bin\u002Fapply-config-f ...   Up\nbroker2       bash -c bin\u002Fapply-config-f ...   Up\nbroker3       bash -c bin\u002Fapply-config-f ...   Up\nproxy1        bash -c bin\u002Fapply-config-f ...   Up         0.0.0.0:6650->6650\u002Ftcp, 0.0.0.0:8080->8080\u002Ftcp\npulsar-init   bin\u002Finit-cluster.sh              Exit 0\nzk1           bash -c bin\u002Fapply-config-f ...   
Up\n\n",[4926,58945,58943],{"__ignoreMap":18},[48,58947,58948],{},"After the cluster initialization completes, we can begin setting the broker isolation policy.",[40,58950,58952],{"id":58951},"broker-isolation","Broker Isolation",[1666,58954,58955],{},[324,58956,58957],{},"Download a Pulsar release package to execute the pulsar-admin command.",[8325,58959,58962],{"className":58960,"code":58961,"language":8330},[8328],"\nwget https:\u002F\u002Farchive.apache.org\u002Fdist\u002Fpulsar\u002Fpulsar-2.10.0\u002Fapache-pulsar-2.10.0-bin.tar.gz\ntar -xvf apache-pulsar-2.10.0-bin.tar.gz\n# we can execute the pulsar-admin command in this directory\ncd apache-pulsar-2.10.0\n\n",[4926,58963,58961],{"__ignoreMap":18},[1666,58965,58966,58969],{},[324,58967,58968],{},"Get the broker list.",[324,58970,58971],{},"Create a namespace.",[8325,58973,58976],{"className":58974,"code":58975,"language":8330},[8328],"\nbin\u002Fpulsar-admin namespaces create public\u002Fns-isolation\nbin\u002Fpulsar-admin namespaces set-retention -s 1G -t 3d public\u002Fns-isolation\n\n",[4926,58977,58975],{"__ignoreMap":18},[1666,58979,58980],{},[324,58981,58982],{},"Set the namespace isolation policy.",[8325,58984,58987],{"className":58985,"code":58986,"language":8330},[8328],"\nbin\u002Fpulsar-admin ns-isolation-policy set \\\n--auto-failover-policy-type min_available \\\n--auto-failover-policy-params min_limit=1,usage_threshold=80 \\\n--namespaces public\u002Fns-isolation \\\n--primary \"broker1:*\" \\\n--secondary \"broker2:*\" \\\ntest ns-broker-isolation\n\n",[4926,58988,58986],{"__ignoreMap":18},[1666,58990,58991],{},[324,58992,58993],{},"Get the namespace isolation policies.",[8325,58995,58998],{"className":58996,"code":58997,"language":8330},[8328],"\nbin\u002Fpulsar-admin ns-isolation-policy list test\n# output\nns-broker-isolation    NamespaceIsolationDataImpl(namespaces=[public\u002Fns-isolation], primary=[broker1:*], secondary=[broker2:*], autoFailoverPolicy=AutoFailoverPolicyDataImpl(policyType=min_available, parameters={min_limit=1, usage_threshold=80}))\n\n",[4926,58999,58997],{"__ignoreMap":18},[1666,59001,59002],{},[324,59003,59004],{},"Create a partitioned topic.",[8325,59006,59009],{"className":59007,"code":59008,"language":8330},[8328],"\nbin\u002Fpulsar-admin topics create-partitioned-topic -p 10 public\u002Fns-isolation\u002Ft1\n\n",[4926,59010,59008],{"__ignoreMap":18},[1666,59012,59013],{},[324,59014,59015],{},"Do a partitioned lookup.",[8325,59017,59020],{"className":59018,"code":59019,"language":8330},[8328],"\nbin\u002Fpulsar-admin topics partitioned-lookup public\u002Fns-isolation\u002Ft1\n# output\npersistent:\u002F\u002Fpublic\u002Fns-isolation\u002Ft1-partition-0    pulsar:\u002F\u002Fbroker1:6650\npersistent:\u002F\u002Fpublic\u002Fns-isolation\u002Ft1-partition-1    pulsar:\u002F\u002Fbroker1:6650\npersistent:\u002F\u002Fpublic\u002Fns-isolation\u002Ft1-partition-2    pulsar:\u002F\u002Fbroker1:6650\npersistent:\u002F\u002Fpublic\u002Fns-isolation\u002Ft1-partition-3    pulsar:\u002F\u002Fbroker1:6650\npersistent:\u002F\u002Fpublic\u002Fns-isolation\u002Ft1-partition-4    pulsar:\u002F\u002Fbroker1:6650\npersistent:\u002F\u002Fpublic\u002Fns-isolation\u002Ft1-partition-5    pulsar:\u002F\u002Fbroker1:6650\npersistent:\u002F\u002Fpublic\u002Fns-isolation\u002Ft1-partition-6    pulsar:\u002F\u002Fbroker1:6650\npersistent:\u002F\u002Fpublic\u002Fns-isolation\u002Ft1-partition-7    pulsar:\u002F\u002Fbroker1:6650\npersistent:\u002F\u002Fpublic\u002Fns-isolation\u002Ft1-partition-8    
pulsar:\u002F\u002Fbroker1:6650\npersistent:\u002F\u002Fpublic\u002Fns-isolation\u002Ft1-partition-9    pulsar:\u002F\u002Fbroker1:6650\n\n",[4926,59021,59019],{"__ignoreMap":18},[48,59023,59024],{},[384,59025],{"alt":59026,"src":59027},"illustration of brokers with arrow to broker 1","\u002Fimgs\u002Fblogs\u002F63bf2b8ade20fa9586bacffc_image2.png",[1666,59029,59030],{},[324,59031,59032],{},"Stop broker1.",[8325,59034,59037],{"className":59035,"code":59036,"language":8330},[8328],"\n${DOCKER_COMPOSE_HOME}\u002Fdocker-compose stop broker1\n# output\nStopping broker1 ... done\n\n",[4926,59038,59036],{"__ignoreMap":18},[1666,59040,59041],{},[324,59042,59043],{},"Check the partitioned lookup.",[48,59045,59046],{},"After broker1 stop, the topics will be owned by secondary broker broker2:*.",[8325,59048,59051],{"className":59049,"code":59050,"language":8330},[8328],"\nbin\u002Fpulsar-admin topics partitioned-lookup public\u002Fns-isolation\u002Ft1\n# output\npersistent:\u002F\u002Fpublic\u002Fns-isolation\u002Ft1-partition-0    pulsar:\u002F\u002Fbroker2:6650\npersistent:\u002F\u002Fpublic\u002Fns-isolation\u002Ft1-partition-1    pulsar:\u002F\u002Fbroker2:6650\npersistent:\u002F\u002Fpublic\u002Fns-isolation\u002Ft1-partition-2    pulsar:\u002F\u002Fbroker2:6650\npersistent:\u002F\u002Fpublic\u002Fns-isolation\u002Ft1-partition-3    pulsar:\u002F\u002Fbroker2:6650\npersistent:\u002F\u002Fpublic\u002Fns-isolation\u002Ft1-partition-4    pulsar:\u002F\u002Fbroker2:6650\npersistent:\u002F\u002Fpublic\u002Fns-isolation\u002Ft1-partition-5    pulsar:\u002F\u002Fbroker2:6650\npersistent:\u002F\u002Fpublic\u002Fns-isolation\u002Ft1-partition-6    pulsar:\u002F\u002Fbroker2:6650\npersistent:\u002F\u002Fpublic\u002Fns-isolation\u002Ft1-partition-7    pulsar:\u002F\u002Fbroker2:6650\npersistent:\u002F\u002Fpublic\u002Fns-isolation\u002Ft1-partition-8    pulsar:\u002F\u002Fbroker2:6650\npersistent:\u002F\u002Fpublic\u002Fns-isolation\u002Ft1-partition-9    pulsar:\u002F\u002Fbroker2:6650\n\n",[4926,59052,59050],{"__ignoreMap":18},[48,59054,59055],{},[384,59056],{"alt":59057,"src":59058},"illustration of broker","\u002Fimgs\u002Fblogs\u002F63bf2bb5f884416e148c9c61_mayiso2.png",[1666,59060,59061],{},[324,59062,59063],{},"Stop broker2.",[8325,59065,59068],{"className":59066,"code":59067,"language":8330},[8328],"\n${DOCKER_COMPOSE_HOME}\u002Fdocker-compose stop broker2\n# output\nStopping broker2 ... done\n\n",[4926,59069,59067],{"__ignoreMap":18},[1666,59071,59072],{},[324,59073,59043],{},[48,59075,59076],{},"After stopping broker2, there are no available brokers for namespace public\u002Fns-isolation-broker.",[8325,59078,59081],{"className":59079,"code":59080,"language":8330},[8328],"\nbin\u002Fpulsar-admin topics partitioned-lookup public\u002Fns-isolation\u002Ft1\n# output\nHTTP 503 Service Unavailable\n\nReason: javax.ws.rs.ServiceUnavailableException: HTTP 503 Service Unavailable\n\n",[4926,59082,59080],{"__ignoreMap":18},[1666,59084,59085],{},[324,59086,59087],{},"Restart broker1 and broker2.",[8325,59089,59092],{"className":59090,"code":59091,"language":8330},[8328],"\n${DOCKER_COMPOSE_HOME}\u002Fdocker-compose start broker1\n# output\nStarting broker1 ... done\n\n${DOCKER_COMPOSE_HOME}\u002Fdocker-compose start broker2\n# output\nStarting broker2 ... 
done\n\n",[4926,59093,59091],{"__ignoreMap":18},[32,59095,59097],{"id":59096},"migrate-the-namespace-between-brokers","Migrate the Namespace between Brokers",[48,59099,59100],{},"Because the Pulsar broker is stateless, we can migrate the namespace between broker groups by simply changing the namespace isolation policy.",[1666,59102,59103],{},[324,59104,59105],{},"Check the namespace isolation policies.",[8325,59107,59109],{"className":59108,"code":58997,"language":8330},[8328],[4926,59110,58997],{"__ignoreMap":18},[48,59112,59113],{},"We could find that the primary and secondary brokers of the namespace public\u002Fns-isolation are broker1:* and broker2:*.",[1666,59115,59116],{},[324,59117,59118],{},"Check the topic partitioned lookup results.",[8325,59120,59122],{"className":59121,"code":59019,"language":8330},[8328],[4926,59123,59019],{"__ignoreMap":18},[1666,59125,59126],{},[324,59127,59128],{},"Modify a new namespace isolation policy.",[8325,59130,59133],{"className":59131,"code":59132,"language":8330},[8328],"\nbin\u002Fpulsar-admin ns-isolation-policy set \\\n--auto-failover-policy-type min_available \\\n--auto-failover-policy-params min_limit=1,usage_threshold=80 \\\n--namespaces public\u002Fns-isolation \\\n--primary \"broker3:*\" \\\n--secondary \"broker2:*\" \\\ntest ns-broker-isolation\n\n",[4926,59134,59132],{"__ignoreMap":18},[1666,59136,59137],{},[324,59138,59139],{},"Check the namespace isolation policy.",[8325,59141,59144],{"className":59142,"code":59143,"language":8330},[8328],"\nbin\u002Fpulsar-admin ns-isolation-policy list test\n# output\nns-broker-isolation    NamespaceIsolationDataImpl(namespaces=[public\u002Fns-isolation], primary=[broker3:*], secondary=[broker2:*], autoFailoverPolicy=AutoFailoverPolicyDataImpl(policyType=min_available, parameters={min_limit=1, usage_threshold=80}))\n\n",[4926,59145,59143],{"__ignoreMap":18},[1666,59147,59148],{},[324,59149,59150],{},"Unload the namespace to make the namespace isolation policy take effect.",[8325,59152,59155],{"className":59153,"code":59154,"language":8330},[8328],"\nbin\u002Fpulsar-admin namespaces unload public\u002Fns-isolation\n\n",[4926,59156,59154],{"__ignoreMap":18},[1666,59158,59159],{},[324,59160,59043],{},[48,59162,59163],{},"We could find that topics are already owned by the primary broker(broker3).",[8325,59165,59168],{"className":59166,"code":59167,"language":8330},[8328],"\nbin\u002Fpulsar-admin topics partitioned-lookup public\u002Fns-isolation\u002Ft1\n# output\npersistent:\u002F\u002Fpublic\u002Fns-isolation\u002Ft1-partition-0    pulsar:\u002F\u002Fbroker3:6650\npersistent:\u002F\u002Fpublic\u002Fns-isolation\u002Ft1-partition-1    pulsar:\u002F\u002Fbroker3:6650\npersistent:\u002F\u002Fpublic\u002Fns-isolation\u002Ft1-partition-2    pulsar:\u002F\u002Fbroker3:6650\npersistent:\u002F\u002Fpublic\u002Fns-isolation\u002Ft1-partition-3    pulsar:\u002F\u002Fbroker3:6650\npersistent:\u002F\u002Fpublic\u002Fns-isolation\u002Ft1-partition-4    pulsar:\u002F\u002Fbroker3:6650\npersistent:\u002F\u002Fpublic\u002Fns-isolation\u002Ft1-partition-5    pulsar:\u002F\u002Fbroker3:6650\npersistent:\u002F\u002Fpublic\u002Fns-isolation\u002Ft1-partition-6    pulsar:\u002F\u002Fbroker3:6650\npersistent:\u002F\u002Fpublic\u002Fns-isolation\u002Ft1-partition-7    pulsar:\u002F\u002Fbroker3:6650\npersistent:\u002F\u002Fpublic\u002Fns-isolation\u002Ft1-partition-8    pulsar:\u002F\u002Fbroker3:6650\npersistent:\u002F\u002Fpublic\u002Fns-isolation\u002Ft1-partition-9    
pulsar:\u002F\u002Fbroker3:6650\n\n",[4926,59169,59167],{"__ignoreMap":18},[32,59171,59173],{"id":59172},"scale-up-and-down-brokers","Scale up and down Brokers",[3933,59175,59177],{"id":59176},"scale-up","Scale up",[1666,59179,59180],{},[324,59181,59182],{},"Start broker4.",[48,59184,59185],{},"Add broker4 configurations in the docker-compose file.",[8325,59187,59190],{"className":59188,"code":59189,"language":8330},[8328],"\n  broker4:\n    hostname: broker4\n    container_name: broker4\n    image: apachepulsar\u002Fpulsar:latest\n    restart: on-failure\n    command: >\n      bash -c \"bin\u002Fapply-config-from-env.py conf\u002Fbroker.conf && \\\n               bin\u002Fapply-config-from-env.py conf\u002Fpulsar_env.sh && \\\n               bin\u002Fwatch-znode.py -z $$zookeeperServers -p \u002Finitialized-$$clusterName -w && \\\n               exec bin\u002Fpulsar broker\"\n    environment:\n      clusterName: test\n      zookeeperServers: zk1:2181\n      configurationStore: zk1:2181\n      webSocketServiceEnabled: \"false\"\n      functionsWorkerEnabled: \"false\"\n      managedLedgerMaxEntriesPerLedger: 100\n      managedLedgerMinLedgerRolloverTimeMinutes: 0\n    volumes:\n      - .\u002Fapply-config-from-env.py:\u002Fpulsar\u002Fbin\u002Fapply-config-from-env.py\n    depends_on:\n      - zk1\n      - pulsar-init\n      - bk1\n      - bk2\n      - bk3\n      - bk4\n    networks:\n      pulsar:\n\n",[4926,59191,59189],{"__ignoreMap":18},[48,59193,59182],{},[8325,59195,59198],{"className":59196,"code":59197,"language":8330},[8328],"\n${DOCKER_COMPOSE_HOME}\u002Fdocker-compose create\n# output\nzk1 is up-to-date\nbk1 is up-to-date\nbk2 is up-to-date\nbk3 is up-to-date\nbroker1 is up-to-date\nbroker2 is up-to-date\nbroker3 is up-to-date\nCreating broker4 ... done\nproxy1 is up-to-date\n\n",[4926,59199,59197],{"__ignoreMap":18},[8325,59201,59204],{"className":59202,"code":59203,"language":8330},[8328],"\n${DOCKER_COMPOSE_HOME}\u002Fdocker-compose start broker4\n# output\nStarting broker4 ... 
done\n\n",[4926,59205,59203],{"__ignoreMap":18},[1666,59207,59208],{},[324,59209,59210],{},"Check the broker list.",[8325,59212,59215],{"className":59213,"code":59214,"language":8330},[8328],"\nbin\u002Fpulsar-admin brokers list test\n# output\nbroker4:8080\nbroker1:8080\nbroker2:8080\nbroker3:8080\n\n",[4926,59216,59214],{"__ignoreMap":18},[1666,59218,59219],{},[324,59220,59221],{},"Set a namespace isolation policy.",[8325,59223,59226],{"className":59224,"code":59225,"language":8330},[8328],"\nbin\u002Fpulsar-admin ns-isolation-policy set \\\n--auto-failover-policy-type min_available \\\n--auto-failover-policy-params min_limit=1,usage_threshold=80 \\\n--namespaces public\u002Fns-isolation \\\n--primary \"broker1:*,broker4:*\" \\\n--secondary \"broker2:*\" \\\ntest ns-broker-isolation\n\n",[4926,59227,59225],{"__ignoreMap":18},[1666,59229,59230],{},[324,59231,58993],{},[8325,59233,59236],{"className":59234,"code":59235,"language":8330},[8328],"\nbin\u002Fpulsar-admin ns-isolation-policy list test\n# output\nns-broker-isolation    NamespaceIsolationDataImpl(namespaces=[public\u002Fns-isolation], primary=[broker1:*, broker4:*], secondary=[broker2:*], autoFailoverPolicy=AutoFailoverPolicyDataImpl(policyType=min_available, parameters={min_limit=1, usage_threshold=80}))\n\n",[4926,59237,59235],{"__ignoreMap":18},[1666,59239,59240],{},[324,59241,59242],{},"Unload the namespace.",[8325,59244,59246],{"className":59245,"code":59154,"language":8330},[8328],[4926,59247,59154],{"__ignoreMap":18},[1666,59249,59250],{},[324,59251,59043],{},[48,59253,59254],{},"The topic should be owned by broker1 and broker4.",[8325,59256,59259],{"className":59257,"code":59258,"language":8330},[8328],"\nbin\u002Fpulsar-admin topics partitioned-lookup public\u002Fns-isolation\u002Ft1\n# output\npersistent:\u002F\u002Fpublic\u002Fns-isolation\u002Ft1-partition-0    pulsar:\u002F\u002Fbroker1:6650\npersistent:\u002F\u002Fpublic\u002Fns-isolation\u002Ft1-partition-1    pulsar:\u002F\u002Fbroker1:6650\npersistent:\u002F\u002Fpublic\u002Fns-isolation\u002Ft1-partition-2    pulsar:\u002F\u002Fbroker4:6650\npersistent:\u002F\u002Fpublic\u002Fns-isolation\u002Ft1-partition-3    pulsar:\u002F\u002Fbroker4:6650\npersistent:\u002F\u002Fpublic\u002Fns-isolation\u002Ft1-partition-4    pulsar:\u002F\u002Fbroker1:6650\npersistent:\u002F\u002Fpublic\u002Fns-isolation\u002Ft1-partition-5    pulsar:\u002F\u002Fbroker1:6650\npersistent:\u002F\u002Fpublic\u002Fns-isolation\u002Ft1-partition-6    pulsar:\u002F\u002Fbroker4:6650\npersistent:\u002F\u002Fpublic\u002Fns-isolation\u002Ft1-partition-7    pulsar:\u002F\u002Fbroker4:6650\npersistent:\u002F\u002Fpublic\u002Fns-isolation\u002Ft1-partition-8    pulsar:\u002F\u002Fbroker1:6650\npersistent:\u002F\u002Fpublic\u002Fns-isolation\u002Ft1-partition-9    pulsar:\u002F\u002Fbroker1:6650\n\n",[4926,59260,59258],{"__ignoreMap":18},[3933,59262,59264],{"id":59263},"scale-down","Scale down",[1666,59266,59267],{},[324,59268,59269],{},"Remove broker4 from the namespace isolation policy.",[8325,59271,59273],{"className":59272,"code":58986,"language":8330},[8328],[4926,59274,58986],{"__ignoreMap":18},[1666,59276,59277],{},[324,59278,59139],{},[8325,59280,59282],{"className":59281,"code":58997,"language":8330},[8328],[4926,59283,58997],{"__ignoreMap":18},[1666,59285,59286],{},[324,59287,59288],{},"Stop broker4.",[8325,59290,59293],{"className":59291,"code":59292,"language":8330},[8328],"\n${DOCKER_COMPOSE_HOME}\u002Fdocker-compose stop broker4\n# output\nStopping broker4 ... 
done\n\n",[4926,59294,59292],{"__ignoreMap":18},[1666,59296,59297],{},[324,59298,59210],{},[8325,59300,59303],{"className":59301,"code":59302,"language":8330},[8328],"\nbin\u002Fpulsar-admin brokers list test\n# output\nbroker1:8080\nbroker2:8080\nbroker3:8080\n\n",[4926,59304,59302],{"__ignoreMap":18},[1666,59306,59307],{},[324,59308,59043],{},[8325,59310,59312],{"className":59311,"code":59019,"language":8330},[8328],[4926,59313,59019],{"__ignoreMap":18},[40,59315,59317],{"id":59316},"bookkeeper-isolation","BookKeeper Isolation",[1666,59319,59320],{},[324,59321,59322],{},"Get the bookie list.",[8325,59324,59327],{"className":59325,"code":59326,"language":8330},[8328],"\nbin\u002Fpulsar-admin bookies list-bookies\n# output\n{\n  \"bookies\" : [ {\n    \"bookieId\" : \"bk2:3181\"\n  }, {\n    \"bookieId\" : \"bk4:3181\"\n  }, {\n    \"bookieId\" : \"bk3:3181\"\n  }, {\n    \"bookieId\" : \"bk1:3181\"\n  } ]\n}\n\n",[4926,59328,59326],{"__ignoreMap":18},[1666,59330,59331],{},[324,59332,59333],{},"Set the bookie rack.",[48,59335,59336],{},"The default value of the configuration bookkeeperClientRackawarePolicyEnabled is true, so the RackawareEnsemblePlacementPolicy is the default bookie isolation policy, we'll set the rack name like this \u002Frack.",[8325,59338,59341],{"className":59339,"code":59340,"language":8330},[8328],"\nbin\u002Fpulsar-admin bookies set-bookie-rack \\\n--bookie bk1:3181 \\\n--hostname bk1:3181 \\\n--group group1 \\\n--rack \u002Frack1\n\nbin\u002Fpulsar-admin bookies set-bookie-rack \\\n--bookie bk3:3181 \\\n--hostname bk3:3181 \\\n--group group1 \\\n--rack \u002Frack1\n\nbin\u002Fpulsar-admin bookies set-bookie-rack \\\n--bookie bk2:3181 \\\n--hostname bk2:3181 \\\n--group group2 \\\n--rack \u002Frack2\n\nbin\u002Fpulsar-admin bookies set-bookie-rack \\\n--bookie bk4:3181 \\\n--hostname bk4:3181 \\\n--group group2 \\\n--rack \u002Frack2\n\n",[4926,59342,59340],{"__ignoreMap":18},[1666,59344,59345],{},[324,59346,59347],{},"Check the bookie racks placement.",[8325,59349,59352],{"className":59350,"code":59351,"language":8330},[8328],"\nbin\u002Fpulsar-admin bookies racks-placement\ngroup1    {bk1:3181=BookieInfoImpl(rack=\u002Frack1, hostname=bk1:3181), bk3:3181=BookieInfoImpl(rack=\u002Frack1, hostname=bk3:3181)}\ngroup2    {bk2:3181=BookieInfoImpl(rack=\u002Frack2, hostname=bk2:3181), bk4:3181=BookieInfoImpl(rack=\u002Frack2, hostname=bk4:3181)}\n\n",[4926,59353,59351],{"__ignoreMap":18},[1666,59355,59356],{},[324,59357,59358],{},"Set the bookie affinity group for the namespace.",[8325,59360,59363],{"className":59361,"code":59362,"language":8330},[8328],"\nbin\u002Fpulsar-admin namespaces set-bookie-affinity-group public\u002Fns-isolation \\\n--primary-group group1 \\\n--secondary-group group2\n\n",[4926,59364,59362],{"__ignoreMap":18},[1666,59366,59367],{},[324,59368,59369],{},"Check the namespace affinity group.",[8325,59371,59374],{"className":59372,"code":59373,"language":8330},[8328],"\nbin\u002Fpulsar-admin namespaces get-bookie-affinity-group public\u002Fns-isolation\n{\n  \"bookkeeperAffinityGroupPrimary\" : \"group1\",\n  \"bookkeeperAffinityGroupSecondary\" : \"group2\"\n}\n\n",[4926,59375,59373],{"__ignoreMap":18},[1666,59377,59378],{},[324,59379,59380],{},"Produce messages to the topic.",[8325,59382,59385],{"className":59383,"code":59384,"language":8330},[8328],"\nbin\u002Fpulsar-client produce -m 'hello' -n 500 public\u002Fns-isolation\u002Ft2\n\n",[4926,59386,59384],{"__ignoreMap":18},[1666,59388,59389],{},[324,59390,59391],{},"Get internal stats of the 
topic.",[8325,59393,59396],{"className":59394,"code":59395,"language":8330},[8328],"\nbin\u002Fpulsar-admin topics stats-internal public\u002Fns-isolation\u002Ft2 | grep ledgerId | tail -n 6\n    \"ledgerId\" : 0,\n    \"ledgerId\" : 1,\n    \"ledgerId\" : 2,\n    \"ledgerId\" : 3,\n    \"ledgerId\" : 4,\n    \"ledgerId\" : -1,\n\n",[4926,59397,59395],{"__ignoreMap":18},[1666,59399,59400],{},[324,59401,59402,59403,190],{},"Check ledger ensembles for the ledgers ",[2628,59404,59405],{},"0, 1, 2, 3, 4",[8325,59407,59410],{"className":59408,"code":59409,"language":8330},[8328],"\n# execute these commands in the node bk1\n${DOCKER_COMPOSE_HOME}\u002Fdocker-compose exec bk1 \u002Fbin\u002Fbash\n\nbin\u002Fbookkeeper shell ledgermetadata -ledgerid 0\n# check ensembles\nensembles={0=[bk1:3181, bk3:3181]}\n\nbin\u002Fbookkeeper shell ledgermetadata -ledgerid 1\n# check ensembles\nensembles={0=[bk3:3181, bk1:3181]}\n\nbin\u002Fbookkeeper shell ledgermetadata -ledgerid 2\n# check ensembles\nensembles={0=[bk1:3181, bk3:3181]}\n\nbin\u002Fbookkeeper shell ledgermetadata -ledgerid 3\n# check ensembles\nensembles={0=[bk1:3181, bk3:3181]}\n\nbin\u002Fbookkeeper shell ledgermetadata -ledgerid 4\n# check ensembles\nensembles={0=[bk1:3181, bk3:3181]}\n\n",[4926,59411,59409],{"__ignoreMap":18},[1666,59413,59414],{},[324,59415,59416],{},"Stop bookie1.",[8325,59418,59421],{"className":59419,"code":59420,"language":8330},[8328],"\n${DOCKER_COMPOSE_HOME}\u002Fdocker-compose stop bk1\n\n",[4926,59422,59420],{"__ignoreMap":18},[1666,59424,59425],{},[324,59426,59380],{},[8325,59428,59430],{"className":59429,"code":59384,"language":8330},[8328],[4926,59431,59384],{"__ignoreMap":18},[1666,59433,59434],{},[324,59435,59436],{},"Check ledger metadata.",[8325,59438,59441],{"className":59439,"code":59440,"language":8330},[8328],"\nbin\u002Fpulsar-admin topics stats-internal public\u002Fns-isolation\u002Ft2 | grep ledgerId | tail -n 6\n    \"ledgerId\" : 5,\n    \"ledgerId\" : 6,\n    \"ledgerId\" : 7,\n    \"ledgerId\" : 8,\n    \"ledgerId\" : 9,\n    \"ledgerId\" : -1,\n\n",[4926,59442,59440],{"__ignoreMap":18},[48,59444,59445,59446,59449],{},"Check ledger metadata for the newly added ledgers ",[2628,59447,59448],{},"5,6,7,8,9",". Because bookie1 is not usable and the configuration bookkeeperClientEnforceMinNumRacksPerWriteQuorum is false, we should find that the secondary bookies are used. 
Bookie3 is in the primary group so bookie3 is always used.",[8325,59451,59454],{"className":59452,"code":59453,"language":8330},[8328],"\n# execute these commands in the node bk2\n${DOCKER_COMPOSE_HOME}\u002Fdocker-compose exec bk2 \u002Fbin\u002Fbash\n\nbin\u002Fbookkeeper shell ledgermetadata -ledgerid 5\n# check ensembles\nensembles={0=[bk4:3181, bk3:3181]}\n\nbin\u002Fbookkeeper shell ledgermetadata -ledgerid 6\n# check ensembles\nensembles={0=[bk3:3181, bk2:3181]}\n\nbin\u002Fbookkeeper shell ledgermetadata -ledgerid 7\n# check ensembles\nensembles={0=[bk2:3181, bk3:3181]}\n\nbin\u002Fbookkeeper shell ledgermetadata -ledgerid 8\n# check ensembles\nensembles={0=[bk3:3181, bk2:3181]}\n\nbin\u002Fbookkeeper shell ledgermetadata -ledgerid 9\n# check ensembles\nensembles={0=[bk3:3181, bk2:3181]}\n\n",[4926,59455,59453],{"__ignoreMap":18},[48,59457,59458],{},"Restart bk1",[8325,59460,59463],{"className":59461,"code":59462,"language":8330},[8328],"\n${DOCKER_COMPOSE_HOME}\u002Fdocker-compose start bk1\n\n",[4926,59464,59462],{"__ignoreMap":18},[32,59466,59468],{"id":59467},"migrate-bookie-affinity-group","Migrate Bookie Affinity Group",[1666,59470,59471],{},[324,59472,59473],{},"Check the bookie affinity group.",[8325,59475,59477],{"className":59476,"code":59373,"language":8330},[8328],[4926,59478,59373],{"__ignoreMap":18},[1666,59480,59481],{},[324,59482,59483],{},"Modify the bookie affinity group of the namespace.",[8325,59485,59488],{"className":59486,"code":59487,"language":8330},[8328],"\nbin\u002Fpulsar-admin namespaces set-bookie-affinity-group public\u002Fns-isolation \\\n--primary-group group2\n\n",[4926,59489,59487],{"__ignoreMap":18},[1666,59491,59492],{},[324,59493,59473],{},[8325,59495,59498],{"className":59496,"code":59497,"language":8330},[8328],"\nbin\u002Fpulsar-admin namespaces get-bookie-affinity-group public\u002Fns-isolation\n{\n  \"bookkeeperAffinityGroupPrimary\" : \"group2\"\n}\n\n",[4926,59499,59497],{"__ignoreMap":18},[1666,59501,59502],{},[324,59503,59242],{},[8325,59505,59507],{"className":59506,"code":59154,"language":8330},[8328],[4926,59508,59154],{"__ignoreMap":18},[1666,59510,59511],{},[324,59512,59513],{},"Produce messages.",[8325,59515,59517],{"className":59516,"code":59384,"language":8330},[8328],[4926,59518,59384],{"__ignoreMap":18},[1666,59520,59521],{},[324,59522,59523],{},"Check the ensemble's bookies for newly created ledgers.",[8325,59525,59528],{"className":59526,"code":59527,"language":8330},[8328],"\nbin\u002Fpulsar-admin topics stats-internal public\u002Fns-isolation\u002Ft2 | grep ledgerId | tail -n 6\n    \"ledgerId\" : 12,\n    \"ledgerId\" : 13,\n    \"ledgerId\" : 14,\n    \"ledgerId\" : 15,\n    \"ledgerId\" : 16,\n    \"ledgerId\" : -1,\n\n",[4926,59529,59527],{"__ignoreMap":18},[1666,59531,59532],{},[324,59533,59445,59534,190],{},[2628,59535,59536],{},"12, 13, 14, 15, 16",[8325,59538,59541],{"className":59539,"code":59540,"language":8330},[8328],"\n# execute these commands in the node bk2\n${DOCKER_COMPOSE_HOME}\u002Fdocker-compose exec bk2 \u002Fbin\u002Fbash\n\nbin\u002Fbookkeeper shell ledgermetadata -ledgerid 12\n# check ensembles\nensembles={0=[bk4:3181, bk2:3181]}\n\nbin\u002Fbookkeeper shell ledgermetadata -ledgerid 13\n# check ensembles\nensembles={0=[bk4:3181, bk2:3181]}\n\nbin\u002Fbookkeeper shell ledgermetadata -ledgerid 14\n# check ensembles\nensembles={0=[bk4:3181, bk2:3181]}\n\nbin\u002Fbookkeeper shell ledgermetadata -ledgerid 15\n# check ensembles\nensembles={0=[bk4:3181, bk2:3181]}\n\nbin\u002Fbookkeeper shell 
ledgermetadata -ledgerid 16\n# check ensembles\nensembles={0=[bk2:3181, bk4:3181]}\n\n",[4926,59542,59540],{"__ignoreMap":18},[32,59544,59546],{"id":59545},"scale-up-and-down-bookies","Scale up and down Bookies",[3933,59548,59177],{"id":59549},"scale-up-1",[1666,59551,59552],{},[324,59553,59554],{},"Add the following configuration in the docker-compose file.",[8325,59556,59559],{"className":59557,"code":59558,"language":8330},[8328],"\n  bk5:\n    hostname: bk5\n    container_name: bk5\n    image: apachepulsar\u002Fpulsar:latest\n    command: >\n      bash -c \"export dbStorage_writeCacheMaxSizeMb=\"${dbStorage_writeCacheMaxSizeMb:-16}\" && \\\n               export dbStorage_readAheadCacheMaxSizeMb=\"${dbStorage_readAheadCacheMaxSizeMb:-16}\" && \\\n               bin\u002Fapply-config-from-env.py conf\u002Fbookkeeper.conf && \\\n               bin\u002Fapply-config-from-env.py conf\u002Fpulsar_env.sh && \\\n               bin\u002Fwatch-znode.py -z $$zkServers -p \u002Finitialized-$$clusterName -w && \\\n               exec bin\u002Fpulsar bookie\"\n    environment:\n      clusterName: test\n      zkServers: zk1:2181\n      numAddWorkerThreads: 8\n      useHostNameAsBookieID: \"true\"\n    volumes:\n      - .\u002Fapply-config-from-env.py:\u002Fpulsar\u002Fbin\u002Fapply-config-from-env.py\n    depends_on:\n      - zk1\n      - pulsar-init\n    networks:\n      pulsar:\n\n",[4926,59560,59558],{"__ignoreMap":18},[1666,59562,59563],{},[324,59564,59565],{},"Start bookie5.",[8325,59567,59570],{"className":59568,"code":59569,"language":8330},[8328],"\n${DOCKER_COMPOSE_HOME}\u002Fdocker-compose create\n${DOCKER_COMPOSE_HOME}\u002Fdocker-compose start bk5\n\n",[4926,59571,59569],{"__ignoreMap":18},[1666,59573,59574],{},[324,59575,59576],{},"Check the readable and writable bookie list. 
Because bookie1 has been restarted and bookie5 is now running, there should be 5 bookies.",[8325,59578,59581],{"className":59579,"code":59580,"language":8330},[8328],"\n# execute this command in bk2\n${DOCKER_COMPOSE_HOME}\u002Fdocker-compose exec bk2 bin\u002Fbookkeeper shell listbookies -rw\n\n",[4926,59582,59580],{"__ignoreMap":18},[8325,59584,59587],{"className":59585,"code":59586,"language":8330},[8328],"\nReadWrite Bookies :\nBookieID:bk2:3181, IP:192.168.32.5, Port:3181, Hostname:bk2\nBookieID:bk4:3181, IP:192.168.32.7, Port:3181, Hostname:bk4\nBookieID:bk3:3181, IP:192.168.32.6, Port:3181, Hostname:bk3\nBookieID:bk1:3181, IP:192.168.32.4, Port:3181, Hostname:bk1\nBookieID:bk5:3181, IP:192.168.32.9, Port:3181, Hostname:bk5\n\n",[4926,59588,59586],{"__ignoreMap":18},[1666,59590,59591],{},[324,59592,59593],{},"Add the new bookie node to the primary group.",[8325,59595,59598],{"className":59596,"code":59597,"language":8330},[8328],"\nbin\u002Fpulsar-admin bookies set-bookie-rack \\\n--bookie bk5:3181 \\\n--hostname bk5:3181 \\\n--group group2 \\\n--rack \u002Frack2\n\n",[4926,59599,59597],{"__ignoreMap":18},[1666,59601,59602],{},[324,59603,59347],{},[8325,59605,59608],{"className":59606,"code":59607,"language":8330},[8328],"\nbin\u002Fpulsar-admin bookies racks-placement\ngroup1    {bk1:3181=BookieInfoImpl(rack=\u002Frack1, hostname=bk1:3181), bk3:3181=BookieInfoImpl(rack=\u002Frack1, hostname=bk3:3181)}\ngroup2    {bk2:3181=BookieInfoImpl(rack=\u002Frack2, hostname=bk2:3181), bk4:3181=BookieInfoImpl(rack=\u002Frack2, hostname=bk4:3181), bk5:3181=BookieInfoImpl(rack=\u002Frack2, hostname=bk5:3181)}\n\n",[4926,59609,59607],{"__ignoreMap":18},[1666,59611,59612],{},[324,59613,59242],{},[8325,59615,59617],{"className":59616,"code":59154,"language":8330},[8328],[4926,59618,59154],{"__ignoreMap":18},[1666,59620,59621],{},[324,59622,59623],{},"Produce messages to a new topic.",[8325,59625,59627],{"className":59626,"code":59384,"language":8330},[8328],[4926,59628,59384],{"__ignoreMap":18},[1666,59630,59631],{},[324,59632,59633],{},"Check the newly added ledgers of the topic.",[8325,59635,59638],{"className":59636,"code":59637,"language":8330},[8328],"\nbin\u002Fpulsar-admin topics stats-internal public\u002Fns-isolation\u002Ft2 | grep ledgerId | tail -n 6\n    \"ledgerId\" : 17,\n    \"ledgerId\" : 20,\n    \"ledgerId\" : 21,\n    \"ledgerId\" : 22,\n    \"ledgerId\" : 23,\n    \"ledgerId\" : -1,\n\n",[4926,59639,59637],{"__ignoreMap":18},[48,59641,59642],{},"Verifying the ledger ensembles, we can see that the newly created ledgers are all written to the primary group, because there are enough read-write (rw) bookies.",[8325,59644,59647],{"className":59645,"code":59646,"language":8330},[8328],"\n# execute these commands in the node bk2\n${DOCKER_COMPOSE_HOME}\u002Fdocker-compose exec bk2 \u002Fbin\u002Fbash\n\nbin\u002Fbookkeeper shell ledgermetadata -ledgerid 17\n# check ensembles\nensembles={0=[bk5:3181, bk2:3181]}\n\nbin\u002Fbookkeeper shell ledgermetadata -ledgerid 20\n# check ensembles\nensembles={0=[bk2:3181, bk4:3181]}\n\nbin\u002Fbookkeeper shell ledgermetadata -ledgerid 21\n# check ensembles\nensembles={0=[bk5:3181, bk4:3181]}\n\nbin\u002Fbookkeeper shell ledgermetadata -ledgerid 22\n# check ensembles\nensembles={0=[bk5:3181, bk4:3181]}\n\nbin\u002Fbookkeeper shell ledgermetadata -ledgerid 23\n# check ensembles\nensembles={0=[bk2:3181, bk4:3181]}\n\n",[4926,59648,59646],{"__ignoreMap":18},[32,59650,59264],{"id":59651},"scale-down-1",[1666,59653,59654],{},[324,59655,59656],{},"Check the placement of the 
racks.",[8325,59658,59660],{"className":59659,"code":59607,"language":8330},[8328],[4926,59661,59607],{"__ignoreMap":18},[1666,59663,59664],{},[324,59665,59666],{},"Delete the bookie from the affinity bookie group.",[8325,59668,59671],{"className":59669,"code":59670,"language":8330},[8328],"\nbin\u002Fpulsar-admin bookies delete-bookie-rack -b bk5:3181\n\n",[4926,59672,59670],{"__ignoreMap":18},[1666,59674,59675],{},[324,59676,59677],{},"Check if there are under-replicated ledgers, which should be expected because we deleted a bookie.",[8325,59679,59682],{"className":59680,"code":59681,"language":8330},[8328],"\n# execute these commands in the node bk2\n${DOCKER_COMPOSE_HOME}\u002Fdocker-compose exec bk2 bin\u002Fbookkeeper shell listunderreplicated\n\n",[4926,59683,59681],{"__ignoreMap":18},[1666,59685,59686],{},[324,59687,59688],{},"Stop the bookie.",[8325,59690,59693],{"className":59691,"code":59692,"language":8330},[8328],"\n${DOCKER_COMPOSE_HOME}\u002Fdocker-compose stop bk5\n\n",[4926,59694,59692],{"__ignoreMap":18},[1666,59696,59697],{},[324,59698,59699],{},"Decommission the bookie.",[8325,59701,59704],{"className":59702,"code":59703,"language":8330},[8328],"\n${DOCKER_COMPOSE_HOME}\u002Fdocker-compose exec bk2 bin\u002Fbookkeeper shell decommissionbookie -bookieid bk5:3181\n\n",[4926,59705,59703],{"__ignoreMap":18},[1666,59707,59708],{},[324,59709,59710],{},"Check ledgers in the decommissioned bookie.",[8325,59712,59715],{"className":59713,"code":59714,"language":8330},[8328],"\n${DOCKER_COMPOSE_HOME}\u002Fdocker-compose exec bk2 bin\u002Fbookkeeper shell listledgers -bookieid bk5:3181\n\n",[4926,59716,59714],{"__ignoreMap":18},[1666,59718,59719],{},[324,59720,59721],{},"List the bookies.",[8325,59723,59726],{"className":59724,"code":59725,"language":8330},[8328],"\n${DOCKER_COMPOSE_HOME}\u002Fdocker-compose exec bk2 bin\u002Fbookkeeper shell listbookies -rw\nReadWrite Bookies :\nBookieID:bk2:3181, IP:192.168.48.5, Port:3181, Hostname:bk2\nBookieID:bk4:3181, IP:192.168.48.7, Port:3181, Hostname:bk4\nBookieID:bk3:3181, IP:192.168.48.6, Port:3181, Hostname:bk3\nBookieID:bk1:3181, IP:192.168.48.4, Port:3181, Hostname:bk1\n\n",[4926,59727,59725],{"__ignoreMap":18},[40,59729,7126],{"id":1727},[1666,59731,59732,59735,59739,59743,59747,59754,59761],{},[324,59733,59734],{},"Read the previous blogs in this series to learn more about Pulsar isolation:",[324,59736,59737],{},[55,59738,58870],{"href":58869},[324,59740,59741],{},[55,59742,58886],{"href":58885},[324,59744,59745],{},[55,59746,58893],{"href":58892},[324,59748,59749,59750,59753],{},"Learn Pulsar Fundamentals with StreamNative Academy: If you are new to Pulsar, we recommend taking the ",[55,59751,36487],{"href":31912,"rel":59752},[264]," developed by the original creators of Pulsar.",[324,59755,59756,59757,58616],{},"Spin up a Pulsar cluster in minutes: ",[55,59758,59760],{"href":17075,"rel":59759},[264],"Sign up for StreamNative Cloud",[324,59762,59763,59764,57958],{},"Save your spot at the Pulsar Summit San Francisco: The first in-person Pulsar Summit is taking place this August! 
",[55,59765,25339],{"href":35357,"rel":59766},[264],{"title":18,"searchDepth":19,"depth":19,"links":59768},[59769,59770,59771,59775,59780],{"id":42,"depth":19,"text":46},{"id":47823,"depth":19,"text":47824},{"id":58951,"depth":19,"text":58952,"children":59772},[59773,59774],{"id":59096,"depth":279,"text":59097},{"id":59172,"depth":279,"text":59173},{"id":59316,"depth":19,"text":59317,"children":59776},[59777,59778,59779],{"id":59467,"depth":279,"text":59468},{"id":59545,"depth":279,"text":59546},{"id":59651,"depth":279,"text":59264},{"id":1727,"depth":19,"text":7126},"2022-06-01","This blog provides a step-by-step tutorial on how to use a single cluster to achieve broker and bookie isolation.",{},"\u002Fblog\u002Fpulsar-isolation-part-iv-single-cluster-isolation","30 min read",{"title":38154,"description":59782},"blog\u002Fpulsar-isolation-part-iv-single-cluster-isolation",[38442,821],"Pf3D8FHhmM81rOXoDzUIFcxvIGyDygwhbONG8DDPTxs",{"id":59791,"title":46724,"authors":59792,"body":59793,"category":821,"createdAt":290,"date":60112,"description":60113,"extension":8,"featured":294,"image":60114,"isDraft":294,"link":290,"meta":60115,"navigation":7,"order":296,"path":46369,"readingTime":290,"relatedResources":290,"seo":60116,"stem":60117,"tags":60118,"__hash__":60119},"blogs\u002Fblog\u002Fspring-into-pulsar.md",[46357],{"type":15,"value":59794,"toc":60102},[59795,59797,59806,59815,59818,59821,59824,59831,59839,59842,59851,59857,59859,59865,59867,59869,59874,59877,59883,59889,59892,59895,59901,59904,59910,59915,59917,59923,59929,59937,59939,59945,59955,59961,59969,59971,59974,59982,59984,59986,60054,60056,60100],[40,59796,46363],{"id":46362},[48,59798,59799,59800,59805],{},"In this article we will discuss using the Java Framework, Spring, with Apache Pulsar. We will explain how to build Spring-based microservices in Java. For those who are not familiar with ",[55,59801,59804],{"href":59802,"rel":59803},"https:\u002F\u002Fspring.io\u002F",[264],"Spring",", it is impressive as it is the leading Java framework and has been around for almost 20 years! Spring makes building Java applications easier by providing the wiring and control needed for building applications. It removes the repetitive boilerplate code that one would have to write. It allows developers to quickly build microservices as REST APIs, web applications, console applications, and more. I highly recommend checking out this impressive framework.",[48,59807,59808,59809,59814],{},"To get started building your first application, check out the ",[55,59810,59813],{"href":59811,"rel":59812},"https:\u002F\u002Fstart.spring.io\u002F",[264],"Spring Starter Page",", which gives the full source code for a custom running application that you just need to add your business logic to. You will find a number of resources for the Apache Pulsar Spring boot.",[48,59816,59817],{},"In my examples, I will build simple Spring Boot applications that use dependency injection to provide our application with instantiated and configured Apache Pulsar connections for producing and consuming messages. 
I will also show off the flexibility of Apache Pulsar to work with other messaging protocols by sending and receiving messages with AMQP, Kafka, and MQTT.",[48,59819,59820],{},"Finally, I want to mention that there is also an advanced Reactive framework that is a great option for developers building Reactive Pulsar applications in Spring.",[40,59822,59823],{"id":46395},"Building an Air-quality Application with Spring and Pulsar",[48,59825,59826,59827,59830],{},"Below is a diagram of the example application that I will build. As you can see, Apache Pulsar is the lynchpin of this design. Pulsar acts as a router, gateway, messaging bus, and data distribution channel.\n",[384,59828],{"alt":18,"src":59829},"\u002Fimgs\u002Fblogs\u002F63be840ca272470b1265cfe9_image1.png","AirQuality Architecture from Ingest to Real-Time Analytics\nOne of the key reasons we use Apache Pulsar is for its ability to store and distribute messages at any scale to any number of clients and applications. This makes it easy to build on and use data without duplication, which is ideal for many purposes, such as ETL with Spark and Real-Time Continuous SQL Analytics with Flink. Pulsar also allows our Spring microservices to interoperate seamlessly with services written in other languages, such as Go, Python, C#, C++, Node.JS, and more.",[48,59832,59833,59834,59838],{},"Here is the ",[55,59835,53164],{"href":59836,"rel":59837},"https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=yYLLyiBo8nM",[264]," for my example application.",[48,59840,59841],{},"After building the empty application with the Spring Boot starter, we need to add a few things to our Maven build pom. You can also build with Gradle.",[48,59843,59844,59845,59850],{},"First, we set the version of Pulsar to build against. For this example, I chose ",[55,59846,59849],{"href":59847,"rel":59848},"https:\u002F\u002Fpulsar.apache.org\u002Fen\u002Frelease-notes\u002F",[264],"Pulsar 2.10.0",". I am also using JDK 11. At this point, we should not be using JDK 8 because JDK 17 will become the standard version soon.",[8325,59852,59855],{"className":59853,"code":59854,"language":8330},[8328],"\n       11\n       2.10.0\n    \n",[4926,59856,59854],{"__ignoreMap":18},[48,59858,46418],{},[8325,59860,59863],{"className":59861,"code":59862,"language":8330},[8328],"\n        org.apache.pulsar\n        pulsar-client\n        ${pulsar.version}\n    \n    \n        org.apache.pulsar\n        pulsar-client-admin\n        ${pulsar.version}\n    \n    \n        org.apache.pulsar\n        pulsar-client-original\n        ${pulsar.version}\n        pom\n    \n",[4926,59864,59862],{"__ignoreMap":18},[48,59866,46427],{},[48,59868,46436],{},[8325,59870,59872],{"className":59871,"code":46440,"language":8330},[8328],[4926,59873,46440],{"__ignoreMap":18},[48,59875,59876],{},"We need to populate our configuration file (application.properties) with the necessary values to connect to our cluster and ingest data. 
This file is typically in src\u002Fmain\u002Fresources.",[8325,59878,59881],{"className":59879,"code":59880,"language":8330},[8328],"airnowapi.url=${AIRPORTNOWAPIURL}\ntopic.name=persistent:\u002F\u002Fpublic\u002Fdefault\u002Fairquality\nproducer.name=airquality\nsend.timeout=60\nsecurity.mode=off\npulsar.service.url=pulsar:\u002F\u002Fpulsar1:6650\n#security.mode=on\n#pulsar.service.url=pulsar+ssl:\u002F\u002Fdemo.sndemo.snio.cloud:6651\npulsar.oauth2.audience=urn:sn:pulsar:sndemo:demo-cluster\npulsar.oauth2.credentials-url=file:\u002F\u002F\u002Fcr\u002Fsndemo-tspann.json\npulsar.oauth2.issuer-url=https:\u002F\u002Fauth.streamnative.cloud\u002F\nserver.port=8999\n#kafka\nkafka.bootstrapAddress=pulsar1:9092\nkafka.topic.name=airqualitykafka\n#mqtt\nmqtt.automaticReconnect=true\nmqtt.cleanSession=true\nmqtt.connectionTimeout=60\nmqtt.clientId=airquality-MQTT\nmqtt.hostname=pulsar1\nmqtt.port=1883\nmqtt.topic=airqualitymqtt\n#amqp\u002Frabbitmq\namqp.server=pulsar1:5672\namqp.topic=amqp-airquality\n",[4926,59882,59880],{"__ignoreMap":18},[48,59884,59885,59886,46459],{},"If you notice there is a security.mode and a pulsar.service.url that are commented out, these are so I can switch between my unsecured development environment and my production StreamNative hosted cloud version. We could automate this or use environment variables to make this more production quality. The airnowapi.url variable is set by the environment and includes a custom token to access Air Now REST feeds. You will need to ",[55,59887,29176],{"href":46457,"rel":59888},[264],[48,59890,59891],{},"We can now start building our application. First, we will need to configure our connection to our Apache Pulsar cluster.",[48,59893,59894],{},"We create a Spring Configuration class that will instantiate a Pulsar client. 
We need a number of parameters using @Value tags to inject them from our application.properties file.",[8325,59896,59899],{"className":59897,"code":59898,"language":8330},[8328],"@Configuration \npublic class PulsarConfig {\n    @Value(\"${pulsar.service.url}\")\n    String pulsarUrl;\n    @Value(\"${security.mode:off}\")\n    String securityMode;\n    @Value(\"${pulsar.oauth2.audience}\")\n    String audience;\n    @Value(\"${pulsar.oauth2.credentials-url}\")\n    String credentialsUrl;\n    @Value(\"${pulsar.oauth2.issuer-url}\")\n    String issuerUrl;\n\n    @Bean\n    public org.apache.pulsar.client.api.PulsarClient pulsarClient() {\n        PulsarClient client = null;\n\n        if (securityMode.equalsIgnoreCase(OFF)) {\n            try {\n                client = PulsarClient.builder().serviceUrl(pulsarUrl).build();\n            } catch (PulsarClientException e) {\n                e.printStackTrace();\n                client = null;\n            }\n        } else {\n            try {\n                try {\n                    client = PulsarClient.builder()\n                            .serviceUrl(pulsarUrl)\n                            .authentication(\n                              AuthenticationFactoryOAuth2.clientCredentials(\n                              new URL(issuerUrl),\n                              new URL(credentialsUrl),audience)\n                             ).build();\n                } catch (MalformedURLException e) {\n                    e.printStackTrace();\n                }\n            } catch (PulsarClientException e) {\n                e.printStackTrace();\n                client = null;\n            }\n        }\n        return client;\n    }\n}\n",[4926,59900,59898],{"__ignoreMap":18},[48,59902,59903],{},"We can now configure a producer to use in our service.",[8325,59905,59908],{"className":59906,"code":59907,"language":8330},[8328],"@Configuration\npublic class PulsarProducerConfig {\n    @Value(\"${producer.name:producername}\")\n    String producerName;\n\n    @Value(\"${topic.name:airquality}\")\n    String topicName;\n\n    @Autowired\n    PulsarClient pulsarClient;\n\n    @Bean\n    public Producer  getProducer() {\n        ProducerBuilder producerBuilder = pulsarClient.newProducer(JSONSchema.of(Observation.class))\n           .topic(topicName)\n           .producerName(producerName)\n           .sendTimeout(60, TimeUnit.SECONDS);\n\n        Producer producer = null;\n        try {\n            producer = producerBuilder.create();\n        } catch (PulsarClientException e1) {\n            e1.printStackTrace();\n        }\n        return producer;\n    }\n}\n",[4926,59909,59907],{"__ignoreMap":18},[48,59911,46480,59912,46486],{},[55,59913,46485],{"href":46483,"rel":59914},[264],[32,59916,46490],{"id":46489},[48,59918,46493,59919,190],{},[55,59920,59922],{"href":46606,"rel":59921},[264],"in this Github repo",[8325,59924,59927],{"className":59925,"code":59926,"language":8330},[8328],"@Service\npublic class PulsarService {\n    @Autowired\n    PulsarClient pulsarClient;\n\n    @Autowired\n    Producer producer;\n\n    public MessageId sendObservation(Observation observation) {\n        if (observation == null) {\n            return null;\n        }\n        UUID uuidKey = UUID.randomUUID();\n        MessageId msgID = null;\n        try {\n            msgID = producer.newMessage()\n                    .key(uuidKey.toString())\n                    .value(observation)\n                    .send();\n        } catch (PulsarClientException e) {\n            
e.printStackTrace();\n        }\n        return msgID;\n    }\n}\n",[4926,59928,59926],{"__ignoreMap":18},[48,59930,59931,59934],{},[384,59932],{"alt":18,"src":59933},"\u002Fimgs\u002Fblogs\u002F63be840cc6313b8bac487dfb_image2.jpeg",[384,59935],{"alt":18,"src":59936},"\u002Fimgs\u002Fblogs\u002F63be840d3f5b75e8d85fb9bd_image7.png",[32,59938,24840],{"id":46507},[48,59940,59941,59942,190],{},"Now that we have sent messages, we can also read them with Spring. In this section, we will build a consumer application to test ingesting the data. If we want to add logic, routing, or transformations to the events in one or more topics, we could use a Pulsar Function that we can write in Java, Python, or Go to achieve this instead of a Spring Boot microservices. I chose to do both. The source code for the Pulsar Spring Boot Consumer is ",[55,59943,59922],{"href":46623,"rel":59944},[264],[48,59946,59947,59948,59951,59952,59954],{},"An example Java Pulsar Function for processing air quality data is available ",[55,59949,59922],{"href":46517,"rel":59950},[264],". As you can see in our architecture diagram below, Functions, Microservices, Spark jobs and Flink jobs can all collaborate as part of real-time data pipelines with ease.\n",[384,59953],{"alt":18,"src":59829},"\nWe can reuse the connection configuration that we have from the Producer, but we need a configuration to produce our Consumer. The configuration class for the Consumer will need the consumer name, subscription name and topic name from the application.properties file. In the code we set the subscription type and starting point to Shared and Earliest. We are also using the JSON Schema for Observation as used in the Pulsar Producer.",[8325,59956,59959],{"className":59957,"code":59958,"language":8330},[8328],"@Configuration\npublic class PulsarConsumerConfig {\n    @Autowired\n    PulsarClient pulsarClient;\n\n    @Value(\"${consumer.name:consumerName}\")\n    String consumerName;\n\n    @Value(\"${topic.name:airquality}\")\n    String topicName;\n\n    @Value(\"${subscription.name:airqualitysubscription}\")\n    String subscriptionName;\n\n    @Bean\n    public Consumer getConsumer() {\n        Consumer pulsarConsumer = null;\n        ConsumerBuilder consumerBuilder =\n        pulsarClient.newConsumer(JSONSchema.of(Observation.class))\n                       .topic(topicName)\n                       .subscriptionName(subscriptionName)\n                       .subscriptionType(SubscriptionType.Shared)                     .subscriptionInitialPosition(SubscriptionInitialPosition.Earliest)\n                       .consumerName(consumerName);\n        try {\n            pulsarConsumer = consumerBuilder.subscribe();\n        } catch (PulsarClientException e) {\n            e.printStackTrace();\n        }\n        return pulsarConsumer;\n    }\n}\n",[4926,59960,59958],{"__ignoreMap":18},[48,59962,59963,59964,59966],{},"As we can see it is very easy to run the consumer. After we receive the event as a plain old Java object (POJO), we can do whatever we want with the data. For example, you could use another Spring library to store to a database, send to a REST service, or store to a file.\n",[384,59965],{"alt":18,"src":51684},[384,59967],{"alt":18,"src":59968},"\u002Fimgs\u002Fblogs\u002F63be840dc42a961b56adfc36_image3.png",[40,59970,2125],{"id":2122},[48,59972,59973],{},"We explored a number of protocols for communicating with Apache Pulsar clusters, but we did not explore all of them. 
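(One piece the consumer section above shows only as screenshots is the receive loop itself. For completeness, here is a minimal sketch of what it might look like, reusing the Consumer bean configured earlier; the service class name and the acknowledgement handling are illustrative rather than the post's exact code.)

```java
import org.apache.pulsar.client.api.Consumer;
import org.apache.pulsar.client.api.Message;
import org.apache.pulsar.client.api.PulsarClientException;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Service;

@Service
public class ObservationConsumerService {

    @Autowired
    Consumer<Observation> consumer;

    public void consumeForever() throws PulsarClientException {
        while (true) {
            // Block until the next Observation arrives on the subscription.
            Message<Observation> msg = consumer.receive();
            Observation observation = msg.getValue();

            // Do whatever we want with the POJO here: persist it, call a REST
            // service, write it to a file, and so on.

            consumer.acknowledge(msg);
        }
    }
}
```
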
We could also use RocketMQ, Websockets, or communicate via JDBC to the Pulsar SQL (Presto SQL) layer.",[48,59975,59976,59977,38617],{},"I also highly recommend that if you are interested in high-speed reactive applications, give the Reactive Pulsar library a try. It is a fast, impressive library that could have its own full article. Check out this ",[55,59978,59981],{"href":59979,"rel":59980},"https:\u002F\u002Fgithub.com\u002Flhotari\u002Freactive-iot-backend-ApacheCon2021",[264],"talk done by Lari Hotari at ApacheCon 2021",[48,59983,46594],{},[40,59985,4135],{"id":4132},[321,59987,59988,59998,60005,60012,60019,60026,60033,60040,60047],{},[324,59989,59990,758,59993],{},[2628,59991,59992],{},"Slides",[55,59994,59997],{"href":59995,"rel":59996},"https:\u002F\u002Fwww.slideshare.net\u002Fbunkertor\u002Fthe-dream-stream-team-for-pulsar-and-spring",[264],"The Dream Stream Team for Pulsar and Spring",[324,59999,60000,758,60002],{},[2628,60001,46603],{},[55,60003,46608],{"href":46606,"rel":60004},[264],[324,60006,60007,758,60009],{},[2628,60008,46603],{},[55,60010,46616],{"href":46517,"rel":60011},[264],[324,60013,60014,758,60016],{},[2628,60015,46603],{},[55,60017,46625],{"href":46623,"rel":60018},[264],[324,60020,60021,758,60023],{},[2628,60022,46603],{},[55,60024,46634],{"href":46632,"rel":60025},[264],[324,60027,60028,758,60030],{},[2628,60029,46603],{},[55,60031,46642],{"href":46575,"rel":60032},[264],[324,60034,60035,758,60037],{},[2628,60036,39680],{},[55,60038,46652],{"href":46650,"rel":60039},[264],[324,60041,60042,758,60044],{},[2628,60043,39680],{},[55,60045,46661],{"href":46659,"rel":60046},[264],[324,60048,60049,758,60051],{},[2628,60050,42753],{},[55,60052,46670],{"href":51763,"rel":60053},[264],[40,60055,58598],{"id":58597},[1666,60057,60058,60066,60071,60076,60079,60087,60094],{},[324,60059,51819,60060,1154,60063,60065],{},[55,60061,36487],{"href":36485,"rel":60062},[264],[55,60064,36491],{"href":36490}," developed by the original creators of Pulsar. This will get you started with Pulsar and help accelerate your streaming.",[324,60067,51828,60068,58616],{},[55,60069,3550],{"href":17075,"rel":60070},[264],[324,60072,59763,60073,57958],{},[55,60074,25339],{"href":35357,"rel":60075},[264],[324,60077,60078],{},"Build microservices with Pulsar: If you are interested in learning more about microservices and Pulsar, take a look at the following resources:",[324,60080,60081,46714,60083,51839,60085,190],{},[2628,60082,46713],{},[55,60084,267],{"href":51838},[55,60086,267],{"href":51838},[324,60088,60089,758,60091],{},[2628,60090,42753],{},[55,60092,51850],{"href":58635,"rel":60093},[264],[324,60095,60096,758,60098],{},[2628,60097,40436],{},[55,60099,51857],{"href":44957},[48,60101,3931],{},{"title":18,"searchDepth":19,"depth":19,"links":60103},[60104,60105,60109,60110,60111],{"id":46362,"depth":19,"text":46363},{"id":46395,"depth":19,"text":59823,"children":60106},[60107,60108],{"id":46489,"depth":279,"text":46490},{"id":46507,"depth":279,"text":24840},{"id":2122,"depth":19,"text":2125},{"id":4132,"depth":19,"text":4135},{"id":58597,"depth":19,"text":58598},"2022-05-26","Explore the ease of integrating Apache Pulsar with Spring Boot using Spring for Pulsar. Learn how to leverage the Spring ecosystem to quickly and easily develop Pulsar-based applications. 
A step by step guide for developers looking to simplify their work with Pulsar and Spring.","\u002Fimgs\u002Fblogs\u002F63be83f2ac4c2ab747f5238c_spring-top.png",{},{"title":46724,"description":60113},"blog\u002Fspring-into-pulsar",[821,8058],"zy7kHbQGFOg2NFj4wUlyk3X82OX7NWKvoCLWI3TYGGQ",{"id":60121,"title":60122,"authors":60123,"body":60125,"category":821,"createdAt":290,"date":60428,"description":60429,"extension":8,"featured":294,"image":60430,"isDraft":294,"link":290,"meta":60431,"navigation":7,"order":296,"path":60432,"readingTime":4475,"relatedResources":290,"seo":60433,"stem":60434,"tags":60435,"__hash__":60436},"blogs\u002Fblog\u002Fdeep-dive-into-message-chunking-in-pulsar.md","Deep Dive into Message Chunking in Pulsar",[60124],"Zike Yang",{"type":15,"value":60126,"toc":60413},[60127,60130,60133,60136,60139,60150,60154,60157,60161,60167,60171,60174,60177,60180,60183,60187,60190,60194,60197,60202,60205,60209,60222,60225,60229,60236,60247,60251,60254,60258,60261,60264,60273,60277,60280,60283,60291,60295,60298,60301,60304,60307,60310,60316,60319,60323,60326,60337,60346,60350,60363,60365,60373,60376,60411],[48,60128,60129],{},"Apache Pulsar™, like all messaging systems, imposes a size limit on each message sent to the broker. This prevents the payload of each message from exceeding the maxMessageSize set in the broker. The default value of the broker configuration maxMessageSize is 5 MB.",[48,60131,60132],{},"However, many users need the Pulsar client to send large messages to the broker for use cases such as image processing and audio processing. You can achieve this by adjusting the maxMessageSize; however, this approach can cause many problems. For example, if a client publishes a message of 100 MB and the broker allows storing this message into the bookie, then the bookie will spend too many resources on processing this message. This will impact other clients on publishing and cause backlog draining.",[48,60134,60135],{},"Therefore, instead of increasing the value of maxMessageSize, Pulsar provides a message chunking feature to enable sending large messages. With message chunking, the producer can split a large message into multiple chunks based on maxMessageSize and send each chunk to the broker as an ordinary message. The consumer then combines the chunks back to the original message.",[48,60137,60138],{},"In this blog, we will explain the concept of message chunking, deep dive into its implementation, and share best practices for this feature, including:",[321,60140,60141,60144,60147],{},[324,60142,60143],{},"How to use message chunking correctly.",[324,60145,60146],{},"Issues you may encounter when using chunk messages.",[324,60148,60149],{},"How to debug chunked messages.",[40,60151,60153],{"id":60152},"how-message-chunking-works","How Message Chunking Works",[48,60155,60156],{},"Message chunking is a process by which a large message can be split into multiple chunks for production and consumption. When using message chunking, you don’t need to worry about how Pulsar splits large messages and combines them or the detail of handling chunked messages. Pulsar does all of these things for you. 
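For orientation, the only client-side switch needed is on the producer builder; the consumer needs no special configuration. Below is a minimal sketch, assuming a local standalone cluster and an illustrative topic name (chunking applies to persistent topics and cannot be combined with batching):

```java
import org.apache.pulsar.client.api.Producer;
import org.apache.pulsar.client.api.PulsarClient;
import org.apache.pulsar.client.api.PulsarClientException;

public class ChunkedProducerExample {
    public static void main(String[] args) throws PulsarClientException {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650")   // illustrative local cluster
                .build();

        Producer<byte[]> producer = client.newProducer()
                .topic("persistent://public/default/large-messages") // illustrative topic
                .enableChunking(true)    // split payloads larger than maxMessageSize into chunks
                .enableBatching(false)   // chunking cannot be combined with batching
                .create();

        byte[] largePayload = new byte[10 * 1024 * 1024]; // e.g. a 10 MB message
        producer.send(largePayload);                      // split and reassembled transparently

        producer.close();
        client.close();
    }
}
```
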
In this section, let's look at how message chunking works in different scenarios.",[32,60158,60160],{"id":60159},"a-single-producer-publishes-chunked-messages-to-a-topic","A single producer publishes chunked messages to a topic",[48,60162,60163],{},[384,60164],{"alt":60165,"src":60166},"Splitting Chunked Messages","\u002Fimgs\u002Fblogs\u002F63b3f14d3e6819ff036567df_screen-shot-2022-05-10-at-9.25.49-am.png",[32,60168,60170],{"id":60169},"publishing-chunked-messages","Publishing Chunked Messages",[48,60172,60173],{},"Once the large message is split into chunks, the producer sends each chunk as an ordinary message to the broker. Each chunk is still subject to the flow control of the producer and the memory limiter of the client as if it were an ordinary message. There is a maxPendingMessages parameter in the producer configuration that limits the maximum number of messages the producer can publish concurrently. Each chunk is counted individually in maxPendingMessages. This means that sending a large message with three chunks will take up three messages in the producer for the pending message.",[48,60175,60176],{},"When sending each chunk, an individual OpSendMsg is created for each chunk. Each OpSendMsg shares the same ChunkMessageCtx that is used to return the chunked message ID to the user. Each chunk also shares the same uuid of the chunked message in its message metadata.",[48,60178,60179],{},"The producer sends each chunk to the broker in order, and the broker receives the chunks in order. This ensures that the entire chunked message is sent successfully when the publishing acknowledgment of the last chunk is received. This also ensures that all of the chunks of a chunked message are stored in the topic in order. The consumer relies on this ordering guarantee to consume the chunks in order.",[48,60181,60182],{},"Regarding a partitioned topic, the producer publishes all of the chunks of the same large message to the same partition.",[32,60184,60186],{"id":60185},"chunked-message-id","Chunked Message ID",[48,60188,60189],{},"In Pulsar, after an ordinary message is published or consumed, its message ID is returned to the user. For chunked messages, how does Pulsar return the message ID to the user?",[3933,60191,60193],{"id":60192},"before-pulsar-2100","Before Pulsar 2.10.0",[48,60195,60196],{},"Before Pulsar 2.10.0, the producer or consumer only returns the message ID of the last chunk as the message ID of the entire chunked message. This implementation sometimes raises issues. For example, when we use this message ID to seek, the consumer will consume from the position of the last chunk. The consumer will mistakenly think the previous chunks are lost and choose to skip the current message. If we use inclusive seek, the consumer may skip the first message, which causes unexpected behavior.",[48,60198,60199],{},[384,60200],{"alt":60193,"src":60201},"\u002Fimgs\u002Fblogs\u002F63b3f14d79950d580d8c90d7_screen-shot-2022-05-10-at-9.26.22-am.png",[48,60203,60204],{},"As shown in the image above, the consumer returns the message of the last chunk as the message ID of the chunked message to the user. If the consumer uses this message ID to do an inclusive seek, in the broker's view, the consumer is seeking M1-C3. According to the semantics of inclusive seek, the first message consumed by the consumer after performing an inclusive seek should be the message of the current seek position. So the first message to be consumed by the consumer after the seek is supposed to be message M1. 
But in fact, the first message that comes to the consumer is M1-C3. The consumer then discovers that it has not received the previous chunks of that chunked message and cannot continue to receive the previous chunks, so it will discard M1. Therefore, the first message consumed is actually M2. This is an unexpected behavior.",[3933,60206,60208],{"id":60207},"introducing-chunk-message-id-in-pulsar-2100","Introducing Chunk Message ID in Pulsar 2.10.0",[48,60210,60211,60212,60217,60218,60221],{},"To solve this issue, Pulsar introduced the feature of chunk message ID in version 2.10.0. Issue ",[55,60213,60216],{"href":60214,"rel":60215},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fissues\u002F12402",[264],"PIP 107"," details the proposal of this feature. The chunk message ID is consistent with the original behavior to achieve compatibility with the original logic. It contains two ordinary message IDs: the message ID of the first chunk and the one of the last chunk. Both the producer and the consumer use chunked message context to generate the chunked message ID.\n",[384,60219],{"alt":18,"src":60220},"\u002Fimgs\u002Fblogs\u002F63b3f14d33ad8cbc3a7be3e7_screen-shot-2022-05-10-at-9.27.03-am.png","\nAs shown in the figure above, the producer gets the message ID of the first chunk (corresponding to “first chunk mid”) and the message ID of the last chunk while receiving the publishing acknowledgment of each chunk. The producer buffers them in the chunked message context and generates the chunk message ID after receiving the last chunked message ID. The same is true for the consumer. The difference is that the consumer gets the message ID by receiving the message.",[48,60223,60224],{},"The chunk message ID feature not only fixes the issues caused by seeking, but also allows you to get more information about the chunked message.",[32,60226,60228],{"id":60227},"combining-chunks","Combining Chunks",[48,60230,60231,60232,60235],{},"The consumer needs to combine all of the chunks into the original message before returning it to the application. The consumer uses the chunked message context to buffer all the chunk data, such as payload, metadata, and message ID. When processing chunked messages, the consumer assumes that the chunks are received in order and will discard the whole chunked message if the received chunks are out of order.\n",[384,60233],{"alt":18,"src":60234},"\u002Fimgs\u002Fblogs\u002F63b3f14d788d5d75f421f28b_screen-shot-2022-05-10-at-9.27.33-am.png","\nSuppose we have published two large messages, “abc” and “de,” as shown in the image above. They are waiting to be consumed on the topic. The maxMessageSize for the broker is set too small (just 1 byte), resulting in a small payload size in every chunk.",[48,60237,60238,60239,60242,60243,60246],{},"When the consumer consumes the first message, it finds the message to be a chunked message and creates a ChunkedMessageCtx. (We did not list all of the fields in the chunked message context in the example above.) The uuid uniquely identifies the chunked message so that we can know in which context the chunk should be placed. The lastChunkedMessageId in the context means the chunked ID of the last received chunk. It will be updated whenever the consumer receives a new chunk. The payload of the context is the payload of the entire chunked message currently buffered. 
It will keep growing as the consumer receives more chunks.\n",[384,60240],{"alt":18,"src":60241},"\u002Fimgs\u002Fblogs\u002F63b3f14d3e68191146656859_screen-shot-2022-05-10-at-9.28.10-am.png","\nOnce the consumer receives all the chunks of the message with the uuid of uuid1, it can use the chunked message context to generate the original message. The complete message is then returned to the application and the consumer releases that context. Note that because we have received a chunk of message uuid2 during this process, a second chunked message context is created in the consumer.\n",[384,60244],{"alt":18,"src":60245},"\u002Fimgs\u002Fblogs\u002F63b3f14de4af5c198bc1e4df_screen-shot-2022-05-10-at-9.28.51-am.png","\nJust like the previous example, when the consumer receives all of the chunks of uuid2, it generates a new message from the chunked message context and returns the complete message to the application.",[40,60248,60250],{"id":60249},"best-practices-for-message-chunking","Best Practices for Message Chunking",[48,60252,60253],{},"In this section, we share some best practices for using message chunking.",[40,60255,60257],{"id":60256},"dont-use-large-message-metadata","Don’t use large message metadata",[48,60259,60260],{},"It is not recommended to set very large message metadata in chunked messages. The producer often publishes chunks with the maximum payload size. In the process of writing a chunk to a bookie, if the header part (which includes the message metadata) of the chunk exceeds 10 KB (the padding max frame size for a bookie), there will be an error.",[48,60262,60263],{},"The maxMessageSize limits the size of the payload for each message from the client to the broker, but it doesn't count the size of the header. In BookKeeper, there is a similar setting called nettyMaxFrameSizeBytes that limits the size of writing to each message. Any message sent to the BookKeeper larger than nettyMaxFrameSizeBytes will be rejected. In BookKeeper, the message size calculation includes the message header and payload. There are differences in the way the BookKeeper and the broker calculate maxMessageSize. As a result, the broker will reserve some padding size for the message's header, and the value is 10 KB (which is defined in Commands.MESSAGE_SIZE_FRAME_PADDING). The broker will set nettyMaxFrameSizeBytes to maxMessageSize plus 10 KB when establishing a connection with the BookKeeper.",[48,60265,60266,60267,60272],{},"For this reason, you need to ensure the size of the message header does not exceed 10 KB when sending large messages. You should also not exceed this limit when setting values of the key and other properties of large messages. There has been a ",[55,60268,60271],{"href":60269,"rel":60270},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fissues\u002F13591",[264],"new proposal"," to lift this limitation by including the header part on the client-side when splitting the message.",[40,60274,60276],{"id":60275},"topic-level-maxmessagesize-restriction","Topic level maxMessageSize restriction",[48,60278,60279],{},"It is not recommended to set the topic level maxMessageSize for a topic if you want to publish chunked messages to that topic.",[48,60281,60282],{},"The topic level maxMessageSize was introduced in Pulsar 2.7.1, and it can cause some problems when message chunking is enabled. As mentioned above, the chunked message splitting uses broker level maxMessageSize as the chunk size. 
In most use cases, the topic level maxMessageSize is always less or equal to the broker level maxMessageSize. In this case, publishing a chunked message will be rejected by the broker. Because the payload size of some chunks in this message will reach the broker level maxMessageSize and exceed topic level maxMessageSize, causing the broker to reject it. In addition, the broker calculates the header and payload together when checking whether the message exceeds the topic level maxMessageSize. Therefore, when we use the chunked message feature, you should be careful not to set the topic level maxMessageSize on that topic.",[48,60284,60285,60286,38617],{},"A fix to this known issue has been released in Pulsar 2.10. The topic level maxMessageSize check for chunked messages is removed. Upgrade your Pulsar version to 2.10 to get this fix. Read ",[55,60287,60290],{"href":60288,"rel":60289},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fissues\u002F13544",[264],"PIP-131",[40,60292,60294],{"id":60293},"best-practices-for-the-consumer","Best practices for the consumer",[48,60296,60297],{},"You can use maxPendingChunkedMessage, expireTimeOfIncompleteChunkedMessage, and autoAckOldestChunkedMessageOnQueueFull in the consumer configuration to control the memory used in the consumer for chunked messages.",[48,60299,60300],{},"The chunked message contexts are buffered in the client’s memory. As the consumer creates more context, it may take up too much memory and lead to an out-of-memory error. Therefore, Pulsar introduced maxPendingChunkedMessage in the consumer configuration. It limits the maximum number of chunked message contexts that a consumer can maintain concurrently.",[48,60302,60303],{},"In addition, you can also set the expiration time of the chunked message context by setting the expireTimeOfIncompleteChunkedMessage in the consumer configuration. If the producer fails to publish all of the chunks of a message, resulting in the consumer unable to receive all of the chunks with the expiration time, the consumer will then expire incomplete chunks. The default value is one minute.",[48,60305,60306],{},"You can also delete the oldest context when the maximum number of pending chunked messages has reached the context is expired using a setting called autoAckOldestChunkedMessageOnQueueFull. If the setting is set to true, the consumer will drop the chunked message context that you want to delete by silently acknowledging it; if not, the consumer will mark the message for redelivery.",[48,60308,60309],{},"Below is an example of how to configure message chunking on the consumer.",[8325,60311,60314],{"className":60312,"code":60313,"language":8330},[8328],"Consumer consumer = client.newConsumer()\n        .topic(topic)\n        .subscriptionName(\"test\")\n        .autoAckOldestChunkedMessageOnQueueFull(true)\n        .maxPendingChunkedMessage(100)\n        .expireTimeOfIncompleteChunkedMessage(10, TimeUnit.MINUTES)\n        .subscribe();\n",[4926,60315,60313],{"__ignoreMap":18},[48,60317,60318],{},"It is also not recommended to publish chunked messages that are too large as it can lead to high memory usage on the consumer side. 
Although the consumer can limit the number of chunked messages that can be buffered at the same time, there is no easy way to control the amount of memory used by buffered chunked messages.",[40,60320,60322],{"id":60321},"best-practices-for-debugging","Best practices for debugging",[48,60324,60325],{},"In the broker, you can debug message chunking by checking certain stats in a topic. Below are three commonly used ones:",[1666,60327,60328,60331,60334],{},[324,60329,60330],{},"msgChunkPublished: A boolean type. It shows whether this topic has a chunked message published on it.",[324,60332,60333],{},"​​chunkedMessageRate in publishers: It shows the total count of chunked messages received for this producer on this topic.",[324,60335,60336],{},"chunkedMessageRate in subscriptions and consumers: It tells you the chunked message dispatch rate in the subscription or the consumer.",[48,60338,60339,60340,60345],{},"Read ",[55,60341,60344],{"href":60342,"rel":60343},"https:\u002F\u002Fpulsar.apache.org\u002Fdocs\u002Fen\u002Fadmin-api-topics\u002F#get-stats",[264],"Manage Topics"," in the Pulsar documentation to learn more.",[40,60347,60349],{"id":60348},"upgrade-your-pulsar-version","Upgrade your Pulsar version",[48,60351,60352,60353,4003,60358,190],{},"It’s best to keep your Pulsar version up to date as the Pulsar community continues to optimize message chunking. We recommend updating your Pulsar version to 2.10 or later. There are important bug fixes for message chunking, such as ",[55,60354,60357],{"href":60355,"rel":60356},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F13454",[264],"fixing memory leak with the flow control of chunked messages",[55,60359,60362],{"href":60360,"rel":60361},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F12403",[264],"fixing the issue with seeking chunked messages",[40,60364,36477],{"id":36476},[48,60366,60367,60368,190],{},"If you have any better ideas or encounter any issues when using message chunking, please feel free to create an issue in the ",[55,60369,60372],{"href":60370,"rel":60371},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fissues",[264],"Pulsar repo",[48,60374,60375],{},"You can find more details about the message chunking implementation in the following links:",[321,60377,60378,60385,60395,60401],{},[324,60379,60380],{},[55,60381,60384],{"href":60382,"rel":60383},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fwiki\u002FPIP-37:-Large-message-size-handling-in-Pulsar",[264],"PIP 37: Large message size handling in Pulsar",[324,60386,60387],{},[55,60388,60390,60391,60394],{"href":60389},"%5Bhttps:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F4400%5D(https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F4400)","PR: PIP 37: ",[2628,60392,60393],{},"Pulsar-client"," support large message size",[324,60396,60397],{},[55,60398,60400],{"href":60214,"rel":60399},[264],"PIP 107: Introduce the chunk message ID",[324,60402,60403],{},[55,60404,60406,60407,60410],{"href":60405},"%5Bhttps:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F12403%5D(https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F12403)","PR: [PIP 107]",[2628,60408,60409],{},"Client"," Introduce chunk message 
ID",[48,60412,3931],{},{"title":18,"searchDepth":19,"depth":19,"links":60414},[60415,60421,60422,60423,60424,60425,60426,60427],{"id":60152,"depth":19,"text":60153,"children":60416},[60417,60418,60419,60420],{"id":60159,"depth":279,"text":60160},{"id":60169,"depth":279,"text":60170},{"id":60185,"depth":279,"text":60186},{"id":60227,"depth":279,"text":60228},{"id":60249,"depth":19,"text":60250},{"id":60256,"depth":19,"text":60257},{"id":60275,"depth":19,"text":60276},{"id":60293,"depth":19,"text":60294},{"id":60321,"depth":19,"text":60322},{"id":60348,"depth":19,"text":60349},{"id":36476,"depth":19,"text":36477},"2022-05-10","Learn about how message chunking works in Pulsar and the best practices for this feature.","\u002Fimgs\u002Fblogs\u002F63c7fa6e403da9ab01a37dbe_63b3f0d7e1511bdb30846ca2_content-blog-banner-image.png",{},"\u002Fblog\u002Fdeep-dive-into-message-chunking-in-pulsar",{"title":60122,"description":60429},"blog\u002Fdeep-dive-into-message-chunking-in-pulsar",[821],"lLfxx8a7xamTJqsijimNcyXpjKlf8dBfN4Jhd42Zuwg",{"id":60438,"title":60439,"authors":60440,"body":60442,"category":3550,"createdAt":290,"date":60480,"description":60439,"extension":8,"featured":294,"image":60481,"isDraft":294,"link":290,"meta":60482,"navigation":7,"order":296,"path":60483,"readingTime":20144,"relatedResources":290,"seo":60484,"stem":60485,"tags":60486,"__hash__":60487},"blogs\u002Fblog\u002Fintroducing-streamnative-cloud-for-kafka.md","Introducing StreamNative Cloud for KafkaⓇ",[60441],"Addison Higham",{"type":15,"value":60443,"toc":60478},[60444,60450,60453,60456,60459,60462,60465],[48,60445,60446,60447,190],{},"We’re excited to announce StreamNative Cloud for KafkaⓇ. This cloud-native managed service is built upon Apache Pulsar and provides a fully compliant Kafka API implementation (without the limits of Kafka). It supports millions of topics with consistent ",[55,60448,60449],{"href":51792},"low latency",[48,60451,60452],{},"In 2020, StreamNative released Kafka-on-Pulsar (KoP) to the open-source community. This project, co-developed with companies like Tencent, re-implemented the Kafka API inside Pulsar, allowing Kafka developers and operators to leverage Pulsar’s cloud-native architecture and advantages. Since that time, KoP has continued to mature and helps teams to fix the pain points of Kafka without having to change their application code.",[48,60454,60455],{},"In parallel, StreamNative has continued to develop our cloud offering, which gives customers the unique ability to run a fully managed cluster either in the StreamNative Cloud or in their own cloud. The newly released simplified provisioning process and enhancements to the networking model are improvements that help support KoP and the evolution of StreamNative Cloud.",[48,60457,60458],{},"StreamNative Cloud for KafkaⓇ builds on the foundation of KoP, and provides a fully-managed Kafka API with all of the existing functionality of Apache Pulsar. Bringing these technologies together enables teams to seamlessly scale and support millions of topics. By utilizing StreamNative Cloud for KafkaⓇ, your existing Kafka applications and services can leverage the benefits of Pulsar’s architecture - without complex code modifications - while new applications gain the benefits of Pulsar’s flexible messaging model.",[48,60460,60461],{},"In addition to fully supporting the Kafka API, StreamNative Cloud for KafkaⓇ utilizes Pulsar’s built-in multi-tenancy and geo-replication capabilities. 
Teams no longer need to design work-arounds to share a Kafka cluster, instead, Pulsar’s inclusion of tenants, namespaces, and enforceable policies allows for a robust model to share a cluster. This model improves cluster utilization and simplifies sharing data, especially in event-driven architectures and micro-services based applications. Geo-replication, (currently in beta in StreamNative Cloud,) allows for global applications with a single API to configure a shared namespace across regions or cloud providers.",[48,60463,60464],{},"For users who need to interconnect with a large number of systems, the combined ecosystems of Pulsar and Kafka are now available to them. Third party APIs and integrations that communicate over the Kafka protocol can now communicate directly to a Kafka-enabled StreamNative cluster, while teams continue to develop their applications using Pulsar’s flexible, unified messaging API.",[48,60466,60467,60468,60473,60474,190],{},"StreamNative is excited to see what teams are able to build with the combined power of Pulsar’s cloud-native architecture and the reach and breadth of the Kafka API. Today, we are launching the StreamNative Cloud for KafkaⓇ feature as a private beta. You can ",[55,60469,60472],{"href":60470,"rel":60471},"https:\u002F\u002Fhubs.ly\u002FQ018wd_J0",[264],"sign up for the beta"," and learn more ",[55,60475,267],{"href":60476,"rel":60477},"https:\u002F\u002Fhubs.ly\u002FQ018wf7V0",[264],{"title":18,"searchDepth":19,"depth":19,"links":60479},[],"2022-04-21","\u002Fimgs\u002Fblogs\u002F63c7fa7b904666d18811a9c7_63b3f06d9ebbeed3e8bf4f3d_sncfk-1600x660-banner.png",{},"\u002Fblog\u002Fintroducing-streamnative-cloud-for-kafka",{"title":60439,"description":60439},"blog\u002Fintroducing-streamnative-cloud-for-kafka",[799,302],"vBOzcqCdxeA51xElxRtoFlkZskW9Zp7aEcLRGR71y1g",{"id":60489,"title":60490,"authors":60491,"body":60492,"category":821,"createdAt":290,"date":60951,"description":60952,"extension":8,"featured":294,"image":60953,"isDraft":294,"link":290,"meta":60954,"navigation":7,"order":296,"path":60955,"readingTime":31039,"relatedResources":290,"seo":60956,"stem":60957,"tags":60958,"__hash__":60959},"blogs\u002Fblog\u002Fwhat-flip-is-flip-stack.md","What the FLiP is the FLiP Stack?",[46357],{"type":15,"value":60493,"toc":60935},[60494,60498,60501,60504,60513,60517,60520,60526,60530,60537,60540,60543,60572,60576,60588,60591,60597,60600,60628,60632,60635,60641,60645,60659,60663,60695,60699,60740,60744,60750,60753,60756,60759,60763,60769,60773,60779,60783,60789,60792,60796,60802,60806,60812,60816,60822,60826,60832,60835,60839,60845,60851,60854,60860,60862,60865,60868,60870,60914,60916],[40,60495,60497],{"id":60496},"introduction-to-flip-stack","Introduction to FLiP Stack",[48,60499,60500],{},"In this article on the FLiP Stack, we will explain how to build a real-time event driven application using the latest open source frameworks. We will walk through building a Python IoT application utilizing Apache Pulsar, Apache Flink, Apache Spark, and Apache NiFi. You will see how quickly we can build applications for a plethora of use cases. The easy, fast, scalable way: The FLiP Way.",[48,60502,60503],{},"The FLiP Stack is a number of open source technologies that work well together. FLiP is a best practice pattern for building a variety of streaming data applications. The projects in the stack are dictated by the needs of that use case; the available technologies in the builder’s current stack; and the desired end results. 
As we shall see, there are several variations of the FLiP stack built upon the base of Apache Flink and Apache Pulsar.",[48,60505,60506,60507,60512],{},"For some use cases like log analytics, you will need a nice dashboard for visualizing, aggregating, and querying your log data. For that one you would most likely want something like FLiPEN, as an enhancement to the ",[55,60508,60511],{"href":60509,"rel":60510},"https:\u002F\u002Fwww.elastic.co\u002Fwhat-is\u002Felk-stack",[264],"ELK Stack",". As you can tell, FLiP+ is a moving list of acronyms for open source projects that are commonly used together.",[40,60514,60516],{"id":60515},"common-use-cases","Common Use Cases",[48,60518,60519],{},"With so many variations of the FLiP stack, it might be difficult to figure out which one is right for you. Therefore, we have provided some general guidelines for selecting the proper FLiP+ stack to use based on your use case. We already mentioned Log Analytics, which is a common use case. There are many more, driven usually by data sources and data sinks.",[48,60521,60522],{},[384,60523],{"alt":60524,"src":60525},"table Common Use Cases","\u002Fimgs\u002Fblogs\u002F63b3eef05b8e181733991394_image-6.webp",[40,60527,60529],{"id":60528},"flink-pulsar-integration","Flink-Pulsar Integration",[48,60531,60532,60533,60536],{},"A critical component of the FLiP stack is utilizing ",[55,60534,31802],{"href":31800,"rel":60535},[264]," as a stream processing engine against Apache Pulsar data. This is enabled by the Pulsar-Flink Connector that enables developers to build Flink applications natively and stream in events from Pulsar at scale as they happen. This allows for use cases such as streaming ELT and continuous SQL on joined topic streams. SQL is the language of business that can drive event-driven, real-time applications by writing simple SQL queries against Pulsar streams with Flink SQL, including aggregation and joins.",[48,60538,60539],{},"The connector builds an elastic data processing platform enabled by Apache Pulsar and Apache Flink that is seamlessly integrated to allow full read and write access to all Pulsar messages at any scale. 
As a citizen data engineer or analyst you can focus on building business logic without concern about where the data is coming from or how it is stored.",[48,60541,60542],{},"Check out the resources below to learn more about this connector:",[321,60544,60545,60554,60562],{},[324,60546,60547,758,60549],{},[2628,60548,40436],{},[55,60550,60553],{"href":60551,"rel":60552},"https:\u002F\u002Fflink.apache.org\u002F2021\u002F01\u002F07\u002Fpulsar-flink-connector-270.html",[264],"What's New in the Pulsar Flink Connector 2.7.0",[324,60555,60556,758,60558],{},[2628,60557,40436],{},[55,60559,60561],{"href":60560},"\u002Fblog\u002Frelease\u002F2021-04-20-flink-sql-on-streamnative-cloud\u002F","Flink SQL on StreamNative Cloud",[324,60563,60564,758,60567],{},[2628,60565,60566],{},"Code",[55,60568,60571],{"href":60569,"rel":60570},"https:\u002F\u002Fgithub.com\u002Ftspannhw\u002FFLiP-SQL",[264],"Streaming Analytics with Apache Pulsar and Apache Flink SQL",[40,60573,60575],{"id":60574},"nifi-pulsar-integration","NiFi-Pulsar Integration",[48,60577,60578,60579,60582,60583,60587],{},"If you have been following ",[55,60580,34443],{"href":60581},"\u002Fblog\u002F",", you have seen the recent formal ",[55,60584,60586],{"href":60585},"\u002Fblog\u002Frelease\u002F2022-03-09-cloudera-and-streamnative-announce-the-integration-of-apache-nifi-and-apache-pulsar\u002F","announcement"," of the Apache Pulsar processor for Apache NiFi release. We now have an official way to consume and produce messages from any Pulsar topic with the low code streaming tool that is Apache NiFi.",[48,60589,60590],{},"This integration allows us to build a real-time data processing and analytics platform for all types of rich data pipelines. This is the keystone connector for the democratization of streaming application development.",[48,60592,60593],{},[384,60594],{"alt":60595,"src":60596},"illustration NiFi-Pulsar Integration","\u002Fimgs\u002Fblogs\u002F63b3ef11c96af41a85b37e01_screen-shot-2022-04-14-at-1.29.45-pm.png",[48,60598,60599],{},"Read the articles below to learn more:",[321,60601,60602,60609,60618],{},[324,60603,60604,758,60606],{},[2628,60605,40436],{},[55,60607,60608],{"href":60585},"Cloudera and StreamNative Announce the Integration of Apache NiFi and Apache Pulsar",[324,60610,60611,758,60613],{},[2628,60612,40436],{},[55,60614,60617],{"href":60615,"rel":60616},"https:\u002F\u002Fwww.datainmotion.dev\u002F2021\u002F11\u002Fproducing-and-consuming-pulsar-messages.html",[264],"Producing and Consuming Pulsar messages with Apache NiFi",[324,60619,60620,758,60623],{},[2628,60621,60622],{},"Datanami Article",[55,60624,60627],{"href":60625,"rel":60626},"https:\u002F\u002Fwww.datanami.com\u002F2022\u002F03\u002F09\u002Fcode-for-pulsar-nifi-tie-up-now-open-source\u002F",[264],"Code for Pulsar, NiFi Tie-Up Now Open Source",[40,60629,60631],{"id":60630},"an-example-flip-stack-application","An Example FLiP Stack Application",[48,60633,60634],{},"Now that you have seen the combinations, use cases, and the basic integration, we can walk through an example FLiP Stack application. 
In this example, we will be ingesting sensor data from a device running a Python Pulsar application.",[48,60636,60637],{},[384,60638],{"alt":60639,"src":60640},"Demo Edge Hardware Specifications","\u002Fimgs\u002Fblogs\u002F63b3ef129b1cdfa4cb43d43a_screen-shot-2022-04-14-at-2.25.52-pm.png",[32,60642,60644],{"id":60643},"demo-edge-software-specification","Demo Edge Software Specification",[321,60646,60647,60650],{},[324,60648,60649],{},"Apache Pulsar C++ and Python Client",[324,60651,60652,60653,60658],{},"Pimoroni ",[55,60654,60657],{"href":60655,"rel":60656},"https:\u002F\u002Fgithub.com\u002Fpimoroni\u002Fsgp30-python",[264],"SGP30"," Python Library",[32,60660,60662],{"id":60661},"streaming-server","Streaming Server",[321,60664,60665,60668,60671,60674,60677,60680,60683,60686,60689,60692],{},[324,60666,60667],{},"HP ProLiant DL360 G7 1U RackMount 64-bit Server",[324,60669,60670],{},"Ubuntu 18.04.6 LTS",[324,60672,60673],{},"72GB PC3 RAM",[324,60675,60676],{},"X5677 Xeon 3.46GHz CPU with 24 Cores",[324,60678,60679],{},"4×900GB 10K SAS SFF HDD",[324,60681,60682],{},"Apache Pulsar 2.9.1",[324,60684,60685],{},"Apache Spark 3.2.0",[324,60687,60688],{},"Scala version 2.12.15 (OpenJDK 64-Bit Server VM, Java 1.8.0_302)",[324,60690,60691],{},"Apache Flink 1.13.2",[324,60693,60694],{},"MongoDB",[32,60696,60698],{"id":60697},"nifiai-server","NiFi\u002FAI Server",[321,60700,60701,60704,60707,60710,60713,60716,60719,60722,60725,60728,60731,60734,60737],{},[324,60702,60703],{},"NVIDIA® Jetson Xavier™ NX Developer Kit",[324,60705,60706],{},"AI Perf: 21 TOPS",[324,60708,60709],{},"GPU: 384-core NVIDIA Volta™ GPU with 48 Tensor Cores",[324,60711,60712],{},"CPU: 6-core NVIDIA Carmel ARM®v8.2 64-bit CPU 6 MB L2 + 4 MB L3",[324,60714,60715],{},"Memory: 8 GB 128-bit LPDDR4x 59.7GB\u002Fs",[324,60717,60718],{},"Ubuntu 18.04.5 LTS (GNU\u002FLinux 4.9.201-tegra aarch64)",[324,60720,60721],{},"Apache NiFi 1.15.3",[324,60723,60724],{},"Apache NiFi Registry 1.15.3",[324,60726,60727],{},"Apache NiFi Toolkit 1.15.3",[324,60729,60730],{},"Pulsar Processors",[324,60732,60733],{},"OpenJDK 8 and 11",[324,60735,60736],{},"Jetson Inference GoogleNet",[324,60738,60739],{},"Python 3",[32,60741,60743],{"id":60742},"building-the-air-quality-sensors-application-with-flipn-py","Building the Air Quality Sensors Application with FLiPN-Py",[48,60745,60746],{},[384,60747],{"alt":60748,"src":60749},"illustration Air Quality Sensors Application with FLiPN-Py","\u002Fimgs\u002Fblogs\u002F63b3ef114106f40fafdcfcbf_screen-shot-2022-04-14-at-1.35.02-pm.png",[48,60751,60752],{},"In this application, we want to monitor the air quality in an office continuously and then hand off a large amount of data to a data scientist to make predictions. Once that model is done, we will add that model to a Pulsar function for live anomaly detection to alert office occupants of the situation. We will also want dashboards to monitor trends, aggregates and advanced analytics.",[48,60754,60755],{},"Once the initial prototype proves itself, we will deploy it to all the remote offices for monitoring internal air quality. 
For future enhancements, we will ingest outside air quality data as well local weather conditions.",[48,60757,60758],{},"On our edge devices, we will perform the following three steps to collect the sensor readings, format the data into the desired schema, and forward the records to Pulsar.",[3933,60760,60762],{"id":60761},"edge-step-1-collect-sensor-readings","Edge Step 1: Collect Sensor Readings",[8325,60764,60767],{"className":60765,"code":60766,"language":8330},[8328],"\nresult = sgp30.get_air_quality()\n\n",[4926,60768,60766],{"__ignoreMap":18},[3933,60770,60772],{"id":60771},"edge-step-2-format-data-according-to-schema","Edge Step 2: Format Data According to Schema",[8325,60774,60777],{"className":60775,"code":60776,"language":8330},[8328],"\nclass Garden(Record):\n    cpu = Float()\n    diskusage = String()\n    endtime = String()\n    equivalentco2ppm = String()\n    host = String()\n    hostname = String()\n    ipaddress = String()\n    macaddress = String()\n    memory = Float()\n    rowid = String()\n    runtime = Integer()\n    starttime = String()\n    systemtime = String()\n    totalvocppb = String()\n    ts = Integer()\n    uuid = String()\n\n",[4926,60778,60776],{"__ignoreMap":18},[3933,60780,60782],{"id":60781},"edge-step-3-produce-record-to-pulsar-topic","Edge Step 3: Produce Record to Pulsar Topic",[8325,60784,60787],{"className":60785,"code":60786,"language":8330},[8328],"\nproducer.send(gardenRec,partition_key=str(uniqueid))\n\n",[4926,60788,60786],{"__ignoreMap":18},[48,60790,60791],{},"Now that we have built the edge-to-pulsar ingestion pipeline, let’s do something interesting with the sensor data that we have published to Pulsar.",[3933,60793,60795],{"id":60794},"cloud-step-1-spark-etl-to-parquet-files","Cloud Step 1: Spark ETL to Parquet Files",[8325,60797,60800],{"className":60798,"code":60799,"language":8330},[8328],"\n    val dfPulsar = \nspark.readStream.format(\"pulsar\")\n.option(\"service.url\", \"pulsar:\u002F\u002Fpulsar1:6650\")\n.option(\"admin.url\", \"http:\u002F\u002Fpulsar1:8080\")\n.option(\"topic\",\"persistent:\u002F\u002Fpublic\u002Fdefault\u002Fgarden3\")\n.load()\n\nval pQuery = dfPulsar.selectExpr(\"*\")\n.writeStream\n.format(\"parquet\")\n.option(\"truncate\", false) \n.option(\"checkpointLocation\", \"\u002Ftmp\u002Fcheckpoint\")\n.option(\"path\", \"\u002Fopt\u002Fdemo\u002Fgasthermal\").start()\n\n",[4926,60801,60799],{"__ignoreMap":18},[3933,60803,60805],{"id":60804},"cloud-step-2-continuous-sql-analytics-with-flink-sql","Cloud Step 2: Continuous SQL Analytics with Flink SQL",[8325,60807,60810],{"className":60808,"code":60809,"language":8330},[8328],"\nselect equivalentco2ppm, totalvocppb, cpu, starttime, systemtime, ts, cpu, diskusage, endtime, memory, uuid from garden3;\n\nselect max(equivalentco2ppm) as MaxCO2, max(totalvocppb) as MaxVocPPB from garden3;\n\n",[4926,60811,60809],{"__ignoreMap":18},[3933,60813,60815],{"id":60814},"cloud-step-3-sql-analytics-with-pulsar-sql","Cloud Step 3: SQL Analytics with Pulsar SQL",[8325,60817,60820],{"className":60818,"code":60819,"language":8330},[8328],"\nselect * from pulsar.\"public\u002Fdefault\".\"garden3\"\n\n",[4926,60821,60819],{"__ignoreMap":18},[3933,60823,60825],{"id":60824},"cloud-step-4-nifi-filter-route-transform-and-store-to-mongodb","Cloud Step 4: NiFi Filter, Route, Transform and Store to MongoDB",[48,60827,60828],{},[384,60829],{"alt":60830,"src":60831},"configure processor 
pulsar","\u002Fimgs\u002Fblogs\u002F63b3efc2402a2048f42577de_screen-shot-2022-04-14-at-2.27.56-pm.png",[48,60833,60834],{},"We could have used a Pulsar Function and Pulsar IO Sink for MongoDB instead, but you may want to do other data enrichment with Apache NiFi without coding.",[3933,60836,60838],{"id":60837},"cloud-step-5-validate-mongodb-data","Cloud Step 5: Validate MongoDB Data",[8325,60840,60843],{"className":60841,"code":60842,"language":8330},[8328],"\nshow collections\n\ndb.garden3.find().pretty()\n\n",[4926,60844,60842],{"__ignoreMap":18},[48,60846,60847],{},[384,60848],{"alt":60849,"src":60850},"Example HTML Data Display Utilizing Web Sockets","\u002Fimgs\u002Fblogs\u002F63b3f00b5b09f496bf8ce878_screen-shot-2022-04-14-at-2.31.12-pm.png",[32,60852,6386],{"id":60853},"watch-the-demo",[48,60855,60856],{},[384,60857],{"alt":60858,"src":60859},"pulsar cluster ","\u002Fimgs\u002Fblogs\u002F63b3f00b770e3587cb9c79d0_screen-shot-2022-04-14-at-2.31.41-pm.png",[40,60861,2125],{"id":2122},[48,60863,60864],{},"In this blog, we explained how to build real-time event driven applications utilizing the latest open source frameworks together as FLiP Stack applications. So now you know what we are talking about when we say “FLiPN Stack”. By using the latest and greatest open source Apache streaming and big data projects together, we can build applications faster, easier, and with known scalable results.",[48,60866,60867],{},"Join us in building scalable applications today with Pulsar and its awesome friends. Start with data, route it through Pulsar, transform it to meet your analytic needs, and stream it to every corner of your enterprise. Dashboards, live reports, applications, and machine learning analytics driven by fast data at scale built by citizen data engineers in hours, not months. 
Let’s get these FLiPN applications built now.",[40,60869,4135],{"id":4132},[321,60871,60872,60881,60889,60899,60907],{},[324,60873,60874,758,60876],{},[2628,60875,60566],{},[55,60877,60880],{"href":60878,"rel":60879},"https:\u002F\u002Fgithub.com\u002Ftspannhw\u002FFLiP-Py-Pi-GasThermal\u002F",[264],"Source code for the air quality sensors application",[324,60882,60883,758,60885],{},[2628,60884,4135],{},[55,60886,60888],{"href":51747,"rel":60887},[264],"FLiP Stack for Apache Pulsar Developer",[324,60890,60891,758,60894],{},[2628,60892,60893],{},"Talk",[55,60895,60898],{"href":60896,"rel":60897},"https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=pfhoF3yTdHU",[264],"Using the FLiPN Stack for Edge AI (Flink, NiFi, Pulsar)",[324,60900,60901,758,60904],{},[2628,60902,60903],{},"Connector",[55,60905,60906],{"href":56000},"Flink-Pulsar Sink Connector",[324,60908,60909,758,60911],{},[2628,60910,60903],{},[55,60912,60913],{"href":54447},"Flink-Pulsar SQL Connector",[40,60915,58598],{"id":58597},[321,60917,60918,60922,60930],{},[324,60919,45216,60920,47757],{},[55,60921,38404],{"href":45219},[324,60923,47760,60924,1154,60927,45209],{},[55,60925,47764],{"href":45463,"rel":60926},[264],[55,60928,47768],{"href":45206,"rel":60929},[264],[324,60931,45223,60932,45227],{},[55,60933,31914],{"href":31912,"rel":60934},[264],{"title":18,"searchDepth":19,"depth":19,"links":60936},[60937,60938,60939,60940,60941,60948,60949,60950],{"id":60496,"depth":19,"text":60497},{"id":60515,"depth":19,"text":60516},{"id":60528,"depth":19,"text":60529},{"id":60574,"depth":19,"text":60575},{"id":60630,"depth":19,"text":60631,"children":60942},[60943,60944,60945,60946,60947],{"id":60643,"depth":279,"text":60644},{"id":60661,"depth":279,"text":60662},{"id":60697,"depth":279,"text":60698},{"id":60742,"depth":279,"text":60743},{"id":60853,"depth":279,"text":6386},{"id":2122,"depth":19,"text":2125},{"id":4132,"depth":19,"text":4135},{"id":58597,"depth":19,"text":58598},"2022-04-14","Learn how to build a real-time event-driven IoT application using Apache Pulsar, Flink, Spark, and NiFi.","\u002Fimgs\u002Fblogs\u002F63c7fa8e53f98a82e4a46ad3_63b3eebd7c31683b481256bc_flip-top.png",{},"\u002Fblog\u002Fwhat-flip-is-flip-stack",{"title":60490,"description":60952},"blog\u002Fwhat-flip-is-flip-stack",[38442,821,8057,8058,303],"QboLI20T_wfa4tah2cMEaW2R7b7N-LvDxqC_8tb0-QM",{"id":60961,"title":60962,"authors":60963,"body":60965,"category":821,"createdAt":290,"date":61288,"description":60969,"extension":8,"featured":294,"image":61289,"isDraft":294,"link":290,"meta":61290,"navigation":7,"order":296,"path":61291,"readingTime":33204,"relatedResources":290,"seo":61292,"stem":61293,"tags":61294,"__hash__":61295},"blogs\u002Fblog\u002Fnew-apache-pulsar-2-10.md","What’s New in Apache Pulsar 2.10",[808,60964],"Dave Duggins",{"type":15,"value":60966,"toc":61278},[60967,60970,60974,60993,61000,61004,61007,61011,61014,61017,61021,61024,61027,61038,61040,61044,61047,61050,61056,61059,61062,61066,61070,61073,61076,61087,61089,61093,61096,61099,61103,61106,61109,61113,61116,61119,61123,61126,61128,61142,61145,61149,61154,61157,61161,61165,61168,61171,61175,61178,61180,61184,61198,61202,61213,61217,61222,61226,61229,61232,61235,61239,61242,61244,61252,61255,61259,61262,61265,61276],[48,60968,60969],{},"The Apache Pulsar community releases version 2.10. 
99 contributors provided improvements and bug fixes that delivered over 800 commits.",[40,60971,60973],{"id":60972},"highlights-of-this-release","Highlights of this release:",[321,60975,60976,60979,60987,60990],{},[324,60977,60978],{},"Pulsar provides automatic failure recovery between the primary and backup clusters. #13316",[324,60980,60981,60982],{},"Original PIP ",[55,60983,60986],{"href":60984,"rel":60985},"https:\u002F\u002Fwww.google.com\u002Furl?q=https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fissues\u002F13315&sa=D&source=docs&ust=1646058957138073&usg=AOvVaw3mGki2sHW2QpIsoYf5pt3w",[264],"#13315",[324,60988,60989],{},"Fewer producers needed and more efficient use of broker memory with lazy-loading feature added to PartitionedProducer. #10279",[324,60991,60992],{},"Topic map support added with new TableView type using key values in received messages.",[48,60994,60995,60996,190],{},"This blog documents the most noteworthy changes in this release. For the complete list including all features, enhancements, and bug fixes, check out the ",[55,60997,58679],{"href":60998,"rel":60999},"https:\u002F\u002Fpulsar.apache.org\u002Frelease-notes\u002F#placeholder",[264],[40,61001,61003],{"id":61002},"notable-bug-fixes-and-enhancements","Notable bug fixes and enhancements",[32,61005,61006],{"id":15942},"Cluster",[3933,61008,61010],{"id":61009},"pulsar-cluster-level-auto-failover-on-client-side-13316","Pulsar cluster level auto failover on client side #13316",[48,61012,61013],{},"Issue: A Pulsar administrator must manually failover a cluster.",[48,61015,61016],{},"Resolution: Added Pulsar cluster-level auto-failover, which automatically and seamlessly switches from primary to one or more secondary clusters when a failover event is detected. When the primary cluster recovers, the client automatically switches back.",[3933,61018,61020],{"id":61019},"topic-policy-across-multiple-clusters-12517","Topic policy across multiple clusters #12517",[48,61022,61023],{},"Issue: Some topic policies for a geo-replicated cluster affect the entire geo-replicated cluster while some only affect the local cluster.",[48,61025,61026],{},"Resolution: Topic policies now support cross-cluster replication.",[321,61028,61029,61032,61035],{},[324,61030,61031],{},"For local topic policies, set the replicateTo property of the message to avoid being replicated to the remote.",[324,61033,61034],{},"Retention supports setting global parameters.",[324,61036,61037],{},"Added global topic policies for SystemTopicBasedTopicPoliciesService.",[32,61039,46490],{"id":46489},[3933,61041,61043],{"id":61042},"add-lazy-loading-feature-to-partitionedproducer-10279","Add lazy-loading feature to PartitionedProducer #10279",[48,61045,61046],{},"Issue: With the number of partitions set according to the highest rate producer, the lowest rate producer does not always need to connect to every partition, so extra producers take up broker memory.",[48,61048,61049],{},"Resolution: Reduced the number of producers to use broker memory more efficiently by introducing lazy-loading for partitioned producers; also added round-robin routing mode class to limit the number of partitions.",[3933,61051,61053,61055],{"id":61052},"client-introduce-chunk-message-id-12403",[2628,61054,60409],{}," Introduce chunk message ID #12403",[48,61057,61058],{},"Issue: When sending chunked messages, the producer returns the message-id of the last chunk, causing incorrect behaviors in some processes.",[48,61060,61061],{},"Resolution: Introduced the new ChunkMessage-ID 
type. The chunk message-id inherits from MessageIdImpl and adds two new methods: getFirstChunkMessageId and getLastChunkMessageID. For other method implementations, the lastChunkMessageID is called directly, which is compatible with much of the existing business logic.",[32,61063,61065],{"id":61064},"broker","Broker",[3933,61067,61069],{"id":61068},"broker-extensions-to-allow-operators-of-enterprise-wide-cluster-better-control-and-flexibility-12536","Broker extensions to allow operators of enterprise wide cluster better control and flexibility #12536",[48,61071,61072],{},"Issue: Operators of enterprise Pulsar cluster(s) need greater flexibility and control to intercept broker events (including ledger writes\u002Freads) for template validations, observability and access control.",[48,61074,61075],{},"Resolution:",[321,61077,61078,61081,61084],{},[324,61079,61080],{},"Enhanced org.apache.pulsar.broker.intercept.BrokerInterceptor interface to include additional events for tracing",[324,61082,61083],{},"Created a new interface org.apache.pulsar.common.intercept.MessagePayloadProcessor to allow interception of ledger write\u002Fread operations",[324,61085,61086],{},"Enhanced PulsarAdmin to give operators a control in managing super-users",[32,61088,24840],{"id":46507},[3933,61090,61092],{"id":61091},"redeliver-command-add-epoch-10478","Redeliver command add epoch #10478",[48,61094,61095],{},"Issue: Pull and redeliver operations are asynchronous, so the client consumer may receive a new message, execute a cumulative ack based on a new messageID, and fail to consume older messages.",[48,61097,61098],{},"Resolution: The Pulsar client synchronizes redeliver and pull messages operations using an incrementing epoch for the server and client consumer.",[3933,61100,61102],{"id":61101},"support-pluggable-entry-filter-in-dispatcher-12269","Support pluggable entry filter in Dispatcher #12269",[48,61104,61105],{},"Issue: Message tagging is not natively supported.",[48,61107,61108],{},"Resolution: Implemented an entry filter framework at the broker level. Working to support namespace and topic level in an upcoming release.",[3933,61110,61112],{"id":61111},"create-init-subscription-before-sending-message-to-dlq-13355","Create init subscription before sending message to DLQ #13355",[48,61114,61115],{},"Issue: DLQ data in unprocessed messages is removed automatically without a data retention policy for the namespace or a subscription for the DLQ.",[48,61117,61118],{},"Resolution: Initial subscription is now created before sending messages to the DLQ. When deadLetterProducer is initialized, the consumer sets the initial subscription according to DeadLetterPolicy.",[3933,61120,61122],{"id":61121},"apply-redelivery-backoff-policy-for-ack-timeout-13707","Apply redelivery backoff policy for ack timeout #13707",[48,61124,61125],{},"Issue: The redelivery backoff policy recently introduced in PIP 106 only applies to the negative acknowledgment API. 
If ack timeout is used to trigger the message redelivery instead of the negative acknowledgment API, the backoff policy is bypassed.",[48,61127,61075],{},[321,61129,61130,61133,61136,61139],{},[324,61131,61132],{},"Applied message redelivery policy for ack timeout.",[324,61134,61135],{},"Alerted NegativeAckBackoff interface to RedeliveryBackoff.",[324,61137,61138],{},"Exposed AckTimeoutRedeliveryBackoff in ConsumerBuilder.",[324,61140,61141],{},"Added unit test case.",[48,61143,61144],{},"Currently only the Java client is modified.",[3933,61146,61148],{"id":61147},"resolve-produce-chunk-messages-failed-when-topic-level-maxmessagesize-is-set-13599","Resolve produce chunk messages failed when topic level maxMessageSize is set #13599",[48,61150,61151,61152,190],{},"Issue: Currently, chunk messages produce fails if topic level maxMessageSize is set to ",[2628,61153,42523],{},[48,61155,61156],{},"Resolution: Added isChunked in PublishContext. Skips themaxMessageSize check if it's chunked.",[32,61158,61160],{"id":61159},"function","Function",[3933,61162,61164],{"id":61163},"pulsar-functions-preload-and-release-external-resources-13205","Pulsar Functions: Preload and release external resources #13205",[48,61166,61167],{},"Issue: External resource initialization and release was accomplished either manually or through use of a complicated initialization logic.",[48,61169,61170],{},"Resolution: Introduced RichFunction interface to extend Function by providing a setup and tearDown API.",[3933,61172,61174],{"id":61173},"update-authentication-interfaces-to-include-async-authentication-methods-12104","Update Authentication Interfaces to Include Async Authentication Methods #12104",[48,61176,61177],{},"Issue: Pulsar's current AuthenticationProvider interface only exposes synchronous methods for authenticating a connection. To date, this has been sufficient because we do not have any providers that rely on network calls. However, in looking at the OAuth2.0 spec, there are some cases where network calls are necessary to verify a token.",[48,61179,61075],{},[225,61181,61183],{"id":61182},"authenticationprovider","AuthenticationProvider",[321,61185,61186,61189,61192,61195],{},[324,61187,61188],{},"Added AuthenticationProvider#authenticateAsync. Included a default implementation that calls the authenticate method.",[324,61190,61191],{},"Deprecated AuthenticationProvider#authenticate.",[324,61193,61194],{},"Added AuthenticationProvider#authenticateHttpRequestAsync.",[324,61196,61197],{},"Deprecated AuthenticationProvider#authenticateHttpRequest.",[225,61199,61201],{"id":61200},"authenticationstate","AuthenticationState",[321,61203,61204,61207,61210],{},[324,61205,61206],{},"Added AuthenticationState#authenticateAsync.",[324,61208,61209],{},"Deprecated AuthenticationState#authenticate. The preferred method is AuthenticationState#authenticateAsync.",[324,61211,61212],{},"Deprecated AuthenticationState#isComplete. This method can be avoided by inferring authentication completeness from the result of AuthenticationState#authenticateAsync.",[225,61214,61216],{"id":61215},"authenticationdatasource","AuthenticationDataSource",[321,61218,61219],{},[324,61220,61221],{},"Deprecated AuthenticationDataSource#authenticate. 
There is no need for an async version of this method.",[3933,61223,61225],{"id":61224},"initial-commit-for-tableview-12838","Initial commit for TableView #12838",[48,61227,61228],{},"Issue: In many use cases, applications use Pulsar consumers or readers to fetch all the updates from a topic and construct a map with the latest value of each key for received messages. This is common when constructing a local cache of the data. We do not offer support for This access pattern was not included in the Pulsar client API.",[48,61230,61231],{},"Resolution: Added new TableView type and updated the PulsarClient.",[48,61233,61234],{},"####Topic",[3933,61236,61238],{"id":61237},"support-topic-metadata-part-1-create-topic-with-properties-12818","Support Topic metadata - PART-1 create topic with properties #12818",[48,61240,61241],{},"Issue: Can’t store topic metadata.",[48,61243,61075],{},[321,61245,61246,61249],{},[324,61247,61248],{},"Added new storage methods in topics.java.",[324,61250,61251],{},"Added two new paths to REST API to reduce compatibility issues.",[48,61253,61254],{},"Metadata Store",[3933,61256,61258],{"id":61257},"added-etcd-metadatastore-implementation-13225","Added Etcd MetadataStore implementation #13225",[48,61260,61261],{},"Issue: We’re working to add metadata backends that support non-Zookeeper implementations.",[48,61263,61264],{},"Resolution: Added Etcd support for:",[321,61266,61267,61270,61273],{},[324,61268,61269],{},"Batching of read\u002Fwrite requests",[324,61271,61272],{},"Session watcher",[324,61274,61275],{},"Lease manager",[48,61277,3931],{},{"title":18,"searchDepth":19,"depth":19,"links":61279},[61280,61281],{"id":60972,"depth":19,"text":60973},{"id":61002,"depth":19,"text":61003,"children":61282},[61283,61284,61285,61286,61287],{"id":15942,"depth":279,"text":61006},{"id":46489,"depth":279,"text":46490},{"id":61064,"depth":279,"text":61065},{"id":46507,"depth":279,"text":24840},{"id":61159,"depth":279,"text":61160},"2022-04-12","\u002Fimgs\u002Fblogs\u002F63c7fa601620286cce5f39aa_63b3f24e494f091242f08445_210top.png",{},"\u002Fblog\u002Fnew-apache-pulsar-2-10",{"title":60962,"description":60969},"blog\u002Fnew-apache-pulsar-2-10",[302,821],"_qSaO_3h1TpztxUQrQIE2rsaKp-l-5bxZBwyXGEOAIY",{"id":61297,"title":61298,"authors":61299,"body":61301,"category":821,"createdAt":290,"date":61598,"description":61599,"extension":8,"featured":294,"image":61600,"isDraft":294,"link":290,"meta":61601,"navigation":7,"order":296,"path":61602,"readingTime":3556,"relatedResources":290,"seo":61603,"stem":61604,"tags":61605,"__hash__":61606},"blogs\u002Fblog\u002Fnew-apache-pulsar-2-9-2.md","What’s New in Apache Pulsar 2.9.2",[58855,61300],"Yu Liu",{"type":15,"value":61302,"toc":61572},[61303,61306,61309,61335,61343,61345,61351,61353,61356,61358,61361,61367,61369,61372,61374,61377,61386,61388,61391,61393,61396,61405,61407,61410,61412,61415,61424,61426,61429,61431,61434,61443,61445,61448,61451,61454,61460,61463,61466,61469,61472,61481,61484,61487,61490,61493,61502,61505,61508,61511,61514,61523,61526,61529,61532,61535,61537,61543,61548,61559,61561],[48,61304,61305],{},"The Apache Pulsar community releases version 2.9.2! 60 contributors provided improvements and bug fixes that delivered 317 commits.",[48,61307,61308],{},"Highlights of this release are as below:",[321,61310,61311,61319,61327],{},[324,61312,61313,61314],{},"Transactions performance test tool is available. 
",[55,61315,61318],{"href":61316,"rel":61317},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F11933",[264],"PR-11933",[324,61320,61321,61322],{},"Brokers decrease the number of unacked messages. ",[55,61323,61326],{"href":61324,"rel":61325},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F13383",[264],"PR-13383",[324,61328,61329,61330],{},"Readers continue to read data from the compacted ledgers. ",[55,61331,61334],{"href":61332,"rel":61333},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F13629",[264],"PR-13629",[48,61336,61337,61338,190],{},"This blog walks through the most noteworthy changes grouped by the affected functionalities. For the complete list including all features, enhancements, and bug fixes, check out the ",[55,61339,61342],{"href":61340,"rel":61341},"https:\u002F\u002Fpulsar.apache.org\u002Frelease-notes\u002F#292",[264],"Pulsar 2.9.2 Release Notes",[40,61344,61003],{"id":61002},[32,61346,61329,61348],{"id":61347},"readers-continue-to-read-data-from-the-compacted-ledgers-pr-13629",[55,61349,61334],{"href":61332,"rel":61350},[264],[3933,61352,57576],{"id":44661},[48,61354,61355],{},"Previously, when topics were unloaded, some data was lost to be read by readers if they have consumed some messages from some compacted ledgers.",[3933,61357,57583],{"id":57582},[48,61359,61360],{},"Rewound the reader cursor to the next message of the mark delete position if readCompacted = true.",[32,61362,61321,61364],{"id":61363},"brokers-decrease-the-number-of-unacked-messages-pr-13383",[55,61365,61326],{"href":61324,"rel":61366},[264],[3933,61368,57576],{"id":57598},[48,61370,61371],{},"Previously, brokers did not decrease the number of unacked messages if batch ack was enabled. Consequently, consumers were blocked if they reached maxUnackedMessagesPerConsumer limit.",[3933,61373,57583],{"id":57612},[48,61375,61376],{},"Decreased the number of unacked messages when individualAckNormal was called.",[32,61378,61380,61381],{"id":61379},"chunked-messages-can-be-queried-through-pulsar-sql-pr-12720","Chunked messages can be queried through Pulsar SQL. ",[55,61382,61385],{"href":61383,"rel":61384},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F12720",[264],"PR-12720",[3933,61387,57576],{"id":57632},[48,61389,61390],{},"Previously, chunked messages could not be queried through Pulsar SQL.",[3933,61392,57583],{"id":57638},[48,61394,61395],{},"Add a chunked message map in PulsarRecordCursor to maintain incomplete chunked messages. If one chunked message was received completely, it would be offered in the message queue to wait for deserialization.",[32,61397,61399,61400],{"id":61398},"support-enable-or-disable-schema-upload-at-the-broker-level-pr-12786","Support enable or disable schema upload at the broker level. ",[55,61401,61404],{"href":61402,"rel":61403},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F12786",[264],"PR-12786",[3933,61406,57576],{"id":57653},[48,61408,61409],{},"Previously, Pulsar didn't support enabling or disabling schema upload at the broker level.",[3933,61411,57583],{"id":57659},[48,61413,61414],{},"Added the configuration isSchemaAutoUploadEnabled on the broker side.",[32,61416,61418,61419],{"id":61417},"readers-can-read-the-latest-messages-in-compacted-topics-pr-14449","Readers can read the latest messages in compacted topics. 
",[55,61420,61423],{"href":61421,"rel":61422},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F14449",[264],"PR-14449",[3933,61425,57576],{"id":57674},[48,61427,61428],{},"Previously, readers were not able to read the latest messages in compacted topics if readers enabled readCompacted and all the data of topics has been compacted to compacted ledgers.",[3933,61430,57583],{"id":57725},[48,61432,61433],{},"Added the forceReset configuration for the managed cursor, so that the cursor could be reset to a given position and readers can read data from compacted ledgers.",[32,61435,61437,61438],{"id":61436},"transaction-sequenceid-can-be-recovered-correctly-pr-13209","Transaction sequenceId can be recovered correctly. ",[55,61439,61442],{"href":61440,"rel":61441},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F13209",[264],"PR-13209",[3933,61444,57576],{"id":57684},[48,61446,61447],{},"Previously, the wrong transaction sequenceId was recovered due to incorrect managedLedger properties.",[3933,61449,57583],{"id":61450},"resolution-5",[48,61452,61453],{},"Used ManagedLedgerInterceptor to update current sequenceId to managedLedger properties and more.",[32,61455,61313,61457],{"id":61456},"transactions-performance-test-tool-is-available-pr-11933",[55,61458,61318],{"href":61316,"rel":61459},[264],[3933,61461,57576],{"id":61462},"issue-6",[48,61464,61465],{},"Previously, it was hard to test transaction performance (such as the delay and rate of sending and consuming messages) when opening a transaction.",[3933,61467,57583],{"id":61468},"resolution-6",[48,61470,61471],{},"Added PerformanceTransaction class to support this enhancement.",[32,61473,61475,61476],{"id":61474},"port-exhaustion-and-connection-issues-no-longer-exist-in-pulsar-proxy-pr-14078","Port exhaustion and connection issues no longer exist in Pulsar Proxy. ",[55,61477,61480],{"href":61478,"rel":61479},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F14078",[264],"PR-14078",[3933,61482,57576],{"id":61483},"issue-7",[48,61485,61486],{},"Previously, Pulsar proxy would get into a state where it stopped proxying broker connections while Admin API proxying kept working.",[3933,61488,57583],{"id":61489},"resolution-7",[48,61491,61492],{},"Optimized the proxy connection to fail-fast if the target broker was not active, added connect timeout handling to proxy connection, and more.",[32,61494,61496,61497],{"id":61495},"no-race-condition-in-opsendmsgqueue-when-publishing-messages-pr-14231","No race condition in OpSendMsgQueue when publishing messages. ",[55,61498,61501],{"href":61499,"rel":61500},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F14231",[264],"PR-14231",[3933,61503,57576],{"id":61504},"issue-8",[48,61506,61507],{},"After the method getPendingQueueSize() was called and the send receipt came back, the peek from the pendingMessages might get NPE during the process.",[3933,61509,57583],{"id":61510},"resolution-8",[48,61512,61513],{},"Added a thread-safe message count object in OpSendMsgQueue for each compute process.",[32,61515,61517,61518],{"id":61516},"change-contextclassloader-to-narclassloader-in-additionalservlet-pr-13501","Change ContextClassLoader to NarClassLoader in AdditionalServlet. 
",[55,61519,61522],{"href":61520,"rel":61521},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F13501",[264],"PR-13501",[3933,61524,57576],{"id":61525},"issue-9",[48,61527,61528],{},"Previously, if a class was dynamically loaded by NarClassLoader, ClassNotFoundException occurred when it was used by the default class load.",[3933,61530,57583],{"id":61531},"resolution-9",[48,61533,61534],{},"Changed context class loader through Thread.currentThread().setContextClassLoader(classLoader) before every plugin calling back and changed the context class loader back to original class loader afterwards.",[40,61536,13565],{"id":1727},[48,61538,61539,61540,57738],{},"If you are interested in learning more about Pulsar 2.9.2, you can ",[55,61541,36195],{"href":58799,"rel":61542},[264],[48,61544,57741,61545,57746],{},[55,61546,57745],{"href":35357,"rel":61547},[264],[48,61549,57749,61550,57753,61553,57757,61556,20076],{},[55,61551,40821],{"href":23526,"rel":61552},[264],[55,61554,36238],{"href":36236,"rel":61555},[264],[55,61557,57762],{"href":57760,"rel":61558},[264],[40,61560,39647],{"id":39646},[48,61562,57767,61563,57772,61566,57775,61570,57779],{},[55,61564,57771],{"href":17075,"rel":61565},[264],[55,61567,3550],{"href":61568,"rel":61569},"https:\u002F\u002Fauth.streamnative.cloud\u002Flogin?state=hKFo2SBVeG81YTFiSWUtdDhhQkgtd19LdWhWYm9jUng4NGpua6FupWxvZ2luo3RpZNkgVHh1bFN0bHozeEFpeDR5QlNGMnlWM19oUHpwcTlvSk2jY2lk2SA2ZXI3M3FLcTQycUIwd2JzcjFTT01hWWJhdTdLaGxldw&client=6er73qKq42qB0wbsr1SOMaYbau7Khlew&protocol=oauth2&audience=https%3A%2F%2Fapi.streamnative.cloud&redirect_uri=https%3A%2F%2Fconsole.streamnative.cloud%2Fcallback&defaultMethod=singup&scope=openid%20profile%20email%20offline_access&response_type=code&response_mode=query&nonce=VDRWNG5rYVhpcWZJYTdOWlF4Q1BDeENxcFZKQlFneU9VYlllRzdTdXF4UQ%3D%3D&code_challenge=W__xPbFyDLkHTgO8p7DmrT84cHkZC3RvLsr3iE438sQ&code_challenge_method=S256&auth0Client=eyJuYW1lIjoiYXV0aDAtc3BhLWpzIiwidmVyc2lvbiI6IjEuMTQuMCJ9",[264],[55,61571,24379],{"href":57778},{"title":18,"searchDepth":19,"depth":19,"links":61573},[61574,61596,61597],{"id":61002,"depth":19,"text":61003,"children":61575},[61576,61578,61580,61582,61584,61586,61588,61590,61592,61594],{"id":61347,"depth":279,"text":61577},"Readers continue to read data from the compacted ledgers. PR-13629",{"id":61363,"depth":279,"text":61579},"Brokers decrease the number of unacked messages. PR-13383",{"id":61379,"depth":279,"text":61581},"Chunked messages can be queried through Pulsar SQL. PR-12720",{"id":61398,"depth":279,"text":61583},"Support enable or disable schema upload at the broker level. PR-12786",{"id":61417,"depth":279,"text":61585},"Readers can read the latest messages in compacted topics. PR-14449",{"id":61436,"depth":279,"text":61587},"Transaction sequenceId can be recovered correctly. PR-13209",{"id":61456,"depth":279,"text":61589},"Transactions performance test tool is available. PR-11933",{"id":61474,"depth":279,"text":61591},"Port exhaustion and connection issues no longer exist in Pulsar Proxy. PR-14078",{"id":61495,"depth":279,"text":61593},"No race condition in OpSendMsgQueue when publishing messages. PR-14231",{"id":61516,"depth":279,"text":61595},"Change ContextClassLoader to NarClassLoader in AdditionalServlet. PR-13501",{"id":1727,"depth":19,"text":13565},{"id":39646,"depth":19,"text":39647},"2022-04-08","We are excited to see the Apache Pulsar community has successfully released the 2.9.2 version! 60 contributors provided improvements and bug fixes that delivered 317 commits. 
Let's walk through the most noteworthy changes!","\u002Fimgs\u002Fblogs\u002F63c20b3e787c552e6fc77ad6_Pulsar-release-blog-292.jpg",{},"\u002Fblog\u002Fnew-apache-pulsar-2-9-2",{"title":61298,"description":61599},"blog\u002Fnew-apache-pulsar-2-9-2",[302,821,10503,9144],"ky1DaKiQPNq9835QE8ueWLrO0GRAsS2W46ehekqS8j8",{"id":61608,"title":42239,"authors":61609,"body":61610,"category":821,"createdAt":290,"date":62314,"description":62315,"extension":8,"featured":294,"image":62316,"isDraft":294,"link":290,"meta":62317,"navigation":7,"order":296,"path":27690,"readingTime":31039,"relatedResources":290,"seo":62318,"stem":62319,"tags":62320,"__hash__":62321},"blogs\u002Fblog\u002Fapache-pulsar-vs-apache-kafka-2022-benchmark.md",[807,808],{"type":15,"value":61611,"toc":62296},[61612,61615,61634,61640,61643,61646,61649,61653,61655,61659,61662,61666,61669,61673,61676,61680,61686,61689,61693,61696,61700,61703,61711,61715,61718,61721,61725,61728,61731,61734,61737,61740,61744,61757,61760,61771,61774,61779,61787,61792,61806,61809,61812,61815,61823,61837,61842,61852,61855,61859,61863,61866,61870,61873,61886,61894,61898,61903,61909,61913,61916,61919,61922,61929,61932,61936,61939,61942,61951,61958,61962,61971,61975,61978,61981,61984,61987,61990,61994,61997,62000,62003,62006,62009,62019,62026,62030,62039,62043,62046,62049,62052,62055,62063,62066,62070,62073,62076,62087,62090,62095,62100,62105,62110,62113,62121,62128,62132,62155,62159,62162,62165,62168,62171,62174,62176,62179,62190,62194,62200,62203,62205,62209,62211,62253,62255,62262,62264,62273,62282,62290,62292],[48,61613,61614],{},"The Apache PulsarTM versus Apache KafkaⓇ debate continues. Organizations often make comparisons based on features, capabilities, size of the community, and a number of other metrics of varying importance. This report focuses purely on comparing the technical performance based on benchmark tests.",[48,61616,61617,61618,61622,61623,61628,61629,61633],{},"The last widely published ",[55,61619,61621],{"href":61620},"\u002Fwhitepapers\u002Fbenchmarking-pulsar-vs-kafka","Pulsar versus Kafka benchmark"," was performed in 2020, and a lot has happened since then. In 2021, Pulsar ranked as a ",[55,61624,61627],{"href":61625,"rel":61626},"https:\u002F\u002Fhubs.ly\u002FQ01701DL0",[264],"Top 5 Apache Software Foundation"," project and ",[55,61630,61632],{"href":61631},"\u002Fblog\u002Fpulsar-hits-400th-contributor-passes-kafka-monthly-active-contributors","surpassed Apache Kafka"," in monthly active contributors as shown in the chart below. Pulsar also averaged more monthly active contributors than Kafka for most of the past 18 months.",[48,61635,61636],{},[384,61637],{"alt":61638,"src":61639},"Pulsar vs Kafka result","\u002Fimgs\u002Fblogs\u002F63b3ed65fb095c6d8670d2da_screen-shot-2022-04-07-at-7.51.37-am.png",[48,61641,61642],{},"These contributions led to major performance improvements for Pulsar. To measure the impact of the improvements, the engineering team at StreamNative, led by Matteo Merli, one of the original creators of Apache Pulsar, and Apache Pulsar PMC Chairperson, performed a benchmark study using the Linux Foundation Open Messaging benchmark.",[48,61644,61645],{},"The team measured Pulsar performance in terms of throughput and latency, and then performed the same tests on Kafka. 
We’ve included the testing framework and details in the report and encourage anyone who is interested in validating the tests to do so.",[48,61647,61648],{},"Let's take a look at three key findings before jumping into the full results.",[48,61650,61651],{},[34077,61652],{"value":34079},[40,61654,22053],{"id":22052},[32,61656,61658],{"id":61657},"_25x-maximum-throughput-compared-to-kafka","2.5x Maximum Throughput Compared to Kafka",[48,61660,61661],{},"Pulsar is able to achieve 2.5 times the maximum throughput compared to Kafka. This is a significant advantage for use cases that ingest and process large volumes of data, such as log analysis, cybersecurity, and sensor data collection. Higher throughput means less hardware, resulting in lower operational costs.",[32,61663,61665],{"id":61664},"_100x-lower-single-digit-publish-latency-than-kafka","100x Lower Single-digit Publish Latency than Kafka",[48,61667,61668],{},"Pulsar provides consistent single-digit publish latency that is 100x lower than Kafka at P99.99 (ms). Low publish latency is important because it enables systems to hand off messages to a message bus quickly. Once a message is published, the data is safe and the \"action\" will be executed.",[32,61670,61672],{"id":61671},"_15x-faster-historical-read-rate-than-kafka","1.5x Faster Historical Read Rate than Kafka",[48,61674,61675],{},"With a historical read rate that is 1.5 times faster than Kafka, applications using Pulsar as their messaging system can catch-up after an unexpected interruption in half the time. Read throughput is critically important for use cases such as Database Migration\u002FReplication where you are feeding data into a system of record.",[40,61677,61679],{"id":61678},"benchmark-tests","Benchmark Tests",[48,61681,61682,61683,61685],{},"Using the Linux Foundation Open Messaging benchmark [",[55,61684,42523],{"href":61620},"], we ran the latest versions of Apache Pulsar (2.9.1) and Apache Kafka (3.0.0). To ensure an objective baseline comparison, each test in this Benchmark Report compares Kafka to Pulsar in two scenarios:  Pulsar with Journaling and Pulsar without Journaling.",[48,61687,61688],{},"Pulsar’s default configuration includes Journaling, which offers a higher durability guarantee than Kafka’s default configuration. Pulsar without Journaling provides the same durability guarantees as the default Kafka configuration, which results in an apples-to-apples comparison.",[32,61690,61692],{"id":61691},"i-what-we-tested","I. What We Tested",[48,61694,61695],{},"For this benchmark, we selected a handful of tests to represent common patterns in the messaging and streaming domains and to test the limits of each system:",[3933,61697,61699],{"id":61698},"a-maximum-sustainable-throughput","A. Maximum Sustainable Throughput",[48,61701,61702],{},"This test measures the maximum data throughput the system can deliver when consumers are keeping up with the incoming traffic. We ran this test in two scenarios to test the upper boundary performance and to test the cost profile for each system:",[1666,61704,61705,61708],{},[324,61706,61707],{},"Topic with a single partition. This scenario tests the upper boundary performance for a total-order use case or, in the worst case, where partition keys’ data is skewed. At some scale, the design of a system that relies upon single ordering or handling large amounts of skewed data will need to be reconsidered. 
Pulsar has the ability to handle situations where total ordering is required at higher scale or large amounts of skew arise.",[324,61709,61710],{},"Topic with 100 partitions. With more partitions to stress available resources, this test illustrates how well a system scales horizontally (by adding more machines) and its cost effectiveness. For example, by modeling the hardware cost per 1GB\u002Fs of traffic, it is easy to derive the cost profile for each system.",[3933,61712,61714],{"id":61713},"b-publish-latency-at-a-fixed-throughput","B. Publish Latency at a Fixed Throughput",[48,61716,61717],{},"For this test, we set a fixed rate for the incoming traffic and measured the publish latency profile. Publish latency begins at the moment when a producer tries to publish a message and ends at the moment when it receives confirmation from the brokers that the message is stored and replicated.",[48,61719,61720],{},"In many real-world applications, it is required to guarantee a certain latency SLA (service-level agreement). In particular, this is true in cases where the message is published as the result of some user interaction, or when the user is waiting for the confirmation.",[3933,61722,61724],{"id":61723},"c-catch-up-reads-backlog-draining","C. Catch-up Reads \u002F Backlog Draining",[48,61726,61727],{},"One of the primary purposes of a messaging bus is to act as a “buffer” between different applications or systems. When the consumers are not available, or when there are not enough of them, the system accumulates the data.",[48,61729,61730],{},"In these situations, the system must be able to let the consumers drain the backlog of accumulated data and catch up with the newly produced data as fast as possible.",[48,61732,61733],{},"While this catch-up is happening, it is important that there is no impact on the performance of existing producers (in terms of throughput and latency) on the same topic or in other topics that are present in the cluster.",[48,61735,61736],{},"In all the tests, producers and consumers are always running from a dedicated pool of nodes, and all messages contain a 1KB payload. Additionally, in each test, both Pulsar and Kafka are configured to provide two guaranteed copies of each message.",[48,61738,61739],{},"Note: Pulsar also supports message queuing, complex routing, individual and negative acknowledgments, delayed message delivery, and dead-letter-queues (features not available in Kafka). This benchmark does not evaluate these features.",[32,61741,61743],{"id":61742},"ii-how-we-set-up-the-tests","II. How We Set up the Tests",[48,61745,61746,61747,61751,61752,61756],{},"The benchmark uses the Linux Foundation Open Messaging Benchmark suite [",[55,61748,42523],{"href":61749,"rel":61750},"https:\u002F\u002Fopenmessaging.cloud\u002Fdocs\u002Fbenchmarks\u002F?utm_campaign=Benchmarking%20Pulsar%20vs.%20Kafka%202022&utm_source=%20Linux%20Foundation%20Open%20Messaging%20Benchmark%20Link&utm_medium=Benchmark%202022%20Report%20Reference",[264],"]. 
You can find all deployments, configurations, and workloads in the Open Messaging Benchmark Github repo [",[55,61753,46057],{"href":61754,"rel":61755},"https:\u002F\u002Fgithub.com\u002Fopenmessaging\u002Fbenchmark?utm_campaign=Benchmarking%20Pulsar%20vs.%20Kafka%202022&utm_source=Open%20Messaging%20Benchmark%20Github%20Link&utm_medium=Benchmark%202022%20Report%20Reference",[264],"].",[48,61758,61759],{},"The testbed for the OpenMessaging Benchmark is set up as follows:",[1666,61761,61762,61765,61768],{},[324,61763,61764],{},"3 Broker VMs  of type i3en.6xlarge, with 24-cores, 192GB of memory, 25Gbps guaranteed networking, and two NVMe SSD devices that support up to 1GB\u002Fs write throughput on each disk.",[324,61766,61767],{},"4 Client (producers and consumers) VMs  of type m5n.8xlarge, with 32-cores and with 25Gbps of guaranteed networking throughput and 128GB of memory to ensure the bottleneck would not be on the client-side.",[324,61769,61770],{},"ZooKeeper VMs of type t2.small. These are not critical because ZooKeeper is not stressed in any form during the benchmark execution.",[48,61772,61773],{},"We tested two configurations for Pulsar:",[1666,61775,61776],{},[324,61777,61778],{},"Pulsar with Journaling (Default):",[321,61780,61781,61784],{},[324,61782,61783],{},"Uses a journal for strong durability (this exceeds the durability provided by Kafka).",[324,61785,61786],{},"Replicates and f-syncs data on disk before acknowledging producers.",[1666,61788,61789],{},[324,61790,61791],{},"Pulsar without Journaling:",[321,61793,61794,61797,61800,61803],{},[324,61795,61796],{},"Replicates data in memory on multiple nodes, before acknowledging producers, and then flushes to disk in the background.",[324,61798,61799],{},"Provides the same durability guarantees as Kafka.",[324,61801,61802],{},"Achieves higher throughput and lower latency when compared to the default Pulsar setup with journaling.",[324,61804,61805],{},"Provides a cost-effective alternative to the standard Pulsar setup, at the expense of strong durability. (“Strong durability” means that the data is flushed to disk before an acknowledgement is returned.)",[48,61807,61808],{},"We configured Apache Pulsar 2.9.1 to run with the 3\u002F3\u002F2 persistence policy, which writes entries to 3 storage nodes and waits for 2 confirmations. 
We are deploying 1 broker and 1 bookie for each of the 3 VMs we are using.",[48,61810,61811],{},"We used Apache Kafka 3.0.0 and the configuration recommended by Confluent in its fork of the OpenMessaging benchmark.",[48,61813,61814],{},"Details on the Kafka configurations include:",[1666,61816,61817,61820],{},[324,61818,61819],{},"Uses in-memory replication (using the OS page-cache) but it’s not guaranteed to be on disk when a producer is acknowledged.",[324,61821,61822],{},"Uses the recommended Confluent setup to increase the throughput compared to the defaults:",[321,61824,61825,61828,61831,61834],{},[324,61826,61827],{},"num.replica.fetchers=8",[324,61829,61830],{},"message.max.bytes=10485760",[324,61832,61833],{},"replica.fetch.max.bytes=10485760",[324,61835,61836],{},"num.network.threads=8",[1666,61838,61839],{"start":279},[324,61840,61841],{},"Uses Producers settings to ensure a minimum replication factor of 2:",[321,61843,61844,61847,61850],{},[324,61845,61846],{},"acks=all",[324,61848,61849],{},"replicationFactor=3",[324,61851,37639],{},[48,61853,61854],{},"Note: For both Kafka and Pulsar, the clients were configured to use ZGC to get lower GC pause time.",[32,61856,61858],{"id":61857},"iii-benchmark-tests-results","III. Benchmark Tests & Results",[3933,61860,61862],{"id":61861},"a-test-1-maximum-throughput","A. Test #1:  Maximum Throughput",[48,61864,61865],{},"This test measures the maximum “sustainable throughput” reachable on a topic. Eg: The max throughput that is able to push from producers through consumers, without accumulating any backlog.",[225,61867,61869],{"id":61868},"_1-test-1-case-1-maximum-throughput-with-1-partition","1. Test #1 \u002F Case #1: Maximum Throughput with 1 Partition",[48,61871,61872],{},"This first test uses a topic with a single partition to establish the boundary for ingesting data in a totally ordered way. This is common in all the use case scenarios where a single history of all the events in a precise order is required, such as “change data capture” or event sourcing.",[48,61874,61875,61876,1186,61881],{},"Driver files: ",[55,61877,61880],{"href":61878,"rel":61879},"https:\u002F\u002Fgithub.com\u002Fopenmessaging\u002Fbenchmark\u002Fblob\u002Fmaster\u002Fdriver-pulsar\u002Fpulsar.yaml",[264],"pulsar.yaml",[55,61882,61885],{"href":61883,"rel":61884},"https:\u002F\u002Fgithub.com\u002Fopenmessaging\u002Fbenchmark\u002Fblob\u002Fmaster\u002Fdriver-kafka\u002Fkafka-throughput.yaml",[264],"kafka-throughput.yaml ",[48,61887,61888,61889],{},"Workload file: ",[55,61890,61893],{"href":61891,"rel":61892},"https:\u002F\u002Fgithub.com\u002Fopenmessaging\u002Fbenchmark\u002Fblob\u002Fmaster\u002Fworkloads\u002Fmax-rate-1-topic-1-partition-4p-1c-1kb.yaml",[264],"max-rate-1-topic-1-partition-4p-1c-1kb.yaml",[225,61895,61897],{"id":61896},"a-case-1-results-maximum-throughput-with-1-partition","a. Case #1 Results: Maximum Throughput with 1 Partition",[48,61899,61900],{},[384,61901],{"alt":18,"src":61902},"\u002Fimgs\u002Fblogs\u002F63c71ab1a1ca8b3201e7d469_swCzXmWwN5hXgyKExG4ay8JBL1S7o7YzU8hDSTx_YlS1Ef7i5JWo8AcCyjY6Uo5vMRVOZEdoj13LfKls1xBGoqKLkqzFK20QiTdIlmAzirPjo1-NiVRgmGO0KUt4echv9JBBEolNcsXyPyREPlZiDMDltg52oLAwtOav6EW9UwKp0pB38Lk95vTP2e9K.png",[48,61904,61905,61908],{},[384,61906],{"alt":18,"src":61907},"\u002Fimgs\u002Fblogs\u002F63c73ad5f47e238fa299b754_figure-2-table.png","Figure 2: Single partition max write throughput (MB\u002Fs): Higher is better.",[225,61910,61912],{"id":61911},"b-case-1-analysis","b. 
Case #1 Analysis",[48,61914,61915],{},"The difference in throughput between Pulsar and Kafka reflects how efficiently each system is able to “pipeline” data across the different components from producers to brokers, and then the data replication protocol of each system.",[48,61917,61918],{},"Pulsar achieves a throughput of 700 MB\u002Fs and 580 MB\u002Fs, respectively, on the single partitions, compared to Kafka’s 280 MB\u002Fs. This is possible because the Pulsar client library combines messages into batches when sending them to the brokers. The brokers then pipeline data to the storage nodes.",[48,61920,61921],{},"In Kafka, two factors impose a bottleneck on the maximum achievable throughput: (1) the producer default limit of 5 maximum outstanding batches; and  (2) the producer buffer size (batch.size=1048576) recommended by Confluent for high throughput.",[48,61923,61924,61925,61928],{},"Note: Increasing the batch.size setting has negative effects on the latency. This is not the case for Pulsar producers, where the batching latency is controlled by the ",[4926,61926,61927],{},"batchingMaxDelay()"," setting, in addition to the batch max size.",[48,61930,61931],{},"With the increase in single topic throughput, Pulsar provides developers and architects more options in how they build their system. Teams can worry less about finding optimal partition keys and focus instead on mapping their data into streams.",[225,61933,61935],{"id":61934},"_2-test-1-case-2-maximum-throughput-with-100-partitions","2. Test #1 \u002F Case #2: Maximum Throughput with 100 Partitions",[48,61937,61938],{},"Most use cases that involve a significant amount of real-time data use partitioning to avoid the bottleneck of a single node. Partitioning is a way for messaging systems to divide a single topic into smaller chunks that can be assigned to different brokers.",[48,61940,61941],{},"Given that we tested on a 3-nodes cluster, we used 100 partitions to maximize the throughput of the system across the nodes. There is no advantage to using a higher number of partitions on this cluster because the partitions are handled independently and spread uniformly across the available brokers.",[48,61943,61944,61945,1186,61948],{},"Driver file: ",[55,61946,61880],{"href":61878,"rel":61947},[264],[55,61949,61885],{"href":61883,"rel":61950},[264],[48,61952,61888,61953],{},[55,61954,61957],{"href":61955,"rel":61956},"https:\u002F\u002Fgithub.com\u002Fopenmessaging\u002Fbenchmark\u002Fblob\u002Fmaster\u002Fworkloads\u002F1-topic-100-partitions-1kb-4p-4c-2000k.yaml",[264],"1-topic-100-partitions-1kb-4p-4c-2000k.yaml",[225,61959,61961],{"id":61960},"a-case-2-results-maximum-throughput-with-100-partitions","a. Case #2 Results: Maximum Throughput with 100 Partitions",[48,61963,61964,61967,61970],{},[384,61965],{"alt":18,"src":61966},"\u002Fimgs\u002Fblogs\u002F63c71ab1aaacb6f5ecad1b50_SWJaDwgVnLYGckeUhJnwVDTu1vSvZfQ2pqc8-WBP2QfdKIkydqSyT3RBQBNF6WIvQwL_0OM1k6U0vpia7q4VD269rFXqLlXdlDxkwdw3-lOyRU5CFpOZFXxv-HivbuRjK42gxOToo5DfMcrepufOfMwc_BdLQRNH3Mnsdrfq4fiWHosNq1POqyMVe76v.png",[384,61968],{"alt":18,"src":61969},"\u002Fimgs\u002Fblogs\u002F63c73b6d3d155a4ae5b6f20c_figure-3-table.png","Figure 3: 100 partitions max write throughput (MB\u002Fs): Higher is better.",[225,61972,61974],{"id":61973},"b-case-2-analysis","b. 
Case #2 Analysis",[48,61976,61977],{},"Pulsar without Journaling achieves a throughput of 1600 (MB\u002Fs), Kafka achieves a throughput of 1087 (MB\u002Fs) and Pulsar with Journaling (Default) achieves a throughput of 800 (MB\u002Fs). At equivalent durability guarantees Pulsar is able to outperform Kafka in terms of maximum write throughput. The difference in performance stems from how Kafka implements access to the disk. Kafka stores data for each partition in different directories and files, resulting in more files open for writing and scattering the IO operations across the disk. This increases the stress and contention on the OS page caching system that Kafka relies on.",[48,61979,61980],{},"When reading a file, the OS tries to cache blocks of data in the available system RAM. When the data is not available in the OS cache, the thread is blocked while the data is read from the disk and pulled in the cache.",[48,61982,61983],{},"The cost of pulling the blocked data into the cache is a significant delay (~100s of milliseconds) in serving write\u002Fread requests for other topics. This delay is observed in the benchmark results in the form of the publish latency experienced by the producers.",[48,61985,61986],{},"In the case of the default Pulsar deployment (with a journal for strong durability), the throughput is lower because 1 disk (out of 2 available in the VMs) is dedicated to the journal. Therefore we are capping the available IO bandwidth. In a production environment, this cap could be mitigated by having more disks to increase the IOPS\u002Fnode capacity, but for this benchmark we used the same VM resources for each of the system configurations.",[48,61988,61989],{},"The difference in throughput can impact the cost of the solution. With parity of guarantees, this test shows that Pulsar would require 32% less hardware compared to Kafka for the same amount of traffic.",[3933,61991,61993],{"id":61992},"b-test-2-publish-latency","B. Test #2:  Publish Latency",[48,61995,61996],{},"The purpose of this test is to measure the latency perceived by the producers at a steady state, with a fixed publish rate.",[48,61998,61999],{},"Messaging systems are often used in applications where data must efficiently and reliably be moved from a producing application to be durably stored in the messaging system. In high volume scenarios, even momentary increases in latency can result in memory resources being exhausted. In other situations, a human user may be “in-the-loop” and waiting on an operation which publishes a message - for example, a web page needs the confirmation of the action before proceeding - and latency spikes can degrade the user experience. In these use cases, it is important to have a latency performance profile that is consistently within a given SLA (service-level agreement).",[48,62001,62002],{},"It is also important to consider that a high latency in the long tail (eg: 99.9 percentile and above) will still have an outsized impact over an SLA that can be offered by an application. 
In practical terms, a higher 99.9% latency in the producer will often result in a significantly higher 99% latency for the application request.",[48,62004,62005],{},"Because the messaging bus sits at the bottom of the stack, it needs to provide a low and consistent latency profile so that applications can provide their own latency SLAs.",[48,62007,62008],{},"This test is conducted by publishing and consuming at a fixed rate of 500 MB\u002Fs and comparing it to the publish latency seen by producers.",[48,62010,61944,62011,1186,62014],{},[55,62012,61880],{"href":61878,"rel":62013},[264],[55,62015,62018],{"href":62016,"rel":62017},"https:\u002F\u002Fgithub.com\u002Fopenmessaging\u002Fbenchmark\u002Fblob\u002Fmaster\u002Fdriver-kafka\u002Fkafka-latency.yaml",[264],"kafka-latency.yaml ",[48,62020,61888,62021],{},[55,62022,62025],{"href":62023,"rel":62024},"https:\u002F\u002Fgithub.com\u002Fopenmessaging\u002Fbenchmark\u002Fblob\u002Fmaster\u002Fworkloads\u002F1-topic-100-partitions-1kb-4p-4c-500k.yaml",[264],"1-topic-100-partitions-1kb-4p-4c-500k.yaml",[225,62027,62029],{"id":62028},"a-test-2-results-publish-latency","a. Test #2 Results: Publish Latency",[48,62031,62032,62035,62038],{},[384,62033],{"alt":18,"src":62034},"\u002Fimgs\u002Fblogs\u002F63c71ab1a1ca8b854fe7d468_MCUf-xMXk9i4GST8unRDS1C5AoCBLtHiEfyiIQ320_FUKIeP4K8urFfhEv-TDFxSPoUuvWvDRmdvWiUKJvy_pyxHui9h1CM84FAhXcBle8zq1cmq25qkheT_EmDeHulx2UBXiSQzaVYOoReLM1c9JgprXdWsV8-1Cb--HapmjH1VHWIYtPPHF6OYbXO2.png",[384,62036],{"alt":18,"src":62037},"\u002Fimgs\u002Fblogs\u002F63c73a3a2753b445eb5fee87_figure-4-table.png","Figure 4: 500K Rate publish latency percentiles (ms): Lower is better.",[225,62040,62042],{"id":62041},"b-test-2-analysis","b. Test #2 Analysis",[48,62044,62045],{},"In this test, Pulsar is able to maintain a low publish latency while sustaining a high per-node utilization. Pulsar without Journaling is able to sustain 1.58 milliseconds latency at the 99 percentile and Pulsar with Journaling is able to sustain 7.89 milliseconds.",[48,62047,62048],{},"Kafka maintains a low publish latency up to the 99 percentile, where it is able to sustain 3.46 milliseconds in latency. But at 99.9%, Kafka’s latency spikes to 54.56 ms.",[48,62050,62051],{},"Publishing at a fixed rate, below the max burst throughput, at 99.9% and above, Pulsar has lower latency than Kafka for both Pulsar with Journaling (default) and the Pulsar without Journaling.",[48,62053,62054],{},"The reasons for lower latency with Pulsar are:",[1666,62056,62057,62060],{},[324,62058,62059],{},"When running Pulsar without Journaling, the critical data write path is decoupled from the disk access so it is not susceptible to the noise introduced by IO operations. The data is guaranteed to only be copied in memory, (unlike OS page cache which blocks under high load situations,) and then is flushed by background threads.",[324,62061,62062],{},"Pulsar with Journaling (Default) is able to maintain low latency because the BookKeeper replication protocol is able to ignore the slowest responding storage node. Due to the internal disk garbage collection mechanism, the performance profile of SSD and NVMe disks is characterized by good average write latency but with periodic latency spikes of up to 100 milliseconds. 
BookKeeper is able to smooth out the latency when used in 3\u002F3\u002F2 configuration, because it only waits for the two fast storage nodes for each entry.",[48,62064,62065],{},"By contrast, Kafka replication protocol is set to wait for all three of the brokers that are in the in-replica-set. Because of that, unless a broker crashes or is falling behind the leader for more than 30 seconds, each entry in Kafka needs to wait for all three brokers to have the entry.",[3933,62067,62069],{"id":62068},"c-test-3-catch-up-reads","C. Test #3:  Catch-up Reads",[48,62071,62072],{},"In the consumer catch-up test, we build a backlog of data and then start the consumers. While the consumers catch-up, the writers continue publishing data at the same rate.",[48,62074,62075],{},"This is a common, real-life scenario for a messaging\u002Fstreaming system. Below are a few common use cases:",[1666,62077,62078,62081,62084],{},[324,62079,62080],{},"Consumers come back online after a few hours of downtime and try to catch-up.",[324,62082,62083],{},"New consumers get bootstrapped and replay the data in the topic.",[324,62085,62086],{},"Periodic batch jobs that scan and process the historical data stored in the topic.",[48,62088,62089],{},"With this test, we can measure the following:",[1666,62091,62092],{},[324,62093,62094],{},"The catch-up speed.",[321,62096,62097],{},[324,62098,62099],{},"Consuming applications want to be able to recover as fast as possible, draining all the pending backlog and catching up with the producers in the shortest time.",[1666,62101,62102],{"start":19},[324,62103,62104],{},"The ability to avoid performance degradation and isolate workloads.",[321,62106,62107],{},[324,62108,62109],{},"Producing applications need to be decoupled and isolated from consuming applications and also from different, unrelated topics in the same cluster.",[48,62111,62112],{},"The size of the backlog is 512 GBs. It is larger than the RAM available in the nodes in order to simulate the case where the entire data does not fit in cache and the storage systems are forced to read from disk.",[48,62114,61944,62115,1186,62118],{},[55,62116,61880],{"href":61878,"rel":62117},[264],[55,62119,62018],{"href":62016,"rel":62120},[264],[48,62122,61888,62123],{},[55,62124,62127],{"href":62125,"rel":62126},"https:\u002F\u002Fgithub.com\u002Fopenmessaging\u002Fbenchmark\u002Fblob\u002Fmaster\u002Fworkloads\u002F1-topic-100-partitions-1kb-4p-4c-200k-backlog.yaml",[264],"1-topic-100-partitions-1kb-4p-4c-200k-backlog.yaml",[225,62129,62131],{"id":62130},"a-test-3-results-catch-up-reads","a. 
Test #3 Results: Catch-up Reads",[48,62133,62134,62137,62140,62141,62144,62147,62148,62151,62154],{},[384,62135],{"alt":18,"src":62136},"\u002Fimgs\u002Fblogs\u002F63c71ab15ed199fbb1d6e088_Fvef71g8AHCQAbbo6Uo-1Wv9iGMbP9nxd1nnDndi8bYNpYt8dYOuVy5XATUl0wO4UaOX3wYzlIvWjBQbK-kd7X1-rHWti2QdQku7AfFcUGGZKuStYq7eO2_42r5tsdFi4Z3a_H3_ccu0K9XFb1o3LzASHvzK5aeKg5AYZ_H8vyfQlsePegBX34w79NYv.png",[384,62138],{"alt":18,"src":62139},"\u002Fimgs\u002Fblogs\u002F63c73d2c5ecd19269dfb2aec_figure-5a-table.png","Figure 5a: Catch-up read throughput (msg\u002Fs): Higher is better.",[384,62142],{"alt":18,"src":62143},"\u002Fimgs\u002Fblogs\u002F63c71ab225436150ccf4fd11_BgqqL7qDd8JC9zjC87183t6d2y6-iUGF0rBJey9vyzsvhpyp8vPctxWhSq9MbsOm2UixgQAfjm1cjv3iDSMiEibCPMUVyHcaPBGvOwAISevM0BlhEgEPW8lsUiE6XEeu3gMVEeG8gUhnrMEOIAcRpAV43jROuT85hRbGbKGDQ9YBQh_jkgYPLt0UcxkW.png",[384,62145],{"alt":18,"src":62146},"\u002Fimgs\u002Fblogs\u002F63c73d8d187a390bc382b477_figure-5b-table.png","Figure 5b: Catch-up read chase time (seconds): Shorter is better.",[384,62149],{"alt":18,"src":62150},"\u002Fimgs\u002Fblogs\u002F63c71ab1c37fd1acad9a0bcc_q7zKC60ZrjFQbUSYvSodbtz88-VKxy5JcapxW7CENWDfQmS2v7P47Jo4jDqChoMrPUqU7CQlWje6t6XM9mXAL13HEeDPiPPcp-LjWA3DfAsULd-bdcogG2Z9jJlyq45GpZrwHrGVlXysHtCYI9MZgFwgp3LYIfjkXPkbNxpFy8EyXKeUPagQPVAPlJar.png",[384,62152],{"alt":18,"src":62153},"\u002Fimgs\u002Fblogs\u002F63c73db681d346249886ddd7_figure-5c-table.png","Figure 5c: Impact publish latency during catchup read (ms): Lower is better.",[225,62156,62158],{"id":62157},"b-test-3-analysis","b. Test #3 Analysis",[48,62160,62161],{},"The test shows that Pulsar consumers are able to drain the backlog of data ~2.5x faster than Kafka consumers, without impacting the performance of the connected producers.",[48,62163,62164],{},"With Kafka, the test showed that while the consumers are catching up, the producers are heavily impacted, with 99% latencies up to ~700 milliseconds and consequent throughput reductions.",[48,62166,62167],{},"The increase in latency is caused by the contention on the OS page cache used by Kafka. When the size of the backlog of data exceeds the RAM available in the Kafka broker, the OS will start to evict pages from the cache. This causes page cache misses that stop the Kafka threads. When there are enough producers and consumers in a broker, it becomes easy to end up in a “cache-thrashing” scenario, where time is spent paging data in from the disk and evicting it from the cache soon after.",[48,62169,62170],{},"In contrast, Pulsar with BookKeeper adopts a more sophisticated approach to write and read operations. Pulsar does not rely on the OS page cache because BookKeeper has its own set of write and read caches, for which the eviction and pre-fetching are specifically designed for streaming storage use cases.",[48,62172,62173],{},"This test demonstrates the degradation that consumers can cause in a Kafka cluster. This impacts the performance of the Kafka cluster and can lead to reliability problems.",[40,62175,2125],{"id":2122},[48,62177,62178],{},"The benchmark demonstrates Apache Pulsar’s ability to provide high performance across a broad range of use cases. In particular, Pulsar provides better and more predictable performance, even for the use cases that are generally associated with Kafka, such as large volume streaming data over partitioned topics. 
Key highlights on the Pulsar versus Kafka performance comparison include:",[1666,62180,62181,62184,62187],{},[324,62182,62183],{},"Pulsar provides 99pct write latency \u003C1.6ms without journal, and \u003C8ms with journal for fixed 500MB\u002Fs write throughput. The latency profile does not degrade at the higher quantiles, while Kafka latency quickly spikes up to 100s of milliseconds.",[324,62185,62186],{},"Pulsar can prove up to 3.2 GB\u002Fs historical data read throughput, 60% more than Kafka which can only achieve 2.0 GB\u002Fs.",[324,62188,62189],{},"During historical data reading, Pulsar’s I\u002FO isolation provides a low  and consistent publish latency, 2 orders of magnitude lower than Kafka. This ensures that the real-time data stream will not be affected when reading historical data.",[32,62191,62193],{"id":62192},"pulsar-unified-messaging-streaming-and-the-future","Pulsar: Unified Messaging & Streaming, and the Future",[48,62195,62196,62197,190],{},"While Pulsar is often adopted for streaming use cases, it also provides a superset of features and is widely adopted for message queuing use cases and for use cases that require unified messaging and streaming capabilities. This benchmark did not cover the message queuing capabilities of Pulsar, but you can learn more in the Pulsar Launches 2.8.0, Unified Messaging and Streaming ",[55,62198,39553],{"href":62199},"\u002Fblog\u002Fapache-pulsar-launches-2-8-unified-messaging-streaming-transactions",[48,62201,62202],{},"Beyond the development of Pulsar’s capabilities, the Pulsar ecosystem continues to expand. Protocol handlers allow for Pulsar brokers to natively communicate via other protocols, such as Kafka and RabbitMQ, enabling teams to easily integrate existing applications with Pulsar. Integrations with Apache Pinot, Delta Lake, Apache Spark, and Apache Flink have allowed teams to make Pulsar the ideal choice to help teams use one technology across both the data and application tiers.",[48,62204,33334],{},[48,62206,62207],{},[34077,62208],{"value":34079},[32,62210,33331],{"id":32196},[1666,62212,62213,62219,62225,62231,62240,62246],{},[324,62214,62215,62216,190],{},"To learn more about how Pulsar compares to Kafka, visit this ",[55,62217,51627],{"href":62218},"\u002Fpulsar\u002Fpulsar-vs-kafka",[324,62220,62221,62222,62224],{},"Read this ",[55,62223,39553],{"href":32263}," to bootstrap yourknowledge by translating your existing Apache Kafka experience.",[324,62226,62227,62228,190],{},"To learn more about Apache Pulsar use cases, check out this ",[55,62229,51627],{"href":62230},"\u002Fcontent-type-filtring-system\u002Fsuccess-stories",[324,62232,62233,62234,62236,62237],{},"Interested in spinning up a Pulsar cluster in minutes using StreamNative Cloud? ",[55,62235,38404],{"href":38403}," today. ",[55,62238,3931],{"href":45212,"rel":62239},[264],[324,62241,62242,62245],{},[55,62243,10265],{"href":45212,"rel":62244},[264]," for the monthly StreamNative Newsletter for Apache Pulsar.",[324,62247,62248,62252],{},[55,62249,62251],{"href":31912,"rel":62250},[264],"Learn Pulsar"," from the original creators of Pulsar. Watch on-demand videos, enroll in self-paced courses, and complete our certification program to demonstrate your Pulsar knowledge.",[32,62254,10248],{"id":10247},[48,62256,62257,62258,190],{},"Founded by the original creators of Apache Pulsar, the StreamNative team has more experience deploying and running Pulsar than any company in the world. 
StreamNative offers a cloud-native, scalable, resilient, and secure messaging and event streaming solution powered by Apache Pulsar. With StreamNative Cloud, you get a fully-managed Apache-Pulsar-as-a-Service offering available in our cloud or yours. Learn more at ",[55,62259,62261],{"href":62260},"\u002Fabout","Streamnative.io",[32,62263,22673],{"id":22672},[48,62265,62266,758,62268],{},[2628,62267,42523],{},[55,62269,62272],{"href":62270,"rel":62271},"https:\u002F\u002Fhubs.ly\u002FQ016_P830",[264],"The Linux Foundation Open Messaging Benchmark suite",[48,62274,62275,758,62277],{},[2628,62276,46057],{},[55,62278,62281],{"href":62279,"rel":62280},"https:\u002F\u002Fhubs.ly\u002FQ016_PcP0",[264],"The Open Messaging Benchmark Github repo",[48,62283,62284,758,62286],{},[2628,62285,46068],{},[55,62287,62289],{"href":62288},"\u002Fblog\u002Fperspective-on-pulsars-performance-compared-to-kafka","A More Accurate Perspective on Pulsar’s Performance",[48,62291,3931],{},[48,62293,62294],{},[34077,62295],{"value":34079},{"title":18,"searchDepth":19,"depth":19,"links":62297},[62298,62303,62308],{"id":22052,"depth":19,"text":22053,"children":62299},[62300,62301,62302],{"id":61657,"depth":279,"text":61658},{"id":61664,"depth":279,"text":61665},{"id":61671,"depth":279,"text":61672},{"id":61678,"depth":19,"text":61679,"children":62304},[62305,62306,62307],{"id":61691,"depth":279,"text":61692},{"id":61742,"depth":279,"text":61743},{"id":61857,"depth":279,"text":61858},{"id":2122,"depth":19,"text":2125,"children":62309},[62310,62311,62312,62313],{"id":62192,"depth":279,"text":62193},{"id":32196,"depth":279,"text":33331},{"id":10247,"depth":279,"text":10248},{"id":22672,"depth":279,"text":22673},"2022-04-07","This Apache Pulsar versus Apache Kafka report focuses purely on comparing the technical performance based on benchmark tests.","\u002Fimgs\u002Fblogs\u002F63c7fa9e7c4ed1b41436347f_63b3ed64316977607d62783a_blog_cover_pulsar_vs_kafka.png",{},{"title":42239,"description":62315},"blog\u002Fapache-pulsar-vs-apache-kafka-2022-benchmark",[799,821,10503,26747],"lZD3D6Fx3oahLRZEB_N8Wg9iqTd7Yx0U7UpF3ujauk0",{"id":62323,"title":62324,"authors":62325,"body":62326,"category":821,"createdAt":290,"date":62541,"description":62542,"extension":8,"featured":294,"image":62543,"isDraft":294,"link":290,"meta":62544,"navigation":7,"order":296,"path":62545,"readingTime":31039,"relatedResources":290,"seo":62546,"stem":62547,"tags":62548,"__hash__":62549},"blogs\u002Fblog\u002Fstreaming-real-time-chat-messages-scylla-apache-pulsar.md","Streaming Real-Time Chat Messages into Scylla with Apache Pulsar",[46357],{"type":15,"value":62327,"toc":62536},[62328,62337,62340,62343,62347,62350,62353,62356,62362,62365,62371,62375,62389,62398,62401,62407,62410,62413,62419,62422,62428,62431,62437,62441,62444,62450,62453,62456,62462,62465,62468,62474,62477,62480,62483,62489,62492,62495,62498,62505,62511,62514,62517,62520,62527,62530],[48,62329,62330,62331,62336],{},"At Scylla Summit 2022, I presented ",[55,62332,62335],{"href":62333,"rel":62334},"https:\u002F\u002Fwww.scylladb.com\u002Fpresentations\u002Fflip-into-apache-pulsar-apps-with-scylladb\u002F",[264],"“FLiP Into Apache Pulsar Apps with ScyllaDB”",". Using the same content, in this blog we’ll demonstrate step-by-step how to build real-time messaging and streaming applications using a variety of OSS libraries, schemas, languages, frameworks, and tools utilizing ScyllaDB. 
We’ll also introduce options from MQTT, Web Sockets, Java, Golang, Python, NodeJS, Apache NiFi, Kafka on Pulsar, Pulsar protocol and more. You will learn how to quickly deploy an app to a production cloud cluster with StreamNative, and build your own fast applications using the Apache Pulsar and Scylla integration.",[48,62338,62339],{},"Before we jump into the how, let’s review why this integration can be used for speedy application build. Scylla is an ultra-fast, low-latency, high-throughput, open source NoSQL platform that is fully compatible with Cassandra. Populating Scylla tables utilizing the Scylla-compatible Pulsar IO sink doesn’t require any complex or specialized coding, and the sink makes it easy to load data to Scylla using a simple configuration file pointing to Pulsar topics that stream all events directly to Scylla tables.",[48,62341,62342],{},"Now, let’s build a streaming real-time chat message system utilizing Scylla and Apache Pulsar!",[40,62344,62346],{"id":62345},"why-apache-pulsar-for-streaming-event-based-applications","Why Apache Pulsar for Streaming Event Based Applications",[48,62348,62349],{},"Let’s start the process to create a chat application that publishes messages to an event bus anytime someone fills out a web form. After the message is published, sentiment analysis is performed on the “comments” text field of the payload, and the result of the analysis is output to a downstream topic.",[48,62351,62352],{},"Event-driven applications, like our chat application, use a message bus to communicate between loosely-coupled, collaborating services. Different services communicate with each other by exchanging messages asynchronously. In the context of microservices, these messages are often referred to as events.",[48,62354,62355],{},"The message bus receives events from producers, filters the events, and then pushes the events to consumers without tying the events to individual services. Other services can subscribe to the event bus to receive those events for processing (consumers).",[48,62357,62358,62361],{},[55,62359,821],{"href":23526,"rel":62360},[264]," is a cloud-native, distributed messaging and event-streaming platform that acts as a message bus. It supports common messaging paradigms with its diverse subscription types and consumption patterns.",[48,62363,62364],{},"As a feature required for our integration, Pulsar supports IO Connectors. Pulsar IO connectors enable you to create, deploy, and manage connectors utilizing simple configuration files and basic CLI tools and REST APIs. We will utilize a Pulsar IO Connector to sink data from Pulsar topics to Scylla DB.",[48,62366,62367],{},[384,62368],{"alt":62369,"src":62370},"illustration with logo pulsar","\u002Fimgs\u002Fblogs\u002F63b3ec28fb095cfe9c700771_screen-shot-2022-03-17-at-10.32.01-am.png",[40,62372,62374],{"id":62373},"pulsar-io-connector-for-scylla-db","Pulsar IO Connector for Scylla DB",[48,62376,62377,62378,62383,62384,190],{},"First, we ",[55,62379,62382],{"href":62380,"rel":62381},"https:\u002F\u002Fpulsar.apache.org\u002Fdocs\u002Fen\u002Fio-overview\u002F",[264],"download the Cassandra connector"," to deploy it to my Pulsar cluster. 
This process is documented at the ",[55,62385,62388],{"href":62386,"rel":62387},"https:\u002F\u002Fpulsar.apache.org\u002Fdocs\u002Fen\u002Fio-cassandra-sink\u002F",[264],"Pulsar IO Cassandra Sink connector information",[48,62390,62391,62392,62397],{},"Next, we ",[55,62393,62396],{"href":62394,"rel":62395},"https:\u002F\u002Fdlcdn.apache.org\u002Fpulsar\u002Fpulsar-2.9.1\u002Fconnectors\u002Fpulsar-io-cassandra-2.9.1.nar",[264],"download the pulsar-io-cassandra-X.nar archive"," to our connectors directory. Scylla DB is fully compatible with Cassandra, so we can use that connector to stream messages to it.",[48,62399,62400],{},"When using a Pulsar IO connector like the Scylla DB one I used for my demo, you can specify the configuration details inside a YAML file like the one shown below.",[8325,62402,62405],{"className":62403,"code":62404,"language":8330},[8328],"configs:\n    roots: \"172.17.0.2:9042\"\n    keyspace: \"pulsar_test_keyspace\"\n    columnFamily: \"pulsar_test_table\"\n    keyname: \"key\"\n    columnName: \"col\"\n",[4926,62406,62404],{"__ignoreMap":18},[48,62408,62409],{},"The main configuration shown above is done in YAML format and lists the root server with port, a keyspace, a column family, keyname, and column name to populate.",[48,62411,62412],{},"First, we will need to create a topic to consume from.",[8325,62414,62417],{"className":62415,"code":62416,"language":8330},[8328],"bin\u002Fpulsar-admin topics create persistent:\u002F\u002Fpublic\u002Fdefault\u002Fchatresult2\n",[4926,62418,62416],{"__ignoreMap":18},[48,62420,62421],{},"When you deploy the connector you pass in these configuration properties by command line call as shown below.",[8325,62423,62426],{"className":62424,"code":62425,"language":8330},[8328],"bin\u002Fpulsar-admin sinks create --tenant public --namespace default --name \"scylla-test-sink\" --sink-type cassandra --sink-config-file conf\u002Fscylla.yml --inputs chatresult2\n",[4926,62427,62425],{"__ignoreMap":18},[48,62429,62430],{},"For new data, create a keyspace, table and index or use one of your existing ones.",[8325,62432,62435],{"className":62433,"code":62434,"language":8330},[8328],"CREATE KEYSPACE pulsar_test_keyspace with replication = {‘class’:’SimpleStrategy’, ‘replication_factor’:1};\nCREATE TABLE pulsar_test_table (key text PRIMARY KEY, col text);\nCREATE INDEX on pulsar_test_table(col);\n",[4926,62436,62434],{"__ignoreMap":18},[40,62438,62440],{"id":62439},"adding-ml-functionality-with-a-pulsar-function","Adding ML Functionality with a Pulsar Function",[48,62442,62443],{},"In the previous section, we discussed why Apache Pulsar is well-suited for event-driven applications. In this section, we’ll cover Pulsar Functions–a lightweight, serverless computing framework (similar to AWS Lambda). We’ll leverage a Pulsar Function to deploy our ML model to transform or process messages in Pulsar. The diagram below illustrates our chat application example.",[48,62445,62446],{},[384,62447],{"alt":62448,"src":62449},"illustration ML functionality pulsar cluster","\u002Fimgs\u002Fblogs\u002F63b3ec83802916b866a07a11_screen-shot-2022-03-17-at-10.34.54-am.png",[48,62451,62452],{},"Keep in mind: Pulsar Functions give you the flexibility to use Java, Python, or Go for implementing your processing logic. You can easily use alternative libraries for your sentiment analysis algorithm.",[48,62454,62455],{},"The code below is a Pulsar Function that runs Sentiment Analysis on my stream of events. 
(The function runs once per event.)",[8325,62457,62460],{"className":62458,"code":62459,"language":8330},[8328],"from pulsar import Function\nfrom vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer\nimport json\n\nclass Chat(Function):\n    def __init__(self):\n        pass\n\n    def process(self, input, context):\n        logger = context.get_logger()\n        logger.info(\"Message Content: {0}\".format(input))\n        msg_id = context.get_message_id()\n\n        fields = json.loads(input)\n        sid = SentimentIntensityAnalyzer()\n        ss = sid.polarity_scores(fields[\"comment\"])\n        logger.info(\"Polarity: {0}\".format(ss['compound']))\n        sentimentVal = 'Neutral'\n        if ss['compound'] == 0.00:\n            sentimentVal = 'Neutral'\n        elif ss['compound'] \nHere, we use the Vader Sentiment NLP ML Library to analyze the user’s sentiment on the comment. We enrich our input record with the sentiment and then write it in JSON format to the output topic.\n\nI use the Pulsar context to do logging. I could also push data values to state storage or record some metrics. For this example, we will just do some logging.\n\n## Deploy Our Function\n\nBelow is the deployment script where you can find all of the options and tools in its github directory. We have to make sure we have our NLP library installed on all of our nodes.\n\n",[4926,62461,62459],{"__ignoreMap":18},[48,62463,62464],{},"bin\u002Fpulsar-admin functions create --auto-ack true\n--py pulsar-pychat-function\u002Fsrc\u002Fsentiment.py --classname \"sentiment.Chat\" --inputs \"persistent:\u002F\u002Fpublic\u002Fdefault\u002Fchat2\" --log-topic \"persistent:\u002F\u002Fpublic\u002Fdefault\u002Fchatlog2\" --name Chat --namespace default --output \"persistent:\u002F\u002Fpublic\u002Fdefault\u002Fchatresult2\" --tenant public",[48,62466,62467],{},"pip3 install vaderSentiment",[8325,62469,62472],{"className":62470,"code":62471,"language":8330},[8328],"\n## Let’s Run Our Chat Application\n\nNow that we have built our topic, Function, and sink, let’s build our application. The full web page is in the github directory, but I’ll show you the critical portions here. For this Single Page Application (SPA), I am using JQuery and DataTables that are included from their public CDNs. Datatable.html\n\n",[4926,62473,62471],{"__ignoreMap":18},[48,62475,62476],{},"User:",[48,62478,62479],{},"Question:",[48,62481,62482],{},"Contact Info:",[8325,62484,62487],{"className":62485,"code":62486,"language":8330},[8328],"\nIn the above HTML Form, we let users add a comment to our chat.\n\nNow we are using JavaScript to send the form data as JSON to a Pulsar topic via WebSockets. WebSockets are a supported protocol for Apache Pulsar. The WebSocket URL is ws:\u002F\u002Fpulsar1:8080\u002Fws\u002Fv2\u002Fproducer\u002Fpersistent\u002Fpublic\u002Fdefault\u002Fchat2.\n\nWhere ws is the protocol, pulsar1 is the Pulsar server, port 8080 is our REST port, producer is what we are doing, persistent is our type of topic, public is our tenant, default is our namespace and chat2 is our topic: We populate an object and convert it to a JSON String and encode that payload as a Base64-encoded ASCII string. Then, we add that encoded String as the payload in a new JSON string that includes payload, properties and context for our Pulsar Message. 
This format is required for the WebSocket protocol to convert to a regular message in our Pulsar topic.\n\n",[4926,62488,62486],{"__ignoreMap":18},[48,62490,62491],{},"function loadDoc() {\n var xhttp = new XMLHttpRequest();\n xhttp.onreadystatechange = function() {\n   if (this.readyState == 4 && this.status == 200) {\n     document.getElementById(\"demo\").innerHTML = '';\n   }\n };\nvar wsUri = \"ws:\u002F\u002Fpulsar1:8080\u002Fws\u002Fv2\u002Fproducer\u002Fpersistent\u002Fpublic\u002Fdefault\u002Fchat2\";",[48,62493,62494],{},"websocket = new WebSocket(wsUri);",[48,62496,62497],{},"const pulsarObject = {\n        userInfo: document.getElementById('user-id').value.substring(0,200),\n       contactInfo: document.getElementById('contactinfo-id').value.substring(0,200),\n       comment: document.getElementById('other-field-id').value.substring(0, 200)};\nconst jsonStr = JSON.stringify(pulsarObject);\nvar payloadStr = btoa(jsonStr);\nconst propertiesObject = {key: Date.now() }\nvar data = JSON.stringify({ \"payload\": payloadStr, \"properties\": propertiesObject, \"context\": \"cs\" });",[48,62499,62500,62501,62504],{},"websocket.onopen = function(evt) {\n  if (websocket.readyState === WebSocket.OPEN) {\n          websocket.send(data);\n  }\n};\nwebsocket.onerror = function(evt) {console.log('ERR', evt)};\nwebsocket.onmessage = function(evt) {}\nwebsocket.onclose = function(evt) {\n if (evt.wasClean) {    console.log(evt);\n } else {    console.log('",[2628,62502,62503],{},"close"," Connection died');\n }\n};\n}\nvar form = document.getElementById('form-id');\nform.onsubmit = function() {\n   var formData = new FormData(form);\n   var action = form.getAttribute('action');\n   loadDoc();\n   return false;\n }",[8325,62506,62509],{"className":62507,"code":62508,"language":8330},[8328],"\nIn the above code, we’ll grab the value of the fields from the form, stop the form from reloading the page, and then send the data to Pulsar.\n\nNow, let’s consume any messages sent to the result topic of our Sentiment Pulsar function.\n\nIn the below code we consume from a Pulsar topic: ws:\u002F\u002Fpulsar1:8080\u002Fws\u002Fv2\u002Fconsumer\u002Fpersistent\u002Fpublic\u002Fdefault\u002Fchatresult2\u002Fchatrreader?subscriptionType=Shared&receiverQueueSize=500.\n\nIn this URI, we can see this differs some from the producer URI. 
We have a receiverQueueSize, consumer tag and a subscription Type of Shared.\n\nJavaScript:\n\n",[4926,62510,62508],{"__ignoreMap":18},[48,62512,62513],{},"$(document).ready(function() {\n   var t = $('#example').DataTable();",[48,62515,62516],{},"var wsUri = \"ws:\u002F\u002Fpulsar1:8080\u002Fws\u002Fv2\u002Fconsumer\u002Fpersistent\u002Fpublic\u002Fdefault\u002Fchatresult2\u002Fchatrreader?subscriptionType=Shared&receiverQueueSize=500\";\nwebsocket = new WebSocket(wsUri);\nwebsocket.onopen = function(evt) {\n  console.log('open');\n};\nwebsocket.onerror = function(evt) {console.log('ERR', evt)};\nwebsocket.onmessage = function(evt) {",[48,62518,62519],{},"   var dataPoints = JSON.parse(evt.data);\n   if ( dataPoints === undefined || dataPoints == null || dataPoints.payload === undefined || dataPoints.payload == null ) {\n           return;\n   }\n   if (IsJsonString(atob(dataPoints.payload))) {\n        var pulsarMessage = JSON.parse(atob(dataPoints.payload));\n        if ( pulsarMessage === undefined || pulsarMessage == null ) {\n                return;\n        }\n        var sentiment = \"\";\n        if ( !isEmpty(pulsarMessage.sentiment) ) {\n                sentiment = pulsarMessage.sentiment;\n        }\n        var publishTime = \"\";\n        if ( !isEmpty(dataPoints.publishTime) ) {\n                publishTime = dataPoints.publishTime;\n        }\n        var comment = \"\";\n        if ( !isEmpty(pulsarMessage.comment) ) {\n                comment = pulsarMessage.comment;\n        }\n        var userInfo= \"\";\n        if ( !isEmpty(pulsarMessage.userInfo) ) {\n               userInfo = pulsarMessage.userInfo;\n        }\n        var contactInfo= \"\";\n        if ( !isEmpty(pulsarMessage.contactInfo) ) {\n                contactInfo = pulsarMessage.contactInfo;\n        }",[48,62521,62522,62523,62526],{},"                t.row.add( ",[2628,62524,62525],{}," sentiment, publishTime, comment, userInfo, contactInfo"," ).draw(true);\n      }\n};",[48,62528,62529],{},"} );",[8325,62531,62534],{"className":62532,"code":62533,"language":8330},[8328],"\nFor messages consumed in JavaScript WebSockets, we have to Base64-decode the payload and parse the JSON into an object and then use the DataTable row.add method to add these new table rows to our results. This will happen whenever messages are received.\n\n## Conclusion\n\nIn this blog, we explained how to use Apache Pulsar to build simple, streaming applications regardless of the data source. 
We chose to add a Scylla compatible sink to our Chat application; however, we could do this for any data store in Apache Pulsar.\n\n![illustration streamnative interface](\u002Fimgs\u002Fblogs\u002F63b3ed0729055a12f20bcdb9_screen-shot-2022-03-17-at-10.39.04-am.png)\n\nYou can find the source code in the Github repo [Scylla FLiPS The Stream With Apache Pulsar](https:\u002F\u002Fgithub.com\u002Ftspannhw\u002FScyllaFLiPSTheStream).\n\nIf you’d like to see this process in action, view the original [on-demand recording](https:\u002F\u002Fwww.scylladb.com\u002Fpresentations\u002Fflip-into-apache-pulsar-apps-with-scylladb\u002F).\n\n## Resources & References\n\n- [Doc] [How to connect Pulsar to database](https:\u002F\u002Fpulsar.apache.org\u002Fdocs\u002Fen\u002Fio-quickstart\u002F)\n- [Doc] [Cassandra Sink](https:\u002F\u002Fhub.streamnative.io\u002Fconnectors\u002Fcassandra-sink\u002F2.5.1)\n- [Code] [FLiP Meetup Chat](https:\u002F\u002Fgithub.com\u002Ftspannhw\u002FFLiP-Meetup-Chat)\n- [Code] [Pulsar Pychat](https:\u002F\u002Fgithub.com\u002Ftspannhw\u002Fpulsar-pychat-function)\n- [Doc] [Cassandra Sink Connector](https:\u002F\u002Fpulsar.apache.org\u002Fdocs\u002Fen\u002Fio-cassandra\u002F)\n- [Doc] [Pulsar Functions Overview](https:\u002F\u002Fpulsar.apache.org\u002Fdocs\u002Fen\u002Ffunctions-overview\u002F)\n- [Doc] [Pulsar WebSocket API](https:\u002F\u002Fpulsar.apache.org\u002Fdocs\u002Fen\u002Fclient-libraries-websocket\u002F)\n- [Slides] [FLiP into ScyllaDB - Scylla Summit 2022](https:\u002F\u002Fgithub.com\u002Ftspannhw\u002FSpeakerProfile\u002Fblob\u002Fmain\u002F2022\u002Ftalks\u002FStreamNative%20-%20FLiP%20Into%20ScyllaDB%20-%20Scylla%20Summit%202022.pdf)\n\n## More on Pulsar\n\n1. Learn Pulsar Fundamentals: While this blog did not cover Pulsar fundamentals, there are great resources available to help you learn more. If you are new to Pulsar, we recommend you to take the on-demand [self-paced Pulsar courses](https:\u002F\u002Fwww.academy.streamnative.io\u002Ftracks) or test your Pulsar knowledge with the [Fundamentals TestOut](https:\u002F\u002Fwww.academy.streamnative.io\u002Fcourses\u002Fcourse-v1:streamnative+APFTO-001+2022\u002Fabout).\n2. Spin up a Pulsar Cluster in Minutes: If you want to try building microservices without having to set up a Pulsar cluster yourself, sign up for [StreamNative Cloud](https:\u002F\u002Fconsole.streamnative.cloud\u002F?defaultMethod=login) today. StreamNative Cloud is the simple, fast, and cost-effective way to run Pulsar in the public cloud.\n3. 
Continued Learning: If you are interested in learning more about Pulsar functions and Pulsar, take a look at the following resources:\n\n- [Doc] [How to develop Pulsar Functions](https:\u002F\u002Fpulsar.apache.org\u002Fdocs\u002Fen\u002Ffunctions-develop\u002F)\n- [Blog] [Function Mesh - Simplify Complex Streaming Jobs in Cloud=](\u002Fblog\u002Frelease\u002F2021-05-03-function-mesh-open-source\u002F)\n",[4926,62535,62533],{"__ignoreMap":18},{"title":18,"searchDepth":19,"depth":19,"links":62537},[62538,62539,62540],{"id":62345,"depth":19,"text":62346},{"id":62373,"depth":19,"text":62374},{"id":62439,"depth":19,"text":62440},"2022-03-17","Learn how to build a streaming real-time chat message system utilizing Scylla and Apache Pulsar.","\u002Fimgs\u002Fblogs\u002F63c7fab10f6d0cedf68b589a_63b3ec2833ad8cc3ac7728a1_scylla-top.png",{},"\u002Fblog\u002Fstreaming-real-time-chat-messages-scylla-apache-pulsar",{"title":62324,"description":62542},"blog\u002Fstreaming-real-time-chat-messages-scylla-apache-pulsar",[38442,799,821,51871,303,5376],"K_CmYMCxphjbja_ANCRj-jZG6E5eF1AdgOcdI0AF7go",{"id":62551,"title":62552,"authors":62553,"body":62555,"category":821,"createdAt":290,"date":62815,"description":62816,"extension":8,"featured":294,"image":62817,"isDraft":294,"link":290,"meta":62818,"navigation":7,"order":296,"path":62819,"readingTime":62820,"relatedResources":290,"seo":62821,"stem":62822,"tags":62823,"__hash__":62824},"blogs\u002Fblog\u002Fapache-pulsar-client-application-best-practices.md","Apache Pulsar Client Application Best Practices",[62554],"Ioannis Polyzos",{"type":15,"value":62556,"toc":62804},[62557,62560,62564,62567,62573,62577,62580,62584,62590,62593,62596,62599,62603,62606,62609,62619,62623,62626,62629,62633,62636,62639,62643,62646,62649,62652,62657,62661,62664,62667,62675,62679,62682,62686,62689,62692,62695,62698,62701,62706,62710,62713,62721,62724,62729,62735,62740,62746,62751,62757,62760,62764],[48,62558,62559],{},"In this blog post, I will provide an in-depth review of the internal details of Apache Pulsar producers and consumers. Next, I will outline the common pitfalls that application developers encounter when working with Pulsar. Finally, I will introduce best practices you can use when developing streaming\u002Fmessaging applications.",[40,62561,62563],{"id":62562},"pulsar-terminology","Pulsar Terminology",[48,62565,62566],{},"Apache Pulsar is a cloud-native, distributed messaging and event streaming platform that supports both pub\u002Fsub and event streaming use cases. Let’s start by introducing some key Pulsar terminology.",[48,62568,62569],{},[384,62570],{"alt":62571,"src":62572},"table Pulsar Terminology","https:\u002F\u002Fuploads-ssl.webflow.com\u002F639226d67b0d723af8e7ca56\u002F63b3eb4d02b0c73733c98fb2_table%20(1).webp",[40,62574,62576],{"id":62575},"demystifying-the-consuming-side","Demystifying the Consuming Side",[48,62578,62579],{},"The consuming side is where you are most likely to encounter potentially confusing behavior, so let’s start there. We will also cover best practices you might want to use as a checklist when you create your applications.",[32,62581,62583],{"id":62582},"how-consumers-work","How Consumers Work",[48,62585,62586,62589],{},[384,62587],{"alt":18,"src":62588},"\u002Fimgs\u002Fblogs\u002F63c7e8cb4bbd6ec0217a74da_63b3eb66c0e731bc9fb8f36c_screen-shot-2022-03-10-at-9.04.49-am.webp","\nWhen a consumer comes up, it sends a “Flow” command to request messages from the broker, and then the broker sends messages up to the number of availablePermits. 
The max availablePermits is equal to the receiverQueue size which, by default is 1000 messages; therefore, it is as though the consumer says to the broker - “I have 1000 spots in my queue, so you can go ahead and send me up to 1000 messages.”",[48,62591,62592],{},"The broker receives this message that says this consumer has 1000 availablePermits and promptly dispatches the data while keeping track of every instance of activity. For example, if there are 10 messages to send, it dispatches those 10 and will continue to send 990 more. Only when that number reaches 0 will it stop sending data.",[48,62594,62595],{},"The consumer receives these messages in the receiverQueue. As long as you keep calling the receive() method, you pop those messages out of the queue. Later, when those 1000 messages are about halfway processed, it will send more (for example, it might send 500 more) permits to the broker.",[48,62597,62598],{},"The goal with message consumption is to keep it flowing. You always want to have messages available in the consumer queue size so that the application continually has messages to read and process.",[32,62600,62602],{"id":62601},"message-acknowledgement","Message Acknowledgement",[48,62604,62605],{},"An important mechanism in the flow described above is message acknowledgement. For the consuming side to be able to increase the number of availablePermits and request more messages, it needs to acknowledge the message back to the broker, validating that a specific message or group of messages was successfully consumed. In case an exception occurs, it can provide a negative acknowledgement (manually or automatically if an ackTimeout is provided - consumer.receive(500, TimeUnit.MILLISECONDS)), in which case it will redeliver the message. Pulsar supports two types of acknowledgements: individual and cumulative.",[48,62607,62608],{},"As its name suggests, individual acknowledgement sends an acknowledgment back to the broker after each message has been successfully processed. Cumulative acknowledgment, on the other hand, sends an acknowledgment for a batch of messages, which means that all the messages up to the specific offset will be acknowledged.",[916,62610,62611],{},[48,62612,62613,62614,190],{},"Note: Cumulative acknowledgment is not supported on the ",[55,62615,62618],{"href":62616,"rel":62617},"https:\u002F\u002Fdocs.streamnative.io\u002Fplatform\u002Flatest\u002Fconcepts\u002Fpub-sub-concepts#subscriptions",[264],"Shared Subscription mode",[32,62620,62622],{"id":62621},"what-is-a-backlog","What Is a Backlog?",[48,62624,62625],{},"A backlog (or consumer lag in the context of Kafka) describes how far behind a consumer is from the producing side. A backlog is the number of unacknowledged messages within a subscription.",[48,62627,62628],{},"For example, let’s say our producer has just sent successful message 1000 to the broker and our consumer has just successfully acknowledged processing message 800. This means that our backlog (or consumer lag) is 200, i.e. a subscription contains 200 messages that haven’t been acknowledged yet. In the upcoming Pulsar 2.10 release there will be built-in functionality for retrieving this metric directly with the Pulsar admin CLI by running pulsar-admin topics stats –etb true. 
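To make the consuming flow and the acknowledgement options above concrete, here is a minimal sketch using the Java client. The topic name, subscription name, receiverQueueSize, and ackTimeout values are placeholders chosen for illustration, not recommendations. Because the subscription type is Shared, acknowledgements are individual (cumulative acknowledgement is not available there, as noted above).

```java
import java.util.concurrent.TimeUnit;
import org.apache.pulsar.client.api.*;

public class AckExample {
    public static void main(String[] args) throws PulsarClientException {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650")               // placeholder broker URL
                .build();

        // Lower the receiver queue so the client does not pre-fetch more
        // messages than it can realistically process, and let unacked
        // messages be redelivered after the ackTimeout.
        Consumer<String> consumer = client.newConsumer(Schema.STRING)
                .topic("persistent://public/default/orders")          // placeholder topic
                .subscriptionName("orders-sub")                       // placeholder subscription
                .subscriptionType(SubscriptionType.Shared)
                .receiverQueueSize(200)
                .ackTimeout(30, TimeUnit.SECONDS)
                .subscribe();

        while (true) {
            Message<String> msg = consumer.receive();
            try {
                process(msg.getValue());
                // Individual acknowledgement: tells the broker this message
                // was handled, freeing a permit for the next one.
                consumer.acknowledge(msg);
            } catch (Exception e) {
                // Negative acknowledgement: ask the broker to redeliver this
                // message instead of waiting for the ackTimeout to expire.
                consumer.negativeAcknowledge(msg);
            }
        }
    }

    private static void process(String value) {
        // application-specific processing goes here
    }
}
```

Messages that are never acknowledged are exactly what the backlog counts, which is why the metric above is worth watching.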
The metric shows the timestamp from the publish time of the earliest message in the backlog to the current time and can be fetched.",[32,62630,62632],{"id":62631},"typical-pitfalls-for-the-consumers-not-processing-messages","Typical Pitfalls for the Consumers Not Processing Messages",[48,62634,62635],{},"Depending on the workload and the design of your application, you might encounter situations where your broker doesn’t send messages to the consumer or your consumer isn’t processing messages. In this section, I will outline some common pitfalls of such behavior and walk you through potential solutions.",[48,62637,62638],{},"As stated in the previous sections, a successful consuming flow is about having availablePermits to request messages from the broker, combined with the consumer’s receiverQueue, successful processing of the message, and making sure an acknowledgement is sent back to the broker. So… what could go wrong?",[3933,62640,62642],{"id":62641},"scenario-1-the-broker-doesnt-dispatch-messages-and-i-see-a-growing-backlog","Scenario 1: The broker doesn’t dispatch messages and I see a growing backlog.",[48,62644,62645],{},"As mentioned, the first health check is to ensure your application acknowledges the messages after processing.",[48,62647,62648],{},"With that out of our way, we can see that this behavior might be due to the fact that your consuming application is not able to process messages fast enough. Going back to our consumer flow, by default the consumers ask for 1000 messages to hold in the receiverQueue. Depending on your processing logic, this could lead to a situation where your client requests more messages than are available to process, and this would lead to messages buffered in the queue eventually timing out and the backlog size growing. Therefore, it’s a good idea to lower this value so it is more meaningful.",[48,62650,62651],{},"In this scenario, typically you should see the availablePermits equal to 0 and unackedMessages that indicate that Pulsar has dispatched messages to the consumer, without the consumer having acked back to the broker.",[916,62653,62654],{},[48,62655,62656],{},"Note: If the unacked messages exceed a threshold (which should be around 50,000 messages) then the consumer gets blocked, as described by setting blockedConsumerOnUnackedMsgs to true. You can use the pulsar-admin stats command to retrieve the metrics described here.",[3933,62658,62660],{"id":62659},"scenario-2-i-do-see-availablepermits-0-but-a-slow-or-zero-delivery-rate","Scenario 2: I do see availablePermits > 0, but a slow or zero delivery rate.",[48,62662,62663],{},"If there are availablePermits, but the delivery rate is slow, then the bottleneck is either on the brokers or the bookies. This means that the application processes messages faster than we can send them.",[48,62665,62666],{},"If the delivery rate is zero, there is typically an issue with the broker. This could indicate that there is a high workload on your broker. For example, the broker might process too many topics with some high workloads and thus dispatch messages slowly.",[48,62668,62669,62670,190],{},"Try to split the bundles and unload the topics to achieve better load distribution. 
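Either way, it helps to confirm where the permits and backlog actually stand before changing anything. The sketch below reads the same numbers that the pulsar-admin stats command prints, using the Java admin client against a recent client version; the admin URL and topic name are placeholders.

```java
import org.apache.pulsar.client.admin.PulsarAdmin;
import org.apache.pulsar.common.policies.data.TopicStats;

public class StatsCheck {
    public static void main(String[] args) throws Exception {
        // Placeholder admin endpoint.
        PulsarAdmin admin = PulsarAdmin.builder()
                .serviceHttpUrl("http://localhost:8080")
                .build();

        TopicStats stats = admin.topics().getStats("persistent://public/default/orders");

        stats.getSubscriptions().forEach((name, sub) -> {
            // msgBacklog: messages not yet acknowledged on this subscription.
            System.out.printf("subscription=%s backlog=%d unacked=%d%n",
                    name, sub.getMsgBacklog(), sub.getUnackedMessages());
            // availablePermits == 0 usually means the consumer is the bottleneck;
            // permits > 0 with a low delivery rate points at brokers or bookies,
            // as described in the scenarios above.
            sub.getConsumers().forEach(c ->
                    System.out.printf("  consumer=%s permits=%d unacked=%d%n",
                            c.getConsumerName(), c.getAvailablePermits(), c.getUnackedMessages()));
        });

        admin.close();
    }
}
```

If the numbers point at an overloaded broker rather than a slow consumer, splitting bundles and unloading topics, as suggested above, is the usual remedy.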
For more information, read the documentation about ",[55,62671,62674],{"href":62672,"rel":62673},"https:\u002F\u002Fpulsar.apache.org\u002Fdocs\u002Fen\u002Fadministration-load-distribution\u002F#unloading-topics-and-bundles",[264],"unloading topics and bundles",[3933,62676,62678],{"id":62677},"scenario-3-i-use-keyshared-subscription-i-add-new-consumers-but-i-dont-see-them-processing-any-messages","Scenario 3: I use KeyShared Subscription, I add new consumers but I don’t see them processing any messages.",[48,62680,62681],{},"KeyShared subscription aims to guarantee ordering (per key). If you spin up a new consumer, while you have old messages unacked, you have to wait until the old consumer has processed all the data up to the current point before the new consumer can get the new spread keys. When you are in situations where you create a new consumer and then are confused when it doesn’t process any messages, keep in mind that this might be a potential cause.",[3933,62683,62685],{"id":62684},"scenario-4-i-use-keyshared-subscription-and-add-a-new-consumer-but-i-dont-see-it-processing-any-messages-even-though-there-are-no-unacked-messages","Scenario 4: I use KeyShared Subscription and add a new consumer, but I don’t see it processing any messages even though there are no unacked messages.",[48,62687,62688],{},"This scenario typically has to do with key imbalance. When you design your application, think in terms of your key space as having all your keys spread out as evenly as possible among your consumers (probably also the volume of messages per key). Otherwise, you might end up in situations where some consumers pick up too much work, while others stay idle.",[48,62690,62691],{},"Let’s illustrate what we’re learning with an example. Imagine you have just two keys - key1 and key2 - and start with just one consumer. Then, you spin up a second consumer, but the way the keys are distributed does not ensure that the new consumer will receive any of those keys - thus you will very likely end up with an idle consumer.",[48,62693,62694],{},"On the other hand, imagine you have key1 assigned to consumer1 and key2 assigned to consumer2. Now, let’s assume that key1 is a userId and that user is 24\u002F7 with the system and key2 is another userId for a user who only comes online once per week. As you might have guessed, you will see consumer1 processing too many messages. You might think that consumer2 is idle; in reality, it doesn’t have any messages to process for that key until a particular day of the week. This is why it is important to provide a key that will result in more equal message distribution.",[48,62696,62697],{},"At this point, we have covered quite a lot for the consuming side. One highlight here that applies to all of the applications is you should always ensure you close your client resources. Producers and consumers\u002Freaders are long-lived resources you typically create once and then keep as long as you like. However, there are situations that might require that you create a producer or consumer\u002Freader (probably on demand) to perform functionality and exit. In both situations, you need to make sure that all the resources are closed before your application exits to avoid resource leaks.",[48,62699,62700],{},"Imagine a scenario where you spin up a consumer on demand to perform just one task and then exit, but you do not actually close it. By default, the consumers have a receiverQueue size of 1000. 
This means that when your consumer comes up, it will pre-fetch 1000 messages, perform some computation, and exit. If you don’t close the resources, messages will sit in the buffer and there will always be a backlog of 1000 messages because this “leaked” consumer holds those messages without consuming them.",[916,62702,62703],{},[48,62704,62705],{},"Important Note: After you process each message, acknowledge that back to the broker. Otherwise, you will see backlogs increasing, using one of the available methods. For consumers it is recommended to use the same consumer instance to ACK messages. If you don’t (for whatever reason), you can create a consumer with a receiver queue size equal to 1. This will mimic the number of messages that this consumer needs to pre-fetch.",[32,62707,62709],{"id":62708},"demystifying-the-producing-side","Demystifying the Producing Side",[48,62711,62712],{},"On the producing side, you have some kind of data source that generates data you want to send to Pulsar. Typical sources include ingesting data from files, connecting with some IoT messaging protocol like MQTT, receiving updates from Change Data Capture systems and more. When each message arrives you create a new Pulsar Message using the payload and use a Pulsar Producer to send that message over to the brokers. In order to use a producer first you need to have one in place. In the producer creation process there are a few things you might want to consider.",[48,62714,62715,62716,190],{},"First, you need to decide on the schema of the message you want to use. By default everything is sent as bytes, but all the primitive data types are supported and more complex data formats like Json, Avro and Protobufs. For more information about schema, see ",[55,62717,62720],{"href":62718,"rel":62719},"https:\u002F\u002Fpulsar.apache.org\u002Fdocs\u002Fen\u002Fschema-understand\u002F",[264],"Understand schema",[48,62722,62723],{},"Second, you want to properly configure your producer:",[321,62725,62726],{},[324,62727,62728],{},"Batching: In order to send a message to the broker you can use either the send() method, which sends the message and waits until an acknowledgement is received or sendAsync() methods that sends the message without waiting for the acknowledgment. The sendAsync() method is used in order to increase the throughput of your application and uses batching in order to create batches of messages and send them altogether instead of sending each message and waiting for a response. By default batching is enabled in Pulsar. You can tune the maximum number of messages the buffer can hold as well as the byte size and a batch is considered “full” and ready to be sent to Pulsar when either one of these two thresholds is met. In case you have large messages and you want to create batches of 1000 messages for example, you might wanna increase the batchingMaxBytes limit (default is 128kb). Also when you use the sendAsync() method, because of the asynchronous nature your producer might be overwhelmed with ack message response, in which case there is another configuration option you will need to enable blockIfQueueFull(true), which applies backpressure when the producer is overwhelmed - i.e signals to the broker to “slow down”. 
Here is an example producer that uses batching, applies backpressure and tunes the batching buffer.",[8325,62730,62733],{"className":62731,"code":62732,"language":8330},[8328],"Producer producer = pulsarClient.newProducer(Schema.STRING)\n    .topic(topic)\n    .producerName(\"test-producer\")\n    .enableBatching(true)\n    .blockIfQueueFull(true)\n    .batchingMaxMessages(10000)\n    .batchingMaxBytes(10000000)\n    .create();\n",[4926,62734,62732],{"__ignoreMap":18},[321,62736,62737],{},[324,62738,62739],{},"Chunking: There are situations where your messages are too large and you want to send them as chunks to the broker. In order to enable chunking you need to disable batching and also you might wanna tune the sendTimeout option, depending how large your message is and your network latency. Here is an example of a chunking producer with an increased timeout.",[8325,62741,62744],{"className":62742,"code":62743,"language":8330},[8328],"Producer producer = pulsarClient.newProducer(Schema.STRING)\n    .topic(topic)\n    .producerName(\"test-producer\")\n    .enableBatching(false)\n    .enableChunking(true)\n    .sendTimeout(120, TimeUnit.SECONDS)\n    .create();\n",[4926,62745,62743],{"__ignoreMap":18},[321,62747,62748],{},[324,62749,62750],{},"Routing: Your topics can be either non-partitioned or partitioned topics. In case of a partitioned-topic you might wanna have control on how messages are routed over to these partitions, in which case you need to tune your messageRoutingMode and also specify a messageRouter. Here is a producer example that specifies the routing mode as well how the messages should be routed - here we calculate some hash based on the message key. Also note that we specify the Murmur3_32Hash algorithm.",[8325,62752,62755],{"className":62753,"code":62754,"language":8330},[8328],"Producer producer = pulsarClient.newProducer(Schema.STRING)\n    .topic(topic)\n    .producerName(\"test-producer\")\n    .blockIfQueueFull(true)\n    .messageRoutingMode(MessageRoutingMode.CustomPartition)\n    .hashingScheme(HashingScheme.Murmur3_32Hash)\n    .messageRouter(new MessageRouter() {\n        @Override\n        public int choosePartition(Message msg, TopicMetadata metadata) {\n            String key = msg.getKey();\n            return Integer.parseInt(key) % metadata.numPartitions();\n        }\n     })\n     .create();\n",[4926,62756,62754],{"__ignoreMap":18},[48,62758,62759],{},"As you can see the producing side is more straightforward with not many hidden caveats. It’s mostly fine-tuning to meet your application requirements. One thing though that is important to highlight is the number of producers you might create within your application. For example you might want to ingest multiple files (hundreds or thousands) from a directory or, for example, you have a web app and you want to spin up a producer for each user login. Producers are long-living processes, so creating hundreds or thousands of producers is something you should avoid. Instead what you can do is create a ProducerCache with a fixed number of producers that you can reuse across your application.",[40,62761,62763],{"id":62762},"client-application-checklist","Client Application Checklist",[321,62765,62766,62769,62772,62775,62778,62781,62784,62787,62790,62793,62796],{},[324,62767,62768],{},"Name your producers, consumers, and readers.",[324,62770,62771],{},"On the producing side, you typically want to use the sendAsync() method in order to achieve better throughput. 
Be sure to set the blockIfQueue option to true on your producer to ensure that backpressure gets applied. Due to the async nature, we might receive too many ack messages that the producer queue can't process fast enough. With this option, we can signal to wait before sending more.",[324,62773,62774],{},"When you use a KeyShared subscription, make sure that your producing side uses a BatchBuilder.KeyShared. This ensures that messages with the same keys end up in the same batches.",[324,62776,62777],{},"When you use partitioned topics, think in terms of how you distribute the workload to ensure you don’t have topics with large numbers of messages, while others are too small (this can impact both brokers and consumers as we saw in Scenarios 3 and 4).",[324,62779,62780],{},"The same applies for key shared subscriptions: You typically want to think of your key space and see how you can better distribute the workload among consumers. This avoids the risk of one consumer picking up most of the work, while others sit mostly idle.",[324,62782,62783],{},"For producers, you should avoid creating a producer for each message. For situations that require producers on demand, you might use a Map or a LRU cache and grab a producer from within that cache.",[324,62785,62786],{},"Use the same consumer to acknowledge a message.",[324,62788,62789],{},"On the producing side, make sure you check your batchMaxMessages size, which defaults to 1000. For example, if you have messages that are 1MB in size, this default might be too big for your application, and you will have 1GB sitting on your direct memory.",[324,62791,62792],{},"On the consuming side, make sure you tune your receiverQueueSize, which defaults to 1000. For example, if you have messages that are 1MB in size, this default might be too big for your application, and you will have 1GB sitting on your direct memory, especially if your consumer does some heavy work.",[324,62794,62795],{},"Use partitioned topics, even if you define just one partition. By doing so, if your traffic increases later, you can easily add more partitions to meet the demand. If you use a non-partitioned topic you will have to create a new partitioned-topic and migrate the data to the new topic in order to scale.",[324,62797,62798,62799,190],{},"Malformed messages will fail to be acknowledged. In this case, you might want to fine tune your consumers with the ackTimeout setting, and maybe introduce dead letter topics to recover from such cases and further investigate your messages. For more information, read the documentation about ",[55,62800,62803],{"href":62801,"rel":62802},"https:\u002F\u002Fpulsar.apache.org\u002Fdocs\u002Fen\u002Fconcepts-messaging\u002F#dead-letter-topic",[264],"dead letter topics",{"title":18,"searchDepth":19,"depth":19,"links":62805},[62806,62807,62814],{"id":62562,"depth":19,"text":62563},{"id":62575,"depth":19,"text":62576,"children":62808},[62809,62810,62811,62812,62813],{"id":62582,"depth":279,"text":62583},{"id":62601,"depth":279,"text":62602},{"id":62621,"depth":279,"text":62622},{"id":62631,"depth":279,"text":62632},{"id":62708,"depth":279,"text":62709},{"id":62762,"depth":19,"text":62763},"2022-03-10","This blog provides an in-depth review of the internal details of Apache Pulsar producers and consumers; outlines the common pitfalls that application developers encounter when working with Pulsar; and introduces best practices you can use when developing streaming\u002Fmessaging applications. 
URL","\u002Fimgs\u002Fblogs\u002F63c7e789bedfd2ac2577a6c7_63b3eb191e7ad725300ea7cf_screen-shot-2022-03-10-at-9.28.09-am-1.png",{},"\u002Fblog\u002Fapache-pulsar-client-application-best-practices","16 min read",{"title":62552,"description":62816},"blog\u002Fapache-pulsar-client-application-best-practices",[7347,821],"SvkZbriVPYXVzhA9juw80mZa3FNdTuqJbID7RuTUQj0",{"id":62826,"title":62827,"authors":62828,"body":62829,"category":821,"createdAt":290,"date":63046,"description":63047,"extension":8,"featured":294,"image":63048,"isDraft":294,"link":290,"meta":63049,"navigation":7,"order":296,"path":63050,"readingTime":3556,"relatedResources":290,"seo":63051,"stem":63052,"tags":63053,"__hash__":63054},"blogs\u002Fblog\u002Fcloudera-streamnative-announce-integration-apache-nifi-tm-apache-pulsar-tm.md","Cloudera and StreamNative Announce the Integration of Apache NiFi™ and Apache Pulsar™",[28],{"type":15,"value":62830,"toc":63038},[62831,62834,62843,62847,62859,62862,62865,62868,62872,62875,62878,62882,62885,62888,62894,62897,62900,62903,62906,62909,62912,62915,62918,62921,62924,62928,62947,62949,63036],[48,62832,62833],{},"Cloudera and StreamNative are pleased to announce they are open-sourcing an integration between Apache NiFi and Apache Pulsar. StreamNative was founded by the original creators of Apache Pulsar, and the team is excited to contribute this integration to the open source community. The Cloudera team includes some of the original developers of Apache NiFi and will make the connector available inside the Cloudera platform. Together, NiFi and Pulsar enable companies to create a cloud-native, scalable, real-time streaming data platform that can ingest, transform, and analyze massive amounts of data.",[48,62835,62836,62837,62842],{},"With this update, you will be able to consume and produce messages from Pulsar topics at scale with simple configuration settings within Apache NiFi. Cloudera makes these processors available out of the box for ",[55,62838,62841],{"href":62839,"rel":62840},"https:\u002F\u002Fdocs.cloudera.com\u002Fcdf-datahub\u002F7.2.14\u002Frelease-notes\u002Ftopics\u002Fcdf-datahub-supported-partner-components.html",[264],"CDF"," for Data Hub7.2.14 and newer.",[40,62844,62846],{"id":62845},"what-is-apache-nifi","What is Apache NiFi?",[48,62848,62849,62853,62854,190],{},[55,62850,46577],{"href":62851,"rel":62852},"https:\u002F\u002Fnifi.apache.org\u002F",[264]," is based on technology previously called “Niagara Files” that was in development and used at scale within the National Security Agency (NSA) and was made available to the Apache Software Foundation through the ",[55,62855,62858],{"href":62856,"rel":62857},"https:\u002F\u002Fwww.nsa.gov\u002FResearch\u002FTechnology-Transfer-Program\u002FOverview\u002F",[264],"NSA Technology Transfer Program",[48,62860,62861],{},"NiFi is a visual tool that implements flow-based programming enabling you to construct data flows that move data from one technological platform (such as databases, cloud-storage, and messaging systems) to another.",[48,62863,62864],{},"NiFi automates the movement of data between disparate data sources and systems, making data ingestion fast, easy, and secure. It provides real-time control that makes it easy to manage the movement of data between any source and any destination. 
It also provides event-level data provenance and traceability, allowing you to trace every piece of data back to its origin.",[48,62866,62867],{},"The NiFi platform includes a collection of over 100 pre-built processors that can be used to perform enrichment, routing, and other transformations on the data as it flows from the source to destination.",[40,62869,62871],{"id":62870},"what-is-apache-pulsar","What is Apache Pulsar?",[48,62873,62874],{},"Apache Pulsar is a cloud-native, distributed messaging and streaming platform originally created at Yahoo! and now a top-level Apache Software Foundation project. It is a distributed implementation of the publish-subscribe pattern designed to route messages from one end-point to another without data loss.",[48,62876,62877],{},"At its core, Pulsar uses a replicated distributed ledger to provide durable stream storage that can easily scale to retain petabytes of data. Pulsar’s scalable stream storage makes it a perfect long-term repository for event data. With Pulsar’s message retention policies, you can retain historical event data indefinitely. This allows you to perform streaming analytics on your event data at any point in the future.",[40,62879,62881],{"id":62880},"why-pulsar-and-nifi","Why Pulsar and NiFi?",[48,62883,62884],{},"Apache NiFi and Pulsar’s capabilities complement one another inside modern streaming data architectures. NiFi provides a dataflow solution that automates the flow of data between software systems. As such, it serves as a short-term buffer between data sources rather than a long-term repository of data.",[48,62886,62887],{},"Conversely, Pulsar was designed to act as a long-term repository of event data and provides strong integration with popular stream processing frameworks such as Flink and Spark. By combining these two technologies, you can create a powerful real-time data processing and analytics platform.",[48,62889,62890],{},[384,62891],{"alt":62892,"src":62893},"pulsar and Nifi illustration","\u002Fimgs\u002Fblogs\u002F63b3eaa629055a530c098058_screen-shot-2022-03-09-at-5.36.07-am.png",[48,62895,62896],{},"The synergies realized by combining these technologies inside your data platform will be significant. All of your dataflow management needs including prioritization, back pressure, and edge intelligence are provided by NiFi.",[48,62898,62899],{},"You can use NiFi’s extensive suite of connectors to automate the flow of data into your streaming platform while performing ETL processing along the way. After the data has been transformed, it can be routed directly to Pulsar’s durable stream storage for long-term retention via these new NiFi processors designed for Apache Pulsar.",[48,62901,62902],{},"Once the data has been stored inside Pulsar, it can be made readily available to various popular stream processing engines such as Flink or Spark, for more complex streaming processing and analytics use cases.",[48,62904,62905],{},"In short, NiFi’s extensive suite of connectors makes it easy to “get data in” to your streaming platform, and Pulsar’s integration with Flink and Spark makes it easy to get real-time insights out.",[48,62907,62908],{},"Combining these technologies together creates a complete edge-to-cloud data streaming platform that can be used to provide real-time insights across multiple application domains. 
For example, the ability to ingest and parse log data will be extremely useful in the cybersecurity industry, as you need to identify and detect threats as quickly as possible.",[48,62910,62911],{},"A wide range of industries such as manufacturing, mining, and oil & gas require the ability to ingest large amounts of IoT sensor data from a variety of locations. These high-volume datasets need to be analyzed in near real-time in order to prevent catastrophic equipment failures and\u002For prevent disruptions that could bring your operations to a screeching halt.",[48,62913,62914],{},"Within the financial services industry, the ability to ingest and process data in near-real time provides a clear competitive advantage in time-sensitive applications such as algorithmic trading or cryptocurrency arbitrage.",[40,62916,62917],{"id":53164},"Demo",[48,62919,62920],{},"Without further ado, let’s take a look at these new NiFi processors in action. In this video, I walk through the process of configuring and using these processors to send data to and receive data from an Apache Pulsar cluster.",[48,62922,62923],{},"As you can see from the video demonstration, there are a total of four Processors: two for publishing data to Pulsar, PublishPulsar and PublishPulsarRecord; and two for consuming data from Pulsar, ConsumePulsar and ConsumePulsarRecord. There are also two controller services included in the bundle as well. One is used for creating Pulsar clients, and another for authentication to secure Pulsar clusters.",[40,62925,62927],{"id":62926},"availability","Availability",[48,62929,62930,62931,62935,62936,62941,62942,190],{},"These processors will be ",[55,62932,62934],{"href":62839,"rel":62933},[264],"available"," starting with version 7.2.14 of CDF on the Public Cloud. 
If you wish to use these processors in other Apache NiFi clusters, you may download the artifacts directly from the ",[55,62937,62940],{"href":62938,"rel":62939},"https:\u002F\u002Fsearch.maven.org\u002Fsearch?q=g:io.streamnative.connectors%20nifi",[264],"maven central repository",", or you can build them directly from the ",[55,62943,62946],{"href":62944,"rel":62945},"https:\u002F\u002Fgithub.com\u002Fstreamnative\u002Fpulsar-nifi-bundle",[264],"source code",[40,62948,13565],{"id":1727},[321,62950,62951,62957,62963,62970,62973,62980,62987,62994,63001,63010,63015,63022,63029],{},[324,62952,62953,62954,190],{},"Learn more about the Apache NiFi and Apache Pulsar Connector ",[55,62955,267],{"href":62956},"\u002Fen\u002Fapache-nifi-connector\u002F",[324,62958,62959,62960,190],{},"Get your own free Pulsar cluster by signing up for ",[55,62961,3550],{"href":62962},"\u002Fapache-nifi-connector\u002F",[324,62964,62965,62969],{},[55,62966,62968],{"href":62967},"\u002Fwebinars\u002Fpulsar-and-nifi-for-cloud-data-lakes-03-09-22","Join"," Tim Spann, (Developer Advocate, StreamNative) and John Kuchmek, (Principal Solutions Engineer, Cloudera) for the upcoming Meetup: \"Apache Pulsar and Apache NiFi for Cloud Data Lakes\" on Thursday, March 10th at 3 PM PST \u002F 6 PM EST.",[324,62971,62972],{},"You can also review some talks to get a better understanding of the types of use cases that can be solved by combining these two open-source technologies.",[324,62974,62975],{},[55,62976,62979],{"href":62977,"rel":62978},"https:\u002F\u002Fwww.slideshare.net\u002Fbunkertor\u002Fdevfest-uk-ireland-using-apache-nifi-with-apache-pulsar-for-fast-data-onramp-2022",[264],"Devfest UK & Ireland",[324,62981,62982],{},[55,62983,62986],{"href":62984,"rel":62985},"https:\u002F\u002Fwww.slideshare.net\u002Fbunkertor\u002Fapachecon-2021-apache-nifi-101-introduction-and-best-practices",[264],"ApacheCon 2021",[324,62988,62989],{},[55,62990,62993],{"href":62991,"rel":62992},"https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=zlSbJxrmgh0",[264],"Using the FLiPN Stack for Edge AI",[324,62995,62996],{},[55,62997,63000],{"href":62998,"rel":62999},"https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=eDy_sFIRN9A&list=PLremjir4YVhLGodkp36uTIa3PUVRiZFmU&index=84",[264],"DevNet Create 2021",[324,63002,63003,63004,63009],{},"Download some ",[55,63005,63008],{"href":63006,"rel":63007},"https:\u002F\u002Fgithub.com\u002Ftspannhw\u002FFLiPN-NFT",[264],"demo code"," and try it out for yourself.",[324,63011,63012],{},[55,63013,60617],{"href":60615,"rel":63014},[264],[324,63016,63017],{},[55,63018,63021],{"href":63019,"rel":63020},"https:\u002F\u002Fgithub.com\u002Ftspannhw\u002FFLiP-Transit",[264],"FLiP-Transit GitHub Repo",[324,63023,63024],{},[55,63025,63028],{"href":63026,"rel":63027},"https:\u002F\u002Fgithub.com\u002Ftspannhw\u002Fawesome-nifi-pulsar",[264],"Awesome Apache NiFi + Apache Pulsar GitHub Repo",[324,63030,63031],{},[55,63032,63035],{"href":63033,"rel":63034},"https:\u002F\u002Fgithub.com\u002Ftspannhw\u002FFLiPN-Demos",[264],"FLiPN-Demos GitHub Repo",[48,63037,3931],{},{"title":18,"searchDepth":19,"depth":19,"links":63039},[63040,63041,63042,63043,63044,63045],{"id":62845,"depth":19,"text":62846},{"id":62870,"depth":19,"text":62871},{"id":62880,"depth":19,"text":62881},{"id":53164,"depth":19,"text":62917},{"id":62926,"depth":19,"text":62927},{"id":1727,"depth":19,"text":13565},"2022-03-09","Cloudera and StreamNative are open-sourcing an integration between Apache NiFi and Apache Pulsar. 
With this integration, you will be able to consume and produce messages from Pulsar topics at scale with simple configuration settings within Apache NiFi.","\u002Fimgs\u002Fblogs\u002F63c7fac128dc51e2e11f6f3f_63b3eaa6788d5d1d011b5379_nifi-social.png",{},"\u002Fblog\u002Fcloudera-streamnative-announce-integration-apache-nifi-tm-apache-pulsar-tm",{"title":62827,"description":63047},"blog\u002Fcloudera-streamnative-announce-integration-apache-nifi-tm-apache-pulsar-tm",[302,821,28572],"dI0j37r7IEaNEr80XFsk7T2FE2AM10cWfHc-TsRnDFo",{"id":63056,"title":27750,"authors":63057,"body":63058,"category":821,"createdAt":290,"date":63261,"description":63262,"extension":8,"featured":294,"image":63263,"isDraft":294,"link":290,"meta":63264,"navigation":7,"order":296,"path":27749,"readingTime":63265,"relatedResources":290,"seo":63266,"stem":63267,"tags":63268,"__hash__":63269},"blogs\u002Fblog\u002Ffailure-is-not-an-option-it-is-a-given.md",[28],{"type":15,"value":63059,"toc":63250},[63060,63063,63066,63069,63072,63075,63078,63081,63085,63088,63094,63097,63100,63103,63107,63110,63113,63122,63129,63132,63136,63139,63142,63150,63153,63156,63160,63163,63167,63170,63173,63179,63182,63185,63189,63192,63195,63201,63204,63207,63213,63216,63218,63221,63224,63227,63229,63248],[48,63061,63062],{},"Having worked on large-scale distributed systems for over a decade, I have come to embrace failure as an unavoidable reality that comes with such systems. This mantra is best captured in the following quote by the CTO of Amazon, Werner Vogels.",[48,63064,63065],{},"“Failures are a given, and everything will eventually fail over time.”",[48,63067,63068],{},"Given the sheer number of components involved in modern distributed systems, maintaining 100% uptime is nearly impossible to achieve. Therefore, when you are building an application on top of a large-scale system like Apache Pulsar, it is important to build resilience into your architecture. A necessary precondition to building a resilient application is selecting reliable software systems to serve as foundational components of your application stack, and Pulsar certainly meets this requirement.",[48,63070,63071],{},"Developing a highly-available application requires more than just utilizing fault-tolerant services such as Apache Pulsar in your software stack. It also requires immediate failure detection and resolution including built-in failover when there are data center outages.",[48,63073,63074],{},"Up until now, Pulsar clients could only interact with a single Pulsar cluster and were unable to detect and respond to a cluster-level failure event. In the event of a complete cluster failure, these clients cannot reroute their messages to a secondary\u002Fstandby cluster automatically.",[48,63076,63077],{},"In such a scenario, any application that uses the Pulsar client is vulnerable to a prolonged outage since the clients could not establish a connection to an active cluster. Such an outage could result in data loss and missed business SLAs.",[48,63079,63080],{},"With the upcoming release of Pulsar 2.10, this much-needed automated cluster failover capability has been added to the Pulsar client libraries. 
In this blog, let’s walk through the changes you need to make inside your application code to take advantage of this new capability.",[40,63082,63084],{"id":63083},"pulsar-resiliency","Pulsar Resiliency",[48,63086,63087],{},"Apache Pulsar’s architecture incorporates several fault-tolerant features, including: component redundancy, data replication, and its connection-aware client libraries that automatically detect and recover in the event a client disconnects from one of the brokers inside the serving layer. Connection failure detection and recovery is handled entirely inside the Pulsar client itself and is completely transparent to the application.",[48,63089,63090],{},[384,63091],{"alt":63092,"src":63093},"Pulsar Resiliency illustration","\u002Fimgs\u002Fblogs\u002F63b3e989eaf1503d6173f340_fo1.png",[48,63095,63096],{},"As we discussed, failures are inevitable when working with complex distributed systems. This is why Pulsar clients work intentionally on that premise. In fact, Pulsar’s automated load balancing will periodically reassign topics to different Brokers to distribute incoming client traffic more evenly. When this happens, all of the client’s reading\u002Fwriting to the topic slated for reassignment will be automatically disconnected.",[48,63098,63099],{},"In this scenario, we are relying on the auto-recovery behavior of the clients to reconnect to the newly assigned Broker and continue processing without missing a beat. This transition from one Broker to another is transparent from an application perspective. No exception is raised that needs to be handled by the application.",[48,63101,63102],{},"Keep in mind that connection auto-recovery only works when the brokers are all part of the same cluster located inside the same environment. For example, in a failover scenario from an active cluster to a stand-by cluster operating in a different region, connection auto-recovery does not work. This was a big shortcoming in the Pulsar client library. Let’s explore why this is the case.",[40,63104,63106],{"id":63105},"continuous-availability","Continuous Availability",[48,63108,63109],{},"A common technique used to provide continuous-availability is to have separate cluster instances configured to run in an active\u002Fstandby mode. Having redundant infrastructure in different regions helps mitigate the impact of a datacenter or cloud region failure.",[48,63111,63112],{},"In such a configuration, all traffic is routed to the “active” cluster and the data is replicated to the “standby” cluster to keep them as closely in sync as possible. This ensures that the message data is available for consumers if and when you need to switch over to the “standby” cluster.",[48,63114,63115,63116,63121],{},"Additionally, you must ",[55,63117,63120],{"href":63118,"rel":63119},"https:\u002F\u002Fpulsar.apache.org\u002Fdocs\u002Fen\u002Fadministration-geo\u002F#replicated-subscriptions",[264],"replicate"," all Pulsar subscriptions to ensure that consumers resume message consumption from the exact point where they left off before the failure occurred.",[48,63123,63124,63125,63128],{},"To achieve continuous availability: If the “active” cluster fails for any reason, all active producers and consumers should be immediately redirected to the “warm” standby cluster. 
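On the client side, subscription replication is opted into per consumer. Below is a minimal sketch, assuming geo-replication and replicated subscriptions are already enabled on the brokers, with placeholder topic and subscription names and the same example broker URL used later in this post.

```java
import org.apache.pulsar.client.api.*;

public class ReplicatedSubscriptionExample {
    public static void main(String[] args) throws PulsarClientException {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar+ssl://broker.active.com:6651")    // placeholder active-cluster URL
                .build();

        // replicateSubscriptionState(true) asks the brokers to replicate this
        // subscription's position to the other cluster, so a consumer that is
        // redirected to the standby resumes close to where it left off.
        Consumer<String> consumer = client.newConsumer(Schema.STRING)
                .topic("persistent://public/default/orders")
                .subscriptionName("orders-sub")
                .replicateSubscriptionState(true)
                .subscribe();
    }
}
```

With the subscription state replicated, redirecting producers and consumers to the standby cluster lets them pick up roughly where they stopped.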
This transition should be transparent to the connected applications.\n",[384,63126],{"alt":18,"src":63127},"\u002Fimgs\u002Fblogs\u002F63b3e98a26b30bc8d159b988_fo2.png","Multi-region active\u002Fstand-by Pulsar installation, data is geo-replicated between the two instances and the clients are directed to the active cluster via the DNS record for the single static URL.\nThis configuration relies upon a regional load balancer that routes requests to a pool of Pulsar proxies inside each cluster. These proxy instances are also stateless and able to route incoming requests, based on the topic name, to the proper Broker.",[48,63130,63131],{},"To ensure continuous availability, the Pulsar clients are configured to use a single static URL to connect to the load balancer that sits in front of the Pulsar proxies. The DNS record is updated to point to the regional load balancer of the “active” cluster, which, in turn, routes all the traffic on to the Pulsar brokers in that cluster. It doesn’t matter which proxy instance is chosen, because they all perform exactly the same function and forward the traffic to the proper Broker, based on the topic name.",[40,63133,63135],{"id":63134},"current-cluster-failover","Current Cluster Failover",[48,63137,63138],{},"To redirect the clients from the “active” to the standby cluster, the DNS entry for the Pulsar endpoint that the client applications are using must be updated to point to the load balancer of the standby cluster.",[48,63140,63141],{},"In theory, the clients will be re-routed to the stand-by cluster when the DNS record has been updated. However, this approach has two shortcomings:",[321,63143,63144,63147],{},[324,63145,63146],{},"It requires your DevOps team to monitor the health of your Pulsar clusters and manually update the DNS record to point to the stand-by cluster when you have determined that the active cluster is down. This cutover is not automatic and the recovery time is determined by the response time of your DevOps team.",[324,63148,63149],{},"Even after the DNS record has been changed, both the Pulsar client and the DNS system cache the resolved IP address. Therefore, it will take some additional time before the cache entries time out and the updated DNS entry is used. This creates a further delay in the client’s recovery time.",[48,63151,63152],{},"Neither of these issues are fatal. The clients will eventually get re-routed to the stand-by cluster. However, be careful to not discount the potential delay that can occur due to one or both of these issues. In fact, members of the Pulsar community have seen delays in excess of 30 minutes!",[48,63154,63155],{},"Obviously, nothing good can come from such a prolonged outage. SLAs are going to be missed, inbound data will start backing up and potentially get dropped. Ideally, you want the cutover time to be as low as possible.",[40,63157,63159],{"id":63158},"improved-cluster-failure-new-strategies","Improved Cluster Failure: New Strategies",[48,63161,63162],{},"We’re pleased to announce that there are two new alternative strategies for avoiding the prolonged delay caused by the DNS change method for cluster failover included in the upcoming 2.10 release. 
One supports automatic failover in the event of a cluster outage, while the other enables you to control the switch-over through an HTTP endpoint.",[32,63164,63166],{"id":63165},"auto-cluster-failover-strategy","Auto Cluster Failover Strategy",[48,63168,63169],{},"The first failover strategy, AutoClusterFailover, automatically switches from the primary cluster to a stand-by cluster in the event of a cluster outage.",[48,63171,63172],{},"This behavior is controlled by a probe task that monitors the primary cluster. When it finds the primary cluster failed for more than failoverDelayMs, it will switch the client connections over to the secondary cluster. The following code snippet shows how to construct such a client.",[8325,63174,63177],{"className":63175,"code":63176,"language":8330},[8328],"Map secondaryAuth = \n    new HashMap();\n\n  secondaryAuth.put(\"other\", AuthenticationFactory.create(\n    \"org.apache.pulsar.client.impl.auth.AuthenticationTls\",\n    \"tlsCertFile:\u002Fpath\u002Fto\u002Fmy-role.cert.pem,\" +\n       \"tlsKeyFile:\u002Fpath\u002Fto\u002Fmy-role.key-pk8.pem\"));\n\n  ServiceUrlProvider failover = \n    AutoClusterFailover.builder()\n      .primary(\"pulsar+ssl:\u002F\u002Fbroker.active.com:6651\u002F\")\n      .secondary(\n        Collections.singletonList(\"pulsar+ssl:\u002F\u002Fbroker.standby.com:6651\"))\n      .failoverDelay(30, TimeUnit.SECONDS)\n      .switchBackDelay(60, TimeUnit.SECONDS)\n      .checkInterval(1000, TimeUnit.MILLISECONDS)\n      .secondaryAuthentication(secondaryAuth)\n      .build();\n\n   PulsarClient pulsarClient = \n     PulsarClient.builder()\n       .serviceUrlProvider(failover)\n       .authentication(\"org.apache.pulsar.client.impl.auth.AuthenticationTls\", \n          \"tlsCertFile:\u002Fpath\u002Fto\u002Fmy-role.cert.pem\" + \n          \"tlsKeyFile:\u002Fpath\u002Fto\u002Fmy-role.key-pk8.pem\")\n       .build();\n",[4926,63178,63176],{"__ignoreMap":18},[48,63180,63181],{},"Note that the security credentials for the secondary\u002Fstandby cluster are provided inside a java.util.Map, while the primary cluster authentication credentials are included in the original PulsarClientBuilder. In this particular case, even though the TLS certificates will work for both Pulsar clusters, we still need to provide them separately.",[48,63183,63184],{},"After switching to the secondary cluster, the AutoClusterFailover will continue to probe the primary cluster. If the primary cluster comes back and remains active for switchBackDelayMs, it will switch back to the primary cluster.",[32,63186,63188],{"id":63187},"controlled-cluster-failover-strategy","Controlled Cluster Failover Strategy",[48,63190,63191],{},"The other failover strategy, ControlledClusterFailover, supports switching from the primary cluster to a stand-by cluster in response to a signal sent from an external service. This strategy enables your administrators to trigger the cluster switch over.",[48,63193,63194],{},"The following code snippet shows how to construct such a client. 
In this particular case, the security credentials provided inside a java.util.Map are for the Pulsar client to use to authenticate with the service specified by the urlProvider property and NOT the standby Pulsar cluster.",[8325,63196,63199],{"className":63197,"code":63198,"language":8330},[8328],"Map header = new HashMap\u003C>();\n  header.put(\"clusterA\", \"\");\n\n  ServiceUrlProvider provider = \n      ControlledClusterFailover.builder()\n        .defaultServiceUrl(\"pulsar+ssl:\u002F\u002Fbroker.active.com:6651\u002F\")\n        .checkInterval(1, TimeUnit.MINUTES)\n        .urlProvider(\"http:\u002F\u002Ffailover-notification-service:8080\u002Fcheck\")\n        .urlProviderHeader(header)\n        .build();\n\n  PulsarClient pulsarClient = \n     PulsarClient.builder()\n      .serviceUrlProvider(provider)\n      .build();\n",[4926,63200,63198],{"__ignoreMap":18},[48,63202,63203],{},"This client will query the urlProvider endpoint every minute to retrieve the service URL of the Pulsar cluster with which it should be interacting.",[48,63205,63206],{},"The Pulsar client expects the call to the urlProvider endpoint to return a JSON formatted message that contains not only the stand-by cluster connection URL, but also any required authentication-related parameters. An example of such a message would be as follows:",[8325,63208,63211],{"className":63209,"code":63210,"language":8330},[8328],"{\n \"serviceUrl\": \"pulsar+ssl:\u002F\u002Fstandby:6651\",\n \"tlsTrustCertsFilePath\": \"\u002Fsecurity\u002Fca.cert.pem\",\n \"authPluginClassName\":\"org.apache.pulsar.client.impl.auth.AuthenticationTls\",\n \"authParamsString\": \" \\\"tlsCertFile\\\": \\\"\u002Fsecurity\u002Fclient.cert.pem\\\" \n    \\\"tlsKeyFile\\\": \\\"\u002Fsecurity\u002Fclient-pk8.pem\\\" \"\n}\n",[4926,63212,63210],{"__ignoreMap":18},[48,63214,63215],{},"Therefore, you will need to be aware of this format when you are writing the endpoint service you will be using to control the failover of your Pulsar clusters.",[40,63217,2125],{"id":2122},[48,63219,63220],{},"Failures are inevitable when you are continuously running any sort of software system at scale. Therefore, it is important to have a contingency plan to handle unexpected regional failures. While geo-replication of data is an important component of such a plan, it is not enough. It is equally important to have failure-aware clients that are able to detect and respond to such an outage automatically. Until now, Apache Pulsar only provided the Geo-replication mechanism.",[48,63222,63223],{},"The latest release of Apache Pulsar now provides two different types of failure-aware clients that you can use to ensure that your applications are not impacted by a regional outage. In this blog, you have learned about both of these clients, and you even have code examples to try.",[48,63225,63226],{},"Most importantly, these clients are 100% backward-compatible with your existing Pulsar Clients. This means you can replace your existing clients within your existing code base, without any problems. 
Currently, these new classes are only available for the Java client library, but they will be added to the other clients in the near future.",[40,63228,22673],{"id":22672},[321,63230,63231,63238],{},[324,63232,63233,63234],{},"PIP-121: Pulsar cluster level auto failover on client side: ",[55,63235,63236],{"href":63236,"rel":63237},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fissues\u002F13315",[264],[324,63239,63240,63243,63244],{},[2628,63241,63242],{},"Files"," PIP-121: Pulsar cluster level auto failover on client side: ",[55,63245,63246],{"href":63246,"rel":63247},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F13316\u002Ffiles",[264],[48,63249,3931],{},{"title":18,"searchDepth":19,"depth":19,"links":63251},[63252,63253,63254,63255,63259,63260],{"id":63083,"depth":19,"text":63084},{"id":63105,"depth":19,"text":63106},{"id":63134,"depth":19,"text":63135},{"id":63158,"depth":19,"text":63159,"children":63256},[63257,63258],{"id":63165,"depth":279,"text":63166},{"id":63187,"depth":279,"text":63188},{"id":2122,"depth":19,"text":2125},{"id":22672,"depth":19,"text":22673},"2022-03-07","The upcoming release of Pulsar 2.10 is adding the automated cluster failover capability to the Pulsar client libraries. This blog walks through the changes you need to make inside your application code to take advantage of this new capability.","\u002Fimgs\u002Fblogs\u002F63c7fad9d46323105475e08f_63b3e989e4af5c06a1bd821d_fo-top.png",{},"13 min read",{"title":27750,"description":63262},"blog\u002Ffailure-is-not-an-option-it-is-a-given",[302,821],"irJW8ft9SjfI0ek8L3rcEp-SPMvXygt3uh4JZGpGS14",{"id":63271,"title":57360,"authors":63272,"body":63273,"category":821,"createdAt":290,"date":63527,"description":63528,"extension":8,"featured":294,"image":63529,"isDraft":294,"link":290,"meta":63530,"navigation":7,"order":296,"path":63531,"readingTime":5505,"relatedResources":290,"seo":63532,"stem":63533,"tags":63534,"__hash__":63535},"blogs\u002Fblog\u002Fintegrating-apache-pulsar-with-bigquery.md",[62554],{"type":15,"value":63274,"toc":63520},[63275,63278,63281,63287,63294,63298,63301,63305,63311,63317,63320,63324,63327,63340,63351,63357,63362,63368,63371,63379,63385,63388,63391,63395,63398,63403,63409,63417,63420,63423,63428,63434,63437,63446,63451,63455,63458,63464,63467,63470,63475,63480,63485,63490,63495,63497,63499,63502,63504],[48,63276,63277],{},"One common data engineering task is offloading data into your company’s data lake. You may also want to transform and enrich that data during the ingestion process to prepare it for analysis. This blog post will show you how to integrate Apache Pulsar with Google BigQuery to extract meaningful insights.",[48,63279,63280],{},"Let’s assume that you have files stored in an external file system and you want to ingest the contents into your data lake. You will need to build a data pipeline to ingest, transform, and offload the data like the one in the figure below.",[48,63282,63283],{},[384,63284],{"alt":63285,"src":63286},"illustration ingesting and transforming data","\u002Fimgs\u002Fblogs\u002F63b3e8603522b272d58d9477_bq1.png",[48,63288,63289,63290,190],{},"The data pipeline leverages Apache Pulsar as the message bus and performs simple transformations on the data before storing it in a more readable format such as JSON. 
You can then offload these JSON records to your data lake and transform them into a more queryable format such as parquet using the Apache Pulsar ",[55,63291,63293],{"href":57353,"rel":63292},[264],"Cloud Storage Sink connector",[8300,63295,63297],{"id":63296},"building-the-data-pipeline","Building the Data Pipeline",[48,63299,63300],{},"Now that you have an idea of what you are trying to accomplish, let’s walk through the steps required to implement this pipeline.",[40,63302,63304],{"id":63303},"ingesting-the-data","Ingesting the Data",[48,63306,63307,63308],{},"First, you need to read the data from the file system and send it to a Pulsar topic. The code snippet below creates a producer that writes ",[24842,63309,63310],{}," messages inside the raw event topic.",[8325,63312,63315],{"className":63313,"code":63314,"language":8330},[8328],"\n\u002F\u002F 1. Load input data file.\nList events = IngestionUtils.loadEventData();\n\n\u002F\u002F 2. Instantiate Pulsar Client.\nPulsarClient pulsarClient = ClientUtils.initPulsarClient(Optional.empty());\n\n\u002F\u002F 3. Create a Pulsar Producer.\nProducer eventProducer = pulsarClient.newProducer(Schema.STRING)\n                .topic(AppConfig.RAW_EVENTS_TOPIC)\n                .producerName(\"raw-events-producer\")\n                .blockIfQueueFull(true)\n                .create();\n\n\u002F\u002F 4. Send some messages.\nfor (String event: events) {\n       eventProducer.newMessage()\n                 .value(event)\n                 .sendAsync()\n                 .whenComplete(callback);\n   }\n \n",[4926,63316,63314],{"__ignoreMap":18},[48,63318,63319],{},"The Pulsar topic includes the file contents.",[40,63321,63323],{"id":63322},"transforming-the-data","Transforming the Data",[48,63325,63326],{},"Second, you must complete the following steps to transform the data.",[321,63328,63329,63334,63337],{},[324,63330,36219,63331],{},[24842,63332,63333],{}," messages from the raw events topic.",[324,63335,63336],{},"Parse the messages as an Event object.",[324,63338,63339],{},"Write the messages into a downstream parsed events topic in JSON format.",[1666,63341,63342],{},[324,63343,63344,63345,63350],{},"Read the messages by using a ",[55,63346,63349],{"href":63347,"rel":63348},"https:\u002F\u002Fpulsar.apache.org\u002Fdocs\u002Fen\u002Ffunctions-overview\u002F",[264],"Pulsar Function"," with the following signature Function\u003CString, Event>.",[8325,63352,63355],{"className":63353,"code":63354,"language":8330},[8328],"\npublic class EventParserFunc implements Function {\n    private static Logger logger;\n\n    @Override\n    public Event process(String input, Context context) throws Exception {\n        if (logger == null) {\n            logger = context.getLogger();\n        }\n        logger.info(\"Received input: \" + input);\n        Event event = IngestionUtils.lineToEvent(input);\n        logger.info(\"Parsed event: \" + event);\n        return event;\n    }\n} \n",[4926,63356,63354],{"__ignoreMap":18},[1666,63358,63359],{},[324,63360,63361],{},"Deploy the functions using the following configuration file.",[8325,63363,63366],{"className":63364,"code":63365,"language":8330},[8328],"\nclassName: io.streamnative.functions.EventParserFunc\ntenant: public\nnamespace: default\nname: \"event_parser_func\"\ninputs:\n  - \"persistent:\u002F\u002Fpublic\u002Fdefault\u002Fraw_events\"\noutput: \"persistent:\u002F\u002Fpublic\u002Fdefault\u002Fparsed_events\"\nparallelism: 1\nlogTopic: 
\"persistent:\u002F\u002Fpublic\u002Fdefault\u002Fparsed_events_logs\"\nautoAck: true\ncleanupSubscription: true\nsubName: \"parsed_events_sub\"\n\n",[4926,63367,63365],{"__ignoreMap":18},[48,63369,63370],{},"The important part here is the className, which is the path for the parsing function, the input topic name, and the output topic name.",[1666,63372,63373,63376],{},[324,63374,63375],{},"Run the mvn clean package command to package the code and generate a jar file.",[324,63377,63378],{},"Deploy the Pulsar Function on the cluster.",[8325,63380,63383],{"className":63381,"code":63382,"language":8330},[8328],"\nbin\u002Fpulsar-admin functions create \\\n --function-config-file config\u002Fparser_func_config.yaml \\\n --jar myjars\u002Fexamples.jar\n\n",[4926,63384,63382],{"__ignoreMap":18},[48,63386,63387],{},"The –function-config-file points to the configuration file and the –jar option specifies the path for the jar file.",[48,63389,63390],{},"You have successfully deployed the Pulsar Function that will transform your messages.",[40,63392,63394],{"id":63393},"offload-the-data-to-google-cloud-storage","Offload the Data to Google Cloud Storage",[48,63396,63397],{},"The third step is to deploy the Cloud Sink Connector that will listen to the parsed events topic and store the incoming messages into Google Cloud Storage in the Avro format.",[1666,63399,63400],{},[324,63401,63402],{},"Deploy the connector by providing a configuration file like the one below.",[8325,63404,63407],{"className":63405,"code":63406,"language":8330},[8328],"\ntenant: \"public\"\nnamespace: \"default\"\nname: \"gcs-sink\"\ninputs:\n  - \"persistent:\u002F\u002Fpublic\u002Fdefault\u002Fparsed_events\"\nparallelism: 1\n\nconfigs:\n  provider: \"google-cloud-storage\"\n  gcsServiceAccountKeyFileContent: >\n    {\n      \"type\": \"service_account\",\n      \"project_id\": \"\",\n      \"private_key_id\": \"\",\n      \"private_key\": \"\",\n      \"client_email\": \"\",\n      \"client_id\": \"\",\n      \"auth_uri\": \"\",\n      \"token_uri\": \"\",\n      \"auth_provider_x509_cert_url\": \"\",\n      \"client_x509_cert_url\": \"\"\n    }\n\n  bucket: \"eventsbucket311\"\n  region: \"us-west1\"\n  endpoint: \"https:\u002F\u002Fstorage.googleapis.com\u002F\"\n  formatType: \"parquet\"\n  partitionerType: \"time\"\n  timePartitionPattern: \"yyyy-MM-dd\"\n  timePartitionDuration: \"1d\"\n  batchSize: 10000\n  batchTimeMs: 60000\n\n",[4926,63408,63406],{"__ignoreMap":18},[48,63410,63411,63412,190],{},"The configs section includes the different configurations you want to tune for setting up the connector. You can find all the available configuration options in the ",[55,63413,63416],{"href":63414,"rel":63415},"https:\u002F\u002Fgithub.com\u002Fstreamnative\u002Fpulsar-io-cloud-storage#cloud-storage-sink-connector-configuration",[264],"connector repository",[48,63418,63419],{},"Let’s walk through the example code above.",[48,63421,63422],{},"First, specify the connection credentials as part of the configuration. (You can also pass a file.) Next, specify the formatType (parquet) and the partitionerType (time) to partition based on the date. Typically in a streaming pipeline, you don’t want to produce too many small files because they will slow down your queries if the data gets too large. 
In this example use case, a new file is created every 10,000 messages.",[1666,63424,63425],{},[324,63426,63427],{},"Deploy the connector on your cluster by running the following command:",[8325,63429,63432],{"className":63430,"code":63431,"language":8330},[8328],"\nbin\u002Fpulsar-admin sink create \\\n--sink-config-file config\u002Fgcs_sink.yaml \\\n--name gcs-sink --archive connectors\u002Fpulsar-io-cloud-storage-2.8.1.30.nar\n\n",[4926,63433,63431],{"__ignoreMap":18},[48,63435,63436],{},"The –sink-config-file provides the path to the configuration file, -name specifies the connector name, and the last line specifies the .nar file location.",[48,63438,63439,63440,63445],{},"With the Pulsar function and connector up and running, you are ready to execute the producer code and generate some messages. The sample data ",[55,63441,63444],{"href":63442,"rel":63443},"https:\u002F\u002Fgithub.com\u002Fstreamnative\u002Fstreamnative-academy\u002Fblob\u002Fmaster\u002Fpulsar-gcs-bigquery\u002Fdata\u002Fevents.csv",[264],"file"," contains 49,999 lines (50,000 including the header).",[1666,63447,63448],{},[324,63449,63450],{},"Run the producer and then navigate to the Google Cloud Storage console where you can verify that you ingested new files and all of your records are accounted for.",[40,63452,63454],{"id":63453},"querying-on-google-cloud","Querying on Google Cloud",[48,63456,63457],{},"In Google Cloud Storage, you should see a new folder with the tenant name you specified when you created the topic inside Pulsar.",[48,63459,63460],{},[384,63461],{"alt":63462,"src":63463},"Querying on Google Cloud interface","\u002Fimgs\u002Fblogs\u002F63b3e8d3d3e5113791f90758_bq2.png",[48,63465,63466],{},"In the previous section, the example code uses the public tenant so your folder structure should be public -> default -> topic name -> date.",[48,63468,63469],{},"Before you can go into BigQuery and start querying your data, you’ll need to set up a dataset and then create a new table based on the parquet file you have on Google Cloud Storage.",[48,63471,63472],{},[384,63473],{"alt":63462,"src":63474},"\u002Fimgs\u002Fblogs\u002F63b3e8d3878d7a680e37e53e_bq4.png",[1666,63476,63477],{},[324,63478,63479],{},"Create a table.",[48,63481,63482],{},[384,63483],{"alt":63462,"src":63484},"\u002Fimgs\u002Fblogs\u002F63b3e8d333ad8c875974dacd_bq5.png",[1666,63486,63487],{},[324,63488,63489],{},"Verify that all your data is in place and ready for analysis jobs by running Select *.",[48,63491,63492],{},[384,63493],{"alt":63462,"src":63494},"\u002Fimgs\u002Fblogs\u002F63b3e8d327c977d0a94d5fe3_bg6.png",[48,63496,63462],{},[48,63498,3931],{},[48,63500,63501],{},"Congratulations, you have successfully integrated Apache Pulsar with BigQuery!",[40,63503,4135],{"id":4132},[321,63505,63506,63513],{},[324,63507,63508,190],{},[55,63509,63512],{"href":63510,"rel":63511},"https:\u002F\u002Fgithub.com\u002Fstreamnative\u002Fstreamnative-academy\u002Ftree\u002Fmaster\u002Fpulsar-gcs-bigquery",[264],"Source code for this tutorial",[324,63514,63515,190],{},[55,63516,63519],{"href":63517,"rel":63518},"https:\u002F\u002Fyoutu.be\u002F0Tx9B_iHKrI",[264],"Watch this tutorial",{"title":18,"searchDepth":19,"depth":19,"links":63521},[63522,63523,63524,63525,63526],{"id":63303,"depth":19,"text":63304},{"id":63322,"depth":19,"text":63323},{"id":63393,"depth":19,"text":63394},{"id":63453,"depth":19,"text":63454},{"id":4132,"depth":19,"text":4135},"2022-02-03","This blog post will show you how to integrate Apache Pulsar with Google BigQuery to extract 
meaningful insights.","\u002Fimgs\u002Fblogs\u002F63c7fae95047fb183cca4151_63b3e860d6c8d5834618bc19_bgtop.png",{},"\u002Fblog\u002Fintegrating-apache-pulsar-with-bigquery",{"title":57360,"description":63528},"blog\u002Fintegrating-apache-pulsar-with-bigquery",[28572,38442],"dO-5Ihf9YyjHoaqMm--DqE179lBSCFfWN_tQ3AkFLG8",{"id":63537,"title":63538,"authors":63539,"body":63540,"category":821,"createdAt":290,"date":63790,"description":63791,"extension":8,"featured":294,"image":63792,"isDraft":294,"link":290,"meta":63793,"navigation":7,"order":296,"path":63794,"readingTime":3556,"relatedResources":290,"seo":63795,"stem":63796,"tags":63797,"__hash__":63798},"blogs\u002Fblog\u002Fmoving-toward-zookeeper-less-apache-pulsar.md","Moving Toward a ZooKeeper-Less Apache Pulsar",[28],{"type":15,"value":63541,"toc":63784},[63542,63546,63549,63556,63563,63569,63572,63575,63578,63581,63587,63590,63593,63599,63602,63605,63608,63614,63617,63620,63623,63629,63632,63635,63641,63644,63647,63650,63656,63659,63662,63668,63671,63674,63683,63691,63697,63704,63707,63714,63716,63722,63750,63752,63782],[40,63543,63545],{"id":63544},"abstract","Abstract",[48,63547,63548],{},"Apache Pulsar™ is sometimes perceived as a complex system, due in part to its use on Apache ZooKeeper™ for metadata storage. Since its inception, Pulsar has used ZooKeeper as its distributed coordinator to store critical metadata information. This metadata information can include the broker assigned to serve a topic and the security and data retention policies for a given topic. The additional infrastructure required to run adds to the perception of Pulsar as a complex system.",[48,63550,63551,63552,63555],{},"In order to simplify Apache Pulsar deployments, we started a community initiative – Pulsar Improvement Plan (",[55,63553,41945],{"href":26433,"rel":63554},[264],") – to eliminate the ZooKeeper dependency and replace it with a pluggable framework. This pluggable framework enables you to reduce the infrastructure footprint of Apache Pulsar by leveraging alternative metadata and coordination systems based upon your deployment environment.",[48,63557,63558,63559,63562],{},"We’re pleased to announce that the ",[55,63560,41945],{"href":26433,"rel":63561},[264]," code has been committed to the main branch for early access and is expected to be included in the upcoming 2.10 release. For the first time, you can run Pulsar without ZooKeeper.",[48,63564,63565],{},[384,63566],{"alt":63567,"src":63568},"banner pulsar improvement plan","\u002Fimgs\u002Fblogs\u002F63b3e74c3522b211048d8e2a_0.png",[48,63570,63571],{},"Unlike Apache Kafka’s ZooKeeper replacement strategy, the goal of this initiative is not to internalize the distributed coordination functionality within the Apache Pulsar platform itself. Instead, it will allow users to replace ZooKeeper with an alternative technology that is appropriate for their environment.",[48,63573,63574],{},"Users now have the option of using lightweight alternatives that retain the metadata in-memory or on local disk for non-production environments. This allows developers to reclaim the computing resources previously required to run Apache ZooKeeper on their development laptop.",[48,63576,63577],{},"For production environments, Pulsar’s pluggable framework will enable them to utilize technologies that are already running inside their software stack as an alternative to ZooKeeper.",[48,63579,63580],{},"As you can imagine, this initiative consists of multiple steps, many of which have already been successfully implemented. 
I will walk you through the steps on the roadmap that have been completed thus far (Step 1-4) and outline the work that still needs to be done (Step 5-6). Please note that the features discussed in this blog are in the beta stage and are subject to change in the future.",[48,63582,63583],{},[384,63584],{"alt":63585,"src":63586},"banner metadata store api","\u002Fimgs\u002Fblogs\u002F63b3e74c02b0c7b053c6b0b5_zk1.png",[48,63588,63589],{},"PIP-45 provides a technology-agnostic interface for both metadata management and distributed coordination, thereby providing the flexibility to use systems other than ZooKeeper to fulfill these roles.",[48,63591,63592],{},"The ZooKeeper client API has historically been used throughout the Apache Pulsar codebase, so we first needed to consolidate all these accesses through a single, generic MetadataStore interface. This interface is based on the needs that Pulsar has in interacting with metadata and with the semantics offered by existing metadata stores, such as ZooKeeper and etcd.",[48,63594,63595],{},[384,63596],{"alt":63597,"src":63598},"illustration metadata storage","\u002Fimgs\u002Fblogs\u002F63b3e74c80291698809bf6ea_screen-shot-2022-01-25-at-3.14.13-pm.png",[48,63600,63601],{},"Figure 1: Replacing the direct dependency on Apache ZooKeeper with an interface permits the development of different implementations of the MetadataStore and provides the flexibility to choose the right one for your environment.",[48,63603,63604],{},"Not only does this approach decouple Pulsar from the ZooKeeper APIs, but it also creates a pluggable framework in which various implementations of these interfaces can be used interchangeably based on the deployment environment.",[48,63606,63607],{},"These new interfaces allow Pulsar users to easily swap out Apache ZooKeeper for other metadata and coordination service implementations based upon the value of the metadataURL configuration property inside the broker configuration file. The framework will automatically instantiate the correct implementation class based on the prefix of the URL. For example, a RocksDB implementation will be used if the metadataURL configuration property starts with the rocksdb:\u002F\u002F prefix.",[48,63609,63610],{},[384,63611],{"alt":63612,"src":63613},"banner step 2 create Zookeeper-based","\u002Fimgs\u002Fblogs\u002F63b3e74ca0ffa7e237997b1f_zk2.png",[48,63615,63616],{},"Once these interfaces were defined, a default implementation based on Apache ZooKeeper was created to provide a smooth transition for existing Pulsar deployments over to the new pluggable framework.",[48,63618,63619],{},"Our primary goal of this phase was to prevent any breaking changes for users with existing Pulsar deployments who want to upgrade their Pulsar software to a newer version without replacing Apache ZooKeeper. 
Therefore, we needed to ensure that the existing metadata currently stored in ZooKeeper could be kept in the same location and in the same format as before.",[48,63621,63622],{},"The ZooKeeper-based implementation allows users to continue to use Apache ZooKeeper as the metadata storage layer if they choose, and is currently the only production-quality implementation available until the etcd version is completed.",[48,63624,63625],{},[384,63626],{"alt":63627,"src":63628},"banner step 3 create RocksDB-based","\u002Fimgs\u002Fblogs\u002F63b3e74c6dc1737a785ac0fe_zk3.png",[48,63630,63631],{},"After addressing the backward compatibility concerns of these changes, the next step was to provide a non-ZooKeeper based implementation in order to demonstrate the pluggability of the framework. The easiest path for proving out the framework was a RocksDB-based implementation of the MetaDataStore that could be used in standalone mode.",[48,63633,63634],{},"Not only did this demonstrate the ability to swap in different MetaDataStore implementations, but it also significantly reduced the amount of resources required to run a completely self-contained Pulsar cluster. This has a direct impact on developers who choose to run Pulsar locally for development and testing, which is typically done inside a Docker container.",[48,63636,63637],{},[384,63638],{"alt":63639,"src":63640},"banner step 4 create memory-based implementation","\u002Fimgs\u002Fblogs\u002F63b3e74c31697767795d912b_zk4.png",[48,63642,63643],{},"Another use case that would benefit greatly from scaling down the metadata store is unit and integration testing. Rather than repeatedly incurring the cost of spinning up a ZooKeeper cluster in order to perform a suite of tests and then tearing it down, we found that an in-memory implementation of the MetaDataStore API is more suited for this scenario.",[48,63645,63646],{},"Not only are we able to reduce the amount of resources required to run the complete suite of integration tests for the Apache Pulsar project, but we are also able to reduce the time to run the tests as well.",[48,63648,63649],{},"Utilizing the in-memory implementation of the MetaDataStore API significantly reduces the build and release cycle of the Apache Pulsar project, allowing us to build, test, and release changes to the community more quickly.",[48,63651,63652],{},[384,63653],{"alt":63654,"src":63655},"banner step 5 create Etcd-based","\u002Fimgs\u002Fblogs\u002F63b3e74cc2175269dc693f02_zk5.png",[48,63657,63658],{},"Given that Apache Pulsar was designed to run in cloud environments, the most obvious replacement option for ZooKeeper is etcd, which is the consistent and highly-available key-value store used as Kubernetes' backing store for all cluster data.",[48,63660,63661],{},"In addition to its vibrant and growing community, wide-spread adoption, and performance and scalability improvements, etcd is readily available inside Kubernetes environments as part of the control plane. Since Pulsar was designed to run inside Kubernetes, most production deployments will have direct access to an etcd instance already running in their environment. 
This allows you to reap the benefits of etcd without incurring the operational costs that come with ZooKeeper.",[48,63663,63664],{},[384,63665],{"alt":63666,"src":63667},"illustration When running Apache Pulsar inside Kubernetes, you can use the existing etcd implementation to simplify your deployment","\u002Fimgs\u002Fblogs\u002F63b3e74cdc2b109bb565feff_screen-shot-2022-01-25-at-3.15.55-pm.png",[48,63669,63670],{},"Leveraging the existing etcd service running inside the Kubernetes cluster to act as the metadata store does away with the need to run ZooKeeper entirely. Not only does this reduce the infrastructure footprint of your Pulsar cluster, but it also eliminates the operational burden required to run and operate an complex distributed system",[48,63672,63673],{},"We are particularly excited about the performance improvements we anticipate from etcd, which was designed to solve many of the issues associated with Apache ZooKeeper. For starters, it was written entirely in Go, which is generally considered a much more performant programming language than ZooKeeper’s primary language Java.",[48,63675,63676,63677,63682],{},"Additionally, etcd uses the newer ",[55,63678,63681],{"href":63679,"rel":63680},"https:\u002F\u002Fraft.github.io\u002F",[264],"Raft"," consensus algorithm which is equivalent to the Paxos algorithm used by ZooKeeper in terms of fault tolerance and performance. However, it is much easier to understand and implement than the ZaB protocol used by ZooKeeper.",[48,63684,63685,63686,63690],{},"The biggest difference between etcd’s Raft implementation and Kafka’s (KRaft) is that the latter uses a pull-based model for updates, which has a slight disadvantage in terms of latency",[55,63687,42523],{"href":63688,"rel":63689},"https:\u002F\u002Fcwiki.apache.org\u002Fconfluence\u002Fdisplay\u002FKAFKA\u002FKIP-595%3A+A+Raft+Protocol+for+the+Metadata+Quorum#KIP595:ARaftProtocolfortheMetadataQuorum-Discussion:Pullv.s.PushModel",[264],". The Kafka version of the Raft algorithm is also implemented in Java, which can suffer from prolonged pauses during garbage collection. This is not an issue for etcd’s Go-based Raft implementation.",[48,63692,63693],{},[384,63694],{"alt":63695,"src":63696},"banner step 6 metadata layer","\u002Fimgs\u002Fblogs\u002F63b3e74c02b0c7a730c6b0cb_zk6.png",[48,63698,63699,63700,190],{},"Today, the biggest obstacle to scaling a Pulsar cluster is the storage capacity of the metadata layer. When using Apache ZooKeeper to store this metadata, it must be retained in-memory in order to provide reasonable latency performance. This is best characterized by the phrase ‘the disk is death to ZooKeeper”",[55,63701,46057],{"href":63702,"rel":63703},"https:\u002F\u002Fzookeeper.apache.org\u002Fdoc\u002Fr3.4.8\u002FzookeeperAdmin.html#sc_commonProblems",[264],[48,63705,63706],{},"Instead of the hierarchical tree structure used by ZooKeeper, the data in etcd is stored in a b-tree data structure, which is stored on disk and mapped to memory to support low-latency access.",[48,63708,63709,63710,190],{},"The significance of this is that it effectively increases the storage capacity of the metadata layer from memory-scale to disk-scale, allowing us to store a significantly larger amount of metadata. In the case of ZooKeeper vs. 
etcd, the increase extends from a few gigabytes of memory in Apache ZooKeeper to over 100GB of disk storage inside etcd",[55,63711,46068],{"href":63712,"rel":63713},"https:\u002F\u002Fwww.alibabacloud.com\u002Fblog\u002Ffast-stable-and-efficient-etcd-performance-after-2019-double-11_595736",[264],[40,63715,7126],{"id":1727},[48,63717,38379,63718,63721],{},[55,63719,38384],{"href":38382,"rel":63720},[264]," over the past few years, with a vibrant community that continues to drive innovation and improvements to the platform as demonstrated by the PIP-45 project.",[321,63723,63724,63735,63743],{},[324,63725,63726,63727,1154,63731,190],{},"For those of you looking to get started right away with a ZooKeeper-less Pulsar installation, you can download the latest version of Pulsar and run it in standalone mode as outlined ",[55,63728,267],{"href":63729,"rel":63730},"https:\u002F\u002Fpulsar.apache.org\u002Fdocs\u002Fen\u002Fstandalone\u002F#start-pulsar-standalone",[264],[55,63732,267],{"href":63733,"rel":63734},"https:\u002F\u002Fpulsar.apache.org\u002Fdocs\u002Fen\u002Fstandalone-docker\u002F",[264],[324,63736,63737,63738,63742],{},"Take the 2022 Apache Pulsar User Survey ",[55,63739,267],{"href":63740,"rel":63741},"https:\u002F\u002Fforms.gle\u002FKdWvc5JXJ5Jz1QSL6",[264]," and let the community know what improvements you’d like to see next. You could win one of five $50 Visa gift cards!",[324,63744,63745,63749],{},[55,63746,63748],{"href":31912,"rel":63747},[264],"Start your on-demand Pulsar training today"," with StreamNative Academy.",[32,63751,22673],{"id":22672},[321,63753,63754,63761,63772],{},[324,63755,63756],{},[55,63757,63760],{"href":63758,"rel":63759},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fissues\u002F13302",[264],"PIP-117: Change Pulsar standalone defaults",[324,63762,63763,63766,63767],{},[2628,63764,63765],{},"Dzone Article"," - ",[55,63768,63771],{"href":63769,"rel":63770},"https:\u002F\u002Fdzone.com\u002Farticles\u002Fapache-zookeeper-vs-etcd3",[264],"Apache ZooKeeper vs. 
etcd3",[324,63773,63774,63766,63777],{},[2628,63775,63776],{},"CNCF Article",[55,63778,63781],{"href":63779,"rel":63780},"https:\u002F\u002Fwww.cncf.io\u002Fblog\u002F2019\u002F05\u002F09\u002Fperformance-optimization-of-etcd-in-web-scale-data-scenario\u002F",[264],"Performance optimization of etcd in web scale data scenario",[48,63783,3931],{},{"title":18,"searchDepth":19,"depth":19,"links":63785},[63786,63787],{"id":63544,"depth":19,"text":63545},{"id":1727,"depth":19,"text":7126,"children":63788},[63789],{"id":22672,"depth":279,"text":22673},"2022-01-25","This blog discusses the community initiative to eliminate Pulsar's dependency on ZooKeeper and adding a pluggable framework for distributed coordination.","\u002Fimgs\u002Fblogs\u002F63c7faf7c443b0aa131176b0_63b3e74cba5de62890d51553_screen-shot-2022-01-25-at-2.53.38-pm.png",{},"\u002Fblog\u002Fmoving-toward-zookeeper-less-apache-pulsar",{"title":63538,"description":63791},"blog\u002Fmoving-toward-zookeeper-less-apache-pulsar",[302,821,4301],"mVyrI610qHXpM2dJ6Ljq-PxeWG6gTzRL1KRAAmjUsAo",{"id":63800,"title":63801,"authors":63802,"body":63803,"category":821,"createdAt":290,"date":64283,"description":64284,"extension":8,"featured":294,"image":64285,"isDraft":294,"link":290,"meta":64286,"navigation":7,"order":296,"path":64287,"readingTime":38438,"relatedResources":290,"seo":64288,"stem":64289,"tags":64290,"__hash__":64291},"blogs\u002Fblog\u002Fauto-scaling-pulsar-functions-kubernetes-using-custom-metrics.md","Auto-Scaling Pulsar Functions in Kubernetes Using Custom Metrics",[58855],{"type":15,"value":63804,"toc":64268},[63805,63807,63813,63819,63827,63836,63851,63854,63858,63861,63939,63941,63944,63948,63982,63986,63989,63994,64000,64003,64009,64013,64016,64019,64025,64028,64034,64037,64043,64046,64050,64053,64059,64062,64068,64071,64074,64077,64082,64091,64095,64098,64104,64107,64113,64116,64122,64125,64129,64132,64138,64141,64145,64148,64151,64157,64160,64165,64169,64189,64202,64204,64266],[40,63806,33228],{"id":33227},[48,63808,63809,63812],{},[55,63810,15627],{"href":63347,"rel":63811},[264]," are Apache Pulsar’s serverless compute framework. By default, a Pulsar Function runs as a single instance. If you want to run a function as multiple instances, you need to specify the parallelism of a function (i.e., the number of instances to run) when creating it. When you want to adjust the number of running instances, you need to collect metrics to see if the scaling is needed and then manually update the parallelism. However, this manual process is unnecessary if you run Puslar Functions in Kubernetes using Function Mesh.",[48,63814,63815,63818],{},[55,63816,29463],{"href":63817},"\u002Fblog\u002Frelease\u002F2021-05-03-function-mesh-open-source"," is a Kubernetes operator that enables you to run Pulsar Functions and connectors natively on Kubernetes, unlocking the full power of Kubernetes’ application deployment, scaling, and management. For example, Function Mesh leverages Kubernetes’ scheduling functionality, which ensures that functions are resilient to failures and can be scheduled properly at any time.",[48,63820,11159,63821,63826],{},[55,63822,63825],{"href":63823,"rel":63824},"https:\u002F\u002Fcloud.google.com\u002Fkubernetes-engine\u002Fdocs\u002Fconcepts\u002Fhorizontalpodautoscaler",[264],"Kubernetes Horizontal Pod Autoscaler (HPA)",", Function Mesh can automatically scale the number of instances required for Pulsar Functions. 
For functions with HPA configured, the HPA controller monitors the function's Pods and adds or removes Pod replicas when needed.",[48,63828,63829,63830,63835],{},"There are two approaches to auto-scaling with Function Mesh. The first approach is using the predefined auto-scaling policies provided by Function Mesh, which are based on CPU and memory use. We recommend this easy-to-implement approach if your use case only needs CPU and memory as HPA indicators. (This blog doesn’t cover this approach. You can ",[55,63831,63834],{"href":63832,"rel":63833},"https:\u002F\u002Ffunctionmesh.io\u002Fdocs\u002Fnext\u002Freleases\u002Frelease-note-0-1-7#function-mesh-provides-multiple-options-for-auto-scaling-the-number-of-pods",[264],"read the documentation"," to learn about it.)",[48,63837,63838,63839,63844,63845,63850],{},"The second approach is to customize the auto-scaling policies based on Pulsar Functions' metrics. This approach is more complex to implement, but it allows you to customize HPA according to your use case. (This feature was released with ",[55,63840,63843],{"href":63841,"rel":63842},"https:\u002F\u002Ffunctionmesh.io\u002Fdocs\u002F0.1.7\u002F",[264],"Function Mesh 0.1.7"," in June 2021.) The ",[55,63846,63849],{"href":63847,"rel":63848},"https:\u002F\u002Fpulsar.apache.org\u002Fdocs\u002Fen\u002Freference-metrics\u002F#pulsar-functions",[264],"predefined metrics"," help determine the workload and status of function instances. Pulsar Functions exposes metrics in Prometheus format, and we can make the metrics available to HPA through the Kubernetes Custom Metrics API to support metrics-based auto-scaling.",[48,63852,63853],{},"This blog shows you step-by-step how to enable auto-scaling for Pulsar Functions with custom metrics by (1) defining maxReplicas in Function Custom Resource to enable the HPA controller and (2) customizing autoScalingMetrics to specify the metrics list.",[40,63855,63857],{"id":63856},"before-you-begin","Before You Begin",[48,63859,63860],{},"Review the following notes before starting this tutorial.",[321,63862,63863,63866,63869,63872,63875,63883,63886,63889,63903,63909,63912,63915,63922,63931],{},[324,63864,63865],{},"Kubernetes v1.17 ~ v1.21",[324,63867,63868],{},"HPA v2beta2 was released in Kubernetes version v1.12.",[324,63870,63871],{},"Apache Pulsar and Prometheus metrics adapter require Kubernetes version v1.14+.",[324,63873,63874],{},"The apiextensions.k8s.io\u002Fv1beta1 API version of CustomResourceDefinition is no longer served as of v1.22, and Function Mesh has not been moved to apiextensions.k8s.io\u002Fv1 yet.",[324,63876,63877,63878,190],{},"This tutorial is based on ",[55,63879,63882],{"href":63880,"rel":63881},"https:\u002F\u002Fkubernetes.io\u002Fblog\u002F2020\u002F12\u002F08\u002Fkubernetes-1-20-release-announcement\u002F",[264],"Kubernetes v1.20",[324,63884,63885],{},"Apache Pulsar 2.8+",[324,63887,63888],{},"In order to test the function instance with actual workloads, you need a ready-to-use Apache Pulsar cluster.",[324,63890,63891,63892,63897,63898,63902],{},"This tutorial uses ",[55,63893,63896],{"href":63894,"rel":63895},"https:\u002F\u002Fgithub.com\u002Fstreamnative\u002Fcharts",[264],"Helm charts"," from StreamNative (",[55,63899,63901],{"href":63894,"rel":63900},[264],"streamnative\u002Fcharts",") to deploy Apache Pulsar to Kubernetes clusters.",[324,63904,63905],{},[55,63906,63908],{"href":29461,"rel":63907},[264],"Function Mesh v0.1.9",[324,63910,63911],{},"Prometheus (deployed with Apache Pulsar and Function 
Mesh)",[324,63913,63914],{},"This tutorial uses kube-prometheus to install the cluster-scoped prometheus and uses the Prometheus to collect, store, and query metrics.",[324,63916,63917],{},[55,63918,63921],{"href":63919,"rel":63920},"https:\u002F\u002Fgithub.com\u002Fkubernetes-sigs\u002Fprometheus-adapter",[264],"Prometheus Metrics Adapter",[324,63923,63924,63925,63930],{},"This tutorial uses the Prometheus metrics adapter to expose Prometheus metrics as Custom Metrics to the Kubernetes API server. You can use other ",[55,63926,63929],{"href":63927,"rel":63928},"https:\u002F\u002Fkubernetes.io\u002Fdocs\u002Fconcepts\u002Fextend-kubernetes\u002Fapi-extension\u002Fapiserver-aggregation\u002F",[264],"APIservices"," that provide custom-metrics APIs.",[324,63932,63933,63934,190],{},"You can deploy the Prometheus Metrics Adapter with kube-prometheus by enabling custom-metrics.libsonnet in kube-prometheus configs. To find more details, please check out ",[55,63935,63938],{"href":63936,"rel":63937},"https:\u002F\u002Fgithub.com\u002Fprometheus-operator\u002Fkube-prometheus#customizing-kube-prometheus",[264],"Customizing Kube-Prometheus",[40,63940,42912],{"id":42911},[48,63942,63943],{},"The following steps assume you are starting with a Kubernetes cluster without any service deployed yet.",[32,63945,63947],{"id":63946},"_1-install-the-prerequisites","1. Install the prerequisites",[321,63949,63950,63956,63962,63969,63976],{},[324,63951,63952],{},[55,63953,63955],{"href":42894,"rel":63954},[264],"Download and install Helm3",[324,63957,63958],{},[55,63959,44220],{"href":63960,"rel":63961},"https:\u002F\u002Fgithub.com\u002Fstreamnative\u002Fcharts\u002Fblob\u002Fmaster\u002Fcharts\u002Fpulsar\u002FREADME.md",[264],[324,63963,63964],{},[55,63965,63968],{"href":63966,"rel":63967},"https:\u002F\u002Ffunctionmesh.io\u002Fdocs\u002Fnext\u002Finstall-function-mesh#install-function-mesh-through-helm",[264],"Install Function Mesh",[324,63970,63971],{},[55,63972,63975],{"href":63973,"rel":63974},"https:\u002F\u002Fprometheus-operator.dev\u002F",[264],"Install kube-prometheus",[324,63977,63978],{},[55,63979,63981],{"href":63919,"rel":63980},[264],"Install Prometheus Metrics Adapter",[32,63983,63985],{"id":63984},"_2-create-servicemonitor-to-collect-metrics-from-pulsar-functions","2. Create ServiceMonitor to collect metrics from Pulsar Functions",[48,63987,63988],{},"Function Mesh creates a Service that binds to each Function. The ServiceMonitor from Prometheus-operator monitors the Service and collects the metrics from the Pulsar Function via the Service. 
In order to create a ServiceMonitor that monitors Pulsar Functions, create a YAML file (shown as below) and apply the file to Kubernetes by kubectl apply -f sample-pulsar-functions-service-monitor.yaml.",[321,63990,63991],{},[324,63992,63993],{},"sample-pulsar-functions-service-monitor.yaml:",[8325,63995,63998],{"className":63996,"code":63997,"language":8330},[8328],"\napiVersion: monitoring.coreos.com\u002Fv1\nkind: ServiceMonitor\nmetadata:\n  name: function-monitor\n  namespace: default\nspec:\n  endpoints:\n    - path: \u002Fmetrics\n      port: \"metrics\"\n  selector:\n    matchLabels:\n      app: function-mesh\n      component: function\n  podTargetLabels:\n    - component\n    - pulsar-component\n    - pulsar-namespace\n    - pulsar-tenant\n    - pulsar-cluster\n    - name\n    - app\n\n",[4926,63999,63997],{"__ignoreMap":18},[48,64001,64002],{},"After applying the ServiceMonitor to Kubernetes, you can check the resource with kubectl get servicemonitor.",[8325,64004,64007],{"className":64005,"code":64006,"language":8330},[8328],"\n$ kubectl get servicemonitor\nNAME               AGE\nfunction-monitor   7s\n\n",[4926,64008,64006],{"__ignoreMap":18},[32,64010,64012],{"id":64011},"_3-configure-prometheus-metrics-adapter-and-add-seriesquery-to-expose-pulsar-function-metrics-as-custom-metrics","3. Configure prometheus-metrics-adapter and add seriesQuery to expose Pulsar Function metrics as custom metrics",[48,64014,64015],{},"The default adapter configuration does not expose Pulsar Function metrics, so we need to add some custom configs to the adapter configuration file by editing the ConfigMap of the adapter.",[48,64017,64018],{},"Function Mesh creates a function's container with the name pulsar-function, and the metrics from the Pulsar Function are strats with pulsar_function_. We add the configs below to the adapter’s config and expose Pulsar Function metrics as custom metrics.",[8325,64020,64023],{"className":64021,"code":64022,"language":8330},[8328],"\n- \"seriesQuery\": \"{__name__=~\\\"^pulsar_function_.*\\\",container=\\\"pulsar-function\\\",namespace!=\\\"\\\",pod!=\\\"\\\"}\"\n  \"metricsQuery\": \"sum(>{>}) by (>)\"\n  \"resources\":\n    \"template\": \">\"\n \n",[4926,64024,64022],{"__ignoreMap":18},[48,64026,64027],{},"When you install the Prometheus metrics adapter, a ConfigMap for Prometheus metrics adapter will be created with the same name as the adapter’s deployment name. You can get the ConfigMap name with kubectl get configmap.",[8325,64029,64032],{"className":64030,"code":64031,"language":8330},[8328],"\n$ kubectl get configmap -n monitoring\nNAME                                                  DATA   AGE\nadapter-config                                        1      65m\n…\n\n",[4926,64033,64031],{"__ignoreMap":18},[48,64035,64036],{},"As shown in the example above, you need to edit the ConfigMap named adapter-config and append the seriesQuery to the config.yaml. After you run the kubectl edit command, the ConfigMap will be open with your system’s default editor, like vi or nano. 
You can complete the edit and save as usual, and the changes will automatically apply to the Kubernetes cluster.",[8325,64038,64041],{"className":64039,"code":64040,"language":8330},[8328],"\nkubectl edit configmap prometheus-adapter -o yaml\n\n",[4926,64042,64040],{"__ignoreMap":18},[48,64044,64045],{},"Note that the custom rule maps all Prometheus metrics starting with “pulsarfunction” from container “pulsar-function” to the custom metrics in Kubernetes.",[32,64047,64049],{"id":64048},"_4-deploy-a-function","4. Deploy a Function",[48,64051,64052],{},"We use a sample UserMetricFunction from Apache Pulsar and add a user defined metrics LetterCount.",[8325,64054,64057],{"className":64055,"code":64056,"language":8330},[8328],"\npublic class UserMetricFunction implements Function {\n    @Override\n    public Void process(String input, Context context) {\n        context.recordMetric(\"LetterCount\", input.length());\n        return null;\n    }\n}\n \n",[4926,64058,64056],{"__ignoreMap":18},[48,64060,64061],{},"To deploy the function to Function Mesh, create a YAML file as shown below and apply it to the Kubernetes cluster with kubectl apply.",[8325,64063,64066],{"className":64064,"code":64065,"language":8330},[8328],"\napiVersion: compute.functionmesh.io\u002Fv1alpha1\nkind: Function\nmetadata:\n  labels:\n    pulsar-cluster: pulsar\n    pulsar-component: metrics-hpa-java-fn\n    pulsar-namespace: default\n    pulsar-tenant: public\n  name: metrics-hpa-java-fn\n  namespace: default\nspec:\n  className: org.apache.pulsar.functions.api.examples.UserMetricFunction\n  cleanupSubscription: true\n  clusterName: pulsar\n  forwardSourceMessageProperty: true\n  image: streamnative\u002Fpulsar-all:2.8.1.29\n  input:\n    sourceSpecs:\n      persistent:\u002F\u002Fpublic\u002Fdefault\u002Fmetrics-hpa-java-fn-input:\n        isRegexPattern: false\n        schemaProperties: {}\n    topics:\n    - persistent:\u002F\u002Fpublic\u002Fdefault\u002Fmetrics-hpa-java-fn-input\n    typeClassName: java.lang.String\n  java:\n    extraDependenciesDir: \u002Fpulsar\u002Finstances\u002Fdeps\n    jar: \u002Fpulsar\u002Fexamples\u002Fapi-examples.jar\n  output:\n    producerConf:\n      maxPendingMessages: 0\n      maxPendingMessagesAcrossPartitions: 0\n      useThreadLocalProducers: false\n    typeClassName: java.lang.Void\n  pod:\n    labels:\n      pulsar-cluster: pulsar\n      pulsar-component: metrics-hpa-java-fn\n      pulsar-namespace: default\n      pulsar-tenant: public\n    autoScalingMetrics:\n    - type: Pods\n      pods:\n        metric: \n          name: pulsar_function_received_total_1min\n          selector:\n            matchLabels:\n              pulsar_cluster: pulsar\n              pulsar_component: metrics-hpa-java-fn\n              pulsar_namespace: default\n              pulsar_tenant: public\n        target:\n          type: AverageValue\n          averageValue: \"1\"\n  pulsar:\n    pulsarConfig: pulsar-function-mesh-config\n  replicas: 1\n  maxReplicas: 10\n  resources:\n    limits:\n      cpu: \"1\"\n      memory: \"1181116006\"\n    requests:\n      cpu: \"1\"\n      memory: \"1073741824\"\n  retainKeyOrdering: false\n  retainOrdering: false\n\n",[4926,64067,64065],{"__ignoreMap":18},[48,64069,64070],{},"The Pulsar Function instance automatically enables Prometheus collecting and uses pulsar_function_received_total_1min from autoScalingMetrics as the custom metrics. 
To enable auto-scaling, set a maxReplicas larger than 1.",[48,64072,64073],{},"After the function is deployed, you can see a StatefulSet, a Service, and a HPAv2beta2 instance all with the metrics-hpa-java-fn prefix.",[48,64075,64076],{},"The HPA then uses Pulsar Function’s metrics pulsar_function_received_total_1min and scales the function up when the average value of the metrics is larger than 1. You can customize the HPA rule in autoScalingMetrics as well.",[916,64078,64079],{},[48,64080,64081],{},"Note: We set the average value as 1 so we can observe autoscaling easily.",[48,64083,64084,64085,64090],{},"To learn more about HPA with Function Mesh, please read the ",[55,64086,64089],{"href":64087,"rel":64088},"https:\u002F\u002Ffunctionmesh.io\u002Fdocs\u002Fscaling",[264],"Scaling"," section of the Function Mesh documentation.",[32,64092,64094],{"id":64093},"_5-validate-the-metrics","5. Validate the metrics",[48,64096,64097],{},"After the function is ready and running, Prometheus starts collecting metrics from the function’s Pod, and the custom metrics API should show up in discovery. You can then try fetching the discovery information for it:",[8325,64099,64102],{"className":64100,"code":64101,"language":8330},[8328],"\n$ kubectl get --raw \u002Fapis\u002Fcustom.metrics.k8s.io\u002Fv1beta1\n\n",[4926,64103,64101],{"__ignoreMap":18},[48,64105,64106],{},"Because we have set up Prometheus to collect Pulsar Functions' metrics, you should see a pods\u002Fpulsar_function_received_total_1min resource show up, and you can then use the kubectl command below to query the Custom Metrics from the Kubernetes API.",[8325,64108,64111],{"className":64109,"code":64110,"language":8330},[8328],"\n$ kubectl get --raw \u002Fapis\u002Fcustom.metrics.k8s.io\u002Fv1beta1\u002Fnamespaces\u002Fdefault\u002Fpods\u002F*\u002Fpulsar_function_received_total_1min | jq --color-output\n{\n  \"kind\": \"MetricValueList\",\n  \"apiVersion\": \"custom.metrics.k8s.io\u002Fv1beta1\",\n  \"metadata\": {\n    \"selfLink\": \"\u002Fapis\u002Fcustom.metrics.k8s.io\u002Fv1beta1\u002Fnamespaces\u002Fdefault\u002Fpods\u002F%2A\u002Fpulsar_function_received_total_1min\"\n  },\n  \"items\": [\n    {\n      \"describedObject\": {\n        \"kind\": \"Pod\",\n        \"namespace\": \"default\",\n        \"name\": \"metrics-hpa-java-fn-function-0\",\n        \"apiVersion\": \"\u002Fv1\"\n      },\n      \"metricName\": \"pulsar_function_received_total_1min\",\n      \"timestamp\": \"2022-01-06T01:16:12Z\",\n      \"value\": \"0\",\n      \"selector\": null\n    }\n  ]\n}\n\n",[4926,64112,64110],{"__ignoreMap":18},[48,64114,64115],{},"When you can obtain the metrics from the above command from the custom metrics API, the HPA will be ready and you can observe the related metrics.",[8325,64117,64120],{"className":64118,"code":64119,"language":8330},[8328],"\n$ kubectl get hpa\nNAME                           REFERENCE                      TARGETS   MINPODS   MAXPODS   REPLICAS   AGE\nmetrics-hpa-java-fn-function   Function\u002Fmetrics-hpa-java-fn   0\u002F1       1         10        1          23h\n\n$ kubectl describe hpa metrics-hpa-java-fn-function\nName:                                             metrics-hpa-java-fn-function\nNamespace:                                        default\nLabels:                                           app=function-mesh\n                                                  component=function\n                                                  name=metrics-hpa-java-fn\n                                        
          namespace=default\nAnnotations:                                      \nCreationTimestamp:                                Wed, 05 Jan 2022 10:15:07 +0800\nReference:                                        Function\u002Fmetrics-hpa-java-fn\nMetrics:                                          ( current \u002F target )\n  \"pulsar_function_received_total_1min\" on pods:  0 \u002F 1\nMin replicas:                                     1\nMax replicas:                                     10\nFunction pods:                                    1 current \u002F 1 desired\nConditions:\n  Type            Status  Reason            Message\n  ----            ------  ------            -------\n  AbleToScale     True    ReadyForNewScale  recommended size matches current size\n  ScalingActive   True    ValidMetricFound  the HPA was able to successfully calculate a replica count from pods metric pulsar_function_received_total_1min\n  ScalingLimited  True    TooFewReplicas    the desired replica count is less than the minimum replica count\n \n",[4926,64121,64119],{"__ignoreMap":18},[48,64123,64124],{},"From the kubectl describe we can see the condition of the HPA is AbleToScale and ScalingActive, which means the HP is ready for you to use.",[32,64126,64128],{"id":64127},"_6-generate-some-load-to-function","6. Generate some load to function",[48,64130,64131],{},"We can create a sample producer that generates a large number of messages to the function’s input topic. Below is a sample producer.",[8325,64133,64136],{"className":64134,"code":64135,"language":8330},[8328],"\npublic class LoadProducer {\n    public static void main(String[] args) throws PulsarClientException {\n        PulsarClient client = PulsarClient.builder()\n                .serviceUrl(\"http:\u002F\u002Flocalhost:8080\")\n                .build();\n\n        Producer producer = client.newProducer()\n                .topic(\"persistent:\u002F\u002Fpublic\u002Fdefault\u002Fmetrics-hpa-java-fn-input\")\n                .create();\n\n        for(int i = 0; i \n",[4926,64137,64135],{"__ignoreMap":18},[48,64139,64140],{},"While the producer is running, we can move to the next step to verify the HPA status.",[32,64142,64144],{"id":64143},"_7-monitoring-the-auto-scaling","7. Monitoring the auto-scaling",[48,64146,64147],{},"With the messages coming to the input topic, we should see at least 2 or 3 new Pods being created and running to process the backlog messages.",[48,64149,64150],{},"Run kubectl get pods to verify if there are multiple Pods with prefix \"metrics-hpa-java-fn” in the name, as shown in Fig.1. To gain insights into the HPA, you can use kubectl describe hpa to get a more detailed output showing why replicas have been added or removed.",[48,64152,64153],{},[384,64154],{"alt":64155,"src":64156},"image of custom code","\u002Fimgs\u002Fblogs\u002F63bf33fb4a3af0c6e4118dbb_a.png",[48,64158,64159],{},"You can get the HPA name with kubectl get hpa and assuming the created name is “metrics-hpa-java-fn-hpa”, you can then observe the HPA with the following watch command, as shown in Fig. 2. watch -n 1 \"kubectl describe hpa metrics-hpa-java-fn-hpa\"",[48,64161,64162],{},[384,64163],{"alt":64155,"src":64164},"\u002Fimgs\u002Fblogs\u002F63bf33fb531fe183a5d61f4a_3c.png",[40,64166,64168],{"id":64167},"future-work-auto-scale-to-from-zero","Future Work : Auto-Scale to \u002F from Zero",[48,64170,64171,64172,64177,64178,4003,64183,64188],{},"We’d like to bring a scale-to-zero feature to Function Mesh soon. 
With this feature enabled, if a function’s input topic has no backlog, Function Mesh would scale the function down to zero replicas to reduce the cost. However, the current Kubernetes stable release (v1.19) does not support scale-to-zero in HPA by default. You can only use scale-to-zero as an alpha feature after enabling ",[55,64173,64176],{"href":64174,"rel":64175},"https:\u002F\u002Fkubernetes.io\u002Fdocs\u002Freference\u002Fcommand-line-tools-reference\u002Ffeature-gates\u002F",[264],"Kubernetes Feature Gates HPAScaleToZero",". The Kubernetes community is actively working on a stable version of scale-to-zero (see ",[55,64179,64182],{"href":64180,"rel":64181},"https:\u002F\u002Fgithub.com\u002Fkubernetes\u002Fenhancements\u002Fissues\u002F2021",[264],"issue #2021",[55,64184,64187],{"href":64185,"rel":64186},"https:\u002F\u002Fgithub.com\u002Fkubernetes\u002Fenhancements\u002Fpull\u002F2022",[264],"PR #2022",") and we would like to see this enhancement soon.",[48,64190,64191,64192,64195,64196,64201],{},"Meanwhile, we will be exploring how to bring scale-to-zero to Function Mesh , possibly with the help of third-party tools, such as ",[55,64193,42675],{"href":42673,"rel":64194},[264],", and by implementing an idler, like the ",[55,64197,64200],{"href":64198,"rel":64199},"https:\u002F\u002Fgithub.com\u002Fopenshift\u002Fservice-idler",[264],"service-idler"," from openshift. We will also try to minimize the extra resources required to enable the feature.",[40,64203,22673],{"id":22672},[321,64205,64206,64214,64222,64231,64240,64248,64257],{},[324,64207,64208,758,64211],{},[2628,64209,64210],{},"Docs",[55,64212,29463],{"href":29461,"rel":64213},[264],[324,64215,64216,64218,64219],{},[2628,64217,40436],{}," Introducing Function Mesh - ",[55,64220,64221],{"href":44957},"Simplify Complex Streaming Jobs in Cloud",[324,64223,64224,64226,64227],{},[2628,64225,64210],{}," Kubernetes - ",[55,64228,64230],{"href":34630,"rel":64229},[264],"Horizontal Pod Autoscaling",[324,64232,64233,64226,64235],{},[2628,64234,64210],{},[55,64236,64239],{"href":64237,"rel":64238},"https:\u002F\u002Fgithub.com\u002Fkubernetes\u002Fmetrics",[264],"Metrics API",[324,64241,64242,758,64244],{},[2628,64243,64210],{},[55,64245,64247],{"href":63973,"rel":64246},[264],"Prometheus Operator",[324,64249,64250,758,64252],{},[2628,64251,64210],{},[55,64253,64256],{"href":64254,"rel":64255},"https:\u002F\u002Fgithub.com\u002Fkubernetes-sigs\u002Fprometheus-adapter\u002Fblob\u002Fmaster\u002Fdocs\u002Fwalkthrough.md",[264],"Prometheus Adapter end-to-end walkthrough",[324,64258,64259,758,64261],{},[2628,64260,64210],{},[55,64262,64265],{"href":64263,"rel":64264},"https:\u002F\u002Fgithub.com\u002Fkubernetes-sigs\u002Fprometheus-adapter\u002Fblob\u002Fmaster\u002Fdocs\u002Fconfig-walkthrough.md",[264],"Prometheus Adapter configuration walkthrough",[48,64267,3931],{},{"title":18,"searchDepth":19,"depth":19,"links":64269},[64270,64271,64272,64281,64282],{"id":33227,"depth":19,"text":33228},{"id":63856,"depth":19,"text":63857},{"id":42911,"depth":19,"text":42912,"children":64273},[64274,64275,64276,64277,64278,64279,64280],{"id":63946,"depth":279,"text":63947},{"id":63984,"depth":279,"text":63985},{"id":64011,"depth":279,"text":64012},{"id":64048,"depth":279,"text":64049},{"id":64093,"depth":279,"text":64094},{"id":64127,"depth":279,"text":64128},{"id":64143,"depth":279,"text":64144},{"id":64167,"depth":19,"text":64168},{"id":22672,"depth":19,"text":22673},"2022-01-19","This blog shows you step-by-step how to enable auto-scaling for 
Pulsar Functions with custom metrics in Kubernetes.","\u002Fimgs\u002Fblogs\u002F63c7fb09bc45dd26c48c6156_63bf321e8e20fda65e2b99dc_top.png",{},"\u002Fblog\u002Fauto-scaling-pulsar-functions-kubernetes-using-custom-metrics",{"title":63801,"description":64284},"blog\u002Fauto-scaling-pulsar-functions-kubernetes-using-custom-metrics",[9636,821,4839,16985,26747],"jMKjeEp9-NGW394g3kBaqNEQ0bMRw_WTQl0AaCJOr7g",{"id":64293,"title":58893,"authors":64294,"body":64295,"category":821,"createdAt":290,"date":64903,"description":64904,"extension":8,"featured":294,"image":64905,"isDraft":294,"link":290,"meta":64906,"navigation":7,"order":296,"path":64907,"readingTime":62820,"relatedResources":290,"seo":64908,"stem":64909,"tags":64910,"__hash__":64911},"blogs\u002Fblog\u002Fpulsar-isolation-part-iii-separate-pulsar-clusters-sharing-single-bookkeeper-cluster.md",[58855],{"type":15,"value":64296,"toc":64865},[64297,64305,64321,64324,64335,64339,64342,64347,64352,64354,64360,64362,64366,64370,64376,64380,64386,64389,64431,64436,64440,64446,64450,64456,64460,64466,64470,64476,64480,64486,64490,64493,64497,64503,64507,64513,64517,64523,64527,64533,64537,64540,64544,64550,64554,64560,64564,64570,64574,64580,64584,64590,64593,64599,64602,64608,64612,64616,64618,64621,64625,64631,64635,64641,64645,64651,64655,64661,64664,64670,64672,64675,64677,64681,64687,64691,64697,64701,64707,64711,64716,64720,64726,64730,64736,64739,64741,64749,64753,64756,64759,64765,64769,64775,64779,64785,64789,64795,64799,64805,64809,64815,64819,64825,64827,64830,64833,64863],[48,64298,64299,64300,64304],{},"This is the third blog in our 4-part blog series on achieving resource isolation in Apache Pulsar. ",[55,64301,64303],{"href":64302},"\u002Fen\u002Fblog\u002Ftech\u002F2021-03-02-taking-an-in-depth-look-at-how-to-achieve-isolation-in-pulsar","The first blog"," gave an overview of the three approaches to implement isolation in Pulsar:",[1666,64306,64307,64315,64318],{},[324,64308,64309,64310,64314],{},"Separate Pulsar clusters that use separate BookKeeper clusters: This shared-nothing approach offers the highest level of isolation and is suitable for storing highly sensitive data, such as personally identifiable information or financial records. ",[55,64311,64313],{"href":64312},"\u002Fblog\u002Ftech\u002F2021-06-03-pulsar-isolation-for-dummies-separate-pulsar-clusters","Our second blog"," in this series provides a step-by-step tutorial for this approach.",[324,64316,64317],{},"Separate Pulsar clusters that share one BookKeeper cluster: This approach utilizes separate Pulsar broker clusters in order to isolate the end-users from one another and allows you to use different authentication methods based on the use case. However, you gain the benefits of using a shared storage layer, such as a reduced hardware footprint and the associated hardware and maintenance costs.",[324,64319,64320],{},"A single Pulsar cluster and a single BookKeeper cluster: This is the more traditional approach that takes advantage of Pulsar’s built-in multi-tenancy features.",[48,64322,64323],{},"In this blog, we show you how to implement the single, shared BookKeeper approach with an example. 
We will deploy two Pulsar clusters that share one BookKeeper cluster following the steps below:",[1666,64325,64326,64329,64332],{},[324,64327,64328],{},"Deploy two Pulsar clusters that share one BookKeeper cluster",[324,64330,64331],{},"Verify data isolation between the Pulsar clusters",[324,64333,64334],{},"Scale up and down bookies",[40,64336,64338],{"id":64337},"set-up-the-shared-bookkeeper-cluster","Set up the Shared BookKeeper Cluster",[48,64340,64341],{},"First, we set up the shared BookKeeper cluster on a computer that has an 8-core CPU and 16GB memory. Figure 1 and 2 show you the BookKeeper cluster.",[916,64343,64344],{},[48,64345,64346],{},"All metadata services (ZooKeeper services) are single nodes. We don’t discuss this in detail in this blog.",[48,64348,64349],{},[384,64350],{"alt":18,"src":64351},"\u002Fimgs\u002Fblogs\u002F63be7597d526b7b3facbda87_screen-shot-2022-01-12-at-3.27.46-pm.png",[48,64353,3931],{},[48,64355,64356],{},[384,64357],{"alt":64358,"src":64359},"Figure 2: Inside the shared BookKeeper cluster, each cluster will have its own affinity group of bookies. These bookie groups ensure that each cluster’s respective data remains isolated from one another.","\u002Fimgs\u002Fblogs\u002F63be759748f0a94c5b79a03c_screen-shot-2022-01-12-at-3.28.14-pm.png",[48,64361,3931],{},[40,64363,64365],{"id":64364},"deploy-clusters","Deploy Clusters",[32,64367,64369],{"id":64368},"_1-download-the-latest-binary-pulsar-package-currently-this-would-be-the-281-package","1. Download the latest binary Pulsar package. Currently, this would be the 2.8.1 package.",[48,64371,64372],{},[55,64373,64374],{"href":64374,"rel":64375},"https:\u002F\u002Fwww.apache.org\u002Fdyn\u002Fmirrors\u002Fmirrors.cgi?action=download&filename=pulsar\u002Fpulsar-2.8.1\u002Fapache-pulsar-2.8.1-bin.tar.gz",[264],[32,64377,64379],{"id":64378},"_2-unzip-the-binary-compression-package","2. Unzip the binary compression package.",[8325,64381,64384],{"className":64382,"code":64383,"language":8330},[8328],"tar -zxvf apache-pulsar-2.8.1-bin.tar.gz\n\n### 3. Prepare the following cluster directories. 
Change the configuration of each directory as instructed in the table below.\nUse the current directory as PULSAR_HOME and create the following topology of directories.\n\ncp -r apache-pulsar-2.8.1 configuration-store2\nmkdir -p bk-cluster\ncp -r apache-pulsar-2.8.1 bk-cluster\u002Fbk1\ncp -r apache-pulsar-2.8.1 bk-cluster\u002Fbk2\ncp -r apache-pulsar-2.8.1 bk-cluster\u002Fbk3\ncp -r apache-pulsar-2.8.1 bk-cluster\u002Fbk4\nmkdir -p cluster1\ncp -r apache-pulsar-2.8.1 cluster1\u002Fzk1\ncp -r apache-pulsar-2.8.1 cluster1\u002Fbroker1\nmkdir -p cluster2\ncp -r apache-pulsar-2.8.1 cluster2\u002Fzk1\ncp -r apache-pulsar-2.8.1 cluster2\u002Fbroker1\n",[4926,64385,64383],{"__ignoreMap":18},[48,64387,64388],{},"The directories’ topology is outlined below.",[321,64390,64391,64394,64397,64400,64403,64406,64409,64412,64415,64418,64421,64424,64427,64429],{},[324,64392,64393],{},"PULSAR_HOME",[324,64395,64396],{},"~configuration-store",[324,64398,64399],{},"~bk-cluster",[324,64401,64402],{},"~~bk1",[324,64404,64405],{},"~~bk2",[324,64407,64408],{},"~~bk3",[324,64410,64411],{},"~~bk4",[324,64413,64414],{},"~~bk5",[324,64416,64417],{},"~cluster1",[324,64419,64420],{},"~~zk1",[324,64422,64423],{},"~~broker1",[324,64425,64426],{},"~cluster2",[324,64428,64420],{},[324,64430,64423],{},[48,64432,64433],{},[384,64434],{"alt":21101,"src":64435},"\u002Fimgs\u002Fblogs\u002F66ed2a3ad4f7dff2e64490a3_63be7dd03aae132da9611462_Screenshot-2023-01-11-at-10.13.13.png",[32,64437,64439],{"id":64438},"_4-start-and-initialize-the-configuration-store-and-the-metadata-store","4. Start and initialize the configuration store and the metadata store.",[8325,64441,64444],{"className":64442,"code":64443,"language":8330},[8328],"$PULSAR_HOME\u002Fconfiguration-store\u002Fbin\u002Fpulsar-daemon start configuration-store\n$PULSAR_HOME\u002Fcluster1\u002Fzk1\u002Fbin\u002Fpulsar-daemon start zookeeper\n$PULSAR_HOME\u002Fcluster2\u002Fzk1\u002Fbin\u002Fpulsar-daemon start zookeeper\n\n$PULSAR_HOME\u002Fconfiguration-store\u002Fbin\u002Fpulsar initialize-cluster-metadata \\\n--cluster cluster1 \\\n--zookeeper localhost:2182 \\\n--configuration-store localhost:2181 \\\n--web-service-url http:\u002F\u002Flocalhost:8080\u002F \\\n--broker-service-url pulsar:\u002F\u002Flocalhost:6650\u002F\n\n.\u002Fconfiguration-store\u002Fbin\u002Fpulsar initialize-cluster-metadata \\\n--cluster cluster2 \\\n--zookeeper localhost:2183 \\\n--configuration-store localhost:2181 \\\n--web-service-url http:\u002F\u002Flocalhost:8081\u002F \\\n--broker-service-url pulsar:\u002F\u002Flocalhost:6651\u002F\n",[4926,64445,64443],{"__ignoreMap":18},[32,64447,64449],{"id":64448},"_5-initialize-the-bookkeeper-metadata-and-start-the-bookie-cluster","5. Initialize the BookKeeper metadata and start the bookie cluster.",[8325,64451,64454],{"className":64452,"code":64453,"language":8330},[8328],"$PULSAR_HOME\u002Fbk-cluster\u002Fbk1\u002Fbin\u002Fbookkeeper shell metaformat\n\n$PULSAR_HOME\u002Fbk-cluster\u002Fbk1\u002Fbin\u002Fpulsar-daemon start bookie\n$PULSAR_HOME\u002Fbk-cluster\u002Fbk2\u002Fbin\u002Fpulsar-daemon start bookie\n$PULSAR_HOME\u002Fbk-cluster\u002Fbk3\u002Fbin\u002Fpulsar-daemon start bookie\n$PULSAR_HOME\u002Fbk-cluster\u002Fbk4\u002Fbin\u002Fpulsar-daemon start bookie\n",[4926,64455,64453],{"__ignoreMap":18},[32,64457,64459],{"id":64458},"_6-start-brokers-in-cluster1-and-cluster2","6. 
Start brokers in cluster1 and cluster2.",[8325,64461,64464],{"className":64462,"code":64463,"language":8330},[8328],"$PULSAR_HOME\u002Fcluster1\u002Fbroker1\u002Fbin\u002Fpulsar-daemon start broker\n$PULSAR_HOME\u002Fcluster2\u002Fbroker1\u002Fbin\u002Fpulsar-daemon start broker\n",[4926,64465,64463],{"__ignoreMap":18},[32,64467,64469],{"id":64468},"_7-check-brokers","7. Check brokers.",[8325,64471,64474],{"className":64472,"code":64473,"language":8330},[8328],"$PULSAR_HOME\u002Fcluster1\u002Fbroker1\u002Fbin\u002Fpulsar-admin --admin-url http:\u002F\u002Flocalhost:8080 brokers list cluster1\n\"localhost:8080\"\n$PULSAR_HOME\u002Fcluster1\u002Fbroker1\u002Fbin\u002Fpulsar-admin --admin-url http:\u002F\u002Flocalhost:8081 brokers list cluster2\n\"localhost:8081\"\n",[4926,64475,64473],{"__ignoreMap":18},[32,64477,64479],{"id":64478},"_8-check-the-bookie-list-for-cluster1-and-cluster2-as-shown-below-they-share-the-bookie-cluster","8. Check the bookie list for cluster1 and cluster2. As shown below, they share the bookie cluster.",[8325,64481,64484],{"className":64482,"code":64483,"language":8330},[8328],"$PULSAR_HOME\u002Fcluster1\u002Fbroker1\u002Fbin\u002Fpulsar-admin --admin-url http:\u002F\u002Flocalhost:8080 bookies list-bookies\n{\n  \"bookies\" : [ {\n    \"bookieId\" : \"127.0.0.1:3181\"\n  }, {\n    \"bookieId\" : \"127.0.0.1:3182\"\n  }, {\n    \"bookieId\" : \"127.0.0.1:3183\"\n  }, {\n    \"bookieId\" : \"127.0.0.1:3184\"\n  } ]\n}\n$PULSAR_HOME\u002Fcluster1\u002Fbroker1\u002Fbin\u002Fpulsar-admin --admin-url http:\u002F\u002Flocalhost:8081 bookies list-bookies\n{\n  \"bookies\" : [ {\n    \"bookieId\" : \"127.0.0.1:3181\"\n  }, {\n    \"bookieId\" : \"127.0.0.1:3182\"\n  }, {\n    \"bookieId\" : \"127.0.0.1:3183\"\n  }, {\n    \"bookieId\" : \"127.0.0.1:3184\"\n  } ]\n}\n",[4926,64485,64483],{"__ignoreMap":18},[40,64487,64489],{"id":64488},"bookie-rack-placement","Bookie Rack Placement",[48,64491,64492],{},"In order to archive resource isolation, we need to split the 4 bookie nodes into 2 resource groups.",[32,64494,64496],{"id":64495},"_1-set-the-bookie-rack-for-cluster1","1. Set the bookie rack for cluster1.",[8325,64498,64501],{"className":64499,"code":64500,"language":8330},[8328],"$PULSAR_HOME\u002Fcluster1\u002Fbroker1\u002Fbin\u002Fpulsar-admin --admin-url http:\u002F\u002Flocalhost:8080 bookies set-bookie-rack \\\n--bookie 127.0.0.1:3181 \\\n--hostname 127.0.0.1:3181 \\\n--group group-bookie1 \\\n--rack rack1\n\n$PULSAR_HOME\u002Fcluster1\u002Fbroker1\u002Fbin\u002Fpulsar-admin --admin-url http:\u002F\u002Flocalhost:8080 bookies set-bookie-rack \\\n--bookie 127.0.0.1:3182 \\\n--hostname 127.0.0.1:3182 \\\n--group group-bookie1 \\\n--rack rack1\n\n$PULSAR_HOME\u002Fcluster1\u002Fbroker1\u002Fbin\u002Fpulsar-admin --admin-url http:\u002F\u002Flocalhost:8080 bookies set-bookie-rack \\\n--bookie 127.0.0.1:3183 \\\n--hostname 127.0.0.1:3183 \\\n--group group-bookie2 \\\n--rack rack2\n\n$PULSAR_HOME\u002Fcluster1\u002Fbroker1\u002Fbin\u002Fpulsar-admin --admin-url http:\u002F\u002Flocalhost:8080 bookies set-bookie-rack \\\n--bookie 127.0.0.1:3184 \\\n--hostname 127.0.0.1:3184 \\\n--group group-bookie2 \\\n--rack rack2\n",[4926,64502,64500],{"__ignoreMap":18},[32,64504,64506],{"id":64505},"_2-check-bookie-racks-placement-in-cluster1","2. 
Check bookie racks placement in cluster1.",[8325,64508,64511],{"className":64509,"code":64510,"language":8330},[8328],"$PULSAR_HOME\u002Fcluster1\u002Fbroker1\u002Fbin\u002Fpulsar-admin --admin-url http:\u002F\u002Flocalhost:8080 bookies racks-placement\n\"group-bookie1    {127.0.0.1:3181=BookieInfoImpl(rack=rack1, hostname=127.0.0.1:3181), 127.0.0.1:3182=BookieInfoImpl(rack=rack1, hostname=127.0.0.1:3182)}\"\n\"group-bookie2    {127.0.0.1:3183=BookieInfoImpl(rack=rack2, hostname=127.0.0.1:3183), 127.0.0.1:3184=BookieInfoImpl(rack=rack2, hostname=127.0.0.1:3184)}\"\n",[4926,64512,64510],{"__ignoreMap":18},[32,64514,64516],{"id":64515},"_3-set-bookie-racks-for-cluster2","3. Set bookie racks for cluster2.",[8325,64518,64521],{"className":64519,"code":64520,"language":8330},[8328],"$PULSAR_HOME\u002Fcluster1\u002Fbroker1\u002Fbin\u002Fpulsar-admin --admin-url http:\u002F\u002Flocalhost:8081 bookies set-bookie-rack \\\n--bookie 127.0.0.1:3181 \\\n--hostname 127.0.0.1:3181 \\\n--group group-bookie1 \\\n--rack rack1\n\n$PULSAR_HOME\u002Fcluster1\u002Fbroker1\u002Fbin\u002Fpulsar-admin --admin-url http:\u002F\u002Flocalhost:8081 bookies set-bookie-rack \\\n--bookie 127.0.0.1:3182 \\\n--hostname 127.0.0.1:3182 \\\n--group group-bookie1 \\\n--rack rack1\n\n$PULSAR_HOME\u002Fcluster1\u002Fbroker1\u002Fbin\u002Fpulsar-admin --admin-url http:\u002F\u002Flocalhost:8081 bookies set-bookie-rack \\\n--bookie 127.0.0.1:3183 \\\n--hostname 127.0.0.1:3183 \\\n--group group-bookie2 \\\n--rack rack2\n\n$PULSAR_HOME\u002Fcluster1\u002Fbroker1\u002Fbin\u002Fpulsar-admin --admin-url http:\u002F\u002Flocalhost:8081 bookies set-bookie-rack \\\n--bookie 127.0.0.1:3184 \\\n--hostname 127.0.0.1:3184 \\\n--group group-bookie2 \\\n--rack rack2\n",[4926,64522,64520],{"__ignoreMap":18},[32,64524,64526],{"id":64525},"_4-check-bookie-racks-placement-in-cluster2","4. Check bookie racks placement in cluster2.",[8325,64528,64531],{"className":64529,"code":64530,"language":8330},[8328],"$PULSAR_HOME\u002Fcluster1\u002Fbroker1\u002Fbin\u002Fpulsar-admin --admin-url http:\u002F\u002Flocalhost:8081 bookies racks-placement\n\"group-bookie1    {127.0.0.1:3181=BookieInfoImpl(rack=rack1, hostname=127.0.0.1:3181), 127.0.0.1:3182=BookieInfoImpl(rack=rack1, hostname=127.0.0.1:3182)}\"\n\"group-bookie2    {127.0.0.1:3183=BookieInfoImpl(rack=rack2, hostname=127.0.0.1:3183), 127.0.0.1:3184=BookieInfoImpl(rack=rack2, hostname=127.0.0.1:3184)}\"\n",[4926,64532,64530],{"__ignoreMap":18},[40,64534,64536],{"id":64535},"verify-isolation-namespace-by-bookie-affinity-group","Verify Isolation Namespace by Bookie Affinity Group",[48,64538,64539],{},"Now that we have everything configured, let’s verify namespace isolation by the bookie affinity group setting.",[32,64541,64543],{"id":64542},"_1-create-a-namespace-in-cluster1","1. Create a namespace in cluster1.",[8325,64545,64548],{"className":64546,"code":64547,"language":8330},[8328],"$PULSAR_HOME\u002Fcluster1\u002Fbroker1\u002Fbin\u002Fpulsar-admin --admin-url http:\u002F\u002Flocalhost:8080 namespaces create -b 30 -c cluster1 public\u002Fc1-ns1\n",[4926,64549,64547],{"__ignoreMap":18},[32,64551,64553],{"id":64552},"_2-set-a-bookie-affinity-group-for-the-namespace","2. 
Set a bookie affinity group for the namespace.",[8325,64555,64558],{"className":64556,"code":64557,"language":8330},[8328],"$PULSAR_HOME\u002Fcluster1\u002Fbroker1\u002Fbin\u002Fpulsar-admin --admin-url http:\u002F\u002Flocalhost:8080 namespaces set-bookie-affinity-group public\u002Fc1-ns1 \\\n--primary-group group-bookie1\n",[4926,64559,64557],{"__ignoreMap":18},[32,64561,64563],{"id":64562},"_3-check-the-bookie-affinity-group-of-the-namespace","3. Check the bookie affinity group of the namespace.",[8325,64565,64568],{"className":64566,"code":64567,"language":8330},[8328],"$PULSAR_HOME\u002Fcluster1\u002Fbroker1\u002Fbin\u002Fpulsar-admin --admin-url http:\u002F\u002Flocalhost:8080 namespaces get-bookie-affinity-group public\u002Fc1-ns1\n",[4926,64569,64567],{"__ignoreMap":18},[32,64571,64573],{"id":64572},"_4-produce-some-messages-to-a-topic-of-the-namespace-publicc1-ns1","4. Produce some messages to a topic of the namespace public\u002Fc1-ns1.",[8325,64575,64578],{"className":64576,"code":64577,"language":8330},[8328],"# set retention for namespace `public\u002Fc1-ns1` to avoid messages were deleted automatically\ncluster1\u002Fbroker1\u002Fbin\u002Fpulsar-admin --admin-url http:\u002F\u002Flocalhost:8080 namespaces set-retention -s 1g -t 3d public\u002Fc1-ns1\n$PULSAR_HOME\u002Fcluster1\u002Fbroker1\u002Fbin\u002Fpulsar-client --url pulsar:\u002F\u002Flocalhost:6650 produce -m 'hello' -n 300 public\u002Fc1-ns1\u002Ft1\n",[4926,64579,64577],{"__ignoreMap":18},[32,64581,64583],{"id":64582},"_5-check-the-internal-stats-of-the-topic","5. Check the internal stats of the topic.",[8325,64585,64588],{"className":64586,"code":64587,"language":8330},[8328],"$PULSAR_HOME\u002Fcluster1\u002Fbroker1\u002Fbin\u002Fpulsar-admin --admin-url http:\u002F\u002Flocalhost:8080 topics stats-internal public\u002Fc1-ns1\u002Ft1\n",[4926,64589,64587],{"__ignoreMap":18},[48,64591,64592],{},"We should get a list of the ledgers in the topic. In this case it is ledgers 0, 2, and 3.",[8325,64594,64597],{"className":64595,"code":64596,"language":8330},[8328],"\"ledgers\" : [ {\n    \"ledgerId\" : 0,\n    \"entries\" : 100,\n    \"size\" : 5400,\n    \"offloaded\" : false,\n    \"underReplicated\" : false\n  }, {\n    \"ledgerId\" : 2,\n    \"entries\" : 100,\n    \"size\" : 5616,\n    \"offloaded\" : false,\n    \"underReplicated\" : false\n  }, {\n    \"ledgerId\" : 3,\n    \"entries\" : 100,\n    \"size\" : 5700,\n    \"offloaded\" : false,\n    \"underReplicated\" : false\n  } ]\n",[4926,64598,64596],{"__ignoreMap":18},[48,64600,64601],{},"Check the ensembles for each of the ledgers to confirm that the ledger was written to bookies that are part of group-bookie1.",[8325,64603,64606],{"className":64604,"code":64605,"language":8330},[8328],"$PULSAR_HOME\u002Fbk-cluster\u002Fbk1\u002Fbin\u002Fbookkeeper shell ledgermetadata -ledgerid 0\n# check ensembles\nensembles={0=[127.0.0.1:3181, 127.0.0.1:3182]}\n\n$PULSAR_HOME\u002Fbk-cluster\u002Fbk1\u002Fbin\u002Fbookkeeper shell ledgermetadata -ledgerid 2\n# check ensembles\nensembles={0=[127.0.0.1:3182, 127.0.0.1:3181]}\n\n$PULSAR_HOME\u002Fbk-cluster\u002Fbk1\u002Fbin\u002Fbookkeeper shell ledgermetadata -ledgerid 3\n# check ensembles\nensembles={0=[127.0.0.1:3182, 127.0.0.1:3181]}\n",[4926,64607,64605],{"__ignoreMap":18},[32,64609,64611],{"id":64610},"_6-repeat-these-steps-in-cluster2-so-that-we-can-isolate-cluster1s-namespaces-from-cluster2s","6. 
Repeat these steps in cluster2 so that we can isolate cluster1’s namespaces from cluster2’s.",[40,64613,64615],{"id":64614},"migrate-namespace","Migrate Namespace",[32,64617,59468],{"id":59467},[48,64619,64620],{},"Now that we have verified namespace isolation, if the current bookie group does not have enough space, we can move the namespace to a different bookie affinity group.",[32,64622,64624],{"id":64623},"_1-modify-the-bookie-affinity-group-of-the-namespace","1. Modify the bookie affinity group of the namespace.",[8325,64626,64629],{"className":64627,"code":64628,"language":8330},[8328],"$PULSAR_HOME\u002Fcluster1\u002Fbroker1\u002Fbin\u002Fpulsar-admin --admin-url http:\u002F\u002Flocalhost:8080 namespaces set-bookie-affinity-group public\u002Fc1-ns1 --primary-group group-bookie2\n",[4926,64630,64628],{"__ignoreMap":18},[32,64632,64634],{"id":64633},"_2-unload-the-namespace-to-make-the-bookie-affinity-group-change-take-effect","2. Unload the namespace to make the bookie affinity group change take effect.",[8325,64636,64639],{"className":64637,"code":64638,"language":8330},[8328],"$PULSAR_HOME\u002Fcluster1\u002Fbroker1\u002Fbin\u002Fpulsar-admin --admin-url http:\u002F\u002Flocalhost:8080 namespaces unload public\u002Fc1-ns1\n",[4926,64640,64638],{"__ignoreMap":18},[32,64642,64644],{"id":64643},"_3-produce-messages-to-the-topic-publicc1-ns1t1-again","3. Produce messages to the topic public\u002Fc1-ns1\u002Ft1 again.",[8325,64646,64649],{"className":64647,"code":64648,"language":8330},[8328],"$PULSAR_HOME\u002Fcluster1\u002Fbroker1\u002Fbin\u002Fpulsar-client --url pulsar:\u002F\u002Flocalhost:6650  produce -m 'hello' -n 300 public\u002Fc1-ns1\u002Ft1\n",[4926,64650,64648],{"__ignoreMap":18},[32,64652,64654],{"id":64653},"_4-check-ensembles-for-new-added-ledgers-we-should-see-that-a-new-ledger-was-already-added-in-group-bookie2","4. Check ensembles for newly added ledgers. 
We should see that a new ledger was already added in group-bookie2.",[8325,64656,64659],{"className":64657,"code":64658,"language":8330},[8328],"$PULSAR_HOME\u002Fcluster1\u002Fbroker1\u002Fbin\u002Fpulsar-admin --admin-url http:\u002F\u002Flocalhost:8080 topics stats-internal public\u002Fc1-ns1\u002Ft1\n  \"ledgers\" : [ {\n    \"ledgerId\" : 0,\n    \"entries\" : 100,\n    \"size\" : 5400,\n    \"offloaded\" : false,\n    \"underReplicated\" : false\n  }, {\n    \"ledgerId\" : 2,\n    \"entries\" : 100,\n    \"size\" : 5616,\n    \"offloaded\" : false,\n    \"underReplicated\" : false\n  }, {\n    \"ledgerId\" : 3,\n    \"entries\" : 100,\n    \"size\" : 5700,\n    \"offloaded\" : false,\n    \"underReplicated\" : false\n  }, {\n    \"ledgerId\" : 15,\n    \"entries\" : 100,\n    \"size\" : 5400,\n    \"offloaded\" : false,\n    \"underReplicated\" : false\n  }, {\n    \"ledgerId\" : 16,\n    \"entries\" : 100,\n    \"size\" : 5616,\n    \"offloaded\" : false,\n    \"underReplicated\" : false\n  }, {\n    \"ledgerId\" : 17,\n    \"entries\" : 100,\n    \"size\" : 5700,\n    \"offloaded\" : false,\n    \"underReplicated\" : false\n  }]\n",[4926,64660,64658],{"__ignoreMap":18},[48,64662,64663],{},"Let’s check the ensembles for new added ledgers (15, 16, 17) to confirm that the ledger was written to bookies that are part of group-bookie2.",[8325,64665,64668],{"className":64666,"code":64667,"language":8330},[8328],"$PULSAR_HOME\u002Fbk-cluster\u002Fbk1\u002Fbin\u002Fbookkeeper shell ledgermetadata -ledgerid 15\n# check ensembles\nensembles={0=[127.0.0.1:3184, 127.0.0.1:3183]}\n\n$PULSAR_HOME\u002Fbk-cluster\u002Fbk1\u002Fbin\u002Fbookkeeper shell ledgermetadata -ledgerid 16\n# check ensembles\nensembles={0=[127.0.0.1:3183, 127.0.0.1:3184]}\n\n$PULSAR_HOME\u002Fbk-cluster\u002Fbk1\u002Fbin\u002Fbookkeeper shell ledgermetadata -ledgerid 17\n# check ensembles\nensembles={0=[127.0.0.1:3183, 127.0.0.1:3184]}\n",[4926,64669,64667],{"__ignoreMap":18},[40,64671,59546],{"id":59545},[48,64673,64674],{},"Eventually our data volume will grow beyond the capacity of our BookKeeper cluster, and we will need to scale up the number of bookies. In this section we will show you how to add a new bookie and assign it to an existing bookie affinity group.",[32,64676,59177],{"id":59176},[3933,64678,64680],{"id":64679},"_1-start-a-new-bookie-node-bk-5","1. Start a new bookie node bk-5.",[8325,64682,64685],{"className":64683,"code":64684,"language":8330},[8328],"cp -r apache-pulsar-2.8.1 bk-cluster\u002Fbk5\n$PULSAR_HOME\u002Fbk-cluster\u002F\u002Fbk-cluster\u002Fbk5\u002Fbin\u002Fpulsar-daemon start bookie\n",[4926,64686,64684],{"__ignoreMap":18},[3933,64688,64690],{"id":64689},"_2-add-the-newly-added-bookie-node-to-group-bookie1","2. Add the newly added bookie node to group-bookie1.",[8325,64692,64695],{"className":64693,"code":64694,"language":8330},[8328],"$PULSAR_HOME\u002Fcluster1\u002Fbroker1\u002Fbin\u002Fpulsar-admin --admin-url http:\u002F\u002Flocalhost:8080 bookies set-bookie-rack \\\n--bookie 127.0.0.1:3185 \\\n--hostname 127.0.0.1:3185 \\\n--group group-bookie2 \\\n--rack rack2\n",[4926,64696,64694],{"__ignoreMap":18},[3933,64698,64700],{"id":64699},"_3-check-bookie-racks-placement","3. 
Check bookie racks placement.",[8325,64702,64705],{"className":64703,"code":64704,"language":8330},[8328],"$PULSAR_HOME\u002Fcluster1\u002Fbroker1\u002Fbin\u002Fpulsar-admin --admin-url http:\u002F\u002Flocalhost:8080  bookies racks-placement\n\"group-bookie1    {127.0.0.1:3181=BookieInfoImpl(rack=rack1, hostname=127.0.0.1:3181), 127.0.0.1:3182=BookieInfoImpl(rack=rack1, hostname=127.0.0.1:3182)}\"\n\"group-bookie2    {127.0.0.1:3183=BookieInfoImpl(rack=rack2, hostname=127.0.0.1:3183), 127.0.0.1:3184=BookieInfoImpl(rack=rack2, hostname=127.0.0.1:3184), 127.0.0.1:3185=BookieInfoImpl(rack=rack2, hostname=127.0.0.1:3185)}\"\n",[4926,64706,64704],{"__ignoreMap":18},[3933,64708,64710],{"id":64709},"_4-unload-namespace-publicc1-ns1-to-make-the-bookie-affinity-group-change-take-effe","4. Unload namespace public\u002Fc1-ns1 to make the bookie affinity group change take effe",[8325,64712,64714],{"className":64713,"code":64638,"language":8330},[8328],[4926,64715,64638],{"__ignoreMap":18},[3933,64717,64719],{"id":64718},"_5-produce-some-messages-to-the-topic-publicc1-ns1t1-again","5. Produce some messages to the topic public\u002Fc1-ns1\u002Ft1 again.",[8325,64721,64724],{"className":64722,"code":64723,"language":8330},[8328],"$PULSAR_HOME\u002Fcluster1\u002Fbin\u002Fpulsar-client --url pulsar:\u002F\u002Flocalhost:6650 produce -m 'hello' -n 300 public\u002Fc1-ns1\u002Ft1\n",[4926,64725,64723],{"__ignoreMap":18},[3933,64727,64729],{"id":64728},"_6-check-the-newly-added-ledger-of-the-topic-publicc1-ns1t1","6. Check the newly added ledger of the topic public\u002Fc1-ns1\u002Ft1.",[8325,64731,64734],{"className":64732,"code":64733,"language":8330},[8328],"$PULSAR_HOME\u002Fcluster1\u002Fbroker1\u002Fbin\u002Fpulsar-admin --admin-url http:\u002F\u002Flocalhost:8080 topics stats-internal public\u002Fc1-ns1\u002Ft1\n$PULSAR_HOME\u002Fbk-cluster\u002Fbk1\u002Fbin\u002Fbookkeeper shell ledgermetadata -ledgerid ledgerid\n",[4926,64735,64733],{"__ignoreMap":18},[48,64737,64738],{},"We can see that the newly added ledger now exists in the newly added bookie node.",[32,64740,59264],{"id":59263},[48,64742,64743,64744,64748],{},"In a distributed system, it is not uncommon for an individual component to fail. In this section, we will simulate the failure of one of the bookies and demonstrate that the shared BookKeeper cluster is able to tolerate the failure event. You could also refer to ",[55,64745,64746],{"href":64746,"rel":64747},"https:\u002F\u002Fbookkeeper.apache.org\u002Fdocs\u002F4.14.0\u002Fadmin\u002Fdecomission\u002F",[264]," for a detailed example.",[3933,64750,64752],{"id":64751},"_1-make-sure-there-are-enough-bookies-in-the-affinity-group","1. Make sure there are enough bookies in the affinity group.",[48,64754,64755],{},"For example, if the configuration managedLedgerDefaultEnsembleSize of the broker is 2, then after we scale down the bookies we should have at least 2 bookies belonging to the affinity group.",[48,64757,64758],{},"We can check the bookie rack placement.",[8325,64760,64763],{"className":64761,"code":64762,"language":8330},[8328],"$PULSAR_HOME\u002Fcluster1\u002Fbroker1\u002Fbin\u002Fpulsar-admin --admin-url http:\u002F\u002Flocalhost:8080 bookies racks-placement\n",[4926,64764,64762],{"__ignoreMap":18},[3933,64766,64768],{"id":64767},"_2-delete-the-bookie-from-the-affinity-bookie-group","2. 
Delete the bookie from the affinity bookie group.",[8325,64770,64773],{"className":64771,"code":64772,"language":8330},[8328],"$PULSAR_HOME\u002Fcluster1\u002Fbroker1\u002Fbin\u002Fpulsar-admin --admin-url http:\u002F\u002Flocalhost:8080 bookies delete-bookie-rack -b 127.0.0.1:3185\n",[4926,64774,64772],{"__ignoreMap":18},[3933,64776,64778],{"id":64777},"_3-check-if-there-are-under-replicated-ledgers-which-should-be-expected-given-the-fact-that-we-have-deleted-a-bookie","3. Check if there are under-replicated ledgers, which is expected since we have deleted a bookie.",[8325,64780,64783],{"className":64781,"code":64782,"language":8330},[8328],"$PULSAR_HOME\u002Fbk-cluster\u002Fbk1\u002Fbin\u002Fbookkeeper shell listunderreplicated\n",[4926,64784,64782],{"__ignoreMap":18},[3933,64786,64788],{"id":64787},"_4-stop-the-bookie","4. Stop the bookie.",[8325,64790,64793],{"className":64791,"code":64792,"language":8330},[8328],"$PULSAR_HOME\u002Fbk-cluster\u002Fbk5\u002Fbin\u002Fpulsar-daemon stop bookie\n",[4926,64794,64792],{"__ignoreMap":18},[3933,64796,64798],{"id":64797},"_5-decommission-the-bookie","5. Decommission the bookie.",[8325,64800,64803],{"className":64801,"code":64802,"language":8330},[8328],"$PULSAR_HOME\u002Fbk-cluster\u002Fbk1\u002Fbin\u002Fbookkeeper shell decommissionbookie -bookieid 127.0.0.1:3185\n",[4926,64804,64802],{"__ignoreMap":18},[3933,64806,64808],{"id":64807},"_6-check-ledgers-in-the-decommissioned-bookie","6. Check ledgers in the decommissioned bookie.",[8325,64810,64813],{"className":64811,"code":64812,"language":8330},[8328],"$PULSAR_HOME\u002Fbk-cluster\u002Fbk1\u002Fbin\u002Fbookkeeper shell listledgers -bookieid 127.0.0.1:3185\n",[4926,64814,64812],{"__ignoreMap":18},[3933,64816,64818],{"id":64817},"_7-list-the-bookies","7. List the bookies.",[8325,64820,64823],{"className":64821,"code":64822,"language":8330},[8328],"$PULSAR_HOME\u002Fbk-cluster\u002Fbk1\u002Fbin\u002Fbookkeeper shell listbookies -rw -h\n",[4926,64824,64822],{"__ignoreMap":18},[40,64826,7126],{"id":1727},[48,64828,64829],{},"We have shown you how to achieve isolation between two Pulsar clusters sharing one BookKeeper cluster. You can deploy additional Pulsar clusters by following the same steps. 
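If you would rather drive an end-to-end check of this setup from code instead of the pulsar-client CLI, the sketch below publishes a handful of messages to each cluster with the Java client. It is only a minimal illustration: the broker URLs match the two clusters deployed above (localhost:6650 for cluster1, localhost:6651 for cluster2), while the cluster2 namespace public/c2-ns1 and the message count are assumptions made for the example, not part of the tutorial itself.

```java
import org.apache.pulsar.client.api.Producer;
import org.apache.pulsar.client.api.PulsarClient;
import org.apache.pulsar.client.api.Schema;

public class IsolationSmokeTest {

    // Publish a few messages to one cluster; each cluster has its own broker endpoint
    // even though both store their ledgers in the shared BookKeeper cluster.
    static void produce(String serviceUrl, String topic) throws Exception {
        try (PulsarClient client = PulsarClient.builder().serviceUrl(serviceUrl).build();
             Producer<String> producer = client.newProducer(Schema.STRING).topic(topic).create()) {
            for (int i = 0; i < 10; i++) {
                producer.send("hello-" + i);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        produce("pulsar://localhost:6650", "persistent://public/c1-ns1/t1"); // cluster1
        produce("pulsar://localhost:6651", "persistent://public/c2-ns1/t1"); // cluster2 (assumes a c2-ns1 namespace was created there)
    }
}
```

After running it, topics stats-internal on each cluster should again show the new ledgers placed only on the bookies of that cluster's affinity group.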
Stay tuned for the last blog in this series where we show you how to achieve isolation with a single Pulsar cluster!",[48,64831,64832],{},"Meanwhile, check out the Pulsar resources below:",[1666,64834,64835,64843,64850,64856],{},[324,64836,64837,64842],{},[55,64838,64841],{"href":64839,"rel":64840},"https:\u002F\u002Flnkd.in\u002FgMeRGTM6",[264],"Take the 10-minute 2022 Apache Pulsar User Survey now"," to help the Pulsar community improve the project.",[324,64844,64845,64849],{},[55,64846,64848],{"href":64847},"\u002Fdownload\u002Fmanning-ebook-apache-pulsar-in-action\u002F","Get your free copy"," of Manning's Apache Pulsar in Action by David Kjerrumgaard.",[324,64851,64852,64855],{},[55,64853,64854],{"href":10259},"Join the 2022 StreamNative Ambassador Program"," and work directly with Pulsar experts from StreamNative to co-host events, promote new project updates, and build the Pulsar user group in your city.",[324,64857,64858,64862],{},[55,64859,64861],{"href":57760,"rel":64860},[264],"Join the Pulsar community"," on Slack.",[48,64864,3931],{},{"title":18,"searchDepth":19,"depth":19,"links":64866},[64867,64868,64877,64883,64891,64898,64902],{"id":64337,"depth":19,"text":64338},{"id":64364,"depth":19,"text":64365,"children":64869},[64870,64871,64872,64873,64874,64875,64876],{"id":64368,"depth":279,"text":64369},{"id":64378,"depth":279,"text":64379},{"id":64438,"depth":279,"text":64439},{"id":64448,"depth":279,"text":64449},{"id":64458,"depth":279,"text":64459},{"id":64468,"depth":279,"text":64469},{"id":64478,"depth":279,"text":64479},{"id":64488,"depth":19,"text":64489,"children":64878},[64879,64880,64881,64882],{"id":64495,"depth":279,"text":64496},{"id":64505,"depth":279,"text":64506},{"id":64515,"depth":279,"text":64516},{"id":64525,"depth":279,"text":64526},{"id":64535,"depth":19,"text":64536,"children":64884},[64885,64886,64887,64888,64889,64890],{"id":64542,"depth":279,"text":64543},{"id":64552,"depth":279,"text":64553},{"id":64562,"depth":279,"text":64563},{"id":64572,"depth":279,"text":64573},{"id":64582,"depth":279,"text":64583},{"id":64610,"depth":279,"text":64611},{"id":64614,"depth":19,"text":64615,"children":64892},[64893,64894,64895,64896,64897],{"id":59467,"depth":279,"text":59468},{"id":64623,"depth":279,"text":64624},{"id":64633,"depth":279,"text":64634},{"id":64643,"depth":279,"text":64644},{"id":64653,"depth":279,"text":64654},{"id":59545,"depth":19,"text":59546,"children":64899},[64900,64901],{"id":59176,"depth":279,"text":59177},{"id":59263,"depth":279,"text":59264},{"id":1727,"depth":19,"text":7126},"2022-01-12","Learn how to implement isolation between multiple Apache Pulsar clusters by sharing a single BookKeeper cluster in this third part of our isolation series. 
Step-by-step instructions on setting up, configuring, and maintaining isolated Pulsar clusters.","\u002Fimgs\u002Fblogs\u002F63be757f044ddc4d070df678_screen-shot-2022-01-12-at-3.10.59-pm.png",{},"\u002Fblog\u002Fpulsar-isolation-part-iii-separate-pulsar-clusters-sharing-single-bookkeeper-cluster",{"title":58893,"description":64904},"blog\u002Fpulsar-isolation-part-iii-separate-pulsar-clusters-sharing-single-bookkeeper-cluster",[27847,38442],"Vhk6La_a32yvcH1hvIG7SVVqwfKGn4WLsH28Xib_OA0",{"id":64913,"title":64914,"authors":64915,"body":64917,"category":821,"createdAt":290,"date":65160,"description":65161,"extension":8,"featured":294,"image":290,"isDraft":294,"link":290,"meta":65162,"navigation":7,"order":296,"path":65163,"readingTime":38438,"relatedResources":290,"seo":65164,"stem":65165,"tags":65166,"__hash__":65167},"blogs\u002Fblog\u002Ftweaking-the-bookkeeper-protocol-unbounded-ledgers.md","Tweaking the BookKeeper Protocol - Unbounded Ledgers",[64916],"Jack Vanlightly",{"type":15,"value":64918,"toc":65149},[64919,64930,64938,64941,64945,64948,64951,64954,64958,64961,64964,64967,64973,64976,64980,64983,64989,64992,64995,64998,65004,65007,65018,65032,65035,65038,65041,65044,65048,65051,65055,65062,65082,65085,65088,65091,65096,65100,65103,65106,65109,65112,65115,65119,65122,65130,65132,65135,65138,65141,65144,65147],[916,64920,64921],{},[48,64922,64923,64924,64929],{},"This article was originally published on ",[55,64925,64928],{"href":64926,"rel":64927},"https:\u002F\u002Fjack-vanlightly.com\u002Fblog\u002F2021\u002F12\u002F9\u002Ftweaking-the-bookkeeper-protocol-unbounded-ledgers",[264],"Jack-Vanlightly.com"," on December 10, 2021.",[48,64931,46132,64932,64937],{},[55,64933,64936],{"href":64934,"rel":64935},"https:\u002F\u002Fjack-vanlightly.com\u002Fblog\u002F2021\u002F12\u002F7\u002Ftweaking-the-bookkeeper-protocol-guaranteeing-write-quorum",[264],"last post"," I described the necessary protocol changes to ensure that all entries in closed ledgers reached Write Quorum (WQ) and all entries in all but the last fragment in open ledgers reach write quorum.",[48,64939,64940],{},"In this post we’re going to look at another tweak to the protocol to allow ledgers to be unbounded and allow writes from multiple clients over their lifetime.",[40,64942,64944],{"id":64943},"why-unbounded-ledgers","Why unbounded ledgers?",[48,64946,64947],{},"Currently, ledgers are bounded. If you want to create an unbounded log then you create a log of bounded ledgers (forming a log of logs). For example, a Pulsar topic is an ordered list of ledgers. The core BookKeeper protocol does not offer this log-of-logs to you out-of-the-box, you must add that logic on top which is not trivial.",[48,64949,64950],{},"The implementations of log-of-ledgers that I know of are the Managed Ledger module in Apache Pulsar and the Distributed Log library which is a set of modules in the BookKeeper repository. Each has a fair amount of complexity.",[48,64952,64953],{},"But do ledgers need to be bounded at all? Can’t we just allow a single ledger to be an unbounded stream and make that part of the core protocol? Potentially, an UnboundedLedgerHandle could be a simpler stream API (with much less code) than the existing alternatives.",[40,64955,64957],{"id":64956},"a-log-of-logs","A log of logs",[48,64959,64960],{},"The great thing about using BookKeeper for log storage is its dynamic nature. For example, you can scale out your bookies and they will soon start taking on load automatically. 
One reason for this is that as new ledgers get added, the new bookies start getting chosen to host these ledgers. Ledgers are log segments in a larger log and each segment can be hosted on a different set of bookies.",[48,64962,64963],{},"But an individual ledger is already a log of log segments (known as fragments). Each time a write fails an ensemble change occurs where a new fragment is appended to the ledger. Each ledger is a log of fragments and each fragment is a log that shares the same bookie ensemble. For that reason a log-of-ledgers isn’t the only way to get this nice scaling ability.",[48,64965,64966],{},"Right now fragments get added on failure but for an unbounded ledger we could also set a maximum size per fragment which would trigger an ensemble change once the current fragment has reached capacity.",[48,64968,64969],{},[384,64970],{"alt":64971,"src":64972},"iùage of bounded ledgers","\u002Fimgs\u002Fblogs\u002F63b3e3825b09f4dd82815472_1.jpeg",[48,64974,64975],{},"This way a stream (such as a Pulsar topic) is a single ledger made of a log of fragments that are distributed across the bookie cluster.",[40,64977,64979],{"id":64978},"protocol-changes-for-unbounded-ledgers","Protocol changes for unbounded ledgers",[48,64981,64982],{},"The great thing about unbounded ledgers it that the changes required are relatively small. Most of the pieces already exist.",[48,64984,64985],{},[384,64986],{"alt":64987,"src":64988},"drawing with two square men ","\u002Fimgs\u002Fblogs\u002F63b3e38289e0f7b868c5151b_100.jpeg",[48,64990,64991],{},"The core of this protocol change is changing ledger fencing from a boolean “is fenced or not” to an integer term that is incremented each time a client decides it should take over.",[48,64993,64994],{},"We go from a model where a ledger can only be written to by the client that created it to one where a ledger can be written to by any client, but only one client at a time. Just as in the regular BookKeeper protocol, it is assumed there is leader election for clients external to the protocol. So under normal circumstances, there should only be one client trying to write to the ledger but the protocol can cope with two or more clients battling for control.",[48,64996,64997],{},"Fencing is replaced by terms. When a client decides it should be the one that writes to the ledger it increments the ledger term both in metadata and across the bookies of the last fragment. Recovery is now performed at the beginning of a term, rather than just when closing a ledger.",[48,64999,65000],{},[384,65001],{"alt":65002,"src":65003},"The ledger lifecycle now status and term.","\u002Fimgs\u002Fblogs\u002F63b3e382e4af5ccc80ba4d93_2.jpeg",[48,65005,65006],{},"Each fragment is not linked to any particular term, the term is simply a fencing mechanism to prevent former leader clients from making progress.",[48,65008,65009,65010,65013,65014,65017],{},"The metadata gets a new field for term but the existing ensembles field would need to be modified as the list of ensembles (fragments) could grow very large due to the long lived nature of a ledger. In order to enforce data retention policies, the systems that use BookKeeper would need to be able to delete fragments, rather than ledgers.\n",[384,65011],{"alt":18,"src":65012},"\u002Fimgs\u002Fblogs\u002F63b3e382d56e7e6ea623e451_3.jpeg","Fig 3. Metadata gets a new field for term. The ensembles field now needs to be rethought as it will likely grow too large.\nThe fencing mechanism is very similar. 
When a new client wants to take over a ledger it increments the term in the ledger metadata and then starts ledger recovery. The client sends the LAC requests to the current fragment as normal, but with the new term, rather than a fencing flag.\n",[384,65015],{"alt":18,"src":65016},"\u002Fimgs\u002Fblogs\u002F63b3e382fefdbcd25986ef83_4.jpeg","\nThe following requests need to include the current term of the client:",[321,65019,65020,65023,65026,65029],{},[324,65021,65022],{},"normal adds",[324,65024,65025],{},"recovery adds",[324,65027,65028],{},"recovery LAC reads",[324,65030,65031],{},"recovery reads",[48,65033,65034],{},"Normal reads do not care about terms. The only thing normal reads should care about is the LAC as always.",[48,65036,65037],{},"If a bookie has a ledger term that is lower than or equal to the ledger term of a request, it accepts the request and updates its ledger term. If a bookie has a higher term, it rejects the request with an InvalidTerm response.",[48,65039,65040],{},"When a client receives an InvalidTerm response it should disengage. It can check it is still the supposed leader (external to this protocol) and if it is still the leader then reengage refreshing its ledger metadata first.",[48,65042,65043],{},"One additional modification to ledger recovery is that it can’t leave any dirty entries left by the previous term in the last fragment. These entries must be removed and this is done by including a new “truncate” flag in the last entry written back during recovery. We must guarantee that this last entry is written to all bookies of the last fragment and so we must utilize some of the logic from the “guaranteed write quorum” to achieve that.",[40,65045,65047],{"id":65046},"the-scope-of-a-term","The scope of a term",[48,65049,65050],{},"There are two main designs that have occurred to me so far and each use the term in a different way.",[32,65052,65054],{"id":65053},"terms-for-fencing-only","Terms for fencing only",[48,65056,65057,65058,65061],{},"The one documented in this post does not include the term as part of an entry identifier or even a fragment identifier, it is for fencing alone. However, for that to be safe it requires at least a subset of the “guaranteed write quorum” changes detailed in the ",[55,65059,64936],{"href":64934,"rel":65060},[264]," and also a final truncate no-op entry as the last entry to be written back during recovery. To understand why see the following:",[1666,65063,65064,65067,65070,65073,65076,65079],{},[324,65065,65066],{},"c1 writes entries {Id: 0, value: A} and {Id: 1, value: B} to b1, b2, b3.",[324,65068,65069],{},"Entry 0 is persisted to b1, b2 and b3. But entry 1 only persisted to b1.",[324,65071,65072],{},"c2 takes over, completes recovery assessing that the last recoverable entry is entry 0, writes it back to b1, b2, b3. Then changes the ledger to OPEN.",[324,65074,65075],{},"c2 writes the entry {id: 1, value: C} to b1, b2, b3.",[324,65077,65078],{},"b2 and b3 acknowledge it making the entry committed. b1 was unreachable in that moment but as the entry is already committed, c2 ignores the timeout response.",[324,65080,65081],{},"We now have log divergence where different bookies have different values for entry 1.",[48,65083,65084],{},"This is avoided by adding a no-op entry with a new “truncate” flag as the last entry to be recovered. When a bookie receives an entry with the truncate flag, it deletes all entries with a higher entry id. 
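To make the term check and the truncate behaviour just described concrete, here is a deliberately simplified sketch of how a bookie could evaluate an add request under the proposed protocol. None of these types exist in BookKeeper; they are invented purely to illustrate the accept/reject rule and the truncate flag, not to describe the real server code.

```java
import java.util.TreeMap;

// Toy model of the proposed term-based fencing on a single bookie (illustrative only).
class LedgerState {
    long term;                                 // highest ledger term this bookie has seen
    TreeMap<Long, byte[]> entries = new TreeMap<>();
}

record AddRequest(long ledgerTerm, long entryId, byte[] payload, boolean truncate) {}

enum Response { OK, INVALID_TERM }

class BookieSketch {
    Response handleAdd(LedgerState ledger, AddRequest req) {
        if (req.ledgerTerm() < ledger.term) {
            return Response.INVALID_TERM;      // a newer term exists: fence off the old writer
        }
        ledger.term = req.ledgerTerm();        // accept and remember the highest term seen
        ledger.entries.put(req.entryId(), req.payload());
        if (req.truncate()) {
            // drop any dirty entries left behind by a previous term
            ledger.entries.tailMap(req.entryId(), false).clear();
        }
        return Response.OK;
    }
}
```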
This truncation combined with guaranteeing the write quorum ensures that no bookie in the last fragment has not truncated any dirty entries. This does mean that BookKeeper clients will need to be aware of these no-op entries and discard them.",[48,65086,65087],{},"The benefits of this approach is that the term is simply one extra field in ledger metadata and bookies only need to store the term in the ledger index as it does with fencing right now. The term does not need to get stored alongside every entry.",[48,65089,65090],{},"The downside is introducing these no-op entries.",[916,65092,65093],{},[48,65094,65095],{},"We only require “guaranteed write quorum” for recovery writes and so normal writes can continue to use existing behaviour.",[32,65097,65099],{"id":65098},"terms-are-also-entry-identifiers","Terms are also entry identifiers",[48,65101,65102],{},"An alternative solution is to make the term a more integrated role in the protocol, where it actually forms part of an entry identifier. This prevents the above log divergence scenario as the uncommitted entry 1 written by c1 could not be confused with entry 1 written by c2 as they would have different terms.",[48,65104,65105],{},"The metadata would need to include the entry range of each term so that the client can include the correct term when performing a read of a given entry.",[48,65107,65108],{},"We don’t need fragments and terms to line up, but it makes sense as it would make the metadata more compact. I haven’t explored that too far yet.",[48,65110,65111],{},"The benefit of this approach is that we do not need “guaranteed write quorum” on recovery writes. The downside of this approach is that the term has to be stored with every entry.",[48,65113,65114],{},"I may produce an unbounded ledger design that includes terms as part of the entry identifier sometime soon.",[40,65116,65118],{"id":65117},"formal-verification","Formal Verification",[48,65120,65121],{},"I have formally verified the unbounded ledgers protocol changes in TLA+, building on the specification for guaranteed write quorum.",[48,65123,65124,65125,190],{},"You can find the TLA+ specification in ",[55,65126,65129],{"href":65127,"rel":65128},"https:\u002F\u002Fgithub.com\u002FVanlightly\u002Fbookkeeper-tlaplus\u002Fblob\u002Fmain\u002Ftweaks\u002FProtocolUnboundedLedgersAndGWQ.tla",[264],"my GitHub BookKeeper TLA+ repo",[40,65131,26362],{"id":26361},[48,65133,65134],{},"This is all just mental gymnastics at this point as at Splunk we have no pressing need for unbounded ledgers right now. But I do think it could make a valuable addition to BookKeeper in the future and may enable new use cases.",[48,65136,65137],{},"The protocol changes to make a stream API out of a modified LedgerHandle interface are not too major, the issue is the wider impact. There are many secondary impacts such as how it affects auto recovery and garbage collection so it is by no means a trivial change.",[48,65139,65140],{},"In any case, exploring protocol changes is fun and it sheds light on some of the reasons why the protocol is the way it is and the trade-off decisions that were made. I also think it shows how valuable TLA+ is for these kinds of systems as it makes testing out ideas so much easier.",[48,65142,65143],{},"There are also potentially a few varying designs that could be chosen and it might be interesting to explore those, looking at the trade-offs.",[48,65145,65146],{},"UPDATE 1: My original design did not include truncation. 
When using a larger model the TLA+ spec discovered a counterexample for log divergence. To avoid this a new “truncate” flag is required for the last entry being recovered during the recovery phase of a new term.",[48,65148,3931],{},{"title":18,"searchDepth":19,"depth":19,"links":65150},[65151,65152,65153,65154,65158,65159],{"id":64943,"depth":19,"text":64944},{"id":64956,"depth":19,"text":64957},{"id":64978,"depth":19,"text":64979},{"id":65046,"depth":19,"text":65047,"children":65155},[65156,65157],{"id":65053,"depth":279,"text":65054},{"id":65098,"depth":279,"text":65099},{"id":65117,"depth":19,"text":65118},{"id":26361,"depth":19,"text":26362},"2021-12-16","This blog looks at a tweak to the BookKeeper protocol to allow ledgers to be unbounded and allow writes from multiple clients over their lifetime.",{},"\u002Fblog\u002Ftweaking-the-bookkeeper-protocol-unbounded-ledgers",{"title":64914,"description":65161},"blog\u002Ftweaking-the-bookkeeper-protocol-unbounded-ledgers",[12106],"96Pg5WB0FCiO9Uvn-veukcKFhTI_GJpDWvOtt2trKqU",{"id":65169,"title":65170,"authors":65171,"body":65172,"category":821,"createdAt":290,"date":65372,"description":65373,"extension":8,"featured":294,"image":290,"isDraft":294,"link":290,"meta":65374,"navigation":7,"order":296,"path":65375,"readingTime":11508,"relatedResources":290,"seo":65376,"stem":65377,"tags":65378,"__hash__":65379},"blogs\u002Fblog\u002Fdeveloping-event-driven-microservices-apache-pulsar-part-i.md","Developing Event Driven Microservices with Apache Pulsar: Part I",[46357],{"type":15,"value":65173,"toc":65361},[65174,65181,65184,65188,65191,65194,65200,65207,65210,65222,65226,65229,65236,65250,65253,65256,65262,65265,65269,65272,65276,65279,65283,65292,65296,65299,65301,65304,65307,65318,65320,65338],[48,65175,65176,65177,65180],{},"Our Developer Advocate team recently presented a ",[55,65178,65179],{"href":51838},"3-part webinar series on building event-driven microservices with Apache Pulsar",". To build on that topic, we will be publishing a blog series and this is the first blog in that series.",[48,65182,65183],{},"In this blog, you will learn how Apache Pulsar’s support for common message patterns and native compute capabilities, known as Pulsar Functions, can be leveraged to build a message bus for event driven microservices. Additionally, you will learn how to use Pulsar Functions to create lean, testable event driven microservices for a diverse set of deployment use cases.",[40,65185,65187],{"id":65186},"why-apache-pulsar-for-event-driven-microservices","Why Apache Pulsar for Event Driven Microservices",[48,65189,65190],{},"Event-driven microservices use a message bus to communicate among loosely-coupled, collaborating services. When a service performs work that other services might be interested in, that service produces an event. Other services can then consume the event in order to perform their own tasks.",[48,65192,65193],{},"The message bus serves as an intermediary between the different services. The message bus receives events from producers, filters the events, then pushes the events to consumers without tying the events to individual services. In order to accomplish this, the message bus needs to support various messaging paradigms, subscription types, and consumption patterns.",[48,65195,65196,65199],{},[55,65197,821],{"href":23526,"rel":65198},[264]," is a cloud-native, distributed messaging and event-streaming platform that supports common message patterns with its diverse subscription types and modes. 
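As a rough illustration of those subscription types, the sketch below uses the standard Pulsar Java client to attach a consumer in each of the four built-in modes. The service URL, topic, and subscription names are placeholders chosen for the example, not values from this post.

```java
import org.apache.pulsar.client.api.Consumer;
import org.apache.pulsar.client.api.PulsarClient;
import org.apache.pulsar.client.api.Schema;
import org.apache.pulsar.client.api.SubscriptionType;

public class SubscriptionModes {
    public static void main(String[] args) throws Exception {
        try (PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650")
                .build()) {
            // Exclusive/Failover keep a single active consumer and preserve ordering (streaming style);
            // Shared/Key_Shared spread messages across many consumers (work-queue style).
            for (SubscriptionType type : new SubscriptionType[]{
                    SubscriptionType.Exclusive, SubscriptionType.Failover,
                    SubscriptionType.Shared, SubscriptionType.Key_Shared}) {
                Consumer<String> consumer = client.newConsumer(Schema.STRING)
                        .topic("persistent://public/default/orders")          // placeholder topic
                        .subscriptionName("orders-" + type.name().toLowerCase())
                        .subscriptionType(type)
                        .subscribe();
                consumer.close();
            }
        }
    }
}
```

Which mode a service subscribes with is what lets the same platform serve both event-streaming and traditional work-queue patterns.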
The ability to support a diverse number of messaging patterns is important because, as mentioned above, it is required for many types of microservices.",[48,65201,65202,65203,65206],{},"In addition, Pulsar includes native, lightweight compute capabilities known as ",[55,65204,15627],{"href":63347,"rel":65205},[264]," that allow you to build microservices with a few lines of code. You can write Pulsar Functions in Java, Python, or Golang and deploy them in threads, processes, or Kubernetes pods. (We will talk more about Pulsar Functions later in this blog.)",[48,65208,65209],{},"Pulsar also provides scalability and elasticity, which are critical to microservices. Pulsar has a cloud-native, layered architecture that separates compute and storage into different layers. Decoupled compute and storage allows for independent scaling and enables microservices to scale elastically on short notice.",[48,65211,65212,65213,65218,65219,190],{},"Furthermore, microservices built with Pulsar can easily integrate with external frameworks, libraries, and systems. Pulsar has many connectors, such as MongoDB, ElasticSearch, Aerospike, ",[55,65214,65217],{"href":65215,"rel":65216},"https:\u002F\u002Fwww.influxdata.com\u002F",[264],"InfluxDB",", and Redis, and protocol handlers, including JMS, AMQP, and MQTT. You can explore what is available in the Pulsar ecosystem in the ",[55,65220,38697],{"href":35258,"rel":65221},[264],[40,65223,65225],{"id":65224},"building-event-driven-microservices-with-pulsar-functions","Building Event-Driven Microservices with Pulsar Functions",[48,65227,65228],{},"In the previous section, we discussed why Apache Pulsar is well-suited for event-driven microservices. In this section we cover why you should use Pulsar Functions when developing microservices.",[48,65230,65231,65232,65235],{},"Pulsar Functions are lambda-style functions and make it easy to transform or process messages in Pulsar. The diagram below illustrates the programming model of Pulsar Functions.\n",[384,65233],{"alt":18,"src":65234},"\u002Fimgs\u002Fblogs\u002F63b3e2b2e4af5c6e4bba4a17_screen-shot-2021-12-14-at-2.47.35-pm.png","\nPulsar Functions complete the following tasks when input messages are received:",[321,65237,65238,65241,65244,65247],{},[324,65239,65240],{},"Consume messages from one or more Pulsar topics",[324,65242,65243],{},"Apply a user-supplied processing logic to each message",[324,65245,65246],{},"Publish the results of the computation to another topic",[324,65248,65249],{},"Write logs to a log topic (potentially for debugging purposes)",[48,65251,65252],{},"When you use Pulsar Functions, producers and consumers are automatically set up, removing the need to write boilerplate code. When messages arrive in a topic, Pulsar Functions automatically applies user-supplied business logic components and sends the output to the specified topics.",[48,65254,65255],{},"The code below is an example of a microservice written as Pulsar Functions. The API is simple, allowing developers to focus on the business logic and easily write services, without the need for deep streaming knowledge. 
Because of this, Pulsar Functions can easily be developed by a small team or a single developer, allowing agile teams to build microservices.",[8325,65257,65260],{"className":65258,"code":65259,"language":8330},[8328],"\nimport java.util.function.Function;\n\npublic class NVSalesTaxCalcService implements Function\u003CFloat, Double\u003E {\n    public Double apply(Float taxableAmt) {\n       return taxableAmt * 0.08125;   \u002F\u002F 8.125% sales tax\n    }\n}\n \n",[4926,65261,65259],{"__ignoreMap":18},[48,65263,65264],{},"You can use Pulsar’s native function capabilities for many use cases, including simple calculations, filtering, and single message transformation. However, if you need to develop complex functions or access Pulsar’s metadata, you can use the Pulsar client library to integrate with a variety of systems, such as the machine learning library TensorFlow.",[40,65266,65268],{"id":65267},"benefits-of-pulsar-functions","Benefits of Pulsar Functions",[48,65270,65271],{},"Now that we know how Pulsar Functions work, we can look at the key benefits of using Pulsar Functions to build event-driven microservices.",[32,65273,65275],{"id":65274},"_1-flexible-deployment-options","1. Flexible Deployment Options",[48,65277,65278],{},"Microservices written as Pulsar Functions can be deployed individually as threads, processes, or Kubernetes Pods inside Pulsar. The functions are not tied to each other at compile, run, or deployment time and there is no need for another processing framework, deployment system, FaaS, or specialized server. Since the microservices are isolated, you can scale each independently to support any workload.",[32,65280,65282],{"id":65281},"_2-function-mesh-for-complex-operations","2. Function Mesh for Complex Operations",[48,65284,65285,65286,65289,65290,190],{},"For more advanced deployment, you can use a tool called ",[55,65287,29463],{"href":29461,"rel":65288},[264]," to deploy multiple Pulsar Functions as a single unit. Function Mesh allows developers to utilize the full power of the Kubernetes scheduler with Pulsar Functions, including deployment, scaling, and management. To learn more about Function Mesh, you can read this ",[55,65291,39553],{"href":44957},[32,65293,65295],{"id":65294},"_3-maintainability-and-testability","3. Maintainability and Testability",[48,65297,65298],{},"Pulsar Functions are highly maintainable and testable, which are desirable qualities for microservices. They are small pieces of code written in popular languages, such as Java, Python, or Go. They can be easily maintained in source control repositories and automatically tested with existing frameworks.",[40,65300,2125],{"id":2122},[48,65302,65303],{},"In this blog, we explained why you should leverage Apache Pulsar to build event-driven microservices. Pulsar enables teams of all sizes to build scalable, maintainable, testable microservices that support flexible deployment. 
Its simple API allows developers to focus on the business logic without the need for deep streaming knowledge.",[48,65305,65306],{},"In the next blog in this series, we will show you:",[321,65308,65309,65312,65315],{},[324,65310,65311],{},"Example code of building microservices with Pulsar Functions.",[324,65313,65314],{},"How to design schemas using the Avro interface definition language and use the schemas in Pulsar Functions-based microservices.",[324,65316,65317],{},"How to deploy microservices to a Pulsar cluster.",[40,65319,58598],{"id":58597},[1666,65321,65322,65329,65336],{},[324,65323,51819,65324,1154,65327,51825],{},[55,65325,36487],{"href":36485,"rel":65326},[264],[55,65328,36491],{"href":36490},[324,65330,51828,65331,65335],{},[55,65332,3550],{"href":65333,"rel":65334},"https:\u002F\u002Fauth.streamnative.cloud\u002Flogin?state=hKFo2SBKd0F4ZlJMUEI0MWZlbEF5ajQyVHRfS09zNkZHV0FXbqFupWxvZ2luo3RpZNkgUDVFT1lvWEJNYWNYNVpNZzJrT0xmV3plNU14RUtkM2ajY2lk2SA2ZXI3M3FLcTQycUIwd2JzcjFTT01hWWJhdTdLaGxldw&client=6er73qKq42qB0wbsr1SOMaYbau7Khlew&protocol=oauth2&audience=https%3A%2F%2Fapi.streamnative.cloud&redirect_uri=https%3A%2F%2Fconsole.streamnative.cloud%2Fcallback&defaultMethod=signup&scope=openid%20profile%20email%20offline_access&response_type=code&response_mode=query&nonce=TDA1T2NjQjI0TEFoS0djS0dUQUouNWdQc2N%2BQ2tzdGUxUlp3MGJxMjAxbA%3D%3D&code_challenge=iI2Fb9kr7DdndPns60IFW5ewA-qjck1lw62AAI4sETc&code_challenge_method=S256&auth0Client=eyJuYW1lIjoiYXV0aDAtc3BhLWpzIiwidmVyc2lvbiI6IjEuMTMuNCJ9",[264]," today. StreamNative Cloud is the simple, fast, and cost-effective way to run Pulsar in the public cloud. You can spin up a Pulsar cluster in minutes.",[324,65337,58619],{},[321,65339,65340,65348,65355],{},[324,65341,65342,46714,65344,51839,65346,190],{},[2628,65343,46713],{},[55,65345,267],{"href":51838},[55,65347,267],{"href":51838},[324,65349,65350,758,65352],{},[2628,65351,42753],{},[55,65353,51850],{"href":58635,"rel":65354},[264],[324,65356,65357,758,65359],{},[2628,65358,40436],{},[55,65360,51857],{"href":44957},{"title":18,"searchDepth":19,"depth":19,"links":65362},[65363,65364,65365,65370,65371],{"id":65186,"depth":19,"text":65187},{"id":65224,"depth":19,"text":65225},{"id":65267,"depth":19,"text":65268,"children":65366},[65367,65368,65369],{"id":65274,"depth":279,"text":65275},{"id":65281,"depth":279,"text":65282},{"id":65294,"depth":279,"text":65295},{"id":2122,"depth":19,"text":2125},{"id":58597,"depth":19,"text":58598},"2021-12-14","Learn about how to build scalable, maintainable, testable event driven microservices that support flexible deployment using Pulsar Functions.",{},"\u002Fblog\u002Fdeveloping-event-driven-microservices-apache-pulsar-part-i",{"title":65170,"description":65373},"blog\u002Fdeveloping-event-driven-microservices-apache-pulsar-part-i",[7347,821,9636,8058],"T90sAZZfUvX-2vRV70qv0noENcE4_-JS3YamVZOMLWo",{"id":65381,"title":65382,"authors":65383,"body":65384,"category":821,"createdAt":290,"date":65372,"description":65746,"extension":8,"featured":294,"image":65747,"isDraft":294,"link":290,"meta":65748,"navigation":7,"order":296,"path":65749,"readingTime":33204,"relatedResources":290,"seo":65750,"stem":65751,"tags":65752,"__hash__":65753},"blogs\u002Fblog\u002Fwhats-new-in-apache-pulsar-2-7-4.md","What’s New in Apache Pulsar 
2.7.4",[61300],{"type":15,"value":65385,"toc":65718},[65386,65389,65391,65422,65430,65432,65441,65458,65464,65476,65485,65497,65503,65515,65524,65536,65545,65557,65566,65578,65587,65599,65608,65620,65629,65659,65668,65680,65682,65688,65694,65705,65707],[48,65387,65388],{},"The Apache Pulsar community releases version 2.7.4! 32 contributors provided improvements and bug fixes that delivered 98 commits.",[48,65390,61308],{},[321,65392,65393,65406,65414],{},[324,65394,65395,65396,5157,65401],{},"Upgrade Log4j to 2.17.0 - ",[55,65397,65400],{"href":65398,"rel":65399},"https:\u002F\u002Fpulsar.apache.org\u002Fblog\u002F2021\u002F12\u002F11\u002FLog4j-CVE\u002F",[264],"CVE-2021-45105",[55,65402,65405],{"href":65403,"rel":65404},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F13392",[264],"PR-13392",[324,65407,65408,65409],{},"ManagedLedger can be referenced correctly when OpAddEntry is recycled. ",[55,65410,65413],{"href":65411,"rel":65412},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F12103",[264],"PR-12103",[324,65415,65416,65417],{},"NPE does not occur on OpAddEntry while ManagedLedger is closing. ",[55,65418,65421],{"href":65419,"rel":65420},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F12364",[264],"PR-12364",[48,65423,65424,65425,190],{},"This blog walks through the most noteworthy changes grouped by the affected functionalities. For the complete list including all enhancements and bug fixes, check out the ",[55,65426,65429],{"href":65427,"rel":65428},"https:\u002F\u002Fpulsar.apache.org\u002Fen\u002Frelease-notes\u002F#274",[264],"Pulsar 2.7.4 Release Notes",[40,65431,61003],{"id":61002},[32,65433,65395,65435,5157,65438],{"id":65434},"upgrade-log4j-to-2170-cve-2021-45105-pr-13392",[55,65436,65400],{"href":65398,"rel":65437},[264],[55,65439,65405],{"href":65403,"rel":65440},[264],[321,65442,65443,65445,65453,65455],{},[324,65444,57576],{},[324,65446,65447,65448,190],{},"A serious vulnerability was reported regarding Log4j that can allow remote execution for attackers. The vulnerability issue is described and tracked under ",[55,65449,65452],{"href":65450,"rel":65451},"https:\u002F\u002Fnvd.nist.gov\u002Fvuln\u002Fdetail\u002FCVE-2021-44228",[264],"CVE-2021-44228",[324,65454,57583],{},[324,65456,65457],{},"Pulsar 2.7.4 upgraded Log4j to 2.17.0.",[32,65459,65408,65461],{"id":65460},"managedledger-can-be-referenced-correctly-when-opaddentry-is-recycled-pr-12103",[55,65462,65413],{"href":65411,"rel":65463},[264],[321,65465,65466,65468,65471,65473],{},[324,65467,57576],{},[324,65469,65470],{},"Previously, after a write failure, a task was scheduled in the background to force close the ledger and trigger the creation of a new ledger. If the OpAddEntry instance was already recycled, that could lead to either an NPE or undefined behavior.",[324,65472,57583],{},[324,65474,65475],{},"The ManagedLedgerImpl object reference is copied to a final variable so the background task will not be dependent on the lifecycle of the OpAddEntry instance.",[32,65477,65479,65480],{"id":65478},"no-potential-race-condition-in-the-blobstorebackedreadhandler-pr-12123","No potential race condition in the BlobStoreBackedReadHandler. ",[55,65481,65484],{"href":65482,"rel":65483},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F12123",[264],"PR-12123",[321,65486,65487,65489,65492,65494],{},[324,65488,57576],{},[324,65490,65491],{},"Previously, BlobStoreBackedReadHandler entered an infinite loop when reading an offload ledger. 
There was a race condition between the operation of reading entries and closing BlobStoreBackedReadHandler.",[324,65493,57583],{},[324,65495,65496],{},"Added a state check before reading entries and made the BlobStoreBackedReadHandler exit loop when the entryID is bigger than the lastEntryID.",[32,65498,65416,65500],{"id":65499},"npe-does-not-occur-on-opaddentry-while-managedledger-is-closing-pr-12364",[55,65501,65421],{"href":65419,"rel":65502},[264],[321,65504,65505,65507,65510,65512],{},[324,65506,57576],{},[324,65508,65509],{},"Previously, the test ManagedLedgerBkTest#managedLedgerClosed closed ManagedLedger object on some asyncAddEntry operations and failed with NPE.",[324,65511,57583],{},[324,65513,65514],{},"Closed OpAddEntry when ManagedLedger signaled OpAddEntry to fail. In this way, the OpAddEntry object was correctly recycled and the failed callback was correctly triggered.",[32,65516,65518,65519],{"id":65517},"set-a-topic-policy-through-the-topic-name-of-a-partition-correctly-pr-11294","Set a topic policy through the topic name of a partition correctly. ",[55,65520,65523],{"href":65521,"rel":65522},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F11294",[264],"PR-11294",[321,65525,65526,65528,65531,65533],{},[324,65527,57576],{},[324,65529,65530],{},"Previously, the topic name of a partition could not be used to set a topic policy.",[324,65532,57583],{},[324,65534,65535],{},"Allowed setting a topic policy through a topic name of a partition by converting the topic name of a partition in SystemTopicBasedTopicPoliciesService.",[32,65537,65539,65540],{"id":65538},"dispatch-rate-limiter-takes-effect-for-consumers-pr-8611","Dispatch rate limiter takes effect for consumers. ",[55,65541,65544],{"href":65542,"rel":65543},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F8611",[264],"PR-8611",[321,65546,65547,65549,65552,65554],{},[324,65548,57576],{},[324,65550,65551],{},"Previously, dispatch rate limiter did not take effect in cases where all consumers started reading in the next second since acquiredPermits was reset to 0 every second.",[324,65553,57583],{},[324,65555,65556],{},"Changed the behaviour of DispatchRateLimiter by minus permits every second instead of reset acquiredPermits to 0. Consumers stopped reading entries temporarily until acquiredPermits returned to a value less than permits .",[32,65558,65560,65561],{"id":65559},"npe-does-not-occur-when-executing-unload-bundles-operations-pr-11310","NPE does not occur when executing unload bundles operations. ",[55,65562,65565],{"href":65563,"rel":65564},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F11310",[264],"PR-11310",[321,65567,65568,65570,65573,65575],{},[324,65569,57576],{},[324,65571,65572],{},"When performing pressure tests on persistent partitioned topics, NPE occurred when executing unload bundles operations. Concurrently, producers did not write messages.",[324,65574,57583],{},[324,65576,65577],{},"Added more safety checks to fix this issue.",[32,65579,65581,65582],{"id":65580},"fix-inconsistent-behavior-for-namespace-bundles-cache-pr-11346","Fix inconsistent behavior for Namespace bundles cache. 
",[55,65583,65586],{"href":65584,"rel":65585},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F11346",[264],"PR-11346",[321,65588,65589,65591,65594,65596],{},[324,65590,57576],{},[324,65592,65593],{},"Previously, namespace bundle cache was not invalidated after a namespace was deleted.",[324,65595,57583],{},[324,65597,65598],{},"Invalidated namespace policy cache when bundle cache was invalidated.",[32,65600,65602,65603],{"id":65601},"close-the-replicator-and-replication-client-after-a-cluster-is-deleted-pr-11342","Close the replicator and replication client after a cluster is deleted. ",[55,65604,65607],{"href":65605,"rel":65606},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F11342",[264],"PR-11342",[321,65609,65610,65612,65615,65617],{},[324,65611,57576],{},[324,65613,65614],{},"Previously, the replicator and the replication client were not closed after a cluster was deleted. The producer of the replicator would then try to reconnect to the deleted cluster continuously.",[324,65616,57583],{},[324,65618,65619],{},"Closed the relative replicator and replication client.",[32,65621,65623,65624],{"id":65622},"publish-rate-limiter-takes-effect-as-expected-pr-10384","Publish rate limiter takes effect as expected. ",[55,65625,65628],{"href":65626,"rel":65627},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F10384",[264],"PR-10384",[321,65630,65631,65633,65636,65639,65642,65645,65648,65650,65653,65656],{},[324,65632,57576],{},[324,65634,65635],{},"Previously, there were various issues if preciseTopicPublishRateLimiterEnable was set to true for rate limiting:",[324,65637,65638],{},"Updating the limits did not set a boundary when changing the limits from a bounded limit to an unbounded limit.",[324,65640,65641],{},"Each topic created a scheduler thread for each limiter instance.",[324,65643,65644],{},"Topics did not release the scheduler thread when the topic was unloaded or the operation closed.",[324,65646,65647],{},"Updating the limits did not close the scheduler thread related to the replaced limiter instance",[324,65649,57583],{},[324,65651,65652],{},"Cleaned up the previous limiter instances before creating new limiter instances.",[324,65654,65655],{},"Used brokerService.pulsar().getExecutor() as the scheduler for the rate limiter instances.",[324,65657,65658],{},"Added resource cleanup hooks for topic closing (unload).",[32,65660,65662,65663],{"id":65661},"clean-up-newly-created-ledgers-if-fails-to-update-znode-list-pr-12015","Clean up newly created ledgers if fails to update ZNode list. ",[55,65664,65667],{"href":65665,"rel":65666},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F12015",[264],"PR-12015",[321,65669,65670,65672,65675,65677],{},[324,65671,57576],{},[324,65673,65674],{},"When updating a ZNode list, ZooKeeper threw an exception and did not clean up the created ledger. Newly created ledgers were not indexed to a topic managedLedger list and could not be cleared up as topic retention. 
Also, ZNode numbers increased in ZooKeeper if the ZNode version mismatch exception was thrown out.",[324,65676,57583],{},[324,65678,65679],{},"Deleted the created ledger from broker cache and BookKeeper regardless of exception type when the ZNode list failed to update.",[40,65681,13565],{"id":1727},[48,65683,65684,65685,57738],{},"If you are interested in learning more about Pulsar 2.7.4, you can ",[55,65686,36195],{"href":58799,"rel":65687},[264],[48,65689,65690,65691,57746],{},"Pulsar Summit Asia 2021 will take place on January 15-16, 2022. ",[55,65692,57745],{"href":35357,"rel":65693},[264],[48,65695,57749,65696,57753,65699,57757,65702,20076],{},[55,65697,40821],{"href":23526,"rel":65698},[264],[55,65700,36238],{"href":36236,"rel":65701},[264],[55,65703,57762],{"href":57760,"rel":65704},[264],[40,65706,39647],{"id":39646},[48,65708,57767,65709,57772,65713,57775,65716,57779],{},[55,65710,57771],{"href":65711,"rel":65712},"http:\u002F\u002Fhttps:\u002Fpulsar.apache.org\u002Fen\u002Fdownload",[264],[55,65714,3550],{"href":61568,"rel":65715},[264],[55,65717,24379],{"href":57778},{"title":18,"searchDepth":19,"depth":19,"links":65719},[65720,65744,65745],{"id":61002,"depth":19,"text":61003,"children":65721},[65722,65724,65726,65728,65730,65732,65734,65736,65738,65740,65742],{"id":65434,"depth":279,"text":65723},"Upgrade Log4j to 2.17.0 - CVE-2021-45105. PR-13392",{"id":65460,"depth":279,"text":65725},"ManagedLedger can be referenced correctly when OpAddEntry is recycled. PR-12103",{"id":65478,"depth":279,"text":65727},"No potential race condition in the BlobStoreBackedReadHandler. PR-12123",{"id":65499,"depth":279,"text":65729},"NPE does not occur on OpAddEntry while ManagedLedger is closing. PR-12364",{"id":65517,"depth":279,"text":65731},"Set a topic policy through the topic name of a partition correctly. PR-11294",{"id":65538,"depth":279,"text":65733},"Dispatch rate limiter takes effect for consumers. PR-8611",{"id":65559,"depth":279,"text":65735},"NPE does not occur when executing unload bundles operations. PR-11310",{"id":65580,"depth":279,"text":65737},"Fix inconsistent behavior for Namespace bundles cache. PR-11346",{"id":65601,"depth":279,"text":65739},"Close the replicator and replication client after a cluster is deleted. PR-11342",{"id":65622,"depth":279,"text":65741},"Publish rate limiter takes effect as expected. PR-10384",{"id":65661,"depth":279,"text":65743},"Clean up newly created ledgers if fails to update ZNode list. PR-12015",{"id":1727,"depth":19,"text":13565},{"id":39646,"depth":19,"text":39647},"We are excited to see the Apache Pulsar community has successfully released the 2.7.4 version! 32 contributors provided improvements and bug fixes that delivered 98 commits. 
Let's walk through the most noteworthy changes!","\u002Fimgs\u002Fblogs\u002F63c209ae2853d3aa8a9e8df4_Pulsar-release-blog-274.jpg",{},"\u002Fblog\u002Fwhats-new-in-apache-pulsar-2-7-4",{"title":65382,"description":65746},"blog\u002Fwhats-new-in-apache-pulsar-2-7-4",[302,821],"-sHpqxN1z1pA54qg2qbZgAwlChv5KWjwXk5u02atgEQ",{"id":65755,"title":65756,"authors":65757,"body":65758,"category":821,"createdAt":290,"date":65940,"description":65941,"extension":8,"featured":294,"image":65942,"isDraft":294,"link":290,"meta":65943,"navigation":7,"order":296,"path":65944,"readingTime":31039,"relatedResources":290,"seo":65945,"stem":65946,"tags":65947,"__hash__":65948},"blogs\u002Fblog\u002Finformation-regarding-log4j-security-vulnerabilities.md","Information Regarding Log4j Security Vulnerabilities",[60441,807],{"type":15,"value":65759,"toc":65926},[65760,65765,65768,65780,65784,65787,65790,65793,65799,65803,65806,65812,65816,65819,65822,65826,65833,65837,65840,65843,65849,65852,65858,65861,65865,65872,65875,65879,65882,65893,65897,65901,65904,65907,65910,65914,65917,65921,65924],[916,65761,65762],{},[48,65763,65764],{},"Updated on Dec. 21st to add details of how the vulnerability impacts other Pulsar ecosystem tools.",[48,65766,65767],{},"We wanted to provide an update on the Log4Shell critical vulnerability (known as log4j). Below we provide a status of the vulnerability in the open-source Apache Pulsar and in the StreamNative products, as well as any actions that need to be taken to mitigate the vulnerability.",[48,65769,65770,65771,65775,65776,190],{},"If there are any additional questions, please open a support ticket at ",[55,65772,65774],{"href":16162,"rel":65773},[264],"https:\u002F\u002Fsupport.streamnative.io"," or email ",[55,65777,65779],{"href":65778},"mailto:support@streamnative.io","support@streamnative.io",[40,65781,65783],{"id":65782},"apache-pulsar-open-source-update","Apache Pulsar Open Source Update",[48,65785,65786],{},"A total of three CVEs impacting log4j have been discovered. Of the three vulnerabilities, only two impact Apache Pulsar by default and the Pulsar community has been working to patch all three CVEs.",[48,65788,65789],{},"As of the publish date, Pulsar releases 2.9.1 and 2.8.2 have updated log4j versions and mitigate all known vulnerabilities. Additional releases for Pulsar 2.7 and 2.6 are in progress.",[48,65791,65792],{},"The following table provides a summary of the impact and mitigation options for these vulnerabilities.",[48,65794,65795],{},[384,65796],{"alt":65797,"src":65798},"table of Apache Pulsar Open Source Update","\u002Fimgs\u002Fblogs\u002F63b3e5bec87cd751a4d6008f_table-Apache-Pulsar-Open-Source-Update.webp",[40,65800,65802],{"id":65801},"streamnative-products-update","StreamNative Products Update",[48,65804,65805],{},"StreamNative has worked to mitigate and patch these issues quickly for all customers. 
The following table provides details on the current status and recommended actions.",[48,65807,65808],{},[384,65809],{"alt":65810,"src":65811},"table of StreamNative Products Update","\u002Fimgs\u002Fblogs\u002F63b3e624d56e7e5466259588_table-StreamNative-Products-Update.webp",[40,65813,65815],{"id":65814},"non-standard-log4j-pulsar-functions-and-process-runtime","Non-standard log4j \u002F Pulsar functions and process runtime",[48,65817,65818],{},"As seen above, log4j, even when configured with the no-message-lookup (formatMsgNoLookups) mitigation, can be exploited if the user has configured custom log4j template strings or is using Pulsar functions with the process runtime.",[48,65820,65821],{},"In summary, if you have configured your log4j template strings to contain references to a context object ($${ctx:}, %x, %mdc, etc.) or are using Pulsar functions with the process runtime, then you are potentially at risk. We encourage you to remove the custom settings or, if using Pulsar functions with the process runtime, to upgrade to 2.8.2 or 2.9.1.",[40,65823,65825],{"id":65824},"mitigating-pulsar-functions","Mitigating Pulsar Functions",[48,65827,65828,65829],{},"Pulsar functions written in Java will need to be re-deployed in order to pick up the updated values. StreamNative Cloud Managed customers will be contacted directly in the coming weeks about any functions that are not mitigated. This should be as simple as using the pulsar-admin functions update command. For an example, please see ",[55,65830,65831],{"href":65831,"rel":65832},"https:\u002F\u002Fpulsar.apache.org\u002Fdocs\u002Fen\u002Ffunctions-deploying\u002F#updating-cluster-mode-functions",[264],[40,65834,65836],{"id":65835},"how-to-mitigate-using-streamnative-platform","How to mitigate using StreamNative Platform",[48,65838,65839],{},"If you are using StreamNative Platform, you can edit your values file accordingly to deploy the formatMsgNoLookups mitigation. We currently do not recommend updating your chart version, as the latest release contains other unrelated changes.",[48,65841,65842],{},"Shown below are the new values that need to be added or changed in your values file. NOTE: you may have previously seen a mitigation that adds -Dlog4j2.formatMsgNoLookups=true to PULSAR_EXTRA_OPTS. 
These two mitigations are equivalent and either is sufficient.",[8325,65844,65847],{"className":65845,"code":65846,"language":8330},[8328],"\nbookkeeper:\n  configData:\n    LOG4J_FORMAT_MSG_NO_LOOKUPS: \"true\"\nbroker:\n  configData:\n    LOG4J_FORMAT_MSG_NO_LOOKUPS: \"true\"\nproxy:\n  configData:\n    LOG4J_FORMAT_MSG_NO_LOOKUPS: \"true\"\nstreamnative_console:\n  configData:\n    LOG4J_FORMAT_MSG_NO_LOOKUPS: \"true\"\nzookeeper:\n  configData:\n    LOG4J_FORMAT_MSG_NO_LOOKUPS: \"true\"\n\n",[4926,65848,65846],{"__ignoreMap":18},[48,65850,65851],{},"Upgrading the image versions can be done as follows:",[8325,65853,65856],{"className":65854,"code":65855,"language":8330},[8328],"\nimages:\n  autorecovery:\n    repository: streamnative\u002Fsn-platform\n    tag: 2.8.1.30\n  bookie:\n    repository: streamnative\u002Fsn-platform\n    tag: 2.8.1.30\n  broker:\n    repository: streamnative\u002Fsn-platform\n    tag: 2.8.1.30\n  functions:\n    repository: streamnative\u002Fsn-platform\n    tag: 2.8.1.30\n  presto:\n    repository: streamnative\u002Fsn-platform\n    tag: 2.8.1.30\n  proxy:\n    repository: streamnative\u002Fsn-platform\n    tag: 2.8.1.30\n  zookeeper:\n    repository: streamnative\u002Fsn-platform\n    tag: 2.8.1.30\n  streamnative_console:\n    repository: streamnative\u002Fsn-platform-console\n    tag: \"1.10-rc2\"\n\n",[4926,65857,65855],{"__ignoreMap":18},[48,65859,65860],{},"Both mitigations can be deployed simultaneously, and doing so is in fact recommended.",[40,65862,65864],{"id":65863},"how-to-mitigate-via-open-source-pulsar","How to mitigate via open-source Pulsar",[48,65866,65867,65868,65871],{},"If you are using open-source Pulsar, please see the blog post ",[55,65869,267],{"href":65398,"rel":65870},[264]," for instructions on mitigation. We still encourage you to upgrade to 2.8.2 or 2.9.1 when possible.",[48,65873,65874],{},"One additional note: users of either the open-source or StreamNative Helm charts do not need to wait for images to be updated in the chart; new versions can be specified in the same way as the platform chart shown above.",[40,65876,65878],{"id":65877},"ecosystem-tools","Ecosystem Tools",[48,65880,65881],{},"There have also been questions regarding other tools in the Pulsar ecosystem.",[321,65883,65884,65887,65890],{},[324,65885,65886],{},"Pulsar Manager - Pulsar Manager is not directly affected by the log4j vulnerability. Pulsar Manager is built using the Spring framework, which uses Logback for logging. Logback is also impacted, but unlike the log4j exploit, it requires access to directly edit the Logback configuration file, which drastically lowers the severity of the issue. We will release a new version once Spring releases a new version with the fix.",[324,65888,65889],{},"Pulsar Spark\u002FFlink Connector - Both the Apache Flink and Apache Spark connectors for Pulsar are not directly impacted. Neither connector directly includes log4j; instead, each uses the log4j included in your Flink or Spark distribution. Upgrading your Flink or Spark deployments will mitigate any issues.",[324,65891,65892],{},"Pulsar IO Connectors - Connectors within Pulsar IO also (by default) do not include their own log4j, and instead rely on the log4j provided by the Pulsar IO framework. Following the instructions above for functions and redeploying connectors will mitigate the issue. This is true for all connectors included in Pulsar as well as StreamNative-supported connectors. 
Custom connectors may need to be validated independently.",[40,65894,65896],{"id":65895},"faq","FAQ",[32,65898,65900],{"id":65899},"how-serious-is-this-issue","How serious is this issue?",[48,65902,65903],{},"The underlying issue is potentially very serious, with a Remote Code Execution (RCE) vulnerability allowing for arbitrary code to severely expose a system to additional damage by an attacker, potentially gaining complete system access.",[48,65905,65906],{},"At this time, there are no known RCE exploits against Apache Pulsar. However, we recommend still treating this issue as critical, as the mechanism for executing code is complex and many potential vectors exist. We do know that attacks which could expose environment variables or other environment data is possible.",[48,65908,65909],{},"We strongly recommend taking action ASAP.",[32,65911,65913],{"id":65912},"why-has-it-taken-so-long-to-get-releases-in-open-source","Why has it taken so long to get releases in open-source?",[48,65915,65916],{},"Releasing open-source Pulsar involves following a well-defined process in the community. As additional exploits were found over the past week, the community \u002F Pulsar PMC decided to restart the process multiple times in order to select the latest version of log4j which mitigates all open issues.",[32,65918,65920],{"id":65919},"was-there-any-known-impact-to-streamnative-customers-and-their-data","Was there any known impact to StreamNative customers and their data?",[48,65922,65923],{},"At this time, we are not aware of any usage of this exploit that resulted in loss of secrets or customer data. There have been some automated reports via cloud vendors of attackers attempting to leverage the exploits, but no successful attempts have been found so far. For customers where we received warning reports, we are in the process of rotating the credentials.",[48,65925,3931],{},{"title":18,"searchDepth":19,"depth":19,"links":65927},[65928,65929,65930,65931,65932,65933,65934,65935],{"id":65782,"depth":19,"text":65783},{"id":65801,"depth":19,"text":65802},{"id":65814,"depth":19,"text":65815},{"id":65824,"depth":19,"text":65825},{"id":65835,"depth":19,"text":65836},{"id":65863,"depth":19,"text":65864},{"id":65877,"depth":19,"text":65878},{"id":65895,"depth":19,"text":65896,"children":65936},[65937,65938,65939],{"id":65899,"depth":279,"text":65900},{"id":65912,"depth":279,"text":65913},{"id":65919,"depth":279,"text":65920},"2021-12-10","An update on log4j security vulernabilties","\u002Fimgs\u002Fblogs\u002F63c7fb440cb4c4b358b7d22f_63b3e53550e00da2c55ae24c_log4shell_logo.png",{},"\u002Fblog\u002Finformation-regarding-log4j-security-vulnerabilities",{"title":65756,"description":65941},"blog\u002Finformation-regarding-log4j-security-vulnerabilities",[4301,821],"fBldce9AJDJA0X8UV6W8VGKBKzF2t9l5jXAlhF6zeY0",{"id":65950,"title":65951,"authors":65952,"body":65953,"category":7338,"createdAt":290,"date":66139,"description":66140,"extension":8,"featured":294,"image":66141,"isDraft":294,"link":290,"meta":66142,"navigation":7,"order":296,"path":66143,"readingTime":4475,"relatedResources":290,"seo":66144,"stem":66145,"tags":66146,"__hash__":66147},"blogs\u002Fblog\u002Fpulsar-hits-10-000-github-stars-milestone.md","Pulsar Hits 10,000 GitHub Stars Milestone",[44843],{"type":15,"value":65954,"toc":66128},[65955,65958,65961,65965,65968,65974,65977,65983,65991,65997,66001,66004,66008,66029,66033,66039,66043,66056,66060,66070,66072,66075,66086,66089,66092],[48,65956,65957],{},"Apache Pulsar hits 10,000 GitHub stars. 
Developers use GitHub stars to show their support for projects and to bookmark projects they want to follow. It is an important measurement to track the engagement of an open source project. The Pulsar community would like to thank every stargazer for joining us in the journey. More importantly, thank you to every Pulsar user, contributor, and committer for making this happen!",[48,65959,65960],{},"Pulsar was initially developed as a cloud-native distributed messaging system at Yahoo! in 2012. Over the past decade, Pulsar has evolved into a unified messaging and streaming platform for event-driven enterprises at scale. In this blog, we look at Pulsar’s community growth, project updates, ecosystem developments, and what’s next for the project.",[40,65962,65964],{"id":65963},"community-growth","Community Growth",[48,65966,65967],{},"Pulsar’s success depends on its community. As shown in the chart below, the growth of Pulsar’s GitHub stars accelerated after Pulsar became a top-level Apache Software Foundation project.",[48,65969,65970],{},[384,65971],{"alt":65972,"src":65973},"graph apache pulsar github starts","\u002Fimgs\u002Fblogs\u002F63b3599de697749ddc814c01_star.png",[48,65975,65976],{},"The number of contributors over time is another metric for measuring community engagement. The chart below shows that the number of Pulsar contributors accelerated when Pulsar became a top-level Apache project and the growth rate has continued into 2021.",[48,65978,65979],{},[384,65980],{"alt":65981,"src":65982},"apache pulsar contributors","\u002Fimgs\u002Fblogs\u002F63b3599d0cd1fdc445d16b75_contri.png",[48,65984,65985,65990],{},[55,65986,65989],{"href":65987,"rel":65988},"https:\u002F\u002Fwww.apiseven.com\u002Fen\u002Fcontributor-graph?chart=contributorMonthlyActivity&repo=apache\u002Fpulsar,apache\u002Fkafka",[264],"Pulsar surpassed Kafka in the number of monthly active contributors in early 2021",". This shows that the development and engagement of Pulsar has grown rapidly over the past few years.",[48,65992,65993],{},[384,65994],{"alt":65995,"src":65996},"active pulsar kafka contriutors","\u002Fimgs\u002Fblogs\u002F63b3599e80a8307b85d06746_monthly.png",[40,65998,66000],{"id":65999},"project-updates-and-ecosystem-development","Project Updates and Ecosystem Development",[48,66002,66003],{},"Recent Pulsar project updates have brought new capabilities to the project. Below we look at key releases and launches.",[32,66005,66007],{"id":66006},"_1-pulsar-28-unified-messaging-and-streaming-with-transactions","1. Pulsar 2.8 - Unified Messaging and Streaming with Transactions",[48,66009,66010,66014,66015,66019,66020,66025,66026,66028],{},[55,66011,66013],{"href":66012},"\u002Fblog\u002Frelease\u002F2021-06-15-apache-pulsar-launches-2-8-unified-messaging-and-streaming-with-transactions\u002F","Pulsar 2.8"," (released in June 2021) introduced many major updates. The ",[55,66016,66018],{"href":66017},"\u002Fen\u002Fblog\u002Frelease\u002F2021-06-14-exactly-once-semantics-with-transactions-in-pulsar\u002F","Pulsar Transaction API"," was added to support atomicity across multiple topics and enable end-to-end exactly-once message delivery guarantee for streaming jobs. ",[55,66021,66024],{"href":66022,"rel":66023},"https:\u002F\u002Fpulsar-summit.org\u002Fevent\u002Fnorth-america-2021\u002Fsessions\u002Freplicated-subscriptions-taking-geo-replication-to-the-next-level",[264],"Replicated subscriptions"," was added to enhance Pulsar’s geo-replication. 
With this feature, a consumer can restart consuming from the failure point in a different cluster in case of failover. Read ",[55,66027,48586],{"href":66012}," to learn more about 2.8.",[32,66030,66032],{"id":66031},"_2-function-mesh-simplifying-complex-streaming-jobs-in-the-cloud","2. Function Mesh - Simplifying Complex Streaming Jobs in the Cloud",[48,66034,66035,66038],{},[55,66036,29463],{"href":66037},"\u002Fen\u002Fblog\u002Frelease\u002F2021-05-03-function-mesh-open-source\u002F"," is an ideal tool for those who are seeking cloud-native serverless streaming solutions. It is a Kubernetes operator that enables users to run Pulsar Functions and connectors natively on Kubernetes, unlocking the full power of Kubernetes’ application deployment, scaling, and management. Function Mesh is also a serverless framework used to orchestrate multiple Pulsar Functions and I\u002FO connectors for complex streaming jobs in a simple way.",[32,66040,66042],{"id":66041},"_3-pulsar-connectors-aws-sqs-connector-cloud-storage-sink-connector-and-more","3. Pulsar Connectors - AWS SQS Connector, Cloud Storage Sink Connector, and More",[48,66044,66045,66046,66050,66051,66055],{},"Pulsar connectors enable easy integration between Pulsar and external systems with added benefits. For instance, the ",[55,66047,66049],{"href":66048},"\u002Fen\u002Fblog\u002Ftech\u002F2021-03-17-announcing-aws-sqs-connector-for-apache-pulsar\u002F","AWS SQS Connector"," enables secure integration between Pulsar and SQS without needing to write any code. And the ",[55,66052,66054],{"href":66053},"\u002Fen\u002Fblog\u002Ftech\u002F2020-10-20-cloud-storage-sink-connector-251\u002F","Cloud Storage Sink Connector"," can export data while guaranteeing exactly-once delivery semantics to its consumers. It gives applications that export data from Pulsar the benefits of fault tolerance, parallelism, elasticity, and much more.",[32,66057,66059],{"id":66058},"_4-protocol-handlers-kafa-on-pulsar-mqtt-on-pulsar-and-more","4. Protocol Handlers - Kafka-on-Pulsar, MQTT-on-Pulsar, and More",[48,66061,66062,66063,66066,66067,190],{},"Protocol handlers, such as Kafka-on-Pulsar (KoP) and MQTT-on-Pulsar (MoP), allow Pulsar to interact with applications built on other messaging platforms, lowering the barrier to Pulsar adoption. Learn about how KoP became production-ready with KoP 2.8 in ",[55,66064,48586],{"href":66065},"\u002Fblog\u002Fengineering\u002F2021-12-01-offset-implementation-in-kafka-on-pulsar\u002F",". 
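To make the protocol-handler idea more concrete, the following is a minimal, hypothetical sketch of an ordinary Kafka producer (standard kafka-clients library) publishing through a Pulsar broker with KoP enabled; the broker address, listener port, and topic name are placeholders and depend on how KoP is configured in your cluster:

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class KopProducerSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        // Hypothetical address: point the Kafka client at the Pulsar broker's
        // KoP listener instead of a Kafka cluster.
        props.put("bootstrap.servers", "pulsar-broker:9092");
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // The message lands in a Pulsar topic; the Kafka client code is unchanged.
            producer.send(new ProducerRecord<>("my-topic", "key", "hello from a Kafka client")).get();
        }
    }
}
```

The point of the sketch is that existing Kafka applications can keep their client code and simply point bootstrap.servers at Pulsar.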
Read about how MoP enables MQTT applications to leverage Pulsar’s infinite event stream retention with BookKeeper and tiered storage in ",[55,66068,48586],{"href":66069},"\u002Fen\u002Fblog\u002Ftech\u002F2020-09-28-announcing-mqtt-on-pulsar\u002F",[40,66071,7126],{"id":1727},[48,66073,66074],{},"In the upcoming Pulsar 2.9 release, you can expect the following updates:",[321,66076,66077,66080,66083],{},[324,66078,66079],{},"Introducing a pluggable metadata interface for ZooKeeper metadata management to improve consistency, resilience, and stability, and reduce technical debt.",[324,66081,66082],{},"Launching the Oracle Debezium Connector and the schema-aware Elasticsearch Sink Connector.",[324,66084,66085],{},"Adding the ability to run Kafka Connect sinks as Pulsar sinks.",[48,66087,66088],{},"The Pulsar community is planning more features for future Pulsar releases, including removing ZooKeeper, autoscaling topics, and simplified management of document-based policy.",[40,66090,66091],{"id":39646},"Get Involved",[321,66093,66094,66105,66110,66121],{},[324,66095,66096,66097,66101,66102,18054],{},"Pulsar 2.8.1 was released in September 2021. ",[55,66098,66100],{"href":53730,"rel":66099},[264],"Download it now"," and try it out! Read ",[55,66103,48586],{"href":66104},"\u002Fblog\u002Frelease\u002F2021-09-23-pulsar-281",[324,66106,55539,66107,190],{},[55,66108,36244],{"href":57760,"rel":66109},[264],[324,66111,66112,66115,66116,66120],{},[55,66113,64848],{"href":66114},"\u002Fdownload\u002Fmanning-ebook-apache-pulsar-in-action"," of Manning's Apache Pulsar in Action and ",[55,66117,66119],{"href":66118},"\u002Fevent\u002Fwebinar-book-launch\u002F","join the virtual book launch"," hosted by the author David Kjerrumgaard.",[324,66122,66123,66124],{},"Join the 2022 StreamNative Ambassador Program and work directly with Pulsar experts from StreamNative to co-host events, promote new project updates, and build the Pulsar user group in your city. 
Contact Us: ",[55,66125,66127],{"href":66126},"mailto:ambassador@streamnative.io","ambassador@streamnative.io",{"title":18,"searchDepth":19,"depth":19,"links":66129},[66130,66131,66137,66138],{"id":65963,"depth":19,"text":65964},{"id":65999,"depth":19,"text":66000,"children":66132},[66133,66134,66135,66136],{"id":66006,"depth":279,"text":66007},{"id":66031,"depth":279,"text":66032},{"id":66041,"depth":279,"text":66042},{"id":66058,"depth":279,"text":66059},{"id":1727,"depth":19,"text":7126},{"id":39646,"depth":19,"text":66091},"2021-12-07","In this blog we look at Pulsar’s community growth, project updates, ecosystem developments, and what’s next for the project.","\u002Fimgs\u002Fblogs\u002F63c7fb5b22d142fd8c226068_63b3599d77f0518584b40b32_1.png",{},"\u002Fblog\u002Fpulsar-hits-10-000-github-stars-milestone",{"title":65951,"description":66140},"blog\u002Fpulsar-hits-10-000-github-stars-milestone",[302,821],"44C4SnukrjWZJFluG4_vrTcEa6l3ch3siZ2kjSFYM3Q",{"id":66149,"title":66150,"authors":66151,"body":66153,"category":821,"createdAt":290,"date":66796,"description":66797,"extension":8,"featured":294,"image":66798,"isDraft":294,"link":290,"meta":66799,"navigation":7,"order":296,"path":66800,"readingTime":46114,"relatedResources":290,"seo":66801,"stem":66802,"tags":66803,"__hash__":66804},"blogs\u002Fblog\u002F10-useful-pulsarctl-commands-manage-cluster.md","10 Useful Pulsarctl Commands to Manage Your Cluster",[66152],"Aniket Bhattacharyea",{"type":15,"value":66154,"toc":66753},[66155,66161,66169,66183,66186,66195,66220,66231,66237,66241,66247,66253,66262,66286,66294,66297,66301,66307,66311,66317,66321,66327,66336,66339,66342,66346,66352,66358,66362,66368,66374,66378,66384,66390,66399,66406,66417,66420,66424,66430,66434,66440,66449,66452,66455,66461,66470,66481,66507,66510,66514,66520,66524,66530,66539,66542,66545,66548,66563,66566,66570,66576,66579,66585,66589,66595,66598,66603,66612,66615,66635,66638,66642,66648,66652,66658,66664,66673,66676,66687,66690,66693,66699,66705,66714,66717,66720,66723,66729,66731,66739],[48,66156,66157,66160],{},[55,66158,821],{"href":23526,"rel":66159},[264]," is a cloud-native, distributed messaging and streaming platform. Originally created at Yahoo! and currently maintained by the Apache Software Foundation, Pulsar provides a scalable and durable messaging platform with a multitude of features like multi-tenancy, geo-replication, and persistent storage.",[48,66162,66163,66164,66168],{},"Pulsar comes with a command-line utility ",[55,66165,38169],{"href":66166,"rel":66167},"http:\u002F\u002Fpulsar.apache.org\u002Fdocs\u002Fen\u002Fpulsar-admin\u002F",[264]," that is used to perform administrative operations on Pulsar.",[48,66170,66171,66175,66176,66178,66179,190],{},[55,66172,66174],{"href":42821,"rel":66173},[264],"Pulsarctl"," by ",[55,66177,4496],{"href":10259}," is an alternative to the traditional pulsar-admin. Pulsarctl provides numerous improvements over pulsar-admin, including a unified interface for managing partitioned and non-partitioned topics; outputs in text, JSON, and YAML; detailed documentation; and ",[55,66180,66182],{"href":66181},"\u002Fen\u002Fblog\u002Ftech\u002F2019-11-26-introduction-pulsarctl\u002F","many more",[48,66184,66185],{},"This article will go over ten pulsarctl commands that are useful for Pulsar cluster administration. 
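(A side note before diving into the commands: pulsarctl and pulsar-admin both drive Pulsar's admin REST API, so the same operations can also be scripted from code. Below is a minimal, illustrative sketch using the Java admin client; the service URL and cluster details are placeholders, and the ClusterData.builder() call assumes a 2.8+ client.)

```java
import org.apache.pulsar.client.admin.PulsarAdmin;
import org.apache.pulsar.common.policies.data.ClusterData;

public class AdminClientSketch {
    public static void main(String[] args) throws Exception {
        // Hypothetical admin (web service) URL of the Pulsar cluster.
        try (PulsarAdmin admin = PulsarAdmin.builder()
                .serviceHttpUrl("http://localhost:8080")
                .build()) {
            // Programmatic equivalent of `pulsarctl clusters create ...`.
            admin.clusters().createCluster("my-cluster",
                    ClusterData.builder()
                            .serviceUrl("http://localhost:8080")
                            .brokerServiceUrl("pulsar://localhost:6650")
                            .build());
        }
    }
}
```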
The commands here are chosen based on how frequently they are used and how vital they are in configuring, managing, and utilizing various aspects of Apache Pulsar.",[40,66187,66189,66190],{"id":66188},"_1-clusters-create","1. ",[55,66191,66194],{"href":66192,"rel":66193},"https:\u002F\u002Fdocs.streamnative.io\u002Fpulsarctl\u002Fv2.7.0.7\u002F#-em-add-em-",[264],"clusters create",[48,66196,66197,66198,66202,66203,66208,66209,66213,66214,66219],{},"A Pulsar ",[55,66199,15942],{"href":66200,"rel":66201},"https:\u002F\u002Fpulsar.apache.org\u002Fdocs\u002Fen\u002Fconcepts-architecture-overview\u002F#clusters",[264]," is one of the highest level components of a Pulsar instance. A Pulsar cluster consists of one or more ",[55,66204,66207],{"href":66205,"rel":66206},"https:\u002F\u002Fpulsar.apache.org\u002Fdocs\u002Fen\u002Fconcepts-architecture-overview\u002F#brokers",[264],"brokers",", one or more ",[55,66210,12106],{"href":66211,"rel":66212},"https:\u002F\u002Fpulsar.apache.org\u002Fdocs\u002Fen\u002Freference-terminology#bookkeeper",[264]," servers, and a ",[55,66215,66218],{"href":66216,"rel":66217},"https:\u002F\u002Fpulsar.apache.org\u002Fdocs\u002Fen\u002Freference-terminology#zookeeper",[264],"ZooKeeper"," server. A Pulsar instance can consist of one or more clusters. Since clusters sit at the heart of a Pulsar instance, it is crucial to be able to create new clusters, which can be achieved through the clusters create command. This command takes the following arguments:",[1666,66221,66222,66225,66228],{},[324,66223,66224],{},"The cluster name.",[324,66226,66227],{},"A broker service URL using the --broker-url flag.",[324,66229,66230],{},"A Pulsar cluster web service URL using the --url flag.",[48,66232,66233,66234,190],{},"You can find other flags accepted by this command in the ",[55,66235,52472],{"href":66192,"rel":66236},[264],[32,66238,66240],{"id":66239},"usage","Usage:",[8325,66242,66245],{"className":66243,"code":66244,"language":8330},[8328],"pulsarctl clusters create --url http:\u002F\u002Flocalhost:8080 --broker-url pulsar:\u002F\u002Flocalhost:6650 my-cluster\n",[4926,66246,66244],{"__ignoreMap":18},[48,66248,66249],{},[384,66250],{"alt":66251,"src":66252},"Output of clusters create command","\u002Fimgs\u002Fblogs\u002F63b3556ee69774604b7e437e_4faxj3i.png",[40,66254,66256,66257],{"id":66255},"_2-topics-create","2. ",[55,66258,66261],{"href":66259,"rel":66260},"https:\u002F\u002Fdocs.streamnative.io\u002Fpulsarctl\u002Fv2.7.0.7\u002F#-em-create-em--33",[264],"topics create",[48,66263,4221,66264,66268,66269,4003,66274,66279,66280,66285],{},[55,66265,9857],{"href":66266,"rel":66267},"https:\u002F\u002Fpulsar.apache.org\u002Fdocs\u002Fen\u002Fconcepts-messaging\u002F#topics",[264]," in Pulsar is a named channel used to transmit messages from producers to consumers. As this is a vital part of Pulsar, it’s important to know how to create a topic. Pulsar supports two types of topics: ",[55,66270,66273],{"href":66271,"rel":66272},"https:\u002F\u002Fpulsar.apache.org\u002Fdocs\u002Fen\u002Fconcepts-messaging\u002F#persistent-topics",[264],"persistent",[55,66275,66278],{"href":66276,"rel":66277},"https:\u002F\u002Fpulsar.apache.org\u002Fdocs\u002Fen\u002Fconcepts-messaging\u002F#non-persistent-topics",[264],"non-persistent",". It also supports ",[55,66281,66284],{"href":66282,"rel":66283},"https:\u002F\u002Fpulsar.apache.org\u002Fdocs\u002Fen\u002Fconcepts-messaging\u002F#partitioned-topics",[264],"partitioned topics",". You can create a topic with the topics create command. 
The command takes the following arguments:",[1666,66287,66288,66291],{},[324,66289,66290],{},"The topic name in the format {persistent|non-persistent}:\u002F\u002Ftenant\u002Fnamespace\u002Ftopic. If the type of the topic is omitted, it defaults to persistent. If the tenant and namespaces are omitted, the topic is created in the default tenant and namespace.",[324,66292,66293],{},"An integer partition number. If the partition number is 0, the resultant topic is non-partitioned.",[32,66295,66240],{"id":66296},"usage-1",[3933,66298,66300],{"id":66299},"_1-creating-a-persistent-topic","1. Creating a persistent topic",[8325,66302,66305],{"className":66303,"code":66304,"language":8330},[8328],"pulsarctl topics create persistent:\u002F\u002Fpublic\u002Fdefault\u002Fexample-topic 0\n",[4926,66306,66304],{"__ignoreMap":18},[3933,66308,66310],{"id":66309},"_2-creating-a-non-persistent-topic","2. Creating a non-persistent topic",[8325,66312,66315],{"className":66313,"code":66314,"language":8330},[8328],"pulsarctl topics create non-persistent:\u002F\u002Fpublic\u002Fdefault\u002Fexample-topic-2 0\n",[4926,66316,66314],{"__ignoreMap":18},[3933,66318,66320],{"id":66319},"_3-creating-a-partitioned-topic-in-the-default-tenant-and-namespace","3. Creating a partitioned topic in the default tenant and namespace",[8325,66322,66325],{"className":66323,"code":66324,"language":8330},[8328],"pulsarctl topics create example-topic-3 2\n",[4926,66326,66324],{"__ignoreMap":18},[40,66328,66330,66331],{"id":66329},"_3-topics-list","3. ",[55,66332,66335],{"href":66333,"rel":66334},"https:\u002F\u002Fdocs.streamnative.io\u002Fpulsarctl\u002Fv2.7.0.7\u002F#-em-list-em--36",[264],"topics list",[48,66337,66338],{},"Just like creating a new topic, listing all the existing topics is a job you need to perform frequently as a Pulsar cluster administrator. The topics list command provides a quick overview of all the topics in a specified namespace. The command takes as argument the namespace in the format tenant\u002Fnamespace. The output is shown in a table format where you get the names of the topics as well as whether they’re partitioned. You can also get the output in JSON or YAML format by using the --output=json or --output=yaml flags, respectively.",[32,66340,66240],{"id":66341},"usage-2",[3933,66343,66345],{"id":66344},"_1-list-all-the-topics-in-chosen-namespace-topic","1. List all the topics in chosen namespace \u002F topic",[8325,66347,66350],{"className":66348,"code":66349,"language":8330},[8328],"pulsarctl topics list public\u002Fdefault\n",[4926,66351,66349],{"__ignoreMap":18},[48,66353,66354],{},[384,66355],{"alt":66356,"src":66357},"Output of topics list command","\u002Fimgs\u002Fblogs\u002F63b355d9dad60e7f70164dc4_97M4FKm.png",[3933,66359,66361],{"id":66360},"_2-list-all-the-topics-for-chosen-namespace-topic-in-json-format","2. List all the topics for chosen namespace \u002F topic in JSON format",[8325,66363,66366],{"className":66364,"code":66365,"language":8330},[8328],"pulsarctl topics list public\u002Fdefault --output=json\n",[4926,66367,66365],{"__ignoreMap":18},[48,66369,66370],{},[384,66371],{"alt":66372,"src":66373},"JSON output of topics list command","\u002Fimgs\u002Fblogs\u002F63b355ef13b5d223fbd4e79e_zzAlF0l.png",[3933,66375,66377],{"id":66376},"_3-list-all-the-topics-for-chosen-namespace-topic-in-yaml-format","3. 
List all the topics for chosen namespace \u002F topic in YAML format",[8325,66379,66382],{"className":66380,"code":66381,"language":8330},[8328],"pulsarctl topics list public\u002Fdefault --output=yaml\n",[4926,66383,66381],{"__ignoreMap":18},[48,66385,66386],{},[384,66387],{"alt":66388,"src":66389},"YAML output of topics list command","\u002Fimgs\u002Fblogs\u002F63b35607b4c6554a246103a6_xbPAYMR.png",[40,66391,66393,66394],{"id":66392},"_4-subscriptions-create","4. ",[55,66395,66398],{"href":66396,"rel":66397},"https:\u002F\u002Fdocs.streamnative.io\u002Fpulsarctl\u002Fv2.7.0.7\u002F#-em-create-em--42",[264],"subscriptions create",[48,66400,4221,66401,66405],{},[55,66402,55116],{"href":66403,"rel":66404},"https:\u002F\u002Fpulsar.apache.org\u002Fdocs\u002Fen\u002Fconcepts-messaging\u002F#subscriptions",[264],", just like topics, is an integral part of a messaging system. It defines a named configuration that dictates how messages are delivered to consumers. You can create a subscription on a topic with the subscriptions create command. You can also subscribe to a topic from the latest position or specify a position manually. The command takes the following arguments:",[1666,66407,66408,66411,66414],{},[324,66409,66410],{},"The topic name. The format is the same as described in the topic create command.",[324,66412,66413],{},"The subscription name.",[324,66415,66416],{},"The --messageId flag that determines the position. The argument to this flag can be latest, earliest, or a message-id in the format ledgerId:entryId.",[32,66418,66240],{"id":66419},"usage-3",[3933,66421,66423],{"id":66422},"_1-create-a-subscription-from-the-latest-position","1. Create a subscription from the latest position",[8325,66425,66428],{"className":66426,"code":66427,"language":8330},[8328],"pulsarctl subscriptions create my-topic my-subscription\n",[4926,66429,66427],{"__ignoreMap":18},[3933,66431,66433],{"id":66432},"_2-create-a-subscription-from-a-specific-position","2. Create a subscription from a specific position",[8325,66435,66438],{"className":66436,"code":66437,"language":8330},[8328],"pulsarctl subscriptions create --messageId 656:1 my-topic my-subscription-2\n",[4926,66439,66437],{"__ignoreMap":18},[40,66441,66443,66444],{"id":66442},"_5-subscriptions-list","5. ",[55,66445,66448],{"href":66446,"rel":66447},"https:\u002F\u002Fdocs.streamnative.io\u002Fpulsarctl\u002Fv2.7.0.7\u002F#-em-list-em--44",[264],"subscriptions list",[48,66450,66451],{},"The subscriptions list command can be used to list all subscriptions of a topic. This command was included because it’s a frequent operation when dealing with subscriptions. It takes the name of the topic as argument and displays the subscriptions in a table format.",[32,66453,66240],{"id":66454},"usage-4",[8325,66456,66459],{"className":66457,"code":66458,"language":8330},[8328],"pulsarctl subscriptions list my-topic\n",[4926,66460,66458],{"__ignoreMap":18},[40,66462,66464,66465],{"id":66463},"_6-functions-create","6. ",[55,66466,66469],{"href":66467,"rel":66468},"https:\u002F\u002Fdocs.streamnative.io\u002Fpulsarctl\u002Fv2.7.0.7\u002F#-em-create-em-",[264],"functions create",[48,66471,66472,66475,66476,66480],{},[55,66473,15627],{"href":63347,"rel":66474},[264]," are lightweight compute processes that consume messages from one or more topics, apply a user-specified processing logic to each message, and publish the result to another topic. They are similar in concept to serverless cloud platforms like AWS Lambda or Google Cloud Functions. 
Pulsar Functions can be written in Java, Go, or Python and deployed to the cluster using the functions create command. This command accepts a ",[55,66477,66479],{"href":66467,"rel":66478},[264],"number of flags",", such as:",[1666,66482,66483,66486,66489,66492,66495,66498,66501,66504],{},[324,66484,66485],{},"--name - The name of the function.",[324,66487,66488],{},"--jar - Path to the JAR file of the function if it’s written in Java.",[324,66490,66491],{},"--go and --py - Path to the main Go executable binary, or the main Python file of the function if it’s written in Go or Python, respectively.",[324,66493,66494],{},"--class - The class name of the function.",[324,66496,66497],{},"--inputs - A comma-separated list of input topics.",[324,66499,66500],{},"--output - The output topic.",[324,66502,66503],{},"--tenant - The tenant of the function.",[324,66505,66506],{},"--namespace - The namespace of the function.",[32,66508,66240],{"id":66509},"usage-5",[3933,66511,66513],{"id":66512},"_1-create-a-function-in-java","1. Create a function in Java",[8325,66515,66518],{"className":66516,"code":66517,"language":8330},[8328],"pulsarctl functions create \\\n    --tenant public \\\n    --namespace default \\\n    --name my-function \\\n    --inputs my-input-topic \\\n    --output persistent:\u002F\u002Fpublic\u002Fdefault\u002Fmy-output-topic \\\n    --classname org.apache.pulsar.functions.api.examples.ExclamationFunction \\\n    --jar \u002Fexamples\u002Fapi-examples.jar\n",[4926,66519,66517],{"__ignoreMap":18},[3933,66521,66523],{"id":66522},"_2-create-a-function-in-python","2. Create a function in Python",[8325,66525,66528],{"className":66526,"code":66527,"language":8330},[8328],"pulsarctl functions create \\\n    --tenant public \\\n    --namespace default \\\n    --name my-function \\\n    --inputs my-input-topic \\\n    --output persistent:\u002F\u002Fpublic\u002Fdefault\u002Fmy-output-topic \\\n    --classname my_function.MyFunction \\\n    --py my_function.py\n",[4926,66529,66527],{"__ignoreMap":18},[40,66531,66533,66534],{"id":66532},"_7-functions-putstate","7. ",[55,66535,66538],{"href":66536,"rel":66537},"https:\u002F\u002Fdocs.streamnative.io\u002Fpulsarctl\u002Fv2.7.0.7\u002F#-em-putstate-em-",[264],"functions putstate",[48,66540,66541],{},"Pulsar integrates with Apache BookKeeper to store states of functions. These states are stored as simple key\u002Fvalue pairs that can be accessed or modified by the function or via the admin API. This can be extremely useful in cases where you need to persist state across restarts. For example, a counter that keeps track of the number of words in messages processed by a function can be stored as a state. The functions putstate command allows you to put a key\u002Fvalue pair to the state associated with a function.",[48,66543,66544],{},"To put a key\u002Fvalue pair, you need to pass the key and value in key - value format, and to put a key\u002Ffile path pair, the format is key = filepath.",[48,66546,66547],{},"The command takes the following flags:",[1666,66549,66550,66553,66555],{},[324,66551,66552],{},"--tenant and --namespace - The tenant and namespace of the function. 
If omitted, the default tenant and namespace are used.",[324,66554,66485],{},[324,66556,66557,66558,190],{},"--fqfn - The ",[55,66559,66562],{"href":66560,"rel":66561},"https:\u002F\u002Fpulsar.apache.org\u002Fdocs\u002Fen\u002Ffunctions-overview\u002F#fully-qualified-function-name-fqfn",[264],"fully qualified function name",[32,66564,66240],{"id":66565},"usage-6",[3933,66567,66569],{"id":66568},"_1-put-a-keyvalue-pair","1. Put a key\u002Fvalue pair",[8325,66571,66574],{"className":66572,"code":66573,"language":8330},[8328],"pulsarctl functions putstate --name my-function \\\n            pulsar - hello pulsar\n",[4926,66575,66573],{"__ignoreMap":18},[48,66577,66578],{},"The string hello pulsar is stored under the key pulsar.",[48,66580,66581],{},[384,66582],{"alt":66583,"src":66584},"Output of functions putstate command","\u002Fimgs\u002Fblogs\u002F63b3569498258982afa4598e_fPPtAig.png",[3933,66586,66588],{"id":66587},"_2-put-a-keyfile-path-pair","2. Put a key\u002Ffile path pair",[8325,66590,66593],{"className":66591,"code":66592,"language":8330},[8328],"pulsarctl functions putstate --name my-function \\\n        fileKey = some_file\n",[4926,66594,66592],{"__ignoreMap":18},[48,66596,66597],{},"The contents of the file some_file is stored under the key fileKey as a ByteValue.",[48,66599,66600],{},[384,66601],{"alt":66583,"src":66602},"\u002Fimgs\u002Fblogs\u002F63b356de56366c0d985fad20_jnTsagW.png",[40,66604,66606,66607],{"id":66605},"_8-functions-querystate","8. ",[55,66608,66611],{"href":66609,"rel":66610},"https:\u002F\u002Fdocs.streamnative.io\u002Fpulsarctl\u002Fv2.7.0.7\u002F#-em-querystate-em-",[264],"functions querystate",[48,66613,66614],{},"Closely related to the previous command, the functions querystate command can be used to query the state associated with a function. Just like the last command, this one is also frequently used while working with functions. The command takes the following flags:",[1666,66616,66617,66619,66621,66626,66629,66632],{},[324,66618,66485],{},[324,66620,66552],{},[324,66622,66557,66623,190],{},[55,66624,66562],{"href":66560,"rel":66625},[264],[324,66627,66628],{},"--key - The key of the state to be queried.",[324,66630,66631],{},"--watch - Whether to watch for changes in the value associated with the key.",[324,66633,66634],{},"--output - The output format (text, JSON, YAML).",[32,66636,66240],{"id":66637},"usage-7",[3933,66639,66641],{"id":66640},"_1-query-a-state","1. Query a state",[8325,66643,66646],{"className":66644,"code":66645,"language":8330},[8328],"pulsarctl functions querystate --name my-function --key pulsar\n",[4926,66647,66645],{"__ignoreMap":18},[3933,66649,66651],{"id":66650},"_2-query-a-state-and-watch-for-changes","2. Query a state and watch for changes",[8325,66653,66656],{"className":66654,"code":66655,"language":8330},[8328],"pulsarctl functions querystate --name my-function --key pulsar --watch\n",[4926,66657,66655],{"__ignoreMap":18},[48,66659,66660],{},[384,66661],{"alt":66662,"src":66663},"Output of functions querystate command","\u002Fimgs\u002Fblogs\u002F63b3570713b5d20200d67b8c_RkH1ncw.png",[40,66665,66667,66668],{"id":66666},"_9-context-set","9. ",[55,66669,66672],{"href":66670,"rel":66671},"https:\u002F\u002Fdocs.streamnative.io\u002Fpulsarctl\u002Fv2.7.0.7\u002F#-em-set-em--54",[264],"context set",[48,66674,66675],{},"Context in pulsarctl is instrumental if you’re working with a multi-cluster setup. Using contexts, you can cache the information of multiple clusters and seamlessly switch between them. 
The context set command creates a new context and stores it in $HOME\u002F.config\u002Fpulsar. The command takes the following arguments:",[1666,66677,66678,66681,66684],{},[324,66679,66680],{},"The name of the context.",[324,66682,66683],{},"The broker service URL using the --admin-service-url flag.",[324,66685,66686],{},"The bookie service URL using the --bookie-service-url flag.",[32,66688,66240],{"id":66689},"usage-8",[48,66691,66692],{},"Create a context named my-context-1",[8325,66694,66697],{"className":66695,"code":66696,"language":8330},[8328],"pulsarctl context set my-context-1 --admin-service-url=\"http:\u002F\u002Flocalhost:8080\" --bookie-service-url=\"http:\u002F\u002Flocalhost:6650\"\n",[4926,66698,66696],{"__ignoreMap":18},[48,66700,66701],{},[384,66702],{"alt":66703,"src":66704},"Output of context set command","\u002Fimgs\u002Fblogs\u002F63b3572280a830f56dcd576b_WoMM0h7.png",[40,66706,66708,66709],{"id":66707},"_10-context-use","10. ",[55,66710,66713],{"href":66711,"rel":66712},"https:\u002F\u002Fdocs.streamnative.io\u002Fpulsarctl\u002Fv2.7.0.7\u002F#-em-use-em-",[264],"context use",[48,66715,66716],{},"Once you have created the contexts, you can use context use to switch between them. The command takes the name of the context as argument and switches to that context. Since this command is used heavily in a multi-cluster setup, it undoubtedly makes the list of top pulsarctl commands.",[32,66718,66240],{"id":66719},"usage-9",[48,66721,66722],{},"Switch to the context named my-context-1",[8325,66724,66727],{"className":66725,"code":66726,"language":8330},[8328],"pulsarctl context use my-context-1\n",[4926,66728,66726],{"__ignoreMap":18},[40,66730,2125],{"id":2122},[48,66732,66733,66734,190],{},"Pulsarctl can be an indispensable tool for managing Apache Pulsar. Compared to pulsar-admin, pulsarctl commands provide a more user-friendly, intuitive interface. The ten pulsarctl commands listed here are used often and provide a good starting point for learning how to use pulsarctl. For a complete reference of all the commands, check out the ",[55,66735,66738],{"href":66736,"rel":66737},"https:\u002F\u002Fdocs.streamnative.io\u002Fpulsarctl\u002Fv2.7.0.7\u002F",[264],"pulsarctl documentation",[48,66740,66741,66742,66744,66745,66748,66749,66752],{},"If you’re looking for a robust messaging and streaming solution, check out ",[55,66743,4496],{"href":10259},". StreamNative, from the creators of Apache Pulsar, provides offerings like ",[55,66746,3550],{"href":66747},"\u002Fcloud\u002Fmanaged","—Apache Pulsar as a Service—and ",[55,66750,44086],{"href":66751},"\u002Fplatform",", a cloud-native streaming platform. With its global 24\u002F7 support and unparalleled expertise in Apache Pulsar, it is an excellent choice for a messaging platform.",{"title":18,"searchDepth":19,"depth":19,"links":66754},[66755,66759,66763,66767,66771,66775,66779,66783,66787,66791,66795],{"id":66188,"depth":19,"text":66756,"children":66757},"1. clusters create",[66758],{"id":66239,"depth":279,"text":66240},{"id":66255,"depth":19,"text":66760,"children":66761},"2. topics create",[66762],{"id":66296,"depth":279,"text":66240},{"id":66329,"depth":19,"text":66764,"children":66765},"3. topics list",[66766],{"id":66341,"depth":279,"text":66240},{"id":66392,"depth":19,"text":66768,"children":66769},"4. subscriptions create",[66770],{"id":66419,"depth":279,"text":66240},{"id":66442,"depth":19,"text":66772,"children":66773},"5. 
subscriptions list",[66774],{"id":66454,"depth":279,"text":66240},{"id":66463,"depth":19,"text":66776,"children":66777},"6. functions create",[66778],{"id":66509,"depth":279,"text":66240},{"id":66532,"depth":19,"text":66780,"children":66781},"7. functions putstate",[66782],{"id":66565,"depth":279,"text":66240},{"id":66605,"depth":19,"text":66784,"children":66785},"8. functions querystate",[66786],{"id":66637,"depth":279,"text":66240},{"id":66666,"depth":19,"text":66788,"children":66789},"9. context set",[66790],{"id":66689,"depth":279,"text":66240},{"id":66707,"depth":19,"text":66792,"children":66793},"10. context use",[66794],{"id":66719,"depth":279,"text":66240},{"id":2122,"depth":19,"text":2125},"2021-11-23","Pulsarctl is an improved alternative to the traditional pulsar-admin. Learn how to perform administrative operations on Pulsar with these 10 Pulsarctl commands.","\u002Fimgs\u002Fblogs\u002F63c7fb829046660d7d11b4aa_63b3553d893b10bc4f8bb89a_screen-shot-2021-11-23-at-12.07.13-pm.png",{},"\u002Fblog\u002F10-useful-pulsarctl-commands-manage-cluster",{"title":66150,"description":66797},"blog\u002F10-useful-pulsarctl-commands-manage-cluster",[7347,821,11899,27847],"uHrSfeAR_1fh6aBDVf-vZB2x0c-zZABCGklA-Xya2fc",{"id":66806,"title":66807,"authors":66808,"body":66809,"category":821,"createdAt":290,"date":67177,"description":67178,"extension":8,"featured":294,"image":67179,"isDraft":294,"link":290,"meta":67180,"navigation":7,"order":296,"path":67181,"readingTime":5505,"relatedResources":290,"seo":67182,"stem":67183,"tags":67184,"__hash__":67185},"blogs\u002Fblog\u002Fbuilding-edge-applications-apache-pulsar.md","Building Edge Applications With Apache Pulsar",[46357],{"type":15,"value":66810,"toc":67157},[66811,66820,66823,66826,66830,66833,66847,66851,66854,66857,66874,66877,66883,66887,66890,66896,66899,66902,66905,66908,66911,66917,66923,66926,66929,66935,66939,66942,66956,66959,66965,66969,66975,66979,66985,66992,66995,66999,67006,67012,67016,67019,67023,67029,67033,67039,67043,67049,67053,67059,67063,67069,67075,67079,67082,67093,67097,67111,67114],[48,66812,66813,66814,66819],{},"The explosive growth of connected remote devices is posing challenges for the centralized computing paradigm. Due to network and infrastructure limitations, organizations find it increasingly difficult to move and process all the device-generated data in data centers or the cloud without latency or performance issues. As a result, edge applications are on the rise. By ",[55,66815,66818],{"href":66816,"rel":66817},"https:\u002F\u002Fwww.gartner.com\u002Fsmarterwithgartner\u002Fwhat-edge-computing-means-for-infrastructure-and-operations-leaders",[264],"Gartner’s estimation",", 75% of enterprise data will be created and processed outside data centers or the cloud by 2025.",[48,66821,66822],{},"So what are edge applications? Edge applications run on or near the sources of data, such as IoT devices and local edge servers, and edge execution. Edge computing enables computation, storage, cache, management, alerting, machine learning, and routing to happen beyond data centers and the cloud. Organizations across industries, such as retail, farming, manufacturing, transportation, healthcare, and telecommunications, are adopting edge applications to achieve lower latency, better bandwidth availability, lower infrastructure costs, and faster decision-making.",[48,66824,66825],{},"In this article, you will learn some of the challenges of developing edge applications and why Apache Pulsar is the solution. 
You will also learn how to build edge applications using Pulsar with a step-by-step example.",[40,66827,66829],{"id":66828},"_1-key-challenges","1. Key Challenges",[48,66831,66832],{},"While the decentralized nature of edge computing offers a multitude of benefits, it also poses challenges. Some of the key challenges include:",[321,66834,66835,66838,66841,66844],{},[324,66836,66837],{},"Edge applications often need to support a variety of devices, protocols, languages, and data formats.",[324,66839,66840],{},"Communication from edge applications needs to be asynchronous with a stream of events from sensors, logs, and applications at a rapid but uneven pace.",[324,66842,66843],{},"By design, edge producers of data require diverse messaging cluster deployments.",[324,66845,66846],{},"By design, edge applications are geographically distributed and heterogeneous.",[40,66848,66850],{"id":66849},"_2-the-solution","2. The Solution",[48,66852,66853],{},"To overcome the key challenges of building edge applications, you need an adaptable, hybrid, geo-replicated, extensible, and open-source solution. A widely-adopted open-source project provides the support of an engaged community and a rich ecosystem of adapters, connectors, and extensions needed for edge applications. After working with different technologies and open-source projects for the past two decades, I believe Apache Pulsar solves the needs for edge applications.",[48,66855,66856],{},"Apache Pulsar is an open-source, cloud-native, distributed messaging and streaming platform. Since Pulsar became a top-level Apache Software Foundation project in 2018, its community engagement, ecosystem growth, and global adoption have skyrocketed. Pulsar is equipped to solve the many challenges of edge computing because:",[321,66858,66859,66862,66865,66868,66871],{},[324,66860,66861],{},"Apache Pulsar supports fast messaging, metadata, and many data formats with support for various schemas.",[324,66863,66864],{},"Pulsar supports a rich set of client libraries in Go, C++, Java, Node.js, Websockets, and Python. Additionally, there are community-released open-source clients for Haskell, Scala, Rust, and .Net as well as stream processing libraries for Apache Flink and Apache Spark.",[324,66866,66867],{},"Pulsar supports multiple messaging protocols, including MQTT, Kafka, AMQP, and JMS.",[324,66869,66870],{},"Pulsar’s geo-replication feature solves the issues with distributed device locations.",[324,66872,66873],{},"Pulsar is cloud-native and can run in any cloud, on-premises, or Kubernetes environment. It can also be scaled down to run on edge gateways and powerful devices like the NVIDIA Jetson Xavier NX.",[48,66875,66876],{},"In today’s examples, we build out edge applications on an NVIDIA Jetson Xavier NX, which provides us enough power to run an edge Apache Pulsar standalone broker, multiple web cameras, and deep learning edge applications with horsepower to spare. My edge device contains 384 NVIDIA CUDA® cores and 48 Tensor cores, six 64-bit ARM cores, and 8 GB of 128-bit LPDDR4x RAM. In my upcoming blogs, I will show you that running Pulsar on more restrained devices like Raspberry PI 4s and NVIDIA Jetson Nanos is still adequate for fast edge event streams.",[48,66878,66879],{},[384,66880],{"alt":66881,"src":66882},"illustration of streamnative cloud solution","\u002Fimgs\u002Fblogs\u002F63b3534e893b1075598a5fa3_screen-shot-2021-11-17-at-3.55.09-pm.png",[40,66884,66886],{"id":66885},"_3-architecture","3. 
Architecture",[48,66888,66889],{},"Now that we have covered the physical architecture of our solution, let’s focus on how we want to logically structure incoming data. For those of you unfamiliar with Pulsar, each topic belongs to both a tenant and a namespace as shown in the diagram below.",[48,66891,66892],{},[384,66893],{"alt":66894,"src":66895},"illustration of pulsar cluster","\u002Fimgs\u002Fblogs\u002F63b3534ed1e82e1b4334d1f7_screen-shot-2021-11-17-at-3.56.12-pm.png",[48,66897,66898],{},"These logical constructs allow us to group data together based on various criteria such as the original source of the data and different business. Once we have decided on our tenant, namespaces, and topics, we need to determine what fields we will need to collect additional data required for analytics.",[48,66900,66901],{},"Next, we need to determine the format of our data. It can be the same as the original format or we can transform it to meet transport, processing, or storage requirements. We need to ask ourselves a number of architectural questions. Plus in many cases, our devices, equipment, sensors, operating system, or transport force us to choose a specific data format.",[48,66903,66904],{},"For today’s application we are going to use JSON, which is ubiquitous for practically any language and human readable. . Apache Avro, a binary format, is also a good option, but for these blogs we will keep it simple.",[48,66906,66907],{},"Now that data format is chosen, we may need to enrich the raw data with extra fields beyond what is produced by the sensors, machine learning classification, logs, or other sources. I like to add IP address, mac address, host name, a creation timestamp, execution time, and some fields about the device health like disk space, memory, and CPU. You can add more or remove some if you don’t see a need for it or if your device already broadcasts device health. At a minimum these fields can help with debugging especially when you get thousands of devices. Therefore I always like to include them unless strict bandwidth restrictions make that impossible.",[48,66909,66910],{},"We need to find a primary key or unique identifier for our event record. Often IoT data does not have a natural one. We can synthesize one with a UUID generator at the creation of the record.",[48,66912,66913,66914,190],{},"Now that we have a list of fields, we need to fit a schema to our data and determine field names, types, defaults, and nullability. Once we have a schema defined, which we can do with JSON Schema or build a class with the fields, we can then use Pulsar SQL to query data from our topics. We can also leverage that schema to ",[55,66915,66916],{"href":60560},"run continuous SQL with Apache Flink SQL",[48,66918,66919],{},[384,66920],{"alt":66921,"src":66922},"image of streamnative cloud","\u002Fimgs\u002Fblogs\u002F63b3534e0a2ec85c118f43e7_screen-shot-2021-11-17-at-3.56.50-pm.png",[48,66924,66925],{},"For IoT applications you often want to use a time-series-capable primary data store for these events. I recommend Aerospike, InfluxDB, or ScyllaDB. We can handle this via Pulsar IO sinks or other mechanisms based on use cases and needs. 
We can use the Spark connector, Flink Connector, or NiFi connector if needed.",[48,66927,66928],{},"Our final event will look like the JSON in the following example.",[8325,66930,66933],{"className":66931,"code":66932,"language":8330},[8328],"\n{\"uuid\": \"xav_uuid_video0_lmj_20211027011044\", \"camera\": \"\u002Fdev\u002Fvideo0\", \"ipaddress\": \"192.168.1.70\", \"networktime\": 4.284832000732422, \"top1pct\": 47.265625, \"top1\": \"spotlight, spot\", \"cputemp\": \"29.0\", \"gputemp\": \"28.5\", \"gputempf\": \"83\", \"cputempf\": \"84\", \"runtime\": \"4\", \"host\": \"nvidia-desktop\", \"filename\": \"\u002Fhome\u002Fnvidia\u002Fnvme\u002Fimages\u002Fout_video0_tje_20211027011044.jpg\", \"imageinput\": \"\u002Fhome\u002Fnvidia\u002Fnvme\u002Fimages\u002Fimg_video0_eqi_20211027011044.jpg\", \"host_name\": \"nvidia-desktop\", \"macaddress\": \"70:66:55:15:b4:a5\", \"te\": \"4.1648781299591064\", \"systemtime\": \"10\u002F26\u002F2021 21:10:48\", \"cpu\": 11.7, \"diskusage\": \"32367.5 MB\", \"memory\": 82.1}\n\n",[4926,66934,66932],{"__ignoreMap":18},[40,66936,66938],{"id":66937},"_4-edge-producers","4. Edge Producers",[48,66940,66941],{},"Let’s test out a few libraries, languages, and clients on our NVIDIA Jetson Xavier NX to see which is best for our use case. After prototyping a number of libraries that ran on Ubuntu with NVIDIA Jetson Xavier NX’s version of ARM, I have found a number of options that produce messages in line with what I need for my application. These are not the only but very good options for this edge platform.",[321,66943,66944,66947,66950,66953],{},[324,66945,66946],{},"Go Lang Pulsar Producer",[324,66948,66949],{},"Python 3.x Websocket Producer",[324,66951,66952],{},"Python 3.x MQTT Producer",[324,66954,66955],{},"Java 8 Pulsar Producer",[32,66957,66946],{"id":66958},"go-lang-pulsar-producer",[8325,66960,66963],{"className":66961,"code":66962,"language":8330},[8328],"\npackage main\n\nimport (\n        \"context\"\n        \"fmt\"\n        \"log\"\n        \"github.com\u002Fapache\u002Fpulsar-client-go\u002Fpulsar\"\n        \"github.com\u002Fstreamnative\u002Fpulsar-examples\u002Fcloud\u002Fgo\u002Fccloud\"\n       \"github.com\u002Fhpcloud\u002Ftail\"\n)\n\nfunc main() {\n    client := ccloud.CreateClient()\n\n    producer, err := client.CreateProducer(pulsar.ProducerOptions{\n        Topic: \"jetson-iot\",\n    })\n    if err != nil {\n        log.Fatal(err)\n    }\n    defer producer.Close()\n\n    t, err := tail.TailFile(\"demo1.log\", tail.Config{Follow:true})\n        for line := range t.Lines {\n        if msgId, err := producer.Send(context.Background(), \n&pulsar.ProducerMessage{\n            Payload: []byte(line.Text),\n        }); err != nil {\n            log.Fatal(err)\n        } else {\n            fmt.Printf(\"jetson:Published message: %v-%s \\n\", \nmsgId,line.Text)\n        }\n    }\n}\n\n",[4926,66964,66962],{"__ignoreMap":18},[32,66966,66968],{"id":66967},"python-3-websocket-producer","Python 3 Websocket Producer",[8325,66970,66973],{"className":66971,"code":66972,"language":8330},[8328],"\nimport requests, uuid, websocket, base64, json\n\nuuid2 = uuid.uuid4()\nrow = {}\nrow['host'] = 'nvidia-desktop'\nws = websocket.create_connection( 'ws:\u002F\u002Fserver:8080\u002Fws\u002Fv2\u002Fproducer\u002Fpersistent\u002Fpublic\u002Fdefault\u002Fenergy')\nmessage = str(json.dumps(row) )\nmessage_bytes = message.encode('ascii')\nbase64_bytes = base64.b64encode(message_bytes)\nbase64_message = base64_bytes.decode('ascii')\nws.send(json.dumps({ 
'payload' : base64_message, 'properties': { 'device' : 'jetson2gb', 'protocol' : 'websockets' },'key': str(uuid2), 'context' : 5 }))\nresponse =  json.loads(ws.recv())\nif response['result'] == 'ok':\n            print ('Message published successfully')\nelse:\n            print ('Failed to publish message:', response)\nws.close()\n\n",[4926,66974,66972],{"__ignoreMap":18},[32,66976,66978],{"id":66977},"java-pulsar-producer-with-schema","Java Pulsar Producer with Schema",[8325,66980,66983],{"className":66981,"code":66982,"language":8330},[8328],"\npublic static void main(String[] args) throws Exception {\n        JCommanderPulsar jct = new JCommanderPulsar();\n        JCommander jCommander = new JCommander(jct, args);\n        if (jct.help) {\n            jCommander.usage();\n            return;\n        }\n        PulsarClient client = null;\n\n        if ( jct.issuerUrl != null && jct.issuerUrl.trim().length() > \n0 ) {\n            try {\n                client = PulsarClient.builder()\n                        .serviceUrl(jct.serviceUrl.toString())\n                        .authentication(\nAuthenticationFactoryOAuth2.clientCredentials(new URL(jct.issuerUrl.toString()),new URL(jct.credentialsUrl.toString()), jct.audience.toString())).build();\n            } catch (PulsarClientException e) {\n                e.printStackTrace();\n            } catch (MalformedURLException e) {\n                e.printStackTrace();\n            }\n        }\n        else {\n            try {\n                client = PulsarClient.builder().serviceUrl(jct.serviceUrl.toString()).build();\n            } catch (PulsarClientException e) {\n                e.printStackTrace();\n            }\n        }\n\n        UUID uuidKey = UUID.randomUUID();\n        String pulsarKey = uuidKey.toString();\n        String OS = System.getProperty(\"os.name\").toLowerCase();\n        String message = \"\" + jct.message;\n        IoTMessage iotMessage = parseMessage(\"\" + jct.message);\n        String topic = DEFAULT_TOPIC;\n        if ( jct.topic != null && jct.topic.trim().length()>0) {\n            topic = jct.topic.trim();\n        }\n        ProducerBuilder producerBuilder = client.newProducer(JSONSchema.of(IoTMessage.class))\n                .topic(topic)\n                .producerName(\"jetson\").\n                sendTimeout(5, TimeUnit.SECONDS);\n\n        Producer producer = producerBuilder.create();\n\n        MessageId msgID = producer.newMessage()\n                .key(iotMessage.getUuid())\n                .value(iotMessage)\n                .property(\"device\", OS)\n                .property(\"uuid2\", pulsarKey)\n                .send();\n        producer.close();\n        client.close();\n        producer = null;\n        client = null;\n    }\n\n   private static IoTMessage parseMessage(String message) {\n\n        IoTMessage iotMessage = null;\n\n        try {\n            if ( message != null && message.trim().length() > 0) {\n                ObjectMapper mapper = new ObjectMapper();\n                iotMessage = mapper.readValue(message, IoTMessage.class);\n                mapper = null;\n            }\n        }\n        catch(Throwable t) {\n            t.printStackTrace();\n        }\n\n        if (iotMessage == null) {\n            iotMessage = new IoTMessage();\n        }\n        return iotMessage;\n    }\n\njava -jar target\u002FIoTProducer-1.0-jar-with-dependencies.jar --serviceUrl pulsar:\u002F\u002Fnvidia-desktop:6650 --topic 'iotjetsonjson' --message \"...JSON…\"\n 
\n",[4926,66984,66982],{"__ignoreMap":18},[48,66986,66987,66988,190],{},"You can find all the source code ",[55,66989,267],{"href":66990,"rel":66991},"https:\u002F\u002Fgithub.com\u002Fstreamnative\u002Fstreamnative-academy\u002Ftree\u002Fmaster\u002Fiot-examples",[264],[48,66993,66994],{},"Now we determine how to execute our applications on the devices. It can be using a scheduler that comes with the system such as cron or some add-on. I often use cron, MiNiFi agents, a shell script, or run the application continuously as a service. You will have to investigate your device and sensor for optimal scheduling.",[40,66996,66998],{"id":66997},"_5-validate-data-and-monitor","5. Validate Data and Monitor",[48,67000,67001,67002,190],{},"Now that we have a continuous stream of events streaming into our Pulsar cluster, we can validate the data and monitor our progress. The easiest option is to use StreamNative Cloud Manager for a fresh web interface to our unified messaging data, as shown in the diagram below. We also have the option to view the Pulsar metrics endpoint as documented ",[55,67003,267],{"href":67004,"rel":67005},"https:\u002F\u002Fdocs.streamnative.io\u002Fcloud\u002Fstable\u002Fmonitor\u002Foverview",[264],[48,67007,67008],{},[384,67009],{"alt":67010,"src":67011},"Data and Monitor streamnative cloud","\u002Fimgs\u002Fblogs\u002F63b354030a534d4722d4b28c_screen-shot-2021-11-17-at-3.57.56-pm.png",[32,67013,67015],{"id":67014},"check-stats-via-rest","Check Stats via REST",[48,67017,67018],{},"http:\u002F\u002F:8080\u002Fadmin\u002Fv2\u002Fpersistent\u002Fpublic\u002Fdefault\u002Fmqtt-2\u002Fstats http:\u002F\u002F:8080\u002Fadmin\u002Fv2\u002Fpersistent\u002Fpublic\u002Fdefault\u002Fmqtt-2\u002FinternalStats",[32,67020,67022],{"id":67021},"check-stats-via-admin-cli","Check Stats via Admin CLI",[8325,67024,67027],{"className":67025,"code":67026,"language":8330},[8328],"\nbin\u002Fpulsar-admin topics stats-internal persistent:\u002F\u002Fpublic\u002Fdefault\u002Fmqtt-2\n\n",[4926,67028,67026],{"__ignoreMap":18},[32,67030,67032],{"id":67031},"find-subscriptions-to-your-topic","Find Subscriptions to Your Topic",[48,67034,67035],{},[55,67036,67037],{"href":67037,"rel":67038},"http:\u002F\u002Fnvidia-desktop:8080\u002Fadmin\u002Fv2\u002Fpersistent\u002Fpublic\u002Fdefault\u002Fmqtt-2\u002Fsubscriptions",[264],[32,67040,67042],{"id":67041},"consume-from-subscription-via-rest","Consume from Subscription via REST",[48,67044,67045],{},[55,67046,67047],{"href":67047,"rel":67048},"http:\u002F\u002Fnvidia-desktop:8080\u002Fadmin\u002Fv2\u002Fpersistent\u002Fpublic\u002Fdefault\u002Fmqtt-2\u002Fsubscription\u002Fmqtt2\u002Fposition\u002F10",[264],[32,67050,67052],{"id":67051},"consume-messages-via-cli","Consume Messages via CLI",[8325,67054,67057],{"className":67055,"code":67056,"language":8330},[8328],"\nbin\u002Fpulsar-client consume \"persistent:\u002F\u002Fpublic\u002Fdefault\u002Fmqtt-2\" -s \"mqtt2\" -n 5\n\n",[4926,67058,67056],{"__ignoreMap":18},[32,67060,67062],{"id":67061},"query-topics-via-pulsar-sql","Query Topics via Pulsar SQL",[8325,67064,67067],{"className":67065,"code":67066,"language":8330},[8328],"\nselect * from pulsar.\"public\u002Fdefault\".iotjetsonjson;\n\n",[4926,67068,67066],{"__ignoreMap":18},[48,67070,67071],{},[384,67072],{"alt":67073,"src":67074},"quary topics pulsar sql","\u002Fimgs\u002Fblogs\u002F63b3545d6f2a26d95c46e6b5_screen-shot-2021-11-17-at-3.58.44-pm.png",[40,67076,67078],{"id":67077},"_6-next-steps","6. 
Next Steps",[48,67080,67081],{},"At this point, we have built an edge application that can stream data at event speed and join thousands of other applications’ streaming data into your Apache Pulsar cluster. Next we can add rich, real-time analytics with Flink SQL. This will allow us to do advanced stream processing, join event streams, and process data at scale.",[48,67083,67084,67089,67090,67092],{},[55,67085,67088],{"href":67086,"rel":67087},"https:\u002F\u002Fauth.streamnative.cloud\u002Flogin?state=hKFo2SBXY0xGZVBmSFM0czhwMVpQYVNTaWxqakFNTUdLMnl2c6FupWxvZ2luo3RpZNkgeEVHNU9hOU5oUmVOQmZtVW9NenJGUGRsdTJTdWdsamqjY2lk2SA2ZXI3M3FLcTQycUIwd2JzcjFTT01hWWJhdTdLaGxldw&client=6er73qKq42qB0wbsr1SOMaYbau7Khlew&protocol=oauth2&audience=https%3A%2F%2Fapi.streamnative.cloud&redirect_uri=https%3A%2F%2Fconsole.streamnative.cloud%2Fcallback&defaultMethod=signup&scope=openid%20profile%20email%20offline_access&response_type=code&response_mode=query&nonce=S2VGZUF2SDUyflAxQTVnUWpnc2FzZDZXeVZDaUI0bE5BWXN1SFh0ZXhSMA%3D%3D&code_challenge=yiiFH1BvynBWeCsB127OBYeeIpKl9_pR5q_l9UNocnM&code_challenge_method=S256&auth0Client=eyJuYW1lIjoiYXV0aDAtc3BhLWpzIiwidmVyc2lvbiI6IjEuMTMuNCJ9",[264],"Start a trial with StreamNative Cloud"," now so you can start building IoT applications immediately. With ",[55,67091,3550],{"href":66747},", you can spin up a Pulsar Cluster within minutes.",[40,67094,67096],{"id":67095},"_7-further-learning","7. Further Learning",[48,67098,67099,67100,1154,67104,67106,67107,67110],{},"This blog did not cover the Pulsar fundamentals, which you will need if you want to build your own edge applications following my methods. If you are new to Pulsar, I highly recommend that you take the ",[55,67101,67103],{"href":36485,"rel":67102},[264],"self-serve Pulsar courses",[55,67105,36491],{"href":36490}," developed by ",[55,67108,31914],{"href":31912,"rel":67109},[264],". 
This will get you started with Pulsar and accelerate your streaming immediately.",[48,67112,67113],{},"If you are interested in learning more about edge and building your own connectors, take a look at the following resources:",[321,67115,67116,67129,67136,67143,67150],{},[324,67117,67118,67119,67123,67124,67128],{},"My talk at Pulsar Summit EU 2021 - ",[55,67120,60898],{"href":67121,"rel":67122},"https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=pfhoF3yTdHU&t=18s",[264]," (Get the slides ",[55,67125,267],{"href":67126,"rel":67127},"https:\u002F\u002Fwww.slideshare.net\u002Fstreamnative\u002Fusing-the-flipn-stack-for-edge-ai-flink-nifi-pulsar-pulsar-virtual-summit-europe-2021",[264],")",[324,67130,67131],{},[55,67132,67135],{"href":67133,"rel":67134},"https:\u002F\u002Fpulsar.apache.org\u002Fdocs\u002Fen\u002Fclient-libraries\u002F",[264],"Pulsar client libraries",[324,67137,67138],{},[55,67139,67142],{"href":67140,"rel":67141},"https:\u002F\u002Fgithub.com\u002Ftspannhw\u002FFLiP-Jetson\u002F",[264],"Example source",[324,67144,67145],{},[55,67146,67149],{"href":67147,"rel":67148},"https:\u002F\u002Fpulsar.apache.org\u002Fdocs\u002Fen\u002Fio-influxdb-sink\u002F",[264],"InfluxDB Pulsar IO sink connector",[324,67151,67152],{},[55,67153,67156],{"href":67154,"rel":67155},"https:\u002F\u002Fgithub.com\u002Fstreamnative\u002Fexamples\u002Ftree\u002Fmaster\u002Fcloud\u002Fjava\u002Fsrc\u002Fmain\u002Fjava\u002Fio\u002Fstreamnative\u002Fexamples\u002Foauth2",[264],"Java OAuth2 example for StreamNative Cloud login",{"title":18,"searchDepth":19,"depth":19,"links":67158},[67159,67160,67161,67162,67167,67175,67176],{"id":66828,"depth":19,"text":66829},{"id":66849,"depth":19,"text":66850},{"id":66885,"depth":19,"text":66886},{"id":66937,"depth":19,"text":66938,"children":67163},[67164,67165,67166],{"id":66958,"depth":279,"text":66946},{"id":66967,"depth":279,"text":66968},{"id":66977,"depth":279,"text":66978},{"id":66997,"depth":19,"text":66998,"children":67168},[67169,67170,67171,67172,67173,67174],{"id":67014,"depth":279,"text":67015},{"id":67021,"depth":279,"text":67022},{"id":67031,"depth":279,"text":67032},{"id":67041,"depth":279,"text":67042},{"id":67051,"depth":279,"text":67052},{"id":67061,"depth":279,"text":67062},{"id":67077,"depth":19,"text":67078},{"id":67095,"depth":19,"text":67096},"2021-11-17","Learn how to build edge applications using Pulsar with a step-by-step example.","\u002Fimgs\u002Fblogs\u002F63c7fb9c28dc5165ed1f81c8_63b3534eef94af3b9995275d_screen-shot-2021-11-17-at-3.29.21-pm.png",{},"\u002Fblog\u002Fbuilding-edge-applications-apache-pulsar",{"title":66807,"description":67178},"blog\u002Fbuilding-edge-applications-apache-pulsar",[38442,7347],"3hU7oW8uhTGdkYPLI-dceDslOPZ7VBHUHwH2yYBU6ic",{"id":67187,"title":67188,"authors":67189,"body":67190,"category":821,"createdAt":290,"date":67359,"description":67360,"extension":8,"featured":294,"image":67361,"isDraft":294,"link":290,"meta":67362,"navigation":7,"order":296,"path":67363,"readingTime":7986,"relatedResources":290,"seo":67364,"stem":67365,"tags":67366,"__hash__":67367},"blogs\u002Fblog\u002Fstreaming-data-pipelines-pulsar-io.md","Streaming Data Pipelines with Pulsar IO",[62554],{"type":15,"value":67191,"toc":67348},[67192,67195,67198,67206,67210,67217,67220,67242,67246,67249,67260,67264,67267,67270,67273,67277,67283,67287,67293,67297,67303,67307,67310,67314,67317,67346],[48,67193,67194],{},"Building modern data infrastructure is hard. 
Organizations today need to be able to manage large volumes of heterogeneous data that is being generated and delivered around the clock. With the quantity and velocity of data and the different needs it must serve, there is no \"one size fits all solution.\" Instead, organizations must be able to move data between different systems in order to store, process and serve the data.",[48,67196,67197],{},"Historically, organizations have used a number of different tools, such as Apache Kafka for streaming workloads and RabbitMQ for messaging workloads, to try to move data. Today, organizations are streamlining this process with Apache Pulsar.",[48,67199,67200,67201,67205],{},"Apache Pulsar is a cloud-native, distributed messaging and streaming platform. Designed to serve modern data needs, Pulsar supports flexible messaging semantics, tiered storage, multi-tenancy, and geo-replication. The Pulsar project has experienced ",[55,67202,67204],{"href":67203},"\u002Fblog\u002Fcommunity\u002F2021-06-14-pulsar-hits-its-400th-contributor-and-passes-kafka-in-monthly-active-contributors\u002F","rapid community growth",", ecosystem expansion, and global adoption since it became a top-level Apache Software Foundation project in 2018. With Pulsar as the backbone of data infrastructure, companies are able to move data in a fast and scalable way. In this blog post we will take a look at how you can easily ingress and egress data between Pulsar and external systems with Pulsar IO.",[40,67207,67209],{"id":67208},"_1-what-is-pulsar-io","1. What is Pulsar IO?",[48,67211,67212,67213,67216],{},"Pulsar IO is a complete toolkit for creating, deploying, and managing Pulsar connectors that integrate with external systems like key\u002Fvalue stores, distributed file systems, search indexes, databases, data warehouses, other messaging systems and more. Since Pulsar IO is built on top of Pulsar’s serverless computing layer known as ",[55,67214,63349],{"href":63347,"rel":67215},[264],", writing a Pulsar IO connector is as simple as writing a Pulsar Function.",[48,67218,67219],{},"With Pulsar IO, you can easily move data in and out of Pulsar by either using existing Apache Pulsar connectors or writing your own custom connectors. Benefits of leveraging Pulsar IO include:",[321,67221,67222,67230,67233,67236,67239],{},[324,67223,67224,67225,67229],{},"There are many ",[55,67226,67228],{"href":35258,"rel":67227},[264],"existing Pulsar IO connectors"," for external systems, such as Apache Kafka, Cassandra, and Aerospike. Using these connectors help reduce time to production, since all the necessary pieces for creating the integrations are in place. Developers just need to provide configurations (like connection urls and credentials) to run the connectors.",[324,67231,67232],{},"Pulsar IO comes with managed runtime, which takes care of execution, scheduling, scaling, and fault tolerance. Developers can focus on configurations and business logics.",[324,67234,67235],{},"You can reduce boilerplate code for producing and consuming applications, by using the provided interfaces.",[324,67237,67238],{},"You can easily scale out - for cases that you need more instances to handle the incoming traffic - by changing one simple configuration value. 
If you use the Kubernetes Runtime, elastic scaling on traffic demand comes out of the box.",[324,67240,67241],{},"Pulsar IO helps you leverage schemas by specifying the type of Schema on the data models and make sure that schema enforcement is in place - Json, Avro and Protobufs are supported.",[40,67243,67245],{"id":67244},"_2-pulsar-io-runtime","2. Pulsar IO Runtime",[48,67247,67248],{},"As Pulsar IO is built on top of Pulsar Function, it has the same runtime options. When deploying Pulsar IO connectors you have the following options:",[321,67250,67251,67254,67257],{},[324,67252,67253],{},"Thread: Runs inside the same JVM as the worker. (Normally used for testing purposes and local run. Not recommended for production deployments.)",[324,67255,67256],{},"Process: Runs in a different process and you can use multiple workers to scale out across multiple nodes.",[324,67258,67259],{},"Kubernetes: Runs as a pod inside your Kubernetes cluster and the worker coordinates with Kubernetes. This allows leveraging all the benefits a cloud-native environment like Kubernetes has to offer, like easy scale-out.",[40,67261,67263],{"id":67262},"_3-pulsar-io-interfaces","3. Pulsar IO Interfaces",[48,67265,67266],{},"As already mentioned, Pulsar IO reduces the boilerplate code required for producing and consuming application. It does so, by providing different base interfaces that abstract away the boilerplate code and allow us to focus on the business logic.",[48,67268,67269],{},"The Pulsar IO supports base interfaces for Sources and Sinks. Source connectors allow you to bring data into Pulsar from external systems, while Sink connectors can be used to move data out of Pulsar and into an external system such as a database.",[48,67271,67272],{},"There is also a specialized type of Source connector known as the Push Source. The Push Source connectors make it easy to implement certain integrations that require to push data. A Push Source example can be a change data capture source system that, after receiving a new change, automatically pushes that change into Pulsar.",[32,67274,67276],{"id":67275},"the-source-interface","The Source Interface",[8325,67278,67281],{"className":67279,"code":67280,"language":8330},[8328],"\npublic interface Source extends AutoCloseable {\n\n    \u002F**\n     * Open connector with configuration.\n     *\n     * @param config initialization config\n     * @param sourceContext environment where the source connector is running\n     * @throws Exception IO type exceptions when opening a connector\n     *\u002F\n    void open(final Map config, SourceContext sourceContext) throws Exception;\n\n    \u002F**\n     * Reads the next message from source.\n     * If source does not have any new messages, this call should block.\n     * @return next message from source.  
The return result should never be null\n     * @throws Exception\n     *\u002F\n    Record read() throws Exception;\n}\n \n",[4926,67282,67280],{"__ignoreMap":18},[32,67284,67286],{"id":67285},"the-push-source-interface","The Push Source Interface",[8325,67288,67291],{"className":67289,"code":67290,"language":8330},[8328],"\npublic interface BatchSource extends AutoCloseable {\n\n    \u002F**\n     * Open connector with configuration.\n     *\n     * @param config config that's supplied for source\n     * @param context environment where the source connector is running\n     * @throws Exception IO type exceptions when opening a connector\n     *\u002F\n    void open(final Map config, SourceContext context) throws Exception;\n\n    \u002F**\n     * Discovery phase of a connector.  This phase will only be run on one instance, i.e. instance 0, of the connector.\n     * Implementations use the taskEater consumer to output serialized representation of tasks as they are discovered.\n     *\n     * @param taskEater function to notify the framework about the new task received.\n     * @throws Exception during discover\n     *\u002F\n    void discover(Consumer taskEater) throws Exception;\n\n    \u002F**\n     * Called when a new task appears for this connector instance.\n     *\n     * @param task the serialized representation of the task\n     *\u002F\n    void prepare(byte[] task) throws Exception;\n\n    \u002F**\n     * Read data and return a record\n     * Return null if no more records are present for this task\n     * @return a record\n     *\u002F\n    Record readNext() throws Exception;\n}\n \n",[4926,67292,67290],{"__ignoreMap":18},[32,67294,67296],{"id":67295},"the-sink-interface","The Sink Interface",[8325,67298,67301],{"className":67299,"code":67300,"language":8330},[8328],"\npublic interface Sink extends AutoCloseable {\n    \u002F**\n     * Open connector with configuration.\n     *\n     * @param config initialization config\n     * @param sinkContext environment where the sink connector is running\n     * @throws Exception IO type exceptions when opening a connector\n     *\u002F\n    void open(final Map config, SinkContext sinkContext) throws Exception;\n\n    \u002F**\n     * Write a message to Sink.\n     *\n     * @param record record to write to sink\n     * @throws Exception\n     *\u002F\n    void write(Record record) throws Exception;\n}\n \n",[4926,67302,67300],{"__ignoreMap":18},[40,67304,67306],{"id":67305},"_4-conclusion","4. Conclusion",[48,67308,67309],{},"Apache Pulsar is able to serve as the backbone of modern data infrastructure. It enables organizations to move data in a fast and scalable way. Pulsar IO is a connector framework that gives developers all the necessary tools to create, deploy, and manage Pulsar connectors that integrate with different systems. It allows developers to focus on the application logic by abstracting away all the boilerplate code.",[40,67311,67313],{"id":67312},"_5-further-reading","5. 
Further Reading",[48,67315,67316],{},"If you are interested in learning more and build your own connectors take a look at the following resources:",[321,67318,67319,67325,67332,67339],{},[324,67320,67321],{},[55,67322,67324],{"href":35258,"rel":67323},[264],"Discover all the available Pulsar IO connectors",[324,67326,67327],{},[55,67328,67331],{"href":67329,"rel":67330},"https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=w9xQyyoFds4",[264],"Building and Deploying a Source Connector",[324,67333,67334],{},[55,67335,67338],{"href":67336,"rel":67337},"https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=KraWwiuSDkM&ab_channel=StreamNative",[264],"Writing Custom Sink Connectors for Pulsar IO",[324,67340,67341],{},[55,67342,67345],{"href":67343,"rel":67344},"https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=60x5eMeBEyI&ab_channel=StreamNative",[264],"Monitoring and Troubleshooting Connectors",[48,67347,3931],{},{"title":18,"searchDepth":19,"depth":19,"links":67349},[67350,67351,67352,67357,67358],{"id":67208,"depth":19,"text":67209},{"id":67244,"depth":19,"text":67245},{"id":67262,"depth":19,"text":67263,"children":67353},[67354,67355,67356],{"id":67275,"depth":279,"text":67276},{"id":67285,"depth":279,"text":67286},{"id":67295,"depth":279,"text":67296},{"id":67305,"depth":19,"text":67306},{"id":67312,"depth":19,"text":67313},"2021-11-10","Pulsar IO is an easy-to-use framework for creating a pulsar connector between Apache Pulsar and different data systems.","\u002Fimgs\u002Fblogs\u002F63c7fbaa4176908085ef3b0d_63b3522bd283b313619becd1_screen-shot-2021-11-10-at-9.19.11-am.png",{},"\u002Fblog\u002Fstreaming-data-pipelines-pulsar-io",{"title":67188,"description":67360},"blog\u002Fstreaming-data-pipelines-pulsar-io",[7347,821],"k3dntgrNHRj1xhg09FIv8_WGFM2WSVOOxZZKoh4ePFg",{"id":67369,"title":67370,"authors":67371,"body":67373,"category":821,"createdAt":290,"date":67795,"description":67796,"extension":8,"featured":294,"image":67797,"isDraft":294,"link":290,"meta":67798,"navigation":7,"order":296,"path":67799,"readingTime":5505,"relatedResources":290,"seo":67800,"stem":67801,"tags":67802,"__hash__":67803},"blogs\u002Fblog\u002Foffset-implementation-kafka-pulsar.md","Offset Implementation in Kafka-on-Pulsar",[67372],"Yunze Xu",{"type":15,"value":67374,"toc":67776},[67375,67383,67389,67392,67395,67398,67401,67405,67409,67412,67415,67423,67426,67437,67440,67444,67447,67458,67461,67464,67468,67471,67482,67485,67489,67493,67496,67502,67505,67509,67512,67515,67518,67521,67524,67527,67531,67534,67545,67548,67552,67560,67563,67569,67572,67578,67581,67586,67589,67593,67596,67602,67605,67608,67616,67619,67622,67625,67628,67636,67639,67643,67646,67651,67654,67660,67686,67689,67709,67711,67715,67724,67727,67730,67733,67735,67738,67741,67744,67746],[48,67376,67377,67382],{},[55,67378,67381],{"href":67379,"rel":67380},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fwiki\u002FPIP-41%3A-Pluggable-Protocol-Handler",[264],"Protocol handlers"," were introduced in Pulsar 2.5.0 (released in January 2020) to extend Pulsar’s capabilities to other messaging domains. By default, Pulsar brokers only support Pulsar protocol. With protocol handlers, Pulsar brokers can support other messaging protocols, including Kafka, AMQP, and MQTT. This allows Pulsar to interact with applications built on other messaging technologies, expanding the Pulsar ecosystem.",[48,67384,67385,67388],{},[55,67386,49302],{"href":29592,"rel":67387},[264]," is a protocol handler that brings native Kafka protocol into Pulsar. 
It enables developers to publish data into or fetch data from Pulsar using existing Kafka applications without code change. KoP significantly lowers the barrier to Pulsar adoption for Kafka users, making it one of the most popular protocol handlers.",[48,67390,67391],{},"KoP works by parsing Kafka protocol and accessing BookKeeper storage directly via streaming storage abstraction provided by Pulsar. While Kafka and Pulsar share many common concepts, such as topic and partition, there is no corresponding concept of Kafka’s offset in Pulsar. Early versions of KoP tackled this problem with a simple conversion method, which did not allow continuous offset and was prone to problems.",[48,67393,67394],{},"To solve this pain point, broker entry metadata was introduced in KoP 2.8.0 to enable continuous offset. With this update, KoP is available and production-ready. It is important to note that with this update backward compatibility is broken. In this blog, we dive into how KoP implemented offset before and after 2.8.0. and explain the rationale behind the breaking change.",[48,67396,67397],{},"Note on Version Compatibility",[48,67399,67400],{},"Since Pulsar 2.6.2, KoP version has been updated with Pulsar version accordingly. The version of KoP x.y.z.m conforms to Pulsar x.y.z, while m is the patch version number. For instance, the latest KoP release 2.8.1.22 is compatible with Pulsar 2.8.1. In this blog, 2.8.0 refers to both Pulsar 2.8.0 and KoP 2.8.0.",[40,67402,67404],{"id":67403},"message-identifier-in-kafka-and-pulsar","Message Identifier in Kafka and Pulsar",[32,67406,67408],{"id":67407},"offset-in-kafka","Offset in Kafka",[48,67410,67411],{},"In Kafka, offset is a 64-bit integer that represents the position of a message in a specific partition. Kafka consumers can commit an offset to a partition. If the offset is committed successfully, after the consumer restarts, it can continue consuming from the committed offset.",[48,67413,67414],{},"Kafka's offset is continuous as it follows the following constraints:",[1666,67416,67417,67420],{},[324,67418,67419],{},"The first message's offset is 0.",[324,67421,67422],{},"If the latest message's offset is N, then the next message's offset will be N+1.",[48,67424,67425],{},"Kafka stores messages in each broker's file system:",[321,67427,67428,67431,67434],{},[324,67429,67430],{},"Each partition is divided into segments",[324,67432,67433],{},"Each segment is a file that stores messages of an offset range",[324,67435,67436],{},"Each offset has a position, which is the message's start file offset (A file offset is the character's location within a file, while a Kafka offset is the index of a message within a partition.)",[48,67438,67439],{},"Since each message records the message size in the header, for a given offset, Kafka can easily find its segment file and position.",[32,67441,67443],{"id":67442},"message-id-in-pulsar","Message ID in Pulsar",[48,67445,67446],{},"Unlike Kafka, which stores messages in each broker's file system, Pulsar uses BookKeeper as its storage system. In BookKeeper:",[321,67448,67449,67452,67455],{},[324,67450,67451],{},"Each log unit is an entry",[324,67453,67454],{},"Streams of log entries are ledgers",[324,67456,67457],{},"Individual servers storing ledgers of entries are called bookies",[48,67459,67460],{},"A bookie can find any entry via a 64-bit ledger ID and a 64-bit entry ID. Pulsar can store a message or a batch (one or more messages) in an entry. 
Therefore, Pulsar finds a message via its message ID that consists of a ledger ID, an entry ID, and a batch index (-1 if it’s not batched). In addition, the message ID also contains the partition number.",[48,67462,67463],{},"Just like a Kafka consumer can commit an offset to record the consumer position, a Pulsar consumer can acknowledge a message ID to record the consumer position.",[32,67465,67467],{"id":67466},"how-does-kop-deal-with-a-kafka-offset","How Does KoP Deal with a Kafka Offset",[48,67469,67470],{},"KoP needs the following Kafka requests to deal with a Kafka offset:",[321,67472,67473,67476,67479],{},[324,67474,67475],{},"PRODUCE: After messages from a Kafka producer are persisted, KoP needs to tell the Kafka producer the offset of the first message. However, the BookKeeper client only returns a message ID.",[324,67477,67478],{},"FETCH : When a Kafka consumer wants to fetch some messages from a given offset, KoP needs to find the corresponding message ID to read entries from the ledger.",[324,67480,67481],{},"LIST_OFFSET: Find the earliest or latest available message, or find a message by timestamp.",[48,67483,67484],{},"We must support computing a specific message offset or locating a message by a given offset.",[40,67486,67488],{"id":67487},"how-kop-implements-offset-before-280","How KoP Implements Offset before 2.8.0",[32,67490,67492],{"id":67491},"the-implementation","The Implementation",[48,67494,67495],{},"As explained earlier, Kafka locates a message via a partition number and an offset, while Pulsar locates a message via a message ID. Before Pulsar 2.8.0, KoP simply performed conversions between Kafka offsets and Pulsar message IDs. A 64-bit offset is mapped into a 20-bit ledger ID, a 32-bit entry id, and a 12-bit batch index. Here is a simple Java implementation.",[8325,67497,67500],{"className":67498,"code":67499,"language":8330},[8328],"public static long getOffset(long ledgerId, long entryId, int batchIndex) {\n        return (ledgerId >> (32 + 12);\n        long entryId = (offset & 0x0F_FF_FF_FF_FF_FFL) >>> BATCH_BITS;\n        \u002F\u002F BookKeeper only needs a ledger id and an entry id to locate an entry\n        return new PositionImpl(ledgerId, entryId);\n    }\n",[4926,67501,67499],{"__ignoreMap":18},[48,67503,67504],{},"In this blog, we use (ledger id, entry id, batch index) to represent a message ID. For example, assuming a message's message ID is (10, 0, 0), the converted offset will be 175921860444160. This works in some cases because the offset is monotonically increasing. But it’s problematic when a ledger rollover happens or the application wants to manage offsets manually. The section below goes into details about the problems of this simple conversion implementation.",[32,67506,67508],{"id":67507},"the-problems-of-the-simple-conversion","The Problems of the Simple Conversion",[48,67510,67511],{},"The converted offset is not continuous, which causes many serious problems.",[48,67513,67514],{},"For example, let’s assume the current message's ID is (10, 5, 100). The next message's ID could be (11, 0, 0) if a ledger rollover happens. In this case, the offsets of these two messages are 175921860464740 and 193514046488576. The delta value between the two is 17,592,186,023,836.",[48,67516,67517],{},"KoP leverages Kafka's MemoryRecordBuilder to merge multiple messages into a batch. The MemoryRecordBuilder must ensure the batch size is less than the maximum value of a 32-bit integer (4,294,967,296). 
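To make the size of that jump concrete, here is a minimal, self-contained sketch of the bit-packing conversion described above (20-bit ledger ID, 32-bit entry ID, 12-bit batch index). It only reproduces the arithmetic from the rollover example; it is not KoP's actual implementation.

```java
public class OffsetGapExample {
    private static final int ENTRY_BITS = 32;
    private static final int BATCH_BITS = 12;

    // Pack (ledgerId, entryId, batchIndex) into a single 64-bit Kafka-style offset,
    // mirroring the simple conversion sketched earlier.
    static long getOffset(long ledgerId, long entryId, int batchIndex) {
        return (ledgerId << (ENTRY_BITS + BATCH_BITS)) | (entryId << BATCH_BITS) | batchIndex;
    }

    public static void main(String[] args) {
        long current = getOffset(10, 5, 100); // 175921860464740
        long next = getOffset(11, 0, 0);      // 193514046488576 after a ledger rollover
        // Prints 17592186023836 -- the gap between two "consecutive" messages,
        // vastly larger than the 32-bit limit MemoryRecordBuilder can tolerate.
        System.out.println(next - current);
    }
}
```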
In our example, the delta value of the two continuous offsets is much greater than 4,294,967,296. This will result in an exception that says Maximum offset delta exceeded.",[48,67519,67520],{},"To avoid the exception, before KoP 2.8.0, we must configure maxReadEntriesNum (this config limits the maximum number of entries read by the BookKeeper client) to 1. Naturally, reading only one entry for each FETCH request worsens the performance significantly.",[48,67522,67523],{},"However, even with the workaround of maxReadEntriesNum=1, this conversion implementation doesn’t work in some scenarios. For example, the Kafka integration with Spark relies on the continuance of Kafka offsets. After it consumes a message with an offset of N, it will seek the next offset (N+1). However, the offset N+1 might not be able to convert to a valid message ID.",[48,67525,67526],{},"There are other problems caused by the conversion method. And before 2.8.0, there is no good way to implement the continuous offset.",[40,67528,67530],{"id":67529},"the-continuous-offset-implementation-since-280","The Continuous Offset Implementation since 2.8.0",[48,67532,67533],{},"The solution to implement continuous offset is to record the offset into the metadata of a message. However, an offset is determined at the broker side before publishing messages to bookies, while the metadata of a message is constructed at the client side. To solve this problem, we need to do some extra jobs at the broker side:",[1666,67535,67536,67539,67542],{},[324,67537,67538],{},"Deserialize the metadata",[324,67540,67541],{},"Set the \"offset\" property of metadata",[324,67543,67544],{},"Serialize the metadata again, including re-computing the checksum value",[48,67546,67547],{},"This results in a significant increase in CPU overhead on the broker side.",[32,67549,67551],{"id":67550},"broker-entry-metadata","Broker Entry Metadata",[48,67553,67554,67559],{},[55,67555,67558],{"href":67556,"rel":67557},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fwiki\u002FPIP-70:-Introduce-lightweight-broker-entry-metadata",[264],"PIP 70"," introduced lightweight broker entry metadata. It's a metadata of a BookKeeper entry and should only be visible inside the broker.",[48,67561,67562],{},"The default message flow is:",[48,67564,67565],{},[384,67566],{"alt":67567,"src":67568},"graph of broker entry metadata","\u002Fimgs\u002Fblogs\u002F63b35801a3553e27d2a30b07_screen-shot-2021-11-30-at-4.45.32-pm.png",[48,67570,67571],{},"If you configured brokerEntryMetadataInterceptors, which represents a list of broker entry metadata interceptors, then the message flow would be:",[48,67573,67574],{},[384,67575],{"alt":67576,"src":67577},"graph pf broker entry metadata interceptors","\u002Fimgs\u002Fblogs\u002F63b3580180a8309857ce13bd_screen-shot-2021-11-30-at-4.51.22-pm.png",[48,67579,67580],{},"We can see the broker entry metadata is stored in bookies, but is not visible to a Pulsar consumer.",[916,67582,67583],{},[48,67584,67585],{},"From 2.9.0, a Pulsar consumer can be configured to read the broker entry metadata.",[48,67587,67588],{},"Each broker entry metadata interceptor adds the specific metadata (called \"broker entry metadata\") before the message metadata. Since the broker entry metadata is independent of the message metadata, the broker does not need to deserialize the message metadata. In addition, the BookKeeper client supports sending a Netty CompositeByteBuf that is a list of ByteBuf without any copy operation. 
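As an illustration of that zero-copy composition (not KoP's or Pulsar's actual code; the helper name and buffer variables are hypothetical), the broker can stitch the serialized broker entry metadata in front of the untouched message bytes with a Netty CompositeByteBuf:

```java
import io.netty.buffer.ByteBuf;
import io.netty.buffer.CompositeByteBuf;
import io.netty.buffer.PooledByteBufAllocator;

public final class BrokerEntryMetadataFraming {

    // Hypothetical helper: combine the serialized broker entry metadata with the
    // original message metadata + payload without copying either buffer.
    static ByteBuf prepend(ByteBuf brokerEntryMetadata, ByteBuf messageBytes) {
        CompositeByteBuf composite = PooledByteBufAllocator.DEFAULT.compositeBuffer(2);
        // 'true' advances the writer index so the composite is immediately readable.
        composite.addComponents(true, brokerEntryMetadata, messageBytes);
        return composite;
    }
}
```

Only the metadata bytes are new; the payload buffer is referenced rather than copied, which is why the per-entry overhead stays small.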
From the perspective of a BookKeeper client, only some extra bytes are sent into the socket buffer. Therefore, the extra overhead is low and acceptable.",[32,67590,67592],{"id":67591},"the-index-metadata","The Index Metadata",[48,67594,67595],{},"We need to configure the AppendIndexMetadataInterceptor (we say index metadata interceptor) to support the Kafka offset.",[8325,67597,67600],{"className":67598,"code":67599,"language":8330},[8328],"brokerEntryMetadataInterceptors=org.apache.pulsar.common.intercept.AppendIndexMetadataInterceptor\n",[4926,67601,67599],{"__ignoreMap":18},[48,67603,67604],{},"In Pulsar brokers, there is a component named \"managed ledger\" that manages all ledgers of a partition. The index metadata interceptor maintains an index that starts from 0. The \"index\" term is used instead of \"offset\".",[48,67606,67607],{},"Each time before an entry is written to bookies, the following two things happen:",[1666,67609,67610,67613],{},[324,67611,67612],{},"The index is serialized into the broker entry metadata.",[324,67614,67615],{},"The index increases by the number of messages in the entry.",[48,67617,67618],{},"After that, each entry records the first message's index, which is equivalent to the \"base offset\" concept in Kafka.",[48,67620,67621],{},"Now, we must make sure even if the partition's owner broker was down, the index metadata interceptor would recover the index from somewhere.",[48,67623,67624],{},"There are some cases where the managed ledger needs to store its metadata (usually in ZooKeeper). For example, when a ledger is rolled over, the managed ledger must archive all ledger IDs in a z-node. Here we don't look deeper into the metadata format. We only need to know there is a property map in the managed ledger's metadata.",[48,67626,67627],{},"Before metadata is stored in ZooKeeper (or another metadata store):",[1666,67629,67630,67633],{},[324,67631,67632],{},"Retrieve the index from the index metadata interceptor, which represents the latest message's index.",[324,67634,67635],{},"Add the property whose key is \"index\" and value is the index to the property map.",[48,67637,67638],{},"Each time a managed ledger is initialized, it will restore the metadata from the metadata store. At that time, we can set the index metadata intercerptor's index to the value associated with the \"index\" key.",[32,67640,67642],{"id":67641},"how-kop-implements-the-continuous-offsets","How KoP Implements the Continuous Offsets",[48,67644,67645],{},"Let's look back to the How does KoP deal with a Kafka offset section and review how we deal with the offset in following Kafka requests.",[321,67647,67648],{},[324,67649,67650],{},"PRODUCE",[48,67652,67653],{},"When KoP handles PRODUCE requests, it leverages the managed ledger to write messages to bookies. The API has a callback that can access the entry's data.",[8325,67655,67658],{"className":67656,"code":67657,"language":8330},[8328],"@Override\n    public void addComplete(Position pos, ByteBuf entryData, Object ctx) {\n",[4926,67659,67657],{"__ignoreMap":18},[321,67661,67662,67665,67668,67671,67674,67677,67680,67683],{},[324,67663,67664],{},"We only need to parse the broker entry metadata from entryData and then retrieve the index. The index is just the base offset returned to the Kafka producer.",[324,67666,67667],{},"FETCH",[324,67669,67670],{},"The task is to find the position (ledger id and entry id) for a given offset. KoP implements a callback that reads the index from the entry and compares it with the given offset. 
It then passes the callback to a class named OpFindNewest, which uses binary search to find an entry.",[324,67672,67673],{},"The binary search could take some time. But it only happens on the initial search unless the Kafka consumer disconnects. After the position is found, a non-durable cursor will be created to record the position. The cursor will move to a newer position as the fetch offset increases.",[324,67675,67676],{},"LIST_OFFSET",[324,67678,67679],{},"Earliest: Get the first valid position in a managed ledger. Then read the entry of the position, and parse the index.",[324,67681,67682],{},"Latest: Retrieve the index from the index metadata interceptor and increase by one. It should be noted that the latest offset (or called LEO) in Kafka is the next offset to be assigned to a message, while the index metadata interceptor's index is the offset assigned to the latest message.",[324,67684,67685],{},"By timestamp: First leverage broker's timestamp based binary search to find the target entry. Then parse the index from the entry.",[48,67687,67688],{},"You can find more details about the offset implementation in the following PRs:",[321,67690,67691,67697,67703],{},[324,67692,67693],{},[55,67694,67695],{"href":67695,"rel":67696},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F8618",[264],[324,67698,67699],{},[55,67700,67701],{"href":67701,"rel":67702},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F9039",[264],[324,67704,67705],{},[55,67706,67707],{"href":67707,"rel":67708},"https:\u002F\u002Fgithub.com\u002Fstreamnative\u002Fkop\u002Fpull\u002F296",[264],[48,67710,3931],{},[40,67712,67714],{"id":67713},"upgrade-from-kop-version-before-280-to-280-or-higher","Upgrade from KoP Version before 2.8.0 to 2.8.0 or Higher",[48,67716,67717,67718,67723],{},"KoP 2.8.0 implements continuous offset with a tradeoff – ",[55,67719,67722],{"href":67720,"rel":67721},"https:\u002F\u002Fgithub.com\u002Fstreamnative\u002Fkop\u002Fblob\u002Fmaster\u002Fdocs\u002Fupgrade.md",[264],"the backward compatibility is broken",". The offset stored by KoP versions before 2.8.0 cannot be recognized by KoP 2.8.0 or higher.",[48,67725,67726],{},"If you have not tried KoP, please upgrade your Pulsar to 2.8.0 or higher and then use the corresponding KoP. As of this writing, the latest KoP release for Pulsar 2.8.1 is 2.8.1.22.",[48,67728,67729],{},"If you have already tried KoP before 2.8.0, you need to know that there's a breaking change from version less than 2.8.0 to version 2.8.0 or higher. You must delete the __consumer_offsets topic and all existing topics previously used by KoP.",[48,67731,67732],{},"There is a latest feature in KoP that can skip these old messages by enabling a config. It would be included in 2.8.1.23 or later. Note that the old messages still won’t be accessible. It just saves the work of deleting old topics.",[40,67734,319],{"id":316},[48,67736,67737],{},"In this blog, we first explained the concept of Kafka offset and the similar concept of message ID in Pulsar. Then we talked about how KoP implemented Kafka offset before 2.8.0 and the related problems.",[48,67739,67740],{},"To solve these problems, the broker entry metadata was introduced from Pulsar 2.8.0. Based on this feature, index metadata is implemented via a corresponding interceptor. 
After that, KoP can leverage the index metadata interceptor to implement the continuous offset.",[48,67742,67743],{},"Finally, since it's a breaking change, we talked about the upgrade from KoP version less than 2.8.0 to 2.8.0 or higher. It's highly recommended to try KoP 2.8.0 or higher directly.",[40,67745,36477],{"id":36476},[321,67747,67748,67755,67759,67763,67768],{},[324,67749,67750,67751],{},"To learn more about KoP, read ",[55,67752,67754],{"href":67753},"\u002Fblog\u002Ftech\u002F2020-03-24-bring-native-kafka-protocol-support-to-apache-pulsar\u002F","Announcing Kafka-on-Pulsar: Bringing native Kafka protocol support to Apache Pulsar.",[324,67756,36219,67757,49940],{},[55,67758,38410],{"href":27690},[324,67760,45216,67761,47757],{},[55,67762,38404],{"href":45219},[324,67764,45223,67765,45227],{},[55,67766,31914],{"href":31912,"rel":67767},[264],[324,67769,47760,67770,1154,67773,45209],{},[55,67771,47764],{"href":45463,"rel":67772},[264],[55,67774,47768],{"href":45206,"rel":67775},[264],{"title":18,"searchDepth":19,"depth":19,"links":67777},[67778,67783,67787,67792,67793,67794],{"id":67403,"depth":19,"text":67404,"children":67779},[67780,67781,67782],{"id":67407,"depth":279,"text":67408},{"id":67442,"depth":279,"text":67443},{"id":67466,"depth":279,"text":67467},{"id":67487,"depth":19,"text":67488,"children":67784},[67785,67786],{"id":67491,"depth":279,"text":67492},{"id":67507,"depth":279,"text":67508},{"id":67529,"depth":19,"text":67530,"children":67788},[67789,67790,67791],{"id":67550,"depth":279,"text":67551},{"id":67591,"depth":279,"text":67592},{"id":67641,"depth":279,"text":67642},{"id":67713,"depth":19,"text":67714},{"id":316,"depth":19,"text":319},{"id":36476,"depth":19,"text":36477},"2021-11-01","Learn how a Kafka offset is implemented in KoP – a protocol handler that brings native Kafka protocol into Pulsar.","\u002Fimgs\u002Fblogs\u002F63c7fb709476d0eb89637391_63b357c334dbb642be219684_screen-shot-2021-12-01-at-9.15.10-am.png",{},"\u002Fblog\u002Foffset-implementation-kafka-pulsar",{"title":67370,"description":67796},"blog\u002Foffset-implementation-kafka-pulsar",[799,821,11043,51871],"xs9cDkbttOI8lxbzBasCuRc1lPzWaUzVdYsPQZGwcI4",{"id":67805,"title":67806,"authors":67807,"body":67808,"category":7338,"createdAt":290,"date":68002,"description":68003,"extension":8,"featured":294,"image":68004,"isDraft":294,"link":290,"meta":68005,"navigation":7,"order":296,"path":68006,"readingTime":7986,"relatedResources":290,"seo":68007,"stem":68008,"tags":68009,"__hash__":68010},"blogs\u002Fblog\u002Fhighlights-the-pulsar-virtual-summit-europe-2021.md","Highlights from the Pulsar Virtual Summit Europe 2021",[44843],{"type":15,"value":67809,"toc":67986},[67810,67813,67822,67826,67829,67836,67839,67842,67849,67855,67858,67861,67867,67873,67876,67879,67885,67889,67892,67895,67902,67906,67909,67912,67918,67924,67928,67931,67934,67956,67958,67967,67975,67984],[48,67811,67812],{},"In response to high demand for a European event after the Pulsar Summit North America 2021, the first-ever Pulsar Summit Europe took place on October 6th. More than 20 speakers presented at the summit from regional and global companies, including Databricks, Clever Cloud, Elastic, Flipkart, and Tencent, and open-source communities, such as Delta Lake, Apache Skywalking, and Milvus.",[48,67814,67815,67816,67821],{},"The event was packed with the latest Pulsar project and ecosystem updates, technical deep dives, and exciting use cases from around the world. 
Whether you are a Pulsar veteran or a newbie, Pulsar Summit has something for you. ",[55,67817,67820],{"href":67818,"rel":67819},"https:\u002F\u002Fpulsar-summit.org\u002Fen\u002Fevent\u002Feurope-2021\u002Fschedule\u002Ffirst-day",[264],"Watch the sessions on-demand"," if you missed the live event. Below are some of the highlights from the summit.",[40,67823,67825],{"id":67824},"most-watched-sessions","Most-Watched Sessions",[48,67827,67828],{},"Here are the recaps of the top five most watched sessions:",[32,67830,66189,67832,67835],{"id":67831},"_1-keynote-apache-pulsar-supporting-the-entire-lifecycle-of-streaming-data",[2628,67833,67834],{},"Keynote"," Apache Pulsar, Supporting the Entire Lifecycle of Streaming Data",[48,67837,67838],{},"Presented by Matteo Merli, Pulsar PMC Chair, CTO at StreamNative",[48,67840,67841],{},"In this talk, Matteo Merli shares how recent advancements in the Pulsar technology are enabling companies to use Pulsar as the single data platform to support both messaging and streaming use cases. Matteo shares how Pulsar can be used to simplify data architecture and to enable tighter integration between online processing and offline data services.",[48,67843,67844],{},[55,67845,67848],{"href":67846,"rel":67847},"https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=lXwRp3o4X_U",[264],"Watch Keynote",[32,67850,66256,67852,67854],{"id":67851},"_2-keynote-pulsar-in-the-lakehouse-apache-pulsar-with-apache-spark-and-delta-lake",[2628,67853,67834],{}," Pulsar in the Lakehouse: Apache Pulsar™ with Apache Spark™ and Delta Lake",[48,67856,67857],{},"Presented by Ryan Zhu, Apache Spark PMC Member and Committer, RxJava Committer, Staff Software Engineer at Databricks, and Addison Higham, Chief Architect and Head of Cloud Engineering at StreamNative",[48,67859,67860],{},"In this keynote, Ryan Zhu and Addison Higham talk about the current state, real-world use cases, and future roadmap of Pulsar + Spark & Delta Lake connectors. Learn how the “Lakehouse” architecture and Pulsar can be used to build a reliable data lake through integrations with Apache Spark and Delta Lake.",[48,67862,67863],{},[55,67864,67848],{"href":67865,"rel":67866},"https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=SJn53RH7-ws",[264],[32,67868,66330,67870,67872],{"id":67869},"_3-keynote-apache-pulsar-a-foundation-backbone-for-clever-cloud",[2628,67871,67834],{}," Apache Pulsar: A Foundation Backbone for Clever Cloud",[48,67874,67875],{},"Presented by Quentin Adam, CEO at Clever Cloud",[48,67877,67878],{},"Quentin Adam explains Clever Cloud’s decision to migrate their cloud-based, full-featured computing platform to Pulsar. He shares how Pulsar’s architecture design and capability to support both messaging and streaming use cases have made Clever Cloud more scalable and streamlined.",[48,67880,67881],{},[55,67882,67848],{"href":67883,"rel":67884},"https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=-pQ6zRz6ij8",[264],[32,67886,67888],{"id":67887},"_4-tracking-apache-pulsar-messages-with-apache-skywalking","4. Tracking Apache Pulsar Messages with Apache SkyWalking",[48,67890,67891],{},"Presented by Sheng Wu, Apache SkyWalking Founder and VP, Apache Board Director, Tetrate.io Founding Engineer, and Penghui Li, Apache Pulsar PMC, Staff Engineer at StreamNative",[48,67893,67894],{},"The move to microservices has brought about increasing complexities in modern infrastructure, making observability critical. 
In this talk, Sheng Wu demonstrates how Apache Skywalking provides a metrics, tracing, and logging all-in-one solution for system monitoring while Penghui Li explains how Skywalking traces Pulsar messages.",[48,67896,67897],{},[55,67898,67901],{"href":67899,"rel":67900},"https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=MOK5TEnQnWs",[264],"Watch Session",[32,67903,67905],{"id":67904},"_5-deep-dive-into-the-pulsar-binary-protocol","5. Deep Dive into the Pulsar Binary Protocol",[48,67907,67908],{},"Presented by Christophe Bornet, Senior Software Engineer at DataStax",[48,67910,67911],{},"Pulsar uses a custom binary protocol for communications between producers\u002Fconsumers and brokers. This protocol is designed to ensure maximum transport and implementation efficiency. In this session, Christophe Bornet talks about the design decisions and implementation of Pulsar binary protocol, from the networking stack, to the protocol frames, to the message exchanges.",[48,67913,67914],{},[55,67915,67901],{"href":67916,"rel":67917},"https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=ONyld7Da-rw",[264],[48,67919,67920,67921,190],{},"For a full list of sessions from the event, click ",[55,67922,267],{"href":67818,"rel":67923},[264],[40,67925,67927],{"id":67926},"thank-you","Thank You",[48,67929,67930],{},"We want to thank our event hosts, speakers, attendees, and community partners for making the first-ever Pulsar Summit Europe a success!",[40,67932,67933],{"id":57464},"How Can You Get Involved?",[321,67935,67936,67942,67950],{},[324,67937,67938,67939,190],{},"Join the Pulsar community on ",[55,67940,55984],{"href":57760,"rel":67941},[264],[324,67943,67944,67945,67949],{},"Take your Apache Pulsar skills to the next level with free ",[55,67946,67948],{"href":31912,"rel":67947},[264],"On-Demand Pulsar Training"," by StreamNative Academy.",[324,67951,67952,67953,190],{},"Read Manning’s Apache Pulsar in Action to learn how to develop microservices-based applications with Pulsar. ",[55,67954,67955],{"href":64847},"Download your free copy",[32,67957,52473],{"id":52472},[48,67959,67960,67962,67963],{},[2628,67961,42523],{}," McQuaid, M. (2018, August 14). The Open Source Contributor Funnel (or: Why People Don’t Contribute To Your Open Source Project). Mike McQuaid. ",[55,67964,67965],{"href":67965,"rel":67966},"https:\u002F\u002Fmikemcquaid.com\u002F2018\u002F08\u002F14\u002Fthe-open-source-contributor-funnel-why-people-dont-contribute-to-your-open-source-project\u002F",[264],[48,67968,67969,67971,67972],{},[2628,67970,46057],{},"Contributor Overtime - API7.ai -- Full Traffic Management: API Gateway & Kubernetes Ingress Controller & Service Mesh. (n.d.). APISeven.Com. Retrieved October 12, 2021, from ",[55,67973,65987],{"href":65987,"rel":67974},[264],[48,67976,67977,67979,67980],{},[2628,67978,46068],{}," Borges, H., & Tulio Valente, M. (2018). What’s in a GitHub Star? Understanding Repository Starring Practices in a Social Coding Platform. Journal of Systems and Software, 146, 112–129. ",[55,67981,67982],{"href":67982,"rel":67983},"https:\u002F\u002Fdoi.org\u002F10.1016\u002Fj.jss.2018.09.016",[264],[48,67985,3931],{},{"title":18,"searchDepth":19,"depth":19,"links":67987},[67988,67998,67999],{"id":67824,"depth":19,"text":67825,"children":67989},[67990,67992,67994,67996,67997],{"id":67831,"depth":279,"text":67991},"1. Keynote Apache Pulsar, Supporting the Entire Lifecycle of Streaming Data",{"id":67851,"depth":279,"text":67993},"2. 
Keynote Pulsar in the Lakehouse: Apache Pulsar™ with Apache Spark™ and Delta Lake",{"id":67869,"depth":279,"text":67995},"3. Keynote Apache Pulsar: A Foundation Backbone for Clever Cloud",{"id":67887,"depth":279,"text":67888},{"id":67904,"depth":279,"text":67905},{"id":67926,"depth":19,"text":67927},{"id":57464,"depth":19,"text":67933,"children":68000},[68001],{"id":52472,"depth":279,"text":52473},"2021-10-13","Pulsar Summit Europe took place on October 6th. This blog shares the recaps of the top five most watched sessions.","\u002Fimgs\u002Fblogs\u002F63c7fbbd0f6d0ccaed8b6b34_63b3519800f467f11119191c_screen-shot-2021-10-12-at-9.40.25-am.png",{},"\u002Fblog\u002Fhighlights-the-pulsar-virtual-summit-europe-2021",{"title":67806,"description":68003},"blog\u002Fhighlights-the-pulsar-virtual-summit-europe-2021",[5376,821,2599],"TwIm48azbCvutGSHyx4gsyzvcnRYSUTfO6a9tlyAgUE",{"id":68012,"title":68013,"authors":68014,"body":68015,"category":821,"createdAt":290,"date":68390,"description":68391,"extension":8,"featured":294,"image":68392,"isDraft":294,"link":290,"meta":68393,"navigation":7,"order":296,"path":68394,"readingTime":5505,"relatedResources":290,"seo":68395,"stem":68396,"tags":68397,"__hash__":68398},"blogs\u002Fblog\u002Fwhats-new-in-apache-pulsar-2-8-1.md","What’s New in Apache Pulsar 2.8.1",[61300,809],{"type":15,"value":68016,"toc":68378},[68017,68020,68023,68025,68051,68059,68061,68063,68072,68075,68078,68087,68090,68093,68102,68105,68108,68117,68120,68123,68132,68135,68138,68147,68150,68153,68162,68165,68168,68177,68180,68183,68189,68192,68195,68201,68204,68207,68211,68220,68223,68226,68232,68235,68238,68242,68251,68254,68257,68259,68268,68271,68274,68277,68286,68289,68292,68301,68304,68307,68316,68319,68322,68331,68334,68337,68341,68347,68354,68366,68368],[40,68018,68013],{"id":68019},"whats-new-in-apache-pulsar-281",[48,68021,68022],{},"The Apache Pulsar community releases version 2.8.1! 49 contributors provided improvements and bug fixes that delivered 213 commits.",[48,68024,61308],{},[321,68026,68027,68035,68043],{},[324,68028,68029,68030],{},"Key-shared subscriptions no longer stop dispatching to consumers when repeatedly opening and closing consumers. ",[55,68031,68034],{"href":68032,"rel":68033},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F10920",[264],"PR-10920",[324,68036,68037,68038],{},"System topic no longer has potential data loss when not configured for compaction. ",[55,68039,68042],{"href":68040,"rel":68041},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F11003",[264],"PR-11003",[324,68044,68045,68046],{},"Consumers are not allowed to read data on topics to which they are not subscribed. ",[55,68047,68050],{"href":68048,"rel":68049},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F11912",[264],"PR-11912",[48,68052,68053,68054,190],{},"This blog walks through the most noteworthy changes grouped by component. For the complete list including all features, enhancements, and bug fixes, check out the ",[55,68055,68058],{"href":68056,"rel":68057},"https:\u002F\u002Fpulsar.apache.org\u002Frelease-notes\u002F#281-mdash-2021-09-10-a-id281a",[264],"Pulsar 2.8.1 Release Notes",[40,68060,61003],{"id":61002},[32,68062,61065],{"id":61064},[3933,68064,68066,68067],{"id":68065},"precise-publish-rate-limit-takes-effect-as-expected-pr-11446","Precise publish rate limit takes effect as expected. 
",[55,68068,68071],{"href":68069,"rel":68070},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F11446",[264],"PR-11446",[48,68073,68074],{},"Issue: Previously, when setting precise publish rate limits on topics, it did not work.",[48,68076,68077],{},"Resolution: Implemented a new RateLimiter using the LeakingBucket and FixedWindow algorithms.",[3933,68079,68081,68082],{"id":68080},"messages-with-the-same-keys-are-delivered-to-the-correct-consumers-on-key-shared-subscriptions-pr-10762","Messages with the same keys are delivered to the correct consumers on Key-Shared subscriptions. ",[55,68083,68086],{"href":68084,"rel":68085},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F10762",[264],"PR-10762",[48,68088,68089],{},"Issue: Messages with the same keys were out of order when message redelivery occurred on a Key-Shared subscription.",[48,68091,68092],{},"Resolution: When sending a message to messagesToRedeliver, the broker saved the hash value of the key. If the dispatcher attempted to send newer messages to the consumer that had a key corresponding to any one of the saved hash values, they were added to messagesToRedeliver instead of being sent. This prevented messages with the same key from being out of order.",[3933,68094,68096,68097],{"id":68095},"active-producers-with-the-same-name-are-no-longer-removed-from-the-topic-map-pr-11804","Active producers with the same name are no longer removed from the topic map. ",[55,68098,68101],{"href":68099,"rel":68100},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F11804",[264],"PR-11804",[48,68103,68104],{},"Issue: Previously, when there were producers with the same name, an error would be triggered and the old producer would be removed even though it was still writing to a topic.",[48,68106,68107],{},"Resolution: Validated producers based on a connection ID (local & remote addresses and unique ID) and a producer ID within that connection rather than a producer name.",[3933,68109,68111,68112],{"id":68110},"topics-in-a-fenced-state-can-recover-when-producers-continue-to-reconnect-to-brokers-pr-11737","Topics in a fenced state can recover when producers continue to reconnect to brokers. ",[55,68113,68116],{"href":68114,"rel":68115},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F11737",[264],"PR-11737",[48,68118,68119],{},"Issue: Previously, when a producer continued to reconnect to a broker, the fenced state of the topic was always set to true, which caused the topic to be unable to recover.",[48,68121,68122],{},"Resolution: Add an entry to ManagedLedgerException when the polled operation is not equal to the current operation.",[3933,68124,68126,68127],{"id":68125},"topic-properly-initializes-the-cursor-to-prevent-data-loss-pr-11547","Topic properly initializes the cursor to prevent data loss. ",[55,68128,68131],{"href":68129,"rel":68130},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F11547",[264],"PR-11547",[48,68133,68134],{},"Issue: Previously, when subscribing to a topic with the earliest position, data would be lost because ManagedLedger used a wrong position to initialize a cursor.",[48,68136,68137],{},"Resolution: Added a test to check a cursor's position when subscribing to a topic with the earliest position.",[3933,68139,68141,68142],{"id":68140},"deadlock-no-longer-occurs-when-using-hasmessageavailableasync-and-readnextasync-pr-11183","Deadlock no longer occurs when using hasMessageAvailableAsync and readNextAsync. 
",[55,68143,68146],{"href":68144,"rel":68145},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F11183",[264],"PR-11183",[48,68148,68149],{},"Issue: Previously, when messages were added to an incoming queue, a deadlock might occur. The deadlock might happen in two possible scenarios. First, if the message was added to the queue before the message was read. Second, if readNextAsync was completed before future.whenComplete was called.",[48,68151,68152],{},"Resolution: Used an internal thread to process the callback of hasMessageAvailableAsync.",[3933,68154,68156,68157],{"id":68155},"memory-leak-does-not-occur-when-calling-getlastmessageid-api-pr-10977","Memory leak does not occur when calling getLastMessageId API. ",[55,68158,68161],{"href":68159,"rel":68160},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F10977",[264],"PR-10977",[48,68163,68164],{},"Issue: Previously, the broker ran out of memory when calling the getLastMessageId API.",[48,68166,68167],{},"Resolution: Added the entry.release() call to the PersistentTopic.getLastMessageId.",[3933,68169,68171,68172],{"id":68170},"compaction-is-triggered-for-system-topics-pr-10941","Compaction is triggered for system topics. ",[55,68173,68176],{"href":68174,"rel":68175},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F10941",[264],"PR-10941",[48,68178,68179],{},"Issue: Previously, when a topic had only non-durable subscriptions, the compaction was not triggered because it had 0 estimated backlog size.",[48,68181,68182],{},"Resolution: Used the total backlog size to trigger the compaction. Changed the behavior in the case of no durable subscriptions to use the total backlog size",[3933,68184,68029,68186],{"id":68185},"key-shared-subscriptions-no-longer-stop-dispatching-to-consumers-when-repeatedly-opening-and-closing-consumers-pr-10920",[55,68187,68034],{"href":68032,"rel":68188},[264],[48,68190,68191],{},"Issue: Repeatedly opening and closing consumers with a Key-Shared subscription might occasionally stop dispatching messages to all consumers.",[48,68193,68194],{},"Resolution: Moved the mark-delete position and removed the consumer from the selector before calling removeConsumer().",[3933,68196,68045,68198],{"id":68197},"consumers-are-not-allowed-to-read-data-on-topics-to-which-they-are-not-subscribed-pr-11912",[55,68199,68050],{"href":68048,"rel":68200},[264],[48,68202,68203],{},"Issue: Previously, the request ledger was not checked whether it belonged to a consumer’s connected topic, which allowed the consumer to read data that does not belong to the connected topic.",[48,68205,68206],{},"Resolution: Added a check on the ManagedLedger level before executing read operations.",[32,68208,68210],{"id":68209},"topic-policy","Topic Policy",[3933,68212,68214,68215],{"id":68213},"retention-policy-works-as-expected-pr-11021","Retention policy works as expected. 
",[55,68216,68219],{"href":68217,"rel":68218},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F11021",[264],"PR-11021",[48,68221,68222],{},"Issue: Previously, the retention policy did not work because it was not set in the managedLedger configuration.",[48,68224,68225],{},"Resolution: Set the retention policy in the managedLedger configuration to the onUpdate listener method.",[3933,68227,68037,68229],{"id":68228},"system-topic-no-longer-has-potential-data-loss-when-not-configured-for-compaction-pr-11003",[55,68230,68042],{"href":68040,"rel":68231},[264],[48,68233,68234],{},"Issue: Previously, data might be lost if there were no durable subscriptions on topics.",[48,68236,68237],{},"Resolution: Leveraged the topic compaction cursor to retain data.",[32,68239,68241],{"id":68240},"proxy","Proxy",[3933,68243,68245,68246],{"id":68244},"pulsar-proxy-correctly-shuts-down-outbound-connections-pr-11848","Pulsar proxy correctly shuts down outbound connections. ",[55,68247,68250],{"href":68248,"rel":68249},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F11848",[264],"PR-11848",[48,68252,68253],{},"Issue: Previously, there was a memory leak of outgoing TCP connections in the Pulsar proxy because the ProxyConnectionPool instances were created outside the PulsarClientImpl instance and not closed when the client was closed.",[48,68255,68256],{},"Resolution: Shut down the ConnectionPool correctly.",[32,68258,61160],{"id":61159},[3933,68260,68262,68263],{"id":68261},"pulsar-functions-support-protobuf-schema-pr-11709","Pulsar Functions support Protobuf schema. ",[55,68264,68267],{"href":68265,"rel":68266},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F11709",[264],"PR-11709",[48,68269,68270],{},"Issue: Previously, the exception GeneratedMessageV3 is not assignable was thrown when using a Protobuf schema.",[48,68272,68273],{},"Resolution: Added the relevant dependencies to the Pulsar instance.",[32,68275,60409],{"id":68276},"client",[3933,68278,68280,68281],{"id":68279},"partitioned-topic-consumers-clean-up-resources-after-a-failure-pr-11754","Partitioned-topic consumers clean up resources after a failure. ",[55,68282,68285],{"href":68283,"rel":68284},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F11754",[264],"PR-11754",[48,68287,68288],{},"Issue: Previously, partitioned-topic consumers did not clean up the resources when failing to create consumers. If this failure occurred with non-recoverable errors, it triggered a memory leak, which made applications unstable.",[48,68290,68291],{},"Resolution: Closed and cleaned timer task references.",[3933,68293,68295,68296],{"id":68294},"race-conditions-do-not-occur-on-multi-topic-consumers-pr-11764","Race conditions do not occur on multi-topic consumers. ",[55,68297,68300],{"href":68298,"rel":68299},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F11764",[264],"PR-11764",[48,68302,68303],{},"Issue: Previously, there was a race condition between 2 threads when one of the individual consumers was in a \"paused\" state and the shared queue was full.",[48,68305,68306],{},"Resolution: Validated the state of the shared queue after marking the consumer as \"paused\". The consumer is not blocked if the other thread has emptied the queue in the meantime.",[3933,68308,68310,68311],{"id":68309},"consumers-are-not-blocked-on-batchreceive-pr-11691","Consumers are not blocked on batchReceive. 
",[55,68312,68315],{"href":68313,"rel":68314},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F11691",[264],"PR-11691",[48,68317,68318],{},"Issue: Previously, consumers were blocked when Consumer.batchReceive() was called concurrently by different threads due to a race condition in ConsumerBase.java.",[48,68320,68321],{},"Resolution: Put pinnedInternalExecutor in ConsumerBase to allow batch timer, ConsumerImpl, and MultiTopicsConsumerImpl to submit work in a single thread.",[3933,68323,68325,68326],{"id":68324},"python-client-correctly-enables-custom-logging-pr-11882","Python client correctly enables custom logging. ",[55,68327,68330],{"href":68328,"rel":68329},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F11882",[264],"PR-11882",[48,68332,68333],{},"Issue: Previously, deadlock might happen when custom logging was enabled in the Python client.",[48,68335,68336],{},"Resolution: Detached the worker thread and reduced log level.",[40,68338,68340],{"id":68339},"what-is-next","What is Next?",[48,68342,68343,68344,57738],{},"If you are interested in learning more about Pulsar 2.8.1, you can ",[55,68345,36195],{"href":53730,"rel":68346},[264],[48,68348,68349,68350,57746],{},"The first-ever Pulsar Virtual Summit Europe 2021 will take place in October. ",[55,68351,57745],{"href":68352,"rel":68353},"https:\u002F\u002Fhopin.com\u002Fevents\u002Fpulsar-summit-europe-2021",[264],[48,68355,68356,68357,57753,68360,57757,68363,20076],{},"For more information about the Apache Pulsar project and the progress, visit the ",[55,68358,40821],{"href":23526,"rel":68359},[264],[55,68361,36238],{"href":36236,"rel":68362},[264],[55,68364,57762],{"href":57760,"rel":68365},[264],[40,68367,39647],{"id":39646},[48,68369,57767,68370,57772,68373,57775,68376,57779],{},[55,68371,57771],{"href":53730,"rel":68372},[264],[55,68374,3550],{"href":61568,"rel":68375},[264],[55,68377,24379],{"href":57778},{"title":18,"searchDepth":19,"depth":19,"links":68379},[68380,68381,68388,68389],{"id":68019,"depth":19,"text":68013},{"id":61002,"depth":19,"text":61003,"children":68382},[68383,68384,68385,68386,68387],{"id":61064,"depth":279,"text":61065},{"id":68209,"depth":279,"text":68210},{"id":68240,"depth":279,"text":68241},{"id":61159,"depth":279,"text":61160},{"id":68276,"depth":279,"text":60409},{"id":68339,"depth":19,"text":68340},{"id":39646,"depth":19,"text":39647},"2021-09-23","We are excited to see the Apache Pulsar community has successfully released the 2.8.1 version! 49 contributors provided improvements and bug fixes that delivered 213 commits. 
Let's walk through the most noteworthy changes!","\u002Fimgs\u002Fblogs\u002F63c7fbd0756bf860c7c13f24_63b350b2a38d3ea2bd1787c3_281-top.jpeg",{},"\u002Fblog\u002Fwhats-new-in-apache-pulsar-2-8-1",{"title":68013,"description":68391},"blog\u002Fwhats-new-in-apache-pulsar-2-8-1",[302,821],"oOBKXHqcF0N0iK9g3qJRNVEbyd29fuvvQxtun1QQG2A",{"id":68400,"title":68401,"authors":68402,"body":68403,"category":3550,"createdAt":290,"date":68442,"description":68443,"extension":8,"featured":294,"image":68444,"isDraft":294,"link":290,"meta":68445,"navigation":7,"order":296,"path":68446,"readingTime":20144,"relatedResources":290,"seo":68447,"stem":68448,"tags":68449,"__hash__":68450},"blogs\u002Fblog\u002Fstreamnative-cloud-launches-aws-marketplace.md","StreamNative Cloud Launches on AWS Marketplace",[60441],{"type":15,"value":68404,"toc":68438},[68405,68408,68412,68415,68418,68422,68431],[48,68406,68407],{},"StreamNative Cloud is a fully-managed Apache-Pulsar-as-a-service offering, built by the original creators of Pulsar, and trusted by enterprises to manage mission-critical workloads. Today, we are excited to announce that StreamNative Cloud is available on the AWS Marketplace.",[40,68409,68411],{"id":68410},"aws-streamnative","AWS + StreamNative",[48,68413,68414],{},"Since its launch in early 2021, StreamNative Cloud has gained popularity with organizations - such as Iterable, Narvar, and others - who are looking to unify and empower teams with a cloud-native messaging and streaming platform. With the launch of StreamNative Cloud on AWS Marketplace, developers who rely on AWS can easily extend their capabilities with the power of Pulsar. This streamlines the process, enabling developers to sign up and deploy a cluster in minutes without having to enter a credit card.",[48,68416,68417],{},"In addition to StreamNative Cloud’s availability on the AWS Marketplace, its support for Pulsar connectors for AWS Services, such as Kinesis, SQS, and Lambda, makes it a turnkey messaging and streaming solution for AWS customers. With StreamNative Cloud, organizations can enable real-time data capabilities with a fully-managed, multi-tenant messaging and streaming platform.",[40,68419,68421],{"id":68420},"get-pulsar-spinning-today","Get Pulsar Spinning Today",[48,68423,68424,68425,68430],{},"Get started with StreamNative Cloud on AWS Marketplace with three easy steps: 1. ",[55,68426,68429],{"href":68427,"rel":68428},"https:\u002F\u002Faws.amazon.com\u002Fmarketplace\u002Fpp\u002Fprodview-jhi5p3pvb3q5o",[264],"Sign up via your AWS account",". 2. Log in, create your organization, and invite your team members to join. 3. Create your first Pulsar Cluster.",[48,68432,68433,68434],{},"Have additional questions? 
",[55,68435,68437],{"href":68436},"\u002Fen\u002Fcontact\u002F","Contact us!",{"title":18,"searchDepth":19,"depth":19,"links":68439},[68440,68441],{"id":68410,"depth":19,"text":68411},{"id":68420,"depth":19,"text":68421},"2021-09-20","We are excited to announce that StreamNative Cloud is available on the AWS Marketplace.","\u002Fimgs\u002Fblogs\u002F63c7fbdfca757414da57202f_63b3501d3973d703f5669bae_screen-shot-2021-09-20-at-12.07.56-pm.png",{},"\u002Fblog\u002Fstreamnative-cloud-launches-aws-marketplace",{"title":68401,"description":68443},"blog\u002Fstreamnative-cloud-launches-aws-marketplace",[302,3550,821],"EK9JLsghvjf1n2zE4SKG9eXftA-shT1C3ZyBmpUqwjs",{"id":68452,"title":68453,"authors":68454,"body":68455,"category":3550,"createdAt":290,"date":68518,"description":68519,"extension":8,"featured":294,"image":68520,"isDraft":294,"link":290,"meta":68521,"navigation":7,"order":296,"path":68522,"readingTime":11180,"relatedResources":290,"seo":68523,"stem":68524,"tags":68525,"__hash__":68526},"blogs\u002Fblog\u002Fenabling-real-time-messaging-and-streaming-for-the-cloud.md","StreamNative: Enabling Real-time Messaging and Streaming for the Cloud",[806,807],{"type":15,"value":68456,"toc":68516},[68457,68466,68469,68472,68475,68478,68481,68489,68492,68495,68502,68508,68511],[48,68458,68459,68460,68465],{},"As the CEO of StreamNative, it’s been an exciting ride so far, and today we are proud to share that we’ve ",[55,68461,68464],{"href":68462,"rel":68463},"https:\u002F\u002Fwww.prnewswire.com\u002Fnews-releases\u002Foriginal-creators-of-apache-pulsar-raise-23m-series-a-for-streamnative-round-led-by-prosperity7-ventures-301375962.html",[264],"raised a $23M series A round",". For us, this funding underscores the increased adoption of StreamNative and Apache Pulsar that we are seeing in the market and a bright future ahead.",[48,68467,68468],{},"In celebrating this milestone, we’d like to look back at how our Pulsar journey began. More than 10 years ago Matteo and I were at Yahoo! working to develop a consolidated messaging platform that connected all the popular Yahoo! Applications, including Yahoo! Finance, Yahoo! Mail, Yahoo! Sports, Flickr and more, to data. At the time we looked at the existing messaging and streaming technologies, but they were not able to provide the scalability, reliability, and features needed to meet today’s modern architecture and application requirements.",[48,68470,68471],{},"The team at Yahoo! set out to build a cloud-native messaging service that would work for the global enterprise. We built Pulsar from the ground up to handle millions of topics and partitions with full support for geo-replication and multi-tenancy. Pulsar was open sourced by Yahoo! in 2016 and became a top-level Apache Software Foundation in 2018.",[48,68473,68474],{},"Over the past several years there has been a huge market shift from applications and traditional services using monolithic messaging services - either running on-premise, or simply ported to the cloud - to truly cloud-native applications designed to leverage the cloud and Kubernetes. This shift to the cloud and containers has amplified the spotlight on Apache Pulsar.",[48,68476,68477],{},"Apache Pulsar is unique in that it provides an all-in-one platform with unified messaging and streaming capabilities built for the cloud. 
Think about it as the combination of Kafka (streaming only) and RabbitMQ (messaging only), designed for multi-tenancy and containers.",[48,68479,68480],{},"At StreamNative, we work to help organizations around the globe successfully adopt Pulsar. StreamNative builds upon the powerful Apache Pulsar platform with two product offerings, StreamNative Cloud and StreamNative Platform, details below:",[1666,68482,68483,68486],{},[324,68484,68485],{},"StreamNative Cloud provides Apache Pulsar-as-a-service and delivers a resilient and scalable messaging and event streaming managed service deployable in minutes (alleviating the need to spend time or resources to deploy, upgrade, or maintain clusters).",[324,68487,68488],{},"StreamNative Platform is a self-managed cloud-native offering that completes Apache Pulsar, providing a unified messaging and streaming platform powered by Apache Pulsar with advanced capabilities to help accelerate real-time application development and to simplify enterprise operations at scale.",[48,68490,68491],{},"We’re excited to see the growth in the Apache Pulsar and StreamNative communities. When we started this journey Kafka was the dominant player in the space, but Pulsar’s rapid adoption since it became a top level Apache Software Foundation project in 2018 has been remarkable.",[48,68493,68494],{},"In fact, the number of monthly active Apache Pulsar contributors surpassed Apache Kafka recently (see graph below)! Many have adopted Apache Pulsar because it offers the potential of faster throughput and lower latency than Apache Kafka, along with a compatible API that allows developers to switch from Kafka to Pulsar with relative ease.",[48,68496,68497,68498,20571],{},"We are proud to continue to support the Apache Pulsar community through events, training, project updates, and project contributions. In fact, members of the StreamNative team often represent more than half the monthly Pulsar contributors. We also play a key role sponsoring and hosting the global Pulsar Summits (next being ",[55,68499,68501],{"href":35357,"rel":68500},[264],"Pulsar Summit Europe 2021 in October",[48,68503,68504],{},[384,68505],{"alt":68506,"src":68507},"graph apache puslar and kafka monthly contributors since 2017","\u002Fimgs\u002Fblogs\u002F63b34f6fa0dbab46651fcef4_screen-shot-2021-09-14-at-10.17.54-am.png",[48,68509,68510],{},"What’s next? StreamNative will continue to focus on advancing the state-of-the-art in streaming, stream storage, and messaging technologies. From real-time microservices that use Pulsar’s pub\u002Fsub features and streaming storage for real-time analytics to infinite storage for deep analysis, we’re all about innovating on Pulsar’s flexible architecture and industry leading feature-set to deliver new capabilities.",[48,68512,68513,68514,190],{},"And, we’re hiring! We’re growing our global staff across all departments to accelerate product development, ecosystem expansion, and customer acquisition. If you’re interested in joining the StreamNative team and building a platform based on Apache Pulsar to enable companies to manage the entire lifecycle of data, ",[55,68515,24379],{"href":68436},{"title":18,"searchDepth":19,"depth":19,"links":68517},[],"2021-09-14","It’s been an exciting ride so far, and today we are proud to share that we’ve raised a $23M series A round. 
For us, this funding underscores the increased adoption of StreamNative and Apache Pulsar that we are seeing in the market and a bright future ahead.","\u002Fimgs\u002Fblogs\u002F63c7fbef8f903d89b70404ea_63b34f6f05dfdb78e053b438_screen-shot-2021-09-14-at-10.51.54-am.png",{},"\u002Fblog\u002Fenabling-real-time-messaging-and-streaming-for-the-cloud",{"title":68453,"description":68519},"blog\u002Fenabling-real-time-messaging-and-streaming-for-the-cloud",[821,303],"RPLE1sU92l8d-qgH_Y1pbn-YysOjVi7TGEBaADGxA28",{"id":68528,"title":68529,"authors":68530,"body":68531,"category":7338,"createdAt":290,"date":68636,"description":68637,"extension":8,"featured":294,"image":68638,"isDraft":294,"link":290,"meta":68639,"navigation":7,"order":296,"path":68640,"readingTime":7986,"relatedResources":290,"seo":68641,"stem":68642,"tags":68643,"__hash__":68644},"blogs\u002Fblog\u002Fspeakers-announced-pulsar-virtual-summit-europe-2021.md","Speakers Announced for Pulsar Virtual Summit Europe 2021",[44843],{"type":15,"value":68532,"toc":68626},[68533,68542,68545,68548,68551,68557,68559,68562,68566,68569,68572,68576,68579,68582,68586,68589,68592,68596,68599,68602,68606,68609,68612,68615],[48,68534,68535,68536,68541],{},"The first-ever ",[55,68537,68540],{"href":68538,"rel":68539},"https:\u002F\u002Fpulsar-summit.org\u002Fen\u002Fevent\u002Feurope-2021",[264],"Pulsar Virtual Summit Europe"," is just one month away! Co-hosted by StreamNative and Clever Cloud, this event will be held online on October 6th at 12:00 PM CEST.",[48,68543,68544],{},"The Pulsar Summit offers a unique opportunity for engineers, architects, data scientists, and technical leaders interested in Pulsar and the messaging and streaming ecosystem to learn and network. Since 2020, the Pulsar Summits have drawn more than 100 speakers, thousands of attendees, and hundreds of companies globally.",[48,68546,68547],{},"The speaker committee for the Pulsar Summit Europe 2021includes Apache Pulsar PMC members Matteo Merli from StreamNative, Jerry Peng from Splunk, and Rajan Dhabalia from Verizon Media. Additionally, Till Rohrmann from Ververica, Karthik Ramasamy from Splunk, Addison Higham from StreamNative, and Ricardo Ferreira from Elastic will be participating.",[48,68549,68550],{},"Featured speakers include engineers, developer advocates, and technical leaders from the Apache Pulsar PMC, Clever Cloud, Databricks, StreamNative, Elastic, DataStax, Flipkart, Zilliz, Tencent, JAMPP, and Softtech.",[48,68552,68553,68556],{},[55,68554,45203],{"href":68352,"rel":68555},[264]," and learn about the latest Pulsar project updates, technology deep dives, use cases, and ecosystem developments!",[40,68558,36415],{"id":36414},[48,68560,68561],{},"The Pulsar Virtual Summit Europe 2021 will feature 3 keynotes and 12 breakout sessions. Below is a sneak peak into some of the featured breakout sessions.",[32,68563,68565],{"id":68564},"_1-tracking-apache-pulsar-messages-with-apache-skywalking","1. Tracking Apache Pulsar Messages with Apache SkyWalking",[48,68567,68568],{},"Presented by Penghui Li, Apache Pulsar PMC Member and Software Engineer at StreamNative",[48,68570,68571],{},"Apache SkyWalking is a popular application performance monitoring tool for distributed systems, specially designed for microservices, cloud-native, and container-based (Docker, K8s) architectures. 
In this talk, the speakers will walk you through the features of Apache SkyWalking and Pulsar, and demo how to trace Pulsar messages with SkyWalking to troubleshoot issues related to message publishing and receiving.",[32,68573,68575],{"id":68574},"_2-log-system-as-backbonehow-we-built-the-worlds-most-advanced-vector-database-on-pulsar","2. Log System as Backbone–How We Built the World’s Most Advanced Vector Database on Pulsar",[48,68577,68578],{},"Presented by Xiaofan Luan, Partner and Director of Engineering at Zilliz",[48,68580,68581],{},"Milvus is an open-source vector database for building and managing vector similarity search applications. It has been adopted in production by thousands of companies, including Lucidworks, Shutterstock, and Cloudinary. In this talk, Xiaofan Luan will share with you how the community built Milvus 2.0, a cloud-native, highly scalable and extendable vector similarity solution, on Pulsar.",[32,68583,68585],{"id":68584},"_3-writing-custom-sink-connectors-for-pulsar-io","3. Writing Custom Sink Connectors for Pulsar I\u002FO",[48,68587,68588],{},"Presented by Ricardo Ferreira, Principal Developer Advocate at Elastic",[48,68590,68591],{},"In this talk, Ricardo Ferreira will show you how to write and deploy custom sink connectors for Pulsar I\u002FO that work just like the built-in ones. He will also discuss some of the design decisions that your custom connectors may need to address.",[32,68593,68595],{"id":68594},"_4-pulsar-watermarking","4. Pulsar Watermarking",[48,68597,68598],{},"Presented by Eron Wright, Cloud Engineering Lead at StreamNative",[48,68600,68601],{},"The goal of the Pulsar Watermarking project is to simplify and improve the correctness of stream processing applications. In this session, Eron Wright will do a technical deep-dive into the Apache Pulsar community's plan to support event-time watermarking in a Pulsar topic.",[32,68603,68605],{"id":68604},"_5-application-of-apache-pulsar-in-tencent-billing-and-tencent-advertising","5. Application of Apache Pulsar in Tencent Billing and Tencent Advertising",[48,68607,68608],{},"Presented by Mingyu Bao, Senior Engineer at Tencent",[48,68610,68611],{},"Mingyu Bao will provide a behind-the-scenes look at Tencent’s adoption of Pulsar for their billing and advertising use cases and share some of their challenges. He will also discuss the adaptations and improvements Tencent made with Pulsar in order to meet their performance and operations requirements.",[40,68613,68614],{"id":16948},"Register Now",[48,68616,68617,68618,68621,68622,18054],{},"Don’t miss this opportunity to learn from top Pulsar thought leaders. ",[55,68619,57745],{"href":68352,"rel":68620},[264]," to participate and connect with the Pulsar community at the summit. Check out ",[55,68623,68625],{"href":67818,"rel":68624},[264],"the full schedule",{"title":18,"searchDepth":19,"depth":19,"links":68627},[68628,68635],{"id":36414,"depth":19,"text":36415,"children":68629},[68630,68631,68632,68633,68634],{"id":68564,"depth":279,"text":68565},{"id":68574,"depth":279,"text":68575},{"id":68584,"depth":279,"text":68585},{"id":68594,"depth":279,"text":68595},{"id":68604,"depth":279,"text":68605},{"id":16948,"depth":19,"text":68614},"2021-09-07","The Pulsar Virtual Summit Europe is taking place on October 6th. 
Here is a sneak peak into the featured speakers and sessions.","\u002Fimgs\u002Fblogs\u002F63c7fbfe8e1d2c3f1cbfd0cd_63b34ec619293f2558a5ce76_screen-shot-2021-09-07-at-4.37.56-pm.png",{},"\u002Fblog\u002Fspeakers-announced-pulsar-virtual-summit-europe-2021",{"title":68529,"description":68637},"blog\u002Fspeakers-announced-pulsar-virtual-summit-europe-2021",[5376,821],"z3tQfmS2d2DHBZHwyeM_BUH4e1xocJkiagcWKKmMcuw",{"id":68646,"title":68647,"authors":68648,"body":68649,"category":821,"createdAt":290,"date":68971,"description":68972,"extension":8,"featured":294,"image":68973,"isDraft":294,"link":290,"meta":68974,"navigation":7,"order":296,"path":68975,"readingTime":46114,"relatedResources":290,"seo":68976,"stem":68977,"tags":68978,"__hash__":68979},"blogs\u002Fblog\u002Fscalable-stream-processing-pulsars-key-shared-subscription.md","Scalable Stream Processing with Pulsar’s Key_Shared Subscription",[28],{"type":15,"value":68650,"toc":68950},[68651,68653,68667,68670,68674,68677,68680,68683,68686,68689,68692,68695,68701,68704,68707,68710,68713,68716,68722,68726,68729,68732,68735,68738,68742,68745,68751,68754,68757,68763,68766,68769,68773,68776,68780,68783,68789,68792,68795,68799,68802,68805,68811,68814,68817,68823,68826,68830,68833,68836,68839,68843,68846,68850,68853,68859,68862,68865,68869,68872,68878,68881,68884,68890,68893,68896,68900,68903,68905,68908,68911,68914,68916],[40,68652,8924],{"id":8923},[1666,68654,68655,68658,68661,68664],{},[324,68656,68657],{},"Traditional messaging enables high-throughput, stateless processing via multiple concurrent consumers on a topic.",[324,68659,68660],{},"Streaming provides stateful processing with a single consumer on a topic, but with a tradeoff in throughput.",[324,68662,68663],{},"Pulsar’s Key_Shared subscription type allows you to have both high throughput and stateful stream processing on a single topic.",[324,68665,68666],{},"Pulsar’s Key_Shared subscription is a good fit for use cases that require you to perform stateful processing on high-volumes of data such as personalization, real-time marketing, micro-targeted advertising, and cyber security.",[48,68668,68669],{},"Prior to Pulsar’s Key_Shared subscription, you had to decide between having multiple consumers on a topic for high-throughput or a single consumer for stateful processing when using traditional streaming frameworks. In this blog, you will learn how to use Pulsar’s Key-shared subscription to perform behavioral analytics on clickstream data.",[40,68671,68673],{"id":68672},"messaging-vs-streaming","Messaging vs. Streaming",[48,68675,68676],{},"It is not uncommon for developers to view messaging and streaming as essentially the same and therefore use the terms interchangeably. However, messaging and streaming are two very different things and it is important to understand the difference between them in order to choose the right one for your use case.",[48,68678,68679],{},"In this section I compare the message consumption and processing semantics of each and how they differ. This will also help you understand why sometimes neither messaging nor streaming alone is adequate for your use case and why you might need unified messaging and streaming.",[32,68681,68682],{"id":57922},"Messaging",[48,68684,68685],{},"The central data structure used in messaging systems is the queue. Incoming messages are stored in a first-in-first-out (FIFO) ordering. Messages are retained inside the queue until they are consumed. 
Once they are consumed, they get deleted in order to make space for incoming messages.",[48,68687,68688],{},"From a consumer processing perspective, messaging is completely stateless because every message contains all the information that is required to perform the processing, and therefore can be acted upon without requiring any information from previous messages. This allows you to distribute the message processing across multiple consumers and decrease processing latency.",[48,68690,68691],{},"Messaging is the perfect fit for use cases in which you want to scale up the number of concurrent consumers on a topic in order to increase the processing throughput. A good example of this is the traditional work queue of incoming e-commerce orders that need to be processed by an order fulfillment microservice. Since each order is independent from the others, you can increase the number of microservice instances consuming from the queue to match the demand.",[48,68693,68694],{},"Pulsar’s Shared subscription is designed for this type of use case. As illustrated in Figure 1, it provides messaging semantics by ensuring that each message is delivered to exactly one of the consumers attached to the subscription.",[48,68696,68697],{},[384,68698],{"alt":68699,"src":68700},"Pulsar Shared subscription type supports message queuing use cases","\u002Fimgs\u002Fblogs\u002F63b34d66c9fe745c6ad2f890_screen-shot-2021-08-25-at-8.36.25-pm.png",[32,68702,68703],{"id":57917},"Streaming",[48,68705,68706],{},"In stream processing the central data structure is the log, which is an append-only sequence of records ordered by time. Messages are appended to the end of the log, and reads proceed from the oldest to the newest messages. Message consumption is a non-destructive operation with stream processing, as the consumer just updates its location in the stream.",[48,68708,68709],{},"From a processing perspective, streaming is stateful because the processing is done on a sequence of messages that are typically grouped into fixed-sized “windows” based on either time or size (e.g., every five minutes). Stream processing depends upon information from all the messages in the window in order to produce the correct result.",[48,68711,68712],{},"Streaming is perfect for aggregation operations such as computing the simple moving average of a sensor reading, because all of the sensor readings must be combined and processed by the same consumer in order to properly calculate the value.",[48,68714,68715],{},"Pulsar’s Exclusive subscription provides the right streaming semantics for this type of use cases. As shown in Figure 2, the Exclusive subscription type ensures that all the messages are delivered to a single consumer in the time-order they were received.",[48,68717,68718],{},[384,68719],{"alt":68720,"src":68721},"Pulsar’s Exclusive subscription type supports stateful stream processing on a single consumer","\u002Fimgs\u002Fblogs\u002F63b34d660a2ec8ff8f8a1d51_screen-shot-2021-08-25-at-8.39.45-pm.png",[32,68723,68725],{"id":68724},"trade-offs","Trade Offs",[48,68727,68728],{},"As you can see, messaging and streaming provide different processing semantics. Messaging supports highly-scalable processing via support for multiple concurrent consumers. 
You should use messaging when dealing with large volumes of data that need to be processed quickly so that each message has a very low latency between when it was produced and when it is processed.",[48,68730,68731],{},"Streaming supports more complex analytical processing capabilities, but at the expense of scalability per topic partition. Only a single consumer is allowed to process the data in order to produce an accurate result, therefore the speed at which that data is processed is severely limited. This leads to higher latency in stream processing use cases.",[48,68733,68734],{},"Although you can reduce latency by using sharding and partitions, the scalability is still limited. Tying the processing scalability to the number of partitions makes the architecture less flexible. Changing the number of partitions is not an effect-free operation either, because it also affects the way in which data is published to the topic. Therefore, you should only use streaming when you need stateful processing and can tolerate slower processing.",[48,68736,68737],{},"However, what if you have a use case that needs both low latency and stateful processing? If you are using Apache Pulsar, then you should consider the Key_Shared subscription, which provides processing semantics that are a hybrid between messaging and streaming.",[40,68739,68741],{"id":68740},"apache-pulsars-key_shared-subscription","Apache Pulsar’s Key_Shared Subscription",[48,68743,68744],{},"Messages are the basic unit with Pulsar, and they consist of not only the raw bytes that are being sent between a producer and a consumer, but some metadata fields as well. As you can see from Figure 3, one of the metadata fields inside each Pulsar message is the “key” field which can hold a String value. This is the field that the Key_Shared subscription uses to perform its grouping.",[48,68746,68747],{},[384,68748],{"alt":68749,"src":68750},"Pulsar Key_Shared subscription type uses a metadata field \"key\" for grouping.","\u002Fimgs\u002Fblogs\u002F63b34d666f2a263b31421885_screen-shot-2021-08-25-at-8.41.36-pm.png",[48,68752,68753],{},"Pulsar’s Key_Shared subscription supports multiple concurrent consumers, so you can easily decrease the processing latency by increasing the number of consumers. So in this aspect it provides messaging-type semantics because each message can be processed independently from the others.",[48,68755,68756],{},"However, this subscription type differs from the traditional Shared subscription type in the way that it distributes the data among the consumers. Unlike traditional messaging where any message can be handled by any consumer, within Pulsar's Key_Shared subscription, the messages are distributed across the consumers with the guarantee that messages with the same key are delivered to the same consumer.",[48,68758,68759],{},[384,68760],{"alt":68761,"src":68762},"Pulsar Key_Shared subscription type supports high-throughput, stateful stream processing","\u002Fimgs\u002Fblogs\u002F63b34d661bdc1b453fa7669d_screen-shot-2021-08-25-at-8.42.36-pm.png",[48,68764,68765],{},"Pulsar achieves these guarantees by hashing the incoming key values and distributing the hashes evenly across all of the consumers on the subscription. 
Thus, we know that messages with the same key will generate the same hash value and consequently get sent to the same consumer as the previous messages with that key.",[48,68767,68768],{},"By ensuring that all messages with the same key are sent to the same consumer, that consumer is guaranteed to receive all of the messages for a given key and in the order they were received, which matches the streaming consumption semantics. Let’s explore a real-world use case where Pulsar’s Key_Shared subscription could be used effectively.",[40,68770,68772],{"id":68771},"behavioral-analytics-on-clickstream-data","Behavioral Analytics on Clickstream Data",[48,68774,68775],{},"Providing real-time targeted recommendation on an e-commerce website based on clickstream data is a good real-world example of where the Key_Shared subscription would be particularly well-suited because it requires low latency processing of high volume data.",[32,68777,68779],{"id":68778},"clickstream-data","Clickstream Data",[48,68781,68782],{},"Clickstream data refers to the sequence of clicks performed by an individual user when they interact with a website. A clickstream contains all of a user’s interactions, such as where they click, which pages they visit, and how much time they spend on each page.",[48,68784,68785],{},[384,68786],{"alt":68787,"src":68788}," Clickstream data is a time series of events that represent an individual’s interaction with a website.","\u002Fimgs\u002Fblogs\u002F63b34d66bc0621585047d99c_screen-shot-2021-08-25-at-8.44.41-pm.png",[48,68790,68791],{},"Figure 5: Clickstream data is a time series of events that represent an individual’s interaction with a website.",[48,68793,68794],{},"This data can be analyzed to report user behavior on a specific website, such as routing, stickiness, and tracking of the common user paths through your website. The clickstream behavior is basically a sequence of the user’s interactions with a particular website.",[32,68796,68798],{"id":68797},"tracking","Tracking",[48,68800,68801],{},"In order to receive this clickstream data, you need to embed some tracking software into your website that collects the clickstream events and forwards it to you for analysis. These tags are typically small pieces of JavaScript that capture user behavior at the individual level using personally identifiable data such as IP addresses and cookies. Every time a user clicks on a tagged website, the tracking software detects the event and forwards that information in JSON format to a collection endpoint via an HTTP POST request.",[48,68803,68804],{},"An example of such a JSON object generated by such a tracking library is shown below in Listing 1. 
As you can see, these clickstream events can contain a lot of information that needs to be aggregated, filtered, and enriched before it can be consumed for generating insights.",[8325,68806,68809],{"className":68807,"code":68808,"language":8330},[8328],"{\n   \"app_id\":\"gottaeat.com\",\n   \"platform\":\"web\",\n   \"collector_tstamp\":\"2021-08-17T23:46:46.818Z\",\n   \"dvce_created_tstamp\":\"2021-08-17T23:46:45.894Z\",\n   \"event\":\"page_view\",\n   \"event_id\":\"933b4974-ffbd-11eb-9a03-0242ac130003\",\n   \"user_ipaddress\":\"206.10.136.123\",\n   \"domain_userid\":\"8bf27e62-ffbd-11eb-9a03-0242ac130003\",\n   \"session_id\":\"7\",\n   \"page_url\":\"http:\u002F\u002Fgottaeat.com\u002Fmenu\u002Fshinjuku-ramen\u002FzNiq_Q8-TaCZij1Prj9GGA\"\n   ...\n",[4926,68810,68808],{"__ignoreMap":18},[48,68812,68813],{},"Listing 1: An example clickstream event containing personally identifiable information.",[48,68815,68816],{},"There can be potentially hundreds of active JavaScript trackers at any given time, each of which are collecting the clickstream events for an individual visitor on the company’s website. These events are forwarded to a single tag collector that publishes them directly into a Pulsar topic.",[48,68818,68819],{},[384,68820],{"alt":68821,"src":68822},"The trackers collect the clickstream event for a single user and forward them to a single collector.","\u002Fimgs\u002Fblogs\u002F63b34e11a0dbab24c61ed32e_screen-shot-2021-08-25-at-8.47.02-pm.png",[48,68824,68825],{},"As you can see from Figure 6, since these JavaScript tags don’t coordinate with one another, the clickstream data from multiple users ends up getting intermingled within your Pulsar topic. This poses a big problem because we can only perform behavioral analytics on an individual user’s clickstream.",[32,68827,68829],{"id":68828},"identity-stitching","Identity Stitching",[48,68831,68832],{},"In order to properly analyze the data, the raw clickstream events first need to be grouped together for each individual user to ensure that we have a complete picture of their interactions in the order they occurred. This process of reconstructing each user's clickstream from the commingled data is known as identity stitching. It is done by correlating clickstream events together based upon as many of a user’s unique identifiers as possible.",[48,68834,68835],{},"This is a perfect use case for the Key_Shared subscription because you need to process each individual user's complete stream of events in the order they occurred, so you need stream data processing semantics. At the same time, you also need to scale out this processing to match the traffic on your company website. As we shall see, Pulsar’s Key_Shared subscription allows you to do both.",[48,68837,68838],{},"In order to reconstitute each user's clickstream, we will use the domain_userid field inside the clickstream event, which is a unique identifier generated by the JavaScript tag. This field is a randomly generated universally unique identifier (UUID) that uniquely identifies each user. Therefore we know that all clickstream events with the same domain_userid value belong to the same user. As you shall see, we will use this value to have Pulsar’s Key_Shared subscription to group all of the user’s events together for us.",[40,68840,68842],{"id":68841},"implementation","Implementation",[48,68844,68845],{},"In order to perform behavioral analytics we need to have a complete picture of the user's interaction with our website. 
Therefore we need to ensure that we are grouping all of the clicks for an individual user together and delivering them to the same consumer. As we discussed in the last section, the domain_userid field inside each clickstream event contains a user’s unique identifier. By using this value as the message key we are guaranteed to have all of the same user’s events delivered to the same consumer when we use a Key_Shared subscription.",[32,68847,68849],{"id":68848},"data-enrichment","Data Enrichment",[48,68851,68852],{},"The JSON objects collected from the JavaScript tags and forwarded by the tag collector only contain raw JSON bytes (the key field is empty). Therefore, in order to utilize the Key_Shared subscription, we first need to enrich these messages by populating the message key with the value of the domain_userid field inside each JSON object.",[8325,68854,68857],{"className":68855,"code":68856,"language":8330},[8328],"import org.apache.pulsar.functions.api.Context;\nimport org.apache.pulsar.functions.api.Function;\nimport com.fasterxml.jackson.databind.ObjectMapper;\nimport com.manning.pulsar.chapter4.types.TrackingTag;\nimport org.apache.pulsar.client.impl.schema.JSONSchema;\npublic class WebTagEnricher implements Function {\n    static final String TOPIC = \"persistent:\u002F\u002Ftracking\u002Fweb-activity\u002Ftags\";\n    @Override\n    public Void process(String json, Context ctx) throws Exception {\n    ObjectMapper objectMapper = new ObjectMapper();\n    TrackingTag tag = objectMapper.readValue(json, TrackingTag.class);\n        \n    ctx.newOutputMessage(TOPIC, JSONSchema.of(TrackingTag.class))\n        .key(tag.getDomainUserId())\n        .value(tag)\n        .send();\n        \n    return null;\n    }\n}\n",[4926,68858,68856],{"__ignoreMap":18},[48,68860,68861],{},"Listing 2: The Pulsar Function converts the raw tag bytes into a JSON object, and copies the value of the domain_userid field into the key field of the outgoing message.",[48,68863,68864],{},"This can be accomplished via a relatively simplistic piece of code as shown in Listing 2, which parses the JSON object, grabs the value of the domain_userid field, and outputs a new message containing the original clickstream event that has a key that is populated with the user’s UUID. This type of logic is a perfect use case for Pulsar Functions. Moreover, since the logic is stateless, it can be performed in parallel using the Shared subscription type, which will minimize the processing latency required to perform this task.",[32,68866,68868],{"id":68867},"identity-stitching-with-the-key_shared-subscription","Identity Stitching with the Key_Shared Subscription",[48,68870,68871],{},"Once we have properly enriched the messages containing the clickstream events with the correct key value, the next step is to confirm that the Key_Shared subscription is performing the identity stitching for us. 
The code in Listing 3 starts a total of five consumers on the Key_Shared subscription.",[8325,68873,68876],{"className":68874,"code":68875,"language":8330},[8328],"public class ClickstreamAggregator {\n  static final String PULSAR_SERVICE_URL = \"pulsar:\u002F\u002Flocalhost:6650\";\n  static final String MY_TOPIC = \"persistent:\u002F\u002Ftracking\u002Fweb-activity\u002Ftags\\\"\";\n  static final String SUBSCRIPTION = \"aggregator-sub\";\n  public static void main() throws PulsarClientException {\n    PulsarClient client = PulsarClient.builder()\n          .serviceUrl(PULSAR_SERVICE_URL)\n          .build();\n    ConsumerBuilder consumerBuilder = \n       client.newConsumer(JSONSchema.of(TrackingTag.class))\n            .topic(MY_TOPIC)\n            .subscriptionName(SUBSCRIPTION)\n            .subscriptionType(SubscriptionType.Key_Shared)\n            .messageListener(new TagMessageListener());\n    \n       IntStream.range(0, 4).forEach(i -> {\n        String name = String.format(\"mq-consumer-%d\", i);\n          try {\n            consumerBuilder\n                .consumerName(name)\n                .subscribe();\n          } catch (PulsarClientException e) {\n            e.printStackTrace();\n           }\n       });\n    }\n}\n",[4926,68877,68875],{"__ignoreMap":18},[48,68879,68880],{},"Listing 3: The main class launches the consumers on the same Key_Shared subscription using the MessageListener interface that runs them inside an internal thread pool.",[48,68882,68883],{},"The processing logic that is executed when a new event arrives exists inside the TagMessageListener class, which is shown below. Since the consumer will most likely be assigned multiple keys, the incoming clickstream events need to be stored inside an internal map that uses the UUID for each web page visitor as the key. Therefore we decided to practice a bit of defensive programming by using the least recently used (LRU) map implementation from the Apache Commons library which ensures that the map remains a fixed size by removing the oldest elements in the event as it becomes full.",[8325,68885,68888],{"className":68886,"code":68887,"language":8330},[8328],"import org.apache.commons.collections4.map.LRUMap;\nimport org.apache.pulsar.client.api.Consumer;\nimport org.apache.pulsar.client.api.Message;\nimport org.apache.pulsar.client.api.MessageListener;\npublic class TagMessageListener implements MessageListener {\n  private LRUMap> userActivity = \n    new LRUMap>(100);\n  @Override\n  public void received(Consumer consumer, \n    Message msg) {\n    try {\n      recordEvent(msg.getValue());\n      invokeML(msg.getValue().getDomainUserId());\n      consumer.acknowledge(msg);\n    } catch (PulsarClientException e) {\n      e.printStackTrace();\n    }\n  }\n  private void recordEvent(TrackingTag event) {\n    if (!userActivity.containsKey(event.getDomainUserId())) {\n      userActivity.put(event.getDomainUserId(), \n           new ArrayList ());\n    }       \n    userActivity.get(event.getDomainUserId()).add(event);\n  }\n  \u002F\u002F Invokes the ML model with the collected events for the user    \n  private void invokeML(String domainUserId) {\n    . . .\n  } \n}\n",[4926,68889,68887],{"__ignoreMap":18},[48,68891,68892],{},"Listing 4: The class responsible for aggregating the clickstream events uses an LRU map to sort the events by user id. Each new event is appended to the list of previous events. 
These lists can then be fed through a machine learning model to produce a recommendation.",[48,68894,68895],{},"When a new event arrives, it is added to the clickstream for the corresponding user. Thereby reconstructing the clickstream for the user keys that have been assigned to the consumer.",[32,68897,68899],{"id":68898},"real-time-behavior-analytics","Real-Time Behavior Analytics",[48,68901,68902],{},"Now that we have reconstructed the clickstreams, we can feed them to a machine learning model that will provide a targeted recommendation for each visitor to our company’s web site. This might be a suggestion for an item to add to cart based on items in the cart, a recently viewed item, or a coupon. With real-time behavioural analytics we are able to improve the user experience through personalized recommendations, which helps to increase conversion and the average order size.",[40,68904,319],{"id":316},[48,68906,68907],{},"Traditional messaging enables scalable processing via multiple concurrent consumers on a topic. A classic use case for this is the traditional work queue of incoming e-commerce orders that need to be processed by an order fulfillment microservice. You should use Pulsar’s Shared subscription for this type of use case.",[48,68909,68910],{},"Traditional streaming provides stateful processing with a single consumer on a topic, but with a tradeoff in throughput. Streaming is used for its more complex analytical processing capabilities. Pulsar’s Exclusive and Failover subscriptions are designed to support this semantic.",[48,68912,68913],{},"Pulsar’s Key_Shared subscription type allows you to have both high throughput and stateful processing on a single topic. It is a good fit for use cases that require you to perform stateful processing on high-volumes of data such as personalization, real-time marketing, micro-targeted advertising, and cyber security.",[40,68915,36477],{"id":36476},[1666,68917,68918,68927,68934,68938,68945],{},[324,68919,68920,68921,68926],{},"To learn more about Pulsar’s Key_Shared subscription, ",[55,68922,68925],{"href":68923,"rel":68924},"https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=wDrBh7Y-l4g",[264],"watch this video"," from Matteo Merli, CTO of StreamNative and PMC Chair of Apache Pulsar.",[324,68928,3931,68929,68933],{},[55,68930,68932],{"href":66403,"rel":68931},[264],"Read the Apache Pulsar documentation"," to learn more about Pulsar’s different subscription types.",[324,68935,62227,68936,190],{},[55,68937,51627],{"href":62230},[324,68939,62233,68940,62236,68942],{},[55,68941,38404],{"href":38403},[55,68943,3931],{"href":45212,"rel":68944},[264],[324,68946,68947,62252],{},[55,68948,62251],{"href":31912,"rel":68949},[264],{"title":18,"searchDepth":19,"depth":19,"links":68951},[68952,68953,68958,68959,68964,68969,68970],{"id":8923,"depth":19,"text":8924},{"id":68672,"depth":19,"text":68673,"children":68954},[68955,68956,68957],{"id":57922,"depth":279,"text":68682},{"id":57917,"depth":279,"text":68703},{"id":68724,"depth":279,"text":68725},{"id":68740,"depth":19,"text":68741},{"id":68771,"depth":19,"text":68772,"children":68960},[68961,68962,68963],{"id":68778,"depth":279,"text":68779},{"id":68797,"depth":279,"text":68798},{"id":68828,"depth":279,"text":68829},{"id":68841,"depth":19,"text":68842,"children":68965},[68966,68967,68968],{"id":68848,"depth":279,"text":68849},{"id":68867,"depth":279,"text":68868},{"id":68898,"depth":279,"text":68899},{"id":316,"depth":19,"text":319},{"id":36476,"depth":19,"text":36477},"2021-08-25","Different Pulsar 
subscription types support different messaging and streaming use cases. The Key_Shared subscription enables stateful processing on high-volumes of data.","\u002Fimgs\u002Fblogs\u002F63c7fc0ec443b03bbf1183df_63b34d666de3bb6dca1051af_screen-shot-2021-08-25-at-8.28.23-pm.png",{},"\u002Fblog\u002Fscalable-stream-processing-pulsars-key-shared-subscription",{"title":68647,"description":68972},"blog\u002Fscalable-stream-processing-pulsars-key-shared-subscription",[821],"w3NaCQfSNdafOaQu164MLQAJsNplda0i2_5VrqazFrk",{"id":68981,"title":68982,"authors":68983,"body":68984,"category":821,"createdAt":290,"date":69340,"description":69341,"extension":8,"featured":294,"image":69342,"isDraft":294,"link":290,"meta":69343,"navigation":7,"order":296,"path":69344,"readingTime":42793,"relatedResources":290,"seo":69345,"stem":69346,"tags":69347,"__hash__":69348},"blogs\u002Fblog\u002Fwhats-new-in-apache-pulsar-2-7-3.md","What’s New in Apache Pulsar 2.7.3",[54455,61300],{"type":15,"value":68985,"toc":69325},[68986,68989,68992,68994,69012,69020,69022,69024,69033,69041,69047,69055,69064,69072,69079,69087,69096,69104,69113,69121,69130,69138,69140,69149,69157,69160,69169,69177,69186,69194,69203,69211,69220,69228,69230,69239,69247,69251,69260,69268,69272,69281,69289,69293,69302,69310,69312],[40,68987,68982],{"id":68988},"whats-new-in-apache-pulsar-273",[48,68990,68991],{},"The Apache Pulsar community releases version 2.7.3! 34 contributors provided improvements and bug fixes that delivered 79 commits.",[32,68993,7930],{"id":7929},[321,68995,68996,69004],{},[324,68997,68998,68999],{},"Cursor reads adhere to the dispatch byte rate limiter setting and no longer cause unexpected results. ",[55,69000,69003],{"href":69001,"rel":69002},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F11249",[264],"PR-11249",[324,69005,69006,69007],{},"The ledger rollover scheduled task runs as expected. ",[55,69008,69011],{"href":69009,"rel":69010},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F11226",[264],"PR-11226",[48,69013,69014,69015,190],{},"This blog walks through the most noteworthy changes. For the complete list including all enhancements and bug fixes, check out the ",[55,69016,69019],{"href":69017,"rel":69018},"https:\u002F\u002Fpulsar.apache.org\u002Frelease-notes\u002F#273-mdash-2021-07-27-a-id273a",[264],"Pulsar 2.7.3 Release Notes",[40,69021,61003],{"id":61002},[32,69023,61065],{"id":61064},[3933,69025,69027,69028],{"id":69026},"cursor-reads-adhere-to-the-dispatch-byte-rate-limiter-setting-pr-9826","Cursor reads adhere to the dispatch byte rate limiter setting. ",[55,69029,69032],{"href":69030,"rel":69031},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F9826",[264],"PR-9826",[321,69034,69035,69038],{},[324,69036,69037],{},"Issue: When using byte rates, the dispatch rates were not respected (regardless of being a namespace or topic policy).",[324,69039,69040],{},"Resolution: Fixed behavior of dispatch byte rate limiter setting. 
Cursor reads adhere to the setting and no longer cause unexpected results.",[3933,69042,69006,69044],{"id":69043},"the-ledger-rollover-scheduled-task-runs-as-expected-pr-11226",[55,69045,69011],{"href":69009,"rel":69046},[264],[321,69048,69049,69052],{},[324,69050,69051],{},"Issue: Previously, the ledger rollover scheduled task was executed before reaching the ledger maximum rollover time, which caused the ledger not to roll over in time.",[324,69053,69054],{},"Resolution: Fixed the timing of the ledger rollover schedule, so the task runs only after the ledger is created successfully.",[3933,69056,69058,69059],{"id":69057},"the-topic-level-retention-policy-works-correctly-when-restarting-a-broker-pr-11136","The topic-level retention policy works correctly when restarting a broker. ",[55,69060,69063],{"href":69061,"rel":69062},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F11136",[264],"PR-11136",[321,69065,69066,69069],{},[324,69067,69068],{},"Issue: Previously, when setting a topic-level retention policy for a topic and then restarting the broker, the topic-level retention policy did not work.",[324,69070,69071],{},"Resolution: Fixed behavior of the policy so it replays all policy messages after initiating policyCacheInitMap and added a retention policy check test when restarting the broker.",[3933,69073,69075,69076],{"id":69074},"the-lastmessageid-api-call-no-longer-causes-a-memory-leak-pr-10977","The lastMessageId API call no longer causes a memory leak. ",[55,69077,68161],{"href":68159,"rel":69078},[264],[321,69080,69081,69084],{},[324,69082,69083],{},"Issue: Previously, there was a memory leak when calling the lastMessageId API, which caused the broker process to be stopped by Kubernetes.",[324,69085,69086],{},"Resolution: Added the missing entry.release() call to PersistentTopic.getLastMessageId to ensure the broker does not run out of memory.",[3933,69088,69090,69091],{"id":69089},"zookeeper-reads-are-cached-by-brokers-pr-10594","ZooKeeper reads are cached by brokers. ",[55,69092,69095],{"href":69093,"rel":69094},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F10594",[264],"PR-10594",[321,69097,69098,69101],{},[324,69099,69100],{},"Issue: When performing the admin operation to get the namespace of a tenant, ZooKeeper reads were issued on the ZooKeeper client and not getting cached by the brokers.",[324,69102,69103],{},"Resolution: Fixed ZooKeeper caching when fetching a list of namespaces for a tenant.",[3933,69105,69107,69108],{"id":69106},"monitoring-threads-that-call-leaderserviceisleader-are-no-longer-blocked-pr-10512","Monitoring threads that call LeaderService.isLeader() are no longer blocked. ",[55,69109,69112],{"href":69110,"rel":69111},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F10512",[264],"PR-10512",[321,69114,69115,69118],{},[324,69116,69117],{},"Issue: When LeaderService changed leadership status, it was locked with a synchronized block, which also blocked other threads calling LeaderService.isLeader().",[324,69119,69120],{},"Resolution: Fixed the deadlock condition on the monitoring thread so it is not blocked by LeaderService.isLeader() by modifyingClusterServiceCoordinatorandWorkerStatsManagerto check if it is a leader fromMembershipManager`.",[3933,69122,69124,69125],{"id":69123},"hasmessageavailable-can-read-messages-successfully-pr-10414","hasMessageAvailable can read messages successfully. 
",[55,69126,69129],{"href":69127,"rel":69128},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F10414",[264],"PR-10414",[321,69131,69132,69135],{},[324,69133,69134],{},"Issue: When hasMessageAvailableAsync returned true, it could not read messages because messages were filtered by acknowledgmentsGroupingTracker.",[324,69136,69137],{},"Resolution: Fixed the race conditions by modifying acknowledgmentsGroupingTracker to filter duplicate messages, and then cleanup the messages when the connection is open.",[32,69139,68241],{"id":68240},[3933,69141,69143,69144],{"id":69142},"proxy-supports-creating-partitioned-topics-automatically-pr-8048","Proxy supports creating partitioned topics automatically. ",[55,69145,69148],{"href":69146,"rel":69147},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F8048",[264],"PR-8048",[321,69150,69151,69154],{},[324,69152,69153],{},"Issue: Proxies were not creating partitions because they were using the current ZooKeeper metadata.",[324,69155,69156],{},"Resolution: Changed the proxy to handle PartitionMetadataRequest by selecting and fetching from an available broker instead of using current ZooKeeper metadata.",[32,69158,69159],{"id":38169},"Pulsar admin",[3933,69161,69163,69164],{"id":69162},"flag-added-to-indicate-whether-or-not-to-create-a-metadata-path-on-replicated-clusters-pr-11140","Flag added to indicate whether or not to create a metadata path on replicated clusters. ",[55,69165,69168],{"href":69166,"rel":69167},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F11140",[264],"PR-11140",[321,69170,69171,69174],{},[324,69172,69173],{},"Issue: When creating a partitioned topic in a replicated namespace, it did not create a metadata path \u002Fmanaged-ledgers on replicated clusters.",[324,69175,69176],{},"Resolution: Added a flag (createLocalTopicOnly) to indicate whether or not to create a metadata path for a partitioned topic in replicated clusters.",[3933,69178,69180,69181],{"id":69179},"a-topic-policy-can-no-longer-be-set-for-a-non-existent-topic-pr-11131","A topic policy can no longer be set for a non-existent topic. ",[55,69182,69185],{"href":69183,"rel":69184},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F11131",[264],"PR-11131",[321,69187,69188,69191],{},[324,69189,69190],{},"Issue: Due to a redirect loop in a topic policy, you can set a policy for a non-existing topic or a partition of a partitioned topic.",[324,69192,69193],{},"Resolution: The fix added an authoritative flag for a topic policy to avoid a redirect loop. You can not set a topic policy for a non-existent topic or a partition of a partitioned topic. If you set a topic policy for a partition of a 0-partition topic, it redirects to the broker.",[3933,69195,69197,69198],{"id":69196},"discovery-service-no-longer-hard-codes-the-topic-domain-as-persistent-pr-10806","Discovery service no longer hard codes the topic domain as persistent. ",[55,69199,69202],{"href":69200,"rel":69201},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F10806",[264],"PR-10806",[321,69204,69205,69208],{},[324,69206,69207],{},"Issue: When using the lookup discovery service for a partitioned non-persistent topic, it returned zero rather than the number of partitions. The Pulsar client tried to connect to the topic as if it were a normal topic.",[324,69209,69210],{},"Resolution: Implemented topicName.getDomain().value() rather than hard coding persistent. 
Now you can use the discovery service for a partitioned, non-persistent topic successfully.",[3933,69212,69214,69215],{"id":69213},"other-connectors-can-now-use-the-kinesis-backoff-class-pr-10744","Other connectors can now use the Kinesis Backoff class. ",[55,69216,69219],{"href":69217,"rel":69218},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F10744",[264],"PR-10744",[321,69221,69222,69225],{},[324,69223,69224],{},"Issue: The Kinesis sink connector Backoff class in the Pulsar client implementation project in combination with the dependency org.apache.pulsar:pulsar-client-original increased the connector size.",[324,69226,69227],{},"Resolution: Added a new class Backoff in the function io-core project so that the Kinesis sink connector and other connectors can use the class.",[32,69229,60409],{"id":68276},[3933,69231,69233,69234],{"id":69232},"a-flow-request-with-zero-permits-can-not-be-sent-pr-10506","A FLOW request with zero permits can not be sent. ",[55,69235,69238],{"href":69236,"rel":69237},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F10506",[264],"PR-10506",[321,69240,69241,69244],{},[324,69242,69243],{},"Issue: When a broker received a FLOW request with zero permits, an exception was thrown and then the connection was closed. This triggered frequent reconnections and caused duplicate or out-of-order messages.",[324,69245,69246],{},"Resolution: Added a validation that verifies the permits of a FLOW request before sending it. If the permit is zero, the FLOW request can not be sent.",[32,69248,69250],{"id":69249},"function-and-connector","Function and connector",[3933,69252,69254,69255],{"id":69253},"the-kinesis-sink-connector-acknowledges-successful-messages-pr-10769","The Kinesis sink connector acknowledges successful messages. ",[55,69256,69259],{"href":69257,"rel":69258},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F10769",[264],"PR-10769",[321,69261,69262,69265],{},[324,69263,69264],{},"Issue: The Kinesis sink connector did not acknowledge messages after they were sent successfully.",[324,69266,69267],{},"Resolution: Added acknowledgement for the Kinesis sink connector once a message is sent successfully.",[32,69269,69271],{"id":69270},"docker","Docker",[3933,69273,69275,69276],{"id":69274},"function-name-length-cannot-exceed-52-characters-when-using-kubernetes-runtime-pr-10531","Function name length cannot exceed 52 characters when using Kubernetes runtime. ",[55,69277,69280],{"href":69278,"rel":69279},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F10531",[264],"PR-10531",[321,69282,69283,69286],{},[324,69284,69285],{},"Issue: When using Kubernetes runtime, if a function was submitted with a valid length (less than 55 characters), a StatefulSet was created but it was unable to spawn pods.",[324,69287,69288],{},"Resolution: Changed the maximum length of a function name from 55 to 53 characters for Kubernetes runtime. With this fix, the length of a function name can not exceed 52 characters.",[32,69290,69292],{"id":69291},"dependency","Dependency",[3933,69294,69296,69297],{"id":69295},"pulsar-admin-connection-to-proxy-is-stable-when-tls-is-enabled-pr-10907","pulsar-admin connection to proxy is stable when TLS is enabled. 
",[55,69298,69301],{"href":69299,"rel":69300},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F10907",[264],"PR-10907",[321,69303,69304,69307],{},[324,69305,69306],{},"Issue: pulsar-admin was unstable over the TLS connection because of the Jetty bug in SSL buffering introduced in Jetty 9.4.39. It caused large function jar uploads to fail frequently.",[324,69308,69309],{},"Resolution: Upgraded Jetty to 9.4.42.v20210604, so that pulsar-admin connection to proxy is stable when TLS is enabled.",[40,69311,39647],{"id":39646},[48,69313,57767,69314,69317,69318,69321,69322,69324],{},[55,69315,57771],{"href":53730,"rel":69316},[264]," or you can spin up for a Pulsar cluster on StreamNative Cloud with a free 30-day trial of ",[55,69319,3550],{"href":61568,"rel":69320},[264],"! Moreover, we offer technical consulting and expert training to help get your organization started. As always, we are highly responsive to your feedback. Feel free to ",[55,69323,24379],{"href":57778}," if you have any questions at any time. Look forward to hearing from you and stay tuned for the next Pulsar release!",{"title":18,"searchDepth":19,"depth":19,"links":69326},[69327,69330,69339],{"id":68988,"depth":19,"text":68982,"children":69328},[69329],{"id":7929,"depth":279,"text":7930},{"id":61002,"depth":19,"text":61003,"children":69331},[69332,69333,69334,69335,69336,69337,69338],{"id":61064,"depth":279,"text":61065},{"id":68240,"depth":279,"text":68241},{"id":38169,"depth":279,"text":69159},{"id":68276,"depth":279,"text":60409},{"id":69249,"depth":279,"text":69250},{"id":69270,"depth":279,"text":69271},{"id":69291,"depth":279,"text":69292},{"id":39646,"depth":19,"text":39647},"2021-08-11","We are excited to see the Apache Pulsar community has successfully released the 2.7.3 version! 34 contributors provided improvements and bug fixes that contributed to 79 commits. 
Let's walk through the most noteworthy changes!","\u002Fimgs\u002Fblogs\u002F63c7fc2075f807d1f63731fc_63b34c436276822a107ad9e5_273-top.jpeg",{},"\u002Fblog\u002Fwhats-new-in-apache-pulsar-2-7-3",{"title":68982,"description":69341},"blog\u002Fwhats-new-in-apache-pulsar-2-7-3",[302,821],"Ol3xsVtakazIrgdWCaqeN8U2O47sUBHzT55RHfmOBJ8",{"id":69350,"title":69351,"authors":69352,"body":69354,"category":821,"createdAt":290,"date":69502,"description":69503,"extension":8,"featured":294,"image":69504,"isDraft":294,"link":290,"meta":69505,"navigation":7,"order":296,"path":69506,"readingTime":5505,"relatedResources":290,"seo":69507,"stem":69508,"tags":69509,"__hash__":69510},"blogs\u002Fblog\u002Fdavid-kjerrumgaard-pulsar-inauthor-talks-all-things-pulsar.md","Pulsar In Action Author, David Kjerrumgaard Talks All Things Pulsar",[69353],"Carolyn King",{"type":15,"value":69355,"toc":69498},[69356,69360,69363,69368,69371,69376,69379,69384,69387,69393,69398,69401,69406,69409,69412,69415,69418,69423,69426,69429,69434,69437,69440,69443,69448,69451,69458,69463,69466,69471,69474,69478,69481,69484,69496],[40,69357,69359],{"id":69358},"interview","Interview",[48,69361,69362],{},"In this blog we talk to David Kjerrumgaard, long-time Pulsar user and author of the book Pulsar in Action, a Manning Publication, to get his insights on the messaging and streaming space, the trends driving Pulsar adoption, and his new role as a Developer Advocate at StreamNative.",[916,69364,69365],{},[48,69366,69367],{},"Q: Before we jump in, let’s start with your background.",[48,69369,69370],{},"A: Over the past decade, I have had the opportunity to architect stream processing solutions for Fortune 500 companies across a variety of industries. First, as part of the professional services team at Hortonworks using a combination of Apache NiFi, Storm and Kafka, and later at a startup called Streamlio that focused on Apache Pulsar and Heron. Streamlio was acquired by Splunk to build out the messaging layer of its stream processing offering that is responsible for processing over 10 terabytes of data per day.",[916,69372,69373],{},[48,69374,69375],{},"Q: You’ve been working on Pulsar since 2017. Can you tell us about the early days with Pulsar?",[48,69377,69378],{},"A: To give some context, Pulsar was committed to open-source by Yahoo in 2016 and when I joined the team at Streamlio in 2017, we were initially focused on the Apache Heron distributed computing framework. Based on the feedback from our customers, we quickly pivoted to Apache Pulsar to address the gap in the market for a unified messaging and streaming platform. We spent the next 16 months maturing the Pulsar project and building the community. In 2018 Pulsar became a top-level project at the Apache Software foundation.",[916,69380,69381],{},[48,69382,69383],{},"Q: People have strong opinions on the Pulsar versus Kafka debate. What is your perspective?",[48,69385,69386],{},"A: It’s an interesting debate. Some people look at Kafka and think that its widespread adoption is because the tech is better or superior in some way. The reality is that Kafka was released five or six years ahead of Pulsar and the community has had more time to mature. I believe that Pulsar is on the same trajectory that Kafka was at this point in its evolution. In fact, Pulsar growth has skyrocketed over the last few years and if you look at the projects today, the tables are turning.",[48,69388,69389,69390],{},"Kjerrumgaard is supported by the numbers here. 
In June 2021, Apache Pulsar surpassed Apache Kafka in its number of monthly active contributors.\n",[384,69391],{"alt":18,"src":69392},"\u002Fimgs\u002Fblogs\u002F63b34b69b9ddc83cac270e33_1.png",[916,69394,69395],{},[48,69396,69397],{},"Q: That is quite a change. Can you share your perspective on why Pulsar is so popular?",[48,69399,69400],{},"A: Pulsar’s cloud-native architecture has many advantages over existing legacy messaging systems that were designed to run on physical servers. The growing popularity of cloud and container-based deployments has accelerated the adoption of Pulsar because it is designed to run in these environments. If you are a Kafka or Confluent organization today and you’re moving to the cloud, you’re going to consider Pulsar.",[916,69402,69403],{},[48,69404,69405],{},"Q: What are the advantages that Pulsar has in a cloud environment?",[48,69407,69408],{},"A: Every messaging system consists of two distinct “layers”, a serving layer that is responsible for receiving and delivering messages to clients, and a storage layer that retains the messages on disk until they are consumed.",[48,69410,69411],{},"Traditional messaging systems such as Kafka or RabbitMQ are designed to have these two layers running alongside one another on the same physical node in order to eliminate the need for an additional network “hop” to retrieve the data from storage. Today, the minor speed advantage you gain from a single-tier architecture is outweighed by the lack of scalability it imposes.",[48,69413,69414],{},"Apache Pulsar decouples the serving and storage layers, allowing them to run independently inside separate containers which makes it easier to deploy and dynamically scale in the cloud. Separating the layers also allows the serving layer to be completely stateless, meaning that any node can serve any message because the data is located on a different layer, only one network call away.",[48,69416,69417],{},"Pulsar’s independent layers can fully exploit the elasticity of today’s modern cloud computing environments by dynamically adding or removing capacity in either the serving or storage layers. This can be done automatically by leveraging existing tools such as Kubernetes horizontal pod autoscaler.",[916,69419,69420],{},[48,69421,69422],{},"Q: If you were to name Pulsar’s biggest differentiator, what would it be?",[48,69424,69425],{},"A: Versatility is a big differentiator for Pulsar. Not only is it the only messaging platform that supports both pub\u002Fsub and streaming message consumption patterns, but it’s pluggable protocol handler allows it to support a variety of common messaging protocols such as AMQP, MQTT, JMS, and Kafka.",[48,69427,69428],{},"All other messaging systems only support one messaging consumption pattern and one binary messaging protocol. A common driver of Pulsar adoption is migration away from multiple messaging systems onto a unified messaging platform based on Apache Pulsar. 
Companies are looking to eliminate the need to maintain both a system for pub\u002Fsub messaging such as RabbitMQ and another one for streaming such as Apache Kafka.",[916,69430,69431],{},[48,69432,69433],{},"Q: How difficult is it to move from other streaming and messaging technologies to Pulsar?",[48,69435,69436],{},"A: More often than not, the organization has developed several business critical applications based upon the technology and so they are tied to a particular API which makes migration difficult.",[48,69438,69439],{},"Apache Pulsar’s ability to support legacy messaging protocols streamlines the migration process by allowing you to run your existing applications with minimal code changes. If you are migrating an application that uses one of the wire protocols that Pulsar supports then the only changes that need to be made to your code are API related.",[48,69441,69442],{},"If you are migrating an existing Kafka application that uses the Java client, you can use Pulsar’s Kafka Adaptor that provides a 100% Kafka compatible API. Using this adapter, any existing Java code will work without any changes needed.",[916,69444,69445],{},[48,69446,69447],{},"Q: Your passion for the space is apparent. Can you tell me about the decision to join StreamNative?",[48,69449,69450],{},"A: It is great to see the Pulsar market taking off, and multiple companies offering Apache Pulsar as a service. What distinguishes StreamNative from the competition is the caliber of talent. Not only do we have two of the original creators of Apache Pulsar, but we have more Apache committers than anyone else which means we are the center of gravity for the Apache project overall.",[48,69452,69453,69454,69457],{},"Having worked with Matteo Merli [Apache Pulsar Chair and StreamNative CTO) and Sijie Guo ",[2628,69455,69456],{},"Apache Pulsar Member and StreamNative CEO"," in the past, I knew that their technical expertise was second to none in this space and I knew that I couldn’t pass up the opportunity to collaborate with them again. Pivoting from an individual contributor role at Splunk to a Developer Advocate at StreamNative will allow me to have a bigger impact on the Apache Pulsar community at a time when adoption is accelerating.",[916,69459,69460],{},[48,69461,69462],{},"Q: Can you tell us about the StreamNative offering?",[48,69464,69465],{},"A: StreamNative is powered by Pulsar and provides a cloud-native, real-time messaging and streaming platform to support multi-cloud and hybrid cloud strategies. We offer both Cloud and Platform products so you can choose cloud, on-prem, or a hybrid of both.",[916,69467,69468],{},[48,69469,69470],{},"Q: Last question, what makes StreamNative Cloud & Platform exciting?",[48,69472,69473],{},"A: The StreamNative products are a game-changer, because they enable organizations to unlock the power of Apache Pulsar with a turnkey, enterprise offering across cloud, hybrid, and on-premise environments without the heavy lift from the DevOps teams.",[40,69475,69477],{"id":69476},"in-summary","In Summary",[48,69479,69480],{},"David’s experience developing real-time messaging, streaming, Edge\u002FIoT, and Big Data solutions for customers across a broad range of industries will be beneficial to both the StreamNative team and will help ensure the success of StreamNative’s customers.",[48,69482,69483],{},"His upcoming book, Pulsar in Action, will be available in print by Manning Publications in December. For a sneak peek, visit StreamNative.io. 
We are a proud sponsor of the book and are excited to offer an early release. Check our site in mid-August to get your free download.",[48,69485,69486,69487,1186,69491,69495],{},"To stay up-to-date on Kjerrumgaard’s upcoming talks and webinars, we encourage you to join the ",[55,69488,69490],{"href":34070,"rel":69489},[264],"StreamNative mailing list",[55,69492,69494],{"href":16156,"rel":69493},[264],"StreamNative Community Slack Channel",", and follow us on Twitter at @streamnative.io.",[48,69497,3931],{},{"title":18,"searchDepth":19,"depth":19,"links":69499},[69500,69501],{"id":69358,"depth":19,"text":69359},{"id":69476,"depth":19,"text":69477},"2021-08-03","We talked to David Kjerrumgaard, long-time Pulsar user and author of the book Pulsar in Action, a Manning Publication, to get his insights on the messaging and streaming space, the trends driving Pulsar adoption, and his new role as a Developer Advocate at StreamNative.","\u002Fimgs\u002Fblogs\u002F63c7fc2fdcf6a47c210f3203_63b34b69ef94af509e8c846c_top.png",{},"\u002Fblog\u002Fdavid-kjerrumgaard-pulsar-inauthor-talks-all-things-pulsar",{"title":69351,"description":69503},"blog\u002Fdavid-kjerrumgaard-pulsar-inauthor-talks-all-things-pulsar",[7347,821],"fNLPT3rVDUeHUxPtSaG1jvSrAZ_WWPM_no9ZMO1QSK8",{"id":69512,"title":69513,"authors":69514,"body":69516,"category":7338,"createdAt":290,"date":69502,"description":69609,"extension":8,"featured":294,"image":69610,"isDraft":294,"link":290,"meta":69611,"navigation":7,"order":296,"path":69612,"readingTime":7986,"relatedResources":290,"seo":69613,"stem":69614,"tags":69615,"__hash__":69616},"blogs\u002Fblog\u002Flocal-apache-pulsar-2-8-0-release-party.md","Join Us to Organize Your Local Apache Pulsar 2.8.0 Release Party",[69353,69515],"Dianjin Wang",{"type":15,"value":69517,"toc":69602},[69518,69526,69529,69532,69535,69538,69572,69574,69577,69585,69588,69590,69597,69599],[48,69519,69520,69521,190],{},"We are excited to share that Apache Pulsar 2.8.0 was released in June. This release was a major milestone for the Pulsar community, with lots of great upgrades and enhancements, such as transaction API, Broker Entry Metadata, New protobuf Code Generator, and more! You can see the details from the ",[55,69522,69525],{"href":69523,"rel":69524},"https:\u002F\u002Fpulsar.apache.org\u002Fblog\u002F2021\u002F06\u002F12\u002FApache-Pulsar-2-8-0\u002F",[264],"Pulsar official website blog",[48,69527,69528],{},"To celebrate, we decided to organize a release party that would take place across a number of different cities. The 2.8.0 party is a great way to raise Pulsar awareness in your local area and provide an opportunity for Pulsar users and developers to come together as a community.",[40,69530,69531],{"id":2696},"How it works",[48,69533,69534],{},"There is no rigid format, it can be virtual or in-person, depending on local guidelines. Also, it can cover your local city, or just your companies, teams, and friends.",[48,69536,69537],{},"We have some helpful guidelines to help you organize a successful party:",[321,69539,69540,69543,69546,69555,69569],{},[324,69541,69542],{},"Organizers: We recommend you gather a group of organizers who can work together to plan each party. If you can, find a few friends to help you. Only you? It is possible to have a single organizer but we’ve found that having a few organizers can really help promote event success. It will also make planning much more fun!",[324,69544,69545],{},"Time & Date: When will the party be? 
If you want to organize the release party in your teams or companies, office hours will be ok for the participants. If you want to cover your local city, the weekend time would be better. Just pick the best time & date for your case.",[324,69547,69548,69549,69554],{},"In-person or Virtual: If in-person, you should take the first choice to have one place for free and take the event scale into consideration. No matter what it is, you should create one event in ",[55,69550,69553],{"href":69551,"rel":69552},"http:\u002F\u002Fmeetup.com\u002F",[264],"meetup.com"," or Zoom, Google Chat, or another platform.",[324,69556,69557,69558,1186,69563,69568],{},"Programming: Generally, it is good to have some talks or Demos on Pulsar at the release party. You can decide the sessions, but make sure they are around Pulsar. If you need to invite Pulsar experts to have a talk, ",[55,69559,69562],{"href":69560,"rel":69561},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fgraphs\u002Fcontributors",[264],"Pulsar contributors",[55,69564,69567],{"href":69565,"rel":69566},"https:\u002F\u002Fpulsar.apache.org\u002Fen\u002Fteam\u002F",[264],"committers and PMC members"," will be one of the best choices. We can help you invite them too.",[324,69570,69571],{},"Promote it: after all the things have been done, you should let more local people know about this party. You can write blog posts or microblogs, send your email to the mailing list or your target people, post the news in related slack channels… Also, let us know your event details and we will promote it via our channels as well. We also will provide the necessary design assets which you can customize.",[40,69573,50937],{"id":50936},[48,69575,69576],{},"We would like to ask you to write one event report to let us know the success of your party, and share the screenshots or recording videos if virtual or photos if in-person.",[48,69578,69579,69580,190],{},"How about this? If you want to organize one Apache Pulsar Release Party, then welcome to ",[55,69581,69584],{"href":69582,"rel":69583},"https:\u002F\u002Fshare.hsforms.com\u002F1wxDHMlnWQxKRAHg5K9yGdg3x5r4",[264],"submit your application",[48,69586,69587],{},"If you have any good ideas or suggestions, feel free to let us know. Let’s make Apache Pulsar great!",[32,69589,39828],{"id":39827},[48,69591,69592,69593,190],{},"Apache Pulsar is a cloud-native, distributed messaging and streaming platform that manages trillions of events per day. The Pulsar community has witnessed rapid growth since it became a top-level Apache project in 2018. In the past two years, the community growth has accelerated. In fact, in May ",[55,69594,69596],{"href":69595},"\u002Fen\u002Fblog\u002Fcommunity\u002F2021-06-14-pulsar-hits-its-400th-contributor-and-passes-kafka-in-monthly-active-contributors","Apache Pulsar hit 400 contributors and surpassed Apache Kafka in the number of Monthly Active Contributors",[32,69598,10248],{"id":10247},[48,69600,69601],{},"StreamNative is the organizer of Pulsar Summit North America 2021. Founded by the original developers of Apache Pulsar, the StreamNative team is committed to the Pulsar community. As the core developers of Pulsar, the StreamNative team is deeply versed in the technology, the community, and the use cases. 
The StreamNative team's unmatched operational experience on Pulsar and BookKeeper is now available to you through StreamNative Cloud and StreamNative Platform.",{"title":18,"searchDepth":19,"depth":19,"links":69603},[69604,69605],{"id":2696,"depth":19,"text":69531},{"id":50936,"depth":19,"text":50937,"children":69606},[69607,69608],{"id":39827,"depth":279,"text":39828},{"id":10247,"depth":279,"text":10248},"To celebrate the release of Pulsar 2.8.0, we decided to organize a release party that would take place across a number of different cities.","\u002Fimgs\u002Fblogs\u002F63c7fc3d62480951f4302a00_63b34a9a0a2ec80d24868021_top.jpeg",{},"\u002Fblog\u002Flocal-apache-pulsar-2-8-0-release-party",{"title":69513,"description":69609},"blog\u002Flocal-apache-pulsar-2-8-0-release-party",[821,9144],"DUVKubYdJppTFIoEI7r_e7SBhAdcvSdEDU6MaFwdG6o",{"id":69618,"title":69619,"authors":69620,"body":69621,"category":7338,"createdAt":290,"date":69763,"description":69764,"extension":8,"featured":294,"image":69765,"isDraft":294,"link":290,"meta":69766,"navigation":7,"order":296,"path":69767,"readingTime":11508,"relatedResources":290,"seo":69768,"stem":69769,"tags":69770,"__hash__":69771},"blogs\u002Fblog\u002Fannouncing-pulsar-virtual-summit-europe-2021-cfp-open.md","Announcing Pulsar Virtual Summit Europe 2021: CFP Is Open!",[69353],{"type":15,"value":69622,"toc":69752},[69623,69626,69629,69632,69635,69639,69642,69645,69659,69667,69671,69685,69687,69701,69708,69710,69715,69722,69726,69740,69742,69746,69748,69750],[48,69624,69625],{},"We’re excited to announce the first-ever Pulsar Virtual Summit Europe!",[48,69627,69628],{},"The inaugural Pulsar Summit North America took place in June 2020. Since then we’ve hosted the Pulsar Summit Asia in November 2020, and the Pulsar Summit North America in June 2021.",[48,69630,69631],{},"Last month’s Pulsar Summit attracted 550+ signups and 250+ companies, including Netflix, Adobe, Cisco, Disney, Oracle, Rakuten, Workday, Twitter, Lowes, CME Group, and Dell. The two-day, virtual event was packed with 6 keynotes and 33 breakout sessions from some of the biggest Pulsar users, such as Splunk, Intuit, Micro Focus, Narvar, Iterable, VMware, and Tencent.",[48,69633,69634],{},"Cumulatively, the Pulsar Summits drew more than 100 speakers, thousands of attendees, and hundreds of companies from diverse industries. The Pulsar Summit is the global event for engineers, architects, data scientists, and technical leaders interested in Pulsar and the messaging and streaming ecosystem. It is a unique opportunity to network and learn about Pulsar project updates, ecosystem developments, best practices, and adoption stories.",[40,69636,69638],{"id":69637},"speak-at-pulsar-virtual-summit-europe-2021","Speak at Pulsar Virtual Summit Europe 2021",[48,69640,69641],{},"Do you have a Pulsar story to share? Join us and speak at the summit. You will be on stage with top Pulsar thought-leaders and it is a great way to raise your profile in the rapidly growing Apache Pulsar community.",[48,69643,69644],{},"We are looking for Pulsar stories that are innovative, informative, and thought-provoking. 
Here are some suggestions:",[321,69646,69647,69650,69653,69656],{},[324,69648,69649],{},"Your use case \u002F success story",[324,69651,69652],{},"A technical deep dive",[324,69654,69655],{},"Pulsar best practices",[324,69657,69658],{},"Pulsar ecosystem updates",[48,69660,69661,69662,69666],{},"To speak at the summit, please ",[55,69663,56336],{"href":69664,"rel":69665},"https:\u002F\u002Fsessionize.com\u002Fpulsar-virtual-summit-europe-2021",[264]," about your presentation. Remember to keep your proposal relevant and engaging and limit it to 300 words.",[32,69668,69670],{"id":69669},"speaker-benefits-include","Speaker Benefits Include:",[321,69672,69673,69676,69679,69682],{},[324,69674,69675],{},"The opportunity to connect with and meet new people and thought leaders in your space.",[324,69677,69678],{},"The chance to demonstrate your experience and deep knowledge in the event streaming space.",[324,69680,69681],{},"Your name, title, company, and bio will be featured on the Pulsar Virtual Summit Europe 2021 website.",[324,69683,69684],{},"Your session will be added to the Pulsar Summit YouTube Channel and promoted on Twitter and LinkedIn.",[32,69686,39793],{"id":39792},[321,69688,69689,69692,69695,69698],{},[324,69690,69691],{},"CFP opened: June 16th, 2021",[324,69693,69694],{},"CFP closes: July 14th, 2021",[324,69696,69697],{},"Speaker notifications sent: July 21st, 2021",[324,69699,69700],{},"Schedule announcement: July 28th, 2021",[48,69702,69703,69704,39815],{},"If you want some advice or feedback on your proposal, or have any questions about the summit, please do not hesitate to contact us at ",[55,69705,69707],{"href":69706},"mailto:speakers@pulsar-summit.org","speakers@pulsar-summit.org",[40,69709,56379],{"id":56378},[48,69711,69712,69713,38617],{},"Pulsar Summit is a conference for the community and Sponsorship is needed. Sponsoring this event provides a great opportunity for your organization to further engage with the Apache Pulsar community. Contact us at ",[55,69714,39814],{"href":39813},[48,69716,69717,69718,69721],{},"Help us make #PulsarSummit Europe 2021 a big success by spreading the word and submitting your proposal! Follow us on ",[55,69719,39691],{"href":39821,"rel":69720},[264]," to receive the latest updates of the conference!",[32,69723,69725],{"id":69724},"stay-connected","Stay Connected",[321,69727,69728,69734],{},[324,69729,69730,69733],{},[55,69731,10265],{"href":34070,"rel":69732},[264]," for the Pulsar Newsletter.",[324,69735,69736,69739],{},[55,69737,62968],{"href":31692,"rel":69738},[264]," the Apache Pulsar community on Slack.",[32,69741,39828],{"id":39827},[48,69743,69592,69744,190],{},[55,69745,69596],{"href":69595},[32,69747,10248],{"id":10247},[48,69749,69601],{},[48,69751,3931],{},{"title":18,"searchDepth":19,"depth":19,"links":69753},[69754,69758],{"id":69637,"depth":19,"text":69638,"children":69755},[69756,69757],{"id":69669,"depth":279,"text":69670},{"id":39792,"depth":279,"text":39793},{"id":56378,"depth":19,"text":56379,"children":69759},[69760,69761,69762],{"id":69724,"depth":279,"text":69725},{"id":39827,"depth":279,"text":39828},{"id":10247,"depth":279,"text":10248},"2021-08-02","Pulsar Summit is coming to Europe! Submit a talk today and share your Pulsar story. 
CFP is open till July 14th, 2021.","\u002Fimgs\u002Fblogs\u002F63c7fc4a53f98a0671a482e7_63b34a311bdc1b2087a4f9b7_top.png",{},"\u002Fblog\u002Fannouncing-pulsar-virtual-summit-europe-2021-cfp-open",{"title":69619,"description":69764},"blog\u002Fannouncing-pulsar-virtual-summit-europe-2021-cfp-open",[5376,821],"ibNBELeiuUXJIfQ5RpmMWwPippO_i4b6LCO8mwzUHbk",{"id":69773,"title":69774,"authors":69775,"body":69776,"category":7338,"createdAt":290,"date":69973,"description":69974,"extension":8,"featured":294,"image":69975,"isDraft":294,"link":290,"meta":69976,"navigation":7,"order":296,"path":69977,"readingTime":11508,"relatedResources":290,"seo":69978,"stem":69979,"tags":69980,"__hash__":69981},"blogs\u002Fblog\u002Fhighlights-pulsar-virtual-summit-north-america-2021.md","Highlights from The Pulsar Virtual Summit North America 2021",[69353],{"type":15,"value":69777,"toc":69965},[69778,69781,69784,69787,69791,69797,69801,69807,69811,69814,69864,69866,69869,69925,69929,69932,69947,69954,69956,69963],[48,69779,69780],{},"Hosted by StreamNative and Splunk, the Pulsar Virtual Summit North America 2021, took place on June 16-17th. This 2-day event was packed with Pulsar project updates, ecosystem news, and insights into some of the largest and most exciting Pulsar deployments around the globe.",[48,69782,69783],{},"Since it became a top-level Apache Software Foundation project in 2018, the community has seen an influx of adoption as companies look to Apache Pulsar's cloud-native capabilities, unified messaging and streaming, and super-set of built-in features.",[48,69785,69786],{},"Attendees had the chance to learn from Pulsar committers, contributors, experts, and adopters. Whether you were a Pulsar veteran or a newbie, the event had something for you. Below we provide some of our favorite highlights from the summit.",[40,69788,69790],{"id":69789},"summit-by-the-numbers","Summit by the Numbers",[48,69792,69793],{},[384,69794],{"alt":69795,"src":69796},"image of key point of the pulsar summit north america 2021","\u002Fimgs\u002Fblogs\u002F63b2fc1eccfce6656b066e37_1.png",[40,69798,69800],{"id":69799},"pulsar-community-by-the-numbers","Pulsar Community by the Numbers",[48,69802,69803],{},[384,69804],{"alt":69805,"src":69806},"three graph to show growth of the pulsar community since 2017 ","\u002Fimgs\u002Fblogs\u002F63b2fc1e6630d6f07e7ab830_2.png",[40,69808,69810],{"id":69809},"major-project-updates","Major Project Updates",[48,69812,69813],{},"Last week was an exciting week for the Apache Pulsar community. Not only did the Summit happen, but the Pulsar PMC also announced the release of Pulsar 2.8.0. Summit keynotes and talks from the Technology Deep Dive Stage provided a closer look at some of the most significant project updates. 
We highlight these below:",[321,69815,69816,69824,69832,69840,69848,69856],{},[324,69817,69818,69823],{},[55,69819,69822],{"href":69820,"rel":69821},"https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=-Bm1h508oIQ",[264],"Apache Pulsar: Why Unified Messaging and Streaming Is the Future",": In this keynote, Matteo Merli and Sijie Guo dive into the landscape of unified messaging and streaming, how Pulsar helps companies achieve this vision, and what the future of Pulsar will look like, including the newly released v 2.8.0.",[324,69825,69826,69831],{},[55,69827,69830],{"href":69828,"rel":69829},"https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=tnWq8opMI6s",[264],"Exactly-Once Made Easy: Transactional Messaging In Apache Pulsar",": In this session, Sijie Guo and Addison Higham discuss Pulsar transaction and how it can be applied to Pulsar Functions and other processing engines to achieve transactional event streaming.",[324,69833,69834,69839],{},[55,69835,69838],{"href":69836,"rel":69837},"https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=YtX_rSH_UZo",[264],"Replicated Subscriptions: Taking Geo-Replication to the Next Level",": In this session, Matteo Merli, explores various patterns of cluster failover, when it is appropriate to use them and the tradeoffs found in each approach.",[324,69841,69842,69847],{},[55,69843,69846],{"href":69844,"rel":69845},"https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=NR49Zz7JD-g",[264],"Take Kafka-on-Pulsar to Production at Internet Scale: Improvements Made for Pulsar 2.8.0",": In this talk, Yunze Xu and Aloys Zhang introduce how KoP has been improved for Pulsar 2.8.0, including the new implementation of KoP offset, the Kafka entry formatter that improves the performance, and additional changes that improve KoP’s stability and Kafka protocol compatibility.",[324,69849,69850,69855],{},[55,69851,69854],{"href":69852,"rel":69853},"https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=83Or4IlGzSs",[264],"Advanced Stream Processing with Flink and Pulsar",": In this talk, Till Rohrmann and Addison Higham discuss how Flink allows for ambitious stream processing workflows and how using Pulsar and Flink together enables new capabilities that push forward the state-of-the-art in streaming.",[324,69857,69858,69863],{},[55,69859,69862],{"href":69860,"rel":69861},"https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=Bk4mYo9bPoc",[264],"Function Mesh: Complex Streaming Jobs Made Simple",": In this talk, Neng Lu and Rui Fu provide a walkthrough of the Function Mesh, including its design, implementation, use cases, and examples. This session helps users understand how Function Mesh can be used to simplify complex streaming solutions.",[40,69865,4644],{"id":4643},[48,69867,69868],{},"The Summit was jam-packed with groundbreaking Pulsar use cases, from Karthik Ramasamy’s session on Scaling Apache Pulsar to 10 Petabytes\u002FDay to Intuit’s talk on Building the Next-Generation Messaging Platform on Pulsar. More on these and other favorites below:",[321,69870,69871,69880,69889,69898,69907,69916],{},[324,69872,69873,69874,69879],{},"Splunk’s Karthik Ramasamy presented ",[55,69875,69878],{"href":69876,"rel":69877},"https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=1uG3mdfh0nk",[264],"Scaling Apache Pulsar to 10 Petabytes\u002FDay",". 
In this talk, we learn how Splunk helped a flagship customer scale a Pulsar deployment to handle 10 PB\u002Fday in a single cluster.",[324,69881,69882,69883,69888],{},"Intuit’s Madhavan Narayanan, Sajith Sebastian, Amit Kaushal, Gokul Sarangapani present: ",[55,69884,69887],{"href":69885,"rel":69886},"https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=CmyHUN5MRUU",[264],"Building the Next-Generation Messaging Platform on Pulsar at Intuit",". In this talk we learn how they use Pulsar for their highly distributed, multi-cluster, multi-region messaging platform to serve the queuing use-cases of its applications and services.",[324,69890,69891,69892,69897],{},"Narvar’s Ankush Goyal shares ",[55,69893,69896],{"href":69894,"rel":69895},"https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=vS4yk4bbLN0",[264],"How Narvar Uses Pulsar to Power the Post-Purchase Experience",". In this session we learn about Narvar’s platform, which is built with pub-sub messaging at its core, making reliability, scalability, maintainability, and flexibility business critical features. Ankush shares why Narvar adopted Pulsar and how Narvar is leveraging Pulsar today.",[324,69899,69900,69901,69906],{},"MicroFocus’s Srikanth Natarajan presents ",[55,69902,69905],{"href":69903,"rel":69904},"https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=GKh7a8-ZjD4",[264],"Why Micro Focus Chose Pulsar for Data Ingestion",". In this talk, we learn about Micro Focus’ adoption of Pulsar, including the lessons learned, and the help that Micro Focus received from a development support partner in their Pulsar journey.",[324,69908,69909,69910,69915],{},"Iterable’s Tom Wang and Thomas Kim present ",[55,69911,69914],{"href":69912,"rel":69913},"https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=kBBUFdKvDgI",[264],"Migrating a Billion Transactions a Day to Apache Pulsar",". In this talk we learn how Iterable, a marketing automation SaaS that operates at a vast scale, processes a billion queue transactions per day.",[324,69917,69918,69919,69924],{},"Verizon Media’s Rajan Dhabalia and Ludwig Pummer present on ",[55,69920,69923],{"href":69921,"rel":69922},"https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=pDAh-gh-aZ0",[264],"Security and Multi-tenancy with Apache Pulsar in Yahoo!",". In this talk we learn how Verizon Media uses different multi-tenancy dimensions of Apache Pulsar to serve multiple use cases and applications on a shared pulsar cluster.",[40,69926,69928],{"id":69927},"summit-news","Summit News",[48,69930,69931],{},"In addition to the great content, there were other project updates announced during the summit:",[321,69933,69934,69940],{},[324,69935,69936,69937,190],{},"Pulsar Hackathon Winners: The winners of Pulsar Hackathon 2021 were announced at the summit. You can learn more about the top projects and view the video submissions ",[55,69938,267],{"href":69939},"\u002Fen\u002Fblog\u002Fcommunity\u002F2021-06-22-pulsar-hackathon-2021-winners-announced",[324,69941,69942,69943,20076],{},"CFP for Pulsar Summit Europe 2021: The CFP for Pulsar Summit Europe 2021 is now open. ",[55,69944,69946],{"href":69664,"rel":69945},[264],"Submit a talk",[48,69948,69949,69950,190],{},"We want to thank our event hosts, community sponsors, and speakers for making this an unforgettable event. We also want to say thank you to everyone who attended! We hope you enjoyed it as much as we did. And, don’t worry if you missed the live event. 
The recordings and slides are ",[55,69951,62934],{"href":69952,"rel":69953},"https:\u002F\u002Fwww.na2021.pulsar-summit.org\u002Fagenda",[264],[40,69955,69725],{"id":69724},[48,69957,69958,69959,190],{},"If you’re not signed up for the StreamNative newsletter, ",[55,69960,69962],{"href":34070,"rel":69961},[264],"sign up today",[48,69964,3931],{},{"title":18,"searchDepth":19,"depth":19,"links":69966},[69967,69968,69969,69970,69971,69972],{"id":69789,"depth":19,"text":69790},{"id":69799,"depth":19,"text":69800},{"id":69809,"depth":19,"text":69810},{"id":4643,"depth":19,"text":4644},{"id":69927,"depth":19,"text":69928},{"id":69724,"depth":19,"text":69725},"2021-06-22","The Pulsar Virtual Summit North America 2021 was packed with Pulsar project updates, ecosystem news, and insights into some of the largest and most exciting Pulsar deployments around the globe. This blog shares the highlights of the event.","\u002Fimgs\u002Fblogs\u002F63c7fc6cf8ae5ab7b8d16213_63b2fc1e27dabd2a0f47a0b9_top.png",{},"\u002Fblog\u002Fhighlights-pulsar-virtual-summit-north-america-2021",{"title":69774,"description":69974},"blog\u002Fhighlights-pulsar-virtual-summit-north-america-2021",[5376,821],"rBPPSCxazDuT00545qzeiP8afdWImewjnywIcHHLtKU",{"id":69983,"title":69984,"authors":69985,"body":69986,"category":7338,"createdAt":290,"date":69973,"description":70183,"extension":8,"featured":294,"image":70184,"isDraft":294,"link":290,"meta":70185,"navigation":7,"order":296,"path":70186,"readingTime":4475,"relatedResources":290,"seo":70187,"stem":70188,"tags":70189,"__hash__":70190},"blogs\u002Fblog\u002Fpulsar-hackathon-2021-winners-announced.md","Pulsar Hackathon 2021 Winners Announced",[69353,44843],{"type":15,"value":69987,"toc":70170},[69988,69991,69994,69997,70001,70004,70021,70025,70028,70051,70054,70058,70061,70065,70068,70071,70074,70081,70085,70088,70091,70097,70101,70104,70107,70110,70116,70120,70123,70126,70132,70136,70139,70142,70148,70152,70155,70159],[48,69989,69990],{},"2021 has been an exciting year for the Apache Pulsar community. In May the community hit the 400 Contributor mark and Apache Pulsar surpassed Apache Kafka in Monthly Active Contributors!",[48,69992,69993],{},"To continue the Pulsar momentum, StreamNative hosted the first-ever Pulsar Hackathon May 6th & 7th, 2021. The goal of the Hackathon was to engage the Pulsar community, drive contributions, and generate ideas to enhance Pulsar and its ecosystem, and, with more than 130 signups for the event, it was a success! We’d like to start by saying thank you to everyone who participated.",[48,69995,69996],{},"In this post we’ll share more on the winners and their projects. 
But before we get to the winners, let’s talk about the challenge.",[40,69998,70000],{"id":69999},"the-categories","The Categories",[48,70002,70003],{},"To help inspire the teams, we created five categories for the hackathon:",[321,70005,70006,70009,70012,70015,70018],{},[324,70007,70008],{},"Pulsar Enhancement: adding new features, improving performance, etc.",[324,70010,70011],{},"Pulsar + Big Data Ecosystem Integration: integrating Pulsar with other influential data systems for easy usage",[324,70013,70014],{},"Pulsar + Flink Solution: developing end-to-end general data processing solutions based on Pulsar and Apache Flink",[324,70016,70017],{},"Pulsar + Cloud: enabling Pulsar to run on cloud environments easily and seamlessly",[324,70019,70020],{},"BookKeeper Enhancement: adding new features, improving performance, etc.",[40,70022,70024],{"id":70023},"the-judges","The Judges",[48,70026,70027],{},"A panel of seven judges with deep experience in Apache Pulsar and real-time data streaming and messaging technologies was assembled. That panel included:",[321,70029,70030,70033,70036,70039,70042,70045,70048],{},[324,70031,70032],{},"Matteo Merli, Apache Pulsar Chair, CTO at StreamNative",[324,70034,70035],{},"Jerry Peng, Apache Pulsar PMC Member, Principal Software Engineer at Splunk",[324,70037,70038],{},"Addison Higham, Chief Architect at StreamNative",[324,70040,70041],{},"Ricardo Ferreira, Principal Developer Advocate at Elastic",[324,70043,70044],{},"Sijie Guo, Apache Pulsar PMC Member, CEO at StreamNative",[324,70046,70047],{},"Nozomi Kurihara, Manager of the Messaging Platform Team at Yahoo! Japan",[324,70049,70050],{},"Arvid Heise, Senior Engineer at Ververica",[48,70052,70053],{},"The teams were judged on three criteria: Innovation, Utility \u002F Applicability, Difficulty.",[40,70055,70057],{"id":70056},"the-winners","The Winners",[48,70059,70060],{},"After two days of hard work, 11 teams submitted their projects by the deadline. The winners were announced live at the Pulsar Virtual Summit North America 2021 on June 16th and 17th. Below are the top-rated submissions.",[32,70062,70064],{"id":70063},"first-place-5000-prize-zookeeper-free","First Place ($5,000 Prize): ZooKeeper Free",[48,70066,70067],{},"This project eliminates Pulsar’s dependency on ZooKeeper by bringing metadata storage and management into BookKeeper. This project also enables Pulsar users to handle metadata more flexibly by introducing a unified metadata API for brokers and bookies. Team members include Bo Cong, Ran Gao, Yang Yang, Yu Liu, and Zike Yang.",[48,70069,70070],{},"“It not only innovates Pulsar's architecture but also minimizes the operational complexity of it's architecture. It is by far the hardest project to implement.” - Ricardo Ferreira, Hackathon Judge",[48,70072,70073],{},"“Concise \u002F Easy to understand.” - Nozomi Kurihara, Hackathon Judge",[48,70075,70076,70077,190],{},"Watch the demo ",[55,70078,267],{"href":70079,"rel":70080},"https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=peuwkUH_Kzc",[264],[32,70082,70084],{"id":70083},"second-place-2500-prize-support-hyperscale-topics-and-clients","Second Place ($2,500 Prize): Support Hyperscale Topics and Clients",[48,70086,70087],{},"This project enables scale in producers and topics by reducing load in ZooKeeper and introducing the Topic Level Batch approach. 
Team members include Lin Lin, Hang Chen, and Penghui Li.",[48,70089,70090],{},"“A simple idea to scale the production workload.”- Sijie Guo, Hackathon Judge",[48,70092,70076,70093,190],{},[55,70094,267],{"href":70095,"rel":70096},"https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=2A3cpI5Rlqk",[264],[32,70098,70100],{"id":70099},"third-place-1000-prize-pulsar-watermarking","Third Place ($1,000 Prize): Pulsar Watermarking",[48,70102,70103],{},"This project solves the challenge of generating watermarks when consuming a Pulsar topic. It teaches Pulsar how to broker the transmission of event time watermarks from producers to consumers. Team members include Jennifer Huang, Eron Wright, Giannis Polyzos, and Murthy Kakarlamudi.",[48,70105,70106],{},"“The overall idea of this project is interesting because out-of-order events are likely to happen in stream processing apps.” - Ricardo Ferreira, Hackathon Judge",[48,70108,70109],{},"“This is a really innovative feature that no other streaming system seems to have. ” - Addison Higham, Hackathon Judge",[48,70111,70076,70112,190],{},[55,70113,267],{"href":70114,"rel":70115},"https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=uPC9jx7NyUo",[264],[32,70117,70119],{"id":70118},"fourth-place-pulsar-multi-tenant-bookkeeper-storage-isolation","Fourth Place: Pulsar Multi-Tenant BookKeeper Storage Isolation",[48,70121,70122],{},"This project teaches Pulsar to store entryLogFiles to different folders according to the tenant. Team members include Jialing Wang, Hao Zhang, Shaojie Wang, and Xin Yi.",[48,70124,70125],{},"“Super useful feature for making multi-tenancy easier to manage at the storage layer. This, paired with some improvements around quotas, would be very useful in helping to manage larger Pulsar clusters” - Addison Higham, Hackathon Judge",[48,70127,70076,70128,190],{},[55,70129,267],{"href":70130,"rel":70131},"https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=FdIxRNoH1q8",[264],[32,70133,70135],{"id":70134},"fifth-place-integrated-with-apm","Fifth Place: Integrated with APM",[48,70137,70138],{},"This project aims to fill in the missing tracing piece of Pulsar's Ops and achieve sample tracing in Pulsar Broker and integration with SkyWalking. Team members include Zhangjian He and Tian Luo.",[48,70140,70141],{},"“This use case is quite useful for troubleshooting end-2-end Pulsar applications.” - Ricardo Ferreira, Hackathon Judge",[48,70143,70076,70144,190],{},[55,70145,267],{"href":70146,"rel":70147},"https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=0w9ab3DEW5E",[264],[40,70149,70151],{"id":70150},"the-innovation-continues-streamnative-incubator-program","The Innovation Continues: StreamNative Incubator Program",[48,70153,70154],{},"Hackathon participants will be able to continue their projects as part of the StreamNative Incubator Program. This 12-week engagement will pair each participating team with a mentor from the Pulsar\u002FBookKeeper communities. Together, the team members and mentor will lay out a practical execution plan for the project and work together to execute the final code. We will share progress and updates with the Pulsar community!",[40,70156,70158],{"id":70157},"stay-involved","Stay involved:",[48,70160,70161,70162,70165,70166,190],{},"Join Apache Pulsar on ",[55,70163,55984],{"href":57760,"rel":70164},[264],". 
Sign up for the ",[55,70167,70169],{"href":34070,"rel":70168},[264],"StreamNative Newsletter",{"title":18,"searchDepth":19,"depth":19,"links":70171},[70172,70173,70174,70181,70182],{"id":69999,"depth":19,"text":70000},{"id":70023,"depth":19,"text":70024},{"id":70056,"depth":19,"text":70057,"children":70175},[70176,70177,70178,70179,70180],{"id":70063,"depth":279,"text":70064},{"id":70083,"depth":279,"text":70084},{"id":70099,"depth":279,"text":70100},{"id":70118,"depth":279,"text":70119},{"id":70134,"depth":279,"text":70135},{"id":70150,"depth":19,"text":70151},{"id":70157,"depth":19,"text":70158},"Read about the winning projects of the Pulsar Hackathon 2021 and the StreamNative Incubator program.","\u002Fimgs\u002Fblogs\u002F63c7fc5b6248091ae6302b04_63b348d7e6977479ae76cac2_top.png",{},"\u002Fblog\u002Fpulsar-hackathon-2021-winners-announced",{"title":69984,"description":70183},"blog\u002Fpulsar-hackathon-2021-winners-announced",[799,821],"-KMiW8e0S9nYqAVGcM_2VXylpPM-kPBnWyXVLSwjdYg",{"id":70192,"title":70193,"authors":70194,"body":70195,"category":821,"createdAt":290,"date":70584,"description":70585,"extension":8,"featured":294,"image":70586,"isDraft":294,"link":290,"meta":70587,"navigation":7,"order":296,"path":70588,"readingTime":31039,"relatedResources":290,"seo":70589,"stem":70590,"tags":70591,"__hash__":70592},"blogs\u002Fblog\u002Fdeep-dive-transactions-apache-pulsar.md","A Deep-dive of Transactions in Apache Pulsar",[808],{"type":15,"value":70196,"toc":70565},[70197,70205,70216,70219,70223,70226,70229,70232,70238,70241,70244,70255,70258,70262,70265,70268,70272,70275,70278,70281,70287,70290,70294,70297,70301,70304,70307,70310,70318,70322,70325,70331,70334,70340,70344,70353,70356,70362,70366,70370,70373,70376,70379,70382,70385,70388,70392,70395,70399,70402,70405,70409,70412,70416,70419,70423,70426,70429,70433,70436,70439,70442,70444,70447,70450,70453,70456,70460,70463,70467,70470,70481,70484,70487,70491,70494,70498,70501,70525,70539,70541,70544,70547,70550,70553],[48,70198,70199,70200,70204],{},"In a previous blog post, ",[55,70201,70203],{"href":70202},"\u002Fen\u002Fblog\u002Frelease\u002F2021-06-14-exactly-once-semantics-with-transactions-in-pulsar","Exactly-Once Semantics with Transactions in Pulsar",", we introduced the exactly-once semantics enabled by Transaction API for Apache Pulsar. That blog post covered the various message delivery semantics, including:",[321,70206,70207,70210,70213],{},[324,70208,70209],{},"The single-topic exactly-once semantics enabled by idempotent producer",[324,70211,70212],{},"The Transaction API",[324,70214,70215],{},"The end-to-end exactly-once processing semantics for the Pulsar and Flink integration",[48,70217,70218],{},"In this blog post, we will dive deeper into the transactions in Apache Pulsar. The goal here is to familiarize you with the main concepts needed to use the Pulsar Transaction API effectively.",[40,70220,70222],{"id":70221},"why-transactions","Why Transactions?",[48,70224,70225],{},"Transactions strengthen the message delivery semantics and the processing guarantees for stream processing (i.e using Pulsar Functions or integrating with other stream processing engines). These stream processing applications usually exhibit a “consume-process-produce” pattern when consuming and producing from and to data streams such as Pulsar topics.",[48,70227,70228],{},"The demand for stream processing applications with stronger processing guarantees has grown along with the rise of stream processing. 
For example, in the financial industry, financial institutions use stream processing engines to process debits and credits for users. This type of use case requires that every message is processed exactly once, without exception.",[48,70230,70231],{},"In other words, if a stream processing application consumes message A and produces the result as a message B (B = f(A)), then exactly-once processing guarantee means that A can only be marked as consumed if and only if B is successfully produced, and vice versa.",[48,70233,70234],{},[384,70235],{"alt":70236,"src":70237},"illusatrion of transaction","\u002Fimgs\u002Fblogs\u002F63b2f9cb3ddd88f0dbf748a4_1.png",[48,70239,70240],{},"Prior to Pulsar 2.8.0, there was no easy way to build stream processing applications with Apache Pulsar to achieve exactly-once processing guarantees. If you integrate a stream processing engine, like Flink, you might be able to achieve exactly-once processing guarantees. For example, using Flink you can achieve exactly-once processing reading from Pulsar topics, but it is not possible to achieve exactly-once processing writing to Pulsar topics.",[48,70242,70243],{},"When you configure Pulsar producers and consumers for at-least-once delivery semantics, a stream processing application cannot achieve exactly-once processing semantics in the following scenarios:",[1666,70245,70246,70249,70252],{},[324,70247,70248],{},"Duplicate writes: A producer can potentially write a message multiple times due to the internal retry logic. The idempotent producer addresses this via guaranteed message deduplication.",[324,70250,70251],{},"Application crashes: The stream processing application can crash at any time. If the application crashes after writing the result message B but before making the source message A as consumed. The application can reprocess the source message A after it restarts, resulting in a duplicated result message B being written again to the output topic, violating the exactly-once processing guarantees.",[324,70253,70254],{},"Zombie application: The stream processing application can potentially be partitioned from the network in a distributed environment. Typically, new instances of the same stream processing application will be automatically started to replace the ones which were deemed lost. In such a situation, multiple instances of the same processing application may be running. They will process the same input topics and write the results to the same output topics, causing duplicate output messages and violating the exactly-once processing semantics.",[48,70256,70257],{},"The new Transaction API introduced in Pulsar 2.8.0 release is designed to solve the second and third problems.",[40,70259,70261],{"id":70260},"transactional-semantics","Transactional Semantics",[48,70263,70264],{},"The Transaction API enables stream processing applications to consume, process, and produce messages in one atomic operation. That means, a batch of messages in a transaction can be received from, produced to and acknowledged to many topic partitions. All the operations involved in a transaction succeed or fail as one single until.",[48,70266,70267],{},"But how does the Transaction API resolve the three problems above?",[32,70269,70271],{"id":70270},"atomic-writes-and-acknowledgements-across-multiple-topics","Atomic writes and acknowledgements across multiple topics",[48,70273,70274],{},"First, the Transaction API enables atomic writes and atomic acknowledgments to multiple Pulsar topics together as one single unit. 
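As context for the "duplicate writes" scenario above, here is a minimal sketch of how an idempotent producer is typically set up with the Java client, assuming broker-side deduplication is enabled on the namespace; the service URLs, namespace, topic, and producer name are illustrative, not values from the original post.

```java
import java.util.concurrent.TimeUnit;
import org.apache.pulsar.client.admin.PulsarAdmin;
import org.apache.pulsar.client.api.Producer;
import org.apache.pulsar.client.api.PulsarClient;

public class IdempotentProducerSketch {
    public static void main(String[] args) throws Exception {
        // Enable broker-side message deduplication for the namespace (illustrative names).
        try (PulsarAdmin admin = PulsarAdmin.builder()
                .serviceHttpUrl("http://localhost:8080")
                .build()) {
            admin.namespaces().setDeduplicationStatus("public/default", true);
        }

        try (PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650")
                .build()) {
            // A stable producer name plus broker-side deduplication lets retried sends
            // be recognized by their sequence IDs and persisted only once.
            Producer<byte[]> producer = client.newProducer()
                    .topic("persistent://public/default/payments")
                    .producerName("payments-writer-1")          // illustrative, must stay stable across restarts
                    .sendTimeout(0, TimeUnit.SECONDS)           // retry indefinitely instead of timing out
                    .create();

            producer.send("debit:42".getBytes());
            producer.close();
        }
    }
}
```

This addresses duplicate writes on a single topic only; the crash and zombie scenarios described above are what the Transaction API is for.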
All the messages produced or consumed in one transaction are successfully written or acknowledged together, or none of them are. For example, an error during processing can cause a transaction to be aborted, in which case none of the messages produced by the transaction will be consumable by any consumers.",[48,70276,70277],{},"What does this mean to an atomic “consume-process-produce” operation?",[48,70279,70280],{},"Let’s assume that if an application consumes message A from topic T0 and produces a result message B to topic T1 after applying some transforming logic on message A (B = f(A)), then the consume-process-produce operation is atomic only if message A and B are considered successfully consumed and published together, or not at all. The message A is ONLY considered consumed from topic T0 only when it is successfully acknowledged.",[48,70282,70283],{},[384,70284],{"alt":70285,"src":70286},"illusatrion of transaction semantics","\u002Fimgs\u002Fblogs\u002F63b2f9cb304affbe8af329e6_2.png",[48,70288,70289],{},"Transaction API ensures the acknowledgement of message A and the write of message B to happen as atomic, hence the “consume-process-produce” operation is atomic.",[32,70291,70293],{"id":70292},"fence-zombie-instances-via-conditional-acknowledgement","Fence zombie instances via conditional acknowledgement",[48,70295,70296],{},"We solve the problem of zombie instances by conditional acknowledgement. Conditional acknowledgement means if there are two transactions attempting to acknowledge on the same message, Pulsar guarantees that there is ONLY one transaction that can succeed and the other transaction is aborted.",[32,70298,70300],{"id":70299},"read-transactional-messages","Read transactional messages",[48,70302,70303],{},"What is the guarantee for reading messages written as part of a transaction?",[48,70305,70306],{},"The Pulsar broker only dispatches transactional messages to a consumer if the transaction was actually committed. In other words, the broker will not deliver transactional messages which are part of an open transaction, nor will it deliver messages which are part of an aborted transaction.",[48,70308,70309],{},"However, Pulsar doesn’t guarantee that the messages produced within one committed transaction will be consumed all together. There are several reasons for this:",[1666,70311,70312,70315],{},[324,70313,70314],{},"Consumers may not consume from all the topic partitions that participated in the committed transaction. Hence they will never be able to read all the messages that are produced in that transaction.",[324,70316,70317],{},"Consumers may have a different receiver queue size or buffering window size, allowing only a certain amount of messages. That amount can be any arbitrary number.",[40,70319,70321],{"id":70320},"transactions-api","Transactions API",[48,70323,70324],{},"The transaction feature is primarily a server-side and protocol-level feature. Currently it is only available for Java clients. (Support for other language clients will be added in the future releases.) 
An example “consume-process-produce” application written in Java and using Pulsar’s transaction API would look something like:",[48,70326,70327],{},[384,70328],{"alt":70329,"src":70330},"image of transaction API","\u002Fimgs\u002Fblogs\u002F63b2f9cb89919e09213209b9_3.png",[48,70332,70333],{},"Let’s walk through this example step by step.",[48,70335,70336],{},[384,70337],{"alt":70338,"src":70339},"table with steps and description of transaction API","\u002Fimgs\u002Fblogs\u002F63b2fa6103dcd190e3c092d6_table.webp",[40,70341,70343],{"id":70342},"how-transactions-work","How transactions work",[48,70345,70346,70347,70352],{},"In this section, we present a brief overview of the new components and new request flows introduced by the Transaction APIs. For a more exhaustive treatment of this subject, you may checkout the original design document, or watch ",[55,70348,70351],{"href":70349,"rel":70350},"https:\u002F\u002Fwww.na2021.pulsar-summit.org\u002Fexactly-once-made-easy-transactional-messaging-in-apache-pulsar",[264],"the upcoming Pulsar Summit talk"," where transactions were introduced.",[48,70354,70355],{},"The content below provides an overview to help with debugging or tuning transactions for better performance.",[48,70357,70358],{},[384,70359],{"alt":70360,"src":70361},"illustration to explain how transactions work","\u002Fimgs\u002Fblogs\u002F63b2fa82ccfce60d1605adad_4.png",[32,70363,70365],{"id":70364},"components","Components",[3933,70367,70369],{"id":70368},"transaction-coordinator-and-transaction-log","Transaction coordinator and Transaction log",[48,70371,70372],{},"The transaction coordinator (TC) maintains the topics and subscriptions that interact in a transaction. When a transaction is committed, the transaction coordinator interacts with the topic owner broker to complete the transaction.",[48,70374,70375],{},"The transaction coordinator is a module running inside a Pulsar broker. It maintains the entire life cycle of transactions and prevents a transaction from getting into an incorrect status. The transaction coordinator also handles transaction timeout, and ensures that the transaction is aborted after a transaction timeout.",[48,70377,70378],{},"All the transaction metadata persists in the transaction log. The transaction log is backed by a Pulsar topic. After the transaction coordinator crashes, it can restore the transaction metadata from the transaction log.",[48,70380,70381],{},"Each coordinator owns some subset of the partitions of the transaction log topics, i.e. the partitions for which its broker is the owner.",[48,70383,70384],{},"Each transaction is identified with a transaction id (TxnID). The transaction id is 128-bits long. The highest 16 bits are reserved for the partition of the transaction log topic and the remaining bits are used for generating monotonically increasing numbers by the TC who owns that transaction log topic partition.",[48,70386,70387],{},"It is worth noting that the transaction log topic just stores the state of a transaction and not the actual messages in the transaction. The messages are stored in the actual topic partitions. The transaction can be in various states like “Open”, “Prepare commit”, and “committed”. It is this state and associated metadata that is stored in the transaction log.",[3933,70389,70391],{"id":70390},"transaction-buffer","Transaction buffer",[48,70393,70394],{},"Messages produced to a topic partition within a transaction are stored in the transaction buffer of that topic partition. 
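Since the example referenced above is embedded as an image, here is a minimal sketch of what a "consume-process-produce" loop with the Pulsar Transaction API (Pulsar 2.8+) can look like in Java; the topic and subscription names and the `process` transform are illustrative placeholders.

```java
import java.util.concurrent.TimeUnit;
import org.apache.pulsar.client.api.Consumer;
import org.apache.pulsar.client.api.Message;
import org.apache.pulsar.client.api.Producer;
import org.apache.pulsar.client.api.PulsarClient;
import org.apache.pulsar.client.api.Schema;
import org.apache.pulsar.client.api.transaction.Transaction;

public class ConsumeProcessProduceSketch {
    public static void main(String[] args) throws Exception {
        // Transactions must be enabled on the client (and on the brokers).
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650")
                .enableTransaction(true)
                .build();

        Consumer<String> consumer = client.newConsumer(Schema.STRING)
                .topic("persistent://public/default/input-topic")   // illustrative name
                .subscriptionName("txn-processor")
                .subscribe();
        Producer<String> producer = client.newProducer(Schema.STRING)
                .topic("persistent://public/default/output-topic")  // illustrative name
                .sendTimeout(0, TimeUnit.SECONDS)                    // required for transactional sends
                .create();

        while (true) {
            Message<String> msgA = consumer.receive();

            // 1. Open a transaction.
            Transaction txn = client.newTransaction()
                    .withTransactionTimeout(1, TimeUnit.MINUTES)
                    .build()
                    .get();
            try {
                // 2. Produce the result B = f(A) within the transaction.
                producer.newMessage(txn).value(process(msgA.getValue())).send();
                // 3. Acknowledge the input message within the same transaction.
                consumer.acknowledgeAsync(msgA.getMessageId(), txn).get();
                // 4. Commit: the write of B and the ack of A become visible atomically.
                txn.commit().get();
            } catch (Exception e) {
                txn.abort().get(); // neither the ack nor the output message takes effect
            }
        }
    }

    private static String process(String input) {
        return input.toUpperCase(); // placeholder transform
    }
}
```

Note that the send timeout is disabled on the producer, since timed-out transactional sends cannot be retried safely by the client.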
The messages in the transaction buffer are not visible to consumers until the transactions are committed. The messages in the transaction buffer are discarded when the transactions are aborted.",[3933,70396,70398],{"id":70397},"pending-acknowledge-state","Pending acknowledge state",[48,70400,70401],{},"Message acknowledgments within a transaction are maintained by the pending acknowledge state before the transaction is committed. If a message is in the pending acknowledge state, the message cannot be acknowledged by other transactions until the message is removed from the pending acknowledge state when a transaction is aborted.",[48,70403,70404],{},"The pending acknowledge state is persisted to the pending acknowledge log. The pending acknowledge log is backed by a cursor log. A new broker can restore the state from the pending acknowledge log to ensure the acknowledgement is not lost.",[32,70406,70408],{"id":70407},"data-flow","Data flow",[48,70410,70411],{},"At a high level, the data flow can be broken into multiple steps. 1. Start a transaction. 2. Publish messages with a transaction. 3. Acknowledge messages with a transaction. 4. Complete a transaction.",[3933,70413,70415],{"id":70414},"begin-transaction","Begin transaction",[48,70417,70418],{},"At the beginning of a transaction, the Pulsar client will locate a Transaction Coordinator to request a new transaction ID. The Transaction Coordinator will allocate a transaction ID for the transaction. The transaction will be logged with its transaction id and status of OPEN in the transaction log (as shown in step 1a). This ensures the transaction status is persisted regardless of whether the Transaction Coordinator crashes. After a transaction status entry is logged, TC returns the transaction ID back to the Pulsar client.",[3933,70420,70422],{"id":70421},"publish-messages-with-a-transaction","Publish messages with a transaction",[48,70424,70425],{},"Before the pulsar client produces messages to a new topic partition, the client sends a request to TC to add the partition to the transaction. TC logs the partition changes into its transaction log for durability (as shown in 2.1a). This step ensures TC knows all the partitions that a transaction is handling, so TC can commit or abort changes on each partition at the end-partition phase.",[48,70427,70428],{},"The Pulsar client starts producing messages to partitions. This producing flow is the same as the normal message producing flow. The only difference is the batch of messages produced by a transaction will contain the transaction id. The broker that receives the batch of messages checks if the batch of messages belongs to a transaction. If it doesn’t belong to a transaction, the broker handles the writes as it normally would. If it belongs to a transaction, the broker writes the batch into the partition’s transaction buffer.",[3933,70430,70432],{"id":70431},"acknowledge-messages-with-a-transaction","Acknowledge messages with a transaction",[48,70434,70435],{},"The Pulsar client sends a request to TC the first time a new subscription is acknowledged as part of a transaction. The addition of the subscription to the transaction is logged by TC in step 2.3a. This step ensures TC knows all the subscriptions that a transaction is handling, so TC can commit or abort changes on each subscription at the EndTxn phase.",[48,70437,70438],{},"The Pulsar client starts acknowledging messages on subscriptions. This transactional acknowledgement flow is the same as the normal acknowledgement flow. 
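To make the transaction-buffer visibility rule described above concrete, here is a small sketch, reusing the transactional client, producer, and consumer built in the previous example, that shows a message staying invisible to consumers until the transaction commits; the payload and timeouts are illustrative.

```java
import java.util.concurrent.TimeUnit;
import org.apache.pulsar.client.api.Consumer;
import org.apache.pulsar.client.api.Message;
import org.apache.pulsar.client.api.Producer;
import org.apache.pulsar.client.api.PulsarClient;
import org.apache.pulsar.client.api.transaction.Transaction;

public class TransactionBufferVisibilitySketch {
    /** Assumes an already-built transactional client, producer, and consumer on the same topic. */
    static void demo(PulsarClient client, Producer<String> producer, Consumer<String> consumer) throws Exception {
        Transaction txn = client.newTransaction()
                .withTransactionTimeout(5, TimeUnit.MINUTES)
                .build().get();

        producer.newMessage(txn).value("pending-until-commit").send();

        // The message is held in the topic's transaction buffer, so a short poll returns null.
        Message<String> beforeCommit = consumer.receive(2, TimeUnit.SECONDS);
        System.out.println("before commit: " + beforeCommit); // expected: null

        txn.commit().get();

        // Once committed, the broker dispatches the message as usual.
        Message<String> afterCommit = consumer.receive(10, TimeUnit.SECONDS);
        System.out.println("after commit: " + afterCommit.getValue());
    }
}
```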
However the ack request carries a transaction id. The broker receiving the acknowledgement request checks if the acknowledgment belongs to a transaction or not. If it belongs to a transaction, the broker will mark the message as:PENDING_ACK state. PENDING_ACK state means the message can not be acknowledged or negative-acknowledged by other consumers until the ack is committed or aborted. This ensures if there are two transactions attempting to acknowledge one message, only one will succeed and the other one will be aborted.",[48,70440,70441],{},"The Pulsar client will abort the whole transaction when it tries to acknowledge but the conflict is detected on both individual and cumulative acknowledgements.",[3933,70443,54528],{"id":54527},[48,70445,70446],{},"At the end of a transaction, the application will decide to commit or abort the transaction. The transaction can also be aborted if a conflict is detected when acknowledging messages.",[48,70448,70449],{},"When a pulsar client is finished with a transaction, it can issue an end transaction request to TC, with a field indicating whether the transaction is committed or aborted.",[48,70451,70452],{},"TC writes a COMMITTING or ABORTING message to its transaction log (as shown in 3.1a) and begins the process of committing or aborting messages or acknowledgments to all the partitions involved in this transaction. It is shown in 3.2.",[48,70454,70455],{},"After all the partitions involved in this transaction are successfully committed or aborted, TC writes COMMITTED or ABORTED messages to its transaction log. It is shown in 3.3 in the diagram.",[40,70457,70459],{"id":70458},"how-transactions-perform","How transactions perform",[48,70461,70462],{},"So far, this document covered the semantics of transactions and how they work, next let’s turn our attention to how transactions perform.",[32,70464,70466],{"id":70465},"performance-for-transactional-producers","Performance for transactional producers",[48,70468,70469],{},"Transactions cause only moderate write amplification. The additional writes are due to:",[321,70471,70472,70475,70478],{},[324,70473,70474],{},"For each transaction, the producers receive additional requests to register the topic partitions with the coordinator.",[324,70476,70477],{},"When completing a transaction, one transaction marker is written to each partition participating in the transaction.",[324,70479,70480],{},"Finally, the TC writes transaction status changes to the transaction log. This includes a write for each batch of topic partitions added to the transaction ( “prepare commit” and the “committed” status).",[48,70482,70483],{},"The overhead is independent of the number of messages written as part of a transaction. So the key to having higher throughput is to include a large number of messages per transaction. Smaller messages or shorter transaction commit intervals result in more amplification.",[48,70485,70486],{},"The main tradeoff when increasing the transaction duration is that it increases end-to-end latency. Recall that a consumer reading transactional messages will not deliver messages which are part of open transactions. So the longer the interval between commits, the longer consumers will have to wait, increasing the end-to-end latency.",[32,70488,70490],{"id":70489},"performance-for-transactional-consumers","Performance for transactional consumers",[48,70492,70493],{},"The transactional consumer is much simpler than the producer. All the logic is done by the Pulsar broker at the server side. 
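Given the write-amplification discussion above, one practical pattern is to group many messages into each transaction so the coordinator round-trips and commit markers are amortized. The following is a minimal sketch of that idea (not code from the original post); the batch size and message type are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import org.apache.pulsar.client.api.MessageId;
import org.apache.pulsar.client.api.Producer;
import org.apache.pulsar.client.api.PulsarClient;
import org.apache.pulsar.client.api.transaction.Transaction;

public class BatchedTransactionSketch {

    /** Publishes records in transactions of batchSize messages to amortize per-transaction overhead. */
    static void publishInBatches(PulsarClient client, Producer<String> producer,
                                 List<String> records, int batchSize) throws Exception {
        for (int start = 0; start < records.size(); start += batchSize) {
            Transaction txn = client.newTransaction()
                    .withTransactionTimeout(1, TimeUnit.MINUTES)
                    .build().get();

            int end = Math.min(start + batchSize, records.size());
            List<CompletableFuture<MessageId>> pending = new ArrayList<>();
            for (int i = start; i < end; i++) {
                // All sends share one transaction, so the coordinator registration and the
                // commit marker are paid once per batch rather than once per message.
                pending.add(producer.newMessage(txn).value(records.get(i)).sendAsync());
            }
            CompletableFuture.allOf(pending.toArray(new CompletableFuture[0])).get();

            // Larger batches raise throughput but delay visibility for consumers.
            txn.commit().get();
        }
    }
}
```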
The broker only dispatches the messages that are in completed transactions.",[40,70495,70497],{"id":70496},"further-reading","Further reading",[48,70499,70500],{},"In the blog post, we only scratched the surface of transactions in Apache Pulsar. All the details of the design are documented online. You can find those references listed below:",[1666,70502,70503,70511,70519],{},[324,70504,70505,70510],{},[55,70506,70509],{"href":70507,"rel":70508},"https:\u002F\u002Fdocs.google.com\u002Fdocument\u002Fd\u002F145VYp09JKTw9jAT-7yNyFU255FptB2_B2Fye100ZXDI\u002Fedit#heading=h.bm5ainqxosrx",[264],"The design document",": This is the definitive place to learn about the public interfaces, the data flow, the components. You will also learn about how each transaction component is implemented, how each transactional request is processed, how the transactional data is purged, etc.",[324,70512,70513,70518],{},[55,70514,70517],{"href":70515,"rel":70516},"http:\u002F\u002Fpulsar.apache.org\u002Fapi\u002Fclient\u002F2.8.0-SNAPSHOT\u002Forg\u002Fapache\u002Fpulsar\u002Fclient\u002Fapi\u002Ftransaction\u002Fpackage-frame.html",[264],"The Pulsar Client javadocs",": The Javadocs is a great place to learn about how to use the new APIs.",[324,70520,70521,70524],{},[55,70522,70523],{"href":70202},"Exactly-Once Semantics with Transaction Support in Pulsar",": This is the first part of this blog series.",[48,70526,70527,70528,70532,70533,70538],{},"My fellow colleagues Sijie Guo and Addison Higham are going to give a presentation “",[55,70529,70531],{"href":70349,"rel":70530},[264],"Exactly-Once Made Easy: Transactional Messaging in Apache Pulsar","” at the upcoming Pulsar Summit North America 2021 on June 16-17th. If you are interested in this topic, ",[55,70534,70537],{"href":70535,"rel":70536},"https:\u002F\u002Fhopin.com\u002Fevents\u002Fpulsar-summit-north-america-2021",[264],"reserve your spot today"," and listen to them diving into every detail of Pulsar Transaction.",[40,70540,2125],{"id":2122},[48,70542,70543],{},"In the first blog post of this series, Exactly-Once Semantics Made Simple with Transaction Support in Pulsar, we introduced the exactly-once semantics enabled by Transaction API for Apache Pulsar. In this post, we talked about the key design goals for the Transaction API in Apache Pulsar, the semantics of the transaction API, and a high-level idea of how the APIs actually work.",[48,70545,70546],{},"If we consider stream processing as a read-process-write processor, this blog post focuses on the read and write paths with the processing itself being a black box. However, in the real world, a lot happens in the processing stage, which makes exactly-once processing impossible to guarantee using the Transaction API alone. 
For example, if the processing logic modifies external storage systems, the Transaction API covered here is not sufficient to guarantee exactly-once processing.",[48,70548,70549],{},"The Pulsar and Flink integration uses the Transaction API described here to provide end-to-end exactly-once processing for a wide variety of stream processing applications, even those which update additional state stores during processing.",[48,70551,70552],{},"In the next few weeks we will share the third blog in this series to provide the details on how the Pulsar and Flink integration provides end-to-end exactly-once processing semantics based on the new Pulsar transactions, as well as how to easily write streaming applications with Pulsar and Flink.",[48,70554,70555,70556,70559,70560,70564],{},"If you want to try out the new exactly-once functionality, check out ",[55,70557,3550],{"href":17075,"rel":70558},[264]," or install the ",[55,70561,44086],{"href":70562,"rel":70563},"https:\u002F\u002Fdocs.streamnative.io\u002Fplatform\u002Fv1.0.0\u002Fquickstart",[264]," today, to create your own applications to process streams of events using the Transaction API.",{"title":18,"searchDepth":19,"depth":19,"links":70566},[70567,70568,70573,70574,70578,70582,70583],{"id":70221,"depth":19,"text":70222},{"id":70260,"depth":19,"text":70261,"children":70569},[70570,70571,70572],{"id":70270,"depth":279,"text":70271},{"id":70292,"depth":279,"text":70293},{"id":70299,"depth":279,"text":70300},{"id":70320,"depth":19,"text":70321},{"id":70342,"depth":19,"text":70343,"children":70575},[70576,70577],{"id":70364,"depth":279,"text":70365},{"id":70407,"depth":279,"text":70408},{"id":70458,"depth":19,"text":70459,"children":70579},[70580,70581],{"id":70465,"depth":279,"text":70466},{"id":70489,"depth":279,"text":70490},{"id":70496,"depth":19,"text":70497},{"id":2122,"depth":19,"text":2125},"2021-06-16","Previously, we introduced the exactly-once semantics enabled by Transaction API for Pulsar. In this blog, we dive deeper into the transactions in Pulsar and familiarize you with the main concepts needed to use the Pulsar Transaction API effectively.","\u002Fimgs\u002Fblogs\u002F63c7fc8bf8ae5a428cd165ce_63b2f9cbccfce613d70568ab_top.png",{},"\u002Fblog\u002Fdeep-dive-transactions-apache-pulsar",{"title":70193,"description":70585},"blog\u002Fdeep-dive-transactions-apache-pulsar",[821,9144],"zpc8_GPhWvlmASo_hSoK0vnNaG0xFJa6VxPLy5GT3YQ",{"id":70594,"title":70595,"authors":70596,"body":70597,"category":3550,"createdAt":290,"date":70584,"description":70601,"extension":8,"featured":294,"image":70877,"isDraft":294,"link":290,"meta":70878,"navigation":7,"order":296,"path":70879,"readingTime":3556,"relatedResources":290,"seo":70880,"stem":70881,"tags":70882,"__hash__":70883},"blogs\u002Fblog\u002Fintroducing-streamnative-platform.md","Introducing StreamNative Platform",[806],{"type":15,"value":70598,"toc":70866},[70599,70602,70605,70616,70619,70623,70626,70629,70635,70638,70657,70660,70664,70667,70674,70678,70681,70684,70704,70708,70711,70714,70728,70732,70735,70738,70746,70749,70753,70756,70760,70763,70766,70771,70777,70791,70795,70798,70801,70804,70807,70810,70824,70826,70829,70832,70843,70856],[48,70600,70601],{},"We are excited to announce StreamNative Platform 1.0, a cloud-native, unified messaging and streaming platform powered by Apache Pulsar. StreamNative Platform provides a complete, declarative API-driven experience for deploying and self-managing Apache Pulsar in your private environments. 
With StreamNative Platform, we’ve packaged our enterprise expertise with StreamNative Cloud to help you build your own private cloud Pulsar service.",[48,70603,70604],{},"Whether you are an agile development team that needs to get up and running with Apache Pulsar quickly, or a central infrastructure team that is responsible for enabling your engineering team to build messaging and streaming applications, StreamNative Platform may be the right fit for you. StreamNative platform enables you to:",[321,70606,70607,70610,70613],{},[324,70608,70609],{},"Reduce operational costs by using our API-driven automation to deploy and manage in the private environment of your choice.",[324,70611,70612],{},"Reduce risk and costly resource investments, by leveraging our Pulsar expertise to run a secure, reliable, and production-ready messaging and streaming platform.",[324,70614,70615],{},"Run messaging and streaming workloads consistently and scale to meet business demands with efficient use of resources by deploying Apache Pulsar to any private cloud.",[48,70617,70618],{},"With StreamNative Platform, you can achieve the simplicity, flexibility, and efficiency of the cloud without the burden of complex infrastructure operations. We provide all the components of a complete platform—ready out of the box with enterprise-grade configurations.",[40,70620,70622],{"id":70621},"streamnative-platform-features","StreamNative Platform features",[48,70624,70625],{},"The StreamNative Platform is built on Apache Pulsar to allow developers to transition from traditional silos and monolithic applications, to modern microservices and messaging and streaming applications to increase agility and accelerate time to market.",[48,70627,70628],{},"The overall architecture of StreamNative Platform is illustrated in the following figure.",[48,70630,70631],{},[384,70632],{"alt":70633,"src":70634},"illustration o overall architecture of streamnative","\u002Fimgs\u002Fblogs\u002F63b2fb1feca1e747a5e02cfb_snpe-1.png",[48,70636,70637],{},"In this blog, we highlight the enterprise features for StreamNative Platform, including:",[321,70639,70640,70643,70646,70648,70651,70654],{},[324,70641,70642],{},"Transaction (Pulsar 2.8.0)",[324,70644,70645],{},"Kafka-on-Pulsar",[324,70647,29463],{},[324,70649,70650],{},"Enterprise-grade security (Vault & Audit Log)",[324,70652,70653],{},"Declarative API",[324,70655,70656],{},"Integrated with Cloud-Native ecosystem",[48,70658,70659],{},"Below we provide a deep dive on each.",[32,70661,70663],{"id":70662},"enable-unrestricted-developer-productivity-with-pulsar-transactions","Enable unrestricted developer productivity with Pulsar transactions",[48,70665,70666],{},"Built on Apache Pulsar 2.8, StreamNative Platform brings strong transactional guarantees to Pulsar. Transactional guarantees make it easier than ever to write real-time, mission-critical messaging and streaming applications. From tracking ad views to processing financial transactions, you can do it all in real-time and reliably with Pulsar Transaction. You no longer have to develop with lost or duplicated data in mind.",[48,70668,70669,70670,70673],{},"Pulsar PMC member and StreamNative Engineering Lead, Penghui Li, reviews this functionality in detail in the recent blog, ",[55,70671,70672],{"href":70202},"Exactly-once Semantics with Transactions in Pulsar",". 
Read this blog to learn more about the exactly-once semantics support in Pulsar.",[32,70675,70677],{"id":70676},"empower-kafka-api-users-to-build-upon-a-new-streaming-platform-reimagined-for-kubernetes","Empower Kafka-API users to build upon a new streaming platform reimagined for Kubernetes",[48,70679,70680],{},"Developed by OVHCloud and StreamNative, Kafka-on-Pulsar (KoP) has become one of the most popular protocol handlers in the Apache Pulsar community. Companies including Tencent, Bigo, and Dada Nexus have deployed KoP at internet-scale to migrate their existing Kafka applications to Pulsar.",[48,70682,70683],{},"StreamNative Platform includes the GA release of Kafka-on-Pulsar to enable Kafka-API users to build event streaming applications on a streaming platform architected for Kubernetes. The GA release of Kafka-on-Pulsar includes the following features:",[321,70685,70686,70689,70692,70695,70698,70701],{},[324,70687,70688],{},"Native support for Kafka protocols from 1.0 to 2.6.",[324,70690,70691],{},"Native support for Kafka admin API. All existing Kafka tools can be seamlessly used without any code changes.",[324,70693,70694],{},"Continuous offset to support a broader set of Kafka integrations, like the Kafka Spark connector.",[324,70696,70697],{},"Enterprise-grade security features such as OAuth2 integration.",[324,70699,70700],{},"Native Pulsar performance to Kafka-on-Pulsar (you can get the same performance using Kafka clients as you get from using Pulsar clients).",[324,70702,70703],{},"Preview feature of Kafka Transaction Support.",[32,70705,70707],{"id":70706},"simplify-building-serverless-streaming-applications-with-function-mesh","Simplify building serverless streaming applications with Function Mesh",[48,70709,70710],{},"Pulsar Functions and Pulsar IO have been proven to be two powerful building blocks for developing messaging and event streaming applications. However, running and orchestrating multiple functions and connectors at a large scale is not an easy task. The complexity is increased when the number of functions and connectors increases.",[48,70712,70713],{},"StreamNative Platform leverages Function Mesh to simplify building serverless event streaming applications. The key benefits include:",[321,70715,70716,70719,70722,70725],{},[324,70717,70718],{},"Eases the management of Pulsar Functions and connectors when running multiple instances of Functions and connectors together.",[324,70720,70721],{},"Utilizes the full power of Kubernetes Scheduler, including deployment, scaling and management, to manage and scale Pulsar Functions and connectors.",[324,70723,70724],{},"Allows Pulsar Functions and connectors to run natively in the cloud environment, leading to greater possibilities when more resources become available in the cloud.",[324,70726,70727],{},"Enables Pulsar Functions to work with different messaging systems and to integrate with existing tools in the cloud environment.",[32,70729,70731],{"id":70730},"monitor-and-audit-pulsar-clusters-with-structured-audit-logs","Monitor and audit Pulsar clusters with Structured Audit Logs",[48,70733,70734],{},"Once Pulsar is up and running within a large team, it’s critical to keep an eye on who is touching data and what they’re doing with it. 
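Relating to the Kafka-on-Pulsar compatibility described earlier in this post, the sketch below shows an unmodified Kafka Java client publishing to a Pulsar cluster that has KoP enabled. The broker address, listener port, and topic name are assumptions for illustration and depend entirely on how KoP is configured in your deployment.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class KafkaClientOnPulsarSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Point the stock Kafka client at the Pulsar broker's Kafka protocol listener
        // (host and port are illustrative; they depend on the KoP configuration).
        props.put("bootstrap.servers", "pulsar-broker:9092");
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // The record lands on a Pulsar topic and can also be read by Pulsar consumers.
            producer.send(new ProducerRecord<>("orders", "order-1", "created"));
            producer.flush();
        }
    }
}
```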
Structured Audit Logs, which is GA on StreamNative Platform, provides an easy way to track user\u002Fapplication access so you can identify potential anomalies and bad actors.",[48,70736,70737],{},"Structured Audit Logs enable you to capture audit logs in a set of dedicated Pulsar topics, either on a local or a remote cluster, including:",[321,70739,70740,70743],{},[324,70741,70742],{},"Capture low-volume, management-related activities, such as creating or deleting tenants, namespaces or topics (enabled by default).",[324,70744,70745],{},"Capture high-volume activities, such as produce, consume, and acknowledge events (can be enabled as needed).",[48,70747,70748],{},"With audit events safely stored in Pulsar topics, you can use Pulsar integrated tools, like Pulsar Functions, Pulsar SQL, and Flink SQL, to process and analyze them. Additionally, you can offload audit events to external data lakes or data warehouses (like Snowflake or Databricks) for analysis using Pulsar IO connectors.",[32,70750,70752],{"id":70751},"self-managing-pulsar-with-a-fully-managed-experience-using-declarative-apis","Self-managing Pulsar with a fully-managed experience using declarative APIs",[48,70754,70755],{},"StreamNative Platform provides high-level declarative APIs by extending the Kubernetes API through Custom Resource Definitions to support the management of Pulsar services. As a user, you can interact with the Custom Resource Definition by defining a Custom Resource that specifies the desired state. Then the StreamNative Platform will take care of the rest.",[3933,70757,70759],{"id":70758},"manage-pulsar-components","Manage Pulsar components",[48,70761,70762],{},"StreamNative Platform provides a set of Custom Resource Definitions to deploy and manage Pulsar components: ZooKeeper, BookKeeper, Pulsar Broker, Pulsar Proxy, and StreamNative Console.",[48,70764,70765],{},"The declarative API enables you to leave the infrastructure handling to software automation, freeing you to focus on your core business applications.",[321,70767,70768],{},[324,70769,70770],{},"Scale Pulsar with a single change to the declarative spec. StreamNative Platform then will spin up the required compute, networking, and storage, and start the new components (bookies, brokers, or proxies).",[48,70772,70773],{},[384,70774],{"alt":70775,"src":70776},"gif of a developer console","\u002Fimgs\u002Fblogs\u002F63b2fb1f877d1a2e843114d9_snpe-2.gif",[321,70778,70779,70782,70788],{},[324,70780,70781],{},"Deploy a fully secure Pulsar cluster with a single declarative spec. StreamNative Platform automates Pulsar configuration for strong authentication, authorization, and network encryption, as well as creates the set of TLS certificates required by Pulsar components to operate.",[324,70783,70784,70785],{},"Upgrade to the latest StreamNative Platform release by specifying the new version in the declarative spec. StreamNative Platform then orchestrates a rolling upgrade, deploying the new version without disruption to ongoing workloads.\n",[384,70786],{"alt":18,"src":70787},"\u002Fimgs\u002Fblogs\u002F63b2fb20e25599f30b9f03b0_snpe-3.gif",[324,70789,70790],{},"Deploy highly-available infrastructure in any environment. 
StreamNative Platform understands the infrastructure topology of nodes, racks, and zones, while automating the detection of and configuring the Pulsar service to ensure resilience to infrastructure failures.",[32,70792,70794],{"id":70793},"operating-apache-pulsar-with-a-cloud-native-ecosystem","Operating Apache Pulsar with a cloud-native ecosystem",[48,70796,70797],{},"With StreamNative, you can utilize Kubernetes-native interfaces, integrations, and scheduling controls to operate consistently and cost-effectively alongside other applications and data systems.",[48,70799,70800],{},"Initially, we used Helm to provide a simple configuration abstraction on top of Kubernetes to allow you to define a declarative spec as a Helm values yaml file (around 60% to 70% of Pulsar users are using Pulsar or StreamNative Helm charts to deploy Pulsar on Kubernetes).",[48,70802,70803],{},"However, after working to provide a fully managed Pulsar service (StreamNative Cloud) and thinking about how to provide automation and packaged best practices, we found that Helm templates were not the right architecture choice. Helm did not provide important features for running a stateful storage service such as the ability to control deployment sequences and add additional operations between deployment steps.",[48,70805,70806],{},"As a result, we moved to an industry standard and aligned on providing a Kubernetes-native interface with Custom Resource Definitions and Controllers. A Kubernetes-native experience provides a reliable API-driven approach with custom resources and leverages the ecosystem tooling and features inherent to Kubernetes. With this approach, you do not need specialized knowledge of how the applications are deployed, such as how to configure storage and network for stateful services.",[48,70808,70809],{},"Each Pulsar resource configuration spec is defined as a Kubernetes-native Custom Resource Definition, and each Pulsar resource provides an extensible configuration interface:",[321,70811,70812,70815,70818,70821],{},[324,70813,70814],{},"Configure the service configuration, JVM configuration, and Log4j2 configuration for each Pulsar component.",[324,70816,70817],{},"Manage the lifecycle of sensitive credentials and configurations separately, and only reference them in the Pulsar resource configuration spec.",[324,70819,70820],{},"Leverage the industry standards like Kubernetes Secrets to manage the lifecycle of credentials.",[324,70822,70823],{},"Specify workload scheduling rules through Kubernetes Node and Pod affinity. Fully integrated with Prometheus and Grafana. Prometheus on Kubernetes automatically discovers and scraps metrics from Pulsar components.",[40,70825,22668],{"id":2146},[48,70827,70828],{},"The StreamNative team has experience running some of the largest Pulsar deployments in the world and operating StreamNative Cloud. StreamNative Platform brings a cloud-native experience for running Apache Pulsar workloads in on-premises environments. 
It provides an enterprise-ready deployment of Apache Pulsar that enhances Pulsar’s elasticity, ease of operations, and resiliency.",[48,70830,70831],{},"StreamNative Platform is a strong fit for the following use cases:",[321,70833,70834,70837,70840],{},[324,70835,70836],{},"If you have data on-premises that needs to be streaming.",[324,70838,70839],{},"If you have regulatory requirements that mandate controls of data, systems, and applications to stay within your own isolated environments.",[324,70841,70842],{},"If you want to provide the same StreamNative Cloud experience across all of your use cases.",[48,70844,70845,70846,1154,70851,190],{},"You can learn more about StreamNative Platform from the ",[55,70847,70850],{"href":70848,"rel":70849},"https:\u002F\u002Fdocs.streamnative.io\u002Fplatform\u002Fv1.0.0\u002Foverview",[264],"user guide",[55,70852,70855],{"href":70853,"rel":70854},"https:\u002F\u002Fwww.google.com\u002Furl?q=https:\u002F\u002Fdocs.streamnative.io\u002Fplatform\u002Fv1.0.0\u002Fquickstart&sa=D&source=editors&ust=1623805143083000&usg=AOvVaw1H7w2s6ccH7X7mIEslBz11",[264],"give it a spin",[48,70857,70858,70859,1154,70862,70865],{},"If you want all of the benefits and capabilities of StreamNative Platform but don’t want to manage it, check out ",[55,70860,3550],{"href":17075,"rel":70861},[264],[55,70863,70864],{"href":57778},"talk to us",". We’d love to hear from you!",{"title":18,"searchDepth":19,"depth":19,"links":70867},[70868,70876],{"id":70621,"depth":19,"text":70622,"children":70869},[70870,70871,70872,70873,70874,70875],{"id":70662,"depth":279,"text":70663},{"id":70676,"depth":279,"text":70677},{"id":70706,"depth":279,"text":70707},{"id":70730,"depth":279,"text":70731},{"id":70751,"depth":279,"text":70752},{"id":70793,"depth":279,"text":70794},{"id":2146,"depth":19,"text":22668},"\u002Fimgs\u002Fblogs\u002F63c7fc7c7a94c3c88c58f89d_63b2fb1f0ad4c146f3c040b4_snpe-top.jpeg",{},"\u002Fblog\u002Fintroducing-streamnative-platform",{"title":70595,"description":70601},"blog\u002Fintroducing-streamnative-platform",[302,3550,821],"h0A_QGuyg5klXLcnBSyIKrEkm5NS3thClErkM2b7FXs",{"id":70885,"title":70886,"authors":70887,"body":70888,"category":821,"createdAt":290,"date":71232,"description":71233,"extension":8,"featured":294,"image":71234,"isDraft":294,"link":290,"meta":71235,"navigation":7,"order":296,"path":62199,"readingTime":42793,"relatedResources":290,"seo":71236,"stem":71237,"tags":71238,"__hash__":71239},"blogs\u002Fblog\u002Fapache-pulsar-launches-2-8-unified-messaging-streaming-transactions.md","Apache Pulsar Launches 2.8: Unified Messaging and Streaming With Transactions",[807,806],{"type":15,"value":70889,"toc":71208},[70890,70894,70897,70900,70904,70907,70925,70928,70931,70934,70943,70946,70953,70956,70959,70962,70970,70972,70975,70978,70981,70984,70987,70990,70992,70995,70998,71004,71008,71012,71015,71018,71021,71027,71030,71033,71039,71042,71046,71049,71053,71056,71059,71063,71066,71069,71073,71076,71079,71083,71086,71089,71092,71095,71099,71102,71105,71109,71112,71115,71160,71164,71167,71170,71174,71177,71179,71187,71194],[40,70891,70893],{"id":70892},"an-overview-of-the-280-release","An Overview of the 2.8.0 Release",[48,70895,70896],{},"Today, the Apache Pulsar Project Management Committee announced the release of Apache Pulsar 2.8.0, which includes a number of exciting upgrades and enhancements. 
This blog provides a deep dive into the updates from the 2.8.0 release as well as a detailed look at the major Pulsar developments that have helped it evolve into the unified messaging and streaming platform it is today.",[48,70898,70899],{},"Note: The Pulsar community typically releases a major release every 3 months, but it has been 6 months since the release of 2.7.0. We spent more time on 2.8.0 in order to make the transaction API generally available to the Pulsar community.",[40,70901,70903],{"id":70902},"release-28-overview","Release 2.8 Overview",[48,70905,70906],{},"The key features and updates in this release are:",[321,70908,70909,70912,70915,70918,70920,70923],{},[324,70910,70911],{},"Exclusive Producer",[324,70913,70914],{},"Package Management API",[324,70916,70917],{},"Simplified Client Memory Limit Settings",[324,70919,67551],{},[324,70921,70922],{},"New Protobuf Code Generator",[324,70924,9144],{},[32,70926,70911],{"id":70927},"exclusive-producer",[48,70929,70930],{},"By default, the Pulsar producer API provides a “multi-writer” semantic to append messages to a topic. However, there are several use cases that require exclusive access for a single writer, such as ensuring a linear non-interleaved history of messages or providing a mechanism for leader election.",[48,70932,70933],{},"This new feature allows applications to require exclusive producer access in order to achieve a “single-writer” situation. It guarantees that there should be 1 single writer in any combination of errors. If the producer loses its exclusive access, no more messages from it can be published on the topic.",[48,70935,70936,70937,70942],{},"One use case for this feature is the metadata controller in Pulsar Functions. In order to write a single linear history of all the functions metadata updates, the metadata controller requires to elect one leader and that all the “decisions” made by this leader be written on the metadata topic. By leveraging the exclusive producer feature, Pulsar guarantees that the metadata topic contains different segments of updates, one per each successive leader, and there is no interleaving across different leaders. See “",[55,70938,70941],{"href":70939,"rel":70940},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fwiki\u002FPIP-68%3A-Exclusive-Producer",[264],"PIP-68: Exclusive Producer","” for more details.",[32,70944,70914],{"id":70945},"package-management-api",[48,70947,70948,70949,70942],{},"Since its introduction in version 2.0, the Functions API has become hugely popular among Pulsar users. While it offers many benefits, there are a number of ways to improve the user experience. For example, today, if a function is deployed multiple times, the function package ends up being uploaded multiple times. Also, there is no version management in Pulsar for Functions and IO connectors. The newly introduced package management API provides an easier way to manage the packages for Functions and IO connectors and significantly simplifies the upgrade and rollback processes. Read “",[55,70950,70914],{"href":70951,"rel":70952},"http:\u002F\u002Fpulsar.apache.org\u002Fdocs\u002Fen\u002Fadmin-api-packages\u002F",[264],[32,70954,70917],{"id":70955},"simplified-client-memory-limit-settings",[48,70957,70958],{},"Prior to 2.8, there are multiple settings in producers and consumers that allow controlling the sizes of the internal message queues. These settings ultimately control the amount of memory the Pulsar client uses. 
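To illustrate the Exclusive Producer feature described above, here is a minimal sketch of requesting exclusive access with the Java client and observing a second writer being rejected; the topic name is illustrative, not one used by the project.

```java
import org.apache.pulsar.client.api.Producer;
import org.apache.pulsar.client.api.ProducerAccessMode;
import org.apache.pulsar.client.api.PulsarClient;
import org.apache.pulsar.client.api.PulsarClientException;

public class ExclusiveProducerSketch {
    public static void main(String[] args) throws Exception {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650")
                .build();

        // Only one producer at a time can hold Exclusive access to this topic,
        // so a second attempt fails instead of silently interleaving writes.
        Producer<byte[]> leader = client.newProducer()
                .topic("persistent://public/default/functions-metadata") // illustrative name
                .accessMode(ProducerAccessMode.Exclusive)
                .create();

        try {
            client.newProducer()
                    .topic("persistent://public/default/functions-metadata")
                    .accessMode(ProducerAccessMode.Exclusive)
                    .create();
        } catch (PulsarClientException e) {
            // The broker fences the second producer while the first holds exclusive access.
            System.out.println("topic already has an exclusive writer: " + e.getMessage());
        }

        leader.close();
        client.close();
    }
}
```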
However, there are few issues with this approach that make it complicated to select an overall configuration that controls the total usage of memory.",[48,70960,70961],{},"For example, the settings are based on the “number of messages”, so the expected message size must be adjusted per producer or consumer. If an application has a large (or unknown) number of producers or consumers, it’s very difficult to select an appropriate value for queue sizes. The same is true for topics that have many partitions.",[48,70963,70964,70965,70942],{},"In 2.8, we introduced a new API to set the memory limit. This single memoryLimit setting specifies a maximum amount of memory on a given Pulsar client. The producers and consumers compete for the memory assigned. It ensures the memory used by the Pulsar client will not go beyond the set limit. Read “",[55,70966,70969],{"href":70967,"rel":70968},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fwiki\u002FPIP-74%3A-Pulsar-client-memory-limits",[264],"PIP-74: Pulsar client memory limits",[32,70971,67551],{"id":67550},[48,70973,70974],{},"Pulsar messages define a very comprehensive set of metadata properties. However, to add a new property, the MessageMetadata definition in Pulsar protocol must change to inform both broker and client of the newly introduced property.",[48,70976,70977],{},"But in certain cases, this metadata property might need to be added from the broker side, or need to be retrieved by the broker at a very low cost. To prevent deserializing these properties from the message metadata, we introduced “Broker Entry Metadata” in 2.8.0 to provide a lightweight approach to add additional metadata properties without serializing and deserializing the protobuf-encoded MessageMetadata.",[48,70979,70980],{},"This feature unblocks a new set of capabilities for Pulsar. For example, we can leverage broker entry metadata to generate broker publish time for the messages appended to the Pulsar topic. The other example is to generate a monotonically increasing sequence-id for messages produced to a Pulsar topic. We use this feature in Kafka-on-Pulsar to implement Kafka offset.",[32,70982,70922],{"id":70983},"new-protobuf-code-generator",[48,70985,70986],{},"Pulsar uses Google Protobuf in order to perform serialization and deserialization of the commands that are exchanged between clients and brokers. Because of the overhead involved with the regular Protobuf implementation, we have been using a modified version of Protobuf 2.4.1. The modifications were done to ensure a more efficient serialization code that used thread local cache for the objects used in the process.",[48,70988,70989],{},"This approach introduced a few issues. For example, the patch to the Protobuf code generator is only based on Protobuf version 2.4.1 and cannot be upgraded to the newer Protobuf versions. In 2.8, we switched the patched Protobuf 2.4.1 to Splunk LightProto as the code generator. The new code generator generates the fastest possible Java code for Protobuf SerDe, is 100% compatible with proto2 definition and wire protocol, and provides zero-copy deserialization using Netty ByteBuf.",[32,70991,9144],{"id":53272},[48,70993,70994],{},"Prior to Pulsar 2.8, Pulsar only supported exactly-once semantics on single topic through Idempotent Producer. While powerful, Idempotent producer only solves a narrow scope of challenges for exactly-once semantics. For example, there is no atomicity when a producer attempts to produce messages to multiple topics. 
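A minimal sketch of the client memory limit setting discussed above; the service URL and the 64 MB figure are illustrative values, not recommendations from the original post.

```java
import org.apache.pulsar.client.api.PulsarClient;
import org.apache.pulsar.client.api.SizeUnit;

public class ClientMemoryLimitSketch {
    public static void main(String[] args) throws Exception {
        // One limit for the whole client: every producer and consumer created from it
        // shares, and is capped by, this amount of memory, regardless of how many
        // topics, partitions, or queues are involved.
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650")
                .memoryLimit(64, SizeUnit.MEGA_BYTES)
                .build();

        // ... create producers and consumers as usual ...
        client.close();
    }
}
```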
A publish error can occur when the broker serving one of the topics crashes. If the producer doesn’t retry publishing the message again, it results in some messages being persisted once and others being lost. If the producer retries, it results in some messages being persisted multiple times.",[48,70996,70997],{},"In order to address the remaining challenges described above, we’ve strengthened the Pulsar’s delivery semantics by introducing a Pulsar Transaction API to support atomic writes and acknowledgements across multiple topics. The addition of the Transaction API to Apache Pulsar completes our vision of making Pulsar a complete unified messaging and streaming platform.",[48,70999,71000,71001,190],{},"Pulsar PMC member and StreamNative Engineering Lead Penghui Li, goes over this functionality in great detail in his recent blog, Exactly-once Semantics with Transactions in Pulsar. You can read it to learn more about the ",[55,71002,71003],{"href":70202},"exactly-once semantics support in Pulsar",[40,71005,71007],{"id":71006},"building-a-unified-messaging-and-streaming-platform-with-apache-pulsar","Building a Unified Messaging and Streaming Platform with Apache Pulsar",[32,71009,71011],{"id":71010},"the-evolution-of-apache-pulsar","The Evolution of Apache Pulsar",[48,71013,71014],{},"Apache Pulsar is widely adopted by hundreds of companies across the globe, including Splunk, Tencent, Verizon, and Yahoo! JAPAN, just to name a few. Born as a cloud-native distributed messaging system, Apache Pulsar has evolved into a complete messaging and streaming platform for publishing and subscribing, storing, and processing streams of data at scale and in real-time.",[48,71016,71017],{},"Back in 2012 the Yahoo! team was looking for a global, geo-replicated infrastructure that could manage all of Yahoo!’s messaging data. After vetting the messaging and streaming landscape it became clear that existing technologies were not able to serve the need for an event-driven organization. As a result, the team at Yahoo! set out to build its own.",[48,71019,71020],{},"At the time, there were generally two types of systems to handle in-motion data: message queues that handled mission-critical business events in real-time, and streaming systems that handled scalable data pipelines at scale. Companies had to limit their capabilities to one or the other, or they had to adopt multiple different technologies. If they chose multiple technologies, they would end up with a complex infrastructure that often resulted in data segregation and data silos, with one silo for message queues used to build application services and the other silo for streaming systems used to build data services. The figure below illustrates what this can look like.",[48,71022,71023],{},[384,71024],{"alt":71025,"src":71026},"graph of application services and data services","\u002Fimgs\u002Fblogs\u002F63b2f8faf60c0541fd9db981_1.png",[48,71028,71029],{},"However, with the diversity of data that companies need to process beyond operational data (like log data, click events, etc), coupled with the increase in the number of downstream systems that need access to combined business data and operational data, the system would need to support message queueing and streaming.",[48,71031,71032],{},"Beyond that, companies need an infrastructure platform that would allow them to build all of their applications on top of it, and then have those applications handle in-motion data (messaging and streaming data) by default. 
This way real-time data infrastructure could be significantly simplified, as illustrated in the diagram below.",[48,71034,71035],{},[384,71036],{"alt":71037,"src":71038},"illustration o unified application and data services","\u002Fimgs\u002Fblogs\u002F63b2f8fa89919e42693209ac_2.png",[48,71040,71041],{},"With that vision, the Yahoo! team started working on building a unified messaging and streaming platform for in-motion data. Below is an overview of the key milestones on the Pulsar journey, from inception to today.",[32,71043,71045],{"id":71044},"step-1-a-scalable-storage-for-streams-of-data","Step 1: A scalable storage for streams of data",[48,71047,71048],{},"The journey of Pulsar began with Apache BookKeeper. Apache BookKeeper implements a log-like abstraction for continuous streams and provides the ability to run it at internet-scale with simple write-read log APIs. A log provides a great abstraction for building distributed systems, such as distributed databases and pub-sub messaging. The write APIs are in the form of appends to the log. And the read APIs are in the form of continuous read from a starting offset defined by the readers. The implementation of BookKeeper created the foundation - a scalable log-backed messaging and streaming system.",[32,71050,71052],{"id":71051},"step-2-a-multi-layered-architecture-that-separates-compute-from-storage","Step 2: A multi-layered architecture that separates compute from storage.",[48,71054,71055],{},"On top of the scalable log storage, a stateless serving layer was introduced which runs stateless brokers for publishing and consuming messages. This multi-layered architecture separates serving\u002Fcompute from storage, allowing Pulsar to manage serving and storage in separate layers.",[48,71057,71058],{},"This architecture also ensures instant scalability and higher availability. Both of these factors are extremely important and make Pulsar well-suited for building mission-critical services, such as billing platforms for financial use cases, transaction processing systems for e-commerce and retailers, and real-time risk control systems for financial institutions.",[32,71060,71062],{"id":71061},"step-3-unified-messaging-model-and-api","Step 3: Unified messaging model and API",[48,71064,71065],{},"In a modern data architecture, the real-time use cases can typically be categorized into two categories: queueing and streaming. Queueing is typically used for building core business application services while streaming is typically used for building real-time data services such as data pipelines.",[48,71067,71068],{},"To provide one platform able to serve both application and data services required a unified messaging model that integrates queuing and streaming semantics. The Pulsar topics become the source of truth for consumption. Messages can be stored only once on topics, but can be consumed in different ways via different subscriptions. Such unification significantly reduces the complexity of managing and developing messaging and streaming applications.",[32,71070,71072],{"id":71071},"step-4-schema-api","Step 4: Schema API",[48,71074,71075],{},"Next, a new Pulsar schema registry and a new type-safe producer & consumer API were added. The built-in schema registry enables message producers and consumers on Pulsar topics to coordinate on the structure of the topic’s data through the Pulsar broker itself, without needing an external coordination mechanism. 
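To illustrate the unified messaging model from Step 3 above, the sketch below attaches a queueing-style and a streaming-style subscription to the same topic; the topic name, subscription names, and service URL are placeholders.

```java
import org.apache.pulsar.client.api.*;

public class UnifiedConsumptionExample {
    public static void main(String[] args) throws Exception {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650")   // placeholder
                .build();

        String topic = "persistent://public/default/events";  // hypothetical topic

        // Queueing-style consumption: a Shared subscription spreads messages
        // across competing consumers, like a work queue.
        Consumer<byte[]> worker = client.newConsumer()
                .topic(topic)
                .subscriptionName("work-queue")
                .subscriptionType(SubscriptionType.Shared)
                .subscribe();

        // Streaming-style consumption: a Failover (or Exclusive) subscription
        // replays the same stored messages in order, independently of the queue above.
        Consumer<byte[]> streamReader = client.newConsumer()
                .topic(topic)
                .subscriptionName("stream-reader")
                .subscriptionType(SubscriptionType.Failover)
                .subscribe();

        // Messages are stored once on the topic; each subscription tracks its own
        // position, so both consumption styles coexist on the same data.
        worker.close();
        streamReader.close();
        client.close();
    }
}
```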
With data schemas, every single piece of data traveling through Pulsar is completely discoverable, enabling you to build systems that can easily adapt as the data changes.",[48,71077,71078],{},"Furthermore, the schema registry keeps track of data compatibility between versions of the schema. As the new schemas are uploaded the registry ensures that new schema versions are able to be read by old consumers. This ensures that Producers cannot break Consumers.",[32,71080,71082],{"id":71081},"step-5-functions-and-io-api","Step 5: Functions and IO API",[48,71084,71085],{},"The next step was to build APIs that made it easy to get data in and out of Pulsar and process it. The goal was to make it easy to build event-driven applications and real-time data pipelines with Apache Pulsar, so you can then process those events when they arrive, no matter where they originated from.",[48,71087,71088],{},"The Pulsar IO API allows you to build real-time streaming data pipelines by plugging various source connectors to get data from external systems into Pulsar and sink connectors to get data from Pulsar into external systems. Today, Pulsar provides several built-in connectors that you can use.",[48,71090,71091],{},"Additionally, StreamNative hosts StreamNative Hub (a registry of Pulsar connectors) that provides dozens of connectors integrated with popular data systems. If the IO API is for building streaming data pipelines, the Functions API is for building event-driven applications and real-time stream processors.",[48,71093,71094],{},"The serverless function concepts were adopted into stream processing and then built the Functions API as a lightweight serverless library that you can write any event processing logic using any language you like. The underlying motivation was to enable your engineering team to write stream processing logic without the operational complexity of running and maintaining yet another cluster.",[32,71096,71098],{"id":71097},"step-6-infinite-storage-for-pulsar-via-tiered-storage","Step 6: Infinite storage for Pulsar via Tiered Storage",[48,71100,71101],{},"As adoption of Apache Pulsar continued and the amount of data stored in Pulsar increased, users eventually hit a “retention cliff”, at which point it became significantly more expensive to store, manage, and retrieve data in Apache BookKeeper. To work around this, operators and application developers typically use an external store like AWS S3 as a sink for long-term storage. This means you lose most of the benefits of Pulsar’s immutable stream and ordering semantics, and instead end up having to manage two different systems with different access patterns.",[48,71103,71104],{},"The introduction of Tiered Storage allows Pulsar to offload the majority of the data to a remote cloud-native storage. This cheaper form of storage readily scales with the volume of data. More importantly, with the addition of Tiered Storage, Pulsar provides the batch storage capabilities needed to support batch processing when integrating with a unified batch and stream processor like Flink. 
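For the schema registry described in Step 4, a minimal sketch with the Java client might look like this; the OrderEvent type, topic name, and subscription name are hypothetical.

```java
import org.apache.pulsar.client.api.*;

public class SchemaExample {
    // A hypothetical event type; the JSON schema is derived from this POJO.
    public static class OrderEvent {
        public String orderId;
        public double amount;
    }

    public static void main(String[] args) throws Exception {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650")   // placeholder
                .build();

        // The producer registers and validates the schema with the broker's
        // built-in registry; no external coordination service is involved.
        Producer<OrderEvent> producer = client.newProducer(Schema.JSON(OrderEvent.class))
                .topic("persistent://public/default/orders")  // hypothetical topic
                .create();

        OrderEvent event = new OrderEvent();
        event.orderId = "o-123";
        event.amount = 42.0;
        producer.send(event);

        // Consumers declare the same type and receive decoded objects back;
        // incompatible schema changes are rejected by the broker.
        Consumer<OrderEvent> consumer = client.newConsumer(Schema.JSON(OrderEvent.class))
                .topic("persistent://public/default/orders")
                .subscriptionName("billing")
                .subscribe();

        producer.close();
        consumer.close();
        client.close();
    }
}
```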
The unified batch and stream processing capabilities integrated with Pulsar enable companies to query real-time streams with historical context quickly and easily, unlocking a unique competitive advantage.",[32,71106,71108],{"id":71107},"step-7-protocol-handler","Step 7: Protocol Handler",[48,71110,71111],{},"After introducing tiered storage, Pulsar evolved from a Pub\u002FSub messaging system into a scalable stream data system that can ingest, store, and process streams of data. However, existing applications written using other messaging protocols such as Kafka, AMQP, MQTT, etc had to be rewritten to adopt Pulsar’s messaging protocol.",[48,71113,71114],{},"The Protocol Handler API further reduces the overhead of adopting Pulsar for building messaging and streaming applications, and allows developers to extend Pulsar capabilities to other messaging domains by leveraging all the benefits provided by Pulsar architecture. This resulted in major collaborations between StreamNative and other industry leaders to develop popular protocol handlers including:",[321,71116,71117,71129,71140,71152],{},[324,71118,71119,71123,71124,71128],{},[55,71120,49302],{"href":71121,"rel":71122},"https:\u002F\u002Fhub.streamnative.io\u002Fprotocol-handlers\u002Fkop\u002F0.2.0",[264],", which was ",[55,71125,71127],{"href":71126},"\u002Fen\u002Fblog\u002Ftech\u002F2020-03-24-bring-native-kafka-protocol-support-to-apache-pulsar","launched in March 2020"," by OVHCloud and StreamNative.",[324,71130,71131,71123,71135,71139],{},[55,71132,49346],{"href":71133,"rel":71134},"https:\u002F\u002Fhub.streamnative.io\u002Fprotocol-handlers\u002Faop\u002F0.1.0",[264],[55,71136,71138],{"href":71137},"\u002Fen\u002Fblog\u002Ftech\u002F2020-06-15-announcing-aop-on-pulsar","announced in June 2020"," by China Mobile and StreamNative.",[324,71141,71142,71123,71147,71151],{},[55,71143,71146],{"href":71144,"rel":71145},"https:\u002F\u002Fhub.streamnative.io\u002Fprotocol-handlers\u002Fmop\u002F0.2.0",[264],"MQTT-on-Pulsar (MoP)",[55,71148,71150],{"href":71149},"\u002Fen\u002Fblog\u002Ftech\u002F2020-09-28-announcing-mqtt-on-pulsar","announced in August 2020"," by StreamNative.",[324,71153,71154,71159],{},[55,71155,71158],{"href":71156,"rel":71157},"https:\u002F\u002Fgithub.com\u002Fstreamnative\u002Frop",[264],"RocketMQ-on-Pulsar (RoP)",", which was launched in May 2021 by Tencent Cloud and StreamNative.",[32,71161,71163],{"id":71162},"step-8-transaction-api-for-exactly-once-stream-processing","Step 8: Transaction API for exactly-once stream processing",[48,71165,71166],{},"More recently, transactions were added to Apache Pulsar to enable exactly-once semantics for stream processing. This is a fundamental feature that provides a strong guarantee for streaming data transformations, making it easy to build scalable, fault-tolerant, stateful messaging and streaming applications that process streams of data.",[48,71168,71169],{},"Furthermore, the transaction API capabilities are not limited to a given language client. Pulsar’s support for transactional messaging and streaming is primarily a protocol-level capability that can be presented in any language. Such protocol-level capability can be leveraged in all kinds of applications.",[40,71171,71173],{"id":71172},"building-an-ecosystem-for-unified-messaging-and-streaming","Building an ecosystem for unified messaging and streaming",[48,71175,71176],{},"In addition to contributing to the Pulsar technology, the community is also working to build a robust ecosystem to support it. 
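As a rough illustration of what the Kafka protocol handler (KoP) mentioned above enables, an unmodified Kafka client can be pointed at a Pulsar broker with KoP installed; the bootstrap address and topic below are placeholders, and the actual listener port depends on how the handler is configured.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class KafkaClientOnPulsar {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Point the stock Kafka client at a Pulsar broker that has the KoP
        // protocol handler enabled; 9092 is the conventional Kafka listener port.
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // The record lands on a Pulsar topic; Pulsar consumers can read it too.
            producer.send(new ProducerRecord<>("greetings", "hello from a Kafka client"));
            producer.flush();
        }
    }
}
```

No application code changes are required; only the connection endpoint differs from a plain Kafka deployment.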
Pulsar’s ability to support a rich ecosystem of pub-sub libraries, connectors, functions, protocol handlers, and integrations with popular query engines will enable Pulsar adopters to streamline workflows and achieve new use cases.",[40,71178,68340],{"id":68339},[48,71180,71181,71182,71186],{},"If you are interested in learning more about Pulsar 2.8.0, you can ",[55,71183,71185],{"href":58799,"rel":71184},[264],"download 2.8.0"," and try it out today!",[48,71188,71189,71190,71193],{},"If you want to learn more about how companies have adopted Pulsar, you can ",[55,71191,29176],{"href":70535,"rel":71192},[264]," for Pulsar Summit NA 2021!",[48,71195,71196,71197,71201,71202,1154,71205,190],{},"For more information about the Apache Pulsar project and the progress, please visit the official website at ",[55,71198,71200],{"href":23526,"rel":71199},[264],"https:\u002F\u002Fpulsar.apache.org"," and follow the project on Twitter ",[55,71203,36238],{"href":36236,"rel":71204},[264],[55,71206,36254],{"href":33664,"rel":71207},[264],{"title":18,"searchDepth":19,"depth":19,"links":71209},[71210,71211,71219,71230,71231],{"id":70892,"depth":19,"text":70893},{"id":70902,"depth":19,"text":70903,"children":71212},[71213,71214,71215,71216,71217,71218],{"id":70927,"depth":279,"text":70911},{"id":70945,"depth":279,"text":70914},{"id":70955,"depth":279,"text":70917},{"id":67550,"depth":279,"text":67551},{"id":70983,"depth":279,"text":70922},{"id":53272,"depth":279,"text":9144},{"id":71006,"depth":19,"text":71007,"children":71220},[71221,71222,71223,71224,71225,71226,71227,71228,71229],{"id":71010,"depth":279,"text":71011},{"id":71044,"depth":279,"text":71045},{"id":71051,"depth":279,"text":71052},{"id":71061,"depth":279,"text":71062},{"id":71071,"depth":279,"text":71072},{"id":71081,"depth":279,"text":71082},{"id":71097,"depth":279,"text":71098},{"id":71107,"depth":279,"text":71108},{"id":71162,"depth":279,"text":71163},{"id":71172,"depth":19,"text":71173},{"id":68339,"depth":19,"text":68340},"2021-06-15","This blog provides a deep dive into the updates from the 2.8.0 release as well as a detailed look at the major Pulsar developments that have helped it evolve into the unified messaging and streaming platform it is today.","\u002Fimgs\u002Fblogs\u002F63c7fc9d150008491f286121_63b2f8faa013ca0fc93b963c_top.png",{},{"title":70886,"description":71233},"blog\u002Fapache-pulsar-launches-2-8-unified-messaging-streaming-transactions",[302,821,9144],"g_zZqPyFSd_nqy6hldfO1F1qotDPN1TTnE7jsCKEvmQ",{"id":71241,"title":70203,"authors":71242,"body":71243,"category":821,"createdAt":290,"date":71491,"description":71492,"extension":8,"featured":294,"image":71493,"isDraft":294,"link":290,"meta":71494,"navigation":7,"order":296,"path":71495,"readingTime":33204,"relatedResources":290,"seo":71496,"stem":71497,"tags":71498,"__hash__":71499},"blogs\u002Fblog\u002Fexactly-once-semantics-transactions-pulsar.md",[808],{"type":15,"value":71244,"toc":71474},[71245,71254,71257,71261,71264,71268,71271,71275,71278,71282,71285,71289,71292,71295,71298,71302,71305,71309,71312,71316,71319,71322,71326,71329,71332,71335,71343,71346,71353,71356,71359,71363,71366,71369,71375,71378,71381,71392,71395,71399,71402,71405,71408,71411,71414,71417,71420,71424,71447,71450,71460,71467,71471],[48,71246,71247,71248,71253],{},"We have hit an exciting milestone for the Apache Pulsar community: exactly once semantics. 
As part of the 2.8 Pulsar release, we have evolved the exactly-once semantic from ",[55,71249,71252],{"href":71250,"rel":71251},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fwiki\u002FPIP-6:-Guaranteed-Message-Deduplication",[264],"guaranteed message deduplication"," on a single topic to atomic produce and acknowledgement over multiple topics via Transaction API. In this post, I’ll explain what this means, how we made this evolution, and how the transaction features in Pulsar simplify exactly-once semantics for building messaging and streaming applications.",[48,71255,71256],{},"Before diving into the transaction features, let’s get started with an overview of messaging semantics.",[40,71258,71260],{"id":71259},"what-is-exactly-once-semantics","What is exactly-once semantics?",[48,71262,71263],{},"In any distributed system, the machines that form the system can always fail independently of one another. In Apache Pulsar, an individual broker or bookie can crash, or a network failure can happen while the producer is producing a message to a topic. Depending on how the producer handles such a failure, the application can get one of three different semantics.",[32,71265,71267],{"id":71266},"at-least-once-semantics","At-least-once Semantics",[48,71269,71270],{},"If the producer receives an acknowledgement (ACK) from the Pulsar broker, it means that the message has been written to the Pulsar topic. However, if a producer times out on receiving an acknowledgement or receives an error from the Pulsar broker, it might retry sending the message to the Pulsar topic. If the broker had failed right before it sent the ACK but after the message was successfully written to the Pulsar topic, this reattempt leads to the message being written twice and delivered more than once to the consumers.",[32,71272,71274],{"id":71273},"at-most-once-semantics","At-most-once Semantics",[48,71276,71277],{},"If the producer does not attempt to produce the message when it times out on receiving an acknowledgement or receives an error, then the message might end up not being written to the Pulsar topic, and not delivered to the consumers. In some cases in order to avoid the possibility of duplication, we accept that messages will not be written.",[32,71279,71281],{"id":71280},"exactly-once-semantics","Exactly-once Semantics",[48,71283,71284],{},"Exactly-once semantics guarantees that even if a producer retries sending a message multiple times, the message will only be written exactly-once to the Pulsar topic. Exactly-once semantics is the most desirable guarantee, but also one that is not well understood. Exactly-once semantics requires coordination between the messaging system itself and the application producing and consuming the messages. For example, if after consuming and acknowledging a message successfully, your application rewinds the subscription to a previous message ID, your application will receive all the messages from that message ID to the latest one, all over again.",[40,71286,71288],{"id":71287},"challenges-in-supporting-exactly-once-semantics","Challenges in supporting exactly-once semantics",[48,71290,71291],{},"Supporting exactly-once delivery semantics in messaging systems presents some challenges. To describe them, I’ll start with a simple example.",[48,71293,71294],{},"Suppose there is a producer that sends a message “Hello StreamNative” to a Pulsar topic called “Greetings”. Further suppose a consumer on the other end receives messages from the topic and prints them. 
In a happy path where there are no failures, this works well, and the message “Hello StreamNative” is written to the “Greetings” topic only once. The consumer receives the message, processes it, and acknowledges it to indicate that it has completed its processing. The consumer will not receive the message again, even if the consumer application crashes and restarts.",[48,71296,71297],{},"However, at scale, failure scenarios can happen all the time.",[32,71299,71301],{"id":71300},"a-bookie-can-fail","A bookie can fail",[48,71303,71304],{},"Pulsar stores messages in BookKeeper. BookKeeper is a highly available, durable log storage service where data written to a ledger (a segment of a Pulsar topic) is persisted and replicated multiple times (number n). As a result, BookKeeper can tolerate n-1 bookie failures, meaning that a ledger is available as long as there is at least one bookie available. Inherited from Zab\u002FPaxos, BookKeeper’s replication protocol guarantees that once the data has been successfully written to a quorum of bookies, the data is permanently stored and will be replicated to all bookies within the same ensemble.",[32,71306,71308],{"id":71307},"a-broker-can-fail-or-the-producer-to-broker-connection-can-fail","A broker can fail or the producer-to-broker connection can fail",[48,71310,71311],{},"Durability in Pulsar depends on the producer receiving an ACK from the Pulsar broker. Failure to receive that ACK does not necessarily mean that the produce request itself failed. The broker can crash after writing a message but before it sends an ACK back to the producer. It can also crash before even writing the message to the topic. Since there is no way for the producer to know the nature of the failure, it is forced to assume that the message was not written successfully and to retry it. In some cases, the same message is duplicated in the Pulsar topic, causing the consumers to receive it more than once.",[32,71313,71315],{"id":71314},"the-pulsar-client-can-fail","The Pulsar client can fail",[48,71317,71318],{},"Exactly-once delivery must account for client failures as well. But it is also hard to tell if a client has actually failed and is not just temporarily partitioned from the Pulsar brokers or undergoing an application pause. Having the ability to distinguish between a permanent failure and a soft one is important. The Pulsar broker should discard messages sent by a zombie producer, likewise for the consumer. Once a new client has been restarted, it must be able to recover from whatever state the previous failed client left behind and begin processing from a safe point.",[48,71320,71321],{},"The Pulsar community completes the support for exactly-once semantics in steps. We first introduced Idempotent Producer to support exactly-once semantics on a single topic in the Pulsar 1.20.0-incubating release, and then completed the vision by introducing Transaction API to provide atomicity across multiple topics in the recent 2.8.0 release.",[40,71323,71325],{"id":71324},"idempotent-producer-exactly-once-semantics-on-a-single-topic","Idempotent producer: exactly-once semantics on a single topic",[48,71327,71328],{},"We started the journey of supporting exactly-once semantics in Pulsar by introducing Idempotent Producer in its 1.20.0-incubating release.",[48,71330,71331],{},"What does Idempotent Producer mean? An idempotent operation can be performed once or many times without causing a different result. 
If Guaranteed Message Deduplication is enabled at the cluster level or the namespace level and a producer is configured to be a Idempotent Producer, the produce requests are idempotent. In the event of an error that causes a producer to retry, the same message sent by the producer multiple times, is guaranteed to write to the Pulsar topic only once on the broker.",[48,71333,71334],{},"To turn on this feature and get exactly-once semantics per partition - meaning no duplicates, no data loss, and in-order semantics - configure the following:",[321,71336,71337,71340],{},[324,71338,71339],{},"Enable message deduplication for all namespaces\u002Ftopics at the cluster level, or for a specific namespace at the namespace policy level, or for a specific topic at the topic policy level",[324,71341,71342],{},"Specify a name for the producer and set the message timeout to 0",[48,71344,71345],{},"How did that feature work? Under the hood, it works in a way very similar to TCP: each message produced to Pulsar will contain a sequence ID that the Pulsar broker will use to dedupe any duplicated message. However, unlike TCP which provides guarantees only within a transient connection, this sequence ID along with the message is persisted to the Pulsar topic and Pulsar broker keeps track of the last received sequence ID. So even if the Pulsar broker fails, any broker that takes over the topic ownership will also know if a message is duplicated or not. The overhead of this mechanism is very low, adding negligible performance overhead over the non-idempotent producer.",[48,71347,71348,71349,190],{},"You can try out this feature in any Pulsar version newer than 1.20.0-incubating by following the tutorial ",[55,71350,267],{"href":71351,"rel":71352},"http:\u002F\u002Fpulsar.apache.org\u002Fdocs\u002Fen\u002Fcookbooks-deduplication\u002F",[264],[48,71354,71355],{},"While powerful, Idempotent producer only solves a narrow scope of challenges for exactly-once semantics. There are still many other challenges it doesn’t resolve. For example, there is no atomicity when a producer attempts to produce messages to multiple topics. A publish error can occur when the broker serving one of the topics crashes. If the producer doesn’t retry publishing the message again, it results in some messages being persisted once and others being lost. If the producer retries, it results in some messages being persisted multiple times.",[48,71357,71358],{},"On the consumer side, the message acknowledgement was a best-effort operation. The message ACKs can potentially be lost because the consumer has no idea if the broker has received them and will not retry sending ACKs again. This will then result in consumers receiving duplicate messages.",[40,71360,71362],{"id":71361},"transactions-atomic-writes-and-acknowledgments-across-multiple-topics","Transactions: atomic writes and acknowledgments across multiple topics",[48,71364,71365],{},"To address the remaining challenges described above, we’ve strengthened Pulsar’s delivery semantics by introducing a Pulsar Transaction API to support atomic writes and acknowledgments across multiple topics. This allows a producer to send a batch of messages to multiple topics such that either all messages in the batch are eventually visible to any consumer or none are ever visible to consumers. 
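Putting the idempotent-producer setup described earlier into code, a minimal sketch might look like the following; it assumes deduplication has already been enabled for the namespace, and the topic and producer names are placeholders.

```java
import java.util.concurrent.TimeUnit;
import org.apache.pulsar.client.api.Producer;
import org.apache.pulsar.client.api.PulsarClient;

public class IdempotentProducerExample {
    public static void main(String[] args) throws Exception {
        // Assumes deduplication was enabled for the namespace beforehand, e.g.:
        //   bin/pulsar-admin namespaces set-deduplication public/default --enable
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650")   // placeholder
                .build();

        // A stable producer name plus an infinite send timeout (0) lets the broker
        // use the producer's sequence IDs to drop any duplicate it has already stored.
        Producer<byte[]> producer = client.newProducer()
                .topic("persistent://public/default/greetings")  // hypothetical topic
                .producerName("greetings-writer-1")              // must be stable across restarts
                .sendTimeout(0, TimeUnit.SECONDS)
                .create();

        // Even if this send is retried after a broker failover, the message is
        // persisted on the topic at most once.
        producer.send("Hello StreamNative".getBytes());

        producer.close();
        client.close();
    }
}
```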
This feature also allows you to acknowledge your messages across multiple topics in the same transaction along with the messages you have processed, thereby allowing end-to-end exactly-once semantics.",[48,71367,71368],{},"Here is an example code snippet to demonstrate the use of Transaction API:",[8325,71370,71373],{"className":71371,"code":71372,"language":8330},[8328],"\nPulsarClient pulsarClient = PulsarClient.builder()\n        .serviceUrl(\"pulsar:\u002F\u002Flocalhost:6650\")\n        .enableTransaction(true)\n        .build();\nTransaction txn = pulsarClient\n        .newTransaction()\n        .withTransactionTimeout(1, TimeUnit.MINUTES)\n        .build()\n        .get();\nproducer.newMessage(txn).value(\"Hello Pulsar Transaction\".getBytes()).send();\nMessage message = consumer.receive();\nconsumer.acknowledge(message.getMessageId(), txn);\ntxn.commit().get();\n \n",[4926,71374,71372],{"__ignoreMap":18},[48,71376,71377],{},"The code example above describes how you can use the new producer API with Transaction API to send messages atomically to a set of topics and use the new consumer API with Transactions to acknowledge the processed messages in the same transaction.",[48,71379,71380],{},"It is worth noting that:",[321,71382,71383,71386,71389],{},[324,71384,71385],{},"A Pulsar topic might have some messages that are part of a transaction while others are not.",[324,71387,71388],{},"A Pulsar client can have multiple concurrent transactions outstanding. This design is fundamentally different from the transactions implementation in other older messaging systems, and results in much higher throughput.",[324,71390,71391],{},"The current Pulsar Transaction API only supports READ_COMMITTED isolation level. The consumer can only read the messages that are not part of a transaction and the messages that are part of a committed transaction. Messages produced in an aborted transaction are not delivered to any consumers.",[48,71393,71394],{},"To use the Transaction API, you don’t need any additional settings in the Pulsar client.",[40,71396,71398],{"id":71397},"end-to-end-exactly-once-stream-processing-made-simple-a-pulsarflink-example","End-to-end exactly-once stream processing made simple: a Pulsar+Flink Example",[48,71400,71401],{},"Exactly-once stream processing is now possible through the Pulsar Transaction API.",[48,71403,71404],{},"One of the most critical questions for a stream processing system is, “Does my stream processing application get the right answer, even if one of the instances crashes in the middle of processing?” The key, when recovering a failed instance, is to resume processing in exactly the same state as before the crash.",[48,71406,71407],{},"Stream processing on Apache Pulsar is a read-process-write operation on Pulsar topics. A source operator that runs a Pulsar consumer reads messages from one or multiple Pulsar topics, some processing operators transform the messages or modify the state maintained by them, and a sink operator that runs a Pulsar producer writes the resulting messages to another Pulsar topic. Exactly-once stream processing is simply the ability to execute a read-process-write operation exactly once. In such a context, “getting the right answer” means not missing any input messages from the source operator or producing any duplicates to the sink operator. 
This is the behavior users expect from an exactly-once stream processor.",[48,71409,71410],{},"Let’s take the Pulsar and Flink integration as an example.",[48,71412,71413],{},"Prior to Pulsar 2.8.0, the Pulsar and Flink integration only supported exactly-once source connector and at-least-once sink connector. That means if you want to use Flink to build stream applications with Apache Pulsar, the highest processing guarantee you can get end-to-end is at-least-once - the resulting messages from these streaming applications may potentially produce multiple times to the resulting topic in Pulsar.",[48,71415,71416],{},"With the introduction of Pulsar Transaction in 2.8.0, the Pulsar-Flink sink connector can be easily enhanced to support exactly-once semantics. Because Flink uses a two-phase commit protocol to ensure end-to-end exactly-once semantics, we can implement the designated TwoPhaseCommitSinkFunction and hook up the Flink sink message lifecycle with Pulsar Transaction API. When the Pulsar-Flink sink connector calls beginTransaction, it starts a Pulsar Transaction and obtains the transaction id. All the subsequent messages written to the sink connector will be associated with this transaction ID. They will be flushed to Pulsar when the connector calls preCommit. The Pulsar transaction will then be committed or aborted when the connector calls recoverAndCommit and recoverAndAbort accordingly. The integration is very straightforward and the connector just has to persist the transaction ID together with Flink checkpoints so the transaction ID can be retrieved back for commit or abort.",[48,71418,71419],{},"Based on idempotency and atomicity provided by Pulsar Transactions and the globally consistent checkpoint algorithm offered by Apache Flink, the streaming applications built on Pulsar and Flink can easily achieve end-to-end exactly-once semantics.",[40,71421,71423],{"id":71422},"where-to-go-from-here","Where to go from here",[48,71425,71426,71427,71431,71432,71435,71436,71441,71442,71446],{},"Exactly-once semantics via Transaction API is now supported in ",[55,71428,3550],{"href":71429,"rel":71430},"https:\u002F\u002Fauth.streamnative.cloud\u002Flogin?state=hKFo2SAtTWYyejRLMi1CZkFwWE16LUc1X0RFUzZuY3F6ejBWUqFupWxvZ2luo3RpZNkgbkItS09ERTlGWW1ybHZoYWJKUVdOaS1LUHhGWXBjdkyjY2lk2SA2ZXI3M3FLcTQycUIwd2JzcjFTT01hWWJhdTdLaGxldw&client=6er73qKq42qB0wbsr1SOMaYbau7Khlew&protocol=oauth2&audience=https%3A%2F%2Fapi.streamnative.cloud&redirect_uri=https%3A%2F%2Fconsole.streamnative.cloud%2Fcallback&defaultMethod=signup&scope=openid%20profile%20email%20offline_access&response_type=code&response_mode=query&nonce=c1JvaTJVaU1PT2xmOEVvM2hnWFIwckJ6OUhyX2JOQ1FjN1ljSHE0eC1GSg%3D%3D&code_challenge=vTFvdA2fbYkHvT7j-8Hgg2nIWpnbOSSQWVzeavNh-XE&code_challenge_method=S256&auth0Client=eyJuYW1lIjoiYXV0aDAtc3BhLWpzIiwidmVyc2lvbiI6IjEuMTUuMCJ9",[264]," as well as in ",[55,71433,44086],{"href":71434},"\u002Fen\u002Fplatform"," v1.0 and later. If you’d like to understand the exactly-once guarantees in more detail, I’d recommend checking out ",[55,71437,71440],{"href":71438,"rel":71439},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fwiki\u002FPIP-31%3A-Transaction-Support",[264],"PIP-31"," for the transaction feature. 
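To make the read-process-write pattern above concrete, here is a minimal sketch built on the same Transaction API shown earlier; it assumes transactions are enabled on the broker and the client, and the topic and subscription names are placeholders.

```java
import java.util.concurrent.TimeUnit;
import org.apache.pulsar.client.api.*;
import org.apache.pulsar.client.api.transaction.Transaction;

public class ReadProcessWriteExample {
    public static void main(String[] args) throws Exception {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650")   // placeholder
                .enableTransaction(true)
                .build();

        Consumer<String> input = client.newConsumer(Schema.STRING)
                .topic("persistent://public/default/input")     // hypothetical topics
                .subscriptionName("processor")
                .subscribe();
        Producer<String> output = client.newProducer(Schema.STRING)
                .topic("persistent://public/default/output")
                .sendTimeout(0, TimeUnit.SECONDS)
                .create();

        while (true) {
            Message<String> msg = input.receive();

            // One transaction covers both the output write and the input ack,
            // so each read-process-write step is applied exactly once.
            Transaction txn = client.newTransaction()
                    .withTransactionTimeout(1, TimeUnit.MINUTES)
                    .build()
                    .get();
            try {
                output.newMessage(txn)
                        .value(msg.getValue().toUpperCase())     // the "process" step
                        .send();
                input.acknowledgeAsync(msg.getMessageId(), txn).get();
                txn.commit().get();
            } catch (Exception e) {
                // On failure, abort: the output message is never visible and the
                // input message will be redelivered for another attempt.
                txn.abort().get();
            }
        }
    }
}
```

The sketch loops forever by design; a real processor would add shutdown handling, but the transactional boundary around produce-plus-acknowledge is the part that carries the exactly-once guarantee.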
If you’d like to dive deeper into the detailed design, this ",[55,71443,71445],{"href":70507,"rel":71444},[264],"design document"," is worth reading.",[48,71448,71449],{},"This post primarily focuses on describing the nature of the user-facing guarantees as supported by the Transaction API introduced in Apache Pulsar 2.8.0, and how you can use this feature. In our next post, we will go into more details about the API and design.",[48,71451,71452,71453,71456,71457,71459],{},"If you want to put the new Transaction API to practical use, check out ",[55,71454,3550],{"href":71429,"rel":71455},[264]," or download ",[55,71458,44086],{"href":71434}," 1.0 to create your own applications with Pulsar Java clients.",[48,71461,71462,71463,71466],{},"My fellow colleagues Sijie Guo and Addison Higham gave a presentation “",[55,71464,70531],{"href":69828,"rel":71465},[264],"” at the Pulsar Virtual Summit North America 2021.",[40,71468,71470],{"id":71469},"credits","Credits",[48,71472,71473],{},"An amazing team of Pulsar committers and contributors worked for over a year to bring this awesome exactly-once work to Pulsar. Thanks to everyone that has been involved in this feature development: Penghui Li, Ran Gao, Bo Cong, Addison Higham, Jia Zhai, Yong Zhang, Xiaolong Ran, Matteo Merli, and Sijie Guo.",{"title":18,"searchDepth":19,"depth":19,"links":71475},[71476,71481,71486,71487,71488,71489,71490],{"id":71259,"depth":19,"text":71260,"children":71477},[71478,71479,71480],{"id":71266,"depth":279,"text":71267},{"id":71273,"depth":279,"text":71274},{"id":71280,"depth":279,"text":71281},{"id":71287,"depth":19,"text":71288,"children":71482},[71483,71484,71485],{"id":71300,"depth":279,"text":71301},{"id":71307,"depth":279,"text":71308},{"id":71314,"depth":279,"text":71315},{"id":71324,"depth":19,"text":71325},{"id":71361,"depth":19,"text":71362},{"id":71397,"depth":19,"text":71398},{"id":71422,"depth":19,"text":71423},{"id":71469,"depth":19,"text":71470},"2021-06-14","As part of the 2.8 Pulsar release, we have evolved the exactly once semantics from guaranteed message deduplication on a single topic to atomic produce and acknowledgement over multiple topics via Transaction API.","\u002Fimgs\u002Fblogs\u002F63c7fcbcbc45dd72258c7b02_63b2f515ccfce680da021908_top.png",{},"\u002Fblog\u002Fexactly-once-semantics-transactions-pulsar",{"title":70203,"description":71492},"blog\u002Fexactly-once-semantics-transactions-pulsar",[821,9144],"PgAo1GU0IsJ8F7H-fwlU7mN6aj_YFI-FNzMJT6bXDcU",{"id":71501,"title":71502,"authors":71503,"body":71504,"category":7338,"createdAt":290,"date":71491,"description":71652,"extension":8,"featured":294,"image":71653,"isDraft":294,"link":290,"meta":71654,"navigation":7,"order":296,"path":61631,"readingTime":11508,"relatedResources":290,"seo":71655,"stem":71656,"tags":71657,"__hash__":71658},"blogs\u002Fblog\u002Fpulsar-hits-400th-contributor-passes-kafka-monthly-active-contributors.md","Pulsar Hits Its 400th Contributor & Passes Kafka in Monthly Active Contributors",[69353,44843],{"type":15,"value":71505,"toc":71638},[71506,71509,71511,71514,71518,71521,71527,71531,71534,71540,71544,71547,71553,71557,71565,71569,71575,71579,71586,71590,71598,71602,71609,71611,71619,71625],[48,71507,71508],{},"The Pulsar community hit two major milestones: Pulsar welcomed its 400th contributor last month and Pulsar surpassed Kafka in the number of monthly active contributors. 
We want to thank everyone in the Pulsar community who contributed.",[40,71510,65964],{"id":65963},[48,71512,71513],{},"Community growth and engagement are at the core of every open-source project. The number of contributors (400) signals project adoption and advancement while the number of monthly active contributors indicates the vibrancy of the Pulsar community.",[32,71515,71517],{"id":71516},"_1-400-contributors-and-2x-growth","1. 400 Contributors and 2x Growth",[48,71519,71520],{},"The 400th contributor milestone shows the size of the community and the fact that the number of contributors has more than doubled in the past two years shows the trajectory of the project. The charts below provide an overview of key Pulsar milestones.",[48,71522,71523],{},[384,71524],{"alt":71525,"src":71526},"graph of apache pulsar contributors since 2017","\u002Fimgs\u002Fblogs\u002F63b2f6305777846ee102c053_1.png",[32,71528,71530],{"id":71529},"apache-pulsar-surpasses-apache-kafka","Apache Pulsar Surpasses Apache Kafka",[48,71532,71533],{},"Pulsar recently surpassed Kafka in the number of Monthly Active Contributors. This is significant because Kafka is a large and widely adopted project often considered a lot bigger than Pulsar. As this graph demonstrates, the adoption of Pulsar and engagement in the project has skyrocketed over the last few years and it now has more monthly engagement than Kafka.",[48,71535,71536],{},[384,71537],{"alt":71538,"src":71539},"grph apache pulsar and kafka monthly active contributors since 2017","\u002Fimgs\u002Fblogs\u002F63b2f631d36214a9d676d42a_2.png",[32,71541,71543],{"id":71542},"_3-apache-pulsar-github-stars","3. Apache Pulsar Github Stars",[48,71545,71546],{},"Github stars are another key metric for open source projects. The chart below shows the accelerating growth for Apache Pulsar.",[48,71548,71549],{},[384,71550],{"alt":71551,"src":71552},"graph apache pulsar github stars","\u002Fimgs\u002Fblogs\u002F63b2f630a6c37363bc9ceedc_3.png",[40,71554,71556],{"id":71555},"ecosystem-development","Ecosystem Development",[48,71558,71559,71560,71564],{},"Since the ",[55,71561,71563],{"href":71562},"\u002Fen\u002Fblog\u002Fcommunity\u002F2020-08-24-pulsar-300-contributors","Pulsar project gained its 300th contributor"," last August, the Pulsar ecosystem has seen a number of updates. Below we look at a few key product launches.",[32,71566,71568],{"id":71567},"_1-function-mesh-simplifying-complex-streaming-jobs-in-the-cloud","1. Function Mesh - Simplifying Complex Streaming Jobs in the Cloud",[48,71570,71571,71574],{},[55,71572,29463],{"href":71573},"\u002Fen\u002Fblog\u002Frelease\u002F2021-05-03-function-mesh-open-source"," is an ideal tool for those who are seeking cloud-native serverless streaming solutions. It is a Kubernetes operator that enables users to run Pulsar Functions and connectors natively on Kubernetes, unlocking the full power of Kubernetes’ application deployment, scaling, and management. Function Mesh is also a serverless framework to orchestrate multiple Pulsar Functions and I\u002FO connectors for complex streaming jobs in a simple way.",[32,71576,71578],{"id":71577},"_2-aws-sqs-connector-seamless-integration-between-sqs-and-pulsar","2. AWS SQS Connector - Seamless Integration between SQS and Pulsar",[48,71580,3600,71581,71585],{},[55,71582,71584],{"href":71583},"\u002Fen\u002Fblog\u002Ftech\u002F2021-03-17-announcing-aws-sqs-connector-for-apache-pulsar","SQS connector"," enables users to integrate Pulsar with SQS easily, quickly, and securely. 
Organizations can move data in and out of Pulsar without writing a single line of code. This connector can run jobs on a single node or for an entire organization, which allows users to build reactive data pipelines to serve their business and operational needs in real-time.",[32,71587,71589],{"id":71588},"_3-cloud-storage-sink-connector-streaming-data-from-pulsar-to-cloud-objects","3. Cloud Storage Sink Connector - Streaming Data from Pulsar to Cloud Objects",[48,71591,71592,71593,71597],{},"Depending on your environment, the ",[55,71594,71596],{"href":71595},"\u002Fen\u002Fblog\u002Ftech\u002F2020-10-20-cloud-storage-sink-connector-251","Cloud Storage sink connector"," can export data by guaranteeing exactly-once delivery semantics to its consumers. It provides applications that export data from Pulsar all the benefits of Pulsar IO, such as fault tolerance, parallelism, elasticity, load balancing, on-demand updates, and much more.",[32,71599,71601],{"id":71600},"_4-mqtt-on-pulsar-mop-bringing-native-mqtt-protocol-support-to-pulsar","4. MQTT-on-Pulsar (MoP) - Bringing Native MQTT Protocol Support to Pulsar",[48,71603,71604,71605,71608],{},"By adding the ",[55,71606,71607],{"href":71149},"MoP protocol handler"," in existing Pulsar clusters, users can migrate their existing MQTT applications and services to Pulsar without modifying the code. This enables MQTT applications to leverage Pulsar’s infinite event stream retention with Apache BookKeeper and tiered storage.",[40,71610,66091],{"id":39646},[48,71612,3600,71613,71618],{},[55,71614,71617],{"href":71615,"rel":71616},"https:\u002F\u002Fwww.na2021.pulsar-summit.org\u002F",[264],"Pulsar Virtual Summit North America 2021"," is taking place on June 16-17th. Keynote speakers include Karthik Ramasamy from Splunk, Ankur Jain from Flipkart, Ankush Goyal from Narvar, Till Rorhrmann from Ververica, and Srikanth Natarajan from Micro Focus.",[48,71620,71621,71624],{},[55,71622,45203],{"href":70535,"rel":71623},[264]," to get the latest Pulsar project and ecosystem updates, use cases, and best practices!",[48,71626,71627,71628,71632,71633,71637],{},"In addition, the community hosts monthly meetups, webinars, and ",[55,71629,71631],{"href":71630},"\u002Fen\u002Facademy","training"," for Pulsar users of all experience levels. ",[55,71634,71636],{"href":34070,"rel":71635},[264],"Sign up for the monthly Pulsar newsletter"," to stay tuned.",{"title":18,"searchDepth":19,"depth":19,"links":71639},[71640,71645,71651],{"id":65963,"depth":19,"text":65964,"children":71641},[71642,71643,71644],{"id":71516,"depth":279,"text":71517},{"id":71529,"depth":279,"text":71530},{"id":71542,"depth":279,"text":71543},{"id":71555,"depth":19,"text":71556,"children":71646},[71647,71648,71649,71650],{"id":71567,"depth":279,"text":71568},{"id":71577,"depth":279,"text":71578},{"id":71588,"depth":279,"text":71589},{"id":71600,"depth":279,"text":71601},{"id":39646,"depth":19,"text":66091},"Pulsar is growing faster than ever. 
Read about key Pulsar milestones and ecosystem development across the project.","\u002Fimgs\u002Fblogs\u002F63c7fcad6c20797725fa78e6_63b2f6302985bf6100266ee6_top.png",{},{"title":71502,"description":71652},"blog\u002Fpulsar-hits-400th-contributor-passes-kafka-monthly-active-contributors",[302,799],"38gswbbcDrT2G0-k-D0yNahRcqdFu43AsIBkmK3oQ8Q",{"id":71660,"title":71661,"authors":71662,"body":71663,"category":7338,"createdAt":290,"date":71898,"description":71899,"extension":8,"featured":294,"image":71900,"isDraft":294,"link":290,"meta":71901,"navigation":7,"order":296,"path":71902,"readingTime":7986,"relatedResources":290,"seo":71903,"stem":71904,"tags":71905,"__hash__":71906},"blogs\u002Fblog\u002Fpulsar-user-survey-2021-highlights.md","Pulsar User Survey 2021 Highlights",[69353],{"type":15,"value":71664,"toc":71886},[71665,71674,71677,71680,71687,71701,71704,71708,71711,71719,71725,71728,71732,71740,71743,71746,71750,71758,71760,71768,71771,71774,71782,71788,71791,71794,71798,71801,71804,71807,71810,71821,71824,71828,71839,71842,71845,71848,71854,71858,71864,71867,71870,71873,71876,71878,71884],[48,71666,71667,71668,71673],{},"As noted in the ",[55,71669,71672],{"href":71670,"rel":71671},"https:\u002F\u002Fshare.hsforms.com\u002F1519w9knETd2kiCGqRocmwg3x5r4",[264],"2021 Apache Pulsar User Survey Report",", Apache Pulsar adoption and community engagement skyrocketed over the past year.",[48,71675,71676],{},"Key trends driving Pulsar adoption include the move to containers and cloud strategies, the need to solve for unprecedented scale and management complexity, the pivot from a pure streaming workload to unified batch and streaming workloads, and the need to unlock new use cases.",[48,71678,71679],{},"Pulsar’s cloud-native capabilities, unified messaging and streaming, scalability and reliability, and super-set of built-in features that enable new use cases and streamline operations make it uniquely positioned to meet many of today’s emerging needs.",[48,71681,71682,71683,190],{},"In this report, we look at the key takeaways from the ",[55,71684,71686],{"href":71670,"rel":71685},[264],"2021 Apache Pulsar User Report",[321,71688,71689,71692,71695,71698],{},[324,71690,71691],{},"Pulsar in Production and at Scale",[324,71693,71694],{},"Kafka Users Adopt Pulsar",[324,71696,71697],{},"Cloud Native Initiatives and K8s Drive Pulsar Adoption",[324,71699,71700],{},"Pulsar + Flink: Pulsar Continues to Innovate",[48,71702,71703],{},"Below we take a look at each of these highlights in more detail.",[40,71705,71707],{"id":71706},"_1-pulsar-in-production-and-at-scale","1. Pulsar in Production and at Scale",[48,71709,71710],{},"The two most important takeaways from the Pulsar User Survey 2021 are:",[321,71712,71713,71716],{},[324,71714,71715],{},"The growth in the number of companies using Pulsar in production.",[324,71717,71718],{},"The growth in the number of companies using Pulsar at enterprise scale.",[48,71720,71721],{},[384,71722],{"alt":71723,"src":71724},"chart of organisations running pulsar in production between 2020 and 2021 ","\u002Fimgs\u002Fblogs\u002F63b2f2b1cef698b05394a161_1.png",[48,71726,71727],{},"While the increase in Pulsar adoption is significant, the increase in production deployments has seen the most meaningful growth (see graph above). The 2021 Survey Report reveals that 51% of respondents were using Pulsar in production, compared to 31% the year prior. 
The increase in production use cases demonstrates Pulsar's ability to deliver mission-critical applications in the real world.",[32,71729,71731],{"id":71730},"pulsar-at-scale","Pulsar at Scale",[321,71733,71734,71737],{},[324,71735,71736],{},"Question:  How many messages does your organization process with Pulsar every day?",[324,71738,71739],{},"Response:  12% of the respondents process over one trillion messages per day.",[48,71741,71742],{},"Pulsar has also seen an increase in the number of large scale, enterprise deployments. 12% of respondents shared that their organization processes more than 1 trillion messages per day using Pulsar. Tencent, Splunk, Newland Digital Technology Co Ltd, Kingsoft Cloud, and Pactera are just a handful of the companies who are using Pulsar to process more than 1 trillion messages per day.",[48,71744,71745],{},"The increase in companies running Pulsar at a large scale illustrates its ability to meet the scalability, reliability, and flexibility needs of companies today. Notably, Pulsar is meeting the needs of companies seeking a unified messaging and streaming platform.",[40,71747,71749],{"id":71748},"_2-kafka-users-adopt-pulsar","2. Kafka Users Adopt Pulsar",[321,71751,71752,71755],{},[324,71753,71754],{},"Question:  What other message queues does your organization use in addition to Pulsar?",[324,71756,71757],{},"Response:  68% of respondents use Kafka in addition to Pulsar.",[48,71759,3931],{},[321,71761,71762,71765],{},[324,71763,71764],{},"Question:  If you use connectors, which connectors do you use or plan to use for Pulsar?",[324,71766,71767],{},"Response:  34% of respondents said Kafka on Pulsar (KoP)",[32,71769,71694],{"id":71770},"kafka-users-adopt-pulsar",[48,71772,71773],{},"A major insight from the user survey is the number of Kafka users who are adopting Pulsar. 68% of respondents said that they use Kafka in addition to Pulsar. Given Kafka is an older and more widely adopted technology, we can infer that these are companies who were already using Kafka and then decided to adopt Pulsar (versus Pulsar users who are adopting Kafka).",[48,71775,71776,71777,71781],{},"The figure below from ",[55,71778,71780],{"href":65987,"rel":71779},[264],"API7","(1), demonstrates the increase in Pulsar project engagement. Perhaps even more interesting, it shows that the Apache Pulsar community has surpassed Apache Kafka in monthly active contributors.",[48,71783,71784],{},[384,71785],{"alt":71786,"src":71787},"graph of monthly active contributors of pulsar and kafka since 2017","\u002Fimgs\u002Fblogs\u002F63b2f3b8e255992c939a3493_2.png",[48,71789,71790],{},"The 2021 survey also shows that more than one third of respondents use, or are planning to use, Kafka on Pulsar (KoP). KoP, which was launched in 2020, enables Kafka users to migrate their existing Kafka applications and services to Pulsar without modifying code.",[48,71792,71793],{},"KoP reduces barriers to Pulsar adoption for Kafka users and its popularity reveals that Kafka users are increasingly looking to Pulsar to solve problems and to enable use cases they are not able to achieve with Kafka.",[32,71795,71797],{"id":71796},"kafka-and-pulsar-serve-different-use-cases","Kafka and Pulsar Serve Different Use Cases",[48,71799,71800],{},"The high percentage of respondents (68%) using both Kafka and Pulsar may seem counterintuitive, as the technologies serve many of the same use cases. 
But, in fact, there are distinct differences in Pulsar and Kafka’s use cases and capabilities.",[48,71802,71803],{},"Kafka was built to support data pipelines and large scale data movement to centralized locations. Pulsar, by contrast, was created to serve both messaging and data streaming use cases that require handling more topics with complex topologies and sophisticated consumption models.",[48,71805,71806],{},"Pulsar’s built-in offering of multi-tenancy, geo-replication, and scalability enable new use cases and capabilities that Kafka cannot match. The top use cases are: (1) Message Queues, (2) Pub\u002FSub, (3) Data Pipelines, (4) Streaming Processing, (5) Microservices\u002FEvent Sourcing, (6) Data Integration, (7) Change Data Capture, and (8) Streaming ETL. This list demonstrates Pulsar’s ability to solve for a broader range of use cases.",[48,71808,71809],{},"Below we look at some Pulsar adoption stories from the past 12 months:",[321,71811,71812,71815,71818],{},[324,71813,71814],{},"A key Kafka-to-Pulsar adoption story comes from Splunk, a company that used Kafka in production environments for years. At the Pulsar Summit 2020, Karthik Ramasamy shared details on Splunk's decision to adopt Pulsar for the Splunk DSP, an analytics product which handles billions of events per day. You can find the full details in this video on \"Why Splunk Chose Pulsar\".",[324,71816,71817],{},"Tencent adopted Pulsar to solve issues with scale and reliability. Pulsar was first adopted to power their billing platform, Midas, and then, Pulsar adoption spread to Tencent’s Federated Learning Platform and to Tencent’s Gaming Department, where it was used to replace Kafka for its logging pipeline. You can learn more about Tencent’s adoption of Pulsar here.",[324,71819,71820],{},"Iterable is another example of Pulsar adoption spreading. Iterable first adopted Pulsar to replace one messaging system, RabbitMQ, and they are now in the process of using Pulsar to replace Kafka and Amazon SQS. You can read the full story here.",[48,71822,71823],{},"The survey report shows that once it is adopted, Pulsar adoption expands across organizations. Tencent and Iterable are just two examples of Pulsar adoption expanding across an organization. When asked, “Will your organization build more applications on Pulsar in 2021”? 66% said “Yes” and another 10% said “Under Consideration.” That means 76% of Pulsar adopters are considering or planning to expand their Pulsar adoptions.",[40,71825,71827],{"id":71826},"_3-cloud-native-initiatives-and-k8s-drive-adoption-of-pulsar","3. Cloud Native Initiatives and K8s Drive Adoption of Pulsar",[321,71829,71830,71833,71836],{},[324,71831,71832],{},"80% of Pulsar users deploy in a cloud environment",[324,71834,71835],{},"62% of Pulsar users deploy on Kubernetes",[324,71837,71838],{},"49% noted Pulsar’s “cloud native” capabilities as one of the top reasons they chose to adopt Pulsar",[48,71840,71841],{},"The adoption of Pulsar is being driven by a larger industry move to the cloud and Kubernetes. As part of this move, organizations are looking for technologies that run in the cloud, scale well, and can leverage and run well on top of Kubernetes.",[48,71843,71844],{},"Technologies with single tenant systems, monolithic architectures, and that lack geo-replication and multi-cloud capabilities are not able to meet the needs of modern data applications. 
As a result, companies are increasingly looking to adopt cloud-native technologies, like Pulsar, to meet their business needs.",[48,71846,71847],{},"The move to Kubernetes is not a simple lift and shift. This transition requires new development models, new ways of working, and is causing companies to re-evaluate how existing technologies will be deployed and managed in the cloud. For example, technologies such as Kafka, that were designed before Cloud was commonplace can be difficult to map to the capabilities of cloud and Kubernetes. These factors are leading companies to best-of-breed cloud-native technologies, including Pulsar.",[48,71849,71850],{},[384,71851],{"alt":71852,"src":71853},"illustration of VM early cloud era and containers modern cloud era","\u002Fimgs\u002Fblogs\u002F63b2f3b872c4f0a0a2382730_3.png",[40,71855,71857],{"id":71856},"_4-pulsar-flink-pulsar-continues-to-innovate","4. Pulsar + Flink: Pulsar Continues to Innovate",[48,71859,71860],{},[384,71861],{"alt":71862,"src":71863},"chart organisation using pulsar + flink in 2020 and 2021","\u002Fimgs\u002Fblogs\u002F63b2f3b873ef688350d4646d_4.png",[48,71865,71866],{},"Companies today are looking for a complete streaming solution and Pulsar’s integration with Flink is significant because it creates another differentiator for the Pulsar community. From the 2020 Survey to the 2021 Survey, the number of Pulsar + Flink use cases almost doubled. As noted above, the adoption of Pulsar is often driven by companies seeking the ability to achieve new use cases and the Pulsar + Flink integration is an example of this.",[48,71868,71869],{},"Stream processors, such as Kafka Streams, are adept at relatively simple processing of streaming data and computing answers close to real-time, but they are not a good fit for processing large historical datasets or datasets that require many joins and complex analysis. Many organizations need to run both batch and streaming data processors in order to gain the insights they need for their business, but maintaining multiple systems is expensive and complex.",[48,71871,71872],{},"More recently, systems have been developed which can do both batch and stream processing. Apache Flink is one example. Currently, Flink is used for stream processing with both Kafka and Pulsar. However, Flink's batch capabilities are not particularly compatible with Kafka as Kafka is only able to deliver data in streams, making it too slow for most batch workloads.",[48,71874,71875],{},"Pulsar's tiered storage model provides the batch storage capabilities needed to support batch processing in Flink. 
With Flink + Pulsar, companies are able to query both historical and real-time data quickly and easily, unlocking a unique competitive advantage.",[40,71877,52473],{"id":52472},[48,71879,71880,71881],{},"(1) “Monthly Active Contributors.” API7, 10 Jun, 2021, ",[55,71882,65987],{"href":65987,"rel":71883},[264],[48,71885,3931],{},{"title":18,"searchDepth":19,"depth":19,"links":71887},[71888,71891,71895,71896,71897],{"id":71706,"depth":19,"text":71707,"children":71889},[71890],{"id":71730,"depth":279,"text":71731},{"id":71748,"depth":19,"text":71749,"children":71892},[71893,71894],{"id":71770,"depth":279,"text":71694},{"id":71796,"depth":279,"text":71797},{"id":71826,"depth":19,"text":71827},{"id":71856,"depth":19,"text":71857},{"id":52472,"depth":19,"text":52473},"2021-06-11","In this report, we look at the key takeaways from the 2021 Apache Pulsar User Report and the major trends driving change across the messaging and streaming landscape.","\u002Fimgs\u002Fblogs\u002F63c7fcc9f98d44856d02eb16_63b2f2b167ae66abcb8b12a2_top.png",{},"\u002Fblog\u002Fpulsar-user-survey-2021-highlights",{"title":71661,"description":71899},"blog\u002Fpulsar-user-survey-2021-highlights",[35559,821],"SLGfDHsZRqFvP2PP4xnYCZbEeG8eTYW35jTxad0E1l0",{"id":71908,"title":71909,"authors":71910,"body":71911,"category":7338,"createdAt":290,"date":71995,"description":71996,"extension":8,"featured":294,"image":71997,"isDraft":294,"link":290,"meta":71998,"navigation":7,"order":296,"path":71999,"readingTime":11180,"relatedResources":290,"seo":72000,"stem":72001,"tags":72002,"__hash__":72003},"blogs\u002Fblog\u002F2021-apache-pulsar-user-report-announcement.md","2021 Apache Pulsar User Report Announcement",[806,69353],{"type":15,"value":71912,"toc":71990},[71913,71916,71919,71921,71924,71931,71937,71940,71943,71949,71955,71957,71960,71983,71985],[48,71914,71915],{},"We’re excited to announce the 2021 Apache Pulsar User Report is now available. The Apache Pulsar PMC launched their annual survey in 2020. This year, we are excited to see the growth in adoption and engagement across the Apache Pulsar project and to see the top trends, use cases, and insights shared directly from the community.",[48,71917,71918],{},"You can read the Executive Summary below and download the full report.",[40,71920,45531],{"id":45530},[48,71922,71923],{},"Apache Pulsar adoption and community engagement skyrocketed over the past year. The community has seen an influx of new adoption as companies look to Pulsar’s cloud-native capabilities, unified messaging and streaming, scalability and reliability, and a super-set of built-in features to enable new use cases and streamline operations.",[48,71925,71926,71927,71930],{},"The biggest trends driving Pulsar adoption are the move to containers and cloud strategies, the need to solve for unprecedented scale and management complexity, the pivot from a pure streaming workload to unified batch and streaming workloads, and the need to unlock new use cases that older messaging and streaming systems are not able to support. 
In fact, the figure below from ",[55,71928,71780],{"href":65987,"rel":71929},[264],"(1), demonstrates the Apache Pulsar community surpassed Apache Kafka in monthly active contributors, and the Pulsar engagement continues to grow.",[48,71932,71933],{},[384,71934],{"alt":71935,"src":71936},"graph represent apache pulsar and apache kafka active contributors since 2017","\u002Fimgs\u002Fblogs\u002F63b2f03cc7592046e47fcb5c_1.png",[48,71938,71939],{},"To better understand the growth in adoption and to learn how organizations are leveraging the project, the Apache Pulsar Project Management Committee (PMC) sent a survey to Pulsar users between November 2020 and January 2021. More than 260 Pulsar users responded.",[48,71941,71942],{},"90% of survey respondents hold technical roles as architects, data scientists, developers, engineers, and DevOps engineers. They represent more than 20 industries; including computer software\u002Fhardware, Internet, finance, e-commerce, business services, education, to name a few, and span North America, Europe, and Asia.",[48,71944,71945,71946,190],{},"This report details insights and use cases on how organizations are deploying Pulsar today. Because this year’s survey is the second annual Pulsar user survey, we have the added opportunity to compare the results with last year’s report and highlight key trends. Download the full report ",[55,71947,267],{"href":71670,"rel":71948},[264],[48,71950,71951,71952,190],{},"In addition to the Apache Pulsar User Survey Report 2021, you can also check out the Pulsar User Survey 2021 Highlights. In this blog we provide a deep dive into the top trends for Pulsar and across the messaging and streaming ecosystem. You can find this blog ",[55,71953,267],{"href":71954},"\u002Fen\u002Fblog\u002Fcommunity\u002F2021-06-11-pulsar-user-survey-2021-highlights",[40,71956,69725],{"id":69724},[48,71958,71959],{},"To stay up-to-date on Pulsar developments and events:",[321,71961,71962,71969,71976],{},[324,71963,71964,71965],{},"Subscribe to the ",[55,71966,71968],{"href":34070,"rel":71967},[264],"Monthly Pulsar Newsletter",[324,71970,71971,71972],{},"Join ",[55,71973,71975],{"href":31692,"rel":71974},[264],"Apache Pulsar on Slack",[324,71977,71978,71979],{},"Check out past ",[55,71980,71982],{"href":35357,"rel":71981},[264],"Pulsar Summit videos",[40,71984,52473],{"id":52472},[48,71986,71880,71987],{},[55,71988,65987],{"href":65987,"rel":71989},[264],{"title":18,"searchDepth":19,"depth":19,"links":71991},[71992,71993,71994],{"id":45530,"depth":19,"text":45531},{"id":69724,"depth":19,"text":69725},{"id":52472,"depth":19,"text":52473},"2021-06-10","We’re excited to announce the 2021 Apache Pulsar User Report is now available. 
Read about the growth in adoption and engagement across the Apache Pulsar project!","\u002Fimgs\u002Fblogs\u002F63c7fcd86c207947b4fa79fc_63b2f03ccb4be280f8ad5496_top.png",{},"\u002Fblog\u002F2021-apache-pulsar-user-report-announcement",{"title":71909,"description":71996},"blog\u002F2021-apache-pulsar-user-report-announcement",[821],"-K-h42mCoYTKCMPQXBUNhqeegqhnICPAS-eDaxA-t3s",{"id":72005,"title":72006,"authors":72007,"body":72008,"category":7338,"createdAt":290,"date":72120,"description":72121,"extension":8,"featured":294,"image":72122,"isDraft":294,"link":290,"meta":72123,"navigation":7,"order":296,"path":72124,"readingTime":11508,"relatedResources":290,"seo":72125,"stem":72126,"tags":72127,"__hash__":72128},"blogs\u002Fblog\u002Fkeynotes-pulsar-virtual-summit-north-america-2021.md","Keynotes Announced for Pulsar Virtual Summit North America 2021",[69353],{"type":15,"value":72009,"toc":72110},[72010,72016,72019,72025,72031,72033,72036,72040,72043,72046,72049,72052,72056,72059,72062,72066,72069,72072,72076,72079,72082,72085,72089,72092,72095,72097,72104],[48,72011,72012,72015],{},[55,72013,71617],{"href":71615,"rel":72014},[264]," is just around the corner! The event, hosted by StreamNative and Splunk, will be held online June 16-17th.",[48,72017,72018],{},"The conference will feature Apache Pulsar adopters, committers, and contributors from around the globe. Talks will include tech deep dives, adoption stories, best practices, and insights into Pulsar’s global adoption and thriving community. Here are a handful of the companies presenting at the summit:",[48,72020,72021],{},[384,72022],{"alt":72023,"src":72024},"multiple logoof companies like tencent, narvr or flipkart","\u002Fimgs\u002Fblogs\u002F63b2ef250ad4c106f5b63374_logo.png",[48,72026,72027,71624],{},[55,72028,72030],{"href":70535,"rel":72029},[264],"Save your seat NOW",[40,72032,40525],{"id":40524},[48,72034,72035],{},"This year the Pulsar Virtual Summit North America 2021 will feature 5 keynotes and 33 breakout sessions. Below is a sneak peak into the insights and use cases from our keynotes.",[32,72037,72039],{"id":72038},"_1-apache-pulsar-why-unified-messaging-and-streaming-is-the-future","1. Apache Pulsar: Why Unified Messaging and Streaming Is the Future",[48,72041,72042],{},"Presented by Sijie Guo, Pulsar PMC Member and CEO @ StreamNative, and Matteo Merli, Pulsar PMC Chair and CTO @ StreamNative",[48,72044,72045],{},"Data insights and data-driven strategies create the competitive differentiators companies thrive off today. The need for unified messaging and streaming has never been more apparent.",[48,72047,72048],{},"Pulsar started with the goal of building a global, geo-replicated infrastructure to serve Yahoo!’s messaging needs. With the increased need to process both business events (such as payment request, billing request) and operational events (such as log data, click events, etc), the team at Yahoo! set out to build a true unified infrastructure platform to handle all in-motion data. That technology became Apache Pulsar.",[48,72050,72051],{},"In this talk, Matteo Merli and Sijie Guo will dive into the landscape of unified messaging and streaming, how Pulsar helps companies achieve this vision, and what the future of Pulsar will look like.",[32,72053,72055],{"id":72054},"_2-advanced-stream-processing-with-flink-and-pulsar","2. 
Advanced Stream Processing with Flink and Pulsar",[48,72057,72058],{},"Presented by Till Rohrmann, Flink PMC Member and Engineering Lead @ Ververica, and Addison Higham, Pulsar Committer and Chief Architect @ StreamNative",[48,72060,72061],{},"In this talk, Till Rohrmann and Addison Higham discuss how Flink allows for ambitious stream processing workflows and how Pulsar and Flink enable new capabilities that push forward the state-of-the-art in streaming. They will also share upcoming features and new capabilities in the integrations between Flink and Pulsar and how these two communities are working together to truly advance the power of stream processing.",[32,72063,72065],{"id":72064},"_3-scaling-apache-pulsar-to-10-petabytesday","3. Scaling Apache Pulsar to 10 Petabytes\u002FDay",[48,72067,72068],{},"Presented by Karthik Ramasamy, Senior Director of Engineering @ Splunk",[48,72070,72071],{},"Pulsar is used by a portfolio of products at Splunk for stream processing of different types of data, including metrics and logs. In this talk, Karthik Ramasamy will share how Splunk helped a flagship customer scale a Pulsar deployment to handle 10 PB\u002Fday in a single cluster. He will talk about the journey, the challenges faced, and the trade-offs made to scale Pulsar and operate it reliably and stably in Google Cloud Platform (GCP).",[32,72073,72075],{"id":72074},"_4-why-micro-focus-chose-pulsar-for-data-ingestion","4. Why Micro Focus Chose Pulsar for Data Ingestion",[48,72077,72078],{},"Presented by Srikanth Natarajan, Micro Focus Fellow and CTO @ the ITOM Product Group",[48,72080,72081],{},"Modern IT and application environments are increasingly complex, transitioning to cloud, and large in scale. The managed resources, services and applications in these environments generate tremendous data that needs to be observed, consumed and analyzed in real time (or later) by management tools to create insights and to drive operational actions and decisions.",[48,72083,72084],{},"In this talk, Srikanth Natarajan will share Micro Focus’ adoption story of Pulsar, including being a part of the Apache Pulsar community, working with the StreamNative team, and the lessons learned in their Pulsar journey.",[32,72086,72088],{"id":72087},"_5-how-narvar-uses-pulsar-to-power-the-post-purchase-experience","5. How Narvar Uses Pulsar to Power the Post-Purchase Experience",[48,72090,72091],{},"Presented by Ankush Goyal, VP and Head of Engineering @ Narvar",[48,72093,72094],{},"Narvar provides a customer experience platform for some of the largest retailers on the planet - from Levi’s, Patagonia, Home Depot, to Sonos - and its technology is used by millions of users every day. Narvar’s platform is built with pub-sub messaging at its core, making reliability, scalability, maintainability, and flexibility business critical. In this talk, Ankush Goyal will discuss why Narvar adopted Pulsar and how Narvar is leveraging Pulsar today.",[40,72096,68614],{"id":16948},[48,72098,72099,72100,72103],{},"Don’t miss the best opportunity to learn from top Pulsar thought leaders. ",[55,72101,57745],{"href":70535,"rel":72102},[264]," to participate and connect with the Pulsar community at the summit. 
We look forward to seeing you next Wednesday!",[48,72105,72106,72107,190],{},"You can find the full schedule ",[55,72108,267],{"href":69952,"rel":72109},[264],{"title":18,"searchDepth":19,"depth":19,"links":72111},[72112,72119],{"id":40524,"depth":19,"text":40525,"children":72113},[72114,72115,72116,72117,72118],{"id":72038,"depth":279,"text":72039},{"id":72054,"depth":279,"text":72055},{"id":72064,"depth":279,"text":72065},{"id":72074,"depth":279,"text":72075},{"id":72087,"depth":279,"text":72088},{"id":16948,"depth":19,"text":68614},"2021-06-09","Pulsar Summit is just around the corner. Here is a sneak peak into the insights and use cases from our keynotes.","\u002Fimgs\u002Fblogs\u002F63c7fce8e64a0f525740eb9e_63b2ef25f60c05573c95c977_top.png",{},"\u002Fblog\u002Fkeynotes-pulsar-virtual-summit-north-america-2021",{"title":72006,"description":72121},"blog\u002Fkeynotes-pulsar-virtual-summit-north-america-2021",[5376,821],"KIMkxBo7Mif0VeBDMsmoEDxZnCfvWRH5qTfxfQ9vdu0",{"id":72130,"title":72131,"authors":72132,"body":72133,"category":3550,"createdAt":290,"date":72200,"description":72201,"extension":8,"featured":294,"image":72202,"isDraft":294,"link":290,"meta":72203,"navigation":7,"order":296,"path":72204,"readingTime":7986,"relatedResources":290,"seo":72205,"stem":72206,"tags":72207,"__hash__":72208},"blogs\u002Fblog\u002Fmatteo-merli-apache-pulsar-pmc-chair-joins-streamnative-cto.md","Matteo Merli, Apache Pulsar PMC Chair, Joins StreamNative as CTO",[69353],{"type":15,"value":72134,"toc":72195},[72135,72138,72141,72145,72148,72151,72154,72157,72161,72164,72167,72170,72172,72193],[48,72136,72137],{},"StreamNative welcomes Matteo Merli to the leadership team as the Chief Technology Officer. Powered by Apache Pulsar, StreamNative provides a cloud-native, real-time messaging and streaming platform to support multi-cloud and hybrid cloud strategies. Matteo joins at a time when StreamNative is growing rapidly and working to transform customers’ streaming and messaging capabilities with StreamNative Cloud, StreamNative Professional Services, and, soon, StreamNative Platform.",[48,72139,72140],{},"Matteo Merli was one of the original creators of Apache Pulsar at Yahoo! and has been working on the Apache Pulsar project in the decade since. Matteo’s deep industry experience in messaging, streaming, and cloud-native technologies will help accelerate StreamNative’s vision of building a next-generation messaging and streaming platform to bring real-time value to the enterprise.",[40,72142,72144],{"id":72143},"why-pulsar","Why Pulsar?",[48,72146,72147],{},"“Today, companies need unified messaging and streaming capabilities in order to run their business. The line between messaging and streaming use cases is blurring and Pulsar is uniquely positioned to serve both use cases,” Matteo shares.",[48,72149,72150],{},"Matteo is excited about the future of Pulsar. “It is reaching more and more users and becoming easier to adopt and deploy. Pulsar continues to innovate and evolve as a technology, as a project and as a community. New developments, such as updates to tiered storage, functions, connectors, and transactions are enabling new use cases and helping to streamline Pulsar adoption and operations.”",[48,72152,72153],{},"Today, Pulsar has global adoption and a thriving community. A recent survey from the Pulsar PMC revealed a jump in the number of Pulsar deployments from 2019 to 2020.",[48,72155,72156],{},"Additionally, there was a sharp increase in the number of companies using Pulsar at enterprise scale. 
Tencent, Splunk, Newland Digital Technology Co Ltd, Kingsoft Cloud, and Pactera are just some of the companies using Pulsar to process more than 1 trillion messages a day. These use cases demonstrate Pulsar's ability to deliver mission-critical applications in the real world.",[40,72158,72160],{"id":72159},"why-streamnative","Why StreamNative?",[48,72162,72163],{},"Matteo notes the teams’ Pulsar experience as one of the top reasons he joined. “The StreamNative team has experience operating some of the largest and most diverse Pulsar use cases in the world. In addition to their experience running StreamNative Cloud, the team includes a number of Pulsar committers and contributors, they are active in the Pulsar community, and are helping companies around the globe to adopt Pulsar.”",[48,72165,72166],{},"StreamNative is dedicated to supporting the Pulsar community. Today, the company hosts 5-10 Pulsar events monthly and is the host of the three global summits taking place this year, including Pulsar Summit North America (June 16-17th), Pulsar Summit Europe (September 23rd), and Pulsar Summit Asia (November 20-21st). Global support for the project has helped to bolster the community and last month the Pulsar community hit the 400 contributor mark.",[48,72168,72169],{},"The StreamNative offering, including StreamNative Cloud, StreamNative Professional Services, and, soon to be released, StreamNative Platform, provides turnkey solutions that companies can leverage to successfully deploy Pulsar in production in any environment or any cloud. Matteo added, “StreamNative is building the most compelling data streaming and messaging platform that will set the foundation to solve the data problems of tomorrow. We have a shared vision.”",[40,72171,69725],{"id":69724},[321,72173,72174,72182,72187],{},[324,72175,72176,72177,190],{},"You can connect to Matteo Merli on Twitter ",[55,72178,72181],{"href":72179,"rel":72180},"https:\u002F\u002Ftwitter.com\u002Fmerlimat",[264],"@merlimat",[324,72183,72184,72185,190],{},"To learn more about StreamNative Cloud, a fully managed SaaS offering of StreamNative Platform, or StreamNative Platform, a self-managed software offering of Apache Pulsar, you can contact us ",[55,72186,267],{"href":57778},[324,72188,72189,72192],{},[55,72190,10265],{"href":34070,"rel":72191},[264]," to receive the Pulsar Newsletter and stay up to date on all things Pulsar.",[48,72194,3931],{},{"title":18,"searchDepth":19,"depth":19,"links":72196},[72197,72198,72199],{"id":72143,"depth":19,"text":72144},{"id":72159,"depth":19,"text":72160},{"id":69724,"depth":19,"text":69725},"2021-06-08","StreamNative welcomes Matteo Merli to the leadership team as the Chief Technology Officer. 
Matteo’s experience will help accelerate StreamNative’s vision of building a next-generation messaging and streaming platform to bring real-time value to the enterprise.","\u002Fimgs\u002Fblogs\u002F63c7fd010cb4c4abf6b7ec0e_63b2edc1844c6fcc7700b38c_top.png",{},"\u002Fblog\u002Fmatteo-merli-apache-pulsar-pmc-chair-joins-streamnative-cto",{"title":72131,"description":72201},"blog\u002Fmatteo-merli-apache-pulsar-pmc-chair-joins-streamnative-cto",[3550,821,303],"37-IbYmdYyyRkvzVhX3o_XHt-1pMfBxYS1tCIyENGVY",{"id":72210,"title":72211,"authors":72212,"body":72213,"category":821,"createdAt":290,"date":72200,"description":72490,"extension":8,"featured":294,"image":72491,"isDraft":294,"link":290,"meta":72492,"navigation":7,"order":296,"path":72493,"readingTime":7986,"relatedResources":290,"seo":72494,"stem":72495,"tags":72496,"__hash__":72497},"blogs\u002Fblog\u002Fnew-apache-pulsar-2-6-4.md","What’s New in Apache Pulsar 2.6.4",[48575,61300],{"type":15,"value":72214,"toc":72478},[72215,72218,72221,72224,72250,72257,72261,72265,72274,72276,72279,72281,72289,72293,72295,72304,72306,72309,72311,72314,72320,72322,72325,72327,72330,72339,72341,72344,72346,72349,72358,72360,72363,72365,72368,72374,72376,72379,72381,72384,72393,72395,72398,72400,72403,72412,72414,72417,72419,72422,72424,72430,72432,72435,72437,72440,72444,72453,72455,72458,72460,72463,72465,72476],[40,72216,72211],{"id":72217},"whats-new-in-apache-pulsar-264",[48,72219,72220],{},"We are excited to see the Apache Pulsar community has successfully released the 2.6.4 version! 10 contributors provided improvements and bug fixes that contributed to 16 PRs.",[48,72222,72223],{},"Highlights:",[321,72225,72226,72234,72242],{},[324,72227,72228,72229,67128],{},"Broker no longer delivers old messages after a topic is closed (",[55,72230,72233],{"href":72231,"rel":72232},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F8634",[264],"#8634",[324,72235,72236,72237,67128],{},"AWS credentials are refreshed after expiry (",[55,72238,72241],{"href":72239,"rel":72240},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F9387",[264],"#9387",[324,72243,72244,72245,67128],{},"Pulsar identifies when individual message deletes cause an unsynced cursor (",[55,72246,72249],{"href":72247,"rel":72248},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F9732",[264],"#9732",[48,72251,69014,72252,190],{},[55,72253,72256],{"href":72254,"rel":72255},"http:\u002F\u002Fpulsar.apache.org\u002Frelease-notes\u002F#264-mdash-2021-06-02-a-id264a",[264],"Pulsar 2.6.4 Release Notes",[40,72258,72260],{"id":72259},"notable-enhancement","Notable enhancement",[32,72262,72264],{"id":72263},"c-client","C++ client",[3933,72266,72268,72269,67128],{"id":72267},"c-client-supports-multiple-topic-subscriptions-across-multiple-namespaces-9520","C++ client supports multiple topic subscriptions across multiple namespaces (",[55,72270,72273],{"href":72271,"rel":72272},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F9520",[264],"#9520",[225,72275,57576],{"id":44661},[48,72277,72278],{},"Previously, you could not subscribe to different topics on different namespaces.",[225,72280,57583],{"id":57582},[321,72282,72283,72286],{},[324,72284,72285],{},"Move the check for namespace in MultiTopicsConsumerImpl to PatternMultiTopicsConsumerImpl that uses a regex subscription.",[324,72287,72288],{},"Fix the existing tests for subscriptions on topics across different namespaces.",[40,72290,72292],{"id":72291},"notable-bug-fix","Notable 
bug fix",[32,72294,61065],{"id":61064},[3933,72296,72298,72299,67128],{"id":72297},"pulsar-guarantees-security-for-clients-using-jwt-9172","Pulsar guarantees security for clients using JWT (",[55,72300,72303],{"href":72301,"rel":72302},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F9172",[264],"#9172",[225,72305,57576],{"id":57598},[48,72307,72308],{},"Previously, it was possible for attackers to connect to Pulsar instances because the signature of the JWT was not validated when the token was set to none.",[225,72310,57583],{"id":57612},[48,72312,72313],{},"Modified JWT to use parseClaimsJws instead of parse to get the token objects. Now, parseClaimsJws guarantees the correct security model for parsing signed JWTs.",[3933,72315,72244,72317,67128],{"id":72316},"pulsar-identifies-when-individual-message-deletes-cause-an-unsynced-cursor-9732",[55,72318,72249],{"href":72247,"rel":72319},[264],[225,72321,57576],{"id":57632},[48,72323,72324],{},"Previously, cursors were not being flushed when acknowledgements caused a dirty cursor. Instead of deleting the acknowledged messages, messages were redelivered.",[225,72326,57583],{"id":57638},[48,72328,72329],{},"Fixed code to mark the individual acknowledgements and automatically trigger the flush of dirty cursors.",[3933,72331,72333,72334,67128],{"id":72332},"pulsar-can-expire-a-range-of-messages-9083","Pulsar can expire a range of messages (",[55,72335,72338],{"href":72336,"rel":72337},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F9083",[264],"#9083",[225,72340,57576],{"id":57653},[48,72342,72343],{},"Previously, only a single message expired after an expiry check. As a result, many expired messages remained in a subscription and were delivered to consumers after the expiry time.",[225,72345,57583],{"id":57659},[48,72347,72348],{},"Modified OpFindNewest to jump to a valid position, which allows PersistentMessageExpiryMonitor to find the best range of messages to expire.",[3933,72350,72352,72353,67128],{"id":72351},"pulsar-allows-manual-forced-topic-deletion-after-removing-non-durable-subscriptions-7356","Pulsar allows manual (forced) topic deletion after removing non-durable subscriptions (",[55,72354,72357],{"href":72355,"rel":72356},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F7356",[264],"#7356",[225,72359,57576],{"id":57674},[48,72361,72362],{},"Previously, during the removal of non-durable subscriptions, there was a race condition that left a topic in a state where you could not delete it until it was unloaded or reloaded.",[225,72364,57583],{"id":57725},[48,72366,72367],{},"Fixed the race condition by setting the topic fence before performing any delete operations and reverting the topic state after the delete operations.",[3933,72369,72228,72371,67128],{"id":72370},"broker-no-longer-delivers-old-messages-after-a-topic-is-closed-8634",[55,72372,72233],{"href":72231,"rel":72373},[264],[225,72375,57576],{"id":57684},[48,72377,72378],{},"Previously, it was possible to re-deliver very old messages if a topic was not gracefully closed. 
The cursor rolled back to the last persisted position and triggered the re-delivery of those messages.",[225,72380,57583],{"id":61450},[48,72382,72383],{},"Fixed the redelivery of messages by setting a time-bound period after which all cursor updates are flushed on the disk.",[3933,72385,72387,72388,67128],{"id":72386},"batch-index-acknowledgement-data-is-no-longer-persisted-9504","Batch index acknowledgement data is no longer persisted (",[55,72389,72392],{"href":72390,"rel":72391},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F9504",[264],"#9504",[225,72394,57576],{"id":61462},[48,72396,72397],{},"Previously, the batch index acknowledgement data persisted because batchDeletedIndexInfoBuilder generated the batch index acknowledgement data but did not clear the current set before adding the delete set.",[225,72399,57583],{"id":61468},[48,72401,72402],{},"Fixed by clearing the delete set before adding a new delete set.",[3933,72404,72406,72407,67128],{"id":72405},"closed-ledger-deletes-after-expiration-9136","Closed ledger deletes after expiration (",[55,72408,72411],{"href":72409,"rel":72410},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F9136",[264],"#9136",[225,72413,57576],{"id":61483},[48,72415,72416],{},"Previously, a closed ledger (with no incoming traffic) could fail to delete after expiring because the read position of the cursor still points to the last entry of the closed ledger.",[225,72418,57583],{"id":61489},[48,72420,72421],{},"Updated behavior when closing the current ledger. Now, when the cursor's mark-delete position points to the last entry of the current ledger, the read position is moved to the newly created ledger.",[32,72423,36160],{"id":31572},[3933,72425,72236,72427,67128],{"id":72426},"aws-credentials-are-refreshed-after-expiry-9387",[55,72428,72241],{"href":72239,"rel":72429},[264],[225,72431,57576],{"id":61504},[48,72433,72434],{},"Previously, expired AWS credentials were reused. With the refactor of Azure support, a regression occurred where the AWS credentials are fetched once and then used through the entire process.",[225,72436,57583],{"id":61510},[48,72438,72439],{},"The AWS credential provider chain takes care of the credential refresh. When integrating with JClouds, you still need to return a new set of credentials each time.",[32,72441,72443],{"id":72442},"java-client","Java client",[3933,72445,72447,72448,67128],{"id":72446},"compression-applied-during-schema-preparation-9396","Compression applied during schema preparation (",[55,72449,72452],{"href":72450,"rel":72451},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F9396",[264],"#9396",[225,72454,57576],{"id":61525},[48,72456,72457],{},"Previously, compression was not applied during deferred schema preparation and the consumer could receive an uncompressed message and then fail.",[225,72459,57583],{"id":61531},[48,72461,72462],{},"Fixed by enforcing compression during the schema preparation.",[40,72464,39647],{"id":39646},[48,72466,57767,72467,69317,72470,72473,72474,69324],{},[55,72468,57771],{"href":53730,"rel":72469},[264],[55,72471,3550],{"href":61568,"rel":72472},[264]," in which Pulsar 2.6.4 changes are shipped! Moreover, we offer technical consulting and expert training to help get your organization started. As always, we are highly responsive to your feedback. 
Feel free to ",[55,72475,24379],{"href":57778},[48,72477,3931],{},{"title":18,"searchDepth":19,"depth":19,"links":72479},[72480,72481,72484,72489],{"id":72217,"depth":19,"text":72211},{"id":72259,"depth":19,"text":72260,"children":72482},[72483],{"id":72263,"depth":279,"text":72264},{"id":72291,"depth":19,"text":72292,"children":72485},[72486,72487,72488],{"id":61064,"depth":279,"text":61065},{"id":31572,"depth":279,"text":36160},{"id":72442,"depth":279,"text":72443},{"id":39646,"depth":19,"text":39647},"We are excited to see the Apache Pulsar community has successfully released the 2.6.4 version! 10 contributors provided improvements and bug fixes that contributed to 16 PRs. Let's walk through the most noteworthy changes!","\u002Fimgs\u002Fblogs\u002F63c7fcf60f6d0c84f78b7fa0_63b2ee365cd67e2f34ac28e1_264-top.jpeg",{},"\u002Fblog\u002Fnew-apache-pulsar-2-6-4",{"title":72211,"description":72490},"blog\u002Fnew-apache-pulsar-2-6-4",[302,821],"cZWvtubdpJI5v1Z23J_j0EmQTP3Mpo7hyXFf0hNg7BM",{"id":72499,"title":58886,"authors":72500,"body":72501,"category":821,"createdAt":290,"date":73983,"description":73984,"extension":8,"featured":294,"image":73985,"isDraft":294,"link":290,"meta":73986,"navigation":7,"order":296,"path":73987,"readingTime":73988,"relatedResources":290,"seo":73989,"stem":73990,"tags":73991,"__hash__":73992},"blogs\u002Fblog\u002Fpulsar-isolation-part-ii-separate-pulsar-clusters.md",[58855,61300],{"type":15,"value":72502,"toc":73970},[72503,72506,72513,72524,72527,72530,72544,72548,72551,72554,72559,72562,72573,72579,72583,72596,72599,72605,72618,72621,72627,72631,72642,72644,72650,72660,72662,72668,72689,72695,72703,72705,72711,72714,72716,72722,72725,72731,72746,72752,72759,72761,72767,72771,72780,72785,72791,72796,72802,72808,72810,72816,72823,72825,72831,72842,72848,72853,72859,72861,72866,72876,72879,72885,72890,72896,72901,72903,72909,72912,72915,72920,72932,72934,72940,72943,72945,72951,72953,72959,72964,72969,72971,72977,72982,72992,72994,73000,73002,73004,73010,73012,73015,73021,73026,73028,73034,73036,73039,73045,73050,73052,73058,73060,73063,73069,73073,73076,73081,73083,73089,73091,73093,73099,73101,73107,73112,73114,73120,73122,73125,73131,73136,73142,73147,73149,73155,73158,73160,73166,73168,73171,73177,73180,73184,73187,73189,73193,73196,73201,73203,73209,73211,73214,73220,73225,73227,73233,73235,73237,73243,73245,73248,73254,73256,73262,73264,73267,73273,73278,73281,73284,73290,73295,73301,73307,73314,73320,73323,73325,73331,73333,73339,73344,73346,73352,73354,73357,73359,73364,73366,73369,73375,73378,73380,73385,73388,73390,73396,73400,73405,73408,73413,73415,73421,73423,73425,73431,73433,73436,73442,73447,73449,73455,73457,73460,73466,73469,73471,73477,73479,73485,73489,73493,73496,73501,73503,73509,73511,73513,73519,73524,73527,73533,73538,73540,73546,73551,73556,73558,73564,73569,73571,73577,73579,73582,73588,73593,73595,73598,73604,73607,73610,73616,73619,73625,73628,73630,73636,73638,73641,73647,73652,73657,73659,73665,73669,73671,73676,73678,73681,73687,73692,73694,73700,73702,73705,73711,73716,73718,73724,73726,73729,73735,73739,73744,73747,73752,73758,73762,73764,73769,73773,73775,73780,73782,73785,73791,73796,73806,73808,73814,73816,73822,73826,73828,73833,73835,73838,73844,73848,73852,73854,73859,73864,73866,73872,73874,73877,73883,73888,73890,73896,73898,73901,73907,73912,73914,73920,73922,73925,73931,73933,73936,73947,73950,73952,73955],[48,72504,72505],{},"This is the second blog in our four-part blog series on how to achieve resource isolation in 
Apache Pulsar.",[48,72507,72508,72509,72512],{},"The first blog, ",[55,72510,72511],{"href":64302},"Taking an In-Depth Look at How to Achieve Isolation in Pulsar",", explains how to use the following approaches to achieve isolation in Pulsar:",[321,72514,72515,72518,72521],{},[324,72516,72517],{},"Separate Pulsar clusters",[324,72519,72520],{},"Shared BookKeeper cluster",[324,72522,72523],{},"Single Pulsar cluster",[48,72525,72526],{},"This blog details how to create multiple, separate pulsar clusters for isolation of resources. Because this approach segregates resources and does not share storage or local ZooKeeper with other clusters, it provides the highest level of isolation. You should use this approach if you want to isolate critical workloads (such as billing and ads). You can create multiple, separate clusters dedicated to each workload.",[48,72528,72529],{},"To help you get started quickly, this blog walks you through every step for the following parts:",[1666,72531,72532,72535,72538,72541],{},[324,72533,72534],{},"Deploy two separate Pulsar clusters",[324,72536,72537],{},"Verify data isolation of clusters",[324,72539,72540],{},"Synchronize and migrate data between clusters (optional)",[324,72542,72543],{},"Scale up and down nodes (optional)",[8300,72545,72547],{"id":72546},"deploy-environment","Deploy environment",[48,72549,72550],{},"The examples in this blog are developed on a macOS (version 11.2.3, memory 8G).",[48,72552,72553],{},"Software requirement",[321,72555,72556],{},[324,72557,72558],{},"Java 8",[48,72560,72561],{},"You will deploy two clusters and each of them supports the following services:",[321,72563,72564,72567,72570],{},[324,72565,72566],{},"1 ZooKeeper",[324,72568,72569],{},"1 bookie",[324,72571,72572],{},"1 broker",[48,72574,72575],{},[384,72576],{"alt":72577,"src":72578},"table of pulsar cluster detail ","\u002Fimgs\u002Fblogs\u002F63a410da920de0f948a7e486_pulsar-cluster.webp",[40,72580,72582],{"id":72581},"prepare-deployment","Prepare deployment",[1666,72584,72585,72593],{},[324,72586,72587,72592],{},[55,72588,72591],{"href":72589,"rel":72590},"https:\u002F\u002Fpulsar.apache.org\u002Fdocs\u002Fen\u002Fnext\u002Fstandalone\u002F#install-pulsar-using-binary-release",[264],"Download Pulsar"," and untar the tarball.In this example, Pulsar 2.7.0 is installed.",[324,72594,72595],{},"Create empty directories using the following structure and then change the names accordingly.You can create the directories anywhere in your local environment.",[48,72597,72598],{},"Input",[8325,72600,72603],{"className":72601,"code":72602,"language":8330},[8328],"|-separate-clusters\n    |-configuration-store\n        |-zk1\n    |-cluster1\n        |-zk1\n        |-bk1\n        |-broker1\n    |-cluster2\n        |-zk1\n        |-bk1\n        |-broker1\n",[4926,72604,72602],{"__ignoreMap":18},[1666,72606,72607,72610],{},[324,72608,72609],{},"Copy the files to each directory you created in step 2.",[324,72611,44119,72612,72617],{},[55,72613,72616],{"href":72614,"rel":72615},"https:\u002F\u002Fpulsar.apache.org\u002Fdocs\u002Fen\u002Fnext\u002Fdeploy-bare-metal-multi-cluster\u002F#deploy-the-configuration-store",[264],"configuration store",".Configuration store operates at the instance level and provides configuration management and task coordination across clusters. 
In this example, cluster1 and cluster2 share one configuration store.",[48,72619,72620],{},"‍Input",[8325,72622,72625],{"className":72623,"code":72624,"language":8330},[8328],"cd configuration-store\u002Fzk1\n\nbin\u002Fpulsar-daemon start configuration-store\n",[4926,72626,72624],{"__ignoreMap":18},[40,72628,72630],{"id":72629},"deploy-pulsar-cluster1","Deploy Pulsar cluster1",[1666,72632,72633],{},[324,72634,72635,72636,72641],{},"Start a ",[55,72637,72640],{"href":72638,"rel":72639},"https:\u002F\u002Fpulsar.apache.org\u002Fdocs\u002Fen\u002Fnext\u002Fdeploy-bare-metal-multi-cluster\u002F#deploy-local-zookeeper",[264],"local ZooKeeper",".For each Pulsar cluster, you need to deploy 1 local ZooKeeper to manage configurations and coordinate tasks.",[48,72643,72598],{},[8325,72645,72648],{"className":72646,"code":72647,"language":8330},[8328],"cd cluster1\u002Fzk1\n\nbin\u002Fpulsar-daemon start zookeeper\n",[4926,72649,72647],{"__ignoreMap":18},[1666,72651,72652],{},[324,72653,3931,72654,72659],{},[55,72655,72658],{"href":72656,"rel":72657},"https:\u002F\u002Fpulsar.apache.org\u002Fdocs\u002Fen\u002Fnext\u002Fdeploy-bare-metal-multi-cluster\u002F#cluster-metadata-initialization",[264],"Initialize metadata",".Write metadata to ZooKeeper.",[48,72661,72598],{},[8325,72663,72666],{"className":72664,"code":72665,"language":8330},[8328],"cd cluster1\u002Fzk1\n\nbin\u002Fpulsar initialize-cluster-metadata \\\n  --cluster cluster1 \\\n  --zookeeper localhost:2181 \\\n  --configuration-store localhost:2184 \\\n  --web-service-url http:\u002F\u002Flocalhost:8080\u002F \\\n  --web-service-url-tls https:\u002F\u002Flocalhost:8443\u002F \\\n  --broker-service-url pulsar:\u002F\u002Flocalhost:6650\u002F \\\n  --broker-service-url-tls pulsar+ssl:\u002F\u002Flocalhost:6651\u002F\n",[4926,72667,72665],{"__ignoreMap":18},[1666,72669,72670],{},[324,72671,72672,72677,72678,72683,72684,72688],{},[55,72673,72676],{"href":72674,"rel":72675},"https:\u002F\u002Fpulsar.apache.org\u002Fdocs\u002Fen\u002Fnext\u002Fdeploy-bare-metal-multi-cluster\u002F#deploy-bookkeeper",[264],"Deploy BookKeeper",".BookKeeper provides ",[55,72679,72682],{"href":72680,"rel":72681},"https:\u002F\u002Fpulsar.apache.org\u002Fdocs\u002Fen\u002Fnext\u002Fconcepts-architecture-overview#persistent-storage",[264],"persistent storage"," for messages on Pulsar. Each Pulsar broker owns its bookie. 
BookKeeper clusters and Pulsar clusters share the local ZooKeeper.(1) ",[55,72685,72687],{"href":72680,"rel":72686},[264],"Configure bookies",".Change the value of the following configurations in the cluster1\u002Fbk1\u002Fconf\u002Fbookkeeper.conf file.",[8325,72690,72693],{"className":72691,"code":72692,"language":8330},[8328],"allowLoopback=true\nprometheusStatsHttpPort=8002\nhttpServerPort=8002\n",[4926,72694,72692],{"__ignoreMap":18},[48,72696,72697,72698,190],{},"(2) ",[55,72699,72702],{"href":72700,"rel":72701},"https:\u002F\u002Fpulsar.apache.org\u002Fdocs\u002Fen\u002Fnext\u002Fdeploy-bare-metal-multi-cluster\u002F#start-bookies",[264],"Start bookies",[48,72704,72598],{},[8325,72706,72709],{"className":72707,"code":72708,"language":8330},[8328],"cd cluster1\u002Fbk1\n\nbin\u002Fpulsar-daemon start bookie\n",[4926,72710,72708],{"__ignoreMap":18},[48,72712,72713],{},"Check whether the bookie is started successfully.",[48,72715,72598],{},[8325,72717,72720],{"className":72718,"code":72719,"language":8330},[8328],"bin\u002Fbookkeeper shell bookiesanity\n",[4926,72721,72719],{"__ignoreMap":18},[48,72723,72724],{},"Output",[8325,72726,72729],{"className":72727,"code":72728,"language":8330},[8328],"Bookie sanity test succeeded\n",[4926,72730,72728],{"__ignoreMap":18},[1666,72732,72733],{},[324,72734,3931,72735,72740,72741,72745],{},[55,72736,72739],{"href":72737,"rel":72738},"https:\u002F\u002Fpulsar.apache.org\u002Fdocs\u002Fen\u002Fnext\u002Fdeploy-bare-metal-multi-cluster\u002F#deploy-brokers",[264],"Deploy brokers",".(1) ",[55,72742,47154],{"href":72743,"rel":72744},"https:\u002F\u002Fpulsar.apache.org\u002Fdocs\u002Fen\u002Fnext\u002Fdeploy-bare-metal-multi-cluster\u002F#broker-configuration",[264],".Change the value of the following configurations in the cluster1\u002Fbroker1\u002Fconf\u002Fbroker.conf file.",[8325,72747,72750],{"className":72748,"code":72749,"language":8330},[8328],"zookeeperServers=127.0.0.1:2181\nconfigurationStoreServers=127.0.0.1:2184\nclusterName=cluster1\nmanagedLedgerDefaultEnsembleSize=1\nmanagedLedgerDefaultWriteQuorum=1\nmanagedLedgerDefaultAckQuorum=1\n",[4926,72751,72749],{"__ignoreMap":18},[48,72753,72697,72754],{},[55,72755,72758],{"href":72756,"rel":72757},"https:\u002F\u002Fpulsar.apache.org\u002Fdocs\u002Fen\u002Fnext\u002Fdeploy-bare-metal-multi-cluster\u002F#start-the-broker-service",[264],"Start brokers",[48,72760,72598],{},[8325,72762,72765],{"className":72763,"code":72764,"language":8330},[8328],"cd cluster1\u002Fbroker1\n\nbin\u002Fpulsar-daemon start broker\n",[4926,72766,72764],{"__ignoreMap":18},[40,72768,72770],{"id":72769},"deploy-pulsar-cluster2","Deploy Pulsar cluster2",[1666,72772,72773],{},[324,72774,72775,72776,72779],{},"Deploy a ",[55,72777,72640],{"href":72638,"rel":72778},[264],".(1) Configure a local ZooKeeper.",[321,72781,72782],{},[324,72783,72784],{},"Change the value of the following configurations in the cluster2\u002Fzk1\u002Fconf\u002Fzookeeper.conf file.",[8325,72786,72789],{"className":72787,"code":72788,"language":8330},[8328],"clientPort=2186\nadmin.serverPort=9992\n",[4926,72790,72788],{"__ignoreMap":18},[321,72792,72793],{},[324,72794,72795],{},"Add the following configurations to the cluster2\u002Fzk1\u002Fconf\u002Fpulsar_env.sh file.",[8325,72797,72800],{"className":72798,"code":72799,"language":8330},[8328],"OPTS=\"-Dstats_server_port=8011\"\n",[4926,72801,72799],{"__ignoreMap":18},[48,72803,72804,72805,190],{},"(2) Start a 
",[55,72806,72640],{"href":72638,"rel":72807},[264],[48,72809,72598],{},[8325,72811,72814],{"className":72812,"code":72813,"language":8330},[8328],"cd cluster2\u002Fzk1\n\nbin\u002Fpulsar-daemon start zookeeper\n",[4926,72815,72813],{"__ignoreMap":18},[1666,72817,72818],{},[324,72819,3931,72820,190],{},[55,72821,72658],{"href":72656,"rel":72822},[264],[48,72824,72598],{},[8325,72826,72829],{"className":72827,"code":72828,"language":8330},[8328],"bin\u002Fpulsar initialize-cluster-metadata \\\n  --cluster cluster2 \\\n  --zookeeper localhost:2186 \\\n  --configuration-store localhost:2184 \\\n  --web-service-url http:\u002F\u002Flocalhost:8081\u002F \\\n  --web-service-url-tls https:\u002F\u002Flocalhost:8444\u002F \\\n  --broker-service-url pulsar:\u002F\u002Flocalhost:6660\u002F \\\n  --broker-service-url-tls pulsar+ssl:\u002F\u002Flocalhost:6661\u002F\n",[4926,72830,72828],{"__ignoreMap":18},[1666,72832,72833],{},[324,72834,72835,72740,72838,72841],{},[55,72836,72676],{"href":72674,"rel":72837},[264],[55,72839,72687],{"href":72680,"rel":72840},[264],".Change the value of the following configurations in the cluster2\u002Fbk1\u002Fconf\u002Fbookkeeper.conf file.",[8325,72843,72846],{"className":72844,"code":72845,"language":8330},[8328],"bookiePort=3182\nzkServers=localhost:2186\nallowLoopback=true\nprometheusStatsHttpPort=8003\nhttpServerPort=8003\n",[4926,72847,72845],{"__ignoreMap":18},[48,72849,72697,72850,190],{},[55,72851,72702],{"href":72700,"rel":72852},[264],[8325,72854,72857],{"className":72855,"code":72856,"language":8330},[8328],"**Input**\ncd cluster2\u002Fbk1\n\nbin\u002Fpulsar-daemon start bookie\n      Check whether the bookie is started successfully.\n\n      **Input**\nbin\u002Fbookkeeper shell bookiesanity\n",[4926,72858,72856],{"__ignoreMap":18},[48,72860,72724],{},[8325,72862,72864],{"className":72863,"code":72728,"language":8330},[8328],[4926,72865,72728],{"__ignoreMap":18},[1666,72867,72868],{},[324,72869,72870,72740,72873,190],{},[55,72871,72739],{"href":72737,"rel":72872},[264],[55,72874,47154],{"href":72743,"rel":72875},[264],[48,72877,72878],{},"Change the value of the following configurations in the cluster2\u002Fbroker1\u002Fconf\u002Fbroker.conf file.",[8325,72880,72883],{"className":72881,"code":72882,"language":8330},[8328],"clusterName=cluster2\nzookeeperServers=127.0.0.1:2186\nconfigurationStoreServers=127.0.0.1:2184\nbrokerServicePort=6660\nwebServicePort=8081\nmanagedLedgerDefaultEnsembleSize=1\nmanagedLedgerDefaultWriteQuorum=1\nmanagedLedgerDefaultAckQuorum=1\n",[4926,72884,72882],{"__ignoreMap":18},[321,72886,72887],{},[324,72888,72889],{},"Change the value of the following configurations in the cluster2\u002Fbroker1\u002Fconf\u002Fclient.conf file.",[8325,72891,72894],{"className":72892,"code":72893,"language":8330},[8328],"webServiceUrl=http:\u002F\u002Flocalhost:8081\u002F\nbrokerServiceUrl=pulsar:\u002F\u002Flocalhost:6660\u002F\n",[4926,72895,72893],{"__ignoreMap":18},[48,72897,72697,72898,190],{},[55,72899,72758],{"href":72756,"rel":72900},[264],[48,72902,72598],{},[8325,72904,72907],{"className":72905,"code":72906,"language":8330},[8328],"cd cluster2\u002Fbroker1\n\nbin\u002Fpulsar-daemon start broker\n",[4926,72908,72906],{"__ignoreMap":18},[8300,72910,72537],{"id":72911},"verify-data-isolation-of-clusters",[48,72913,72914],{},"This section verifies whether the data in the two Pulsar clusters is isolated.",[1666,72916,72917],{},[324,72918,72919],{},"Create namespace1 and assign it to 
cluster1.",[916,72921,72922],{},[48,72923,72924,72925],{},"Tip : The format of a namespace name is ",[72926,72927,10259,72928],"tenant-name",{},[72929,72930,72931],"namespace-name",{},". For more information, see Namespaces.",[48,72933,72598],{},[8325,72935,72938],{"className":72936,"code":72937,"language":8330},[8328],"cd cluster1\u002Fbroker1\n\nbin\u002Fpulsar-admin namespaces create -c cluster1 public\u002Fnamespace1\n",[4926,72939,72937],{"__ignoreMap":18},[48,72941,72942],{},"Check the result.",[48,72944,72598],{},[8325,72946,72949],{"className":72947,"code":72948,"language":8330},[8328],"bin\u002Fpulsar-admin namespaces list public\n",[4926,72950,72948],{"__ignoreMap":18},[48,72952,72724],{},[8325,72954,72957],{"className":72955,"code":72956,"language":8330},[8328],"\"public\u002Fdefault\"\n\"public\u002Fnamespace1\"\n",[4926,72958,72956],{"__ignoreMap":18},[1666,72960,72961],{},[324,72962,72963],{},"Set the retention policy for namespace1.",[916,72965,72966],{},[48,72967,72968],{},"Note | If the retention policy is not set and the topic is not subscribed, the data stored on the topic is deleted automatically after a while.",[48,72970,72598],{},[8325,72972,72975],{"className":72973,"code":72974,"language":8330},[8328],"bin\u002Fpulsar-admin namespaces set-retention -s 100M -t 3d public\u002Fnamespace1\n",[4926,72976,72974],{"__ignoreMap":18},[1666,72978,72979],{},[324,72980,72981],{},"Create topic1 in namespace1 and write 1000 messages to this topic.",[916,72983,72984],{},[48,72985,72986,72987,190],{},"Tip | The pulsar-client is a command line tool to send and consume data. For more information, see ",[55,72988,72991],{"href":72989,"rel":72990},"https:\u002F\u002Fpulsar.apache.org\u002Fdocs\u002Fen\u002Fnext\u002Freference-cli-tools\u002F",[264],"Pulsar command line tools",[48,72993,72598],{},[8325,72995,72998],{"className":72996,"code":72997,"language":8330},[8328],"bin\u002Fpulsar-client produce -m 'hello c1 to c2' -n 1000 public\u002Fnamespace1\u002Ftopic1\n\n09:56:34.504 [main] INFO  org.apache.pulsar.client.cli.PulsarClientTool - 1000 messages successfully produced\n",[4926,72999,72997],{"__ignoreMap":18},[48,73001,72942],{},[48,73003,72598],{},[8325,73005,73008],{"className":73006,"code":73007,"language":8330},[8328],"bin\u002Fpulsar-admin --admin-url http:\u002F\u002Flocalhost:8080 topics stats-internal public\u002Fnamespace1\u002Ftopic1\n",[4926,73009,73007],{"__ignoreMap":18},[48,73011,72724],{},[48,73013,73014],{},"The entriesAddedCounter parameter shows that 1000 messages are added.",[8325,73016,73019],{"className":73017,"code":73018,"language":8330},[8328],"{\n  \"entriesAddedCounter\" : 1000,\n  \"numberOfEntries\" : 1000,\n  \"totalSize\" : 65616,\n  \"currentLedgerEntries\" : 1000,\n  \"currentLedgerSize\" : 65616,\n  \"lastLedgerCreatedTimestamp\" : \"2021-04-22T10:24:00.582+08:00\",\n  \"waitingCursorsCount\" : 0,\n  \"pendingAddEntriesCount\" : 0,\n  \"lastConfirmedEntry\" : \"4:999\",\n  \"state\" : \"LedgerOpened\",\n  \"ledgers\" : [ {\n    \"ledgerId\" : 4,\n    \"entries\" : 0,\n    \"size\" : 0,\n    \"offloaded\" : false\n  } ],\n  \"cursors\" : { },\n  \"compactedLedger\" : {\n    \"ledgerId\" : -1,\n    \"entries\" : -1,\n    \"size\" : -1,\n    \"offloaded\" : false\n  }\n}\n",[4926,73020,73018],{"__ignoreMap":18},[1666,73022,73023],{},[324,73024,73025],{},"Check the data stored on public\u002Fnamespace1\u002Ftopic1 by cluster2 
(localhost:8081).",[48,73027,72598],{},[8325,73029,73032],{"className":73030,"code":73031,"language":8330},[8328],"\nbin\u002Fpulsar-admin --admin-url http:\u002F\u002Flocalhost:8081 topics stats-internal public\u002Fnamespace1\u002Ftopic1\n\n",[4926,73033,73031],{"__ignoreMap":18},[48,73035,72724],{},[48,73037,73038],{},"The attempt failed. The error message shows that the data stored on public\u002Fnamespace1 is assigned only to cluster1. This proves that the data is isolated.",[8325,73040,73043],{"className":73041,"code":73042,"language":8330},[8328],"\nNamespace missing local cluster name in clusters list: local_cluster=cluster2 ns=public\u002Fnamespace1 clusters=[cluster1]\n\nReason: Namespace missing local cluster name in clusters list: local_cluster=cluster2 ns=public\u002Fnamespace1 clusters=[cluster1]\n\n",[4926,73044,73042],{"__ignoreMap":18},[1666,73046,73047],{},[324,73048,73049],{},"Write data to public\u002Fnamespace1\u002Ftopic1 in cluster2.",[48,73051,72598],{},[8325,73053,73056],{"className":73054,"code":73055,"language":8330},[8328],"\ncd cluster2\u002Fbroker1\n\nbin\u002Fpulsar-client produce -m 'hello c1 to c2' -n 1000 public\u002Fnamespace1\u002Ftopic1\n\n",[4926,73057,73055],{"__ignoreMap":18},[48,73059,72724],{},[48,73061,73062],{},"The error message shows that 0 message is written. The attempt failed because namespace1 is assigned only to cluster1. This proves that the data is isolated.",[8325,73064,73067],{"className":73065,"code":73066,"language":8330},[8328],"\n12:09:50.005 [main] INFO  org.apache.pulsar.client.cli.PulsarClientTool - 0 messages successfully produced\n\n",[4926,73068,73066],{"__ignoreMap":18},[8300,73070,73072],{"id":73071},"synchronize-and-migrate-data-between-clusters","Synchronize and migrate data between clusters",[48,73074,73075],{},"After verifying that the data is isolated, you can synchronize (using geo-replication) and migrate data between clusters.",[1666,73077,73078],{},[324,73079,73080],{},"Assign namespace1 to cluster2, that is, adding cluster2 to the cluster list of namespace1.This enables geo-replication to synchronize the data between cluster1 and cluster2.",[48,73082,72598],{},[8325,73084,73087],{"className":73085,"code":73086,"language":8330},[8328],"\nbin\u002Fpulsar-admin namespaces set-clusters --clusters cluster1,cluster2 public\u002Fnamespace1\n\n",[4926,73088,73086],{"__ignoreMap":18},[48,73090,72942],{},[48,73092,72598],{},[8325,73094,73097],{"className":73095,"code":73096,"language":8330},[8328],"bin\u002Fpulsar-admin namespaces get-clusters public\u002Fnamespace1\n",[4926,73098,73096],{"__ignoreMap":18},[48,73100,72724],{},[8325,73102,73105],{"className":73103,"code":73104,"language":8330},[8328],"\"cluster1\"\n\"cluster2\"\n",[4926,73106,73104],{"__ignoreMap":18},[1666,73108,73109],{},[324,73110,73111],{},"Check whether topic1 is in cluster2.",[48,73113,72598],{},[8325,73115,73118],{"className":73116,"code":73117,"language":8330},[8328],"bin\u002Fpulsar-admin --admin-url http:\u002F\u002Flocalhost:8081 topics stats-internal public\u002Fnamespace1\u002Ftopic1\n",[4926,73119,73117],{"__ignoreMap":18},[48,73121,72724],{},[48,73123,73124],{},"The output shows that there are 1000 messages on cluster2\u002Ftopic1. 
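If you prefer an end-to-end check rather than reading internal stats, you can also consume the replicated messages directly through cluster2. This is an optional sketch: the subscription name verify-sub is hypothetical, and the -p Earliest option (available in recent pulsar-client versions) starts the new subscription from the earliest retained message.

```bash
# cluster2/broker1/conf/client.conf points at cluster2 (localhost:8081 / 6660).
cd cluster2/broker1

# Read the first 10 replicated messages from public/namespace1/topic1 via cluster2.
bin/pulsar-client consume -s verify-sub -p Earliest -n 10 public/namespace1/topic1
```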
This proves that the data stored on cluster1\u002Ftopic1 is replicated to cluster2 successfully.",[8325,73126,73129],{"className":73127,"code":73128,"language":8330},[8328],"{\n  \"entriesAddedCounter\" : 1000,\n  \"numberOfEntries\" : 1000,\n  \"totalSize\" : 75616,\n  \"currentLedgerEntries\" : 1000,\n  \"currentLedgerSize\" : 75616,\n  \"lastLedgerCreatedTimestamp\" : \"2021-04-23T12:02:52.929+08:00\",\n  \"waitingCursorsCount\" : 1,\n  \"pendingAddEntriesCount\" : 0,\n  \"lastConfirmedEntry\" : \"1:999\",\n  \"state\" : \"LedgerOpened\",\n  \"ledgers\" : [ {\n    \"ledgerId\" : 1,\n    \"entries\" : 0,\n    \"size\" : 0,\n    \"offloaded\" : false\n  } ],\n  \"cursors\" : {\n    \"pulsar.repl.cluster1\" : {\n      \"markDeletePosition\" : \"1:999\",\n      \"readPosition\" : \"1:1000\",\n      \"waitingReadOp\" : true,\n      \"pendingReadOps\" : 0,\n      \"messagesConsumedCounter\" : 1000,\n      \"cursorLedger\" : 2,\n      \"cursorLedgerLastEntry\" : 2,\n      \"individuallyDeletedMessages\" : \"[]\",\n      \"lastLedgerSwitchTimestamp\" : \"2021-04-23T12:02:53.248+08:00\",\n      \"state\" : \"Open\",\n      \"numberOfEntriesSinceFirstNotAckedMessage\" : 1,\n      \"totalNonContiguousDeletedMessagesRange\" : 0,\n      \"properties\" : { }\n    }\n  },\n  \"compactedLedger\" : {\n    \"ledgerId\" : -1,\n    \"entries\" : -1,\n    \"size\" : -1,\n    \"offloaded\" : false\n  }\n}\n",[4926,73130,73128],{"__ignoreMap":18},[1666,73132,73133],{},[324,73134,73135],{},"Migrate the producer and consumer from cluster1 to cluster2.",[8325,73137,73140],{"className":73138,"code":73139,"language":8330},[8328],"PulsarClient pulsarClient1 = PulsarClient.builder().serviceUrl(\"pulsar:\u002F\u002Flocalhost:6650\").build();\n\u002F\u002F migrate the client to cluster2 pulsar:\u002F\u002Flocalhost:6660\nPulsarClient pulsarClient2 = PulsarClient.builder().serviceUrl(\"pulsar:\u002F\u002Flocalhost:6660\").build();\n",[4926,73141,73139],{"__ignoreMap":18},[1666,73143,73144],{},[324,73145,73146],{},"Remove cluster1 from the cluster list of namespace1.",[48,73148,72598],{},[8325,73150,73153],{"className":73151,"code":73152,"language":8330},[8328],"bin\u002Fpulsar-admin namespaces set-clusters --clusters cluster2 public\u002Fnamespace1\n",[4926,73154,73152],{"__ignoreMap":18},[48,73156,73157],{},"Check if the data is stored on cluster1\u002Ftopic1.",[48,73159,72598],{},[8325,73161,73164],{"className":73162,"code":73163,"language":8330},[8328],"cd cluster1\u002Fbroker1\n\nbin\u002Fpulsar-admin --admin-url http:\u002F\u002Flocalhost:8080 topics stats-internal public\u002Fnamespace1\u002Ftopic1\n",[4926,73165,73163],{"__ignoreMap":18},[48,73167,72724],{},[48,73169,73170],{},"The data is removed from cluster1\u002Ftopic1 successfully since the value of numberOfEntries parameter is 0.",[8325,73172,73175],{"className":73173,"code":73174,"language":8330},[8328],"{\n  \"entriesAddedCounter\" : 0,\n  \"numberOfEntries\" : 0,\n  \"totalSize\" : 0,\n  \"currentLedgerEntries\" : 0,\n  \"currentLedgerSize\" : 0,\n  \"lastLedgerCreatedTimestamp\" : \"2021-04-23T15:20:08.1+08:00\",\n  \"waitingCursorsCount\" : 1,\n  \"pendingAddEntriesCount\" : 0,\n  \"lastConfirmedEntry\" : \"3:-1\",\n  \"state\" : \"LedgerOpened\",\n  \"ledgers\" : [ {\n    \"ledgerId\" : 3,\n    \"entries\" : 0,\n    \"size\" : 0,\n    \"offloaded\" : false\n  } ],\n  \"cursors\" : {\n    \"pulsar.repl.cluster2\" : {\n      \"markDeletePosition\" : \"3:-1\",\n      \"readPosition\" : \"3:0\",\n      \"waitingReadOp\" : true,\n      \"pendingReadOps\" 
: 0,\n      \"messagesConsumedCounter\" : 0,\n      \"cursorLedger\" : 4,\n      \"cursorLedgerLastEntry\" : 0,\n      \"individuallyDeletedMessages\" : \"[]\",\n      \"lastLedgerSwitchTimestamp\" : \"2021-04-23T15:20:08.122+08:00\",\n      \"state\" : \"Open\",\n      \"numberOfEntriesSinceFirstNotAckedMessage\" : 1,\n      \"totalNonContiguousDeletedMessagesRange\" : 0,\n      \"properties\" : { }\n    }\n  },\n  \"compactedLedger\" : {\n    \"ledgerId\" : -1,\n    \"entries\" : -1,\n    \"size\" : -1,\n    \"offloaded\" : false\n  }\n}\n",[4926,73176,73174],{"__ignoreMap":18},[48,73178,73179],{},"At this point, you replicated data from cluster1\u002Ftopic1 to cluster2 and then removed the data from cluster1\u002Ftopic1.",[8300,73181,73183],{"id":73182},"scale-up-and-down-nodes","Scale up and down nodes",[48,73185,73186],{},"If you need to handle increasing or decreasing workloads, you can scale up or down nodes. This section demonstrates how to scale up and scale down nodes (brokers and bookies).",[40,73188,61065],{"id":61064},[32,73190,73192],{"id":73191},"scale-up-brokers","Scale up brokers",[48,73194,73195],{},"In this procedure, you’ll create 2 partitioned topics on cluster1\u002Fbroker1 and add 2 brokers. Then, you’ll offload the data stored on partitioned topics and check the data distribution among 3 brokers.",[1666,73197,73198],{},[324,73199,73200],{},"Check the information about brokers in cluster1.",[48,73202,72598],{},[8325,73204,73207],{"className":73205,"code":73206,"language":8330},[8328],"cd\u002Fcluster1\u002Fbroker1\n\nbin\u002Fpulsar-admin brokers list cluster1\n",[4926,73208,73206],{"__ignoreMap":18},[48,73210,72724],{},[48,73212,73213],{},"The output shows that broker1 is the only broker in cluster1.",[8325,73215,73218],{"className":73216,"code":73217,"language":8330},[8328],"\"192.168.0.105:8080\"\n",[4926,73219,73217],{"__ignoreMap":18},[1666,73221,73222],{},[324,73223,73224],{},"Create 2 partitioned topics on cluster1\u002Fbroker1.Create 6 partitions for partitioned-topic1 and 7 partitions for partitioned-topic2.",[48,73226,72598],{},[8325,73228,73231],{"className":73229,"code":73230,"language":8330},[8328],"bin\u002Fpulsar-admin topics create-partitioned-topic -p 6 public\u002Fnamespace1\u002Fpartitioned-topic1\n\nbin\u002Fpulsar-admin topics create-partitioned-topic -p 7 public\u002Fnamespace1\u002Fpartitioned-topic2\n",[4926,73232,73230],{"__ignoreMap":18},[48,73234,72942],{},[48,73236,72598],{},[8325,73238,73241],{"className":73239,"code":73240,"language":8330},[8328],"bin\u002Fpulsar-admin topics partitioned-lookup public\u002Fnamespace1\u002Fpartitioned-topic1\n",[4926,73242,73240],{"__ignoreMap":18},[48,73244,72724],{},[48,73246,73247],{},"All data of partitioned-topic1 is from broker1.",[8325,73249,73252],{"className":73250,"code":73251,"language":8330},[8328],"\"persistent:\u002F\u002Fpublic\u002Fnamespace1\u002Fpartitioned-topic1-partition-0    pulsar:\u002F\u002F192.168.0.105:6650\"\n\"persistent:\u002F\u002Fpublic\u002Fnamespace1\u002Fpartitioned-topic1-partition-1    pulsar:\u002F\u002F192.168.0.105:6650\"\n\"persistent:\u002F\u002Fpublic\u002Fnamespace1\u002Fpartitioned-topic1-partition-2    pulsar:\u002F\u002F192.168.0.105:6650\"\n\"persistent:\u002F\u002Fpublic\u002Fnamespace1\u002Fpartitioned-topic1-partition-3    pulsar:\u002F\u002F192.168.0.105:6650\"\n\"persistent:\u002F\u002Fpublic\u002Fnamespace1\u002Fpartitioned-topic1-partition-4    
pulsar:\u002F\u002F192.168.0.105:6650\"\n\"persistent:\u002F\u002Fpublic\u002Fnamespace1\u002Fpartitioned-topic1-partition-5    pulsar:\u002F\u002F192.168.0.105:6650\"\n",[4926,73253,73251],{"__ignoreMap":18},[48,73255,72598],{},[8325,73257,73260],{"className":73258,"code":73259,"language":8330},[8328],"bin\u002Fpulsar-admin topics partitioned-lookup public\u002Fnamespace1\u002Fpartitioned-topic2\n",[4926,73261,73259],{"__ignoreMap":18},[48,73263,72724],{},[48,73265,73266],{},"All data of partitioned-topic2 is from broker1.",[8325,73268,73271],{"className":73269,"code":73270,"language":8330},[8328],"\"persistent:\u002F\u002Fpublic\u002Fnamespace1\u002Fpartitioned-topic2-partition-0    pulsar:\u002F\u002F192.168.0.105:6650\"\n\"persistent:\u002F\u002Fpublic\u002Fnamespace1\u002Fpartitioned-topic2-partition-1    pulsar:\u002F\u002F192.168.0.105:6650\"\n\"persistent:\u002F\u002Fpublic\u002Fnamespace1\u002Fpartitioned-topic2-partition-2    pulsar:\u002F\u002F192.168.0.105:6650\"\n\"persistent:\u002F\u002Fpublic\u002Fnamespace1\u002Fpartitioned-topic2-partition-3    pulsar:\u002F\u002F192.168.0.105:6650\"\n\"persistent:\u002F\u002Fpublic\u002Fnamespace1\u002Fpartitioned-topic2-partition-4    pulsar:\u002F\u002F192.168.0.105:6650\"\n\"persistent:\u002F\u002Fpublic\u002Fnamespace1\u002Fpartitioned-topic2-partition-5    pulsar:\u002F\u002F192.168.0.105:6650\"\n\"persistent:\u002F\u002Fpublic\u002Fnamespace1\u002Fpartitioned-topic2-partition-6    pulsar:\u002F\u002F192.168.0.105:6650\"\n",[4926,73272,73270],{"__ignoreMap":18},[1666,73274,73275],{},[324,73276,73277],{},"Add broker2 and broker3.",[48,73279,73280],{},"(1) Prepare for deployment.",[48,73282,73283],{},"Create two empty repositories (broker2 and broker3) under cluster1 repository. Copy the untarred files in the Pulsar repository to these two repositories.",[8325,73285,73288],{"className":73286,"code":73287,"language":8330},[8328],"|-separate-clusters\n    |-configuration-store\n        |-zk1\n    |-cluster1\n        |-zk1\n        |-bk1\n        |-broker1\n        |-broker2\n        |-broker3\n    |-cluster2\n        |-zk1\n        |-bk1\n        |-broker1\n",[4926,73289,73287],{"__ignoreMap":18},[48,73291,72697,73292,190],{},[55,73293,72739],{"href":72737,"rel":73294},[264],[48,73296,73297,73298,190],{},"(2.a) ",[55,73299,47154],{"href":72743,"rel":73300},[264],[48,73302,73303],{},[384,73304],{"alt":73305,"src":73306},"table of Configure brokers","\u002Fimgs\u002Fblogs\u002F63a4170c1b41ff8ad43494dc_Configure-brokers.webp",[48,73308,73309,73310],{},"(2.b) ",[55,73311,73313],{"href":72756,"rel":73312},[264],"Start brokers.",[48,73315,73316],{},[384,73317],{"alt":73318,"src":73319},"tabs of start brokers","\u002Fimgs\u002Fblogs\u002F63a4173f21f9313b93f29766_start-brokers.webp",[48,73321,73322],{},"(2.c) Check the information about the running brokers in cluster1.",[48,73324,72598],{},[8325,73326,73329],{"className":73327,"code":73328,"language":8330},[8328],"bin\u002Fpulsar-admin brokers list cluster1\n",[4926,73330,73328],{"__ignoreMap":18},[48,73332,72724],{},[8325,73334,73337],{"className":73335,"code":73336,"language":8330},[8328],"\"192.168.0.105:8080\" \u002F\u002F broker1\n\"192.168.0.105:8082\" \u002F\u002F broker2\n\"192.168.0.105:8083\" \u002F\u002F broker3\n",[4926,73338,73336],{"__ignoreMap":18},[1666,73340,73341],{},[324,73342,73343],{},"Offload the data stored on namespace1\u002Fpartitioned-topic1.",[48,73345,72598],{},[8325,73347,73350],{"className":73348,"code":73349,"language":8330},[8328],"bin\u002Fpulsar-admin namespaces 
unload public\u002Fnamespace1\n",[4926,73351,73349],{"__ignoreMap":18},[48,73353,72942],{},[48,73355,73356],{},"(1) Check the distribution of data stored on partitioned-topic1.",[48,73358,72598],{},[8325,73360,73362],{"className":73361,"code":73240,"language":8330},[8328],[4926,73363,73240],{"__ignoreMap":18},[48,73365,72724],{},[48,73367,73368],{},"The output shows that the data stored on partitioned-topic1 is distributed evenly on broker1, broker2, and broker3.",[8325,73370,73373],{"className":73371,"code":73372,"language":8330},[8328],"\"persistent:\u002F\u002Fpublic\u002Fnamespace1\u002Fpartitioned-topic1-partition-0    pulsar:\u002F\u002F192.168.0.105:6650\"\n\"persistent:\u002F\u002Fpublic\u002Fnamespace1\u002Fpartitioned-topic1-partition-1    pulsar:\u002F\u002F192.168.0.105:6653\"\n\"persistent:\u002F\u002Fpublic\u002Fnamespace1\u002Fpartitioned-topic1-partition-2    pulsar:\u002F\u002F192.168.0.105:6652\"\n\"persistent:\u002F\u002Fpublic\u002Fnamespace1\u002Fpartitioned-topic1-partition-3    pulsar:\u002F\u002F192.168.0.105:6653\"\n\"persistent:\u002F\u002Fpublic\u002Fnamespace1\u002Fpartitioned-topic1-partition-4    pulsar:\u002F\u002F192.168.0.105:6650\"\n\"persistent:\u002F\u002Fpublic\u002Fnamespace1\u002Fpartitioned-topic1-partition-5    pulsar:\u002F\u002F192.168.0.105:6653\"\n",[4926,73374,73372],{"__ignoreMap":18},[48,73376,73377],{},"(2) Check the distribution of data stored on partitioned-topic2.",[48,73379,72598],{},[8325,73381,73383],{"className":73382,"code":73259,"language":8330},[8328],[4926,73384,73259],{"__ignoreMap":18},[48,73386,73387],{},"The output shows that the data stored on partitioned-topic2 is distributed evenly on broker1, broker2, and broker3.",[48,73389,72724],{},[8325,73391,73394],{"className":73392,"code":73393,"language":8330},[8328],"\"persistent:\u002F\u002Fpublic\u002Fnamespace1\u002Fpartitioned-topic2-partition-0    pulsar:\u002F\u002F192.168.0.105:6653\"\n\"persistent:\u002F\u002Fpublic\u002Fnamespace1\u002Fpartitioned-topic2-partition-1    pulsar:\u002F\u002F192.168.0.105:6650\"\n\"persistent:\u002F\u002Fpublic\u002Fnamespace1\u002Fpartitioned-topic2-partition-2    pulsar:\u002F\u002F192.168.0.105:6653\"\n\"persistent:\u002F\u002Fpublic\u002Fnamespace1\u002Fpartitioned-topic2-partition-3    pulsar:\u002F\u002F192.168.0.105:6652\"\n\"persistent:\u002F\u002Fpublic\u002Fnamespace1\u002Fpartitioned-topic2-partition-4    pulsar:\u002F\u002F192.168.0.105:6653\"\n\"persistent:\u002F\u002Fpublic\u002Fnamespace1\u002Fpartitioned-topic2-partition-5    pulsar:\u002F\u002F192.168.0.105:6650\"\n\"persistent:\u002F\u002Fpublic\u002Fnamespace1\u002Fpartitioned-topic2-partition-6    pulsar:\u002F\u002F192.168.0.105:6653\"\n",[4926,73395,73393],{"__ignoreMap":18},[32,73397,73399],{"id":73398},"scale-down-brokers","Scale down brokers",[916,73401,73402],{},[48,73403,73404],{},"Tip The following steps continue from the previous section “Scale up brokers”.",[48,73406,73407],{},"In this procedure, you’ll stop 1 broker in cluster1 and check how the data stored on the partitioned topics is distributed among other brokers.",[1666,73409,73410],{},[324,73411,73412],{},"Stop broker3.",[48,73414,72620],{},[8325,73416,73419],{"className":73417,"code":73418,"language":8330},[8328],"\ncd\u002Fcluster1\u002Fbroker3\n\nbin\u002Fpulsar-daemon stop broker\n\n",[4926,73420,73418],{"__ignoreMap":18},[48,73422,72942],{},[48,73424,72598],{},[8325,73426,73429],{"className":73427,"code":73428,"language":8330},[8328],"\nbin\u002Fpulsar-admin brokers list 
cluster1\n\n",[4926,73430,73428],{"__ignoreMap":18},[48,73432,72724],{},[48,73434,73435],{},"The output shows that only broker1 and broker2 are running in cluster1.",[8325,73437,73440],{"className":73438,"code":73439,"language":8330},[8328],"\n\"192.168.0.105:8080\" \u002F\u002F broker1\n\"192.168.0.105:8082\" \u002F\u002F broker2\n\n",[4926,73441,73439],{"__ignoreMap":18},[1666,73443,73444],{},[324,73445,73446],{},"Check the distribution of data stored on partitioned-topic1.",[48,73448,72598],{},[8325,73450,73453],{"className":73451,"code":73452,"language":8330},[8328],"\nbin\u002Fpulsar-admin topics partitioned-lookup public\u002Fnamespace1\u002Fpartitioned-topic1\n\n",[4926,73454,73452],{"__ignoreMap":18},[48,73456,72724],{},[48,73458,73459],{},"The output shows that the data stored on partitioned-topic1 is distributed evenly between broker1 and broker2, which means that the data from broker3 is redistributed.",[8325,73461,73464],{"className":73462,"code":73463,"language":8330},[8328],"\n\"persistent:\u002F\u002Fpublic\u002Fnamespace1\u002Fpartitioned-topic1-partition-0    pulsar:\u002F\u002F192.168.0.105:6650\"\n\"persistent:\u002F\u002Fpublic\u002Fnamespace1\u002Fpartitioned-topic1-partition-1    pulsar:\u002F\u002F192.168.0.105:6650\"\n\"persistent:\u002F\u002Fpublic\u002Fnamespace1\u002Fpartitioned-topic1-partition-2    pulsar:\u002F\u002F192.168.0.105:6652\"\n\"persistent:\u002F\u002Fpublic\u002Fnamespace1\u002Fpartitioned-topic1-partition-3    pulsar:\u002F\u002F192.168.0.105:6652\"\n\"persistent:\u002F\u002Fpublic\u002Fnamespace1\u002Fpartitioned-topic1-partition-4    pulsar:\u002F\u002F192.168.0.105:6650\"\n\"persistent:\u002F\u002Fpublic\u002Fnamespace1\u002Fpartitioned-topic1-partition-5    pulsar:\u002F\u002F192.168.0.105:6650\"\n\n",[4926,73465,73463],{"__ignoreMap":18},[48,73467,73468],{},"Similarly, the data stored on partitioned-topic2 is distributed evenly between broker1 and broker2.",[48,73470,72598],{},[8325,73472,73475],{"className":73473,"code":73474,"language":8330},[8328],"\nbin\u002Fpulsar-admin topics partitioned-lookup public\u002Fnamespace1\u002Fpartitioned-topic2\n\n",[4926,73476,73474],{"__ignoreMap":18},[48,73478,72724],{},[8325,73480,73483],{"className":73481,"code":73482,"language":8330},[8328],"\n\"persistent:\u002F\u002Fpublic\u002Fnamespace1\u002Fpartitioned-topic2-partition-0    pulsar:\u002F\u002F192.168.0.105:6650\"\n\"persistent:\u002F\u002Fpublic\u002Fnamespace1\u002Fpartitioned-topic2-partition-1    pulsar:\u002F\u002F192.168.0.105:6650\"\n\"persistent:\u002F\u002Fpublic\u002Fnamespace1\u002Fpartitioned-topic2-partition-2    pulsar:\u002F\u002F192.168.0.105:6652\"\n\"persistent:\u002F\u002Fpublic\u002Fnamespace1\u002Fpartitioned-topic2-partition-3    pulsar:\u002F\u002F192.168.0.105:6652\"\n\"persistent:\u002F\u002Fpublic\u002Fnamespace1\u002Fpartitioned-topic2-partition-4    pulsar:\u002F\u002F192.168.0.105:6650\"\n\"persistent:\u002F\u002Fpublic\u002Fnamespace1\u002Fpartitioned-topic2-partition-5    pulsar:\u002F\u002F192.168.0.105:6650\"\n\"persistent:\u002F\u002Fpublic\u002Fnamespace1\u002Fpartitioned-topic2-partition-6    pulsar:\u002F\u002F192.168.0.105:6652\"\n\n",[4926,73484,73482],{"__ignoreMap":18},[40,73486,73488],{"id":73487},"bookie","Bookie",[32,73490,73492],{"id":73491},"scale-up-bookies","Scale up bookies",[48,73494,73495],{},"In this procedure, you’ll add 2 bookies to cluster1\u002Fbookkeeper1. 
Then, you’ll write data to topic1 and check whether the replicas are saved.",[1666,73497,73498],{},[324,73499,73500],{},"Check the information about bookies in cluster1.",[48,73502,72598],{},[8325,73504,73507],{"className":73505,"code":73506,"language":8330},[8328],"\ncd cluster1\u002Fbk1\n\nbin\u002Fbookkeeper shell listbookies -rw -h\n\n",[4926,73508,73506],{"__ignoreMap":18},[48,73510,72724],{},[48,73512,73213],{},[8325,73514,73517],{"className":73515,"code":73516,"language":8330},[8328],"\n12:31:34.933 [main] INFO  org.apache.bookkeeper.tools.cli.commands.bookies.ListBookiesCommand - ReadWrite Bookies :\n12:31:34.946 [main] INFO  org.apache.bookkeeper.tools.cli.commands.bookies.ListBookiesCommand - BookieID:192.168.0.105:3181, IP:192.168.0.105, Port:3181, Hostname:192.168.0.105\n\n",[4926,73518,73516],{"__ignoreMap":18},[1666,73520,73521],{},[324,73522,73523],{},"Allow 3 bookies to serve.",[48,73525,73526],{},"Change the values of the following configurations in the cluster1\u002Fbroker1\u002Fconf\u002Fbroker.conf file.",[8325,73528,73531],{"className":73529,"code":73530,"language":8330},[8328],"\nmanagedLedgerDefaultEnsembleSize=3 \u002F\u002F specify the number of bookies to use when creating a ledger\nmanagedLedgerDefaultWriteQuorum=3 \u002F\u002F specify the number of copies to store for each message\nmanagedLedgerDefaultAckQuorum=2  \u002F\u002F specify the number of guaranteed copies (acks to wait before writing is completed) \n\n",[4926,73532,73530],{"__ignoreMap":18},[1666,73534,73535],{},[324,73536,73537],{},"Restart broker1 to enable the configurations.",[48,73539,72598],{},[8325,73541,73544],{"className":73542,"code":73543,"language":8330},[8328],"\ncd cluster1\u002Fbroker1\n\nbin\u002Fpulsar-daemon stop broker\n\nbin\u002Fpulsar-daemon start broker\n\n",[4926,73545,73543],{"__ignoreMap":18},[1666,73547,73548],{},[324,73549,73550],{},"Set the retention policy for the messages in public\u002Fdefault.",[916,73552,73553],{},[48,73554,73555],{},"Note If the retention policy is not set and the topic is not subscribed, the data of the topic is deleted automatically after a while.",[48,73557,72598],{},[8325,73559,73562],{"className":73560,"code":73561,"language":8330},[8328],"\ncd cluster1\u002Fbroker1\n\nbin\u002Fpulsar-admin namespaces set-retention -s 100M -t 3d public\u002Fdefault\n",[4926,73563,73561],{"__ignoreMap":18},[1666,73565,73566],{},[324,73567,73568],{},"Create topic1 in public\u002Fdefault and write 100 messages to this topic.",[48,73570,72598],{},[8325,73572,73575],{"className":73573,"code":73574,"language":8330},[8328],"\nbin\u002Fpulsar-client produce -m 'hello' -n 100 topic1\n\n",[4926,73576,73574],{"__ignoreMap":18},[48,73578,72724],{},[48,73580,73581],{},"The data is not written successfully because of the insufficient number of bookies.",[8325,73583,73586],{"className":73584,"code":73585,"language":8330},[8328],"\n···\n\n12:40:38.886 [pulsar-client-io-1-1] WARN  org.apache.pulsar.client.impl.ClientCnx - [id: 0x56f92aff, L:\u002F192.168.0.105:53069 - R:\u002F192.168.0.105:6650] Received error from server: org.apache.bookkeeper.mledger.ManagedLedgerException: Not enough non-faulty bookies available\n\n...\n\n12:40:38.886 [main] ERROR org.apache.pulsar.client.cli.PulsarClientTool - Error while producing messages\n\n…\n\n12:40:38.890 [main] INFO  org.apache.pulsar.client.cli.PulsarClientTool - 0 messages successfully produced\n\n",[4926,73587,73585],{"__ignoreMap":18},[1666,73589,73590],{},[324,73591,73592],{},"Add bookie2 and 
bookie3.",[48,73594,73280],{},[48,73596,73597],{},"Create two empty repositories (bk2 and bk3) under cluster1 repository. Copy the untarred files in Pulsar repository to these two repositories.",[8325,73599,73602],{"className":73600,"code":73601,"language":8330},[8328],"\n|-separate-clusters\n    |-configuration-store\n        |-zk1\n    |-cluster1\n        |-zk1\n        |-bk1\n        |-bk2\n        |-bk3\n        |-broker1\n    |-cluster2\n        |-zk1\n        |-bk1\n        |-broker1\n\n",[4926,73603,73601],{"__ignoreMap":18},[48,73605,73606],{},"(2) Deploy bookies.",[48,73608,73609],{},"(2.a) Configure bookies.",[48,73611,73612],{},[384,73613],{"alt":73614,"src":73615},"table Configure bookies","\u002Fimgs\u002Fblogs\u002F63a41a27b8b33d222c8974ff_Configure-bookies.webp",[48,73617,73618],{},"(2.b) Start bookies.",[48,73620,73621],{},[384,73622],{"alt":73623,"src":73624},"table start bookies","\u002Fimgs\u002Fblogs\u002F63a41a551b41ff39be36483d_Start-bookies.webp",[48,73626,73627],{},"(2.c) Check the running bookies in cluster1.",[48,73629,72598],{},[8325,73631,73634],{"className":73632,"code":73633,"language":8330},[8328],"\nbin\u002Fbookkeeper shell listbookies -rw -h\n\n",[4926,73635,73633],{"__ignoreMap":18},[48,73637,72724],{},[48,73639,73640],{},"All three bookies are running in cluster1: bookie1：192.168.0.105:3181 bookie2：192.168.0.105:3183 bookie3：192.168.0.105:3184",[8325,73642,73645],{"className":73643,"code":73644,"language":8330},[8328],"\n12:12:47.574 [main] INFO  org.apache.bookkeeper.tools.cli.commands.bookies.ListBookiesCommand - BookieID:192.168.0.105:3183, IP:192.168.0.105, Port:3183, Hostname:192.168.0.105 \n12:12:47.575 [main] INFO  org.apache.bookkeeper.tools.cli.commands.bookies.ListBookiesCommand - BookieID:192.168.0.105:3184, IP:192.168.0.105, Port:3184, Hostname:192.168.0.105\n12:12:47.576 [main] INFO  org.apache.bookkeeper.tools.cli.commands.bookies.ListBookiesCommand - BookieID:192.168.0.105:3181, IP:192.168.0.105, Port:3181, Hostname:192.168.0.105 \n\n",[4926,73646,73644],{"__ignoreMap":18},[1666,73648,73649],{},[324,73650,73651],{},"Set the retention policy for messages in public\u002Fdefault.‍",[916,73653,73654],{},[48,73655,73656],{},"Note If the retention policy is not set and the topic is not subscribed, the data stored on the topic is deleted automatically after a while.",[48,73658,72598],{},[8325,73660,73663],{"className":73661,"code":73662,"language":8330},[8328],"\ncd cluster1\u002Fbroker1\n\nbin\u002Fpulsar-admin namespaces set-retention -s 100M -t 3d public\u002Fdefault\n\n",[4926,73664,73662],{"__ignoreMap":18},[1666,73666,73667],{},[324,73668,73568],{},[48,73670,72598],{},[8325,73672,73674],{"className":73673,"code":73574,"language":8330},[8328],[4926,73675,73574],{"__ignoreMap":18},[48,73677,72724],{},[48,73679,73680],{},"The messages are written successfully.",[8325,73682,73685],{"className":73683,"code":73684,"language":8330},[8328],"\n...\n12:17:40.222 [main] INFO  org.apache.pulsar.client.cli.PulsarClientTool - 100 messages successfully produced\n\n",[4926,73686,73684],{"__ignoreMap":18},[1666,73688,73689],{},[324,73690,73691],{},"Check the information about topic1.",[48,73693,72598],{},[8325,73695,73698],{"className":73696,"code":73697,"language":8330},[8328],"\nbin\u002Fpulsar-admin topics stats-internal topic1\n\n",[4926,73699,73697],{"__ignoreMap":18},[48,73701,72724],{},[48,73703,73704],{},"The output shows that the data stored on topic1 is saved in the ledger with ledgerId 
5.",[8325,73706,73709],{"className":73707,"code":73708,"language":8330},[8328],"\n{\n  \"entriesAddedCounter\" : 100,\n  \"numberOfEntries\" : 100,\n  \"totalSize\" : 5500,\n  \"currentLedgerEntries\" : 100,\n  \"currentLedgerSize\" : 5500,\n  \"lastLedgerCreatedTimestamp\" : \"2021-05-11T12:17:38.881+08:00\",\n  \"waitingCursorsCount\" : 0,\n  \"pendingAddEntriesCount\" : 0,\n  \"lastConfirmedEntry\" : \"5:99\",\n  \"state\" : \"LedgerOpened\",\n  \"ledgers\" : [ {\n    \"ledgerId\" : 5,\n    \"entries\" : 0,\n    \"size\" : 0,\n    \"offloaded\" : false\n  } ],\n  \"cursors\" : { },\n  \"compactedLedger\" : {\n    \"ledgerId\" : -1,\n    \"entries\" : -1,\n    \"size\" : -1,\n    \"offloaded\" : false\n  }\n}\n\n",[4926,73710,73708],{"__ignoreMap":18},[1666,73712,73713],{},[324,73714,73715],{},"Check in which bookies the ledger with ledgerId 5 is saved.",[48,73717,72598],{},[8325,73719,73722],{"className":73720,"code":73721,"language":8330},[8328],"\nbin\u002Fbookkeeper shell ledgermetadata -ledgerid 5\n\n",[4926,73723,73721],{"__ignoreMap":18},[48,73725,72724],{},[48,73727,73728],{},"As configured previously, the ledger with ledgerId 5 is saved on bookie1 (3181), bookie2 (3181), and bookie3 (3184).",[8325,73730,73733],{"className":73731,"code":73732,"language":8330},[8328],"\n...\n12:23:17.705 [main] INFO  org.apache.bookkeeper.tools.cli.commands.client.LedgerMetaDataCommand - ledgerID: 5\n12:23:17.714 [main] INFO  org.apache.bookkeeper.tools.cli.commands.client.LedgerMetaDataCommand - LedgerMetadata{formatVersion=3, ensembleSize=3, writeQuorumSize=3, ackQuorumSize=2, state=OPEN, digestType=CRC32C, password=base64:, ensembles={0=[192.168.0.105:3184, 192.168.0.105:3181, 192.168.0.105:3183]}, customMetadata={component=base64:bWFuYWdlZC1sZWRnZXI=, pulsar\u002Fmanaged-ledger=base64:cHVibGljL2RlZmF1bHQvcGVyc2lzdGVudC90b3BpYzE=, application=base64:cHVsc2Fy}}\n…\n\n",[4926,73734,73732],{"__ignoreMap":18},[32,73736,73738],{"id":73737},"scale-down-bookies","Scale down bookies",[916,73740,73741],{},[48,73742,73743],{},"Tip The following steps continue from the previous section “Scale up bookies”.",[48,73745,73746],{},"In this procedure, you’ll remove 2 bookies. 
Then, you’ll write data to topic2 and check where the data is saved.",[1666,73748,73749],{},[324,73750,73751],{},"Allow 1 bookie to serve.Change the values of the following configurations in the cluster1\u002Fbroker1\u002Fconf\u002Fbroker.conf file.",[8325,73753,73756],{"className":73754,"code":73755,"language":8330},[8328],"\nmanagedLedgerDefaultEnsembleSize=1 \u002F\u002F specify the number of bookies to use when creating a ledger\nmanagedLedgerDefaultWriteQuorum=1 \u002F\u002F specify the number of copies to store for each message\nmanagedLedgerDefaultAckQuorum=1  \u002F\u002F specify the number of guaranteed copies (acks to wait before writing is completed) \n\n",[4926,73757,73755],{"__ignoreMap":18},[1666,73759,73760],{},[324,73761,73537],{},[48,73763,72598],{},[8325,73765,73767],{"className":73766,"code":73543,"language":8330},[8328],[4926,73768,73543],{"__ignoreMap":18},[1666,73770,73771],{},[324,73772,73500],{},[48,73774,72598],{},[8325,73776,73778],{"className":73777,"code":73506,"language":8330},[8328],[4926,73779,73506],{"__ignoreMap":18},[48,73781,72724],{},[48,73783,73784],{},"All three bookies are running in cluster1, including bookie1 (3181), bookie2 (3183), and bookie3 (3184).",[8325,73786,73789],{"className":73787,"code":73788,"language":8330},[8328],"\n...\n15:47:41.370 [main] INFO  org.apache.bookkeeper.tools.cli.commands.bookies.ListBookiesCommand - ReadWrite Bookies :\n15:47:41.382 [main] INFO  org.apache.bookkeeper.tools.cli.commands.bookies.ListBookiesCommand - BookieID:192.168.0.105:3183, IP:192.168.0.105, Port:3183, Hostname:192.168.0.105\n15:47:41.383 [main] INFO  org.apache.bookkeeper.tools.cli.commands.bookies.ListBookiesCommand - BookieID:192.168.0.105:3184, IP:192.168.0.105, Port:3184, Hostname:192.168.0.105\n15:47:41.384 [main] INFO  org.apache.bookkeeper.tools.cli.commands.bookies.ListBookiesCommand - BookieID:192.168.0.105:3181, IP:192.168.0.105, Port:3181, Hostname:192.168.0.105\n…\n\n",[4926,73790,73788],{"__ignoreMap":18},[1666,73792,73793],{},[324,73794,73795],{},"Stop bookie2 and bookie3.",[916,73797,73798],{},[48,73799,73800,73801,190],{},"Tip For more information about how to stop bookies, see ",[55,73802,73805],{"href":73803,"rel":73804},"https:\u002F\u002Fbookkeeper.apache.org\u002Fdocs\u002F4.13.0\u002Fadmin\u002Fdecomission\u002F",[264],"Decommission Bookies",[48,73807,72598],{},[8325,73809,73812],{"className":73810,"code":73811,"language":8330},[8328],"\ncd cluster1\u002Fbk2\n\nbin\u002Fbookkeeper shell listunderreplicated\n\nbin\u002Fpulsar-daemon stop bookie\n\nbin\u002Fbookkeeper shell decommissionbookie\n\n",[4926,73813,73811],{"__ignoreMap":18},[48,73815,72598],{},[8325,73817,73820],{"className":73818,"code":73819,"language":8330},[8328],"\ncd cluster1\u002Fbk3\n\nbin\u002Fbookkeeper shell listunderreplicated\n\nbin\u002Fpulsar-daemon stop bookie\n\nbin\u002Fbookkeeper shell decommissionbookie\n\n",[4926,73821,73819],{"__ignoreMap":18},[1666,73823,73824],{},[324,73825,73500],{},[48,73827,72598],{},[8325,73829,73831],{"className":73830,"code":73506,"language":8330},[8328],[4926,73832,73506],{"__ignoreMap":18},[48,73834,72724],{},[48,73836,73837],{},"The output shows that bookie1 (3181) is the only running bookie in cluster1.",[8325,73839,73842],{"className":73840,"code":73841,"language":8330},[8328],"\n...\n16:05:28.690 [main] INFO  org.apache.bookkeeper.tools.cli.commands.bookies.ListBookiesCommand - ReadWrite Bookies :\n16:05:28.700 [main] INFO  org.apache.bookkeeper.tools.cli.commands.bookies.ListBookiesCommand - 
BookieID:192.168.0.105:3181, IP:192.168.0.105, Port:3181, Hostname:192.168.0.105\n...\n\n",[4926,73843,73841],{"__ignoreMap":18},[1666,73845,73846],{},[324,73847,73550],{},[916,73849,73850],{},[48,73851,73656],{},[48,73853,72598],{},[8325,73855,73857],{"className":73856,"code":73662,"language":8330},[8328],[4926,73858,73662],{"__ignoreMap":18},[1666,73860,73861],{},[324,73862,73863],{},"Create topic2 in public\u002Fdefault and write 100 messages to this topic.",[48,73865,72598],{},[8325,73867,73870],{"className":73868,"code":73869,"language":8330},[8328],"\nbin\u002Fpulsar-client produce -m 'hello' -n 100 topic2\n\n",[4926,73871,73869],{"__ignoreMap":18},[48,73873,72724],{},[48,73875,73876],{},"The data is written successfully",[8325,73878,73881],{"className":73879,"code":73880,"language":8330},[8328],"\n...\n16:06:59.448 [main] INFO  org.apache.pulsar.client.cli.PulsarClientTool - 100 messages successfully produced\n\n",[4926,73882,73880],{"__ignoreMap":18},[1666,73884,73885],{},[324,73886,73887],{},"Check the information about topic2.",[48,73889,72598],{},[8325,73891,73894],{"className":73892,"code":73893,"language":8330},[8328],"\nbin\u002Fpulsar-admin topics stats-internal topic2\n\n",[4926,73895,73893],{"__ignoreMap":18},[48,73897,72724],{},[48,73899,73900],{},"The data stored on topic2 is saved in the ledger with ledgerId 7.",[8325,73902,73905],{"className":73903,"code":73904,"language":8330},[8328],"\n{\n  \"entriesAddedCounter\" : 100,\n  \"numberOfEntries\" : 100,\n  \"totalSize\" : 5400,\n  \"currentLedgerEntries\" : 100,\n  \"currentLedgerSize\" : 5400,\n  \"lastLedgerCreatedTimestamp\" : \"2021-05-11T16:06:59.058+08:00\",\n  \"waitingCursorsCount\" : 0,\n  \"pendingAddEntriesCount\" : 0,\n  \"lastConfirmedEntry\" : \"7:99\",\n  \"state\" : \"LedgerOpened\",\n  \"ledgers\" : [ {\n    \"ledgerId\" : 7,\n    \"entries\" : 0,\n    \"size\" : 0,\n    \"offloaded\" : false\n  } ],\n  \"cursors\" : { },\n  \"compactedLedger\" : {\n    \"ledgerId\" : -1,\n    \"entries\" : -1,\n    \"size\" : -1,\n    \"offloaded\" : false\n  }\n}\n\n",[4926,73906,73904],{"__ignoreMap":18},[1666,73908,73909],{},[324,73910,73911],{},"Check where the ledger with ledgerId 7 is saved.",[48,73913,72598],{},[8325,73915,73918],{"className":73916,"code":73917,"language":8330},[8328],"\nbin\u002Fbookkeeper shell ledgermetadata -ledgerid 7\n\n",[4926,73919,73917],{"__ignoreMap":18},[48,73921,72724],{},[48,73923,73924],{},"The ledger with ledgerId 7 is saved on bookie1 (3181).",[8325,73926,73929],{"className":73927,"code":73928,"language":8330},[8328],"\n...\n16:11:28.843 [main] INFO  org.apache.bookkeeper.tools.cli.commands.client.LedgerMetaDataCommand - ledgerID: 7\n16:11:28.846 [main] INFO  org.apache.bookkeeper.tools.cli.commands.client.LedgerMetaDataCommand - LedgerMetadata{formatVersion=3, ensembleSize=1, writeQuorumSize=1, ackQuorumSize=1, state=OPEN, digestType=CRC32C, password=base64:, ensembles={0=[192.168.0.105:3181]}, customMetadata={component=base64:bWFuYWdlZC1sZWRnZXI=, pulsar\u002Fmanaged-ledger=base64:cHVibGljL2RlZmF1bHQvcGVyc2lzdGVudC90b3BpYzM=, application=base64:cHVsc2Fy}}\n...\n\n",[4926,73930,73928],{"__ignoreMap":18},[8300,73932,2125],{"id":2122},[48,73934,73935],{},"This is the second blog in the series on configuring isolation in Apache Pulsar. 
Now you should know how to:",[1666,73937,73938,73940,73942,73944],{},[324,73939,72534],{},[324,73941,72537],{},[324,73943,73072],{},[324,73945,73946],{},"Scale up and down nodes (brokers and bookies)",[48,73948,73949],{},"The next blog will discuss how to configure Pulsar isolation in a shared BookKeeper cluster. Coming soon!",[8300,73951,70497],{"id":70496},[48,73953,73954],{},"If you’re interested in Pulsar isolation policy, feel free to check the following resources out!",[321,73956,73957,73965],{},[324,73958,73959,73960],{},"For beginners: ",[55,73961,73964],{"href":73962,"rel":73963},"https:\u002F\u002Fpulsar.apache.org\u002Fdocs\u002Fen\u002Fnext\u002Fadministration-isolation\u002F",[264],"Pulsar Isolation Policy - User Guide",[324,73966,73967,73968],{},"For advanced users: ",[55,73969,72511],{"href":64302},{"title":18,"searchDepth":19,"depth":19,"links":73971},[73972,73973,73974,73975,73979],{"id":72581,"depth":19,"text":72582},{"id":72629,"depth":19,"text":72630},{"id":72769,"depth":19,"text":72770},{"id":61064,"depth":19,"text":61065,"children":73976},[73977,73978],{"id":73191,"depth":279,"text":73192},{"id":73398,"depth":279,"text":73399},{"id":73487,"depth":19,"text":73488,"children":73980},[73981,73982],{"id":73491,"depth":279,"text":73492},{"id":73737,"depth":279,"text":73738},"2021-06-02","This blog is for Pulsar users of all levels. If you follow the instructions in this blog, you will successfully configure Pulsar isolation in separate Pulsar clusters.","\u002Fimgs\u002Fblogs\u002F63c7fd116c20795b5ffa7b02_63a410500a3572fece7d83a2_isolation-dummies-top.jpeg",{},"\u002Fblog\u002Fpulsar-isolation-part-ii-separate-pulsar-clusters","28 min read",{"title":58886,"description":73984},"blog\u002Fpulsar-isolation-part-ii-separate-pulsar-clusters",[27847,38442],"H9t6cUSSiw65gW79a1CLqtkXoSxE43pKAGkrpUV4z8s",{"id":73994,"title":73995,"authors":73996,"body":73998,"category":821,"createdAt":290,"date":74180,"description":74181,"extension":8,"featured":294,"image":74182,"isDraft":294,"link":290,"meta":74183,"navigation":7,"order":296,"path":74184,"readingTime":7986,"relatedResources":290,"seo":74185,"stem":74186,"tags":74187,"__hash__":74188},"blogs\u002Fblog\u002Fbuilding-connectors-on-pulsar-made-simple.md","Building Connectors On Pulsar Made Simple",[73997,61300],"Guangning E",{"type":15,"value":73999,"toc":74171},[74000,74004,74012,74037,74040,74046,74050,74055,74066,74069,74075,74079,74088,74109,74113,74116,74122,74126,74129,74133,74136,74159,74161,74168],[40,74001,74003],{"id":74002},"why-apache-pulsar-connectors","Why Apache Pulsar Connectors?",[48,74005,74006,74011],{},[55,74007,74010],{"href":74008,"rel":74009},"https:\u002F\u002Fpulsar.apache.org\u002Fdocs\u002Fen\u002Fnext\u002Fio-overview\u002F",[264],"Pulsar connectors"," enable Pulsar to quickly and easily integrate with various external systems. In fact, according to the 2021 Pulsar User Survey Report (which will be published later this month), connectors are one of the most-used Pulsar features with 30% of Pulsar users using connectors.",[48,74013,74014,74015,74018,74019,1186,74022,1186,74027,1186,74031,74036],{},"To facilitate connector development and improve their ease of use, we launched ",[55,74016,38697],{"href":74017},"\u002Fen\u002Fblog\u002Ftech\u002F2020-05-26-intro-to-hub"," in 2020 to provide a single place to find, download, use, store, and share Pulsar-related extensions, and offer a broad spectrum of Pulsar integrations. 
Since its launch last year, dozens of connectors have been created and added to the Hub. Some popular Pulsar plugins on StreamNative Hub include ",[55,74020,74021],{"href":71583},"AWS SQS connector",[55,74023,74026],{"href":74024,"rel":74025},"https:\u002F\u002Fgithub.com\u002Fstreamnative\u002Fpulsar-io-aws-lambda",[264],"AWS Lambda connector",[55,74028,74030],{"href":74029},"\u002Fen\u002Fblog\u002Ftech\u002F2021-04-26-announcing-amqp10-connector-for-apache-pulsar","AMQP1_0 connector",[55,74032,74035],{"href":74033,"rel":74034},"https:\u002F\u002Fgithub.com\u002Fstreamnative\u002Fpulsar-io-iotdb",[264],"IoTDB connector",", and more.",[48,74038,74039],{},"In this blog, we introduce recent updates that make developing and using Pulsar connectors even easier.",[48,74041,74042],{},[384,74043],{"alt":74044,"src":74045},"pulsar connectors illustration","\u002Fimgs\u002Fblogs\u002F63a39f93c62d6e51ec0e9010_connector-simple-1.png",[40,74047,74049],{"id":74048},"about-streamnative-hub","About StreamNative Hub",[48,74051,74052,74054],{},[55,74053,38697],{"href":74017}," is an app store for developing event streaming applications and provides dozens of plugins and integrations. Its key components include:",[321,74056,74057,74060,74063],{},[324,74058,74059],{},"Connectors: Allow you to move streaming data in and out of Pulsar, which simplifies integration for enterprises bringing Pulsar into their existing infrastructure. All Pulsar built-in connectors are shipped in the StreamNative Hub.",[324,74061,74062],{},"Offloader: Allow you to offload the majority of the data from BookKeeper to external remote storage, which provides a cheaper form of storage that readily scales with the volume of data.",[324,74064,74065],{},"Protocol handler: Allow you to support other messaging protocols natively and dynamically in Pulsar brokers on runtime, which streamlines operations with Pulsar’s enterprise-grade features without modifying code. Kafka, AMQP, and MQTT are supported.",[48,74067,74068],{},"As more and more members have contributed and used connectors, we’ve identified some opportunities to improve the Hub’s ease of use, read on to learn more.",[48,74070,74071],{},[384,74072],{"alt":74073,"src":74074},"streamNative Hub","\u002Fimgs\u002Fblogs\u002F63a39f93995a5d52a6e770fc_connector-simple-2.png",[40,74076,74078],{"id":74077},"new-pulsar-connector-development-guide","New Pulsar Connector Development Guide",[48,74080,74081,74082,74087],{},"To simplify the integration between Pulsar and external systems, we created a new development guide, ",[55,74083,74086],{"href":74084,"rel":74085},"https:\u002F\u002Fgithub.com\u002Fstreamnative\u002Fpulsar-io-template\u002Fblob\u002Fmaster\u002FREADME.md",[264],"Pulsar Connector Development Guide",", that developers can reference to improve productivity and boost efficiency when developing a connector. This guide helps with the following:",[321,74089,74090,74093,74100,74103],{},[324,74091,74092],{},"Developing a New Connector",[324,74094,74095,74096,74099],{},"If you need to pipe data in or out of Pulsar and other systems that do not have a connector yet, you can read the ",[55,74097,74086],{"href":74084,"rel":74098},[264],". 
It contains step-by-step guidelines for how to develop and contribute a connector to StreamNative Hub, including detailed instructions and various templates for both code and documentation.",[324,74101,74102],{},"Promoting Awareness and Usage of an Existing Connector",[324,74104,74105,74106,190],{},"If you already developed a connector and want to make it available to the community, we recommend you host it in a public repository and show it on StreamNative Hub. You can host the connector repo at your desired location and then sync the documentation to StreamNative Hub using a simple script with just one line of code by following the instructions in the ",[55,74107,74086],{"href":74084,"rel":74108},[264],[40,74110,74112],{"id":74111},"future-streamnative-hub-upgrades","Future StreamNative Hub Upgrades",[48,74114,74115],{},"We are continuously looking for new ways to improve StreamNative Hub and we are working on additional upgrades, such as adding more comprehensive tests to improve the usability, reliability, and performance of connectors. You can also expect more connectors to be deployed and adopted on more cloud providers with GUI tools. Stay tuned!",[48,74117,74118],{},[384,74119],{"alt":74120,"src":74121},"teamwork illustration","\u002Fimgs\u002Fblogs\u002F63a39f93824b2b30db25b170_connector-simple-3.png",[40,74123,74125],{"id":74124},"contribute-your-connector","Contribute Your Connector",[48,74127,74128],{},"If you develop connectors, we encourage you to add your connector to StreamNative Hub! In StreamNative Hub, your connector will get exposure to the widest possible audience and enjoy faster innovation cycles of development. You will also be contributing to a robust Pulsar ecosystem.",[40,74130,74132],{"id":74131},"get-involved-in-the-pulsar-community","Get Involved in the Pulsar Community",[48,74134,74135],{},"In addition to adding a connector, there are more ways you can contribute, including:",[321,74137,74138,74141,74144,74147,74150,74153,74156],{},[324,74139,74140],{},"Improve documentation!",[324,74142,74143],{},"The documentation hosted at StreamNative Hub is open source. Feel free to submit or request changes (fix typos, add clarifications, and more).",[324,74145,74146],{},"Report bugs.",[324,74148,74149],{},"Review pull requests.",[324,74151,74152],{},"Provide feedback on proposed features, enhancements, or designs.",[324,74154,74155],{},"Suggest new features.",[324,74157,74158],{},"Answer questions in issues or channels.",[40,74160,25961],{"id":25960},[48,74162,74163,74164],{},"Start your journey with connectors now with the ",[55,74165,74167],{"href":74084,"rel":74166},[264],"Quick Start Guide!",[48,74169,74170],{},"Happy Connectoring!",{"title":18,"searchDepth":19,"depth":19,"links":74172},[74173,74174,74175,74176,74177,74178,74179],{"id":74002,"depth":19,"text":74003},{"id":74048,"depth":19,"text":74049},{"id":74077,"depth":19,"text":74078},{"id":74111,"depth":19,"text":74112},{"id":74124,"depth":19,"text":74125},{"id":74131,"depth":19,"text":74132},{"id":25960,"depth":19,"text":25961},"2021-06-01","New updates in StreamNative Hub make developing and using a Pulsar connector even easier! 
You can also expect more connectors to be deployed and adopted on more cloud providers.","\u002Fimgs\u002Fblogs\u002F63c7fd203c05b10690510753_63a39f9360e694137367c66c_connector-simple-top.jpeg",{},"\u002Fblog\u002Fbuilding-connectors-on-pulsar-made-simple",{"title":73995,"description":74181},"blog\u002Fbuilding-connectors-on-pulsar-made-simple",[302,28572],"e79e7NFZbKLGqLJLhbNQO04_TTrgpV2VXd3DKlIJaco",{"id":74190,"title":74191,"authors":74192,"body":74193,"category":821,"createdAt":290,"date":74766,"description":74767,"extension":8,"featured":294,"image":74768,"isDraft":294,"link":290,"meta":74769,"navigation":7,"order":296,"path":74770,"readingTime":42793,"relatedResources":290,"seo":74771,"stem":74772,"tags":74773,"__hash__":74774},"blogs\u002Fblog\u002Fwhats-new-in-apache-pulsar-2-7-2.md","What’s New in Apache Pulsar 2.7.2",[48575,61300],{"type":15,"value":74194,"toc":74753},[74195,74198,74200,74208,74215,74219,74222,74224,74311,74313,74340,74342,74358,74360,74370,74374,74387,74389,74392,74395,74471,74474,74538,74541,74613,74617,74644,74646,74709,74711,74738,74740],[48,74196,74197],{},"We are excited to see the Apache Pulsar community has successfully released the 2.7.2 version! More than 38 contributors provided improvements and bug fixes that contributed to 85 commits.",[48,74199,61308],{},[321,74201,74202,74205],{},[324,74203,74204],{},"Consumers are no longer blocked after receiving multiple retry messages in Docker.",[324,74206,74207],{},"Consumers can consume messages published in the topic stats when using the Key_Shared subscription type.",[48,74209,74210,74211,190],{},"This blog walks through the most noteworthy changes grouped by the key functionality. For the complete list, including all enhancements and bug fixes, check out the ",[55,74212,74214],{"href":59847,"rel":74213},[264],"Pulsar 2.7.2 Release Notes",[40,74216,74218],{"id":74217},"notable-bug-fix-and-enhancement","Notable bug fix and enhancement",[48,74220,74221],{},"Pulsar 2.7.2 has included the following changes for broker, bookie, proxy, Pulsar admin, Pulsar SQL, and clients.",[32,74223,61065],{"id":61064},[321,74225,74226,74234,74237,74240,74243,74246,74252,74255,74263,74266,74274,74277,74280,74288,74291,74294,74297,74300,74308],{},[324,74227,74228,74229],{},"Fix NPEs and thread safety issues in PersistentReplicator. ",[55,74230,74233],{"href":74231,"rel":74232},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F9763",[264],"PR-9763",[324,74235,74236],{},"Previously, in a non-persistent topic with a key-shared subscription, messages were marked as published in the topic stats, but consumers did not consume them. This caused NullPointerExceptions (NPEs).",[324,74238,74239],{},"Make cursor field volatile since the field is updated asynchronously in another thread.",[324,74241,74242],{},"Remove the unnecessary synchronization on the openCursorAsync method since it is not needed.",[324,74244,74245],{},"Add null checks before accessing the cursor field since statistics might be updated before the cursor is available.",[324,74247,74248,74249],{},"Fix the issue of a message not dispatched for the Key_Shared subscription type in a non-persistent topic. ",[55,74250,69032],{"href":69030,"rel":74251},[264],[324,74253,74254],{},"Previously, In a non-persistent topic with a key-shared subscription, messages were marked as published in the topic stats, but consumers did not consume them. 
This PR fixes this issue.",[324,74256,74257,74258],{},"Fix the issue of a consumer being blocked after receiving retry messages. ",[55,74259,74262],{"href":74260,"rel":74261},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F10078",[264],"PR-10078",[324,74264,74265],{},"Previously, in the Docker environment, if a consumer enabled the retry feature and set the retry topic in DeadLetterPolicy, the consumer was blocked after receiving multiple retry messages because the hasMessageAvailable check was set to false. This PR fixes this issue.",[324,74267,74268,74269],{},"Fix the issue of schema not added when subscribing to an empty topic without schema. ",[55,74270,74273],{"href":74271,"rel":74272},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F9853",[264],"PR-9853",[324,74275,74276],{},"Previously, when a consumer with a schema subscribed to an empty topic without schema, the previous check used isActive, which only checked whether the topic could be deleted. However, it should check if there was any connected producer or consumer of this topic. For the previous implementation, even if a topic had no active producers or consumers, the topic's subscription list was not empty and isActive returned true. Then the consumer's schema was not attached to the topic and it threw an IncompatibleSchemaException.",[324,74278,74279],{},"This PR changes to check if the topic has active producers or consumers instead of checking whether it can be deleted.",[324,74281,74282,74283],{},"Fix the issue of schema type check when using the ALWAYS_COMPATIBLE strategy. ",[55,74284,74287],{"href":74285,"rel":74286},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F10367",[264],"PR-10367",[324,74289,74290],{},"This PR provides the following enhancements when using the ALWAYS_COMPATIBLE strategy for schema type check:",[324,74292,74293],{},"For non-transitive strategy, it checks only schema type for the last schema.",[324,74295,74296],{},"For transitive strategy, it checks all schema types.",[324,74298,74299],{},"For getting schema by schema data, it considers different schema types.",[324,74301,74302,74303],{},"Fix the issue of CPU 100% usage when deleting namespace. ",[55,74304,74307],{"href":74305,"rel":74306},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F10337",[264],"PR-10337",[324,74309,74310],{},"Previously, When deleting a namespace, the namespace Policies setting was marked as deleted, triggering the topic's onPoliciesUpdate and a read of the data of ZooKeeper’s Policies node as checkReplicationAndRetryOnFailure. Because the namespace was deleted, the ZooKeeper node no longer existed and the failure to read data triggered infinite retries. This PR fixes this issue by adding a method to check for non-deleted policies.",[32,74312,73488],{"id":73487},[321,74314,74315,74323,74326,74334,74337],{},[324,74316,74317,74318],{},"Fallback to PULSAR_GC if BOOKIE_GC is not defined. ",[55,74319,74322],{"href":74320,"rel":74321},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F9621",[264],"PR-9621",[324,74324,74325],{},"This PR changes fallback from PULSAR_MEM to PULSAR_GC if BOOKIE_GC is not defined.",[324,74327,74328,74329],{},"Fallback to PULSAR_EXTRA_OPTS if BOOKIE_EXTRA_OPTS is not defined. 
",[55,74330,74333],{"href":74331,"rel":74332},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F10397",[264],"PR-10397",[324,74335,74336],{},"This PR defines that -Dio.netty.* does not pass the system properties if PULSAR_EXTRA_OPTS or BOOKIE_EXTRA_OPTS is set. This change ensures consistency with PULSAR_EXTRA_OPTS behavior and prevents duplicate properties.",[324,74338,74339],{},"This PR also adds -Dio.netty.leakDetectionLevel=disabled (unless BOOKIE_EXTRA_OPTS is set) since PULSAR_EXTRA_OPTS does not include that setting by default.",[32,74341,68241],{"id":68240},[321,74343,74344,74352,74355],{},[324,74345,74346,74347],{},"Fix authorization error while using proxy and Prefix subscription authentication mode. ",[55,74348,74351],{"href":74349,"rel":74350},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F10226",[264],"PR-10226",[324,74353,74354],{},"Previously, when using Pulsar proxy and Prefix subscription authentication mode, org.apache.pulsar.broker.authorization.PulsarAuthorizationProvider#canConsumeAsync threw an exception, which caused the consumer error.",[324,74356,74357],{},"This PR updates the org.apache.pulsar.broker.authorization.PulsarAuthorizationProvider#allowTopicOperationAsync logic, checks isSuperUser first, and then returns isAuthorizedFuture.",[32,74359,69159],{"id":38169},[321,74361,74362],{},[324,74363,74364,74365],{},"Add get version command for Pulsar REST API, pulsar-admin, and pulsar-client. ",[55,74366,74369],{"href":74367,"rel":74368},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F9975",[264],"PR-9975",[32,74371,74373],{"id":74372},"pulsar-sql","Pulsar SQL",[321,74375,74376,74384],{},[324,74377,74378,74379],{},"Fix the issue of BKNoSuchLedgerExistsException. ",[55,74380,74383],{"href":74381,"rel":74382},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F9910",[264],"PR-9910",[324,74385,74386],{},"Previously, when using Pulsar SQL to query messages, BKNoSuchLedgerExistsException was thrown if the ZooKeeper ledger root directory was changed. This PR fixes this issue.",[32,74388,60409],{"id":68276},[48,74390,74391],{},"Pulsar 2.7.2 includes the following changes for Java, Python, C++, and WebSocket clients.",[3933,74393,11285],{"id":74394},"java",[321,74396,74397,74405,74408,74416,74419,74427,74430,74433,74441,74444,74452,74460,74463],{},[324,74398,74399,74400],{},"Fix the issue that ClientConfigurationData's objects are not equal. ",[55,74401,74404],{"href":74402,"rel":74403},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F10091",[264],"PR-10091",[324,74406,74407],{},"This PR fixes this issue and reuses AuthenticationDisabled.INSTANCE as default instead of creating a new one.",[324,74409,74410,74411],{},"Fix the issue of AutoConsumeSchema KeyValue encoding. ",[55,74412,74415],{"href":74413,"rel":74414},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F10089",[264],"PR-10089",[324,74417,74418],{},"This PR keeps the KeyValueEncodingType when auto-consuming a KeyValue schema.",[324,74420,74421,74422],{},"Fix the error of OutOfMemoryError while using KeyValue\u003CGenericRecord, GenericRecord>. ",[55,74423,74426],{"href":74424,"rel":74425},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F9981",[264],"PR-9981",[324,74428,74429],{},"Previously, a topic with schema KeyValue\u003CGenericRecord, GenericRecord> could not be consumed due to a problem inHttpLookupService. 
The HttpLookupService downloaded the schema in JSON format but the KeyValue schema was expected to be encoded in binary form.",[324,74431,74432],{},"This PR uses the existing utility functions to convert the JSON representation of the KeyValue schema to the desired format.",[324,74434,74435,74436],{},"Fix the concurrency issue in the client's producer epoch handling. ",[55,74437,74440],{"href":74438,"rel":74439},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F10436",[264],"PR-10436",[324,74442,74443],{},"This PR uses a volatile field for epoch and AtomicLongFieldUpdater for incrementing the value.",[324,74445,74446,74447],{},"Handle NPE while receiving ack for a closed producer. ",[55,74448,74451],{"href":74449,"rel":74450},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F8979",[264],"PR-8979",[324,74453,74454,74455],{},"Fix the issue of batch size not set when deserializing from a byte array. ",[55,74456,74459],{"href":74457,"rel":74458},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F9855",[264],"PR-9855",[324,74461,74462],{},"Previously, batch index message acknowledgment was added to the seek method to support more precise seek using ACK sets. However, when the seek was performed by a message that was serialized and deserialized, the batchSize was set to zero, which led to a discrepancy between messageId forms and seek results. This PR fixes this issue.",[324,74464,74465,74466],{},"Fix the issue of a single-topic consumer being unable to close. ",[55,74467,74470],{"href":74468,"rel":74469},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F9849",[264],"PR-9849",[3933,74472,11288],{"id":74473},"python",[321,74475,74476,74484,74487,74490,74493,74496,74499,74507,74510,74527,74535],{},[324,74477,74478,74479],{},"Support setting the default value when using Python Avro Schema. ",[55,74480,74483],{"href":74481,"rel":74482},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F10265",[264],"PR-10265",[324,74485,74486],{},"Previously, the default value for the Python Avro schema could not be set, causing the Python schema to not be updated.",[324,74488,74489],{},"This PR fixes this issue and adds the following changes:",[324,74491,74492],{},"Add the required field to control the type of schema that can set null.",[324,74494,74495],{},"Add the required_default field to control the schema whether it has a default attribute or not.",[324,74497,74498],{},"Add the default field to control the default value of the schema.",[324,74500,74501,74502],{},"Fix the issue of nested Map or Array in schema does not work. ",[55,74503,74506],{"href":74504,"rel":74505},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F9548",[264],"PR-9548",[324,74508,74509],{},"Previously, the Python client did not handle nested Map or Array well, and the generated schema string was invalid. When the Map\u002FArray's schema() method set the values field of the schema string, it ignored the Record type but not Map and Array.",[324,74511,74512,74513],{},"This PR fixes the issue and adds 4 tests for Map",[74514,74515,74516,74517],"map",{},", Map",[74518,74519,74520,74521],"array",{},", Array",[74518,74522,74523,74524],{},", and Array",[74514,74525,74526],{}," to cover all nested cases that involve Map or Array.",[324,74528,74529,74530],{},"Add TLS SNI support for Python and C++ clients. 
",[55,74531,74534],{"href":74532,"rel":74533},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F8957",[264],"PR-8957",[324,74536,74537],{},"This PR adds TLS SNI support for CPP and Python clients, so you can connect to brokers through the proxy.",[3933,74539,43705],{"id":74540},"c",[321,74542,74543,74551,74554,74562,74565,74568,74571,74574,74582,74585,74588,74596,74599,74607,74610],{},[324,74544,74545,74546],{},"Fix the issue that the C++ client cannot be built on Windows. ",[55,74547,74550],{"href":74548,"rel":74549},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F10363",[264],"PR-10363",[324,74552,74553],{},"This PR puts PULSAR_PUBLIC before the variable type and keeps the LIB_NAME as the shared library's name (for example, removing the dll suffix).",[324,74555,74556,74557],{},"Fix the issue of the paused zero queue consumer pre-fetches messages. ",[55,74558,74561],{"href":74559,"rel":74560},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F10036",[264],"PR-10036",[324,74563,74564],{},"Previously, zero queue consumers (the consumer's receiver queue size is 0) pre-fetched messages after pauseMessageListener was called. This was because ConsumerImpl::increaseAvailablePermits did not check the boolean variable messageListenerRunning_, which became false after pauseMessageListener was called. Therefore, after the zero queue consumer was paused, it still sent the FLOW command to pre-fetch a message to its internal unbounded queue incomingMessages_.",[324,74566,74567],{},"This PR fixes this issue and make the following changes:",[324,74569,74570],{},"Add the check for messageListenerRunning_ in increaseAvailablePermits method and make the implementation consistent with Java client's ConsumerImpl#increaseAvailablePermits. Change the type of availablePermits_ to std::atomic_int.",[324,74572,74573],{},"Add the increaseAvailablePermits invocation in resumeMessageListener to send FLOW command after consumer resumes since pauseMessageListener does not prefetch messages anymore.",[324,74575,74576,74577],{},"Fix the issue of segmentation fault when getting a topic name from the received message ID. ",[55,74578,74581],{"href":74579,"rel":74580},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F10006",[264],"PR-10006",[324,74583,74584],{},"Previously, the C++ client supported getting a topic name from both the received message and its message ID. However, for a consumer that subscribed to a non-partitioned topic, getting a topic name from the received message ID caused a segmentation fault.",[324,74586,74587],{},"This PR uses setTopicName for every single message when a consumer receives a batch and adds related tests for all types of consumers (including ConsumerImpl, MultiTopicsConsumerImpl, and PartitionedConsumerImpl).",[324,74589,74590,74591],{},"Fix the issue of the SinglePartitionMessageRouter always picking the same partition. ",[55,74592,74595],{"href":74593,"rel":74594},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F9702",[264],"PR-9702",[324,74597,74598],{},"Previously, the SinglePartitionMessageRouter was supposed to pick a random partition for a given producer and stick with that. The problem was that the C rand() call always used the seed 0 and that ended up having multiple processes to always deterministically pick the same partition. This PR fixes this issue.",[324,74600,74601,74602],{},"Reduce log level for an ack-grouping tracker. 
",[55,74603,74606],{"href":74604,"rel":74605},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F10094",[264],"PR-10094",[324,74608,74609],{},"Previously, the warning log occurred when the ACK grouping tracker tried to send ACKs while the connection was closed.",[324,74611,74612],{},"This PR changes the log level to debug when the connection is not ready for AckGroupingTrackerEnabled::flush.",[3933,74614,74616],{"id":74615},"websocket","WebSocket",[321,74618,74619,74627,74630,74638,74641],{},[324,74620,74621,74622],{},"Optimize URL token param value. ",[55,74623,74626],{"href":74624,"rel":74625},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F10187",[264],"PR-10187",[324,74628,74629],{},"This PR removes the Bearer prefix requirement for the token param value of the WebSocket URL.",[324,74631,74632,74633],{},"Make the browser client support the token authentication. ",[55,74634,74637],{"href":74635,"rel":74636},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F9886",[264],"PR-9886",[324,74639,74640],{},"Previously, the WebSocket client used the HTTP request header to transport the authentication params, but the browser JavaScript WebSocket client could not add new headers.",[324,74642,74643],{},"This PR uses the query param token to transport the authentication token for the browser JavaScript WebSocket client.",[32,74645,69250],{"id":69249},[321,74647,74648,74656,74659,74662,74670,74678,74681,74689,74692,74695,74703,74706],{},[324,74649,74650,74651],{},"Allow customizable function logging. ",[55,74652,74655],{"href":74653,"rel":74654},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F10389",[264],"PR-10389",[324,74657,74658],{},"Previously, the function log configuration was in the jar package and could not be dynamically customized.",[324,74660,74661],{},"This PR changes the function log configuration file to the configuration directory, which can be customized.",[324,74663,74664,74665],{},"Pass through record properties from Pulsar sources. ",[55,74666,74669],{"href":74667,"rel":74668},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F9943",[264],"PR-9943",[324,74671,74672,74673],{},"Fix the issue of the time unit in Pulsar Go functions. ",[55,74674,74677],{"href":74675,"rel":74676},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F10160",[264],"PR-10160",[324,74679,74680],{},"This PR changes the time unit of avg process latency from ns to ms.",[324,74682,74683,74684],{},"Fix the issue that the Kinesis sink did not try to resend messages. ",[55,74685,74688],{"href":74686,"rel":74687},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F10420",[264],"PR-10420",[324,74690,74691],{},"Previously, when the Kinesis sink connector failed to send a message, it did not retry. In this case, if retainOrdering was enabled, it would lead to subsequent messages not being sent.",[324,74693,74694],{},"This PR adds retry logic for the Kinesis sink connector. A message is retried to send if it fails to send.",[324,74696,74697,74698],{},"Fix the issue of null error messages in the onFailure exception in the Kinesis sink. 
",[55,74699,74702],{"href":74700,"rel":74701},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F10416",[264],"PR-10416",[324,74704,74705],{},"Previously, if the Kinesis producer failed to send a message, the error message in the onFailure exception was null.",[324,74707,74708],{},"This PR extracts the UserRecordFailedException to show the real error messages.",[32,74710,36160],{"id":31572},[321,74712,74713,74721,74724,74732,74735],{},[324,74714,74715,74716],{},"Prevent class loader leak and restore offloader directory override. ",[55,74717,74720],{"href":74718,"rel":74719},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F9878",[264],"PR-9878",[324,74722,74723],{},"Previously, there was a class loader leak. This PR updates the PulsarService and the PulsarConnectorCache classes to use a map from directory strings to offloaders.",[324,74725,74726,74727],{},"Add logs for cleanup of offloaded data operation. ",[55,74728,74731],{"href":74729,"rel":74730},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F9852",[264],"PR-9852",[324,74733,74734],{},"Previously, the cleanup offloaded data operation lacked logs making it hard for users to analyze the reason for the tiered storage data loss.",[324,74736,74737],{},"This PR adds some logs for the cleanup of offloaded data operation.",[40,74739,39647],{"id":39646},[48,74741,57767,74742,74746,74747,74750,74751,69324],{},[55,74743,74745],{"href":53730,"rel":74744},[264],"download Pulsar"," directly or you can spin up a Pulsar cluster on StreamNative Cloud with a free 30-day trial of ",[55,74748,3550],{"href":61568,"rel":74749},[264]," in which Pulsar 2.7.2 changes are shipped! Moreover, we offer technical consulting and expert training to help get your organization started. As always, we are highly responsive to your feedback. Feel free to ",[55,74752,24379],{"href":57778},{"title":18,"searchDepth":19,"depth":19,"links":74754},[74755,74765],{"id":74217,"depth":19,"text":74218,"children":74756},[74757,74758,74759,74760,74761,74762,74763,74764],{"id":61064,"depth":279,"text":61065},{"id":73487,"depth":279,"text":73488},{"id":68240,"depth":279,"text":68241},{"id":38169,"depth":279,"text":69159},{"id":74372,"depth":279,"text":74373},{"id":68276,"depth":279,"text":60409},{"id":69249,"depth":279,"text":69250},{"id":31572,"depth":279,"text":36160},{"id":39646,"depth":19,"text":39647},"2021-05-24"," Apache Pulsar community has successfully released the 2.7.2 version! More than 38 contributors provided improvements and bug fixes that contributed to 85 commits. 
Let's walk through the most noteworthy changes!","\u002Fimgs\u002Fblogs\u002F63c7fd30c446581dad15c90d_63a39eff3df09a45f8433f5b_272-top.jpeg",{},"\u002Fblog\u002Fwhats-new-in-apache-pulsar-2-7-2",{"title":74191,"description":74767},"blog\u002Fwhats-new-in-apache-pulsar-2-7-2",[302,821],"4OfMUd6Cbb33x41bM712_l_ctQ4Wzewzv0Ur9hJP6zk",{"id":74776,"title":51857,"authors":74777,"body":74778,"category":821,"createdAt":290,"date":75254,"description":75255,"extension":8,"featured":294,"image":75256,"isDraft":294,"link":290,"meta":75257,"navigation":7,"order":296,"path":75258,"readingTime":5505,"relatedResources":290,"seo":75259,"stem":75260,"tags":75261,"__hash__":75262},"blogs\u002Fblog\u002Ffunction-mesh-simplify-complex-streaming-jobs-in-cloud.md",[810,6500],{"type":15,"value":74779,"toc":75230},[74780,74793,74797,74800,74803,74816,74819,74825,74829,74832,74838,74844,74847,74861,74864,74867,74871,74874,74878,74881,74883,74886,74888,74891,74899,74905,74909,74912,74916,74919,74922,74925,74931,74935,74938,74941,74947,74951,74959,74962,74968,74971,74974,74978,74981,74989,74992,74998,75002,75008,75015,75018,75021,75027,75030,75036,75040,75043,75046,75052,75055,75059,75068,75074,75082,75086,75094,75098,75101,75127,75131,75134,75162,75166,75207,75209,75228],[48,74781,74782,74783,4003,74787,74792],{},"Today, we are excited to introduce Function Mesh, a serverless framework purpose-built for event streaming applications. It brings powerful event-streaming capabilities to your applications by orchestrating multiple ",[55,74784,15627],{"href":74785,"rel":74786},"http:\u002F\u002Fpulsar.apache.org\u002Fdocs\u002Fen\u002Fnext\u002Ffunctions-overview\u002F",[264],[55,74788,74791],{"href":74789,"rel":74790},"http:\u002F\u002Fpulsar.apache.org\u002Fdocs\u002Fen\u002Fnext\u002Fio-overview\u002F",[264],"Pulsar IO connector"," for complex event streaming jobs on Kubernetes.",[40,74794,74796],{"id":74795},"what-is-function-mesh","What is Function Mesh",[48,74798,74799],{},"Function Mesh is a Kubernetes operator that enables users to run Pulsar Functions and connectors natively on Kubernetes, unlocking the full power of Kubernetes’ application deployment, scaling, and management. For example, Function Mesh relies on Kubernetes’ scheduling functionality, which ensures that functions are resilient to failures and can be scheduled properly at any time.",[48,74801,74802],{},"Function Mesh is also a serverless framework to orchestrate multiple Pulsar Functions and I\u002FO connectors for complex streaming jobs in a simple way. If you’re seeking cloud-native serverless streaming solutions, Function Mesh is an ideal tool for you. 
The key benefits of Function Mesh include:",[321,74804,74805,74808,74810,74813],{},[324,74806,74807],{},"Eases the management of Pulsar Functions and connectors when you run multiple instances of Functions and connectors together.",[324,74809,70721],{},[324,74811,74812],{},"Makes Pulsar Functions and connectors run natively in the cloud environment, which leads to greater possibilities when more resources become available in the cloud.",[324,74814,74815],{},"Enables Pulsar Functions to work with different messaging systems and to integrate with existing tools in the cloud environment (Function Mesh runs Pulsar Functions and connectors independently from Pulsar).",[48,74817,74818],{},"Function Mesh is well-suited for common, lightweight streaming use cases, such as ETL jobs, and is not intended to be used as a full-power streaming engine.",[48,74820,74821],{},[384,74822],{"alt":74823,"src":74824},"illustration of full-power streaming engine","\u002Fimgs\u002Fblogs\u002F63a39dabad154c9b7f920827_function-mesh.png",[40,74826,74828],{"id":74827},"why-function-mesh","Why Function Mesh",[48,74830,74831],{},"Pulsar has provided Pulsar Functions and Pulsar I\u002FO since its 2.0 release.",[48,74833,74834,74837],{},[55,74835,15627],{"href":74785,"rel":74836},[264]," is a turnkey serverless event streaming framework built natively for Apache Pulsar. Pulsar Functions enables users to create event processing logic on a per-message basis and brings simplicity and serverless concepts to event streaming, thus eliminating the need to deploy a separate system. Popular use cases of Pulsar Functions include ETL jobs, real-time aggregation, microservices, reactive services, event routing, and more.",[48,74839,74840,74843],{},[55,74841,74791],{"href":74789,"rel":74842},[264]," is a framework that allows you to ingress or egress data from and to Pulsar using the existing Pulsar Functions framework. Pulsar IO consists of source and sink connectors. A source is an event processor that ingests data from an external system into Pulsar, and a sink is an event processor that egresses data from Pulsar to an external system.",[48,74845,74846],{},"Both Pulsar Functions and Pulsar I\u002FO have made building event streaming applications simpler. Pulsar Functions supports running functions and connectors on Kubernetes. However, the existing implementation has a few drawbacks:",[1666,74848,74849,74852,74855,74858],{},[324,74850,74851],{},"The function metadata is stored in Pulsar and the function running state is managed by Kubernetes. This results in inconsistency between metadata and running state, which makes management complicated and problematic. For example, the StatefulSet running Pulsar Functions can be deleted from Kubernetes while Pulsar isn’t aware of it.",[324,74853,74854],{},"The existing implementation uses Pulsar topics for storing function metadata. 
It can cause broker crash loops if the function metadata topics are temporarily not available.",[324,74856,74857],{},"Functions are tied to a specific Pulsar cluster, making it difficult to use functions across multiple Pulsar clusters.",[324,74859,74860],{},"The existing implementation makes it hard for users deploying Pulsar Functions on Kubernetes to implement certain features, such as auto-scaling.",[48,74862,74863],{},"Additionally, with the increased adoption of Pulsar Functions and Pulsar I\u002FO connectors for building serverless event streaming applications, people are looking to orchestrate multiple functions into a single streaming job to achieve complex event streaming capabilities. Without Function Mesh, there is a lot of manual work to organize and manage multiple functions to process events.",[48,74865,74866],{},"To solve these pain points and make Pulsar Functions Kubernetes-native, we developed Function Mesh -- a serverless framework purpose-built for running Pulsar Functions and connectors natively on Kubernetes, and for simplifying building complex event streaming jobs.",[40,74868,74870],{"id":74869},"core-concepts","Core Concepts",[48,74872,74873],{},"Function Mesh enables you to build event streaming applications leveraging your familiarity with Apache Pulsar and modern stream processing technologies. Three concepts are foundational to building an event streaming application: streams, functions, and connectors.",[32,74875,74877],{"id":74876},"stream","Stream",[48,74879,74880],{},"A stream is a partitioned, immutable, append-only sequence of events that represents a series of historical facts. For example, the events of a stream could model a sequence of financial transactions, like “Jack sent $100 to Alice”, followed by “Alice sent $50 to Bob”. A stream is used for connecting functions and connectors. The streams in Function Mesh are implemented using topics in Apache Pulsar.",[32,74882,61160],{"id":61159},[48,74884,74885],{},"A function is a lightweight event processor that consumes messages from one or more input streams, applies user-supplied processing logic to one or multiple messages, and produces the results of the processing logic to another stream. The functions in Function Mesh are implemented based on Pulsar Functions.",[32,74887,60903],{"id":5023},[48,74889,74890],{},"A connector is an event processor that ingresses or egresses events from and to streams. There are two types of connectors in Function Mesh:",[321,74892,74893,74896],{},[324,74894,74895],{},"Source Connector (aka Source): an event processor that ingests events from an external data system into a stream.",[324,74897,74898],{},"Sink Connector (aka Sink): an event processor that egresses events from streams to an external data system.",[48,74900,74901,74902,190],{},"The connectors in Function Mesh are implemented based on Pulsar IO connectors. The available Pulsar IO connectors can be found at ",[55,74903,38697],{"href":35258,"rel":74904},[264],[32,74906,74908],{"id":74907},"functionmesh","FunctionMesh",[48,74910,74911],{},"A FunctionMesh (aka Mesh) is a collection (either a Directed Acyclic Graph (DAG) or a cyclic graph) of functions and connectors connected by streams that are orchestrated together to achieve powerful stream processing logic. All the functions and connectors in a Mesh share the same lifecycle. They are started when a Mesh is created and terminated when the mesh is destroyed. All the functions and connectors are long-running processes. 
They are auto-scaled based on the workload by Function Mesh.",[40,74913,74915],{"id":74914},"how-function-mesh-works","How Function Mesh works",[48,74917,74918],{},"Function Mesh APIs build on existing Kubernetes APIs, so that Function Mesh resources are compatible with other Kubernetes-native resources, and can be managed by cluster administrators using existing Kubernetes tools. The foundational concepts are delivered as Kubernetes Custom Resource Definitions (CRDs), which can be configured by a cluster administrator for developing event streaming applications.",[48,74920,74921],{},"Instead of using the Pulsar admin CLI tool to send function admin requests to Pulsar clusters, you now can use kubectl to submit a Function Mesh CRD manifest directly to Kubernetes clusters. The Function Mesh controller watches the CRD and creates Kubernetes resources to run the defined Function\u002FSource\u002FSink, or Mesh. The benefit of this approach is both the function metadata and function running state are directly stored and managed by Kubernetes to avoid the inconsistency problem that was seen in Pulsar’s existing approach.",[48,74923,74924],{},"The following diagram illustrates a typical user flow of Function Mesh.",[48,74926,74927],{},[384,74928],{"alt":74929,"src":74930},"illustration of flow of Function Mesh","\u002Fimgs\u002Fblogs\u002F63a39dab33b90c5280ce7296_function-mesh-kubectl-workflow.png",[32,74932,74934],{"id":74933},"function-mesh-internals","Function Mesh Internals",[48,74936,74937],{},"Function Mesh mainly consists of two components. One is a Kubernetes operator that watches Function Mesh CRDs and creates Kubernetes resources (i.e. StatefulSet) to run functions, connectors, and meshes on Kubernetes; while the other one is a Function Runner that invokes the functions and connectors logic when receiving events from input streams and produces the results to output streams. The Runner is currently implemented using Pulsar Functions runner.",[48,74939,74940],{},"The below diagram illustrates the overall architecture of Function Mesh. When a user creates a Function Mesh CRD, the controller receives the submitted CRD from Kubernetes API server. The controller processes the CRD and generates the corresponding Kubernetes resources. For example, when the controller processes the Function CRD, it creates a StatefulSet to run the function. Each pod of this function StatefulSet launches a Runner to invoke the function logic.",[48,74942,74943],{},[384,74944],{"alt":74945,"src":74946},"illustration of Function Mesh Internals","\u002Fimgs\u002Fblogs\u002F63a39dabb7edfc693c4420f1_function-mesh-internals.png",[40,74948,74950],{"id":74949},"how-to-use-function-mesh","How to use Function Mesh",[48,74952,74953,74954,190],{},"To use Function Mesh, you need to install Function Mesh operator and CRD into the Kubernetes cluster first. 
For more details about installation, refer to ",[55,74955,74958],{"href":74956,"rel":74957},"https:\u002F\u002Ffunctionmesh.io\u002Fdocs\u002Finstall-function-mesh\u002F",[264],"installation guide",[48,74960,74961],{},"After installing the Function Mesh operator and deploying a Pulsar cluster, you need to package your functions\u002Fconnectors, define CRDs for functions, connectors and Function Mesh, and then submit the CRDs to the Kubernetes cluster with the following command.",[8325,74963,74966],{"className":74964,"code":74965,"language":8330},[8328],"\n$ kubectl apply -f \u002Fpath\u002Fto\u002Fcustom-crd.yaml \n\n",[4926,74967,74965],{"__ignoreMap":18},[48,74969,74970],{},"Once your Kubernetes cluster receives the CRD, the Function Mesh operator will schedule individual parts and run the functions as a stateful set with other necessary resource objects.",[48,74972,74973],{},"Below we illustrate how to run Functions, Connectors and Meshes respectively with some examples.",[32,74975,74977],{"id":74976},"how-to-run-functions-using-function-mesh","How to run functions using Function Mesh",[48,74979,74980],{},"Function Mesh does not change how you develop Pulsar Functions to run in the cloud. The submission process just switches from a pulsar-admin client tool to a yaml file. Behind the scenes, we developed the CRD resources for Pulsar Function and the controller to handle it properly.",[48,74982,74983,74984,190],{},"After developing and testing your function, you need to package it and then submit it to a Pulsar cluster or build it as a Docker image and upload it to the image registry. For details, refer to ",[55,74985,74988],{"href":74986,"rel":74987},"https:\u002F\u002Ffunctionmesh.io\u002Fdocs\u002Ffunctions\u002Frun-function\u002Frun-java-function",[264],"run Pulsar Functions using Function Mesh",[48,74990,74991],{},"This following example for Function CRD launches an ExclamationFunction inside Kubernetes and enables auto-scaling, and it uses a Java runtime to talk to the Pulsar messaging system.",[8325,74993,74996],{"className":74994,"code":74995,"language":8330},[8328],"\napiVersion: compute.functionmesh.io\u002Fv1alpha1\nkind: Function\nmetadata:\n  name: function-sample\n  namespace: default\nspec:\n  className: org.apache.pulsar.functions.api.examples.ExclamationFunction\n  replicas: 1\n  maxReplicas: 5\n  image: streamnative\u002Ffunction-mesh-example:latest\n  logTopic: persistent:\u002F\u002Fpublic\u002Fdefault\u002Flogging-function-logs\n  input:\n    topics:\n    - persistent:\u002F\u002Fpublic\u002Fdefault\u002Fsource-topic\n    typeClassName: java.lang.String\n  output:\n    topic: persistent:\u002F\u002Fpublic\u002Fdefault\u002Fsink-topic\n    typeClassName: java.lang.String\n  resources:\n    requests:\n      cpu: \"0.1\"\n      memory: 1G\n    limits:\n      cpu: \"0.2\"\n      memory: 1.1G\n  pulsar:\n    pulsarConfig: \"test-pulsar\"\n  java:\n    jar:  \"\u002Fpulsar\u002Fexamples\u002Fapi-examples.jar\"\n\n",[4926,74997,74995],{"__ignoreMap":18},[32,74999,75001],{"id":75000},"how-to-run-connectors-using-function-mesh","How to run connectors using Function Mesh",[48,75003,75004,75005,190],{},"Source and sink are specialized functions. If you use Pulsar built-in or StreamNative-managed connectors, you can create them by specifying the Docker image in the source or sink CRDs. These Docker images are public at the Docker Hub, with the image name in a format of streamnative\u002Fpulsar-io-CONNECTOR-NAME:TAG, such as streamnative\u002Fpulsar-io-hbase:2.7.1. 
You can check all supported connectors in the ",[55,75006,38697],{"href":35258,"rel":75007},[264],[48,75009,75010,75011,190],{},"If you use self-built connectors, you can package them to an external package or to a docker image, upload the package and then submit the connectors through CDRs. For details, refer to ",[55,75012,75014],{"href":45126,"rel":75013},[264],"run Pulsar connectors using Function Mesh",[48,75016,75017],{},"In the following CRD YAML files for source and sink, the connectors receive the input from DebeziumMongoDB and send the output to ElasticSearch.",[48,75019,75020],{},"Define the CRD yaml file for source:",[8325,75022,75025],{"className":75023,"code":75024,"language":8330},[8328],"\napiVersion: compute.functionmesh.io\u002Fv1alpha1\nkind: Source\nmetadata:\n  name: source-sample\nspec:\n  image: streamnative\u002Fpulsar-io-debezium-mongodb:2.7.1\n  className: org.apache.pulsar.io.debezium.mongodb.DebeziumMongoDbSource\n  replicas: 1\n  output:\n    topic: persistent:\u002F\u002Fpublic\u002Fdefault\u002Fdestination\n    typeClassName: org.apache.pulsar.common.schema.KeyValue\n  sourceConfig:\n    mongodb.hosts: rs0\u002Fmongo-dbz-0.mongo.default.svc.cluster.local:27017,rs0\u002Fmongo-dbz-1.mongo.default.svc.cluster.local:27017,rs0\u002Fmongo-dbz-2.mongo.default.svc.cluster.local:27017\n    mongodb.name: dbserver1\n    mongodb.user: debezium\n    mongodb.password: dbz\n    mongodb.task.id: \"1\"\n    database.whitelist: inventory\n    pulsar.service.url: pulsar:\u002F\u002Ftest-pulsar-broker.default.svc.cluster.local:6650\n  pulsar:\n    pulsarConfig: \"test-source\"\n  java:\n    jar: connectors\u002Fpulsar-io-debezium-mongodb-2.7.1.nar\n    jarLocation: \"\" # use pulsar provided connectors\n\n",[4926,75026,75024],{"__ignoreMap":18},[48,75028,75029],{},"Define the CRD yaml file for sink:",[8325,75031,75034],{"className":75032,"code":75033,"language":8330},[8328],"\napiVersion: compute.functionmesh.io\u002Fv1alpha1\nkind: Sink\nmetadata:\n  name: sink-sample\nspec:\n  image: streamnative\u002Fpulsar-io-elastic-search:2.7.1\n  className: org.apache.pulsar.io.elasticsearch.ElasticSearchSink\n  replicas: 1\n  input:\n    topics:\n    - persistent:\u002F\u002Fpublic\u002Fdefault\u002Finput\n    typeClassName: \"[B\"\n  sinkConfig:\n    elasticSearchUrl: \"http:\u002F\u002Fquickstart-es-http.default.svc.cluster.local:9200\"\n    indexName: \"my_index\"\n    typeName: \"doc\"\n    username: \"elastic\"\n    password: \"X2Mq33FMWMnqlhvw598Z8562\"\n  pulsar:\n    pulsarConfig: \"test-sink\"\n  java:\n    jar: connectors\u002Fpulsar-io-elastic-search-2.7.1.nar\n    jarLocation: \"\" # use pulsar provided connectors\n\n",[4926,75035,75033],{"__ignoreMap":18},[32,75037,75039],{"id":75038},"how-to-run-function-mesh-on-kubernetes","How to Run Function Mesh on Kubernetes",[48,75041,75042],{},"A FunctionMesh orchestrates functions, sources and sinks together and manages them as a whole. The FunctionMesh CRD has a list of fields for functions, sources and sinks and you can connect them together through the topics field. Once the YAML file is submitted, the FunctionMesh controller will reconcile it into multiple function\u002Fsource\u002Fsink resources and delegate each of them to corresponding controllers. The function\u002Fsource\u002Fsink controllers reconcile each task and launch corresponding sub-components. 
The FunctionMesh controller collects the status of each component from the system and aggregates them in its own status field.",[48,75044,75045],{},"The following FunctionMesh job example launches two functions and streams the input through the two functions to append exclamation marks.",[8325,75047,75050],{"className":75048,"code":75049,"language":8330},[8328],"\napiVersion: compute.functionmesh.io\u002Fv1alpha1\nkind: FunctionMesh\nmetadata:\n  name: mesh-sample\nspec:\n  functions:\n    - name: ex1\n      className: org.apache.pulsar.functions.api.examples.ExclamationFunction\n      replicas: 1\n      maxReplicas: 5\n      input:\n        topics:\n          - persistent:\u002F\u002Fpublic\u002Fdefault\u002Fsource-topic\n        typeClassName: java.lang.String\n      output:\n        topic: persistent:\u002F\u002Fpublic\u002Fdefault\u002Fmid-topic\n        typeClassName: java.lang.String\n      pulsar:\n        pulsarConfig: \"mesh-test-pulsar\"\n      java:\n        jar: pulsar-functions-api-examples.jar\n        jarLocation: public\u002Fdefault\u002Ftest\n   - name: ex2\n      className: org.apache.pulsar.functions.api.examples.ExclamationFunction\n      replicas: 1\n      maxReplicas: 3\n      input:\n        topics:\n          - persistent:\u002F\u002Fpublic\u002Fdefault\u002Fmid-topic\n        typeClassName: java.lang.String\n      output:\n        topic: persistent:\u002F\u002Fpublic\u002Fdefault\u002Fsink-topic\n        typeClassName: java.lang.String\n      pulsar:\n        pulsarConfig: \"mesh-test-pulsar\"\n      java:\n        jar: pulsar-functions-api-examples.jar\n        jarLocation: public\u002Fdefault\u002Ftest\n\n",[4926,75051,75049],{"__ignoreMap":18},[48,75053,75054],{},"The output topic and input topic of the two functions are the same, so that one can publish the result into this topic and the other can fetch the data from that topic.",[32,75056,75058],{"id":75057},"work-with-pulsar-admin-cli-tool","Work with pulsar-admin CLI tool",[48,75060,75061,75062,75067],{},"If you want to use Function Mesh and do not want to change the way you create and submit functions, you can use Function Mesh worker service. It is similar to Pulsar Functions worker service but uses Function Mesh to schedule and run functions. Function Mesh worker service enables you to use the ",[55,75063,75066],{"href":75064,"rel":75065},"https:\u002F\u002Fpulsar.apache.org\u002Fdocs\u002F2.10.x\u002Fpulsar-admin\u002F",[264],"pulsar-admin CLI"," tool to manage Pulsar Functions and connectors in Function Mesh. The following figure illustrates how Function Mesh worker service works with Pulsar proxy, converts and forwards requests to the Kubernetes cluster.",[48,75069,75070],{},[384,75071],{"alt":75072,"src":75073},"illustration pulsar-admin CLI tool","\u002Fimgs\u002Fblogs\u002F63a39e9794af5376b0802e49_function-mesh-workflow.png",[48,75075,75076,75077,190],{},"For details about the usage, you can refer to ",[55,75078,75081],{"href":75079,"rel":75080},"https:\u002F\u002Ffunctionmesh.io\u002Fdocs\u002Finstall-function-mesh\u002F#work-with-pulsar-admin-cli-tool",[264],"work with pulsar-admin CLI tool",[32,75083,75085],{"id":75084},"migrate-pulsar-functions-to-function-mesh","Migrate Pulsar Functions to Function Mesh",[48,75087,75088,75089,190],{},"If you run Pulsar Functions using the existing Kubernetes runtime and want to migrate them to Function Mesh, Function Mesh provides you a tool to generate a list of CRDs of your existing functions. 
You can then apply these CRDs to ask Function Mesh to take over the ownership of managing the running Pulsar Functions on Kubernetes. For details, refer to ",[55,75090,75093],{"href":75091,"rel":75092},"https:\u002F\u002Ffunctionmesh.io\u002Fdocs\u002Fmigration\u002Fmigrate-function",[264],"migration Pulsar Functions guide",[32,75095,75097],{"id":75096},"supported-features","Supported Features",[48,75099,75100],{},"Currently, Function Mesh supports the following features:",[321,75102,75103,75106,75109,75112,75115,75118,75121,75124],{},[324,75104,75105],{},"Running Pulsar Functions and connectors natively in Kubernetes.",[324,75107,75108],{},"Orchestrating multiple Pulsar Functions and connectors as a streaming job.",[324,75110,75111],{},"Compatibility with original Pulsar Admin API for submitting Functions and connectors.",[324,75113,75114],{},"Auto-scaling instances for functions and connectors using Horizontal Pod Autoscaler.",[324,75116,75117],{},"Authentication and authorization.",[324,75119,75120],{},"Multiple runtimes with Java, Python, and Golang support.",[324,75122,75123],{},"Schema and SerDe.",[324,75125,75126],{},"Resource limitation.",[40,75128,75130],{"id":75129},"future-plans","Future Plans",[48,75132,75133],{},"We plan to enable the following features in the upcoming releases, if you have any ideas or would like to contribute to it, feel free to contact us.",[321,75135,75136,75139,75142,75145,75148,75151,75154],{},[324,75137,75138],{},"Improve the capability level of the Function Mesh operator.",[324,75140,75141],{},"Feature parity with Pulsar Functions, such as stateful function.",[324,75143,75144],{},"Support additional runtime based on self-contained function runtime, such as web-assembly.",[324,75146,75147],{},"Develop better tools\u002Ffrontend to manage and inspect Function Meshes.",[324,75149,75150],{},"Group individual functions together to improve latency and reduce cost.",[324,75152,75153],{},"Support advanced auto-scaling based on Pulsar metrics.",[324,75155,75156,75157,190],{},"Integrate function registry with ",[55,75158,75161],{"href":75159,"rel":75160},"http:\u002F\u002Fpulsar.apache.org\u002Fdocs\u002Fen\u002Fnext\u002Fadmin-api-packages\u002F",[264],"Apache Pulsar Packages",[40,75163,75165],{"id":75164},"try-function-mesh-now","Try Function Mesh Now",[321,75167,75168,75176,75183,75195],{},[324,75169,75170,75171,75175],{},"Function Mesh is now ",[55,75172,75174],{"href":34283,"rel":75173},[264],"open source",". Try it on your Kubernetes clusters.",[324,75177,75178,75179,75182],{},"Function Mesh is also built in StreamNative Cloud. ",[55,75180,75181],{"href":11302},"Read this blog"," for how you can quickly cover various messaging and streaming use cases, such as ETL pipelines, event-driven applications, and simple data analytics applications on StreamNative Cloud.",[324,75184,75185,75186,4003,75190,190],{},"To learn more about Function Mesh, ",[55,75187,75189],{"href":42567,"rel":75188},[264],"read the docs",[55,75191,75194],{"href":75192,"rel":75193},"https:\u002F\u002Fyoutu.be\u002Fu_9YDM44fMw",[264],"watch a live demo",[324,75196,75197,75198,75201,75202,75206],{},"If you have any feedback or suggestions for this project, feel free to ",[55,75199,24379],{"href":75200},"mailto:function-mesh@streamnative.io"," or open issues in the ",[55,75203,75205],{"href":34283,"rel":75204},[264],"GitHub repo",". 
Any feedback is highly appreciated.",[40,75208,36477],{"id":36476},[321,75210,75211,75215,75223],{},[324,75212,45216,75213,47757],{},[55,75214,38404],{"href":45219},[324,75216,47760,75217,1154,75220,45209],{},[55,75218,47764],{"href":45463,"rel":75219},[264],[55,75221,47768],{"href":45206,"rel":75222},[264],[324,75224,45223,75225,45227],{},[55,75226,31914],{"href":31912,"rel":75227},[264],[48,75229,3931],{},{"title":18,"searchDepth":19,"depth":19,"links":75231},[75232,75233,75234,75240,75243,75251,75252,75253],{"id":74795,"depth":19,"text":74796},{"id":74827,"depth":19,"text":74828},{"id":74869,"depth":19,"text":74870,"children":75235},[75236,75237,75238,75239],{"id":74876,"depth":279,"text":74877},{"id":61159,"depth":279,"text":61160},{"id":5023,"depth":279,"text":60903},{"id":74907,"depth":279,"text":74908},{"id":74914,"depth":19,"text":74915,"children":75241},[75242],{"id":74933,"depth":279,"text":74934},{"id":74949,"depth":19,"text":74950,"children":75244},[75245,75246,75247,75248,75249,75250],{"id":74976,"depth":279,"text":74977},{"id":75000,"depth":279,"text":75001},{"id":75038,"depth":279,"text":75039},{"id":75057,"depth":279,"text":75058},{"id":75084,"depth":279,"text":75085},{"id":75096,"depth":279,"text":75097},{"id":75129,"depth":19,"text":75130},{"id":75164,"depth":19,"text":75165},{"id":36476,"depth":19,"text":36477},"2021-05-03","Run Pulsar Functions (Pulsar’s serverless computing framework) and connectors with Function Mesh. Leverage Kubernetes’ application deployment, scaling, and management. Bring event streaming capabilities to your applications.","\u002Fimgs\u002Fblogs\u002F63c7fd3febac459bdc2f7ff9_63a39dab7bdd430d7c1c1d75_mesh-top.jpeg",{},"\u002Fblog\u002Ffunction-mesh-simplify-complex-streaming-jobs-in-cloud",{"title":51857,"description":75255},"blog\u002Ffunction-mesh-simplify-complex-streaming-jobs-in-cloud",[9636,821,28572,4839,16985],"9cNMas2sMnMYwUbo03T66g7hb3R5IvonK4nHnk4RVOQ",{"id":75264,"title":53856,"authors":75265,"body":75266,"category":821,"createdAt":290,"date":75466,"description":75467,"extension":8,"featured":294,"image":75468,"isDraft":294,"link":290,"meta":75469,"navigation":7,"order":296,"path":75470,"readingTime":20144,"relatedResources":290,"seo":75471,"stem":75472,"tags":75473,"__hash__":75474},"blogs\u002Fblog\u002Fannouncing-amqp-1-0-connector-for-apache-pulsar.md",[58855,61300],{"type":15,"value":75267,"toc":75460},[75268,75288,75292,75308,75316,75322,75330,75336,75340,75350,75353,75359,75362,75366,75369,75389,75391,75394,75441,75446,75457],[48,75269,75270,75271,75276,75277,75281,75282,75287],{},"Today StreamNative announces the release of the ",[55,75272,75275],{"href":75273,"rel":75274},"https:\u002F\u002Fgithub.com\u002Fstreamnative\u002Fpulsar-io-amqp-1-0\u002Ftree\u002Fmaster",[264],"AMQP 1.0 connector"," for ",[55,75278,821],{"href":75279,"rel":75280},"https:\u002F\u002Fpulsar.apache.org\u002Fen\u002F",[264],". This connector enables seamless integration between the Pulsar ecosystem and ",[55,75283,75286],{"href":75284,"rel":75285},"https:\u002F\u002Fwww.amqp.org\u002F",[264],"AMQP",". 
If you are an organization looking to deploy reactive data pipelines, try it out!",[40,75289,75291],{"id":75290},"what-is-the-amqp-10-connector","What is the AMQP 1.0 connector?",[48,75293,3600,75294,75298,75299,75303,75304,75307],{},[55,75295,75297],{"href":75273,"rel":75296},[264],"AMQP 1.0 connector (AMQP1_0)"," enables your application to publish and consume data using the ",[55,75300,75302],{"href":75284,"rel":75301},[264],"AMQP 1.0","-compliant broker and move data bi-directionally between Pulsar and an AMQP service. The ",[55,75305,20384],{"href":74008,"rel":75306},[264]," framework allows you to read data from Pulsar or write data to Pulsar using source and sink.",[321,75309,75310,75313],{},[324,75311,75312],{},"AMQP 1.0 source",[324,75314,75315],{},"This source feeds data from AMQP 1.0 and persists data to Pulsar topics.",[48,75317,75318],{},[384,75319],{"alt":75320,"src":75321},"pulsar and AMQP source illustration","\u002Fimgs\u002Fblogs\u002F63a39d0afb93966955b28fb3_amqp10-source.png",[321,75323,75324,75327],{},[324,75325,75326],{},"AMQP 1.0 sink",[324,75328,75329],{},"This sink feeds data from Pulsar topics and persists data to AMQP.",[48,75331,75332],{},[384,75333],{"alt":75334,"src":75335},"pulsar and AMQP sink illustration","\u002Fimgs\u002Fblogs\u002F63a39d0a2be9e61bf5538a35_amqp10-sink.png",[40,75337,75339],{"id":75338},"why-did-streamnative-develop-the-amqp-10-connector","Why did StreamNative develop the AMQP 1.0 connector?",[48,75341,75342,4003,75346,75349],{},[55,75343,75345],{"href":75279,"rel":75344},[264],"Pulsar",[55,75347,75286],{"href":75284,"rel":75348},[264]," are at the heart of modern cloud architectures.",[48,75351,75352],{},"As one of the leading open-source distributed messaging systems, Pulsar unifies streaming and queuing capabilities and provides a broad set of features and functionalities all in one system.",[48,75354,75355,75358],{},[55,75356,75302],{"href":75284,"rel":75357},[264]," is one of the most efficient and reliable messaging protocols, allowing you to construct cross-platform, message-based applications with a vendor-agnostic and implementation-neutral protocol.",[48,75360,75361],{},"Both Pulsar and AMQP have grown rapidly in recent years. Pulsar has received global adoption from top tech companies such as Yahoo! JAPAN, Verizon Media, Splunk, Iterable, and Tencent, just to name a few. AMQP is widely adopted by leading organizations such as Google, Microsoft, IBM, Red Hat, and more. Many users want to leverage the benefits of both Pulsar and AMQP, and we have received requests from a number of our customers looking for an integration between Pulsar and AMQP.",[40,75363,75365],{"id":75364},"why-use-the-amqp-10-connector","Why use the AMQP 1.0 connector?",[48,75367,75368],{},"Built to deploy integrations between Pulsar and AMQP quickly, securely, and reliably, the AMQP 1.0 connector brings various advantages, including but not limited to:",[321,75370,75371,75374,75377,75380,75383,75386],{},[324,75372,75373],{},"Simplicity",[324,75375,75376],{},"This connector simplifies integration for organizations that want to bring Pulsar into their existing infrastructure. 
It empowers organizations to move data in and out of Pulsar without writing a single line of code.",[324,75378,75379],{},"Scalability",[324,75381,75382],{},"This connector is able to run jobs on a single node (standalone) or deliver reliability at scale for an entire organization (distributed), which allows you to build reactive data pipelines to serve your business and operational needs in real-time.",[324,75384,75385],{},"Sustainability",[324,75387,75388],{},"Taking advantage of the full power of this connector enables you to spend less time worrying about the data layer and have more time to maximize business value from living data in an efficient manner.",[40,75390,39647],{"id":39646},[48,75392,75393],{},"The AMQP 1.0 connector is a major step in the journey of integrating other message systems into the Pulsar ecosystem. To get involved with the AMQP 1.0 connector, check out the following featured resources:",[321,75395,75396,75399,75402,75405,75430,75433],{},[324,75397,75398],{},"Try the AMQP 1.0 connector",[324,75400,75401],{},"To get started, download the connector and head over to the user guides that will walk you through the setup process.",[324,75403,75404],{},"Ask a question",[324,75406,75407,75408,48888,75412,75415,75416,75419,75420,75425,75426,39692],{},"Have questions? As always, feel free to create issues on ",[55,75409,39680],{"href":75410,"rel":75411},"https:\u002F\u002Fgithub.com\u002Fstreamnative\u002Fpulsar-io-amqp-1-0\u002Fissues\u002Fnew\u002Fchoose",[264],[55,75413,39686],{"href":39684,"rel":75414},[264],", message us on ",[55,75417,39691],{"href":33664,"rel":75418},[264],", or join the ",[55,75421,75424],{"href":75422,"rel":75423},"https:\u002F\u002Fapache-pulsar.slack.com\u002Farchives\u002FC01U6F97ZCM",[264],"#connector-AMQP1_0 channel"," on ",[55,75427,57762],{"href":75428,"rel":75429},"https:\u002F\u002Fapache-pulsar.slack.com\u002Farchives\u002FC01NG4T1BU6",[264],[324,75431,75432],{},"Make a contribution",[324,75434,75435,75436,75440],{},"The AMQP 1.0 connector is a community-driven service, which hosts its source code on the StreamNative GitHub repo. We would love you to explore this new connector and contribute to its evolution. Have feature requests or bug reports? Do not hesitate to ",[55,75437,75439],{"href":75410,"rel":75438},[264],"share your ideas"," and welcome to submit your pull request.",[48,75442,75443],{},[384,75444],{"alt":18,"src":75445},"\u002Fimgs\u002Fblogs\u002F63a39d0ac8d13407eac6ccba_amqp10-cooperation.png",[48,75447,75448,75449,75451,75452,75456],{},"The AMQP 1.0 connector is available on the ",[55,75450,38697],{"href":74017},", a centralized hub for the Pulsar ecosystem. Whether you are a longtime Pulsar user or new to Pulsar, ",[55,75453,75455],{"href":35258,"rel":75454},[264],"StreamNative Hub and its rich integrations"," are a great way to take your organization to the next level. At StreamNative, we’re committed to the Pulsar community and will continue to invest in the Pulsar ecosystem. Today’s announcement is just another step towards realizing our vision of delivering enhanced integrations to enable enterprises to easily access their data. 
Stay tuned for more announcements from StreamNative!",[48,75458,75459],{},"Happy Connecting!",{"title":18,"searchDepth":19,"depth":19,"links":75461},[75462,75463,75464,75465],{"id":75290,"depth":19,"text":75291},{"id":75338,"depth":19,"text":75339},{"id":75364,"depth":19,"text":75365},{"id":39646,"depth":19,"text":39647},"2021-04-26","Today StreamNative announces the release of the AMQP 1.0 connector for Apache Pulsar! This connector enables your application to publish and consume data using the AMQP 1.0-compliant broker and move data bi-directionally between Pulsar and AMQP service.","\u002Fimgs\u002Fblogs\u002F63c7fd51403da9893fa39edf_63a39d0aad8b8059e72ded8c_amqp10-top.jpeg",{},"\u002Fblog\u002Fannouncing-amqp-1-0-connector-for-apache-pulsar",{"title":53856,"description":75467},"blog\u002Fannouncing-amqp-1-0-connector-for-apache-pulsar",[28572,302],"Yp6javsM2REK13qhVUhbZeB3zEIYG_h54ytyCOKJwz0",{"id":75476,"title":60561,"authors":75477,"body":75478,"category":3550,"createdAt":290,"date":75777,"description":75778,"extension":8,"featured":294,"image":75779,"isDraft":294,"link":290,"meta":75780,"navigation":7,"order":296,"path":75781,"readingTime":4475,"relatedResources":290,"seo":75782,"stem":75783,"tags":75784,"__hash__":75785},"blogs\u002Fblog\u002Fflink-sql-on-streamnative-cloud.md",[806,810],{"type":15,"value":75479,"toc":75761},[75480,75483,75487,75490,75493,75496,75500,75503,75509,75518,75521,75527,75530,75536,75540,75543,75547,75550,75553,75559,75563,75566,75569,75575,75579,75582,75585,75588,75591,75597,75601,75610,75618,75624,75635,75638,75642,75645,75651,75655,75658,75664,75667,75670,75674,75680,75683,75686,75689,75692,75695,75701,75704,75708,75711,75715,75728,75731,75754],[48,75481,75482],{},"We are excited to announce the launch of Flink SQL on StreamNative Cloud. Flink SQL on StreamNative Cloud (aka “Flink SQL”) provides an intuitive and interactive SQL interface that reduces the complexity of building real-time data queries on Apache Pulsar. StreamNative is Cloud Partners with Ververica, the original developers of and the company behind Apache Flink. This partnership has enabled a close collaboration and integration and has helped us to create a powerful, turnkey platform for real time data insights.",[40,75484,75486],{"id":75485},"why-apache-flink-and-flink-sql","Why Apache Flink and Flink SQL?",[48,75488,75489],{},"Apache Flink is a distributed, stream data processing engine that provides high throughput, low latency data processing, powerful abstractions and operational flexibility. With Apache Flink, users can easily develop and deploy event-driven applications, data analytics jobs, and data pipelines to handle real-time and historical data in complex distributed systems. Because of its powerful functionality and mature community, Apache Flink is widely adopted globally by some of the largest and most successful data-driven enterprises, including Alibaba, Netflix, and Uber.",[48,75491,75492],{},"Flink SQL provides relational abstractions of events stored in Apache Pulsar. It supports SQL standards for unified stream and batch processing. With Flink SQL, users can write SQL queries and access key insights from their real-time data, without having to write a line of Java or Python.",[48,75494,75495],{},"With a powerful execution engine and simple abstraction layer, Apache Flink and Flink SQL provide a distributed, real-time data processing solution with low development and maintenance costs. 
With Pulsar and Flink, StreamNative offers both stream storage and stream compute for a complete streaming solution.",[40,75497,75499],{"id":75498},"flink-pulsar-a-cloud-native-streaming-platform-for-infinite-data-streams","Flink + Pulsar: A cloud-native streaming platform for infinite data streams",[48,75501,75502],{},"The need for real-time data insights has never been more critical. But data insights aren’t limited to real-time data. Companies also need to integrate and understand large amounts of historical data in order to gain a complete picture of their business. This requires the ability to capture, store and compute both real-time and historical data.",[48,75504,75505],{},[384,75506],{"alt":75507,"src":75508},"apache flink","\u002Fimgs\u002Fblogs\u002F63d798a0d2a567bd34c4006f_63a322ab79c781416e968279_1.webp",[48,75510,75511,75512,75517],{},"Pulsar’s ",[55,75513,75516],{"href":75514,"rel":75515},"http:\u002F\u002Fpulsar.apache.org\u002Fdocs\u002Fen\u002Fconcepts-tiered-storage\u002F",[264],"tiered storage model"," provides the storage capabilities required for both batch and stream processing, enabling StreamNative Cloud to offer unified storage. Integrating Apache Flink and Flink SQL enables us to offer unified batch and stream processing, and Flink SQL simplifies the execution.",[48,75519,75520],{},"In a streaming-first world, the core abstraction of data is the infinite stream. The tables are derived from the stream and updated continuously as new data arrives in the stream. Apache Pulsar is the storage for infinite streams and Apache Flink is the engine that creates the materialized views in the form of streaming tables. You can then run streaming queries to perform continuous transformations, or run batch queries against streaming tables to get the latest value for every key in the stream in real time.",[48,75522,75523],{},[384,75524],{"alt":75525,"src":75526},"illustration of processing beetwin streams and app","\u002Fimgs\u002Fblogs\u002F63a322acb4dbadf611af5fb0_2.png",[48,75528,75529],{},"Integrating Apache Flink with Apache Pulsar enables companies to represent and process streaming data in new ways. The Pulsar infinite stream is the core storage abstraction for streaming data and everything else is a materialized view over the infinite stream, including databases, search indexes, or other data serving systems in the company. All the data enrichment and ETL needed to create these derived views can now be created in a streaming fashion using Apache Flink. Monitoring, security, anomaly and threat detection, analytics, and response to failures can be done in real-time by combining historical context with real-time data analytics.",[48,75531,75532],{},[384,75533],{"alt":75534,"src":75535},"StreamNative Cloud as a complete streaming solution","\u002Fimgs\u002Fblogs\u002F63a322abf700740e23056984_3.png",[40,75537,75539],{"id":75538},"when-to-use-flink-sql","When to use Flink SQL",[48,75541,75542],{},"With Flink SQL on StreamNative Cloud, Pulsar clusters are treated as Flink catalogs. Users can query infinite streams of events in Apache Pulsar using Flink SQL. Below are some top use cases for utilizing the streaming SQL queries over Pulsar streams:",[32,75544,75546],{"id":75545},"_1-real-time-monitoring","1. Real-time monitoring",[48,75548,75549],{},"We often think of monitoring as tracking low-level performance statistics using counters and gauges. While these metrics can tell you that your CPU usage is high, they can’t tell you if your application is doing what it’s supposed to. 
Flink SQL allows you to define custom metrics from streams of messages that applications generate, whether they are logging events, captured change data, or any other kind. For example, a cloud service might need to check that every time a new user signs up, a welcome email is sent, a new user record is created, and their credit card is billed. These functions might be spread over multiple different services or applications, and you want to monitor that each thing happened for each new customer within a certain SLA.",[48,75551,75552],{},"Below is a streaming SQL query to monitor error counts over a stream of error codes.",[8325,75554,75557],{"className":75555,"code":75556,"language":8330},[8328],"\nINSERT INTO error_counts\nSELECT error_code, count(*) FROM monitoring_stream\nGROUP BY TUMBLE(ts, INTERVAL '1' MINUTE), error_code\nHAVING type = ‘ERROR’;\n\n",[4926,75558,75556],{"__ignoreMap":18},[32,75560,75562],{"id":75561},"_2-real-time-anomaly-detection","2. Real-time anomaly detection",[48,75564,75565],{},"Security use cases often look a lot like monitoring and analytics. Rather than monitoring application behavior or business behavior, application developers are looking for patterns of fraud, abuse, spam, intrusion, or other bad behavior. Flink SQL provides a simple and real-time way of defining these patterns and querying real-time Pulsar streams.",[48,75567,75568],{},"Below is a streaming SQL query to detect frauds over a stream of transactions.",[8325,75570,75573],{"className":75571,"code":75572,"language":8330},[8328],"\nINSERT INTO possible_fraud\nSELECT card_number, count(*)\nFROM transactions\nGROUP BY TUMBLE(ts, INTERVAL '1' MINUTE), card_number\nHAVING count(*) > 3;\n\n",[4926,75574,75572],{"__ignoreMap":18},[32,75576,75578],{"id":75577},"_3-real-time-data-pipelines","3. Real-time data pipelines",[48,75580,75581],{},"Companies build real-time data pipelines for data enrichment. These data pipelines capture data changes coming out of several databases, transform them, join them together, and store them in a key-value database, search index, cache, or other data serving systems.",[48,75583,75584],{},"For a long time, ETL pipelines were built as periodic batch jobs. For example, they ingest the raw data in realtime, and then transform it every few hours to enable efficient queries. For many real-time use cases, such as transaction or payment processing, this delay is unacceptable. 
Flink SQL together with Pulsar I\u002FO connectors enables real-time data integration between different systems.",[48,75586,75587],{},"Now you can enrich streams of events with metadata stored in a different table using joins, or perform simple filtering of Personally Identifiable Information (PII) data before loading the stream into another system.",[48,75589,75590],{},"The streaming SQL query below shows an example enriching a click stream using a users table.",[8325,75592,75595],{"className":75593,"code":75594,"language":8330},[8328],"\nINSERT INTO vip_users\nSELECT user_id, page, action\nFROM clickstream c\nLEFT JOIN users u ON c.user_id = u.user_id\nWHERE u.level = ‘Platinum’;\n\n",[4926,75596,75594],{"__ignoreMap":18},[40,75598,75600],{"id":75599},"pulsar-abstractions-in-flink-sql","Pulsar Abstractions in Flink SQL",[48,75602,75603,75604,75609],{},"The integration of Flink SQL and Apache Pulsar utilizes Flink's ",[55,75605,75608],{"href":75606,"rel":75607},"https:\u002F\u002Fci.apache.org\u002Fprojects\u002Fflink\u002Fflink-docs-release-1.12\u002Fdev\u002Ftable\u002Fcatalogs.html",[264],"catalog API"," to reference existing Pulsar metadata and automatically map them to Flink’s corresponding metadata. There are a few core abstractions in this integration that map to the core abstractions in Pulsar and allow you to manipulate Pulsar topics using SQL.",[321,75611,75612,75615],{},[324,75613,75614],{},"Catalog: A catalog is a collection of databases. It is mapped to an existing Pulsar cluster.",[324,75616,75617],{},"Database: A database is a collection of tables. It is mapped to a namespace in Apache Pulsar. All the namespaces within a Pulsar cluster will automatically be converted to Flink databases in a Pulsar catalog. Databases can also be created or deleted via Data Definition Language (DDL) queries, where the underlying Pulsar namespaces will be created or deleted.",[8325,75619,75622],{"className":75620,"code":75621,"language":8330},[8328],"\nCREATE DATABASE userdb;\n\n",[4926,75623,75621],{"__ignoreMap":18},[321,75625,75626,75629,75632],{},[324,75627,75628],{},"Table: A Pulsar topic can be presented as a STREAMING table or an UPSERT table.",[324,75630,75631],{},"Schema: The schema of a Pulsar topic will be automatically mapped as Flink table schema if the topic already exists with a schema. If a Pulsar topic doesn’t exist, creating a table via DDL queries will convert the Flink table schema to a Pulsar schema for creating a Pulsar topic.",[324,75633,75634],{},"Metadata Columns: The message metadata and properties of a Pulsar message will be mapped into the metadata columns of a Flink table. These metadata columns are: - messageId: the message ID of a Pulsar message. (read-only) - sequenceId: the sequence ID of a Pulsar message. (read-only) - publishTime: the publish timestamp of a Pulsar message. (read-only) - eventTime: the event timestamp of a Pulsar message. (readable\u002Fwritable) - properties: the message properties of a Pulsar message. (readable\u002Fwritable)",[48,75636,75637],{},"A Pulsar topic can be presented as a STREAMING table or an UPSERT table in Flink.",[32,75639,75641],{"id":75640},"streaming-table","STREAMING table",[48,75643,75644],{},"A streaming table represents an unbounded sequence of structured data (“facts”). For example, we could have a stream of financial transactions such as “Jack sent $100 to Kate, then Alice sent $200 to Kate”. 
Facts in a table are immutable, which means new events can be inserted into a table, but existing events can never be updated or deleted. All the topics within a Pulsar namespace will automatically be mapped to streaming tables in a catalog configured to use a pulsar connector. Streaming tables can also be created or deleted via DDL queries, where the underlying Pulsar topics will be created or deleted.",[8325,75646,75649],{"className":75647,"code":75648,"language":8330},[8328],"\nCREATE TABLE pageviews (\n  user_id BIGINT,\n  page_id BIGINT,\n  viewtime TIMESTAMP,\n  user_region STRING,\n  WATERMARK FOR viewtime AS viewtime - INTERVAL '2' SECOND\n);\n\n",[4926,75650,75648],{"__ignoreMap":18},[32,75652,75654],{"id":75653},"upsert-table","UPSERT table",[48,75656,75657],{},"An upsert table represents a collection of evolving facts. For example, we could have a table that contains the latest financial information such as “Kate’s current account balance is $300”. It is the equivalent of a traditional database table but enriched by streaming semantics such as windowing. Facts in a UPSERT table are mutable, which means new facts can be inserted into the table, and existing facts can be updated or deleted. Upsert tables can be created by specifying connector to be upsert-pulsar.",[8325,75659,75662],{"className":75660,"code":75661,"language":8330},[8328],"\nCREATE TABLE pageviews_per_region (\n  user_region STRING,\n  pv BIGINT,\n  uv BIGINT,\n  PRIMARY KEY (user_region) NOT ENFORCED\n) with (\n  “connector” = “upsert-pulsar”\n};\n\n",[4926,75663,75661],{"__ignoreMap":18},[48,75665,75666],{},"By integrating the concepts of streaming tables and upsert tables, FlinkSQL allows joining upsert tables that represent the current state of the world with streaming tables that represent events that are happening right now. A topic in Pulsar can be represented as either a streaming table or an upsert table in Flink SQL, depending on the intended semantics of the processing on the topic.",[48,75668,75669],{},"For instance, if you want to read the data in a topic as a series of independent values, you would treat a Pulsar topic as a streaming table. An example of such a streaming table is a topic that captures page view events where each page view event is unrelated and independent of another. On the other hand, if you want to read the data in a topic as an evolving collection of updatable values, you would treat the topic as an upsert topic. An example of a topic that should be read as an UPSERT table in Flink is one that captures user metadata where each event represents the latest metadata for a particular user id including its user name, address or preferences.",[40,75671,75673],{"id":75672},"a-dive-into-flink-sql-on-streamnative-cloud","A dive into Flink SQL on StreamNative Cloud",[48,75675,75676],{},[384,75677],{"alt":75678,"src":75679},"streamNative Cloud Architecture","\u002Fimgs\u002Fblogs\u002F63a3257e23d41137fa70002d_4.png",[48,75681,75682],{},"StreamNative Cloud operates out of a control plane and cloud pools.",[48,75684,75685],{},"The control plane includes the backend services that StreamNative manages in its own cloud account. The backend services mainly include a Cloud API service and a Cloud console. Users can interact with StreamNative Cloud via the Cloud console, and applications can interact with it via the Cloud API service.",[48,75687,75688],{},"The cloud pools can be managed by StreamNative in its own cloud account or in the customers’ cloud accounts. 
Pulsar clusters are run inside the cloud pools. The SQL queries are also run on the cloud pools.",[48,75690,75691],{},"The diagram below demonstrates how the authentication\u002Fauthorization is implemented in our system. Here it assumes that data has already been ingested into the Pulsar clusters on StreamNative Cloud, but you can ingest data from external data sources, such as events data, streaming data, IoT data, and more, using Pulsar’s pub\u002Fsub messaging API.",[48,75693,75694],{},"Users or applications can interact with the StreamNative control plane to create a Pulsar cluster. Once the Pulsar cluster is ready, users can either create a Flink session cluster and use the SQL editor in StreamNative’s Cloud console to initiate interactive queries, or create long-running deployments to continuously process data streams in the Pulsar cluster.",[48,75696,75697],{},[384,75698],{"alt":75699,"src":75700},"how Flink SQL interacts with Pulsar clusters","\u002Fimgs\u002Fblogs\u002F63a3257f5ae00eadfac1ddb7_5.png",[48,75702,75703],{},"For each Flink session cluster, there is a SQL Gateway process which parses SQL queries and executes queries locally or submits queries to the Flink cluster. Each SQL session in the SQL Gateway will initiate Pulsar catalogs, with each catalog representing one existing Pulsar cluster. The catalog contains all the necessary information needed to securely access the Pulsar cluster. For DDL queries, they are directly executed in the SQL gateway, while all the DML queries will be submitted to the Flink session cluster to execute. All the SQL queries are impersonated as the actual user who submits them for security purposes.",[40,75705,75707],{"id":75706},"whats-next-for-flink-pulsar-integration-on-streamnative-cloud","What’s Next for Flink + Pulsar integration on StreamNative Cloud?",[48,75709,75710],{},"We are releasing Flink SQL on StreamNative Cloud as a developer preview feature to gather feedback. We plan to add several more capabilities such as running Flink SQL as continuous deployments, providing the ability to run arbitrary Flink jobs, and more, as we work with both the Pulsar and Flink communities to build a robust, unified batch and streaming solution.",[40,75712,75714],{"id":75713},"how-do-i-access-flink-sql-on-streamnative-cloud","How Do I Access Flink SQL on StreamNative Cloud?",[48,75716,75717,75718,75723,75724,190],{},"You can get started by watching the ",[55,75719,75722],{"href":75720,"rel":75721},"https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=0BxXjEqoJlU",[264],"quick start tutorial"," for Flink SQL on StreamNative Cloud. We’d love to hear about any ideas you have for improvement and to work closely with early adopters. Note, the Flink SQL offering is only available on paid clusters for now. We will give free cloud credits to our early adopters. 
If you are interested in trying out, please email us at ",[55,75725,75727],{"href":75726},"mailto:info@streamnative.io","info@streamnative.io",[48,75729,75730],{},"To learn more about Flink SQL, you can:",[321,75732,75733,75741,75748],{},[324,75734,75735,75736,190],{},"Watch the ",[55,75737,75740],{"href":75738,"rel":75739},"https:\u002F\u002Fyoutu.be\u002F9ojajM7Zt0M?t=2105",[264],"intro video",[324,75742,75743,75744,190],{},"Read about Flink SQL ",[55,75745,267],{"href":75746,"rel":75747},"https:\u002F\u002Fdocs.streamnative.io\u002Fcloud\u002Fstable\u002Fcompute\u002Fflink-sql",[264],[324,75749,75750,75751,190],{},"Get started with Flink SQL in ",[55,75752,3550],{"href":55797,"rel":75753},[264],[48,75755,75756,75757,190],{},"Finally, if you’re interested in messaging and event streaming, and want to help build Pulsar and Flink, ",[55,75758,75760],{"href":75759},"\u002Fen\u002Fcareers","we are hiring",{"title":18,"searchDepth":19,"depth":19,"links":75762},[75763,75764,75765,75770,75774,75775,75776],{"id":75485,"depth":19,"text":75486},{"id":75498,"depth":19,"text":75499},{"id":75538,"depth":19,"text":75539,"children":75766},[75767,75768,75769],{"id":75545,"depth":279,"text":75546},{"id":75561,"depth":279,"text":75562},{"id":75577,"depth":279,"text":75578},{"id":75599,"depth":19,"text":75600,"children":75771},[75772,75773],{"id":75640,"depth":279,"text":75641},{"id":75653,"depth":279,"text":75654},{"id":75672,"depth":19,"text":75673},{"id":75706,"depth":19,"text":75707},{"id":75713,"depth":19,"text":75714},"2021-04-20","StreamNative is launching Flink SQL (Apache Flink) on StreamNative Cloud. Building real-time data queries on Pulsar is now easier than ever.","\u002Fimgs\u002Fblogs\u002F63d79887e154b56d38b95950_63a322ab339d9cd2339ffd7a_top.webp",{},"\u002Fblog\u002Fflink-sql-on-streamnative-cloud",{"title":60561,"description":75778},"blog\u002Fflink-sql-on-streamnative-cloud",[302,3550,821,8057,303],"MihExZeN3DGHydABEA0hnp1bmPvtfgPFsvykJzjqxpw",{"id":75787,"title":75788,"authors":75789,"body":75790,"category":821,"createdAt":290,"date":75901,"description":75902,"extension":8,"featured":294,"image":75903,"isDraft":294,"link":290,"meta":75904,"navigation":7,"order":296,"path":75905,"readingTime":4475,"relatedResources":290,"seo":75906,"stem":75907,"tags":75908,"__hash__":75909},"blogs\u002Fblog\u002Fcall-for-beta-users-function-mesh-now-available-for-pulsar-functions.md","Call for Beta Users: Function Mesh Now Available for Pulsar Functions",[69353],{"type":15,"value":75791,"toc":75895},[75792,75795,75798,75802,75805,75808,75814,75822,75826,75829,75835,75838,75841,75844,75848,75851,75857,75863,75866,75869,75875,75878,75889,75892],[48,75793,75794],{},"Pulsar Functions is a turnkey serverless computing option native to the Pulsar messaging system. Popular use cases of Pulsar Functions include ETL jobs, real-time aggregation, microservices, reactive services, event routing, and more. Today, we are excited to introduce Function Mesh, a new feature that makes Pulsar Functions even more powerful.",[48,75796,75797],{},"Built as a Kubernetes controller, Function Mesh is a serverless framework that enables users to organize related Pulsar Functions\u002FConnectors together to form a complex streaming job and run them natively on Kubernetes. 
It is a valuable tool for those who are seeking cloud-native, serverless event streaming solutions.",[40,75799,75801],{"id":75800},"an-introduction-to-pulsar-functions","An Introduction to Pulsar Functions",[48,75803,75804],{},"Pulsar Functions is the native computing infrastructure of the Pulsar messaging system. It enables the creation of complex processing logic on a per message basis and brings simplicity and serverless concepts to event streaming, thereby eliminating the need to deploy a separate system such as Apache Spark or Apache Flink.",[48,75806,75807],{},"The lightweight compute functions consume messages from one or more Pulsar topics, apply user-supplied processing logic to each message, and publish computation results to a result topic. Some common use cases of Pulsar Functions include simple ETL jobs, real-time aggregation, microservices, reactive services, event routing, etc.",[48,75809,75810],{},[384,75811],{"alt":75812,"src":75813},"illustration of pulsar function","\u002Fimgs\u002Fblogs\u002F63a31d80a8bd86714fb90406_1.png",[48,75815,75816,75817,75821],{},"Pulsar Functions is not a full-power streaming processing engine nor a computation abstraction layer, rather, the benefits of Pulsar Functions are in its simplicity. Pulsar Functions supports multiple languages and developers do not need to learn new APIs, which increase development productivity. Using Pulsar Functions also means easier troubleshooting and maintenance because there is no need for an external processing system. For a deep dive on Pulsar Functions, ",[55,75818,75820],{"href":75819},"\u002Fen\u002Fblog\u002Ftech\u002F2020-10-06-pulsar-functions-deep-dive","read this article"," by Sanjeev Kulkarni, Sr. Principal Software Engineer at Splunk.",[40,75823,75825],{"id":75824},"why-do-you-need-function-mesh","Why Do You Need Function Mesh",[48,75827,75828],{},"With the increased adoption of Pulsar Functions, we noticed that users would process data input in a stage-by-stage pattern by organizing several Pulsar Functions together.",[48,75830,75831],{},[384,75832],{"alt":75833,"src":75834},"illustration of pulsar function mesh","\u002Fimgs\u002Fblogs\u002F63a31d80c30858e71ba2639b_2.png",[48,75836,75837],{},"With this practice, developers had to deploy and manage each Pulsar Function individually, which is a time-consuming process. Take the framework above as an example, the owner would have to use five commands to deploy Pulsar Connectors and Pulsar Functions. Additionally, when multiple Pulsar Functions run concurrently, it is hard to track the functions since there is no aggregated view or linking among them. It is difficult to manage life cycles or know the upstream and downstream of functions.",[48,75839,75840],{},"To solve these pain points, StreamNative has created Function Mesh. Function Mesh allows people to manage functions as a unit and provides an integrated view of cooperating functions. With Function Mesh, Pulsar Functions owners can easily manage multi-stage jobs, saving substantial operational resources.",[48,75842,75843],{},"Like Pulsar Functions, Function Mesh is not a full-power streaming engine. 
The goal of Function Mesh is not to replace heavyweight streaming engines, such as Spark or Flink, but to use a simple API and execution framework for common lightweight streaming use cases.",[40,75845,75847],{"id":75846},"launch-function-mesh-on-kubernetes","Launch Function Mesh on Kubernetes",[48,75849,75850],{},"Since Kubernetes has become the standard platform for containerized applications, Function Mesh was built based on it to run Pulsar Functions in a cloud-native way. It comes with several CRD abstractions (Function, Source, Sink, FunctionMesh) to help users model the serverless computing tasks and a Kubernetes operator which reconciles the functions, connectors and meshes submitted by users. Instead of using Pulsar admin and sending function requests to Pulsar clusters, users can use kubectl to submit a Function Mesh CRD manifest directly to Kubernetes clusters. The corresponding Mesh operator installed inside Kubernetes will then launch parts individually, organize scheduling, and load balance the functions together.",[8325,75852,75855],{"className":75853,"code":75854,"language":8330},[8328],"\n$ kubectl apply -f function-mesh.yaml \n\n",[4926,75856,75854],{"__ignoreMap":18},[8325,75858,75861],{"className":75859,"code":75860,"language":8330},[8328],"\napiVersion: cloud.streamnative.io\u002Fv1alpha1 \nkind: FunctionMesh\nmetadata: \n  name: functionmesh-sample\nSpec:\n  sources:\n    - name: MangoDBSource\n      ...\n  functions:\n      - name: f1\n        ...\n      - name: f2\n        ...\n      - name: f3\n        ...\n  sinks:\n    - name: ElasticSearchSink\n      ...\n\n",[4926,75862,75860],{"__ignoreMap":18},[48,75864,75865],{},"We implemented custom resources, including Function, Source, Sink, and FunctionMesh. These custom resources allow users to run Pulsar Functions and Connectors on Kubernetes with simplicity. With these Function Mesh CRDs, you can define your Function Mesh, organize the functions, and then submit it to Kubernetes. The diagram below illustrates the scheduling process.",[48,75867,75868],{},"The user provides CRD definitions for Pulsar Functions, Pulsar Connectors or Function Meshes. Once your Kubernetes cluster receives this CRD request, the Function Mesh operator will schedule individual parts and run the functions you desire as a stateful set.",[48,75870,75871],{},[384,75872],{"alt":75873,"src":75874},"illustration of kubernetes cluster","\u002Fimgs\u002Fblogs\u002F63a31dc0d89f445ec9421bd3_3.png",[48,75876,75877],{},"For those who are seeking cloud-native, serverless streaming solutions, Function Mesh brings many benefits beyond making managing Pulsar Functions easier. The benefits of Function Mesh include:",[321,75879,75880,75883,75886],{},[324,75881,75882],{},"Allowing Pulsar Functions owners to utilize the full power of Kubernetes Scheduler, including rebalancing, rescheduling, fault-tolerance, etc. These features are crucial in production setups.",[324,75884,75885],{},"It makes Pulsar Functions a first-class citizen in the cloud environment, which leads to greater possibilities when more resources become available in the cloud.",[324,75887,75888],{},"Function Mesh runs Pulsar Functions separately from Pulsar, which enables it to work with different messaging systems and integrate with existing tools in the cloud environment.",[40,75890,75891],{"id":75164},"Try Function Mesh Now!",[48,75893,75894],{},"Function Mesh is now available in private beta and we invite you to become one of the first users! 
Contact us to try Function Mesh on your Kubernetes clusters today.",{"title":18,"searchDepth":19,"depth":19,"links":75896},[75897,75898,75899,75900],{"id":75800,"depth":19,"text":75801},{"id":75824,"depth":19,"text":75825},{"id":75846,"depth":19,"text":75847},{"id":75164,"depth":19,"text":75891},"2021-04-05","StreamNative is excited to announce that Function Mesh private beta is now available. Try it now on your Kubernetes clusters!","\u002Fimgs\u002Fblogs\u002F63d799326d7343694fe792e0_63a31d80588fcadbd10ceb8a_top.webp",{},"\u002Fblog\u002Fcall-for-beta-users-function-mesh-now-available-for-pulsar-functions",{"title":75788,"description":75902},"blog\u002Fcall-for-beta-users-function-mesh-now-available-for-pulsar-functions",[9636,821,28572,4839,16985,8058,303],"lLADqgNlrMRhn4jOnfe6zqaFaQSqLxQaKZQuDc4olyI",{"id":75911,"title":75912,"authors":75913,"body":75914,"category":821,"createdAt":290,"date":76056,"description":76057,"extension":8,"featured":294,"image":76058,"isDraft":294,"link":290,"meta":76059,"navigation":7,"order":296,"path":76060,"readingTime":20144,"relatedResources":290,"seo":76061,"stem":76062,"tags":76063,"__hash__":76064},"blogs\u002Fblog\u002Fannouncing-aws-sqs-connector-for-apache-pulsar.md","Announcing AWS SQS Connector for Apache Pulsar",[61300,6500],{"type":15,"value":75915,"toc":76050},[75916,75929,75933,75936,75941,75947,75951,75960,75964,75967,75981,75983,75986,76032,76038,76047],[48,75917,75918,75919,75924,75925,75928],{},"Today StreamNative ships the first-ever release of the ",[55,75920,75923],{"href":75921,"rel":75922},"https:\u002F\u002Fgithub.com\u002Fstreamnative\u002Fpulsar-io-sqs",[264],"AWS Simple Queue Service (SQS)"," connector for ",[55,75926,821],{"href":75279,"rel":75927},[264],". Over the last couple of months, we built an SQS connector that enables seamless integration between AWS SQS and the Pulsar ecosystem. Now we are thrilled to announce the general availability of the SQS connector! For organizations looking to deploy reactive data pipelines around Pulsar and SQS, we would recommend you try it out!",[40,75930,75932],{"id":75931},"what-is-the-sqs-connector","What is the SQS connector?",[48,75934,75935],{},"The SQS connector is an integration service that moves data bi-directionally between SQS and Pulsar by leveraging two types of connectors:",[321,75937,75938],{},[324,75939,75940],{},"SQS source connector – This connector feeds data from AWS SQS and persists data to Pulsar topics.",[48,75942,75943],{},[384,75944],{"alt":75945,"src":75946},"AWS SQS Sink connector","\u002Fimgs\u002Fblogs\u002F63a39c2c92217961736c954c_sqs-sink.png",[40,75948,75950],{"id":75949},"why-develop-the-sqs-connector","Why develop the SQS connector?",[48,75952,75953,75954,75959],{},"Pulsar and ",[55,75955,75958],{"href":75956,"rel":75957},"https:\u002F\u002Faws.amazon.com\u002Fsqs\u002F",[264],"AWS SQS"," are at the heart of modern cloud architectures. As one of the leading open-source distributed messaging systems, Pulsar unifies streaming and queuing capabilities and provides a broad set of features and functionalities all in one system. AWS SQS is one of the most popular queue-based message systems, offering a secure, durable, and available hosted queue that allows you to integrate and decouple distributed software systems and components. Both Pulsar and SQS have a welcoming and rapidly expanding community, and many users want to be able to leverage both the benefits of Pulsar and SQS. 
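On the Pulsar side, data that the SQS source connector feeds into a topic is consumed like any other Pulsar message. The sketch below shows what that might look like with the Java client; the service URL, topic, and subscription names are placeholders, not values defined by the connector.

```java
import org.apache.pulsar.client.api.*;

// Hypothetical sketch: reading messages that the SQS source connector has
// written into a Pulsar topic. The service URL, topic, and subscription
// names are placeholders.
public class SqsSourceReader {
    public static void main(String[] args) throws Exception {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650")
                .build();

        Consumer<String> consumer = client.newConsumer(Schema.STRING)
                .topic("persistent://public/default/sqs-source-topic")
                .subscriptionName("sqs-reader")
                .subscribe();

        Message<String> msg = consumer.receive();
        System.out.println("Received from SQS via Pulsar: " + msg.getValue());
        consumer.acknowledge(msg);

        consumer.close();
        client.close();
    }
}
```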
Moreover, we have received requests from our customers looking for an integration between Pulsar and SQS.",[40,75961,75963],{"id":75962},"why-use-the-sqs-connector","Why use the SQS connector?",[48,75965,75966],{},"Built to deploy integrations between Pulsar and SQS quickly and securely, the SQS connector brings various advantages, including but not limited to:",[321,75968,75969,75971,75973,75975,75977,75979],{},[324,75970,75373],{},[324,75972,75376],{},[324,75974,75379],{},[324,75976,75382],{},[324,75978,75385],{},[324,75980,75388],{},[40,75982,39647],{"id":39646},[48,75984,75985],{},"The SQS connector is a major step in the journey of integrating other message systems into the Pulsar ecosystem. To get involved with the SQS connector, check out the following featured resources:",[321,75987,75988,75991,76005,76007,76024,76026],{},[324,75989,75990],{},"Try it out",[324,75992,75993,75994,75998,75999,76004],{},"To get started, ",[55,75995,36195],{"href":75996,"rel":75997},"https:\u002F\u002Fgithub.com\u002Fstreamnative\u002Fpulsar-io-sqs\u002Freleases",[264]," the connector and head over to the ",[55,76000,76003],{"href":76001,"rel":76002},"https:\u002F\u002Fgithub.com\u002Fstreamnative\u002Fpulsar-io-sqs\u002Fblob\u002Fmaster\u002FREADME.md",[264],"user guides"," that walk you through the whole process.",[324,76006,75404],{},[324,76008,76009,76010,48888,76014,75415,76017,76020,76021,39692],{},"Have questions? As always, feel free to create an issue on ",[55,76011,39680],{"href":76012,"rel":76013},"https:\u002F\u002Fgithub.com\u002Fstreamnative\u002Fpulsar-io-sqs\u002Fissues\u002Fnew\u002Fchoose",[264],[55,76015,39686],{"href":39684,"rel":76016},[264],[55,76018,39691],{"href":33664,"rel":76019},[264],", or join the #connector-sqs channel on ",[55,76022,57762],{"href":75428,"rel":76023},[264],[324,76025,75432],{},[324,76027,76028,76029,75440],{},"The SQS connector is a community-driven service, which hosts its source code on the StreamNative GitHub repo. We would love you to explore this new connector and contribute to its evolution. Have feature requests or bug reports? Do not hesitate to ",[55,76030,75439],{"href":76012,"rel":76031},[264],[48,76033,76034],{},[384,76035],{"alt":76036,"src":76037},"illustration with green yellow and orange blocs ","\u002Fimgs\u002Fblogs\u002F63a39c2ccb08ad7d44a70d97_sqs-cooperation.jpeg",[48,76039,76040,76041,75451,76043,76046],{},"Today we are making the SQS connector available to you on ",[55,76042,38697],{"href":74017},[55,76044,75455],{"href":35258,"rel":76045},[264]," are great ways to take your organization to the next level. At StreamNative, we’re committed to the Pulsar community and will continue to invest in the Pulsar ecosystem. Today’s announcement is just another step towards realizing our vision. Stay tuned for more announcements from StreamNative!",[48,76048,76049],{},"Happy Pulsaring!",{"title":18,"searchDepth":19,"depth":19,"links":76051},[76052,76053,76054,76055],{"id":75931,"depth":19,"text":75932},{"id":75949,"depth":19,"text":75950},{"id":75962,"depth":19,"text":75963},{"id":39646,"depth":19,"text":39647},"2021-03-17","Today StreamNative ships the first-ever release of the AWS Simple Queue Service (SQS) connector for Apache Pulsar. 
For organizations looking to deploy reactive data pipelines around Pulsar and SQS, we would recommend you try it out!","\u002Fimgs\u002Fblogs\u002F63c7fd5fdbd853361e401f4c_63a39c2c15194c57329b5cf5_sqs-top.png",{},"\u002Fblog\u002Fannouncing-aws-sqs-connector-for-apache-pulsar",{"title":75912,"description":76057},"blog\u002Fannouncing-aws-sqs-connector-for-apache-pulsar",[302,28572],"HJq5FRQV6nWuHnrUyehu1hBiDvf5IVeWguTvXX3gE6Q",{"id":76066,"title":76067,"authors":76068,"body":76069,"category":7338,"createdAt":290,"date":76303,"description":76304,"extension":8,"featured":294,"image":76305,"isDraft":294,"link":290,"meta":76306,"navigation":7,"order":296,"path":76307,"readingTime":11508,"relatedResources":290,"seo":76308,"stem":76309,"tags":76310,"__hash__":76311},"blogs\u002Fblog\u002Fintroducing-the-apache-pulsar-hackathon-2021.md","Introducing the Apache Pulsar Hackathon 2021",[69353],{"type":15,"value":76070,"toc":76293},[76071,76074,76085,76089,76092,76106,76109,76113,76116,76133,76140,76144,76167,76171,76174,76186,76190,76204,76207,76230,76234,76245,76249,76285],[48,76072,76073],{},"Adoption of Apache Pulsar is accelerating as organizations around the world pursue cloud-native technologies. To build on this momentum, StreamNative is excited to announce the first-ever Apache Pulsar Hackathon 2021, taking place on May 6-7th. This event will give you an opportunity to:",[321,76075,76076,76079,76082],{},[324,76077,76078],{},"Engage and grow with the Pulsar community",[324,76080,76081],{},"Generate ideas to enhance Pulsar and its ecosystem",[324,76083,76084],{},"Drive contributions as part of a diverse group of developers and data architects",[40,76086,76088],{"id":76087},"why-participate","Why Participate?",[48,76090,76091],{},"Apache Pulsar is quickly becoming one of the most adopted platforms for companies looking to develop real-time data messaging and streaming applications. As a result, the community has seen an increase in engagement and currently boasts:",[321,76093,76094,76097,76100,76103],{},[324,76095,76096],{},"370+ Pulsar Contributors (with 150+ new contributors in 2020 alone)",[324,76098,76099],{},"7,400+ Github Stars",[324,76101,76102],{},"2,600+ Pulsar Slack Members",[324,76104,76105],{},"80 speakers and 1,600 registrations across 2 Apache Pulsar global conferences in 2020",[48,76107,76108],{},"At the Pulsar Hackathon you will meet other Pulsar enthusiasts and get involved in a rapidly growing technology. And did we mention cash prizes up to $5,000? Whether you’re a Pulsar newbie or expert, join us and showcase your creativity and skills.",[40,76110,76112],{"id":76111},"pulsar-hackathon-judges","Pulsar Hackathon Judges",[48,76114,76115],{},"We have an amazing list of judges on board and by participating you will have a chance to interact with these judges. The panel includes:",[321,76117,76118,76121,76124,76127,76129,76131],{},[324,76119,76120],{},"Sijie Guo, Apache Pulsar PMC Member and CEO of StreamNative",[324,76122,76123],{},"Matteo Merli, Apache Pulsar PMC Member and Sr. Principal Software Engineer at Splunk",[324,76125,76126],{},"Jerry Peng, Apache Pulsar PMC Member and Principal Software Engineer at Splunk",[324,76128,70050],{},[324,76130,70041],{},[324,76132,70038],{},[48,76134,76135,76136,76139],{},"Winners will be announced live at the ",[55,76137,71617],{"href":70535,"rel":76138},[264],", taking place on June 16-17. 
You can find more details on the Pulsar Hackathon below.",[40,76141,76143],{"id":76142},"hackathon-timeline-all-times-listed-in-pst","Hackathon Timeline (All times listed in PST):",[321,76145,76146,76149,76152,76155,76158,76161,76164],{},[324,76147,76148],{},"April 28, 2021 - Registration Deadline",[324,76150,76151],{},"May 6-7, 2021 - Hackathon",[324,76153,76154],{},"May 6, 9:00 AM - Virtual Live Kick Off",[324,76156,76157],{},"May 6, 9:15 AM - 10 PM - Hacking Time",[324,76159,76160],{},"May 7, 9:00 AM - 10 PM - Hacking Time",[324,76162,76163],{},"May 7, 11:59 PM - Video Submission Deadline",[324,76165,76166],{},"June 16-17, 2021 - Winners Announced at the Pulsar Summit North America 2021",[40,76168,76170],{"id":76169},"hackathon-categories","Hackathon Categories:",[48,76172,76173],{},"To help inspire you, we’ve created five categories for this year’s hackathon:",[321,76175,76176,76178,76180,76182,76184],{},[324,76177,70008],{},[324,76179,70011],{},[324,76181,70014],{},[324,76183,70017],{},[324,76185,70020],{},[40,76187,76189],{"id":76188},"hackathon-prizes","Hackathon Prizes:",[321,76191,76192,76195,76198,76201],{},[324,76193,76194],{},"First-place: $5,000",[324,76196,76197],{},"Second-place: $2,500",[324,76199,76200],{},"Third-place: $1,000",[324,76202,76203],{},"All participants: Pulsar and StreamNative swag",[40,76205,76206],{"id":52653},"How to Participate:",[321,76208,76209,76216,76223],{},[324,76210,76211,76212,190],{},"Sign up! We encourage teams up to four people. If you already have the team you will be working with and your idea, you can sign up ",[55,76213,267],{"href":76214,"rel":76215},"https:\u002F\u002Fwww.eventbrite.com\u002Fmanage\u002Fevents\u002F143906003731\u002Fdetails",[264],[324,76217,76218,76219,76222],{},"If you are looking for teammates, ideas, or just to connect with the community, we encourage you to join ",[55,76220,71975],{"href":57760,"rel":76221},[264]," (designated hackathon channel: #pulsar-hackathon-2021).",[324,76224,76225,76226,190],{},"Additionally, we will be hosting a special “Office Hours” for Pulsar Hackathon participants with StreamNative’s Chief Architect Addision Higham on April 21, 12:00pm PT. You can sign up for Office Hours ",[55,76227,267],{"href":76228,"rel":76229},"https:\u002F\u002Fus02web.zoom.us\u002Fwebinar\u002Fregister\u002F8916122493161\u002FWN_bOsEtkkXSYmBwQXnE5sCMg?utm_medium=email&_hsmi=109206128&_hsenc=p2ANqtz--Lxq_y22sfDt1wM5enSm1FPU0OoUIInpYNWCgDnC40Fsp-2oFpY_M_DDxYjgFCUzwZR1UN3QQU9LRBQGbzZDwTtJm5jw&utm_content=109206128&utm_source=hs_email",[264],[40,76231,76233],{"id":76232},"submission-requirements","Submission Requirements:",[321,76235,76236,76239,76242],{},[324,76237,76238],{},"You and your team will need to submit a pre-recorded video demo of your project by 11:59 PM, PST on May 7th. The max video length is ten minutes.",[324,76240,76241],{},"Note: You cannot use pre-existing code but you can connect to pre-existing code and build new stuff on top of existing functionality.",[324,76243,76244],{},"You will be judged on three criteria: (1) Innovation (2) Utility \u002F Applicability (3) Difficulty",[40,76246,76248],{"id":76247},"want-to-polish-your-pulsar-skills-we-have-resources-to-help-you-ramp-up","Want to polish your Pulsar skills? 
We have resources to help you ramp up:",[321,76250,76251,76259,76267,76275],{},[324,76252,76253,76258],{},[55,76254,76257],{"href":76255,"rel":76256},"https:\u002F\u002Fwww.meetup.com\u002FSF-Bay-Area-Apache-Pulsar-Meetup\u002Fevents\u002F276555550\u002F",[264],"Apache Flink x Pulsar Virtual Meetup: Streaming SQL at Uber and Facebook"," Time: March 16-17, 11:00 AM-12:30 PM, PST",[324,76260,76261,76266],{},[55,76262,76265],{"href":76263,"rel":76264},"https:\u002F\u002Fstreamnative.zoom.us\u002Fwebinar\u002Fregister\u002FWN_mV-MgBdtTQmpzSPAymd29w",[264],"Intro: Apache Pulsar 101"," Time: March 24, 3:00 PM, EST",[324,76268,76269,76274],{},[55,76270,76273],{"href":76271,"rel":76272},"https:\u002F\u002Fus02web.zoom.us\u002Fwebinar\u002Fregister\u002F2216003858545\u002FWN_Qig1o1VlSVe6lk5omYb6zA",[264],"TGIP: Monthly Apache Pulsar Updates"," Time: April 14, 12:00 PM, PST",[324,76276,76277,76278,76281,76282,190],{},"For more resources, sign up for our monthly Pulsar newsletter ",[55,76279,267],{"href":34070,"rel":76280},[264]," or visit ",[55,76283,76284],{"href":10259},"our website",[48,76286,76287,76288,76292],{},"Please contact ",[55,76289,76291],{"href":76290},"mailto:events@streamnative.io","events@streamnative.io"," with any questions.",{"title":18,"searchDepth":19,"depth":19,"links":76294},[76295,76296,76297,76298,76299,76300,76301,76302],{"id":76087,"depth":19,"text":76088},{"id":76111,"depth":19,"text":76112},{"id":76142,"depth":19,"text":76143},{"id":76169,"depth":19,"text":76170},{"id":76188,"depth":19,"text":76189},{"id":52653,"depth":19,"text":76206},{"id":76232,"depth":19,"text":76233},{"id":76247,"depth":19,"text":76248},"2021-03-11","StreamNative is excited to announce the first-ever Apache Pulsar Hackathon 2021, taking place on May 6-7th. First place wins $5,000. You’ll meet new people, learn new skills, and have the chance to connect with judges from StreamNative, Splunk, Ververica, and Elastic.","\u002Fimgs\u002Fblogs\u002F63c7fd6b8c09ab514ff91977_63a39bcb9221794c576c3ca8_hackathonbg.jpeg",{},"\u002Fblog\u002Fintroducing-the-apache-pulsar-hackathon-2021",{"title":76067,"description":76304},"blog\u002Fintroducing-the-apache-pulsar-hackathon-2021",[821],"jlIj7RSU5Bpg7b0U5BbwxEXG-YCYJJl5NlRw1ajd1z4",{"id":76313,"title":76314,"authors":76315,"body":76316,"category":3550,"createdAt":290,"date":76396,"description":76397,"extension":8,"featured":294,"image":76398,"isDraft":294,"link":290,"meta":76399,"navigation":7,"order":296,"path":76400,"readingTime":20144,"relatedResources":290,"seo":76401,"stem":76402,"tags":76403,"__hash__":76404},"blogs\u002Fblog\u002Fververica-streamnative-cloud-partners.md","Ververica + StreamNative: Cloud Partners",[806],{"type":15,"value":76317,"toc":76391},[76318,76328,76332,76335,76338,76341,76344,76348,76351,76354,76363,76367,76375,76379,76386,76388],[48,76319,76320,76321,4003,76324,76327],{},"We are excited to announce the Cloud Partnership of Ververica and StreamNative. This is fantastic news for both the ",[55,76322,31802],{"href":31800,"rel":76323},[264],[55,76325,821],{"href":23526,"rel":76326},[264]," communities as it means a closer collaboration and integration between these two industry-leading technologies.",[40,76329,76331],{"id":76330},"why-ververica-streamnative","Why Ververica + StreamNative?",[48,76333,76334],{},"Ververica is an integrated platform for stateful stream processing and streaming analytics with Open Source Apache Flink. 
Ververica and Apache Flink are leveraged by top companies around the world to build streaming processing applications.",[48,76336,76337],{},"Use cases include building real-time ETL pipelines to provide business insights; anomaly detection for banking, manufacturing, and IoT; real-time monitoring for core business applications, and data application development.",[48,76339,76340],{},"StreamNative Cloud, powered by Apache Pulsar, provides a turnkey solution to help organizations make the transition to cloud native, event-driven architectures. It enables developers to focus on building resilient, scalable applications quickly, freeing teams from the day-to-day management of messaging and event storage infrastructure.",[48,76342,76343],{},"Companies look to StreamNative and Apache Pulsar to build the critical infrastructure to support both mission critical application services as well as high throughput, low latency, data streaming applications.",[40,76345,76347],{"id":76346},"flink-pulsar","Flink + Pulsar",[48,76349,76350],{},"As the volume and range of data grows, companies are increasingly adopting Apache Flink and Apache Pulsar to build unified batch and streaming applications. These unified applications make it simpler to build solutions that leverage data in real-time.",[48,76352,76353],{},"Apache Pulsar provides the fundamental real-time data serving and storage ability while Apache Flink provides the critical computing ability for both streaming and batch use cases. Leveraged together, these technologies provide an integrated data processing platform which covers a broad range of use cases across industries.",[48,76355,76356,76357,76362],{},"The high demand for Pulsar and Flink prompted the two communities to create the Pulsar Flink connector. The ",[55,76358,76361],{"href":76359,"rel":76360},"https:\u002F\u002Fgithub.com\u002Fstreamnative\u002Fpulsar-flink\u002F",[264],"Pulsar Flink connector"," provides elastic data processing with Apache Pulsar and Apache Flink, allowing Apache Flink to read\u002Fwrite data from\u002Fto Apache Pulsar.",[40,76364,76366],{"id":76365},"learn-more-about-flink-pulsar","Learn More About Flink + Pulsar",[48,76368,76369,76370,76374],{},"If you’re interested to learn more about Flink and Pulsar, we’d like to invite you to an upcoming ",[55,76371,76373],{"href":76359,"rel":76372},[264],"Apache Flink x Pulsar Virtual Meetup"," taking place March 16 & 17, 2021. Join us for this two-day event sponsored by Uber, Ververica, and StreamNative.",[3933,76376,76378],{"id":76377},"about-ververica","About Ververica",[48,76380,76381,76385],{},[55,76382,491],{"href":76383,"rel":76384},"https:\u002F\u002Fwww.ververica.com\u002F",[264]," was founded by the original creators of Apache Flink® with the mission of enabling business in real-time. It provides Ververica Platform, a turnkey solution that operationalizes years of experience working with Flink users to accelerate the journey of enterprises to stream processing adoption.",[3933,76387,10248],{"id":10247},[48,76389,76390],{},"Founded by the original developers of Apache Pulsar and Apache BookKeeper, StreamNative provides StreamNative Cloud, offering Apache Pulsar as a Service. The company also supports on-premise Pulsar deployments and related commercial support. 
StreamNative Cloud provides a scalable, resilient, and secure messaging and event streaming platform for enterprise.",{"title":18,"searchDepth":19,"depth":19,"links":76392},[76393,76394,76395],{"id":76330,"depth":19,"text":76331},{"id":76346,"depth":19,"text":76347},{"id":76365,"depth":19,"text":76366},"2021-03-03","We are excited to announce the Cloud Partnership of Ververica and StreamNative. This is fantastic news for both the Apache Flink and Apache Pulsar communities as it means a closer collaboration and integration between these two industry-leading technologies.","\u002Fimgs\u002Fblogs\u002F63c7fd7953f98a64d7a495a2_63a39b6a8f21cb9a9f2ac5d3_ververicaback.jpeg",{},"\u002Fblog\u002Fververica-streamnative-cloud-partners",{"title":76314,"description":76397},"blog\u002Fververica-streamnative-cloud-partners",[821,8057,303],"hFvI2rhCeyUtUi8Xn44vDBWWs0d-QJ-bHk7P86vJUpw",{"id":76406,"title":58870,"authors":76407,"body":76408,"category":821,"createdAt":290,"date":76799,"description":76800,"extension":8,"featured":294,"image":76801,"isDraft":294,"link":290,"meta":76802,"navigation":7,"order":296,"path":38014,"readingTime":3556,"relatedResources":290,"seo":76803,"stem":76804,"tags":76805,"__hash__":76806},"blogs\u002Fblog\u002Fpulsar-isolation-depth-look-how-to-achieve-isolation-in-pulsar.md",[808,61300],{"type":15,"value":76409,"toc":76781},[76410,76416,76419,76427,76444,76447,76450,76452,76455,76461,76464,76534,76537,76544,76548,76551,76554,76557,76560,76563,76569,76571,76624,76630,76633,76640,76647,76650,76652,76655,76667,76669,76672,76683,76686,76689,76692,76695,76701,76703,76724,76727,76733,76739,76742,76745,76747,76756,76759,76761,76770,76772,76775,76778],[48,76411,76412],{},[55,76413],{"href":76414,"rel":76415},"https:\u002F\u002Ftwitter.com\u002FAnonymitaet1",[264],[48,76417,76418],{},"One of the great things about using Apache Pulsar is that Pulsar’s multi-layer and segment-centric architecture and hierarchical resource management provide a solid foundation for isolation, which allows you to isolate resources in your desired manner, prevent resource competition, and attain stability.",[48,76420,76421,76422,76426],{},"This is the first blog in our four-part blog series on how to achieve ",[55,76423,76425],{"href":73962,"rel":76424},[264],"resource isolation in Apache Pulsar",". 
In this blog, we give you an overview of how to use the following approaches to achieve isolation in Pulsar:",[321,76428,76429,76434,76439],{},[324,76430,76431],{},[55,76432,72517],{"href":76433},"\u002Fblog\u002Ftech\u002F2021-03-02-taking-an-in-depth-look-at-how-to-achieve-isolation-in-pulsar\u002F#separate-pulsar-clusters",[324,76435,76436],{},[55,76437,72520],{"href":76438},"\u002Fblog\u002Ftech\u002F2021-03-02-taking-an-in-depth-look-at-how-to-achieve-isolation-in-pulsar\u002F#shared-bookkeeper-cluster",[324,76440,76441],{},[55,76442,72523],{"href":76443},"\u002Fblog\u002Ftech\u002F2021-03-02-taking-an-in-depth-look-at-how-to-achieve-isolation-in-pulsar\u002F#single-pulsar-cluster",[40,76445,72517],{"id":76446},"separate-pulsar-clusters",[48,76448,76449],{},"In this approach, you need to create different Pulsar clusters for your isolation units.",[32,76451,69531],{"id":2696},[48,76453,76454],{},"As shown in figure 1, it demonstrates the deployment of separate Pulsar clusters to achieve isolation.",[48,76456,76457],{},[384,76458],{"alt":76459,"src":76460},"illustration Deployment of separate Pulsar clusters","\u002Fimgs\u002Fblogs\u002F63a39a7c944d6a8176d55fc8_isolation-1.png",[48,76462,76463],{},"Here are some key points for understanding how it works:",[321,76465,76466,76474,76486,76489,76503,76510],{},[324,76467,76468,76469,76473],{},"Each ",[55,76470,42908],{"href":76471,"rel":76472},"https:\u002F\u002Fpulsar.apache.org\u002Fdocs\u002Fen\u002Fnext\u002Fconcepts-architecture-overview\u002F#clusters",[264]," exposes its service through a DNS entry point and makes sure a client can access the cluster through the DNS entry point. From the client side, the client can use one or multiple Pulsar URLs that the Pulsar cluster exposes as the service URL.",[324,76475,76476,76477,4003,76481,190],{},"Each Pulsar cluster has one or multiple ",[55,76478,66207],{"href":76479,"rel":76480},"https:\u002F\u002Fpulsar.apache.org\u002Fdocs\u002Fen\u002Fnext\u002Fconcepts-architecture-overview\u002F#brokers",[264],[55,76482,76485],{"href":76483,"rel":76484},"https:\u002F\u002Fbookkeeper.apache.org\u002Fdocs\u002Flatest\u002Fgetting-started\u002Fconcepts\u002F#basic-terms",[264],"bookies",[324,76487,76488],{},"Each Pulsar cluster has one metadata store.",[324,76490,76491,76492,4003,76497,76502],{},"Metadata store can be separated into ",[55,76493,76496],{"href":76494,"rel":76495},"https:\u002F\u002Fpulsar.apache.org\u002Fdocs\u002Fen\u002Fnext\u002Fconcepts-architecture-overview\u002F#metadata-store",[264],"Pulsar metadata store",[55,76498,76501],{"href":76499,"rel":76500},"https:\u002F\u002Fbookkeeper.apache.org\u002Fdocs\u002Flatest\u002Fgetting-started\u002Fconcepts\u002F#metadata-storage",[264],"BookKeeper metadata store",". While the metadata store in this guide refers to these two concepts rather than distinguish them.",[324,76504,76505,76506,190],{},"Separate Pulsar clusters use a shared ",[55,76507,72616],{"href":76508,"rel":76509},"http:\u002F\u002Fpulsar.apache.org\u002Fdocs\u002Fen\u002Fnext\u002Fconcepts-architecture-overview\u002F#configuration-store",[264],[324,76511,76512,76513,76518,76519,76524,76525,76529,76530,190],{},"Pulsar's hierarchical resource management provides a solid foundation for isolation. In this approach, if you want to achieve namespace isolation, you need to ",[55,76514,76517],{"href":76515,"rel":76516},"http:\u002F\u002Fpulsar.apache.org\u002Ftools\u002Fpulsar-admin\u002F2.8.0-SNAPSHOT\u002F#-em-create-em--9",[264],"specify a cluster for a namespace",". 
The cluster must be in the ",[55,76520,76523],{"href":76521,"rel":76522},"http:\u002F\u002Fpulsar.apache.org\u002Ftools\u002Fpulsar-admin\u002F2.8.0-SNAPSHOT\u002F#namespaces",[264],"allowed cluster list of the tenant",". Topics under the namespace are assigned to this cluster. For how to set a cluster for a namespace, see ",[55,76526,267],{"href":76527,"rel":76528},"http:\u002F\u002Fpulsar.apache.org\u002Ftools\u002Fpulsar-admin\u002F2.8.0-SNAPSHOT\u002F#-em-set-clusters-em-",[264],". For how to manage Pulsar clusters, see ",[55,76531,267],{"href":76532,"rel":76533},"https:\u002F\u002Fpulsar.apache.org\u002Fdocs\u002Fen\u002Fnext\u002Fadmin-api-clusters\u002F",[264],[32,76535,76536],{"id":64614},"Migrate namespace",[48,76538,76539,76540,190],{},"If you want to migrate namespaces between different clusters, you need to enable geo-replication for the namespaces and disable it after all data replicated to the target cluster. For how to set geo-replication for a namespace, see ",[55,76541,267],{"href":76542,"rel":76543},"https:\u002F\u002Fpulsar.apache.org\u002Fdocs\u002Fen\u002Fnext\u002Fadministration-geo\u002F",[264],[32,76545,76547],{"id":76546},"scale-up-or-down-node","Scale up or down node",[48,76549,76550],{},"If you want to scale up or scale down brokers or bookies, you need to scale up or scale down the brokers and bookies in the corresponding cluster.",[40,76552,72520],{"id":76553},"shared-bookkeeper-cluster",[48,76555,76556],{},"In this approach, you need to deploy one BookKeeper cluster shared across multiple broker clusters.",[32,76558,69531],{"id":76559},"how-it-works-1",[48,76561,76562],{},"As shown in figure 2, it demonstrates the deployment of a shared BookKeeper cluster to achieve isolation.",[48,76564,76565],{},[384,76566],{"alt":76567,"src":76568},"figure Deployment of shared BookKeeper cluster","\u002Fimgs\u002Fblogs\u002F63a39a7c8f21cb5d162a4175_isolation-2.png",[48,76570,76463],{},[321,76572,76573,76578,76583,76585,76588,76602,76605,76608,76611],{},[324,76574,76468,76575,76473],{},[55,76576,42908],{"href":76471,"rel":76577},[264],[324,76579,76476,76580,190],{},[55,76581,66207],{"href":76479,"rel":76582},[264],[324,76584,76488],{},[324,76586,76587],{},"Separate Pulsar clusters use a shared BookKeeper cluster.",[324,76589,76512,76590,76518,76593,76524,76596,76529,76599,190],{},[55,76591,76517],{"href":76515,"rel":76592},[264],[55,76594,76523],{"href":76521,"rel":76595},[264],[55,76597,267],{"href":76527,"rel":76598},[264],[55,76600,267],{"href":76532,"rel":76601},[264],[324,76603,76604],{},"As shown in figure 3, the storage isolation is achieved by different bookie affinity groups.",[324,76606,76607],{},"All bookie isolation groups use a shared BookKeeper cluster and a metadata store.",[324,76609,76610],{},"Each bookie isolation group has one or several bookies.",[324,76612,76613,76614,76619,76620,190],{},"You can specify a ",[55,76615,76618],{"href":76616,"rel":76617},"http:\u002F\u002Fpulsar.apache.org\u002Ftools\u002Fpulsar-admin\u002F2.8.0-SNAPSHOT\u002F#-em-set-bookie-affinity-group-em-",[264],"primary or secondary group"," (one or several) for a namespace. Topics under the namespace are created on the bookies in the primary group firstly and then created on the bookies in the secondary group. 
For how to set bookie affinity groups, see ",[55,76621,267],{"href":76622,"rel":76623},"https:\u002F\u002Fpulsar.apache.org\u002Fdocs\u002Fen\u002Fnext\u002Fadministration-isolation\u002F#bookie-isolation",[264],[48,76625,76626],{},[384,76627],{"alt":76628,"src":76629},"illustration bookkeeper cluster","\u002Fimgs\u002Fblogs\u002F63a39a7c6e8eb40d7ff2cceb_isolation-3.png",[32,76631,76536],{"id":76632},"migrate-namespace-1",[48,76634,76635,76636,190],{},"If you want to migrate the message service of the namespace to another broker cluster, you need to ",[55,76637,76639],{"href":76527,"rel":76638},[264],"change the cluster for the namespace",[48,76641,76642,76643,76646],{},"If you want to migrate the namespace to another bookie affinity group, you need to change the bookie affinity group. For how to set a bookie affinity group, see ",[55,76644,267],{"href":76622,"rel":76645},[264],". Besides, since the BookKeeper cluster is shared across all broker clusters, there is no need to copy data to another BookKeeper cluster.",[32,76648,76547],{"id":76649},"scale-up-or-down-node-1",[3933,76651,61065],{"id":61064},[48,76653,76654],{},"When scaling up or scaling down brokers, you need to take the following key points into consideration:",[321,76656,76657,76664],{},[324,76658,76659,76660,190],{},"When scaling up brokers, specify the broker isolation group for the newly added broker using the ",[55,76661,76618],{"href":76662,"rel":76663},"http:\u002F\u002Fpulsar.apache.org\u002Ftools\u002Fpulsar-admin\u002F2.8.0-SNAPSHOT\u002F#-em-set-em-",[264],[324,76665,76666],{},"When scaling down brokers, make sure the broker isolation group has enough brokers.",[3933,76668,73488],{"id":73487},[48,76670,76671],{},"When scaling up or scaling down bookies, you need to take the following key points into consideration:",[321,76673,76674,76677],{},[324,76675,76676],{},"When scaling up bookies, specify the bookie affinity group for the newly added bookies.",[324,76678,76679,76680,190],{},"When scaling down bookies, make sure the bookie affinity group has enough bookies. For how to set bookie affinity groups, see ",[55,76681,267],{"href":76622,"rel":76682},[264],[40,76684,72523],{"id":76685},"single-pulsar-cluster",[48,76687,76688],{},"In this approach, you do not need to deploy multiple broker clusters and multiple bookie clusters. Instead, you need to manage a single Pulsar cluster.",[32,76690,69531],{"id":76691},"how-it-works-2",[48,76693,76694],{},"As shown in figure 4, it demonstrates the deployment of a single Pulsar cluster to achieve isolation.",[48,76696,76697],{},[384,76698],{"alt":76699,"src":76700},"illustration Deployment of single Pulsar cluster","\u002Fimgs\u002Fblogs\u002F63a39a7ddd066e1186b0ec6d_isolation-4.png",[48,76702,76463],{},[321,76704,76705,76711,76718],{},[324,76706,3600,76707,76710],{},[55,76708,42908],{"href":76471,"rel":76709},[264]," exposes its service through a DNS entry point and makes sure a client can access the cluster through the DNS entry point. From the client side, the client can use the Pulsar URL that the Pulsar cluster exposes as the service URL.",[324,76712,76713,76714,190],{},"Broker isolation is achieved by different broker isolation groups (Pulsar assigns the topic to the broker under the specific broker isolation). 
For how to set broker isolation groups, see ",[55,76715,267],{"href":76716,"rel":76717},"https:\u002F\u002Fpulsar.apache.org\u002Fdocs\u002Fen\u002Fnext\u002Fadministration-isolation\u002F#broker-isolation",[264],[324,76719,76720,76721,190],{},"Storage isolation is achieved by different bookie affinity groups. For how to set bookie affinity groups, see ",[55,76722,267],{"href":76622,"rel":76723},[264],[32,76725,76536],{"id":76726},"migrate-namespace-2",[48,76728,76729,76730,190],{},"If you want to migrate the namespace to another broker isolation group, you need to change the namespace isolation policy. For how to set namespace isolation policy, see ",[55,76731,267],{"href":76662,"rel":76732},[264],[48,76734,76735,76736,190],{},"If you want to migrate the namespace to another bookie affinity group (it does not move the old data to the new bookie affinity group), you need to change the bookie affinity group. For how to set a bookie affinity group, see ",[55,76737,267],{"href":76622,"rel":76738},[264],[32,76740,76547],{"id":76741},"scale-up-or-down-node-2",[3933,76743,61065],{"id":76744},"broker-1",[48,76746,76654],{},[321,76748,76749,76754],{},[324,76750,76659,76751,190],{},[55,76752,76618],{"href":76662,"rel":76753},[264],[324,76755,76666],{},[3933,76757,73488],{"id":76758},"bookie-1",[48,76760,76671],{},[321,76762,76763,76765],{},[324,76764,76676],{},[324,76766,76679,76767,190],{},[55,76768,267],{"href":76622,"rel":76769},[264],[40,76771,52473],{"id":52472},[48,76773,76774],{},"In product environments, you can combine all Pulsar isolation approaches together or choose none of them to suit your needs. Normally, when choosing isolation approaches, you can take the following points as references:",[48,76776,76777],{},"For some critical businesses (such as billing, ads, and so on), you can have multiple small Pulsar clusters, which do not share storage or local ZooKeeper with the other clusters. This approach provides the highest level of isolation for the most critical workloads.",[48,76779,76780],{},"For the organization consists of multiple teams, you can deploy a single large Pulsar cluster and use various namespaces for different isolation groups. The isolation groups can be determined by capacity but more often by different workloads. 
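As a rough sketch of what assigning a team's namespace to its own isolation unit can look like, the Java admin client below creates a namespace and restricts it to a single cluster. All URLs, tenant, namespace, and cluster names are hypothetical, and the tenant is assumed to already exist with that cluster in its allowed-clusters list.

```java
import java.util.Collections;
import org.apache.pulsar.client.admin.PulsarAdmin;

// Hypothetical sketch: pinning a namespace to a specific cluster so that
// its topics are served only by that cluster's brokers.
// Names and URLs are placeholders, not values from this post.
public class NamespaceIsolationSketch {
    public static void main(String[] args) throws Exception {
        PulsarAdmin admin = PulsarAdmin.builder()
                .serviceHttpUrl("http://localhost:8080")
                .build();

        // Create a namespace for the billing workload
        // (assumes the tenant "my-tenant" already exists).
        admin.namespaces().createNamespace("my-tenant/billing");

        // Restrict the namespace to one cluster; the cluster must be in the
        // tenant's allowed cluster list, as noted earlier in this post.
        admin.namespaces().setNamespaceReplicationClusters(
                "my-tenant/billing", Collections.singleton("cluster-billing"));

        admin.close();
    }
}
```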
For example, use cases with large amounts of fanout may have different hardware than those tailored for the lowest end-to-end-latency.",{"title":18,"searchDepth":19,"depth":19,"links":76782},[76783,76788,76793,76798],{"id":76446,"depth":19,"text":72517,"children":76784},[76785,76786,76787],{"id":2696,"depth":279,"text":69531},{"id":64614,"depth":279,"text":76536},{"id":76546,"depth":279,"text":76547},{"id":76553,"depth":19,"text":72520,"children":76789},[76790,76791,76792],{"id":76559,"depth":279,"text":69531},{"id":76632,"depth":279,"text":76536},{"id":76649,"depth":279,"text":76547},{"id":76685,"depth":19,"text":72523,"children":76794},[76795,76796,76797],{"id":76691,"depth":279,"text":69531},{"id":76726,"depth":279,"text":76536},{"id":76741,"depth":279,"text":76547},{"id":52472,"depth":19,"text":52473},"2021-03-02","This blog gives you a deep explanation of how to use different approaches to achieve isolation in Pulsar.","\u002Fimgs\u002Fblogs\u002F63c7fd87bc45dd18118c8061_63a39a7c7f64cadb1bc903bd_isolation-top.jpeg",{},{"title":58870,"description":76800},"blog\u002Fpulsar-isolation-depth-look-how-to-achieve-isolation-in-pulsar",[27847,821],"olME5mfb7Wye6IM1pELDwssYftjluFBBt14MkDSoq8g",{"id":76808,"title":76809,"authors":76810,"body":76811,"category":3550,"createdAt":290,"date":77147,"description":77148,"extension":8,"featured":294,"image":77149,"isDraft":294,"link":290,"meta":77150,"navigation":7,"order":296,"path":77151,"readingTime":11508,"relatedResources":290,"seo":77152,"stem":77153,"tags":77154,"__hash__":77155},"blogs\u002Fblog\u002Fstreamnatives-2020-year-in-review.md","StreamNative's 2020 Year in Review",[806],{"type":15,"value":76812,"toc":77135},[76813,76816,76819,76823,76826,76855,76859,76862,76876,76880,76883,76928,76932,76935,77006,77010,77013,77017,77020,77080,77084,77087,77098,77100,77103,77109,77111,77114],[48,76814,76815],{},"2020 was a difficult year as individuals, communities, and organizations around the world were faced with the challenges of the global pandemic. While many of the plans and expectations for 2020 had to be put on hold, we were impressed with the resilience of the Pulsar community and its commitment to finding new ways to connect and collaborate.",[48,76817,76818],{},"The challenges we faced last year have made us even more grateful for the Pulsar community and the growth and advancements we were able to achieve together. We'd like to take some time to share some of the highlights from 2020.",[40,76820,76822],{"id":76821},"events-recap","Events Recap",[48,76824,76825],{},"With in-person events on hold, we endeavored to connect and serve the community in the digital space. We held the first-ever (and second-ever) global Pulsar Summit, shared weekly project updates and monthly webinars with partners, launched a Pulsar training program, and much more. 
Below are some of the highlights from the 2020 events:",[1666,76827,76828,76835,76842,76848],{},[324,76829,76830],{},[55,76831,76834],{"href":76832,"rel":76833},"https:\u002F\u002Fpulsar-summit.org\u002Fen\u002Fevent\u002Fvirtual-conference-2020",[264],"Pulsar Summit Virtual Conference 2020 (North America)",[324,76836,76837],{},[55,76838,76841],{"href":76839,"rel":76840},"https:\u002F\u002Fpulsar-summit.org\u002Fen\u002Fevent\u002Fasia-2020",[264],"Pulsar Summit Asia 2020",[324,76843,76844],{},[55,76845,76847],{"href":76846},"\u002Fen\u002Fresource#tgip","TGIP (Thank Goodness It's Pulsar) Weekly Project Updates",[324,76849,76850],{},[55,76851,76854],{"href":76852,"rel":76853},"https:\u002F\u002Fwww.youtube.com\u002Fplaylist?list=PLqRma1oIkcWhfmUuJrMM5YIG8hjju62Ev",[264],"StreamNative Webinar Series",[40,76856,76858],{"id":76857},"pulsar-community-growth","Pulsar Community Growth",[48,76860,76861],{},"In spite of the challenges, the Apache Pulsar community continued to be engaged and witnessed strong growth in 2020. From new and returning contributors to the Pulsar project, to Slack member conversations, to the speaker, sponsor, and attendee participation at the Pulsar Summits, the Pulsar community showed its commitment to the project.",[321,76863,76864,76867,76870,76873],{},[324,76865,76866],{},"340+ Pulsar Contributors (with 150+ new contributors in 2020 alone)",[324,76868,76869],{},"7,000+ Github Stars",[324,76871,76872],{},"2,500+ Pulsar Slack Members",[324,76874,76875],{},"80 speakers and 1,600 attendee sign ups across 2 global conferences",[40,76877,76879],{"id":76878},"pulsar-project-announcements","Pulsar Project Announcements",[48,76881,76882],{},"We had two major project releases (Apache Pulsar 2.6 and 2.7) and made some major additions to the Pulsar ecosystem. Most excitingly, we launched StreamNative Cloud, providing Apache Pulsar as a Service, a turnkey solution to help organizations get up and running on Pulsar and to ensure their success. 
You can find out more about each below:",[321,76884,76885,76888,76893,76898,76903,76908,76915,76922],{},[324,76886,76887],{},"Pulsar Ecosystem Updates",[324,76889,76890],{},[55,76891,76892],{"href":74017},"Announcing StreamNative Hub",[324,76894,76895],{},[55,76896,76897],{"href":71126},"Announcing \"Kafka on Pulsar\" (KoP), Bringing the Native Apache Kafka Protocol Support to Apache Pulsar, Open-Sourced by StreamNative and OVHCloud Open-Sourced",[324,76899,76900],{},[55,76901,76902],{"href":71137},"Announcing \"AMQP on Pulsar\" (AoP), Open-Sourced by StreamNative and ChinaMobile",[324,76904,76905],{},[55,76906,76907],{"href":71149},"Announcing \"MQTT on Pulsar\" (MoP), Brings the Native MQTT Protocol Support to Apache Pulsar",[324,76909,76910,76914],{},[55,76911,76913],{"href":76912},"\u002Fen\u002Fblog\u002Frelease\u002F2020-06-18-pulsar-260","Apache Pulsar 2.6",", including support for large message sizes, namespace change events, and more.",[324,76916,76917,76921],{},[55,76918,76920],{"href":76919},"\u002Fen\u002Fblog\u002Frelease\u002F2020-12-25-pulsar-270","Apache Pulsar 2.7",", including transaction support, topic level policy, and more.",[324,76923,76924],{},[55,76925,76927],{"href":76926},"\u002Fen\u002Fblog\u002Frelease\u002F2020-08-18-announcing-streamnative-cloud","Announcing StreamNative Cloud - the Only Fully-Managed Apache Pulsar Offering",[40,76929,76931],{"id":76930},"top-engineering-blog-posts","Top Engineering Blog Posts",[48,76933,76934],{},"In 2020, we worked with the community to develop educational content to help developers and devops teams adopt and implement Pulsar. Below are the top 10 blogs and success stories from 2020:",[1666,76936,76937,76943,76949,76955,76958,76964,76970,76976,76982,76988,76994,77000],{},[324,76938,76939],{},[55,76940,76942],{"href":76941},"\u002Fen\u002Fsuccess-stories\u002Fiterable","How Apache Pulsar is Helping Iterable Scale its Customer Engagement Platform",[324,76944,76945],{},[55,76946,76948],{"href":76947},"\u002Fen\u002Fblog\u002Fcase\u002F2020-02-18-pulsar-help-tencent","Apache Pulsar Helps Tencent Process Tens of Billions of Financial Transactions Efficiently with Virtually No Data Loss",[324,76950,76951],{},[55,76952,76954],{"href":76953},"\u002Fen\u002Fblog\u002Ftech\u002F2020-11-09-benchmark-pulsar-kafka-performance","Benchmarking Pulsar and Kafka - A More Accurate Perspective on Pulsar's Performance",[324,76956,76957],{},"Pulsar vs Kafka Report",[324,76959,76960],{},[55,76961,76963],{"href":76962},"\u002Fen\u002Fblog\u002Ftech\u002F2020-07-08-pulsar-vs-kafka-part-1","Pulsar vs Kafka Report - Part 1",[324,76965,76966],{},[55,76967,76969],{"href":76968},"\u002Fen\u002Fblog\u002Ftech\u002F2020-07-22-pulsar-vs-kafka-part-2","Pulsar vs Kafka Report - Part 2",[324,76971,76972],{},[55,76973,76975],{"href":76974},"\u002Fen\u002Fsuccess-stories\u002Ftencent-angel","Powering Federated Learning at Tencent with Apache Pulsar",[324,76977,76978],{},[55,76979,76981],{"href":76980},"\u002Fen\u002Fblog\u002Fcase\u002F2020-05-08-tuya-iot","How Apache Pulsar Helps Streamline Message System and Reduces O&M Costs at Tuya Smart",[324,76983,76984],{},[55,76985,76987],{"href":76986},"\u002Fen\u002Fblog\u002Fcase\u002F2020-05-07-zhaopin-sql","Why Zhaopin Chooses Pulsar SQL for Search Log Analysis",[324,76989,76990],{},[55,76991,76993],{"href":76992},"\u002Fen\u002Fblog\u002Ftech\u002F2020-04-21-from-apache-kafka-to-apache-pulsar","Why we moved from Apache Kafka to Apache 
Pulsar",[324,76995,76996],{},[55,76997,76999],{"href":76998},"\u002Fen\u002Fblog\u002Frelease\u002F2020-12-24-pulsar-flink-connector-270","What's New in Pulsar Flink Connector 2.7.0",[324,77001,77002],{},[55,77003,77005],{"href":77004},"\u002Fen\u002Fblog\u002Fcommunity\u002F2020-03-17-announcing-the-apache-pulsar-2020-user-survey-report","Announcing: The Apache Pulsar 2020 User Survey Report",[40,77007,77009],{"id":77008},"looking-forward-to-2021-news-and-events","Looking Forward to 2021 News and Events",[48,77011,77012],{},"While the world continues to heal from the pandemic and looks ahead to a brighter future, we are excited to share some upcoming news and events. We hope you can join us!",[32,77014,77016],{"id":77015},"_2021-events","2021 Events",[48,77018,77019],{},"We are building on the momentum of last year and launching a number of new events to help support the Pulsar community. See below for details:",[321,77021,77022,77029,77036,77049,77052,77055,77061,77064,77072],{},[324,77023,77024,77025,190],{},"TGIP Pulsar Updates by StreamNative, held monthly. ",[55,77026,39858],{"href":77027,"rel":77028},"https:\u002F\u002Fus02web.zoom.us\u002Fwebinar\u002Fregister\u002F2216003858545\u002FWN_Qig1o1VlSVe6lk5omYb6zA?utm_source=hs_email&utm_medium=email&_hsenc=p2ANqtz-_A7AaXYbeBXZUoR7Ei9WroGvACEVW8aUh9iZkUT23fBJux14tK6_n3lKfIpZmhUm8NGCug",[264],[324,77030,77031,77032,190],{},"TGIP Pulsar Office Hours by StreamNative, held monthly. ",[55,77033,39858],{"href":77034,"rel":77035},"https:\u002F\u002Fus02web.zoom.us\u002Fwebinar\u002Fregister\u002F8916122493161\u002FWN_bOsEtkkXSYmBwQXnE5sCMg?utm_source=hs_email&utm_medium=email&_hsenc=p2ANqtz-_A7AaXYbeBXZUoR7Ei9WroGvACEVW8aUh9iZkUT23fBJux14tK6_n3lKfIpZmhUm8NGCug",[264],[324,77037,77038,77039,77044,77045,190],{},"Professional Training - Fundamental, Developers, and Operations, held quarterly. Upcoming Fundamentals, ",[55,77040,77043],{"href":77041,"rel":77042},"https:\u002F\u002Fwww.eventbrite.com\u002Fe\u002Fapache-pulsar-fundamentals-online-training-tickets-135059932895?utm_source=hs_email&utm_medium=email&_hsenc=p2ANqtz-_A7AaXYbeBXZUoR7Ei9WroGvACEVW8aUh9iZkUT23fBJux14tK6_n3lKfIpZmhUm8NGCug",[264],"sign up here",", Developers, ",[55,77046,77043],{"href":77047,"rel":77048},"https:\u002F\u002Fwww.eventbrite.com\u002Fe\u002Fdeveloping-pulsar-applications-online-training-tickets-134793983433?utm_source=hs_email&utm_medium=email&_hsenc=p2ANqtz-_A7AaXYbeBXZUoR7Ei9WroGvACEVW8aUh9iZkUT23fBJux14tK6_n3lKfIpZmhUm8NGCug",[264],[324,77050,77051],{},"Local Meetups: Coming up in March, Uber + StreamNative host this year's first Pulsar Bay Area meetup with speakers from Uber, Facebook, Splunk, and Ververica.",[324,77053,77054],{},"Three Global Pulsar Summits - North America, Europe, and Asia. North America CFP coming soon.",[324,77056,77057,77058,77060],{},"Pulsar Hackathon, May 6th and 7th. Details coming soon. (Contact ",[55,77059,75727],{"href":75726}," with any questions.)",[324,77062,77063],{},"StreamNative's monthly webinars continue!",[324,77065,77066,77067],{},"You can sign up for ",[55,77068,77071],{"href":77069,"rel":77070},"https:\u002F\u002Fus02web.zoom.us\u002Fwebinar\u002Fregister\u002F4316132664852\u002FWN_Z46DPNd3RKKB9q5uLzwB3Q",[264],"TiDB + Pulsar, Event Streaming Architecture in Action, on Tues, Feb. 
23rd",[324,77073,77074,77075],{},"You can view the January webinar, ",[55,77076,77079],{"href":77077,"rel":77078},"https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=C5Vzu1M7fYE",[264],"Watch Your Streams: Implementing OpenTelemetry with Apache Pulsar",[32,77081,77083],{"id":77082},"roadmap-and-announcements","Roadmap and Announcements",[48,77085,77086],{},"StreamNative will continue to broaden the capabilities of both Pulsar and StreamNative Cloud in 2021. Continue reading for a sneak peek at some of the exciting projects we will deliver this year:",[321,77088,77089,77092,77095],{},[324,77090,77091],{},"Pulsar + Flink: An Improved Flink integration and a managed Flink offering in StreamNative Cloud.",[324,77093,77094],{},"Function Mesh: Launching in Q1 2021, this will allow teams to manage a group of functions.",[324,77096,77097],{},"Expansion of StreamNative Cloud: Currently available on AWS, GCP, and Alibaba cloud, we will also launch on more cloud providers, including Microsoft Azure.",[40,77099,10248],{"id":10247},[48,77101,77102],{},"Founded by the original developers of Apache Pulsar, the StreamNative team is committed to the Pulsar community and to helping the community successfully adopt and deploy Pulsar. As the core developers of Pulsar, the StreamNative team is deeply versed in the technology, the community, and the use cases, and has experience operating Pulsar in large scale production environments, including at both Twitter and Yahoo!. The StreamNative team's unmatched operational experience on Pulsar and Bookkeeper is now available to you through StreamNative Cloud.",[48,77104,77105,77106,190],{},"We are hiring! ",[55,77107,77108],{"href":75759},"Join Our Team",[40,77110,69725],{"id":69724},[48,77112,77113],{},"There are a number of ways you can stay connected in the Pulsar Community:",[1666,77115,77116,77123,77129],{},[324,77117,77118,77119,190],{},"Subscribe to the Pulsar Newsletter by StreamNative, a monthly update of all things Pulsar and the Pulsar community. Join the ",[55,77120,77122],{"href":34070,"rel":77121},[264],"mailing list here",[324,77124,55539,77125,190],{},[55,77126,77128],{"href":57760,"rel":77127},[264],"Pulsar Slack Channel here",[324,77130,77131,77134],{},[55,77132,22668],{"href":55797,"rel":77133},[264]," on StreamNative Cloud!",{"title":18,"searchDepth":19,"depth":19,"links":77136},[77137,77138,77139,77140,77141,77145,77146],{"id":76821,"depth":19,"text":76822},{"id":76857,"depth":19,"text":76858},{"id":76878,"depth":19,"text":76879},{"id":76930,"depth":19,"text":76931},{"id":77008,"depth":19,"text":77009,"children":77142},[77143,77144],{"id":77015,"depth":279,"text":77016},{"id":77082,"depth":279,"text":77083},{"id":10247,"depth":19,"text":10248},{"id":69724,"depth":19,"text":69725},"2021-02-15","Despite the challenges presented by 2020, the Apache Pulsar community and StreamNative discovered new ways to connect, engage, and help Pulsar continue to grow. In this post, Sijie Guo shares the highlights from 2020, including key events, community growth, project updates, and more! 
He also shares his insights on the product roadmap and upcoming events for 2021.","\u002Fimgs\u002Fblogs\u002F63c7fda7ebac450a632f84f5_63a399933290f687a0332ece_2020-year-in-review-top.png",{},"\u002Fblog\u002Fstreamnatives-2020-year-in-review",{"title":76809,"description":77148},"blog\u002Fstreamnatives-2020-year-in-review",[821],"WfM2OgDJVv8PFx56esAM-NV7tmbW-eimTb0udoA3PAE",{"id":77157,"title":77158,"authors":77159,"body":77161,"category":821,"createdAt":290,"date":77346,"description":77347,"extension":8,"featured":294,"image":290,"isDraft":294,"link":290,"meta":77348,"navigation":7,"order":296,"path":77349,"readingTime":11508,"relatedResources":290,"seo":77350,"stem":77351,"tags":77352,"__hash__":77353},"blogs\u002Fblog\u002Fmigrate-to-serverless-with-pulsar-functions.md","Migrate to Serverless with Pulsar Functions",[77160],"Axel Sirota",{"type":15,"value":77162,"toc":77337},[77163,77169,77172,77176,77179,77190,77193,77196,77199,77202,77206,77209,77212,77215,77219,77222,77228,77231,77237,77240,77244,77247,77250,77253,77259,77263,77266,77269,77272,77275,77279,77282,77288,77296,77299,77310,77313,77320,77322,77331,77334],[48,77164,77165],{},[384,77166],{"alt":77167,"src":77168},"illustration with pulsar logo","\u002Fimgs\u002Fblogs\u002F63a398387f64caacd0c62645_migrating-to-serverless-with-pulsar-functions-top.jpeg",[48,77170,77171],{},"Pulsar Functions, introduced in Pulsar 2.0, provide a smooth path enabling Pulsar users to migrate easily to serverless technology. In this article, I discuss what Pulsar Functions are and how to develop them. I also provide a checklist of important points to consider when migrating your existing application onto this exciting new technology.",[40,77173,77175],{"id":77174},"a-simple-scenario","A Simple Scenario",[48,77177,77178],{},"First, let’s start with a use case. Assume that we run an e-commerce company. A critical element of our business is processing invoices for payment. In Pulsar, this is a three-step process:",[321,77180,77181,77184,77187],{},[324,77182,77183],{},"Import the invoices into an Order topic",[324,77185,77186],{},"Execute some code to split the invoice values by comma into individual fields",[324,77188,77189],{},"Insert the invoice values into PostgreSQL",[48,77191,77192],{},"Today, we’re going to focus on that second bullet point. Usually, the code we execute is either a serverless function we create in AWS Lambda or a full-fledged microservice. But there are significant drawbacks to this approach.",[48,77194,77195],{},"First, we’re developing a full service for what is fundamentally a small and simple piece of code. The complexity this introduces can require up to two weeks of work to implement correctly.",[48,77197,77198],{},"Second, it’s hard to maintain over time as the schema of our source data changes. This requires a full versioning and re-deployment of our service and the underlying PostGreSQL tables - a task that can require a full day of work (or more).",[48,77200,77201],{},"And third, our AWS Lambda function needs to authenticate into and out of Pulsar. This negatively impacts performance, as Pulsar has to call a Lambda function that then itself has to authenticate into Pulsar. Lambda functions, in other words, introduce a lot of unnecessary round-tripping.",[40,77203,77205],{"id":77204},"introducing-pulsar-functions","Introducing Pulsar Functions",[48,77207,77208],{},"Pulsar Functions are lightweight computing processes that process data between topics. 
Since they run in Pulsar, they eliminate the need to deploy a separate microservice. This not only saves time - it also simplifies troubleshooting.",[48,77210,77211],{},"Pulsar Functions can be simple or complex. Besides transforming and moving data from one topic to another, we can send data to multiple topics, engage in complex routing, and batch requests.",[48,77213,77214],{},"Pulsar Functions are also easy to debug. A Function can be deployed in debugging mode, which enables us to connect to the code and debug it as it’s executing in real-time.",[40,77216,77218],{"id":77217},"developing-pulsar-functions","Developing Pulsar Functions",[48,77220,77221],{},"Creating a Pulsar Function is as easy as implementing a Pulsar Functions subclass in your preferred programming language. In the example below, I’m using Java, but you can also write Pulsar Functions in Python and Go.",[8325,77223,77226],{"className":77224,"code":77225,"language":8330},[8328],"\npublic class SplitFunction implements Function> {\n    @Override\n    public List apply(String input) {\n        return Arrays.asList(input.split(\",\"));\n    }\n}\n \n",[4926,77227,77225],{"__ignoreMap":18},[48,77229,77230],{},"Once you’ve compiled and packaged your code, you can deploy it to your Pulsar instance using the functions create command. The command takes a handful of parameters: our packaged code, as well as the input and output topics for the function.",[8325,77232,77235],{"className":77233,"code":77234,"language":8330},[8328],"\nbin\u002Fpulsar-admin functions create --jar target\u002Fsplit.jar --classname demo.SplitFunction --input input-topic --output output-topic\n\n",[4926,77236,77234],{"__ignoreMap":18},[48,77238,77239],{},"Developing and deploying our Pulsar Function is at most two days of work. This drastically simplifies our workload and reduces our time to release. We can deploy any number of Pulsar Functions that take data from multiple topics and send them to other topics. We can also easily write status messages to the Pulsar logs. Using Pulsar Functions, our Pulsar deployment becomes increasingly more flexible and capable.",[40,77241,77243],{"id":77242},"developing-a-fully-fledged-pulsar-function","Developing a Fully-Fledged Pulsar Function",[48,77245,77246],{},"How do we take advantage of all of the rich functionality of Pulsar Functions that I described earlier?",[48,77248,77249],{},"Developing a fully-fledged Pulsar Function is as easy as implementing the Function interface in our class. We then implement a single method, called Process(). Process() gives us a context object that acts as our gateway into Pulsar. Using the context, we can access the logger, trace our output, and send messages to topics, among other tasks.",[48,77251,77252],{},"In the code below, you can see how we use Pulsar Functions to take our data input and extract the price of the invoice from it. We then use the context object to send this data to another output topic. (If we wanted to send the data to the output topic we specified when we deployed our Function, we’d just return it as the return value of the function. 
Here, I elected to send the data to a different topic showing how we can use Pulsar Functions for Routing and just return null from the Function.)",[8325,77254,77257],{"className":77255,"code":77256,"language":8330},[8328],"\nimport org.apache.pulsar.functions.api.Function;\n\npublic class RoutingFunction implements Function {\n    @Override\n    public Void process(String input, Context context) throws Exception {\n        Logger LOG = context.getLogger();\n        LOG.info(String.format(\"Got this input: %s\", input));\n        \n        Price inputPrice  = new Price(input);\n        String topic = String.format(\"year-%s\", inputPrice.getYear());\n        \n        context.newOutputMessage(topic, Schema.STRING).value(inputPrice.getPrice()).send();\n\n        \u002F\u002F We could also return some object here and it would be sent to the \n        \u002F\u002F output topic set during function submission\n        return null;    \n    }\n}\n \n",[4926,77258,77256],{"__ignoreMap":18},[40,77260,77262],{"id":77261},"more-affordable-than-aws-lambda","More Affordable Than AWS Lambda",[48,77264,77265],{},"You may be wondering why we would use Pulsar Functions when AWS Lambda can already do this for us. As I stated above, Pulsar Functions have multiple advantages over AWS Lambda, including ease of debugging and the elimination of round-trip authentication between Pulsar and Lambda.",[48,77267,77268],{},"But let’s also consider the cost of using AWS Lambda by looking at a common use case: implementing a real-time bidding system for online auctions. Let’s assume 10k bids\u002Fsecond (26 billion requests\u002Fmonth) comes out to $5k but that doesn't include compute hours, just the request charges. Assuming each request takes 100 ms and a 2048 GB VM, that would be $86k in compute charges. This doesn’t even include AWS data transfer costs!",[48,77270,77271],{},"AWS Lambda is an excellent tool for serverless functions. But it’s only excellent for small-scale use cases. The cost of Lambda makes it prohibitively expensive for any data pipeline handling billions of transactions.",[48,77273,77274],{},"Moving to Pulsar Functions can generate tremendous cost savings. When I arrived at my own company, JAMPP, the team was using Lambda exclusively and paying upwards of $30,000\u002Fmonth for just a small part of our pipeline. When we moved off of AWS Lambda and onto Pulsar Functions, our cost dropped to a couple hundred dollars per month - basically, the cost of hosting Pulsar on Amazon EC2 instances.",[40,77276,77278],{"id":77277},"migrating-to-pulsar-functions","Migrating to Pulsar Functions",[48,77280,77281],{},"So let’s look at our revised architecture. In our use case, you’ll remember that we had a Java function in AWS Lambda processing data between topics. Pulsar Functions takes the place of Lambda in our architecture, simplifying both development and deployment.",[48,77283,77284],{},[384,77285],{"alt":77286,"src":77287},"slide:\"A Whole New World\"","\u002Fimgs\u002Fblogs\u002F63a3989bbfc1d24941de3eb0_migrating-to-serverless-with-pulsar-functions-slide.jpeg",[48,77289,77290,77291,77295],{},"Once we have deployed our Pulsar Functions, the only thing we have left to do is to create import and dump scripts for our data. Pulsar simplifies this process with ",[55,77292,20384],{"href":77293,"rel":77294},"https:\u002F\u002Fpulsar.apache.org\u002Fdocs\u002Fen\u002F2.3.1\u002Fio-overview\u002F",[264],". Pulsar IO enables us to define external data sources and sinks easily within Pulsar. 
Pulsar IO sources and sinks are themselves implemented as Pulsar Functions, which means we can create our own custom sources and sinks that we can easily debug within Pulsar.",[48,77297,77298],{},"So our migration path to Pulsar Functions is:",[321,77300,77301,77304,77307],{},[324,77302,77303],{},"Migrate all processing logic into one or more Pulsar Functions",[324,77305,77306],{},"Switch I\u002FO logic to using Pulsar IO sources and sinks",[324,77308,77309],{},"Use log topics for logging data",[48,77311,77312],{},"And that’s it! At that point, you’ve fully migrated to a serverless application running completely within Pulsar.",[48,77314,77315,77316,77319],{},"What if you’re currently running on Kafka? No problem - ",[55,77317,70645],{"href":29592,"rel":77318},[264]," enables a no-code transition from Kafka to Pulsar.",[40,77321,2125],{"id":2122},[48,77323,77324,77325,77330],{},"I’ve only touched the surface of what you can do with Pulsar Functions. Besides the features I discussed here, new and exciting additions are being developed even as we speak. For example, StreamNative recently announced ",[55,77326,77329],{"href":77327,"rel":77328},"https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=VGtFz0mWKfY",[264],"Pulsar Function Mesh",", which enables you to deploy a cluster of Pulsar Function services in a coordinated manner.",[48,77332,77333],{},"For now, I hope I’ve shown you how easy it is to develop Pulsar Functions and migrate your application to a serverless application running on Pulsar.",[48,77335,77336],{},"Happy migrating!",{"title":18,"searchDepth":19,"depth":19,"links":77338},[77339,77340,77341,77342,77343,77344,77345],{"id":77174,"depth":19,"text":77175},{"id":77204,"depth":19,"text":77205},{"id":77217,"depth":19,"text":77218},{"id":77242,"depth":19,"text":77243},{"id":77261,"depth":19,"text":77262},{"id":77277,"depth":19,"text":77278},{"id":2122,"depth":19,"text":2125},"2021-02-10","Axel Sirota explores what Pulsar Functions are, why they can leverage your needs, and how we can develop them easily to migrate full applications and ETLs into serverless within Apache Pulsar.",{},"\u002Fblog\u002Fmigrate-to-serverless-with-pulsar-functions",{"title":77158,"description":77347},"blog\u002Fmigrate-to-serverless-with-pulsar-functions",[9636,821,4839,32622],"4j0f3yMr7e-tpwV-QmaZVkIhJ2vXUjiGTf9uip_Th2w",{"id":77355,"title":77356,"authors":77357,"body":77358,"category":7338,"createdAt":290,"date":77492,"description":77493,"extension":8,"featured":294,"image":77494,"isDraft":294,"link":290,"meta":77495,"navigation":7,"order":296,"path":77496,"readingTime":11180,"relatedResources":290,"seo":77497,"stem":77498,"tags":77499,"__hash__":77500},"blogs\u002Fblog\u002Fpulsar-virtual-summit-north-america-2021-cfp-open-now.md","Pulsar Virtual Summit North America 2021: CFP Open Now!",[69353],{"type":15,"value":77359,"toc":77481},[77360,77363,77366,77369,77373,77376,77379,77393,77400,77402,77418,77420,77434,77438,77442,77450,77452,77458,77464,77467,77469,77476,77478],[48,77361,77362],{},"Pulsar Summit is the conference dedicated to Apache Pulsar, and the messaging and event streaming community. The conference gathers an international audience of CTOs\u002FCIOs, developers, data architects, data scientists, Apache Pulsar committers\u002Fcontributors, and the messaging and streaming community. 
Together, they share experiences, exchange ideas and knowledge, and receive hands-on training sessions led by Pulsar experts.",[48,77364,77365],{},"Last year, Pulsar Summit Virtual Conference and Pulsar Summit Asia featured 80 interactive sessions by tech leads, open-source developers, software engineers, and software architects from Salesforce, Splunk, Verizon Media, Iterable, Yahoo! JAPAN, TIBCO, OVHcloud, Clever Cloud, and more. The conferences garnered 1,600 attendees around the globe, including attendees from top tech, fintech and media companies, such as Google, Microsoft, AMEX, Salesforce, Disney, and Paypal.",[48,77367,77368],{},"This year, the summit will be hosted virtually on June 16th-17th. The speaker committee includes top Pulsar experts, such as Apache Pulsar PMC members Sijie Guo, CEO of StreamNative, Matteo Merli and Sanjeev Kulkarni from Splunk, and Joe Francis from Verizon. Additionally, Fabian Hueske from Apache Flink, Addison Higham from StreamNative, Ben Lorica from The Data Exchange Media, and Jesse Anderson from Big Data Institute will be participating.",[40,77370,77372],{"id":77371},"join-us-and-speak-at-the-pulsar-virtual-summit","Join Us and Speak at the Pulsar Virtual Summit!",[48,77374,77375],{},"Do you have a Pulsar story to share? Join us and speak at the summit! You will be on stage with all the top Pulsar thought-leaders. It is a great way to participate and raise your profile in the rapidly growing Apache Pulsar community.",[48,77377,77378],{},"We are looking for Pulsar stories that are innovative, informative, or thought-provoking. Here are some suggestions for what to talk about:",[321,77380,77381,77383,77385,77387,77390],{},[324,77382,48333],{},[324,77384,48336],{},[324,77386,48339],{},[324,77388,77389],{},"A Pulsar success story",[324,77391,77392],{},"Anything that inspires the audience!",[48,77394,69661,77395,77399],{},[55,77396,56336],{"href":77397,"rel":77398},"https:\u002F\u002Fsessionize.com\u002Fpulsar-summit-north-america-2021",[264]," about your presentation. Remember to keep your proposal short, relevant and engaging.",[32,77401,39751],{"id":39750},[321,77403,77404,77407,77410,77412,77415],{},[324,77405,77406],{},"The chance to demonstrate your experience and deep knowledge in the rapidly growing event streaming space.",[324,77408,77409],{},"Your name, title, company, and bio will be featured on the Pulsar Virtual Summit North America 2021 website.",[324,77411,69684],{},[324,77413,77414],{},"A professionally produced video of your presentation.",[324,77416,77417],{},"Exclusive Pulsar swag only available to the speakers.",[32,77419,39793],{"id":39792},[321,77421,77422,77425,77428,77431],{},[324,77423,77424],{},"CFP opens: Feb 18th, 2021",[324,77426,77427],{},"CFP closes: Mar 26th, 2021",[324,77429,77430],{},"Speaker notifications sent: April 2nd, 2021",[324,77432,77433],{},"Schedule announcement: April 9th, 2021",[48,77435,69703,77436,39815],{},[55,77437,39814],{"href":39813},[40,77439,77441],{"id":77440},"register-for-the-summit","Register for the Summit",[48,77443,77444,77445,77449],{},"If you are interested in attending Pulsar Virtual Summit North America 2021, please ",[55,77446,77448],{"href":70535,"rel":77447},[264],"sign up in Hopin",". Once you are registered, we will keep you updated on the summit.",[40,77451,56379],{"id":56378},[48,77453,77454,77455,38617],{},"Pulsar Summit is a conference for the community and your support is needed. 
Sponsoring this event will provide a great opportunity for your organization to further engage with the Apache Pulsar community. ",[55,77456,38404],{"href":77457},"mailto:partners@pulsar-summit.org",[48,77459,77460,77461,69721],{},"Help us make #PulsarSummit NA 2021 a big success by spreading the word and submitting your proposal! Follow us on ",[55,77462,39691],{"href":39821,"rel":77463},[264],[48,77465,77466],{},"See you at Pulsar Virtual Summit North America 2021!",[32,77468,39828],{"id":39827},[48,77470,77471,77472],{},"Apache Pulsar is a cloud-native, distributed messaging and streaming platform that manages hundreds of billions of events per day. Since Pulsar was contributed to open source by Yahoo in 2016 and became a top-level Apache Software Foundation project in 2018, its community has witnessed incredible growth. ",[55,77473,77475],{"href":77474},"\u002Fen\u002Fblog\u002Fcommunity\u002F2021-02-16-streamnative-2020-year-in-review","In 2020, Pulsar had two releases and several ecosystem updates. As of today, Pulsar has gained 340+ contributors (with 150+ new contributors in 2020 alone), 7,000+ Github stars, and 2,500 Pulsar Slack members.",[32,77477,10248],{"id":10247},[48,77479,77480],{},"StreamNative is the organizer of Pulsar Summit North America 2021. Founded by the original developers of Apache Pulsar, the StreamNative team is committed to the Pulsar community and to helping the community successfully adopt and deploy Pulsar. As the core developers of Pulsar, the StreamNative team is deeply versed in the technology, the community, and the use cases, and has experience operating Pulsar in large scale production environments, including at both Twitter and Yahoo!. The StreamNative team's unmatched operational experience on Pulsar and Bookkeeper is now available to you through StreamNative Cloud.",{"title":18,"searchDepth":19,"depth":19,"links":77482},[77483,77487,77488],{"id":77371,"depth":19,"text":77372,"children":77484},[77485,77486],{"id":39750,"depth":279,"text":39751},{"id":39792,"depth":279,"text":39793},{"id":77440,"depth":19,"text":77441},{"id":56378,"depth":19,"text":56379,"children":77489},[77490,77491],{"id":39827,"depth":279,"text":39828},{"id":10247,"depth":279,"text":10248},"2021-02-03","CFP and attendee registration are now open. 
Get involved!","\u002Fimgs\u002Fblogs\u002F63c7fd983854157780069f1e_63a39a14bfc1d206ebdf4883_top.jpeg",{},"\u002Fblog\u002Fpulsar-virtual-summit-north-america-2021-cfp-open-now",{"title":77356,"description":77493},"blog\u002Fpulsar-virtual-summit-north-america-2021-cfp-open-now",[5376,821],"AIKeDfXJjYe7O7x2yP0_tWlw-M0Y_1BJbQbfXa-mzxQ",{"id":77502,"title":77503,"authors":77504,"body":77505,"category":3550,"createdAt":290,"date":77581,"description":77515,"extension":8,"featured":294,"image":290,"isDraft":294,"link":290,"meta":77582,"navigation":7,"order":296,"path":77583,"readingTime":11180,"relatedResources":290,"seo":77584,"stem":77585,"tags":77586,"__hash__":77587},"blogs\u002Fblog\u002Fstreamnative-launches-pulsar-as-a-service-on-aws.md","StreamNative Launches Pulsar-as-a-Service on AWS",[69353,60441],{"type":15,"value":77506,"toc":77575},[77507,77513,77516,77519,77527,77530,77533,77537,77540,77543,77547,77550,77554,77557,77560,77562],[48,77508,77509],{},[384,77510],{"alt":77511,"src":77512},"image announcing streamnative cloud and aws","\u002Fimgs\u002Fblogs\u002F63a397d453018f59ea5d84ee_aws-top.png",[48,77514,77515],{},"StreamNative, a cloud-native, real-time data platform powered by Apache Pulsar, just announced its fully managed Pulsar®-as-a-Service offering, StreamNative Cloud, is now available on AWS.",[48,77517,77518],{},"StreamNative Cloud provides a scalable, resilient, and secure messaging and event streaming platform for enterprises. The company offers both Cloud-Hosted and Cloud-Managed options to help organizations accelerate application development and improve time-to-market, details below:",[321,77520,77521,77524],{},[324,77522,77523],{},"Cloud-Hosted provides the ability to spin up a StreamNative-hosted Pulsar cluster on a cloud provider of your choice within minutes.",[324,77525,77526],{},"Cloud-Managed offers a fully managed Pulsar cluster deployable to a public or private cloud environment, fully customized to meet user needs.",[48,77528,77529],{},"StreamNative Cloud has witnessed rapid adoption since its initial release, with customers utilizing StreamNative Cloud to power applications for a broad set of use cases, such as order and delivery tracking, customer communication, and powering real-time data lakes.",[48,77531,77532],{},"The expansion of StreamNative Cloud Hosted to AWS will make the offering available to more users and enable AWS cloud customers to more deeply integrate Pulsar with the broader AWS offerings.",[40,77534,77536],{"id":77535},"streamnative-cloud-on-aws-pulsar-flink","StreamNative Cloud on AWS: Pulsar + Flink",[48,77538,77539],{},"The launch on AWS provides not only another cloud option for customers, it also enables customers to integrate Pulsar with AWS-managed offerings, such as Kinesis Data Analytics for Flink and Flink on Amazon EMR.",[48,77541,77542],{},"Leveraging Flink and Pulsar together, companies are able to create a unified data architecture for real-time data-driven businesses. Flink unifies batch and stream processing into a single computing engine with “streams” as the unified data representation. Pulsar, together with BookKeeper, allows organizations to store data as one copy, or source-of-truth, that can be accessed in streams, via pub-sub interfaces, and segments, for batch processing. 
With StreamNative Cloud now available on AWS, customers will be able to seamlessly integrate Pulsar with AWS’s managed-Flink offering.",[40,77544,77546],{"id":77545},"new-features-in-streamnative-cloud","New Features in StreamNative Cloud",[48,77548,77549],{},"We are also excited to announce new features that are coming soon in StreamNative Cloud: 1. Tiered storage allows users to offload data to external, cost-effective storage, enabling infinite stream retention that readily scales with the volume of data and without any change to APIs or performance. 2. Pulsar Functions bring serverless computation to event streaming by providing an easy-to-use interface. It allows users to transform, filter, and route data with user-provided code that runs inside your Pulsar cluster.",[40,77551,77553],{"id":77552},"why-streamnative-cloud","Why StreamNative Cloud?",[48,77555,77556],{},"StreamNative Cloud is built and operated by the original developers of Apache Pulsar and Apache BookKeeper. The team has experience operating Pulsar in large scale production environments, including at both Twitter and Yahoo!.",[48,77558,77559],{},"Today, the StreamNative team plays an active role in Pulsar’s roadmap and community. Organizations choose StreamNative Cloud not just for its powerful and easy-to-use platform, but also for the team's unmatched operational experience on Pulsar.",[40,77561,3880],{"id":3877},[321,77563,77564,77570],{},[324,77565,77566,77567],{},"Want to try StreamNative Cloud for free? ",[55,77568,39858],{"href":17075,"rel":77569},[264],[324,77571,44517,77572],{},[55,77573,39858],{"href":44520,"rel":77574},[264],{"title":18,"searchDepth":19,"depth":19,"links":77576},[77577,77578,77579,77580],{"id":77535,"depth":19,"text":77536},{"id":77545,"depth":19,"text":77546},{"id":77552,"depth":19,"text":77553},{"id":3877,"depth":19,"text":3880},"2021-01-21",{},"\u002Fblog\u002Fstreamnative-launches-pulsar-as-a-service-on-aws",{"title":77503,"description":77515},"blog\u002Fstreamnative-launches-pulsar-as-a-service-on-aws",[302,3550,821,303],"1JO0jvp8DMS7c_w6dREedC_xMXv7gFFdSOkCtUAvWq4",{"id":77589,"title":77590,"authors":77591,"body":77593,"category":821,"createdAt":290,"date":78062,"description":78063,"extension":8,"featured":294,"image":78064,"isDraft":294,"link":290,"meta":78065,"navigation":7,"order":296,"path":78066,"readingTime":31039,"relatedResources":290,"seo":78067,"stem":78068,"tags":78069,"__hash__":78070},"blogs\u002Fblog\u002Ftaking-a-deep-dive-into-apache-pulsar-architecture-for-performance-tuning.md","Taking a Deep-Dive into Apache Pulsar Architecture for Performance Tuning",[808,77592],"Devin Bost",{"type":15,"value":77594,"toc":78043},[77595,77598,77601,77605,77608,77612,77619,77622,77625,77629,77632,77638,77640,77644,77647,77651,77654,77658,77661,77665,77668,77671,77675,77683,77686,77692,77695,77698,77704,77708,77711,77715,77723,77727,77730,77733,77739,77742,77745,77749,77752,77755,77761,77765,77768,77771,77777,77780,77784,77787,77790,77796,77800,77803,77807,77810,77813,77824,77830,77834,77837,77840,77844,77847,77850,77854,77857,77863,77866,77872,77875,77878,77884,77887,77890,77896,77899,77902,77905,77911,77915,77918,77922,77925,77930,77934,77937,77943,77945,77948,77953,77956,77959,77962,77967,77973,77977,77980,77988,77992,77995,78009,78013,78039,78041],[48,77596,77597],{},"When we talk about Apache Pulsar’s performance, we are usually referring to the throughput and latency associated with message writes and reads. 
Pulsar has certain configuration parameters that allow you to control how the system handles message writes or reads. To effectively tune Pulsar clusters for optimal performance, you need to understand Pulsar’s architecture and its storage layer, Apache BookKeeper.",[48,77599,77600],{},"In this blog, we explain some basic concepts and how messages are sent (produced) and received (consumed) in Apache Pulsar. You will learn key components, data flow and key metrics to monitor for Pulsar performance tuning.",[40,77602,77604],{"id":77603},"_1-apache-pulsar-basic-concepts","1. Apache Pulsar Basic Concepts",[48,77606,77607],{},"The basic concepts and terminology explained in this section are key to understanding how Apache Pulsar works.",[32,77609,77611],{"id":77610},"_11-message","1.1 Message",[48,77613,77614,77615,190],{},"The basic unit of data in Pulsar is called a message. Producers send messages to brokers, and brokers send messages to consumers using flow control. For an in-depth discussion of Pulsar’s flow command, click ",[55,77616,267],{"href":77617,"rel":77618},"http:\u002F\u002Fpulsar.apache.org\u002Fdocs\u002Fen\u002Fdevelop-binary-protocol\u002F#flow-control",[264],[48,77620,77621],{},"Messages contain the data written to the topic by the producer, along with some important metadata.",[48,77623,77624],{},"In Pulsar, a message can be either of two types: a batch message or a single message. A batch message is a sequence of single messages. (See Section 1.5.1 below for more detailed information about batch messages.)",[32,77626,77628],{"id":77627},"_12-topic","1.2 Topic",[48,77630,77631],{},"A topic is a category or feed name to which messages are published (produced). Topics in Pulsar can have multiple producers and\u002For consumers. Producers write messages to the topic, and consumers consume messages from the topic. Figure 1 shows how they work together.",[48,77633,77634],{},[384,77635],{"alt":77636,"src":77637},"Figure 1. How Producers and Consumers Work on Topics","\u002Fimgs\u002Fblogs\u002F63be7241482366a21e2f5e03_1.png",[48,77639,3931],{},[32,77641,77643],{"id":77642},"_13-bookie","1.3 Bookie",[48,77645,77646],{},"Apache Pulsar uses Apache BookKeeper as its storage layer. Apache BookKeeper is a scalable, fault-tolerant, and low-latency storage service optimized for real-time workloads. Messages published by clients are stored in a server instance of Bookkeeper, which is called a bookie.",[3933,77648,77650],{"id":77649},"_131-entry-and-ledger","1.3.1 Entry and Ledger",[48,77652,77653],{},"Entry and ledger are basic terms used within BookKeeper. An entry contains the data written to the ledger, along with some important metadata. A ledger is the basic unit of storage in BookKeeper. A ledger is a sequence of entries. Entries are written to a ledger sequentially.",[3933,77655,77657],{"id":77656},"_132-journal","1.3.2 Journal",[48,77659,77660],{},"A journal file contains BookKeeper transaction logs. Before a ledger update takes place, the bookie ensures that a transaction describing the update, called a transaction log entry, is written to non-volatile storage. A new journal file is created when a bookie is first started, or when the older journal file reaches the specified journal file size threshold.",[3933,77662,77664],{"id":77663},"_133-entry-log","1.3.3 Entry Log",[48,77666,77667],{},"An entry log file manages the written entries received from BookKeeper clients. 
Entries from different ledgers are aggregated and written sequentially, while their offsets are kept as pointers in a ledger cache for fast lookup.",[48,77669,77670],{},"A new entry log file is created when the bookie is started,​ o​ r when the older entry log file reaches the specified entry log size threshold. The Garbage Collector Thread removes old entry log files when they are no longer associated with any active ledgers.",[3933,77672,77674],{"id":77673},"_134-index-db","1.3.4 Index DB",[48,77676,77677,77678,77682],{},"A bookie uses RocksDB as the entry index DB. RocksDB is a high-performance, embeddable, persistent, key-value store based on log-structured merge (LSM) trees. Understanding the mechanics of an LSM tree will provide additional insights into the mechanics of Bookkeeper. More information about the design of the LSM tree is available in its original paper which can be found at ",[55,77679,267],{"href":77680,"rel":77681},"https:\u002F\u002Fwww.cs.umb.edu\u002F~poneil\u002Flsmtree.pdf",[264],"。",[48,77684,77685],{},"When a BookKeeper client writes an entry to a ledger, the bookie writes the entry to the journal and sends a response to a client after the journal is written. A background thread writes the entry to an entry log. When the bookie’s background thread flushes data to the entry log, the index is simultaneously updated. This process is illustrated in Figure 2.",[48,77687,77688],{},[384,77689],{"alt":77690,"src":77691},"Figure 3. Ledgers and Cursors Within a Managed Ledger Associated with a Topic","\u002Fimgs\u002Fblogs\u002F63be7241d7bf17ba3226490a_3.png",[48,77693,77694],{},"The cursor uses the ledger to store the mark delete position of a subscription. The mark delete position is similar to an offset in Apache Kafka®, but it is more than a simple offset because Pulsar supports multiple subscription modes.",[48,77696,77697],{},"A managed ledger has many ledgers, so how does the managed ledger decide whether to start a new ledger? If a ledger is too large, data recovery time increases. If a ledger is too small, the ledger must switch more frequently, and the managed ledger calls upon Meta Store more often to update the metadata in the managed ledger. The ledger rollover policy for the managed ledger determines how frequently a new ledger is created. You use the following Pulsar parameters to control ledger behavior in broker.conf:",[48,77699,77700],{},[384,77701],{"alt":77702,"src":77703},"Pulsar configuration parameters to control ledger behavior in broker. ","\u002Fimgs\u002Fblogs\u002F63be72f0cace4e29996f4386_Screenshot-2023-01-11-at-09.27.18.png",[3933,77705,77707],{"id":77706},"_142-managed-ledger-cache","1.4.2 Managed Ledger Cache",[48,77709,77710],{},"Managed ledger cache is a type of cache memory used for storing tailing messages across topics.​ ​For tailing reads, consumers read the data from the serving broker. Because the broker already has the data cached in memory, there’s no need to read from disk or compete for resources with writes.",[32,77712,77714],{"id":77713},"_15-client","1.5 Client",[48,77716,77717,77718,77722],{},"Users utilize Pulsar clients to create producers (which publish messages to topics) or consumers (which consume messages from topics). There are many Pulsar client libraries available. 
For more details, visit ",[55,77719,267],{"href":77720,"rel":77721},"http:\u002F\u002Fpulsar.apache.org\u002Fdocs\u002Fen\u002Fclient-libraries\u002F",[264],".​",[3933,77724,77726],{"id":77725},"_151-batch-message","1.5.1 Batch Message",[48,77728,77729],{},"A batch message consists of a set of single messages that are assumed to represent a single contiguous sequence. Using a batch can reduce the overhead on both the client and server sides. Messages are grouped into small batches to achieve some of the performance advantages of batch processing without increasing the latency for each task too much.",[48,77731,77732],{},"In Pulsar, when using batch processing, producers send the batch to the broker. After the batch reaches the broker, the broker coordinates with the bookie,​ ​which then stores the batch in BookKeeper. When the consumer reads messages from the broker, the broker also dispatches the batch to the consumer. So, both combining batches and splitting batches occurs in the client. The sample code below shows how to enable and configure message batching for a producer:",[8325,77734,77737],{"className":77735,"code":77736,"language":8330},[8328],"client.newProducer()\n    .topic(“topic-name”)\n    .enableBatching(true)\n    .batchingMaxPublishDelay(2, TimeUnit.MILLISECONDS) .batchingMaxMessages(100)\n    .batchingMaxBytes(1024 * 1024) \n    .create();\n",[4926,77738,77736],{"__ignoreMap":18},[48,77740,77741],{},"In this example, the producer flushes the batch when the size of the batch exceeds 100 messages or 1MB of data.​ If these parameters are not met within two milliseconds, the producer will trigger batch flushing.",[48,77743,77744],{},"Therefore, your parameter settings will depend on message throughput and whatever publish latency you deem acceptable when publishing messages.",[3933,77746,77748],{"id":77747},"_152-message-compression","1.5.2 Message Compression",[48,77750,77751],{},"Message compression can reduce message size by paying some CPU overhead. The Pulsar client supports multiple compression types, such as lz4, zlib, zstd, and snappy. Compression types are stored in the message metadata, so consumers can adopt different compression types automatically, as needed.",[48,77753,77754],{},"When you enable message batching, the Pulsar client provides improved compression by reducing the size of the batch. The sample code below shows how to enable compression type for a producer:",[8325,77756,77759],{"className":77757,"code":77758,"language":8330},[8328],"client.newProducer()\n    .topic(“topic-name”) \n    .compressionType(CompressionType.LZ4) \n    .create();\n",[4926,77760,77758],{"__ignoreMap":18},[3933,77762,77764],{"id":77763},"_153-setting-the-maximum-number-of-pending-messages-for-a-producer","1.5.3 Setting the Maximum Number of Pending Messages for a Producer",[48,77766,77767],{},"Each producer uses a queue to hold the messages that are waiting to receive acknowledgments from the broker. Increasing the size of this queue can improve the throughput of published messages. 
However, doing so can cause unwanted memory overhead.",[48,77769,77770],{},"The sample code below shows how to configure the size of the pending messages queue for a producer:",[8325,77772,77775],{"className":77773,"code":77774,"language":8330},[8328],"client.newProducer() \n    .topic(“topic-name”) \n    .maxPendingMessages(2000) \n    .create();\n",[4926,77776,77774],{"__ignoreMap":18},[48,77778,77779],{},"When setting the value of maxPendingMessages, it is important to consider the memory impact on the client application. To estimate the memory impact, multiply the number of bytes per message by the number of maxPendingMessages. For example, if each message is 100 KB, setting 2000 maxPendingMessages may add 200 MB (2000 * 100 KB = 200,000 KB = 200 MB) of additional required memory.",[3933,77781,77783],{"id":77782},"_154-configuring-the-size-of-the-receiver-queue-for-a-consumer","1.5.4 Configuring the Size of the Receiver Queue for a Consumer",[48,77785,77786],{},"The consumer’s receiver queue controls how many messages the consumer is allowed to accumulate before the messages are taken away by the user’s application. Making the receiver queue size larger could potentially increase consumption throughput at the expense of higher memory utilization.",[48,77788,77789],{},"The sample code below shows how to configure the size of the receiver queue for a consumer:",[8325,77791,77794],{"className":77792,"code":77793,"language":8330},[8328],"client.newConsumer() \n    .topic(“topic-name”) \n    .subscriptionName(“sub-name”) \n    .receiverQueueSize(2000) \n    .subscribe();\n",[4926,77795,77793],{"__ignoreMap":18},[40,77797,77799],{"id":77798},"_2-how-message-writing-works-on-the-server-side","2. How Message Writing Works on the Server Side",[48,77801,77802],{},"To be able to tune message writing performance effectively, you first need to understand how message writing works.",[32,77804,77806],{"id":77805},"_21-interactions-between-brokers-and-bookies","2.1. Interactions Between Brokers and Bookies",[48,77808,77809],{},"When a client publishes a message to a topic, the message is sent to the broker that is serving the topic, and the broker writes data in parallel to the storage layer.",[48,77811,77812],{},"As shown in Figure 4, having more data replicas makes the broker pay more network bandwidth overhead. You can mitigate the impact on network bandwidth by configuring persistence parameters at the following levels:",[321,77814,77815,77818,77821],{},[324,77816,77817],{},"In Pulsar",[324,77819,77820],{},"At the broker level",[324,77822,77823],{},"At namespace level",[48,77825,77826],{},[384,77827],{"alt":77828,"src":77829},"Configuring Persistence Parameters at the Broker Level","\u002Fimgs\u002Fblogs\u002F63be73a511e947552c943434_Screenshot-2023-01-11-at-09.30.12.png",[3933,77831,77833],{"id":77832},"_213-configuring-persistence-parameters-at-the-namespace-level","2.1.3 Configuring Persistence Parameters at the Namespace Level",[48,77835,77836],{},"Optionally, you can overwrite the persistence parameters at namespace level policy. 
In the example shown below, all three persistence parameters have been set to a value of \"3\".",[48,77838,77839],{},"$ bin\u002Fpulsar-admin namespaces set-persistence --bookkeeper-ack-quorum 3 --bookkeeper-ensemble 3 --bookkeeper-write-quorum 3 my-tenant\u002Fmy-namespace",[3933,77841,77843],{"id":77842},"_214-configuring-the-size-of-the-worker-thread-pool","2.1.4 Configuring the Size of the Worker Thread Pool",[48,77845,77846],{},"To guarantee that the messages within a topic are stored in the order in which they are written, the broker uses a single thread for writing the managed ledger entries associated with a topic. The broker takes a thread from the managed ledger worker thread pool that bears the same name. You use the following parameter to configure the size of the worker thread pool in broker.conf.",[48,77848,77849],{},"Parameter managedLedgerNumWorkerThreads is used to specify the number of threads to be used for dispatching managed ledger tasks. Negative numbers are not allowed. If no value has been specified, the system will use the number of processors available to the Java virtual machine by default.8",[32,77851,77853],{"id":77852},"_22-understanding-how-bookies-handle-entry-requests","2.2 Understanding How Bookies Handle Entry Requests",[48,77855,77856],{},"This section provides a more detailed, step-by-step explanation of how a bookie handles the addition of entry requests. The diagram in Figure 5 gives you an overview of the process.",[48,77858,77859],{},[384,77860],{"alt":77861,"src":77862},"Configuration parameters that control the journal directories and ledger directories in bookkeeper.conf","\u002Fimgs\u002Fblogs\u002F63be73d92f025bf27960b345_Screenshot-2023-01-11-at-09.31.01.png",[48,77864,77865],{},"When the request processor appends the new entries to the journal log, which is a type of write-ahead log (WAL), a bookie asks the processor to provide a thread from the write thread pool associated with the ledger ID. You can configure the size of the thread pool and the maximum number of pending requests in each thread for handling entry write requests.",[48,77867,77868],{},[384,77869],{"alt":77870,"src":77871},"Configuration parameters that control the size of the thread pool and the maximum number of pending requests in each thread for handling entry write requests.","\u002Fimgs\u002Fblogs\u002F63be73f88141108f9ad0b2a3_Screenshot-2023-01-11-at-09.31.39.png",[48,77873,77874],{},"If the number of pending requests of adding entry exceeds the maximum number of pending requests of adding entry specified in bookkeeper.conf, the bookie will reject new requests of adding entry.",[48,77876,77877],{},"By default, all journal log entries are synchronized to disk to avoid data loss in the event that a machine loses power. So, the latency of data synchronization has the most important influence on write throughput and latency. If you use the HDD as journal disks, be sure to disable the journal sync mechanism so the bookie client gets responses after the entry writes to the OS page cache successfully. Use the following parameter to enable or disable journal data synchronization in bookkeeper.conf:",[48,77879,77880],{},[384,77881],{"alt":77882,"src":77883},"Parameter to enable or disable journal data synchronization in bookkeeper.conf","\u002Fimgs\u002Fblogs\u002F63be741363863b256204020f_Screenshot-2023-01-11-at-09.32.07.png",[48,77885,77886],{},"The group commit mechanism allows any tasks that are waiting to be executed to be grouped into small batches. 
This technique achieves better performance for batch processing without a sharp increase in latency for each task. Bookies can also use the same method to improve the throughput for journal data writes. Enabling group committing for journal data can reduce disk operations and avoid excessive small file writes. However, to avoid an increase in latency, you can disable group commit.",[48,77888,77889],{},"Use the following parameters to enable or disable the​ g​roup commit mechanism in bookkeeper.conf:",[48,77891,77892],{},[384,77893],{"alt":77894,"src":77895},"Parameters to enable or disable the​ g​roup commit mechanism in bookkeeper.conf","\u002Fimgs\u002Fblogs\u002F63be742ac9d13eee89bce3ee_Screenshot-2023-01-11-at-09.32.30.png",[48,77897,77898],{},"After the entry is written to the journal, the entry is also added to the ledger storage. By default, the bookie uses the value you specify in DbLedgerStorage as the ledger storage. DbLedgerStorage is an implementation of ledger storage that uses RocksDB to keep the indices for entries stored in entry logs. Requests of adding entry in ledger storage are completed after the entry is successfully written to the memory table, and then the requests on the bookie’s client-side are completed. The memory table will periodically flush to the entry logs and build the indices for entries stored in entry logs, also called the checkpoint.",[48,77900,77901],{},"The checkpoint introduces much random disk I\u002FO. If journal directories and ledger directories are located on separate devices, then flushing will not affect performance. But, if journal directories and ledger directories are located on the same device, then performance degrades significantly due to frequent flushing. You can consider increasing a bookie’s flush interval to improve performance. However, if you increase the flush interval, recovery will take longer when the bookie restarts (for example, after a failure).",[48,77903,77904],{},"For optimal performance, the memory table should be big enough to hold a substantial number of entries during the flush interval. Use the following parameters to set up the write cache size and the flush interval in bookkeeper.conf:",[48,77906,77907],{},[384,77908],{"alt":77909,"src":77910},"Parameters to set up the write cache size and the flush interval in bookkeeper.conf","\u002Fimgs\u002Fblogs\u002F63be7442814110006dd0b5ae_Screenshot-2023-01-11-at-09.32.55.png",[40,77912,77914],{"id":77913},"_3-how-message-reading-works-on-the-server-side","3. How Message Reading Works on the Server Side",[48,77916,77917],{},"Apache Pulsar is a multi-layer system that allows message reading to be split into tailing reads and catch-up reads. Tailing reads refers to reading the most recently written data. Catch-up reads read historical data. In Pulsar, there are different approaches for tailing reads and catch-up reads.",[32,77919,77921],{"id":77920},"_31-tailing-reads","3.1 Tailing Reads",[48,77923,77924],{},"For tailing reads, consumers read the data from the serving broker, which already has that data stored in managed ledger cache. This process is illustrated in Figure 6.",[48,77926,77927],{},[384,77928],{"alt":21101,"src":77929},"\u002Fimgs\u002Fblogs\u002F63be7464f1dcf760c562766d_Screenshot-2023-01-11-at-09.33.28.png",[32,77931,77933],{"id":77932},"_32-catch-up-reads","3.2 Catch-up reads",[48,77935,77936],{},"Catch-up reads go to the storage layer to read data. 
This process is illustrated in Figure 7.",[48,77938,77939],{},[384,77940],{"alt":77941,"src":77942},"Figure 7. How Catch-up Reads Are Read from the Storage Layer","\u002Fimgs\u002Fblogs\u002F63be7243cace4e852b6f3bb4_7.png",[48,77944,3931],{},[48,77946,77947],{},"The bookie server uses a single thread to handle entries that read requests from a ledger. The bookie server takes a thread from the read worker thread pool associated with the ledger ID. You use the following parameters in bookkeeper.conf to set up the size of the read worker thread pool and the maximum number of pending read requests for each thread:",[48,77949,77950],{},[384,77951],{"alt":21101,"src":77952},"\u002Fimgs\u002Fblogs\u002F63be7480814110d67dd0b6ec_Screenshot-2023-01-11-at-09.33.59.png",[48,77954,77955],{},"When reading entries from ledger storage, the bookie will first find an entry's position in the entry logs through the index file. DbLedgerStorage uses RocksDB to store the index for ledger entries. So, be sure to allocate enough memory to hold a significant portion of the index database to avoid swap-in and swap-out index entries.",[48,77957,77958],{},"For optimum performance, the size of the RocksDB block-cache needs to be big enough to hold a significant portion of the index database, which has been known to reach ~2GB in some cases.",[48,77960,77961],{},"You use the following parameter in bookkeeper.conf to control the size of the RocksDB block cache:",[48,77963,77964],{},[384,77965],{"alt":21101,"src":77966},"\u002Fimgs\u002Fblogs\u002F63be749fbd2de3a3057fce12_Screenshot-2023-01-11-at-09.34.29.png",[48,77968,77969,77970],{},"Enabling the entry read-ahead cache can reduce the operation of the disk for sequential reading. You use the following parameters to configure the entry read-ahead cache size in bookkeeper.conf:\n",[384,77971],{"alt":18,"src":77972},"\u002Fimgs\u002Fblogs\u002F63be74b3482366a4482f8d0b_Screenshot-2023-01-11-at-09.34.50.png",[40,77974,77976],{"id":77975},"_4-metadata-storage-optimization","4. Metadata Storage Optimization",[48,77978,77979],{},"Pulsar uses Apache® ZookeeperTM as its default metadata storage area. ZooKeeper is a centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services.",[48,77981,77982,77983,77987],{},"Zookeeper performance tuning is not discussed in this post. For excellent guidance on how to tune Zookeeper, visit ",[55,77984,267],{"href":77985,"rel":77986},"https:\u002F\u002Fzookeeper.apache.org\u002Fdoc\u002Fr3.4.13\u002FzookeeperAdmin.pdf",[264],".​ Of the recommendations mentioned in that document, pay special attention to those pertaining to disk I\u002FO.",[40,77989,77991],{"id":77990},"_5-conclusion","5. Conclusion",[48,77993,77994],{},"Hopefully, this introduction has given you a better understanding of some Pulsar basic concepts and, in particular, some insights into how pulsar handles message writing and reading. 
To review, we addressed the following concepts:",[321,77996,77997,78000,78003,78006],{},[324,77998,77999],{},"Improving read and write I\u002FO isolation gives bookies higher throughput and lower latency.",[324,78001,78002],{},"Taking advantage of I\u002FO parallelism between multiple disks allows us to optimize the performance of the journal and ledger.",[324,78004,78005],{},"For tailing reads, the entry cache in the broker can reduce resource overhead and avoid competing for resources with writes.",[324,78007,78008],{},"Improving Zookeeper performance maximizes system stability.",[40,78010,78012],{"id":78011},"_6-more-pulsar-resources","6. More Pulsar Resources",[321,78014,78015,78021,78026,78030,78034],{},[324,78016,38396,78017,38400,78019,38405],{},[55,78018,38399],{"href":37361},[55,78020,38404],{"href":38403},[324,78022,78023,62252],{},[55,78024,62251],{"href":31912,"rel":78025},[264],[324,78027,36219,78028,38411],{},[55,78029,38410],{"href":21458},[324,78031,38414,78032,38418],{},[55,78033,38417],{"href":35424},[324,78035,78036,62245],{},[55,78037,10265],{"href":45212,"rel":78038},[264],[48,78040,3931],{},[48,78042,3931],{},{"title":18,"searchDepth":19,"depth":19,"links":78044},[78045,78051,78055,78059,78060,78061],{"id":77603,"depth":19,"text":77604,"children":78046},[78047,78048,78049,78050],{"id":77610,"depth":279,"text":77611},{"id":77627,"depth":279,"text":77628},{"id":77642,"depth":279,"text":77643},{"id":77713,"depth":279,"text":77714},{"id":77798,"depth":19,"text":77799,"children":78052},[78053,78054],{"id":77805,"depth":279,"text":77806},{"id":77852,"depth":279,"text":77853},{"id":77913,"depth":19,"text":77914,"children":78056},[78057,78058],{"id":77920,"depth":279,"text":77921},{"id":77932,"depth":279,"text":77933},{"id":77975,"depth":19,"text":77976},{"id":77990,"depth":19,"text":77991},{"id":78011,"depth":19,"text":78012},"2021-01-14","Learn how to control Pulsar message writes and reads using certain configuration parameters to achieve optimal throughput and latency.","\u002Fimgs\u002Fblogs\u002F63be72252cb463483869a062_top.jpg",{},"\u002Fblog\u002Ftaking-a-deep-dive-into-apache-pulsar-architecture-for-performance-tuning",{"title":77590,"description":78063},"blog\u002Ftaking-a-deep-dive-into-apache-pulsar-architecture-for-performance-tuning",[7347,12106,821],"VFTb_rdoCLOwBB20-d6LF7nnxguy-0UKX4vNOum7qgA",{"id":78072,"title":76942,"authors":78073,"body":78075,"category":821,"createdAt":290,"date":78375,"description":78376,"extension":8,"featured":294,"image":290,"isDraft":294,"link":290,"meta":78377,"navigation":7,"order":296,"path":78378,"readingTime":3556,"relatedResources":290,"seo":78379,"stem":78380,"tags":78381,"__hash__":78382},"blogs\u002Fblog\u002Fhow-apache-pulsar-is-helping-iterable-scale-its-customer-engagement-platform.md",[78074],"Greg Methvin",{"type":15,"value":78076,"toc":78365},[78077,78083,78092,78094,78108,78116,78119,78122,78125,78129,78142,78154,78157,78160,78164,78167,78170,78176,78179,78182,78196,78200,78207,78210,78216,78219,78226,78229,78233,78236,78239,78242,78259,78262,78266,78279,78282,78291,78300,78303,78306,78310,78331,78334,78340,78343,78347,78350,78353,78362],[48,78078,78079],{},[384,78080],{"alt":78081,"src":78082},"pulsar and iterable logo","\u002Fimgs\u002Fblogs\u002F63a396a6e1d5c013ef04b3f2_iterable-top.jpeg",[916,78084,78085],{},[48,78086,64923,78087,78091],{},[55,78088,78090],{"href":53869,"rel":78089},[264],"InfoQ"," on November 30, 
2020.",[40,78093,8924],{"id":8923},[321,78095,78096,78099,78102,78105],{},[324,78097,78098],{},"Distributed messaging systems support two types of semantics: streaming and queueing. Each is best suited for certain kinds of use cases.",[324,78100,78101],{},"Apache Pulsar is unique in that it supports both streaming and queueing use cases.",[324,78103,78104],{},"Pulsar's multi-layered architecture allows users to scale the number and size of topics more conveniently than other messaging systems.",[324,78106,78107],{},"Pulsar provided the right balance of scalability, reliability, and features to replace RabbitMQ at Iterable and, ultimately, to replace other messaging systems like Kafka and Amazon SQS.",[48,78109,78110,78111,78115],{},"At ",[55,78112,96],{"href":78113,"rel":78114},"https:\u002F\u002Fiterable.com\u002F",[264],", we send large numbers of marketing messages on behalf of our customers every day. These include email, push, SMS, and in-app messages. Iterable also processes even greater numbers of user updates, events, and custom workflow states daily, many of which can trigger other actions in the system. This results in a system that is not only extremely useful to our customers, but also quite complex. Managing that complexity becomes more critical as our customer base grows.",[48,78117,78118],{},"One way Iterable manages complexity is by using distributed messaging systems in several parts of its architecture. The main purpose of a distributed message system is to store messages that need to be processed by consumers, and to keep track of the state of those consumers in processing those messages. This way, consumers can focus on the task of processing each message.",[48,78120,78121],{},"Iterable uses a work queue approach to execute customer-specified marketing workflows, webhooks, and other types of job scheduling and processing. Other components, such as user and event ingestion, use a streaming model to process ordered streams of messages.",[48,78123,78124],{},"In general, distributed messaging systems support two types of semantics: streaming and queueing. Each is best suited for certain kinds of use cases.",[40,78126,78128],{"id":78127},"streaming-and-queueing","Streaming and Queueing",[48,78130,78131,78132,4003,78136,78141],{},"In streaming message systems, producers append data to a set of append-only streams of messages. Within each stream, messages must be processed in a specific sequence, and consumers mark their place in the stream. Messages may be partitioned using some strategy (such as hashing a user ID) to allow greater parallelism, and each partition acts as a separate stream of data. Because the data in each stream is immutable and only the offset entry is stored, messages may not be skipped. Streaming works well in situations where the order of messages is important, such as data ingestion. ",[55,78133,78135],{"href":31428,"rel":78134},[264],"Kafka",[55,78137,78140],{"href":78138,"rel":78139},"https:\u002F\u002Faws.amazon.com\u002Fkinesis\u002F",[264],"Amazon Kinesis"," are examples of messaging systems that use streaming semantics for consuming messages.",[48,78143,78144,78145,4003,78149,78153],{},"In queueing message systems, producers send messages to a queue which may be shared by multiple consumers. Consumers process messages as they receive them and send an acknowledgement to the queueing system as each message is processed. 
Because multiple consumers may share a single queue and message sequence is unimportant, it's typically easier to scale the consumer side of a queue-based system. Queueing systems are ideal for work queues that do not require tasks to be performed in a particular order—for example, sending one email message to many recipients. ",[55,78146,11043],{"href":78147,"rel":78148},"https:\u002F\u002Fwww.rabbitmq.com\u002F",[264],[55,78150,78152],{"href":75956,"rel":78151},[264],"Amazon SQS"," are examples of popular queue-based message systems.",[48,78155,78156],{},"Queueing systems typically include functionality that simplifies the task of handling message-level errors. For example, after an error occurs, RabbitMQ makes it easy to transfer a message to a special queue where it is held for a specified amount of time before being returned to the original queue to be retried. It can also negatively acknowledge a message in order to have it redelivered after a failure. Because most message queues typically do not store messages in a backlog after they have been acknowledged, debugging and disaster recovery are more difficult, as there are no messages to inspect.",[48,78158,78159],{},"A streaming-based system like Kafka may be used for queueing use cases, with some caveats. Indeed, many users choose this option because these systems often offer superior performance. This solution can be a challenge, however, as it places an undue burden on developers to handle the limitations imposed by the strict ordering of streams. If a consumer is slow to consume a message or needs to retry processing following a transient failure, the processing of other messages on the same stream can be delayed. A common solution is to retry processing by republishing messages to another topic, but this introduces complexity because the application logic has to manage additional states.",[40,78161,78163],{"id":78162},"why-iterable-needed-a-new-messaging-platform","Why Iterable Needed a New Messaging Platform",[48,78165,78166],{},"We had been using RabbitMQ heavily and relied on its features to handle internal messaging. We use Time-to-Live (TTL) values liberally, not only for fixed-length retries, but also to implement explicit delays in message processing. For example, we might delay sending a marketing email so the marketing message can be delivered to each recipient at the time when they are most likely to open it. We also rely on negative acknowledgements to retry queued messages.",[48,78168,78169],{},"Here's a simplified version of what our architecture looks like:",[48,78171,78172],{},[384,78173],{"alt":78174,"src":78175},"Messaging Platform","\u002Fimgs\u002Fblogs\u002F63a396a660075d25eca2a991_iterable-1.jpeg",[48,78177,78178],{},"When we started evaluating Pulsar, all the queues mentioned above were on RabbitMQ, except for ingestion, which used Kafka. Kafka was a fit for ingestion, since it provided the necessary performance and ordering guarantees. Kafka was not a good fit for the other use cases, since it lacked the necessary work-queue semantics. The fact that we used many RabbitMQ-specific features like delays also made it more challenging to find an alternative.",[48,78180,78181],{},"As we scaled our system, RabbitMQ began to show the following limitations:",[321,78183,78184,78187,78190,78193],{},[324,78185,78186],{},"At high loads, RabbitMQ frequently experienced flow control issues. 
Flow control is a mechanism that slows publishers when the message broker cannot keep up, usually because of memory and other resource limits. This impeded the ability of the producers to publish, which caused service delays and request failures in other areas. Specifically, we noticed that flow control occurred more often when large numbers of messages had TTLs that expired at the same time. In these cases, RabbitMQ attempted to deliver the expiring messages to their destination queue all at once. This overwhelmed the memory capacity of the RabbitMQ instance, which triggered the flow control mechanism for normal producers, blocking their attempts to publish.",[324,78188,78189],{},"Debugging became more difficult because RabbitMQ's broker does not store messages after they are acknowledged. In other words, it is not possible to set a retention time for messages.",[324,78191,78192],{},"Replication was difficult to achieve, as the replication component in RabbitMQ was not robust enough for our use cases, leading to RabbitMQ being a single point of failure for our message state.",[324,78194,78195],{},"RabbitMQ had difficulty handling large numbers of queues. As we have many use cases that require dedicated queues, we often need more than 10,000 queues at a time. At this level, RabbitMQ experienced performance issues, which usually appeared first in the management interface and API.",[40,78197,78199],{"id":78198},"evaluating-apache-pulsar","Evaluating Apache Pulsar",[48,78201,78202,78203,78206],{},"Overall, ",[55,78204,821],{"href":23526,"rel":78205},[264]," appeared to offer all the features we needed. While a lot of the publicity we had seen around Pulsar had compared it to Kafka for streaming workloads, we also discovered that Pulsar was a great fit for our queueing needs. Pulsar's shared subscription feature allows topics to be used as queues, potentially offering multiple virtual queues to different subscribers within the same topic. Pulsar also supports delayed and scheduled messages natively, though these features were very new at the time we started considering Pulsar.",[48,78208,78209],{},"In addition to providing a rich feature set, Pulsar's multi-layered architecture allows us to scale the number and size of topics more conveniently than other messaging systems.",[48,78211,78212],{},[384,78213],{"alt":78214,"src":78215},"illustration Evaluating Apache Pulsar","\u002Fimgs\u002Fblogs\u002F63a396a67d7fba11c49bbd10_iterable-2.jpeg",[48,78217,78218],{},"Pulsar's top layer consists of brokers, which accept messages from producers and send them to consumers, but do not store data. A single broker handles each topic partition, but the brokers can easily exchange topic ownership, as they do not store topic states. This makes it easy to add brokers to increase throughput and immediately take advantage of new brokers. This also enables Pulsar to handle broker failures.",[48,78220,78221,78222,78225],{},"Pulsar's bottom layer, ",[55,78223,12106],{"href":23555,"rel":78224},[264],", stores topic data in segments, which are distributed across the cluster. If additional storage is needed, we can easily add BookKeeper nodes (bookies) to the cluster and use them to store new topic segments. Brokers coordinate with bookies to update the state of each topic as it changes. 
Pulsar's use of BookKeeper for topic data also helps it to support a very large number of topics, which is critical for many of Iterable's current use cases.",[48,78227,78228],{},"After evaluating several messaging systems, we decided that Pulsar provided the right balance of scalability, reliability, and features to replace RabbitMQ at Iterable and, ultimately, to replace other messaging systems like Kafka and Amazon SQS.",[40,78230,78232],{"id":78231},"first-pulsar-use-case-message-sends","First Pulsar Use Case: Message Sends",[48,78234,78235],{},"One of the most important functions of Iterable's platform is to schedule and send marketing emails on behalf of Iterable's customers. To do this, we publish messages to customer-specific queues, then have another service that handles the final rendering and sending of the message. These queues were the first thing we decided to migrate from RabbitMQ to Pulsar.",[48,78237,78238],{},"We chose marketing message sends as our first Pulsar use case for two reasons. First, because sending incorporated some of our more complex RabbitMQ use cases. And second, because it represented a very large portion of our RabbitMQ usage. This was not the lowest risk use case; however, after extensive performance and scalability testing, we felt it was where Pulsar could add the most value.",[48,78240,78241],{},"Here are three common types of campaigns created on the Iterable platform:",[1666,78243,78244,78253,78256],{},[324,78245,78246,78247,78252],{},"Blast campaigns that send a marketing message to all recipients at the same time. Suppose a customer wants to send an email newsletter to users who have been active in the past month. In this case, we can query ",[55,78248,78251],{"href":78249,"rel":78250},"https:\u002F\u002Fwww.elastic.co\u002F",[264],"ElasticSearch"," for the list of users at the time the campaign is scheduled and publish them to that customer's Pulsar topic.",[324,78254,78255],{},"Blast campaigns that specify a custom send time for each recipient. The send time can be either fixed — for example, \"9AM in the recipient's local time zone\" — or computed by our send-time optimization feature. In each case, we want to delay the processing of the queued message until the designated time.",[324,78257,78258],{},"User-triggered campaigns. These can be triggered by a custom workflow or by a user-initiated transaction, such as an online purchase. User-triggered marketing sends are done individually on demand.",[48,78260,78261],{},"In each of the above scenarios the number of sends being performed at any given time can vary widely, so we also need to be able to scale consumers up and down to account for the changing load.",[40,78263,78265],{"id":78264},"migrating-to-apache-pulsar","Migrating to Apache Pulsar",[48,78267,78268,78269,4003,78274,190],{},"Although Pulsar had performed well in load tests, we were unsure if it would be able to sustain high load levels in production. 
This was a special concern because we planned to take advantage of several of Pulsar's new features, including ",[55,78270,78273],{"href":78271,"rel":78272},"https:\u002F\u002Fpulsar.apache.org\u002Fdocs\u002Fen\u002Fconcepts-messaging\u002F#negative-acknowledgement",[264],"negative acknowledgements",[55,78275,78278],{"href":78276,"rel":78277},"https:\u002F\u002Fpulsar.apache.org\u002Fdocs\u002Fen\u002F2.5.0\u002Fconcepts-messaging\u002F#delayed-message-delivery",[264],"scheduled message delivery",[48,78280,78281],{},"To build our confidence, we implemented a parallel pipeline in which we published messages to both RabbitMQ and Pulsar; in this case, we set up the consumers on these topics to acknowledge queued messages without actually processing them. We also simulated consumption delays. This helped us understand Pulsar's behavior in our particular production environment. We used customer-level feature flags for both test topics and actual production topics, so we could migrate customers one-by-one for testing and, ultimately, for production usage.",[48,78283,78284,78285,78290],{},"During testing, we uncovered a few bugs in Pulsar. For example, we found a ",[55,78286,78289],{"href":78287,"rel":78288},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F5499",[264],"race condition associated with delayed messages",", which Pulsar developers helped to identify and fix. This was the most serious issue we found, as it caused consumers to get stuck, creating a backlog of unconsumed messages.",[48,78292,78293,78294,78299],{},"We also noticed some interesting issues related to Pulsar's batching of messages, which is enabled by default in Pulsar producers. For example, we noticed that Pulsar's backlog metrics report the number of batches rather than the actual number of messages, which makes it more challenging to set alert thresholds for message backlogs. Later we discovered a more serious ",[55,78295,78298],{"href":78296,"rel":78297},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fissues\u002F5969",[264],"bug"," in the interaction between negative acknowledgements and batching, which has recently been fixed. Ultimately we decided batching was not worth the trouble. Fortunately it's easy to disable batching in Pulsar producers, and the performance without batching was more than sufficient for our needs. These issues are also likely to be fixed in upcoming releases.",[48,78301,78302],{},"Delays and negative acknowledgements were relatively new features at the time, so we anticipated we might find some issues. This is why we chose to migrate to Pulsar slowly over many months, initially publishing to only test topics then gradually migrating real sends. This approach enabled us to identify issues before they could become problems for our customers. Although it took around six months to develop complete confidence that Pulsar was working as intended, the outcome was worth the time.",[48,78304,78305],{},"We migrated our entire marketing sends operation to Pulsar over the course of about six months. When migration was complete, we found that Pulsar reduced our operational costs by nearly half, with room to grow as we add new customers. The cost reduction was significant, in part, because our RabbitMQ instances had been overprovisioned to compensate for performance issues. 
To date, our Pulsar cluster has been running smoothly for over six months with no issues.",[40,78307,78309],{"id":78308},"implementation-and-tooling","Implementation and Tooling",[48,78311,78312,78313,78318,78319,78324,78325,78330],{},"Iterable primarily uses ",[55,78314,78317],{"href":78315,"rel":78316},"https:\u002F\u002Fwww.scala-lang.org\u002F",[264],"Scala"," on the backend, so having good Scala tooling for Pulsar was important to us. We've used the excellent ",[55,78320,78323],{"href":78321,"rel":78322},"https:\u002F\u002Fwww.google.com\u002Furl?q=https:\u002F\u002Fgithub.com\u002Fsksamuel\u002Fpulsar4s&sa=D&ust=1601492836975000&usg=AFQjCNFrrI2ad0hiHpMogUSNrgoQ6mVduA",[264],"pulsar4s"," library and have made numerous contributions that support new features, such as delayed messages. We also contributed an ",[55,78326,78329],{"href":78327,"rel":78328},"https:\u002F\u002Fdoc.akka.io\u002Fdocs\u002Fakka\u002Fcurrent\u002Fstream\u002Findex.html",[264],"Akka Streams-based"," connector for consuming messages as a source, with individual acknowledgement support.",[48,78332,78333],{},"For example, we can consume all the topics in a namespace like this:",[8325,78335,78338],{"className":78336,"code":78337,"language":8330},[8328],"\n\u002F\u002F Create a consumer on all topics in this namespace\nval createConsumer = () => client.consumer(ConsumerConfig(\n  topicPattern = \"persistent:\u002F\u002Femail\u002Fproject-123\u002F.*\".r,\n  subscription = Subscription(\"email-service\")\n))\n\n\u002F\u002F Create an Akka streams `Source` stage for this consumer\nval pulsarSource = committableSource(createConsumer, Some(MessageId.earliest))\n\n\u002F\u002F Materialize the source and get back a `control` to shut it down later.\nval control = pulsarSource.mapAsync(parallelism)(handleMessage).to(Sink.ignore).run()\n\n",[4926,78339,78337],{"__ignoreMap":18},[48,78341,78342],{},"We like using regular expression subscriptions for consumers. They make it easy to automatically subscribe to new topics as they're created and make it so consumers don't have to be aware of a specific topic partitioning strategy. At the same time, we're also taking advantage of Pulsar's ability to support a large number of topics. Since Pulsar automatically creates new topics on publish, it's simple to create new topics for new message types or even for individual campaigns. This also makes it easier to implement rate limits for different customers and types of messages.",[40,78344,78346],{"id":78345},"what-we-learned","What We Learned",[48,78348,78349],{},"As Pulsar is a rapidly evolving open-source project, we had some challenges—mainly in getting up to speed and learning its quirks—that we might not have seen with other more mature technologies. The documentation was not always complete, and we often needed to lean on the community for help. That said, the community has been quite welcoming and helpful, and we were happy to get more involved with Pulsar's development and participate in discussions around new features.",[48,78351,78352],{},"Pulsar is unique in that it supports both streaming and queueing use cases, while also supporting a wide feature set that makes it a viable alternative to many other distributed messaging technologies currently being used in our architecture. Pulsar covers all of our use cases for Kafka, RabbitMQ, and SQS. 
This lets us focus on building expertise and tooling around a single unified system.",[48,78354,78355,78356,78361],{},"We have been encouraged by the progress in Pulsar's development since we started working with it in early 2019, particularly in the barriers to entry for beginners. The tooling has improved substantially: for example, ",[55,78357,78360],{"href":78358,"rel":78359},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar-manager",[264],"Pulsar Manager"," now provides a very convenient GUI for managing the cluster. We also see many companies offering hosted and managed Pulsar services, which makes it easier for startups and small teams to start using Pulsar.",[48,78363,78364],{},"Overall, Iterable's transition to Pulsar has been interesting and sometimes challenging, but quite successful so far. In many ways, our use cases represented a new path that had not been widely pursued. We expected to encounter some problems, but our testing process helped minimize their impact on our customers. We now feel confident using Pulsar, and are continuing to expand our use of Pulsar for other existing and new components in Iterable's platform.",{"title":18,"searchDepth":19,"depth":19,"links":78366},[78367,78368,78369,78370,78371,78372,78373,78374],{"id":8923,"depth":19,"text":8924},{"id":78127,"depth":19,"text":78128},{"id":78162,"depth":19,"text":78163},{"id":78198,"depth":19,"text":78199},{"id":78231,"depth":19,"text":78232},{"id":78264,"depth":19,"text":78265},{"id":78308,"depth":19,"text":78309},{"id":78345,"depth":19,"text":78346},"2021-01-05","Iterable confidently uses Pulsar to power customer engagement, and are continuing to expand their use of Pulsar for other existing and new components in Iterable's platform.",{},"\u002Fblog\u002Fhow-apache-pulsar-is-helping-iterable-scale-its-customer-engagement-platform",{"title":76942,"description":78376},"blog\u002Fhow-apache-pulsar-is-helping-iterable-scale-its-customer-engagement-platform",[35559,821],"1nqFYaQQKFRGdq7dnij5P06hw1PkAsqL7kqierBOcdo",{"id":78384,"title":78385,"authors":78386,"body":78387,"category":821,"createdAt":290,"date":78647,"description":78648,"extension":8,"featured":294,"image":290,"isDraft":294,"link":290,"meta":78649,"navigation":7,"order":296,"path":78650,"readingTime":42793,"relatedResources":290,"seo":78651,"stem":78652,"tags":78653,"__hash__":78654},"blogs\u002Fblog\u002Fwhats-new-in-apache-pulsar-2-7-0.md","What's New in Apache Pulsar 2.7.0",[808],{"type":15,"value":78388,"toc":78637},[78389,78392,78396,78399,78402,78405,78411,78414,78420,78423,78429,78432,78438,78449,78452,78455,78458,78464,78467,78473,78480,78484,78487,78493,78500,78504,78507,78513,78519,78523,78526,78529,78535,78538,78542,78545,78553,78560,78564,78567,78577,78581,78602,78605,78631],[48,78390,78391],{},"We are very glad to see the Apache Pulsar community has successfully released the wonderful 2.7.0 version after accumulated hard work. It is a great milestone for this fast-growing project and the whole Pulsar community. This is the result of a huge effort from the community, with over 450 commits and a long list of new features, improvements, and bug fixes.",[40,78393,78395],{"id":78394},"transaction-support","Transaction support",[48,78397,78398],{},"Transactional semantics enable event streaming applications to consume, process, and produce messages in one atomic operation. With transactions, Pulsar achieves the exactly-once semantics for a single partition and multiple partitions as well. 
This enables new use cases with Pulsar where a client (either as a producer or consumer) can work with messages across multiple topics and partitions and ensure those messages will all be processed as a single unit. This will strengthen the message delivery semantics of Apache Pulsar and processing guarantees for Pulsar Functions.",[48,78400,78401],{},"Currently, Pulsar transactions are in developer preview. The community will work further to enhance the feature to be used in the production environment soon.",[48,78403,78404],{},"To enable transactions in Pulsar, you need to configure the parameter in the broker.conf file.",[8325,78406,78409],{"className":78407,"code":78408,"language":8330},[8328],"transactionCoordinatorEnabled=true\n",[4926,78410,78408],{"__ignoreMap":18},[48,78412,78413],{},"Initialize transaction coordinator metadata, so the transaction coordinators can leverage advantages of the partitioned topic, such as load balance.",[8325,78415,78418],{"className":78416,"code":78417,"language":8330},[8328],"bin\u002Fpulsar initialize-transaction-coordinator-metadata -cs 127.0.0.1:2181 -c standalone\n",[4926,78419,78417],{"__ignoreMap":18},[48,78421,78422],{},"From the client-side, you can also enable the transactions for the Pulsar client.",[8325,78424,78427],{"className":78425,"code":78426,"language":8330},[8328],"PulsarClient pulsarClient = PulsarClient.builder()\n        .serviceUrl(\"pulsar:\u002F\u002Flocalhost:6650\")\n        .enableTransaction(true)\n        .build();\n",[4926,78428,78426],{"__ignoreMap":18},[48,78430,78431],{},"Here is an example to demonstrate the Pulsar transactions.",[8325,78433,78436],{"className":78434,"code":78435,"language":8330},[8328],"\u002F\u002F Open a transaction\nTransaction txn = pulsarClient\n        .newTransaction()\n        .withTransactionTimeout(5, TimeUnit.MINUTES)\n        .build()\n        .get();\n\n\u002F\u002F  Publish messages with the transaction\nproducer.newMessage(txn).value(\"Hello Pulsar Transaction\".getBytes()).send();\n\n\u002F\u002F Consume and acknowledge messages with the transaction\nMessage message = consumer.receive();\nconsumer.acknowledgeAsync(message.getMessageId(), txn);\n\n\u002F\u002F Commit the transaction\ntxn.commit()\n",[4926,78437,78435],{"__ignoreMap":18},[48,78439,78440,78441,78445,78446,190],{},"For more details about the Pulsar transactions, refer to ",[55,78442,267],{"href":78443,"rel":78444},"http:\u002F\u002Fpulsar.apache.org\u002Fdocs\u002Fen\u002Ftransactions\u002F",[264],". For more details about the design of Pulsar transactions, refer to ",[55,78447,267],{"href":71438,"rel":78448},[264],[40,78450,78451],{"id":42965},"Topic level policy",[48,78453,78454],{},"Pulsar 2.7.0 introduces the system topic which can maintain all policy change events to achieve the topic level policy. All policies at the namespace level are now also available at the topic level, so users can set different policies at the topic level flexibly without using lots of metadata service resources. The topic level policy enables users to manage topics more flexibly and adds no burden to ZooKeeper.",[48,78456,78457],{},"To enable topic level policy in Pulsar, you need to configure the parameter in the broker.conf file.",[8325,78459,78462],{"className":78460,"code":78461,"language":8330},[8328],"systemTopicEnabled=true\ntopicLevelPoliciesEnabled=true\n",[4926,78463,78461],{"__ignoreMap":18},[48,78465,78466],{},"After topic level policy is enabled, you can use Pulsar Admin to update the policy of a topic. 
Here is an example for setting the data retention for a specific topic.",[8325,78468,78471],{"className":78469,"code":78470,"language":8330},[8328],"bin\u002Fpulsar-admin topics set-retention -s 10G -t 7d persistent:\u002F\u002Fpublic\u002Fdefault\u002Fmy-topic\n",[4926,78472,78470],{"__ignoreMap":18},[48,78474,78475,78476,190],{},"For more details about the system topic and topic level policy, refer to ",[55,78477,267],{"href":78478,"rel":78479},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fwiki\u002FPIP-39%3A-Namespace-Change-Events",[264],[40,78481,78483],{"id":78482},"support-azure-blobstore-offloader","Support Azure BlobStore offloader",[48,78485,78486],{},"In Pulsar 2.7.0, we add support for Azure BlobStore offloader, which allows users to offload topic data into Azure BlobStore. You can configure the Azure BlobStore offloader driver in the configuration broker.conf file.",[8325,78488,78491],{"className":78489,"code":78490,"language":8330},[8328],"managedLedgerOffloadDriver=azureblob\n",[4926,78492,78490],{"__ignoreMap":18},[48,78494,78495,78496,190],{},"For more details, refer to ",[55,78497,267],{"href":78498,"rel":78499},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F8436",[264],[40,78501,78503],{"id":78502},"native-protobuf-schema-support","Native protobuf schema support",[48,78505,78506],{},"Pulsar 2.7.0 introduces a native protobuf schema support, which can provide more ability for protobuf users who want to integrate with Pulsar. Here is an example to show how to use native protobuf schema in Java client:",[8325,78508,78511],{"className":78509,"code":78510,"language":8330},[8328],"Consumer\n consumer = client.newConsumer(Schema.PROTOBUFNATIVE(PBMessage.class))\n.topic(topic)\n.subscriptionName(\"my-subscription-name\")\n.subscribe();\n",[4926,78512,78510],{"__ignoreMap":18},[48,78514,78495,78515,190],{},[55,78516,267],{"href":78517,"rel":78518},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F8372",[264],[40,78520,78522],{"id":78521},"resource-limitation","Resource limitation",[48,78524,78525],{},"In Pulsar, tenant, namespace, and topic are the core resources of a cluster. Pulsar 2.7.0 enables you to limit the maximum tenants of a cluster, the maximum namespaces per tenant, the maximum topics per namespace, and the maximum subscriptions per topic.",[48,78527,78528],{},"You can configure the resource limitations in the broker.conf file.",[8325,78530,78533],{"className":78531,"code":78532,"language":8330},[8328],"maxTenants=0\nmaxNamespacesPerTenant=0\nmaxTopicsPerNamespace=0\nmaxSubscriptionsPerTopic=0\n",[4926,78534,78532],{"__ignoreMap":18},[48,78536,78537],{},"This provides Pulsar administrators with great convenience in resource management.",[40,78539,78541],{"id":78540},"support-e2e-encryption-for-pulsar-functions","Support e2e encryption for Pulsar Functions",[48,78543,78544],{},"Pulsar 2.7.0 enables you to add End-to-End (e2e) encryption for Pulsar Functions. You can use the public and private key pair that the application configured to perform encryption. Only consumers with a valid key can decrypt encrypted messages.",[48,78546,78547,78548,190],{},"To enable End-to-End encryption on Functions Worker, you can set it by specifying --producer-config in the command line terminal. 
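The Functions end-to-end encryption described above uses the same public/private key mechanism as regular Pulsar producers and consumers. As a rough sketch of that mechanism with the plain Java client, the example below wires a small CryptoKeyReader into a producer; the key file paths, key name, and topic are illustrative assumptions, not values from this release.

```java
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Map;
import org.apache.pulsar.client.api.*;

public class EncryptedProducerExample {

    // Minimal key reader that loads the key pair from local files
    // (paths are illustrative; a real deployment might fetch keys from a KMS).
    static class FileKeyReader implements CryptoKeyReader {
        @Override
        public EncryptionKeyInfo getPublicKey(String keyName, Map<String, String> meta) {
            return readKey("/keys/fl-public.key");
        }

        @Override
        public EncryptionKeyInfo getPrivateKey(String keyName, Map<String, String> meta) {
            return readKey("/keys/fl-private.key");
        }

        private EncryptionKeyInfo readKey(String path) {
            EncryptionKeyInfo info = new EncryptionKeyInfo();
            try {
                info.setKey(Files.readAllBytes(Paths.get(path)));
            } catch (Exception e) {
                throw new RuntimeException("Failed to read key " + path, e);
            }
            return info;
        }
    }

    public static void main(String[] args) throws Exception {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650")
                .build();

        // Messages are encrypted with the public key named "my-app-key"; only
        // consumers configured with the matching private key can decrypt them.
        Producer<byte[]> producer = client.newProducer()
                .topic("persistent://public/default/encrypted-topic")
                .addEncryptionKey("my-app-key")
                .cryptoKeyReader(new FileKeyReader())
                .create();

        producer.send("sensitive payload".getBytes());
        producer.close();
        client.close();
    }
}
```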
For more information, refer to ",[55,78549,78552],{"href":78550,"rel":78551},"http:\u002F\u002Fpulsar.apache.org\u002Fdocs\u002Fen\u002Fsecurity-encryption\u002F",[264],"Pulsar Encryption",[48,78554,78555,78556],{},"For more details, you can see ",[55,78557,267],{"href":78558,"rel":78559},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F8432",[264],[40,78561,78563],{"id":78562},"function-rebalance","Function rebalance",[48,78565,78566],{},"Before 2.7.0, there was no mechanism for rebalancing functions scheduler on workers. The workload for functions might become skewed. Pulsar 2.7.0 supports manual trigger functions rebalance and automatic periodic functions rebalance.",[48,78568,78495,78569,4003,78573,190],{},[55,78570,78571],{"href":78571,"rel":78572},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F7388",[264],[55,78574,78575],{"href":78575,"rel":78576},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F7449",[264],[40,78578,78580],{"id":78579},"more-information","More information",[321,78582,78583,78589],{},[324,78584,78585,78586,190],{},"To download Apache Pulsar 2.7.0, click ",[55,78587,267],{"href":53730,"rel":78588},[264],[324,78590,78591,78592,4003,78597,190],{},"For more information about Apache Pulsar 2.7.0, see ",[55,78593,78596],{"href":78594,"rel":78595},"https:\u002F\u002Fpulsar.apache.org\u002Frelease-notes\u002F#2.7.0",[264],"2.7.0 release notes",[55,78598,78601],{"href":78599,"rel":78600},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpulls?q=milestone%3A2.7.0+-label%3Arelease%2F2.6.2+-label%3Arelease%2F2.6.1+",[264],"2.7.0 PR list",[48,78603,78604],{},"If you have any questions or suggestions, contact us with mailing lists or slack.",[321,78606,78607,78613,78619,78625],{},[324,78608,78609],{},[55,78610,78612],{"href":78611},"mailto:users@pulsar.apache.org","users@pulsar.apache.org",[324,78614,78615],{},[55,78616,78618],{"href":78617},"mailto:dev@pulsar.apache.org","dev@pulsar.apache.org",[324,78620,78621,78622],{},"Pulsar slack channel: ",[55,78623,36242],{"href":36242,"rel":78624},[264],[324,78626,78627,78628],{},"Self-registration at ",[55,78629,57760],{"href":57760,"rel":78630},[264],[48,78632,78633,78634,190],{},"Looking forward to your contributions to ",[55,78635,821],{"href":36230,"rel":78636},[264],{"title":18,"searchDepth":19,"depth":19,"links":78638},[78639,78640,78641,78642,78643,78644,78645,78646],{"id":78394,"depth":19,"text":78395},{"id":42965,"depth":19,"text":78451},{"id":78482,"depth":19,"text":78483},{"id":78502,"depth":19,"text":78503},{"id":78521,"depth":19,"text":78522},{"id":78540,"depth":19,"text":78541},{"id":78562,"depth":19,"text":78563},{"id":78579,"depth":19,"text":78580},"2020-12-25","Learn the most interesting and major features in Apache Pulsar 2.7.0 and how to use them.",{},"\u002Fblog\u002Fwhats-new-in-apache-pulsar-2-7-0",{"title":78385,"description":78648},"blog\u002Fwhats-new-in-apache-pulsar-2-7-0",[302,821,9144],"hw4ANQDBX2dVtz1NbRl9AJnGada-6HgN4Bw4kt6_Ydg",{"id":78656,"title":76999,"authors":78657,"body":78660,"category":821,"createdAt":290,"date":78964,"description":78965,"extension":8,"featured":294,"image":78966,"isDraft":294,"link":290,"meta":78967,"navigation":7,"order":296,"path":78968,"readingTime":4475,"relatedResources":290,"seo":78969,"stem":78970,"tags":78971,"__hash__":78972},"blogs\u002Fblog\u002Fwhats-new-in-pulsar-flink-connector-2-7-0.md",[78658,78659],"Jianyun Zhao","Jennifer 
Huang",{"type":15,"value":78661,"toc":78944},[78662,78666,78669,78682,78684,78705,78708,78711,78715,78718,78721,78725,78728,78734,78737,78741,78744,78747,78750,78753,78756,78759,78763,78766,78777,78780,78788,78802,78805,78814,78822,78826,78832,78835,78838,78842,78845,78859,78862,78865,78871,78874,78880,78883,78886,78894,78897,78903,78907,78910,78913,78916,78922,78929,78931],[40,78663,78665],{"id":78664},"about-pulsar-flink-connector","About Pulsar Flink Connector",[48,78667,78668],{},"In order for companies to access real-time data insights, they need unified batch and streaming capabilities. Apache Flink unifies batch and stream processing into one single computing engine with “streams” as the unified data representation. Although developers have done extensive work at the computing and API layers, very little work has been done at the data and messaging and storage layers. However, in reality, data is segregated into data silos, created by various storage and messaging technologies. As a result, there is still no single source-of-truth and the overall operation for the developer teams is still messy. To address the messy operations, we need to store data in streams. Apache Pulsar (together with Apache BookKeeper) perfectly meets the criteria: data is stored as one copy (source-of-truth), and can be accessed in streams (via pub-sub interfaces) and segments (for batch processing). When Flink and Pulsar come together, the two open source technologies create a unified data architecture for real-time data-driven businesses.",[48,78670,3600,78671,78674,78675,4003,78678,78681],{},[55,78672,76361],{"href":76359,"rel":78673},[264]," provides elastic data processing with ",[55,78676,821],{"href":23526,"rel":78677},[264],[55,78679,31802],{"href":31800,"rel":78680},[264],", allowing Apache Flink to read\u002Fwrite data from\u002Fto Apache Pulsar. The Pulsar Flink Connector enables you to concentrate on your business logic without worrying about the storage details.",[40,78683,19190],{"id":19189},[48,78685,78686,78687,78692,78693,78698,78699,78704],{},"When we first developed the Pulsar Flink Connector, it received wide adoption from both the Flink and Pulsar communities. Leveraging the Pulsar Flink connector, ",[55,78688,78691],{"href":78689,"rel":78690},"https:\u002F\u002Fwww.hpe.com\u002Fus\u002Fen\u002Fhome.html",[264],"Hewlett Packard Enterprise (HPE)"," built a real-time computing platform, ",[55,78694,78697],{"href":78695,"rel":78696},"https:\u002F\u002Fwww.bigo.sg\u002F",[264],"BIGO"," built a real-time message processing system, and ",[55,78700,78703],{"href":78701,"rel":78702},"https:\u002F\u002Fwww.zhihu.com\u002F",[264],"Zhihu"," is in the process of assessing the Connector’s fit for a real-time computing system.",[48,78706,78707],{},"As more users adopted the Pulsar Flink Connector, we heard a common issue from the community: it’s hard to do serialization and deserialization. While the Pulsar Flink connector leverages Pulsar serialization, the previous versions did not support the Flink data format. 
As a result, users had to do a lot of configurations in order to use the connector to do real-time computing.",[48,78709,78710],{},"To make the Pulsar Flink connector easier to use, we decided to build the capabilities to fully support the Flink data format, so users do not need to spend time on configuration.",[40,78712,78714],{"id":78713},"whats-new-in-pulsar-flink-connector-270","What’s New in Pulsar Flink Connector 2.7.0?",[48,78716,78717],{},"The Pulsar Flink Connector 2.7.0 supports features in Apache Pulsar 2.7.0 and Apache Flink 1.12, and is fully compatible with the Flink connector and Flink message format. Now, you can use important features in Flink, such as exactly-once sink, upsert Pulsar mechanism, Data Definition Language (DDL) computed columns, watermarks, and metadata. You can also leverage the Key-Shared subscription in Pulsar, and conduct serialization and deserialization without much configuration. Additionally, you can customize the configuration based on your business easily.",[48,78719,78720],{},"Below, we introduce the key features in Pulsar Flink Connector 2.7.0 in detail.",[32,78722,78724],{"id":78723},"ordered-message-queue-with-high-performance","Ordered message queue with high-performance",[48,78726,78727],{},"When users needed to guarantee the ordering of messages strictly, only one consumer was allowed to consume messages. This had a severe impact on the throughput. To address this, we designed a Key_Shared subscription model in Pulsar. It guarantees the ordering of messages and improves throughput by adding a Key to each message, and routes messages with the same Key Hash to one consumer.",[48,78729,78730],{},[384,78731],{"alt":78732,"src":78733},"illustration Pulsar Flink Connector 2.7.0 ","\u002Fimgs\u002Fblogs\u002F63a3945553018f454a592131_pulsar-key-shared.png",[48,78735,78736],{},"Pulsar Flink Connector 2.7.0 supports the Key_Shared subscription model. You can enable this feature by setting enable-key-hash-range to true. The Key Hash range processed by each consumer is decided by the parallelism of tasks.",[32,78738,78740],{"id":78739},"introducing-exactly-once-semantics-for-pulsar-sink-based-on-the-pulsar-transaction","Introducing exactly-once semantics for Pulsar sink (based on the Pulsar transaction)",[48,78742,78743],{},"In previous versions, sink operators only supported at-least-once semantics, which could not fully meet requirements for end-to-end consistency. To deduplicate messages, users had to do some dirty work, which was not user-friendly.",[48,78745,78746],{},"Transactions are supported in Pulsar 2.7.0, which will greatly improve the fault tolerance capability of Flink sink. In Pulsar Flink Connector 2.7.0, we designed exactly-once semantics for sink operators based on Pulsar transactions. Flink uses the two-phase commit protocol to implement TwoPhaseCommitSinkFunction. The main life cycle methods are beginTransaction(), preCommit(), commit(), abort(), recoverAndCommit(), recoverAndAbort().",[48,78748,78749],{},"You can select semantics flexibly when creating a sink operator, and the internal logic changes are transparent. Pulsar transactions are similar to the two-phase commit protocol in Flink, which will greatly improve the reliability of Connector Sink.",[48,78751,78752],{},"It’s easy to implement beginTransaction and preCommit. You only need to start a Pulsar transaction, and persist the TID of the transaction after the checkpoint. 
In the preCommit phase, you need to ensure that all messages are flushed to Pulsar, and messages pre-committed will be committed eventually.",[48,78754,78755],{},"We focus on recoverAndCommit and recoverAndAbort in implementation. Limited by Kafka features, Kafka connector adopts hack styles for recoverAndCommit. Pulsar transactions do not rely on the specific Producer, so it’s easy for you to commit and abort transactions based on TID.",[48,78757,78758],{},"Pulsar transactions are highly efficient and flexible. Taking advantages of Pulsar and Flink, the Pulsar Flink connector is even more powerful. We will continue to improve transactional sink in the Pulsar Flink connector.",[32,78760,78762],{"id":78761},"introducing-upsert-pulsar-connector","Introducing upsert-pulsar connector",[48,78764,78765],{},"Users in the Flink community expressed their needs for the upsert Pulsar. After looking through mailing lists and issues, we’ve summarized the following three reasons.",[321,78767,78768,78771,78774],{},[324,78769,78770],{},"Interpret Pulsar topic as a changelog stream that interprets records with keys as upsert (aka insert\u002Fupdate) events.",[324,78772,78773],{},"As a part of the real time pipeline, join multiple streams for enrichment and store results into a Pulsar topic for further calculation later. However, the result may contain update events.",[324,78775,78776],{},"As a part of the real time pipeline, aggregate on data streams and store results into a Pulsar topic for further calculation later. However, the result may contain update events.",[48,78778,78779],{},"Based on the requirements, we add support for Upsert Pulsar. The upsert-pulsar connector allows for reading data from and writing data into Pulsar topics in the upsert fashion.",[321,78781,78782,78785],{},[324,78783,78784],{},"As a source, the upsert-pulsar connector produces a changelog stream, where each data record represents an update or delete event. More precisely, the value in a data record is interpreted as an UPDATE of the last value for the same key, if any (if a corresponding key does not exist yet, the update will be considered an INSERT). Using the table analogy, a data record in a changelog stream is interpreted as an UPSERT (aka INSERT\u002FUPDATE) because any existing row with the same key is overwritten. Also, null values are interpreted in a special way: a record with a null value represents a “DELETE”.",[324,78786,78787],{},"As a sink, the upsert-pulsar connector can consume a changelog stream. It will write INSERT\u002FUPDATE_AFTER data as normal Pulsar messages value, and write DELETE data as Pulsar messages with null values (indicate tombstone for the key). 
Flink will guarantee the message ordering on the primary key by partition data on the values of the primary key columns, so the update\u002Fdeletion messages on the same key will fall into the same partition.",[32,78789,78791,78792,4003,78797],{"id":78790},"support-new-source-interface-and-table-api-introduced-in-flip-27-and-flip-95","Support new source interface and Table API introduced in ",[55,78793,78796],{"href":78794,"rel":78795},"https:\u002F\u002Fcwiki.apache.org\u002Fconfluence\u002Fdisplay\u002FFLINK\u002FFLIP-27%3A+Refactor+Source+Interface#FLIP27:RefactorSourceInterface-BatchandStreamingUnification",[264],"FLIP-27",[55,78798,78801],{"href":78799,"rel":78800},"https:\u002F\u002Fcwiki.apache.org\u002Fconfluence\u002Fdisplay\u002FFLINK\u002FFLIP-95%3A+New+TableSource+and+TableSink+interfaces",[264],"FLIP-95",[48,78803,78804],{},"This feature unifies the source of the batch stream and optimizes the mechanism for task discovery and data reading. It is also the cornerstone of our implementation of Pulsar batch and streaming unification. The new Table API supports DDL computed columns, watermarks and metadata.",[32,78806,78808,78809],{"id":78807},"support-sql-read-and-write-metadata-as-described-in-flip-107","Support SQL read and write metadata as described in ",[55,78810,78813],{"href":78811,"rel":78812},"https:\u002F\u002Fcwiki.apache.org\u002Fconfluence\u002Fdisplay\u002FFLINK\u002FFLIP-107%3A+Handling+of+metadata+in+SQL+connectors",[264],"FLIP-107",[48,78815,78816,78817,190],{},"FLIP-107 enables users to access connector metadata as a metadata column in table definitions. In real-time computing, users usually need additional information, such as eventTime, customized fields. Pulsar Flink connector supports SQL read and write metadata, so it is flexible and easy for users to manage metadata of Pulsar messages in Pulsar Flink Connector 2.7.0. For details on the configuration, refer to ",[55,78818,78821],{"href":78819,"rel":78820},"https:\u002F\u002Fgithub.com\u002Fstreamnative\u002Fpulsar-flink#pulsar-message-metadata-manipulation",[264],"Pulsar Message metadata manipulation",[32,78823,78825],{"id":78824},"add-flink-format-type-atomic-to-support-pulsar-primitive-types","Add Flink format type atomic to support Pulsar primitive types",[48,78827,78828,78829,190],{},"In Pulsar Flink Connector 2.7.0, we add Flink format type atomic to support Pulsar primitive types. When Flink processing requires a Pulsar primitive type, you can use atomic as the connector format. For more information on Pulsar primitive types, see ",[55,78830,62718],{"href":62718,"rel":78831},[264],[40,78833,32622],{"id":78834},"migration",[48,78836,78837],{},"If you’re using the previous Pulsar Flink Connector version, you need to adjust SQL and API parameters accordingly. Below we provide details on each.",[32,78839,78841],{"id":78840},"sql","SQL",[48,78843,78844],{},"In SQL, we’ve changed Pulsar configuration parameters in DDL declaration. The name of some parameters are changed, but the values are not changed.",[321,78846,78847,78850,78853,78856],{},[324,78848,78849],{},"Remove the connector. 
prefix from the parameter names.",[324,78851,78852],{},"Change the name of the connector.type parameter into connector.",[324,78854,78855],{},"Change the startup mode parameter name from connector.startup-mode into scan.startup.mode.",[324,78857,78858],{},"Adjust Pulsar properties as properties.pulsar.reader.readername=testReaderName.",[48,78860,78861],{},"If you use SQL in Pulsar Flink Connector, you need to adjust your SQL configuration accordingly when migrating to Pulsar Flink Connector 2.7.0. The following sample shows the differences between previous versions and the 2.7.0 version for SQL.",[48,78863,78864],{},"SQL in previous versions：",[8325,78866,78869],{"className":78867,"code":78868,"language":8330},[8328],"\ncreate table topic1(\n    `rip` VARCHAR,\n    `rtime` VARCHAR,\n    `uid` bigint,\n    `client_ip` VARCHAR,\n    `day` as TO_DATE(rtime),\n    `hour` as date_format(rtime,'HH')\n) with (\n    'connector.type' ='pulsar',\n    'connector.version' = '1',\n    'connector.topic' ='persistent:\u002F\u002Fpublic\u002Fdefault\u002Ftest_flink_sql',\n    'connector.service-url' ='pulsar:\u002F\u002Fxxx',\n    'connector.admin-url' ='http:\u002F\u002Fxxx',\n    'connector.startup-mode' ='earliest',\n    'connector.properties.0.key' ='pulsar.reader.readerName',\n    'connector.properties.0.value' ='testReaderName',\n    'format.type' ='json',\n    'update-mode' ='append'\n);\n\n",[4926,78870,78868],{"__ignoreMap":18},[48,78872,78873],{},"SQL in Pulsar Flink Connector 2.7.0:",[8325,78875,78878],{"className":78876,"code":78877,"language":8330},[8328],"\ncreate table topic1(\n    `rip` VARCHAR,\n    `rtime` VARCHAR,\n    `uid` bigint,\n    `client_ip` VARCHAR,\n    `day` as TO_DATE(rtime),\n    `hour` as date_format(rtime,'HH')\n) with (\n    'connector' ='pulsar',\n    'topic' ='persistent:\u002F\u002Fpublic\u002Fdefault\u002Ftest_flink_sql',\n    'service-url' ='pulsar:\u002F\u002Fxxx',\n    'admin-url' ='http:\u002F\u002Fxxx',\n    'scan.startup.mode' ='earliest',\n    'properties.pulsar.reader.readername' = 'testReaderName',\n    'format' ='json');\n\n",[4926,78879,78877],{"__ignoreMap":18},[32,78881,3560],{"id":78882},"api",[48,78884,78885],{},"From an API perspective, we adjusted some classes and enabled easier customization.",[321,78887,78888,78891],{},[324,78889,78890],{},"To solve serialization issues, we changed the signature of the construction method FlinkPulsarSink, and added PulsarSerializationSchema.",[324,78892,78893],{},"We removed inappropriate classes related to row, such as FlinkPulsarRowSink, FlinkPulsarRowSource. If you need to deal with Row format, you can use Flink Row related serialization components.",[48,78895,78896],{},"You can build PulsarSerializationSchema by using PulsarSerializationSchemaWrapper.Builder. TopicKeyExtractor is moved into PulsarSerializationSchemaWrapper. When you adjust your API, you can take the following sample as reference.",[8325,78898,78901],{"className":78899,"code":78900,"language":8330},[8328],"\nnew PulsarSerializationSchemaWrapper.Builder\u003C>(new SimpleStringSchema())\n                .setTopicExtractor(str -> getTopic(str))\n                .build();\n\n",[4926,78902,78900],{"__ignoreMap":18},[40,78904,78906],{"id":78905},"future-plan","Future Plan",[48,78908,78909],{},"Today, we are designing a batch and stream solution integrated with Pulsar Source, based on the new Flink Source API (FLIP-27). 
The new solution will unlock limitations of the current streaming source interface (SourceFunction) and simultaneously to unify the source interfaces between the batch and streaming APIs.",[48,78911,78912],{},"Pulsar offers a hierarchical architecture where data is divided into streaming, batch, and cold data, which enables Pulsar to provide infinite capacity. This makes Pulsar an ideal solution for unified batch and streaming.",[48,78914,78915],{},"The batch and stream solution based on the new Flink Source API is divided into two simple parts: SplitEnumerator and Reader. SplitEnumerator discovers and assigns partitions, and Reader reads data from the partition.",[48,78917,78918],{},[384,78919],{"alt":78920,"src":78921},"future plan illustration","\u002Fimgs\u002Fblogs\u002F63a394fc7d7fba708f992278_pulsar-flink-batch-stream.png",[48,78923,78924,78925,190],{},"Pulsar stores messages in the ledger block, and you can locate the ledgers through Pulsar admin, and then provide broker partition, BookKeeper partition, Offloader partition, and other information through different partitioning policies. For more details, refer to ",[55,78926,78927],{"href":78927,"rel":78928},"https:\u002F\u002Fgithub.com\u002Fstreamnative\u002Fpulsar-flink\u002Fissues\u002F187",[264],[40,78930,2125],{"id":2122},[48,78932,78933,78934,78939,78940,190],{},"Pulsar Flink Connector 2.7.0 is released and we strongly encourage everyone to use Pulsar Flink Connector 2.7.0. The new version is more user-friendly and is enabled with various features in Pulsar 2.7 and Flink 1.12. We’ll contribute Pulsar Flink Connector 2.7.0 to ",[55,78935,78938],{"href":78936,"rel":78937},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fflink\u002F",[264],"Flink repository",". If you have any concern on Pulsar Flink Connector, feel free to open issues in ",[55,78941,78942],{"href":78942,"rel":78943},"https:\u002F\u002Fgithub.com\u002Fstreamnative\u002Fpulsar-flink\u002Fissues",[264],{"title":18,"searchDepth":19,"depth":19,"links":78945},[78946,78947,78948,78958,78962,78963],{"id":78664,"depth":19,"text":78665},{"id":19189,"depth":19,"text":19190},{"id":78713,"depth":19,"text":78714,"children":78949},[78950,78951,78952,78953,78955,78957],{"id":78723,"depth":279,"text":78724},{"id":78739,"depth":279,"text":78740},{"id":78761,"depth":279,"text":78762},{"id":78790,"depth":279,"text":78954},"Support new source interface and Table API introduced in FLIP-27 and FLIP-95",{"id":78807,"depth":279,"text":78956},"Support SQL read and write metadata as described in FLIP-107",{"id":78824,"depth":279,"text":78825},{"id":78834,"depth":19,"text":32622,"children":78959},[78960,78961],{"id":78840,"depth":279,"text":78841},{"id":78882,"depth":279,"text":3560},{"id":78905,"depth":19,"text":78906},{"id":2122,"depth":19,"text":2125},"2020-12-24","Learn the most interesting and major features about Pulsar Flink Connector 
2.7.0.","\u002Fimgs\u002Fblogs\u002F63d795f7546798cda451dd24_63a394611b7271d862261287_top-flink.webp",{},"\u002Fblog\u002Fwhats-new-in-pulsar-flink-connector-2-7-0",{"title":76999,"description":78965},"blog\u002Fwhats-new-in-pulsar-flink-connector-2-7-0",[302,28572],"smXXGJAb1lSXxhSSLuHKMlxzbVXjEvOslN9wSdEeTFE",{"id":78974,"title":78975,"authors":78976,"body":78977,"category":821,"createdAt":290,"date":79088,"description":79089,"extension":8,"featured":294,"image":290,"isDraft":294,"link":290,"meta":79090,"navigation":7,"order":296,"path":79091,"readingTime":20144,"relatedResources":290,"seo":79092,"stem":79093,"tags":79094,"__hash__":79095},"blogs\u002Fblog\u002Fcloud-native-apache-pulsar-2-7-supports-transactions-and-azure-blob-storage-offloader.md","Cloud-Native Apache Pulsar 2.7 Supports Transactions and Azure Blob Storage Offloader",[808,78659],{"type":15,"value":78978,"toc":79082},[78979,78985,78991,78994,79020,79022,79024,79026,79030,79033,79036,79038,79042,79045,79048,79051,79054,79061,79068,79074],[48,78980,78981],{},[384,78982],{"alt":78983,"src":78984},"head img","\u002Fimgs\u002Fblogs\u002F63a393e4690dcd53f9bb4741_270-top.jpeg",[48,78986,78987,78990],{},[55,78988,821],{"href":50964,"rel":78989},[264]," is a cloud-native and distributed messaging and streaming platform originally created in Yahoo! and now a top-level Apache project. The latest 2.7 version supports transactions, Azure Blob storage offloader, topic-level policy, and more. The new version enables event streaming applications to consume, process, and produce messages in one atomic operation and also allows Pulsar users to offload their historical data to Azure Cloud.",[48,78992,78993],{},"Main features in the new release include:",[1666,78995,78996,78999,79002,79004,79011,79014,79017],{},[324,78997,78998],{},"Pulsar transactions",[324,79000,79001],{},"Azure Blob Storage Offloader",[324,79003,78451],{},[324,79005,79006,79007,79010],{},"Upgrade of ",[55,79008,862],{"href":23555,"rel":79009},[264]," to version 4.12",[324,79012,79013],{},"OAuth2 authentication",[324,79015,79016],{},"Native protobuf Schema",[324,79018,79019],{},"30+ Pulsar Functions Enhancement……",[40,79021,9144],{"id":53272},[48,79023,78398],{},[48,79025,78401],{},[40,79027,79029],{"id":79028},"azure-blob-storage-offloader","Azure Blob storage offloader",[48,79031,79032],{},"Pulsar 2.7.0 supports Azure Blob storage offloader. With this offloader, users can offload their historical data to Azure Blob Storage. It greatly benefits Azure Cloud users, and effectively reduces the cost of managing massive historical data in BookKeeper. Pulsar will add more support on Azure Cloud in the upcoming releases.",[40,79034,79035],{"id":42965},"Topic-level Policy",[48,79037,78454],{},[40,79039,79041],{"id":79040},"deep-dive-on-pulsar-and-kafka-benchmark","Deep Dive on Pulsar and Kafka Benchmark",[48,79043,79044],{},"CSDN spoke with Penghui Li, an Apache Pulsar PMC, about the Pulsar benchmark report they published recently.",[48,79046,79047],{},"Question: You recently wrote Benchmarking Pulsar and Kafka, why do you want to conduct the benchmark?",[48,79049,79050],{},"Penghui Li: This year, Confluent ran a benchmark to evaluate how Kafka, Pulsar, and RabbitMQ compare in terms of throughput and latency. According to Confluent, Kafka was the \"fastest\" in all scenarios. 
Given our knowledge of Pulsar's capabilities, this did not seem accurate.",[48,79052,79053],{},"For the community, we have already met many users who hope to get official benchmark results for reference, and even the performance comparison with other messaging systems. So we think this is also an opportunity to push us to do this. So we set out to repeat the benchmark.",[48,79055,79056,79057,190],{},"Taking a deeper look at Confluent's benchmark, we noticed a number of issues with the setup, framework, and methodology. We identified and fixed these issues and also added additional test parameters that would provide insights on more real-world use cases. You can read the ",[55,79058,79060],{"href":79059},"\u002Fwhitepaper\u002Fbenchmark-pulsar-vs-kafka","full benchmark",[48,79062,79063,79064,190],{},"Although in the test results, Pulsar is better than Kafka in many aspects of latency. But we still think that this cannot cover all user scenarios. Different physical resource environments may get completely different results, we also recommend that users have a better understanding of Pulsar's design and performance-related knowledge, this will allow Pulsar to perform better in a real environment. We have published a whitepaper which introduces many aspects of the performance tuning of Pulsar. You can read the ",[55,79065,79067],{"href":79066},"\u002Fwhitepaper\u002Ftaking-a-deep-dive-into-apache-pulsar-architecture-for-performance-tuning","full whitepaper",[48,79069,79070,79071,190],{},"High performance is only one aspect of Pulsar. Pulsar has advanced architecture, better scalability, and easy operations and maintenance. We sincerely invite you to download Pulsar and try it out, and you will have a better understanding of Pulsar. To download the Apache Pulsar 2.7.0, click ",[55,79072,267],{"href":53730,"rel":79073},[264],[48,79075,79076,79077,79081],{},"For more information on the new release, check out the ",[55,79078,23976],{"href":79079,"rel":79080},"http:\u002F\u002Fpulsar.apache.org\u002Frelease-notes\u002F",[264]," on Pulsar website.",{"title":18,"searchDepth":19,"depth":19,"links":79083},[79084,79085,79086,79087],{"id":53272,"depth":19,"text":9144},{"id":79028,"depth":19,"text":79029},{"id":42965,"depth":19,"text":79035},{"id":79040,"depth":19,"text":79041},"2020-12-03","Learn the most interesting and major features in Pulsar 2.7.0.",{},"\u002Fblog\u002Fcloud-native-apache-pulsar-2-7-supports-transactions-and-azure-blob-storage-offloader",{"title":78975,"description":79089},"blog\u002Fcloud-native-apache-pulsar-2-7-supports-transactions-and-azure-blob-storage-offloader",[302,821,9144],"t8efA-_a9RgrPtdql38eO_ljJY3MwLtSyMfeh2xj2ac",{"id":79097,"title":76975,"authors":79098,"body":79100,"category":821,"createdAt":290,"date":79610,"description":79611,"extension":8,"featured":294,"image":79612,"isDraft":294,"link":290,"meta":79613,"navigation":7,"order":296,"path":79614,"readingTime":31039,"relatedResources":290,"seo":79615,"stem":79616,"tags":79617,"__hash__":79618},"blogs\u002Fblog\u002Fpowering-federated-learning-tencent-with-apache-pulsar.md",[79099],"Chao 
Zhang",{"type":15,"value":79101,"toc":79581},[79102,79104,79107,79110,79114,79226,79229,79232,79240,79243,79249,79252,79255,79258,79261,79264,79267,79270,79273,79275,79278,79281,79284,79292,79295,79306,79312,79314,79317,79323,79326,79329,79332,79343,79346,79349,79355,79362,79367,79370,79373,79376,79379,79382,79385,79391,79394,79397,79402,79408,79413,79419,79422,79425,79434,79440,79443,79446,79457,79460,79463,79466,79472,79480,79483,79486,79492,79495,79498,79504,79507,79513,79516,79519,79522,79526,79529,79533,79536,79540,79543,79545,79548,79552,79560,79564,79567,79569,79572,79575,79578],[40,79103,46],{"id":42},[48,79105,79106],{},"Tencent Angel PowerFL is a distributed federated learning platform which can support trillions of concurrent training. Angel PowerFL has been widely used in Tencent Financial Cloud, Advertising Joint Modeling, and other businesses. The platform requires a stable and reliable messaging system with guaranteed high performance and data privacy. After investigating different solutions and comparing several messaging queues, Angel PowerFL adopted Apache Pulsar as the data synchronization solution in Federated Learning (FL).",[48,79108,79109],{},"In this blog, the Tencent Angel PowerFL team shares how they built federated communication based on Pulsar, the challenges they encountered with Pulsar, and how they solved those problems and contributed to the Pulsar community. Tencent’s use of Pulsar in production has demonstrated it provides the stability, reliability, and scalability that the machine learning platform requires.",[40,79111,79113],{"id":79112},"content","CONTENT",[321,79115,79116,79122,79128,79134,79140,79146,79152,79158,79163,79168,79174,79180,79186,79192,79198,79204,79210,79215,79220],{},[324,79117,79118],{},[55,79119,79121],{"href":79120},"\u002Fblog\u002Fcase\u002F2020-11-26-tencent-angel\u002F#about-tencent-angel-powerfl","About Tencent Angel PowerFL",[324,79123,79124],{},[55,79125,79127],{"href":79126},"\u002Fblog\u002Fcase\u002F2020-11-26-tencent-angel\u002F#requirements-for-communication-services","Requirements for Communication Services",[324,79129,79130],{},[55,79131,79133],{"href":79132},"\u002Fblog\u002Fcase\u002F2020-11-26-tencent-angel\u002F#stable-and-reliable","Stable and reliable",[324,79135,79136],{},[55,79137,79139],{"href":79138},"\u002Fblog\u002Fcase\u002F2020-11-26-tencent-angel\u002F#high-throughput-and-low-latency","High throughput and low latency",[324,79141,79142],{},[55,79143,79145],{"href":79144},"\u002Fblog\u002Fcase\u002F2020-11-26-tencent-angel\u002F#data-privacy","Data privacy",[324,79147,79148],{},[55,79149,79151],{"href":79150},"\u002Fblog\u002Fcase\u002F2020-11-26-tencent-angel\u002F#why-apache-pulsar","Why Apache Pulsar",[324,79153,79154],{},[55,79155,79157],{"href":79156},"\u002Fblog\u002Fcase\u002F2020-11-26-tencent-angel\u002F#layered-and-segment-centric-architecture","Layered and Segment-Centric Architecture",[324,79159,79160],{},[55,79161,43576],{"href":79162},"\u002Fblog\u002Fcase\u002F2020-11-26-tencent-angel\u002F#geo-replication",[324,79164,79165],{},[55,79166,75379],{"href":79167},"\u002Fblog\u002Fcase\u002F2020-11-26-tencent-angel\u002F#scalability",[324,79169,79170],{},[55,79171,79173],{"href":79172},"\u002Fblog\u002Fcase\u002F2020-11-26-tencent-angel\u002F#federated-communication-solution-based-on-apache-pulsar","Federated Communication Solution Based on Apache 
Pulsar",[324,79175,79176],{},[55,79177,79179],{"href":79178},"\u002Fblog\u002Fcase\u002F2020-11-26-tencent-angel\u002F#remove-dependency-on-global-zookeeper","Remove dependency on Global ZooKeeper",[324,79181,79182],{},[55,79183,79185],{"href":79184},"\u002Fblog\u002Fcase\u002F2020-11-26-tencent-angel\u002F#add-token-authentication-for-client","Add token authentication for client",[324,79187,79188],{},[55,79189,79191],{"href":79190},"\u002Fblog\u002Fcase\u002F2020-11-26-tencent-angel\u002F#enable-topic-automatic-recycle-in-multi-cluster","Enable topic automatic recycle in multi-cluster",[324,79193,79194],{},[55,79195,79197],{"href":79196},"\u002Fblog\u002Fcase\u002F2020-11-26-tencent-angel\u002F#enable-topic-throttling","Enable topic throttling",[324,79199,79200],{},[55,79201,79203],{"href":79202},"\u002Fblog\u002Fcase\u002F2020-11-26-tencent-angel\u002F#configure-topic-unloading","Configure topic unloading",[324,79205,79206],{},[55,79207,79209],{"href":79208},"\u002Fblog\u002Fcase\u002F2020-11-26-tencent-angel\u002F#pulsar-on-kubernetes","Pulsar on Kubernetes",[324,79211,79212],{},[55,79213,75130],{"href":79214},"\u002Fblog\u002Fcase\u002F2020-11-26-tencent-angel\u002F#future-plans",[324,79216,79217],{},[55,79218,2125],{"href":79219},"\u002Fblog\u002Fcase\u002F2020-11-26-tencent-angel\u002F#conclusion",[324,79221,79222],{},[55,79223,79225],{"href":79224},"\u002Fblog\u002Fcase\u002F2020-11-26-tencent-angel\u002F#special-thanks","Special Thanks",[40,79227,79121],{"id":79228},"about-tencent-angel-powerfl",[48,79230,79231],{},"Federated learning (FL) is a machine-learning technique that trains statistical models across multiple decentralized edge devices, servers, or siloed data centers while keeping data localized. These decentralized devices collaboratively learn a shared prediction model while keeping the training data on the device instead of requiring the data to be uploaded and stored on a central server. As a result, organizations like the financial industry and hospitals that are required to operate under strict privacy constraints can also participate in model training.",[48,79233,79234,79239],{},[55,79235,79238],{"href":79236,"rel":79237},"https:\u002F\u002Fgithub.com\u002FAngel-ML",[264],"Angel"," is a distributed machine-learning platform based on the philosophy of Parameter Servers (similar to a database, this is a core part of machine learning applications that is used to store the parameters of a machine learning model and to serve them to clients). Angel is tuned for performance with big data from Tencent and has gained a wide range of applicability and stability, demonstrating increasing advantages in handling higher-dimension models.",[48,79241,79242],{},"Tencent Angel PowerFL is built based on the Angel machine learning platform. Angel Parameter Server (Angel-PS) can support trillions of models that are training concurrently, so Angel PowerFL migrates computing from Worker (a logic component that processes a received task on a different thread, and gives you feedback via a call back method) to the Parameter Server (PS). Angel PowerFL provides basic operation interfaces such as computing, encryption, storage, and state synchronization for the federated learning algorithm, and it coordinates participants with process scheduler models. 
Angel PowerFL has been widely used in Tencent Financial Cloud, Tencent Advertising Joint Modeling, and other businesses.",[48,79244,79245],{},[384,79246],{"alt":79247,"src":79248}," illustration powerFL Scheduler","\u002Fimgs\u002Fblogs\u002F63a3919f6e8eb4d281ec8e69_figure1.png",[40,79250,79127],{"id":79251},"requirements-for-communication-services",[48,79253,79254],{},"During the federated training sessions, participants transfer a large amount of encrypted data via the communication model. Consequently, the Angel PowerFL platform requires a stable and reliable messaging system that provides high performance and ensures data privacy.",[32,79256,79133],{"id":79257},"stable-and-reliable",[48,79259,79260],{},"The federated learning tasks last from minutes to hours. The learning algorithms require accurate data, and the peak of data transmission varies for different algorithms. So we need stable and robust communication model services in order to avoid data loss.",[32,79262,79139],{"id":79263},"high-throughput-and-low-latency",[48,79265,79266],{},"Angel PowerFL processes computing with Spark. The concurrent execution of Executors, which are processes that run computations and store data for your application, generates a lot of intermediate data. To transmit the encrypted data to other parties efficiently, the communication model must support low latency and high throughput.",[32,79268,79145],{"id":79269},"data-privacy",[48,79271,79272],{},"Our participants in federated learning are distributed in different companies. Although all data is encrypted with the encryption model, transmitting them on a public network poses risks. As a result, we need a secure and robust communication model to protect data from being attacked in the public network.",[40,79274,79151],{"id":50969},[48,79276,79277],{},"When we were researching solutions for federated communication services, we considered an RPC (Remote Procedure Call) direct connection, HDFS (Hadoop Distributed File System) synchronization, and MQ (messaging queue) synchronization. Since we have high requirements for security and performance, we decided to adopt the MQ synchronization solution. Several MQ options such as Apache Pulsar, Kafka, RabbitMQ, and TubeMQ were available. We consulted the MQ team from the Tencent Data Platform Department, who recommended Pulsar. Then, we conducted further research on Pulsar and found that the built-in features of Pulsar perfectly met our requirements for the messaging system.",[48,79279,79280],{},"Below, we summarize the points as to why Pulsar is the best fit for our federated communication.",[32,79282,79157],{"id":79283},"layered-and-segment-centric-architecture",[48,79285,79286,79287,79291],{},"Apache Pulsar is a cloud-native distributed messaging and event-streaming platform that adopts layered architecture and decouples computing from storage. An Apache Pulsar cluster is composed of two layers: a stateless serving layer and a stateful storage layer. The serving layer consists of a set of brokers that receive and deliver messages, and the storage layer consists of a set of ",[55,79288,862],{"href":79289,"rel":79290},"http:\u002F\u002Fbookkeeper.apache.org\u002F",[264]," storage nodes called bookies that store messages durably.",[48,79293,79294],{},"Compared to traditional messaging systems such as RabbitMQ and Kafka, Pulsar has a unique and differentiated architecture. 
Some unique aspects of Pulsar’s architecture include:",[321,79296,79297,79300,79303],{},[324,79298,79299],{},"Separate brokers from bookies and allow for independent scalability and fault tolerance, thus improving system availability.",[324,79301,79302],{},"With segment-based storage architecture and tiered storage, data is evenly distributed and balanced across all bookies and the capacity is not limited by a single bookie node.",[324,79304,79305],{},"BookKeeper is secure and reliable, ensuring no data loss. In addition, BookKeeper supports batch flashing and higher throughput.",[48,79307,79308],{},[384,79309],{"alt":79310,"src":79311},"illustration segment centric Architecture","\u002Fimgs\u002Fblogs\u002F63a3919ffe698bf516a92fe4_figure2.png",[32,79313,43576],{"id":30199},[48,79315,79316],{},"Pulsar provides built-in geo-replication for replicating data synchronously or asynchronously among multiple data centers, permitting us to restrict replication selectively. By default, messages are replicated to all clusters configured for the namespace. If we want to replicate messages to some specified clusters, we can specify a replication list.",[48,79318,79319],{},[384,79320],{"alt":79321,"src":79322},"illustration Pulsar Message Architecture ","\u002Fimgs\u002Fblogs\u002F63a3919fba4f90322863f07d_figure3.png",[48,79324,79325],{},"In the above figure, whenever P1, P2, and P3 producers publish messages to the T1 topic in Cluster-A, Cluster-B, and Cluster-C clusters respectively, those messages are instantly replicated across clusters. Once Pulsar replicates the messages, C1 and C2 consumers can consume those messages from their respective clusters.",[32,79327,75379],{"id":79328},"scalability",[48,79330,79331],{},"With the segment-based storage architecture, Pulsar divides the topic partition into smaller blocks called fragments. Each segment stores data as an Apache BookKeeper ledger, and the set of segments constituting the partition is distributed in the Apache BookKeeper cluster. This design makes it easier to manage capacity and scalability, and it meets our demand for high throughput. Let’s take a closer look at these elements:",[321,79333,79334,79337,79340],{},[324,79335,79336],{},"Easy to manage capacity: The capacity of the topic partition can be scaled to the entire BookKeeper cluster without being limited by the capacity of a single node.",[324,79338,79339],{},"Easy to scale out: We do not need to rebalance or replicate data for scaling. When a new bookie node is added, it is used only for the new segment or its replica. Moreover, Pulsar rebalances the segment distribution and the traffic in the cluster.",[324,79341,79342],{},"High throughput: The write traffic is distributed in the storage layer, so no partition write competes for the resources of a single node. Apache Pulsar’s multi-layer architecture and decoupling of the computing and storage layers provides stability, reliability, scalability, and high performance. Additionally, its built-in geo-replication enables us to synchronize messaging queues among parties across different companies. Finally, Pulsar’s authentication and authorization help ensure data privacy in transmission. These are all required features for Angel PowerFL and are why we decided to adopt Apache Pulsar in the Angel PowerFL platform.",[40,79344,79173],{"id":79345},"federated-communication-solution-based-on-apache-pulsar",[48,79347,79348],{},"In Angel PowerFL, we identify each business as a Party, and each Party has a unique ID, such as 10000\u002F20000. 
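The replication list mentioned in the geo-replication section above can also be set per message from the producer, which matches the setReplicationClusters call described below. A minimal Java sketch follows; the tenant, namespace, and topic names are illustrative, and the cluster names simply follow the fl-pulsar-&lt;partyID&gt; convention used in this post.

```java
import java.util.Arrays;
import org.apache.pulsar.client.api.*;

public class SelectiveReplicationExample {
    public static void main(String[] args) throws Exception {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650")
                .build();

        Producer<byte[]> producer = client.newProducer()
                .topic("persistent://fl-tenant/task-001/train-channel")
                .create();

        // Only the parties participating in this training task should receive the
        // message, so list just their clusters instead of the whole namespace set.
        producer.newMessage()
                .value("encrypted intermediate data".getBytes())
                .replicationClusters(Arrays.asList("fl-pulsar-10000", "fl-pulsar-20000"))
                .send();

        producer.close();
        client.close();
    }
}
```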
Those Parties are distributed in different departments of the same company (without network isolation) or in different companies (across public networks). Data from each Party is synchronized via Pulsar geo-replication. The following is our communication services design based on Apache Pulsar.",[48,79350,79351],{},[384,79352],{"alt":79353,"src":79354}," image Angel PowerFL Communication Services Based on Pulsar","\u002Fimgs\u002Fblogs\u002F63a3919f944d6a2049ceafa5_figure4.png",[48,79356,79357,79358,79361],{},"The FL training tasks are connected to the Pulsar cluster of the Party by the producer and consumer of the message. The cluster name follows the fl-pulsar-",[2628,79359,79360],{},"partyID"," pattern. After the training task generates intermediate data, the producer sends the data to the local Pulsar cluster, and then the Pulsar cluster sends data to the consuming Party via the Pulsar proxy synchronous replication network. The consumer of the consuming Party monitors the training topic, consumes data, and processes it.",[48,79363,79364],{},[384,79365],{"alt":758,"src":79366},"\u002Fimgs\u002Fblogs\u002F63a3919f7ccf63732dd94ad5_figure5.png",[48,79368,79369],{},"Figure 5: Angel PowerFL Federated Communication Data Streaming",[48,79371,79372],{},"During training, the driver and each partition create a channel variable, which maps a specific topic in Pulsar. The producer sends all exchange data to the topic.",[48,79374,79375],{},"Angel PowerFL supports multi-party federation, so data will be replicated synchronously in more than two clusters. Each FL task specifies participants in the task parameter, and the producer ensures data only transmits between participating Parties by calling the setReplicationClusters interface.",[48,79377,79378],{},"We make full use of Pulsar geo-replication, topic throttling, and token authentication in Angel PowerFL communication model. Next, I’ll introduce how we adopt Pulsar in Angle PowerFL in detail.",[32,79380,79179],{"id":79381},"remove-dependency-on-global-zookeeper",[48,79383,79384],{},"In Angel PowerFL platform, we rely on Local ZooKeeper and Global ZooKeeper to deploy a Pulsar cluster. Local ZooKeeper is used to store metadata, similar to the method used in Kafka. Global ZooKeeper shares configuration information among multiple Pulsar clusters.",[48,79386,79387],{},[384,79388],{"alt":79389,"src":79390}," illustration Pulsar Cluster","\u002Fimgs\u002Fblogs\u002F63a3919fba4f90272d63f07e_figure6.png",[48,79392,79393],{},"Every time we add a new Party to Angel PowerFL, we have to deploy a sub-node for Global ZooKeeper or share the public ZooKeeper among different companies or regions. Consequently, adding a new Party makes it more difficult to deploy a cluster and protect data from being attacked.",[48,79395,79396],{},"The metadata stored in Global ZooKeeper include cluster name, service address, namespace permissions, and so on. Pulsar supports creating and adding new clusters. 
We register the federated Pulsar clusters to the local ZooKeeper in the following steps, thereby removing dependency on Global ZooKeeper.",[321,79398,79399],{},[324,79400,79401],{},"Step 1: Register the Pulsar cluster for the newly added Party",[8325,79403,79406],{"className":79404,"code":79405,"language":8330},[8328],"\n# OTHER_CLUSTER_NAME is the Pulsar cluster name of the Party to be registered\n# OTHER_CLUSTER_BROKER_URL is the broker address of the Pulsar cluster\n\n.\u002Fbin\u002Fpulsar-admin clusters create ${OTHER_CLUSTER_NAME} \\\n    --url http:\u002F\u002F${OTHER_CLUSTER_HTTP_URL} \\\n    --broker-url pulsar:\u002F\u002F${OTHER_CLUSTER_BROKER_URL}\n\n",[4926,79407,79405],{"__ignoreMap":18},[321,79409,79410],{},[324,79411,79412],{},"Step 2: Authorize the namespace used for training to access the cluster",[8325,79414,79417],{"className":79415,"code":79416,"language":8330},[8328],"\n.\u002Fbin\u002Fpulsar-admin namespaces set-clusters fl-tenant\u002F${namespace} \\\n     -clusters ${LOCAL_CLUSTR_NAME},${OTHER_CLUSTER_NAME}\n\n",[4926,79418,79416],{"__ignoreMap":18},[48,79420,79421],{},"We register the newly added Party with its Pulsar cluster name\u002Fservice address, and replicate data synchronously with the registration information through geo-replication.",[32,79423,79185],{"id":79424},"add-token-authentication-for-client",[48,79426,79427,79428,79433],{},"As the communication model of Angel PowerFL, Pulsar has no permission control on the user level. To ensure the client produces and consumes data securely, we add token authentication according to ",[55,79429,79432],{"href":79430,"rel":79431},"https:\u002F\u002Fpulsar.apache.org\u002Fdocs\u002Fen\u002Fsecurity-jwt\u002F#token-authentication-overview",[264],"Pulsar Client authentication using tokens based on JSON Web Tokens",". Then, we need to configure the service address of the current Party and admin token for training tasks. Since Angel PowerFL is deployed on Kubernetes, we generate the Public\u002FPrivate keys required by the Pulsar cluster in the container and then register them to K8S secret.",[8325,79435,79438],{"className":79436,"code":79437,"language":8330},[8328],"\n# generate fl-private.key and fl-public.key\ndocker run --rm -v \"$(pwd)\":\u002Ftmp \\\n     apachepulsar\u002Fpulsar-all:2.5.2 \\\n     \u002Fpulsar\u002Fbin\u002Fpulsar tokens create-key-pair --output-private-key \\\n     \u002Ftmp\u002Ffl-private.key --output-public-key \u002Ftmp\u002Ffl-public.key \n\n# generate `admin-token.txt token` file\necho -n `docker run --rm -v \\\n     \"$(pwd)\":\u002Ftmp apachepulsar\u002Fpulsar-all:2.5.2 \\\n     \u002Fpulsar\u002Fbin\u002Fpulsar tokens create --private-key \\\n     file:\u002F\u002F\u002Ftmp\u002Ffl-private.key --subject admin`\n# register authentication to K8S\nkubectl create secret generic token-symmetric-key \\\n     --from-file=TOKEN=admin-token.txt \\\n     --from-file=PUBLICKEY=fl-public.key -n ${PARTY_NAME}\n\n",[4926,79439,79437],{"__ignoreMap":18},[32,79441,79191],{"id":79442},"enable-topic-automatic-recycle-in-multi-cluster",[48,79444,79445],{},"When geo-replication is enabled for the Pulsar cluster, we cannot delete topics that are used with commands directly. Angel PowerFL training tasks are disposable, so we need to recycle those topics after usage and free space in time. 
So, we configure the brokerDeleteInactivetopicsEnabled parameter to recycle topics replicated through geo-replication and make sure that:",[321,79447,79448,79451,79454],{},[324,79449,79450],{},"The topic is not connected to any producer or consumer.",[324,79452,79453],{},"The topic is not subscribed.",[324,79455,79456],{},"The topic has no message retention. We recycle topics automatically in Pulsar clusters every three hours by configuring the brokerDeleteInactivetopicsEnabled and brokerDeleteInactivetopicsFrequencySeconds parameters.",[32,79458,79197],{"id":79459},"enable-topic-throttling",[48,79461,79462],{},"During federated training, the data traffic peaks vary for different data sets, algorithms, and execution. The largest data volume of a task in the production environment is over 200G\u002Fh. If Pulsar is disconnected or an exception occurs in the production or consumption process, we have to restart the whole training process.",[48,79464,79465],{},"To reduce this risk, we adopted Pulsar throttling. Pulsar supports message-rate and byte-rate throttling policies on the producer side. Message-rate throttling limits the number of messages produced per second, and byte-rate throttling limits the size of messages produced per second. In Angel PowerFL, we set message size as 4M and limit the number of messages to 30 for namespace through message-rate throttling (under 30*4 = 120 M\u002Fs).",[8325,79467,79470],{"className":79468,"code":79469,"language":8330},[8328],"\n.\u002Fbin\u002Fpulsar-admin namespaces set-publish-rate fl-tenant\u002F${namespace} -m 30\n\n",[4926,79471,79469],{"__ignoreMap":18},[48,79473,79474,79475,190],{},"When we tested on message-rate throttling initially, it did not work well. After debugging with the MQ team from Tencent Data Platform Department, we found that the throttling did not take effect if we configured the topicPublisherThrottlingTickTimeMillis parameter. Then, we enabled the precise topic publishing rate throttling on the broker side and contributed this improvement to the Pulsar community. For details, refer to ",[55,79476,79479],{"href":79477,"rel":79478},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F7078",[264],"PR-7078: introduce precise topic publish rate limiting",[32,79481,79203],{"id":79482},"configure-topic-unloading",[48,79484,79485],{},"Pulsar assigns topics to brokers dynamically based on the load of the brokers in the cluster. If the broker owning the topic crashes or is overloaded, the topic is reassigned to another broker immediately; this process is termed topic unloading. Topic unloading means to close the topic, release the ownership, and reassign the topic to a less-loaded broker. Topic unloading is adjusted by load balance, and the client will encounter slight jitter, which usually lasts for about 10 ms. However, when we started training at the early stage, a lot of connection exceptions occurred due to topic unloading. The following is a part of the log information.",[8325,79487,79490],{"className":79488,"code":79489,"language":8330},[8328],"\n[sub] Could not get connection to broker: topic is temporarily unavailable -- Will try again in 0.1 s\n\n",[4926,79491,79489],{"__ignoreMap":18},[48,79493,79494],{},"To resolve the issue, we further explored broker, namespace, bundle, and topic. Bundle is a fragmentation mechanism of the Pulsar namespace. The namespace is fragmented into a list of bundles, and each bundle contains a part of the hush of the namespace. Topics are not directly assigned to the broker. 
Instead, each topic is assigned to a specific bundle by the hush of the topic. The bundles are independent of each other and are assigned to different brokers.",[48,79496,79497],{},"We did not reuse the training topics at an early stage. To train an LR algorithm, 2,000+ topics were created, and the data load produced by each topic varied. We suspected that creating and using many topics in a short period would lead to an unbalanced load and frequent topic unloading. To reduce topic unloading, we adjusted the following parameters for the Pulsar bundle.",[8325,79499,79502],{"className":79500,"code":79501,"language":8330},[8328],"\n# increase the maximum number of topics that can be distributed by the broker\nloadBalancerBrokerMaxTopics=500000\n# enable automatic namespace bundle split\nloadBalancerAutoBundleSplitEnabled=true\n# increase the maximum number of topics that triggers bundle split\nloadBalancerNamespaceBundleMaxTopics=10000\n# increase the maximum number of messages that triggers bundle split\nloadBalancerNamespaceBundleMaxMsgRate=10000 \n\n",[4926,79503,79501],{"__ignoreMap":18},[48,79505,79506],{},"Meanwhile, we set the default number of bundles to 64 when creating a namespace.",[8325,79508,79511],{"className":79509,"code":79510,"language":8330},[8328],"\n.\u002Fbin\u002Fpulsar-admin namespaces create fl-tenant\u002F${namespace} --bundles 64\n\n",[4926,79512,79510],{"__ignoreMap":18},[48,79514,79515],{},"After adjusting the configuration, we solved the frequent topic unloading issue perfectly.",[32,79517,79209],{"id":79518},"pulsar-on-kubernetes",[48,79520,79521],{},"All services of Angel PowerFL are deployed on Kubernetes through Helm. As one of the charts, Pulsar leverages K8S resource isolation, scalability, and other advantages. When deploying Pulsar with Helm, we use Local Persistent Volume as storage, use NodeSelector in geo-replication, and configure useHostNameAsBookieID in bookies.",[3933,79523,79525],{"id":79524},"use-local-persistent-volume-as-storage","Use Local Persistent Volume as storage",[48,79527,79528],{},"Pulsar is sensitive to IO, especially the bookies. It is recommended to use SSD or separate disks in production environments. When we ran tasks with big data sets in Angel PowerFL, “No Bookies Available” exceptions occurred frequently due to high IO utility. With Local Persistent Volume, we mounted bookie, ZooKeeper, and other components to a separate disk and reduced IO competition. We tried to replace Pulsar PV storage with Ceph and NFS, and we found that the performance was best when using Local Persistent Volume.",[3933,79530,79532],{"id":79531},"use-nodeselector","Use NodeSelector",[48,79534,79535],{},"The broker needs to access the Pulsar proxy container of the other party while replicating data synchronically with geo-replication. In Angel PowerFL, we label the gateway machine separately and install the broker on the gateway machine that has access to the external network through NodeSelector.",[3933,79537,79539],{"id":79538},"configure-usehostnameasbookieid","Configure useHostNameAsBookieID",[48,79541,79542],{},"Bookie is stateful. We configure useHostNameAsBookieID after rebuilding the bookie pod, ensuring the ID registered on ZooKeeper is the hostname of the pod.",[40,79544,75130],{"id":75129},[48,79546,79547],{},"We’ve been using Apache Pulsar in Angel PowerFL for a year, and we’ve run Pulsar clusters in our production environment for over 8 months. 
It’s stable and reliable, and we’d like to upgrade our Pulsar cluster and improve Pulsar on K8S.",[32,79549,79551],{"id":79550},"upgrade-pulsar-to-26x","Upgrade Pulsar to 2.6.x",[48,79553,79554,79555,79559],{},"Currently, we are using Pulsar 2.5.2 and would like to backup Angel-PS failover recovery with Pulsar Key_Shared subscription mode. The Key_Shared subscription mode is enhanced in Pulsar 2.6.0 (",[55,79556,79557],{"href":79557,"rel":79558},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F5928",[264],"), so we hope to upgrade Pulsar to 2.6.x.",[32,79561,79563],{"id":79562},"support-multi-disk-mounting-for-pulsar-on-k8s","Support multi-disk mounting for Pulsar on K8S",[48,79565,79566],{},"All Angel PowerFL services are running on Kubernetes—except the YARN computing resources. As one of the charts, Pulsar is deployed with other services and uses Local Persistent Volume as storage. Currently, only one disk (directory) can be mounted on the bookie, so we could not make full use of machines with multiple disks. To address this need, we have a plan to mount multiple disks on the bookie.",[40,79568,2125],{"id":2122},[48,79570,79571],{},"I’ve introduced how we adopted Pulsar in the Angel PowerFL platform. We leverage Pulsar features and improve Pulsar functionalities and performance based on our demands.",[48,79573,79574],{},"As a cloud-native distributed messaging and event-streaming platform, Pulsar has many outstanding features and has been widely used in live broadcast and short video platforms, retail and e-commerce businesses, media, finance, and other industries. We believe that Pulsar’s adoption and community will continue to expand.",[40,79576,79225],{"id":79577},"special-thanks",[48,79579,79580],{},"Thanks to the MQ team at the Tencent Data Platform Department for their support and guidance. The MQ team is experienced in Apache Pulsar and TubeMQ and has made great contributions to the Apache Pulsar community. Apache Pulsar is a young, active community that enjoys rapid growth. 
We’d like to work with the Pulsar community, make contributions, and build a more thriving community.",{"title":18,"searchDepth":19,"depth":19,"links":79582},[79583,79584,79585,79586,79591,79596,79604,79608,79609],{"id":42,"depth":19,"text":46},{"id":79112,"depth":19,"text":79113},{"id":79228,"depth":19,"text":79121},{"id":79251,"depth":19,"text":79127,"children":79587},[79588,79589,79590],{"id":79257,"depth":279,"text":79133},{"id":79263,"depth":279,"text":79139},{"id":79269,"depth":279,"text":79145},{"id":50969,"depth":19,"text":79151,"children":79592},[79593,79594,79595],{"id":79283,"depth":279,"text":79157},{"id":30199,"depth":279,"text":43576},{"id":79328,"depth":279,"text":75379},{"id":79345,"depth":19,"text":79173,"children":79597},[79598,79599,79600,79601,79602,79603],{"id":79381,"depth":279,"text":79179},{"id":79424,"depth":279,"text":79185},{"id":79442,"depth":279,"text":79191},{"id":79459,"depth":279,"text":79197},{"id":79482,"depth":279,"text":79203},{"id":79518,"depth":279,"text":79209},{"id":75129,"depth":19,"text":75130,"children":79605},[79606,79607],{"id":79550,"depth":279,"text":79551},{"id":79562,"depth":279,"text":79563},{"id":2122,"depth":19,"text":2125},{"id":79577,"depth":19,"text":79225},"2020-11-26","The Tencent Angel PowerFL team shares how they built federated communication based on Pulsar, the challenges they encountered, and how they solved those problems and contributed to the Pulsar community.","\u002Fimgs\u002Fblogs\u002F63d7964c4a22f648f7f1e8b6_63a3919fa74e893f0e4026cf_tencent-angel-top.webp",{},"\u002Fblog\u002Fpowering-federated-learning-tencent-with-apache-pulsar",{"title":76975,"description":79611},"blog\u002Fpowering-federated-learning-tencent-with-apache-pulsar",[35559,821],"x3qE7QukQ8QsQCGzBjXiMmQx9pRjPT3awuvHIImix9E",{"id":79620,"title":79621,"authors":79622,"body":79624,"category":3550,"createdAt":290,"date":79719,"description":79720,"extension":8,"featured":294,"image":79721,"isDraft":294,"link":290,"meta":79722,"navigation":7,"order":296,"path":79723,"readingTime":11180,"relatedResources":290,"seo":79724,"stem":79725,"tags":79726,"__hash__":79727},"blogs\u002Fblog\u002Fstreamnative-alibaba-cloud.md","Apache Pulsar-as-a-Service Launches on Alibaba Cloud",[79623,69353],"Yang Yang",{"type":15,"value":79625,"toc":79713},[79626,79629,79632,79637,79640,79644,79652,79658,79662,79665,79668,79671,79674,79677,79684,79688,79691,79695,79698,79703,79706],[48,79627,79628],{},"StreamNative, a cloud-native event streaming company powered by Apache Pulsar, just announced its fully managed Apache Pulsar cloud offering, StreamNative Cloud, is now available on Alibaba Cloud.",[48,79630,79631],{},"Alibaba Cloud is the top cloud provider in Asia and this expansion will enable StreamNative to serve a new segment of customers. 
According to Sijie Guo, CEO and co-founder of StreamNative, and, also, one of the original developers of Apache Pulsar and Apache Bookkeeper, the Cloud Offering on the Alibaba Cloud is an opportunity to enable more Asian developers and organizations to try Pulsar.",[916,79633,79634],{},[48,79635,79636],{},"\"The launch of StreamNative Cloud on Alibaba Cloud will help accelerate the adoption of Pulsar across Asia by providing companies with access to operational expertise and management.\"",[48,79638,79639],{},"Sijie Guo, CEO, StreamNative",[40,79641,79643],{"id":79642},"growth-of-apache-pulsar","Growth of Apache Pulsar",[48,79645,79646,79647,79651],{},"Pulsar’s growth has skyrocketed globally since it became a top-level Apache Project in September 2018. According to ",[55,79648,45122],{"href":79649,"rel":79650},"https:\u002F\u002Fhub.docker.com\u002F",[264],", Apache Pulsar has more than 10 million downloads. On Github, the project is approaching 7,000 stars and currently has over 330 contributors. For context, the number of Github contributors has grown tenfold over the past two years. Adoption is largely driven by companies looking to build innovative applications and improve their existing systems with real-time streaming solutions.",[48,79653,79654],{},[384,79655],{"alt":79656,"src":79657},"graph growth in pulsar contriutors","\u002Fimgs\u002Fblogs\u002F63a3906315194c735c91328c_pulsar-growth.png",[40,79659,79661],{"id":79660},"pulsar-adoption-in-asia","Pulsar Adoption in Asia",[48,79663,79664],{},"Pulsar has had strong adoption in Asia since the launch of the project. In fact, Yahoo! JAPAN has been using Pulsar for nearly four years. Yahoo! JAPAN originally chose Pulsar to build out a new unified messaging platform. Today, Yahoo! JAPAN is using Pulsar as its messaging backbone to process hundreds of billions of messages every day.",[48,79666,79667],{},"Zhaopin.com, another early adopter, is a leading online recruitment platform based in China. Zhaopin.com has been leveraging Pulsar’s power and flexibility since 2018 and currently processes hundreds of billions of messages per day.",[48,79669,79670],{},"Adoption in Asia continues to accelerate and attract major players. One notable user is Tencent, a Chinese multinational conglomerate that markets various Internet-related services and products globally through its subsidiaries. Tencent built its transactional billing system, Midas, on Pulsar, demonstrating Pulsar's ability to handle mission-critical applications. Midas operates on a massive scale, processing more than 10 billion financial transactions and 10+ TBs of data daily.",[48,79672,79673],{},"Asia also represents a highly active and engaged segment of the Pulsar community. One recent example is China Mobile’s partnership with StreamNative to launch AMQP on Pulsar (AoP) earlier this year. AoP allows organizations using RabbitMQ (or other AMQP message brokers) to migrate existing applications and services to Pulsar without code modification. This new capability is one of many recent additions to the Pulsar ecosystem, expanding its ability to cover a broad range of real-time data needs. It also demonstrates how Pulsar’s active community is driving the continued development of the project.",[48,79675,79676],{},"Tencent, China Mobile, Yahoo! 
JAPAN, and Zhaopin.com are now among more than a hundred organizations using Pulsar in Asia across the telecom, internet, retail, e-commerce, finance, and IoT industries.",[48,79678,79679,79680,190],{},"Pulsar’s momentum in Asia continues with the first-ever Pulsar Summit Asia, hosted by StreamNative. Taking place on November 28th & 29th, the two-day event will feature more than 30 live sessions by tech leads, open-source developers, software engineers, and software architects from Splunk, Yahoo! JAPAN, TIBCO, China Mobile, Tencent, Dada Group, KingSoft Cloud, Tuya Smart, and PingCAP, and will include sessions on Pulsar use cases, its ecosystem, operations, and technology deep dives. You can sign up to attend the summit ",[55,79681,267],{"href":79682,"rel":79683},"https:\u002F\u002Fhopin.to\u002Fevents\u002Fpulsar-summit-asia-2020",[264],[40,79685,79687],{"id":79686},"why-alibaba-cloud","Why Alibaba Cloud",[48,79689,79690],{},"The decision to launch StreamNative Cloud on Alibaba Cloud was a strategic one. Alibaba Cloud is the largest public cloud vendor in Asia with Asia’s largest cloud network and is able to serve the rapidly growing Pulsar community in the region. As a fully managed, scalable messaging and event streaming service, StreamNative Cloud provides a turnkey solution for companies looking to build and launch event streaming applications in the cloud. The launch of StreamNative Cloud on Alibaba Cloud will streamline adoption of Pulsar for companies in this region.",[40,79692,79694],{"id":79693},"about-streamnative-cloud","About StreamNative Cloud",[48,79696,79697],{},"Built and operated by the original developers of Apache Pulsar and Apache BookKeeper, StreamNative Cloud provides a scalable, resilient, and secure messaging and event streaming platform for enterprises. Weisheng Xie, the chief data scientist at Bestpay, is impressed with StreamNative’s offering.",[916,79699,79700],{},[48,79701,79702],{},"\"With StreamNative Cloud, we are now able to launch a resilient, secure, and scalable event streaming service within minutes. It’s straightforward and extremely easy to use, which greatly boosts the efficiency of our engineering team.\"",[48,79704,79705],{},"Weisheng (Vincent) Xie, Chief Data Scientist\u002FSenior Director, China Telecom Bestpay",[48,79707,79708,79709,190],{},"To get started on StreamNative Cloud on Alibaba Cloud, sign up ",[55,79710,267],{"href":79711,"rel":79712},"https:\u002F\u002Fconsole.cloud.streamnative.cn\u002F?defaultMethod=signup",[264],{"title":18,"searchDepth":19,"depth":19,"links":79714},[79715,79716,79717,79718],{"id":79642,"depth":19,"text":79643},{"id":79660,"depth":19,"text":79661},{"id":79686,"depth":19,"text":79687},{"id":79693,"depth":19,"text":79694},"2020-11-25","Announce the launch StreamNative Cloud on Alibaba Cloud. 
Help accelerate the adoption of Pulsar across Asia by providing companies with access to operational expertise and management.","\u002Fimgs\u002Fblogs\u002F63d796762a4dc23683bef79c_63a390630343a03ee031a8d2_top-alibaba-cloud.webp",{},"\u002Fblog\u002Fstreamnative-alibaba-cloud",{"title":79621,"description":79720},"blog\u002Fstreamnative-alibaba-cloud",[302,3550,12106,821],"4HVTbcaAbymEGN6aqpbRCNHEoguATd7tb9ZTXg2O2_8",{"id":79729,"title":79730,"authors":79731,"body":79732,"category":821,"createdAt":290,"date":80462,"description":80463,"extension":8,"featured":294,"image":80464,"isDraft":294,"link":290,"meta":80465,"navigation":7,"order":296,"path":80466,"readingTime":62820,"relatedResources":290,"seo":80467,"stem":80468,"tags":80469,"__hash__":80470},"blogs\u002Fblog\u002Fapache-pulsar-2-6-2.md","Apache Pulsar 2.6.2",[53434],{"type":15,"value":79733,"toc":80416},[79734,79737,79740,79742,79746,79749,79752,79760,79764,79767,79770,79777,79781,79784,79791,79795,79798,79835,79838,79845,79849,79852,79855,79863,79870,79874,79877,79883,79886,79893,79897,79900,79906,79909,79916,79920,79923,79926,79933,79937,79940,79943,79951,79958,79962,79965,79968,79971,79977,79980,79986,79993,79997,80000,80006,80009,80016,80018,80022,80025,80032,80036,80039,80042,80049,80053,80056,80059,80062,80069,80072,80076,80079,80082,80089,80093,80096,80099,80102,80105,80116,80123,80127,80131,80134,80137,80145,80152,80156,80159,80162,80169,80173,80176,80179,80186,80190,80193,80196,80216,80223,80226,80230,80233,80236,80243,80247,80250,80253,80260,80262,80266,80269,80276,80278,80282,80285,80292,80296,80299,80301,80315,80322,80326,80329,80332,80339,80343,80347,80350,80352,80360,80367,80369,80389,80391,80411],[48,79735,79736],{},"We are excited to see that the Apache Pulsar community has successfully released the 2.6.2 version after a lot of hard work. It is a great milestone for this fast-growing project and the Pulsar community. 2.6.2 is the result of a big effort from the community, with over 154 commits and a long list of improvements and bug fixes.",[48,79738,79739],{},"Here are some highlights and major features added in Pulsar 2.6.2.",[40,79741,61065],{"id":61064},[32,79743,79745],{"id":79744},"catch-throwable-when-starting-pulsar","Catch throwable when starting Pulsar",[48,79747,79748],{},"Before 2.6.2, Pulsar caught exceptions only when BrokerStarter.start() failed. 
Some errors such as NoSuchMethodError or NoClassDefFoundError could not be caught, and Pulsar remained in an abnormal state while no error was written to the log file.",[48,79750,79751],{},"In 2.6.2, we catch Throwable instead of Exception during startup to avoid this issue.",[48,79753,79754,79755,190],{},"For more information about implementation, see ",[55,79756,79759],{"href":79757,"rel":79758},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F7221",[264],"PR-7221",[32,79761,79763],{"id":79762},"handle-subscriptionbusyexception-in-resetcursor-api","Handle SubscriptionBusyException in resetCursor API",[48,79765,79766],{},"In the PersistentSubscription.resetCursor method, SubscriptionFencedException is thrown in several places, but it is not handled in PersistentTopicBase, so error messages are not clear.",[48,79768,79769],{},"In 2.6.2, we expose SubscriptionBusyException in PersistentTopicBase for resetCursor, so the REST API returns clear error messages.",[48,79771,79754,79772,190],{},[55,79773,79776],{"href":79774,"rel":79775},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F7335",[264],"PR-7335",[32,79778,79780],{"id":79779},"update-jersey-to-231","Update Jersey to 2.31",[48,79782,79783],{},"Before 2.6.1, Pulsar used Jersey 2.27, which has known security concerns. In Pulsar 2.6.2, we update Jersey to the latest stable version (2.31) to enhance security.",[48,79785,79754,79786,190],{},[55,79787,79790],{"href":79788,"rel":79789},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F7515",[264],"PR-7515",[32,79792,79794],{"id":79793},"stop-to-dispatch-when-consumers-using-the-key_shared-subscription-stuck","Stop dispatching when consumers using the Key_Shared subscription are stuck",[48,79796,79797],{},"Consumers using the Key_Shared subscription would occasionally receive messages out of order. 
The following are steps to reproduce the situation:",[1666,79799,79800,79803,79806,79809,79812,79815,79823,79826,79829,79832],{},[324,79801,79802],{},"Connect Consumer1 to Key_Shared subscription sub and stop to receive",[324,79804,79805],{},"receiverQueueSize: 500",[324,79807,79808],{},"Connect Producer and publish 500 messages with key (i % 10)",[324,79810,79811],{},"Connect Consumer2 to same subscription and start to receive",[324,79813,79814],{},"receiverQueueSize: 1",[324,79816,79817,79818,79822],{},"since ",[55,79819,79820],{"href":79820,"rel":79821},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F7106",[264]," , Consumer2 can't receive (expected)",[324,79824,79825],{},"Producer publish more 500 messages with same key generation algorithm",[324,79827,79828],{},"After that, Consumer1 start to receive",[324,79830,79831],{},"Check Consumer2 message ordering",[324,79833,79834],{},"sometimes message ordering was broken in same key",[48,79836,79837],{},"In 2.6.2, when consumers use the Key_Shared subscription, Pulsar stops dispatching messages to consumers that are stuck on delivery to guarantee message order.",[48,79839,79754,79840,190],{},[55,79841,79844],{"href":79842,"rel":79843},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F7553",[264],"PR-7553",[32,79846,79848],{"id":79847},"reestablish-namespace-bundle-ownership-from-false-negative-releasing-and-false-positive-acquiring","Reestablish namespace bundle ownership from false negative releasing and false positive acquiring",[48,79850,79851],{},"In acquiring\u002Freleasing namespace bundle ownership, ZooKeeper might be disconnected before or after these operations are persisted in the ZooKeeper cluster. It leads to inconsistency between the local ownership cache and ZooKeeper cluster.",[48,79853,79854],{},"In 2.6.2, we fix the issue with the following:",[321,79856,79857,79860],{},[324,79858,79859],{},"In ownership releasing, do not retain ownership in failure.",[324,79861,79862],{},"In ownership checking, querying and acquiring, reestablish the lost ownership in false negative releasing and false positive acquiring.",[48,79864,79754,79865,190],{},[55,79866,79869],{"href":79867,"rel":79868},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F7773",[264],"PR-7773",[32,79871,79873],{"id":79872},"enable-users-to-configure-the-executor-pool-size","Enable users to configure the executor pool size",[48,79875,79876],{},"Before 2.6.2, the executor pool size in Pulsar was set to 20 when starting Pulsar services. Users could not configure the executor pool size.",[8325,79878,79881],{"className":79879,"code":79880,"language":8330},[8328],"\nprivate final ScheduledExecutorService executor = Executors.newScheduledThreadPool(20,\n           new DefaultThreadFactory(\"pulsar\"));\n\n",[4926,79882,79880],{"__ignoreMap":18},[48,79884,79885],{},"In 2.6.2, users can configure the executor pool size in the broker.conf file based on their needs.",[48,79887,79754,79888,190],{},[55,79889,79892],{"href":79890,"rel":79891},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F7782",[264],"PR-7782",[32,79894,79896],{"id":79895},"add-replicated-check-for-checkinactivesubscriptions","Add replicated check for checkInactiveSubscriptions",[48,79898,79899],{},"After the replicated subscription is deleted by checkInactiveSubscriptions, replicated subscriptions are created with receiveSubscriptionUpdated. 
In this case, the position becomes the latest position.",[8325,79901,79904],{"className":79902,"code":79903,"language":8330},[8328],"\ntopic.createSubscription(update.getSubscriptionName(),\n        InitialPosition.Latest, true \u002F* replicateSubscriptionState *\u002F);\n\n",[4926,79905,79903],{"__ignoreMap":18},[48,79907,79908],{},"In 2.6.2, the replicated subscription is excluded from automatic deletion by a fix in PersistentTopic.",[48,79910,79754,79911,190],{},[55,79912,79915],{"href":79913,"rel":79914},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F8066",[264],"PR-8066",[32,79917,79919],{"id":79918},"upgrade-jetty-util-version-to-9431","Upgrade jetty-util version to 9.4.31",[48,79921,79922],{},"Pulsar client depends on jetty-util. Jetty-util versions earlier than 9.4.30 contain known vulnerabilities.",[48,79924,79925],{},"In 2.6.2, we upgrade the jetty-util version to 9.4.31 to enhance security.",[48,79927,79754,79928,190],{},[55,79929,79932],{"href":79930,"rel":79931},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F8035",[264],"PR-8035",[32,79934,79936],{"id":79935},"add-command-to-delete-a-clusters-metadata-from-zookeeper","Add command to delete a cluster's metadata from ZooKeeper",[48,79938,79939],{},"When we share the same ZooKeeper and BookKeeper cluster among multiple broker clusters, if a cluster was removed, its metadata in ZooKeeper also needed to be removed.",[48,79941,79942],{},"In 2.6.2, we fix the issue in the following ways:",[321,79944,79945,79948],{},[324,79946,79947],{},"Add a PulsarClusterMetadataTeardown class to delete the related nodes from ZooKeeper;",[324,79949,79950],{},"Wrap the class in the bin\u002Fpulsar script.",[48,79952,79754,79953,190],{},[55,79954,79957],{"href":79955,"rel":79956},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F8169",[264],"PR-8169",[32,79959,79961],{"id":79960},"replace-eventloop-with-threadpoolexecutor-to-improve-performance-instead-of-eventloop","Replace EventLoop with ThreadPoolExecutor to improve performance",[48,79963,79964],{},"In 2.6.2, we replace EventLoop with a native JDK thread pool (ThreadPoolExecutor) to improve performance.",[48,79966,79967],{},"The following is the test result with pulsar-perf.",[48,79969,79970],{},"Before 2.6.1:",[8325,79972,79975],{"className":79973,"code":79974,"language":8330},[8328],"\nAggregated throughput stats --- 11715556 records received --- 68813.420 msg\u002Fs --- 537.605 Mbit\u002Fs\n\n",[4926,79976,79974],{"__ignoreMap":18},[48,79978,79979],{},"In 2.6.2:",[8325,79981,79984],{"className":79982,"code":79983,"language":8330},[8328],"\nAggregated throughput stats --- 18392800 records received --- 133314.602 msg\u002Fs --- 1041.520 Mbit\u002Fs\n\n",[4926,79985,79983],{"__ignoreMap":18},[48,79987,79754,79988,190],{},[55,79989,79992],{"href":79990,"rel":79991},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F8208",[264],"PR-8208",[32,79994,79996],{"id":79995},"fix-deadlock-that-occurred-during-topic-ownership-check","Fix deadlock that occurred during topic ownership check",[48,79998,79999],{},"Some broker servers had deadlocks while splitting namespace bundles. 
When checking the thread dump of the broker, some threads were blocked in NamespaceService#getBundle().",[8325,80001,80004],{"className":80002,"code":80003,"language":8330},[8328],"\n\"pulsar-ordered-OrderedExecutor-7-0\" #34 prio=5 os_prio=0 tid=0x00007eeeab05a800 nid=0x81a5 waiting on condition [0x00007eeeafbd2000]\n  java.lang.Thread.State: WAITING (parking)\n       at sun.misc.Unsafe.park(Native Method)\n       - parking to wait for   (a java.util.concurrent.CompletableFuture$Signaller)\n       at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)\n       at org.apache.pulsar.common.naming.NamespaceBundleFactory.getBundles(NamespaceBundleFactory.java:155)\n...\n \n",[4926,80005,80003],{"__ignoreMap":18},[48,80007,80008],{},"The reason for the issue is that the getBundle() method leads to deadlock in NamespaceService#isTopicOwned(). To fix the issue, we remove the getBundle() method. When isTopicOwned() returns false, the bundle metadata is cached and can be got asynchronously. When the client reconnects the next time, Pulsar returns the correct bundle metadata from the cache.",[48,80010,79754,80011,190],{},[55,80012,80015],{"href":80013,"rel":80014},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F8406",[264],"PR-8406",[40,80017,68241],{"id":68240},[32,80019,80021],{"id":80020},"enable-users-to-configure-advertisedaddress-in-proxy","Enable users to configure advertisedAddress in proxy",[48,80023,80024],{},"Before 2.6.2, users could not configure advertisedAddress on the proxy side. In 2.6.2, users can configure advertisedAddress in proxy just as they do in Pulsar broker.",[48,80026,79754,80027,190],{},[55,80028,80031],{"href":80029,"rel":80030},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F7542",[264],"PR-7542",[32,80033,80035],{"id":80034},"add-proxy-plugin-interface-to-support-user-defined-additional-servlet","Add proxy plugin interface to support user defined additional servlet",[48,80037,80038],{},"To enable users to access the broker flexibly, Pulsar provides plugins similar to broker protocol and broker interceptor. However, users could not access the proxy before 2.6.2.",[48,80040,80041],{},"To enable users to customize data requests in proxy, we add the protocol plugin for proxy in 2.6.2.",[48,80043,79754,80044,190],{},[55,80045,80048],{"href":80046,"rel":80047},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F8067",[264],"PR-8067",[32,80050,80052],{"id":80051},"fix-the-null-exception-when-starting-the-proxy-service","Fix the null exception when starting the proxy service",[48,80054,80055],{},"When enabling the broker TLS and broker client authentication with OAuth2 plugin, the proxy service exits with an unexpected null exception.",[48,80057,80058],{},"The reason is that when initializing the flow, authentication is called, so the token client is not initialized before using.",[48,80060,80061],{},"In 2.6.2, we fix the null exception when starting the proxy service.",[48,80063,79754,80064,190],{},[55,80065,80068],{"href":80066,"rel":80067},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F8019",[264],"PR-8019",[40,80070,80071],{"id":72442},"Java Client",[32,80073,80075],{"id":80074},"support-input-stream-for-truststore-cert","Support input-stream for trustStore cert",[48,80077,80078],{},"In 2.6.1, Pulsar supports dynamic cert loading by using input stream for TLS cert and key file. The feature is mainly used by container. 
However, container also requires dynamic loading for truststore certs and users cannot store trust-store cert into file-system.",[48,80080,80081],{},"In 2.6.2, Pulsar supports loading truststore cert dynamically using input stream.",[48,80083,79754,80084,190],{},[55,80085,80088],{"href":80086,"rel":80087},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F7442",[264],"PR-7442",[32,80090,80092],{"id":80091},"avoid-subscribing-the-same-topic","Avoid subscribing the same topic",[48,80094,80095],{},"The current key of MultiTopicsConsumerImpl.topics is the topic name passed by the user. The topicNameValid method checks if the name is valid and topics doesn't contain the key.",[48,80097,80098],{},"However, if a multi-topic consumer subscribes a partition of a subscribed partitioned topic, subscribeAsync succeeds and a new ConsumerImpl of the same partition is created, which is redundant.",[48,80100,80101],{},"Also, if a multi-topic consumer subscribes public\u002Fdefault\u002Ftopic or persistent:\u002F\u002Fpublic\u002Fdefault\u002Ftopic, while the initial subscribed topic is topic, the redundant consumers would be created.",[48,80103,80104],{},"In 2.6.2, we fix the issue in the following ways to avoid subscribing the same topic again:",[321,80106,80107,80110,80113],{},[324,80108,80109],{},"Use the full topic name as key for MultiTopicsConsumerImpl.topics.",[324,80111,80112],{},"Check that both the full topic name and the full partitioned topic name do not exist in MultiTopicsConsumerImpl.topics when subscribeAsync is called.",[324,80114,80115],{},"Throw a different exception to a different topic is invalid and the topic is already subscribed",[48,80117,79754,80118,190],{},[55,80119,80122],{"href":80120,"rel":80121},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F7823",[264],"PR-7823",[40,80124,80126],{"id":80125},"cpp-client","CPP Client",[32,80128,80130],{"id":80129},"wait-for-all-seek-operations-complete","Wait for all seek operations complete",[48,80132,80133],{},"When a partitioned consumer calls seek, it waits for only one partition's seek operation completion because each internal consumer calls callback(result) to complete the same promise.",[48,80135,80136],{},"In 2.6.2, we use the following methods to avoid this problem:",[321,80138,80139,80142],{},[324,80140,80141],{},"Add a MultiResultCallback implementation, the callback completes only when all N events complete successfully or one of N events fails.",[324,80143,80144],{},"Use MultiResultCallback to wrap callback from PartitionedConsumerImpl::seekAsync.",[48,80146,79754,80147,190],{},[55,80148,80151],{"href":80149,"rel":80150},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F7216",[264],"PR-7216",[32,80153,80155],{"id":80154},"make-clear-thread-safe","Make clear() thread-safe",[48,80157,80158],{},"Before 2.6.2, the clear() methods of BatchAcknowledgementTracker and UnAckedMessageTrackerEnabled are not thread-safe.",[48,80160,80161],{},"In 2.6.2, we acquire a mutex in these clear() methods to make it thread-safe.",[48,80163,79754,80164,190],{},[55,80165,80168],{"href":80166,"rel":80167},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F7862",[264],"PR-7862",[32,80170,80172],{"id":80171},"add-snappy-library-to-docker-images-for-building-c-packages","Add Snappy library to Docker images for building C++ packages",[48,80174,80175],{},"The program crashes when Snappy compression is enabled on the C++ client packaged as RPM\u002FDEB. 
This is because Snappy library is not included in the Docker image for building the RPM\u002FDEB package.",[48,80177,80178],{},"In 2.6.2, we add the Snappy library to the docker images to avoid the issue.",[48,80180,79754,80181,190],{},[55,80182,80185],{"href":80183,"rel":80184},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F8086",[264],"PR-8086",[32,80187,80189],{"id":80188},"support-key-based-batching","Support key based batching",[48,80191,80192],{},"Support key based batching for the C++ client. In addition, currently, the implementation of BatchMessageContainer is coupling to ProducerImpl tightly. The batch message container registers a timer to the producer's executor and the timeout callback is also the producer's method. Even its add method could call sendMessage to send a batch to the producer's pending queue. These should be the producer's work.",[48,80194,80195],{},"In 2.6.2, we implement the feature in the following ways:",[321,80197,80198,80201,80204,80207,80210,80213],{},[324,80199,80200],{},"Add a MessageAndCallbackBatch to store a MessageImpl of serialized single messages and a callback list.",[324,80202,80203],{},"Add a BatchMessageContainerBase to provide interface methods and methods like update\u002Fclear message number\u002Fbytes, create OpSendMsg.",[324,80205,80206],{},"Let ProducerImpl manage the batch timer and determine whether to create OpSendMsg from BatchMessageContainerBase and send it.",[324,80208,80209],{},"Make BatchMessageContainer inherit BatchMessageContainerBase, it only manages a MessageAndCallbackBatch.",[324,80211,80212],{},"Add a BatchMessageKeyBasedContainer that inherits BatchMessageContainerBase, it manages a map of message key and MessageAndCallbackBatch.",[324,80214,80215],{},"Add a producer config to change batching type.",[48,80217,79754,80218,190],{},[55,80219,80222],{"href":80220,"rel":80221},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F7996",[264],"PR-7996",[40,80224,9636],{"id":80225},"functions",[32,80227,80229],{"id":80228},"enable-kubernetes-runtime-to-customize-function-instance-class-path","Enable Kubernetes runtime to customize function instance class path",[48,80231,80232],{},"Before 2.6.2, the function worker's classpath is used to configure the function instance (runner)'s classpath. When the broker (function worker) uses an image that is different from the function instance (runner) for Kubernetes runtime, the classpath is wrong and the function instance could not load the instance classes.",[48,80234,80235],{},"In 2.6.2, we add a function instance classpath entry to the Kubernetes runtime config, and construct the function launch command accordingly.",[48,80237,79754,80238,190],{},[55,80239,80242],{"href":80240,"rel":80241},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F7844",[264],"PR-7844",[32,80244,80246],{"id":80245},"set-dryrun-of-kubernetes-runtime-to-null","Set dryrun of Kubernetes Runtime to null",[48,80248,80249],{},"Before 2.6.2, we upgraded the client-java of Kubernetes to 0.9.2 to enhance security. However, during the creation of statefulsets, secrets, and services, the value of dryrun was set to true, which was not accepted by Kubernetes. 
Only All is allowed in Kubernetes.",[48,80251,80252],{},"In 2.6.2, we set the dryrun of Kubernetes Runtime to null.",[48,80254,79754,80255,190],{},[55,80256,80259],{"href":80257,"rel":80258},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F8064",[264],"PR-8064",[40,80261,74373],{"id":74372},[32,80263,80265],{"id":80264},"upgrade-presto-version-to-332","Upgrade Presto version to 332",[48,80267,80268],{},"Upgrade Presto version to 332. Resolve different packages between prestosql and prestodb. Although the latest version is 334, versions higher than 333 require Java 11.",[48,80270,79754,80271,190],{},[55,80272,80275],{"href":80273,"rel":80274},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F7194",[264],"PR-7194",[40,80277,38169],{"id":38169},[32,80279,80281],{"id":80280},"add-cli-command-to-get-the-last-message-id","Add CLI command to get the last message ID",[48,80283,80284],{},"Add last-message-id command in CLI, so users can get the last message ID with this command.",[48,80286,79754,80287,190],{},[55,80288,80291],{"href":80289,"rel":80290},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F8082",[264],"PR-8082",[32,80293,80295],{"id":80294},"support-deleting-schema-ledgers-when-deleting-topics","Support deleting schema ledgers when deleting topics",[48,80297,80298],{},"Users could not delete schema of topics with the PersistentTopics#deleteTopic and PersistentTopics#deletePartitionedTopic in REST APIs. After topics were deleted, the schema ledgers still existed with adding an empty schema ledger.",[48,80300,80195],{},[321,80302,80303,80306,80309,80312],{},[324,80304,80305],{},"Add a deleteSchema query param to REST APIs of deleting topics\u002Fpartitioned topics;",[324,80307,80308],{},"Add a map to record the created ledgers in BookkeeperSchemaStorage;",[324,80310,80311],{},"Expose deleteSchema param in pulsar-admin APIs;",[324,80313,80314],{},"Delete schema ledgers when deleting the cluster with -a option.",[48,80316,79754,80317,190],{},[55,80318,80321],{"href":80319,"rel":80320},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F8167",[264],"PR-8167",[32,80323,80325],{"id":80324},"support-deleting-all-data-associated-with-a-cluster","Support deleting all data associated with a cluster",[48,80327,80328],{},"When multiple broker clusters shared the same bookie cluster, if users wanted to remove a broker cluster, the associated ledgers in bookies were not deleted as expected.",[48,80330,80331],{},"In 2.6.2, we add a cluster delete command to enable users to delete all the data associated with the cluster.",[48,80333,79754,80334,190],{},[55,80335,80338],{"href":80336,"rel":80337},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F8133",[264],"PR-8133",[40,80340,80342],{"id":80341},"pulsar-perf","Pulsar Perf",[32,80344,80346],{"id":80345},"enable-users-to-configure-iothread-number-in-pulsar-perf","Enable users to configure ioThread number in pulsar-perf",[48,80348,80349],{},"In pulsar-perf, the default Pulsar client ioThread number is Runtime.getRuntime().availableProcessors() and users could not configure it in the command line. 
When running a pulsar-perf producer, it may cause messages to enqueue competition and lead to high latency.",[48,80351,80195],{},[1666,80353,80354,80357],{},[324,80355,80356],{},"Enable users to configure the ioThread number in the command line;",[324,80358,80359],{},"Change the default ioThead number from Runtime.getRuntime().availableProcessors() to 1",[48,80361,79754,80362,190],{},[55,80363,80366],{"href":80364,"rel":80365},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F8090",[264],"PR-8090",[40,80368,78580],{"id":78579},[321,80370,80371,80377],{},[324,80372,80373,80374,190],{},"To download Apache Pulsar 2.6.2, click ",[55,80375,36195],{"href":53730,"rel":80376},[264],[324,80378,80379,80380,4003,80384,190],{},"For more information about Apache Pulsar 2.6.2, see [2.6.2 release notes](",[55,80381,80382],{"href":80382,"rel":80383},"https:\u002F\u002Fpulsar.apache.org\u002Frelease-notes\u002F#2.6.2",[264],[55,80385,80388],{"href":80386,"rel":80387},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpulls?q=is%3Apr+label%3Arelease%2F2.6.2+is%3Aclosed",[264],"2.6.2 PR list",[48,80390,78604],{},[321,80392,80393,80397,80401,80406],{},[324,80394,80395],{},[55,80396,78612],{"href":78611},[324,80398,80399],{},[55,80400,78618],{"href":78617},[324,80402,78621,80403],{},[55,80404,36242],{"href":36242,"rel":80405},[264],[324,80407,78627,80408],{},[55,80409,57760],{"href":57760,"rel":80410},[264],[48,80412,78633,80413,190],{},[55,80414,75345],{"href":36230,"rel":80415},[264],{"title":18,"searchDepth":19,"depth":19,"links":80417},[80418,80431,80436,80440,80446,80450,80453,80458,80461],{"id":61064,"depth":19,"text":61065,"children":80419},[80420,80421,80422,80423,80424,80425,80426,80427,80428,80429,80430],{"id":79744,"depth":279,"text":79745},{"id":79762,"depth":279,"text":79763},{"id":79779,"depth":279,"text":79780},{"id":79793,"depth":279,"text":79794},{"id":79847,"depth":279,"text":79848},{"id":79872,"depth":279,"text":79873},{"id":79895,"depth":279,"text":79896},{"id":79918,"depth":279,"text":79919},{"id":79935,"depth":279,"text":79936},{"id":79960,"depth":279,"text":79961},{"id":79995,"depth":279,"text":79996},{"id":68240,"depth":19,"text":68241,"children":80432},[80433,80434,80435],{"id":80020,"depth":279,"text":80021},{"id":80034,"depth":279,"text":80035},{"id":80051,"depth":279,"text":80052},{"id":72442,"depth":19,"text":80071,"children":80437},[80438,80439],{"id":80074,"depth":279,"text":80075},{"id":80091,"depth":279,"text":80092},{"id":80125,"depth":19,"text":80126,"children":80441},[80442,80443,80444,80445],{"id":80129,"depth":279,"text":80130},{"id":80154,"depth":279,"text":80155},{"id":80171,"depth":279,"text":80172},{"id":80188,"depth":279,"text":80189},{"id":80225,"depth":19,"text":9636,"children":80447},[80448,80449],{"id":80228,"depth":279,"text":80229},{"id":80245,"depth":279,"text":80246},{"id":74372,"depth":19,"text":74373,"children":80451},[80452],{"id":80264,"depth":279,"text":80265},{"id":38169,"depth":19,"text":38169,"children":80454},[80455,80456,80457],{"id":80280,"depth":279,"text":80281},{"id":80294,"depth":279,"text":80295},{"id":80324,"depth":279,"text":80325},{"id":80341,"depth":19,"text":80342,"children":80459},[80460],{"id":80345,"depth":279,"text":80346},{"id":78579,"depth":19,"text":78580},"2020-11-20","Learn the most interesting and major features added to Pulsar 
2.6.2.","\u002Fimgs\u002Fblogs\u002F63d7969208160538fd6d056d_63a38f1df6cde868265ad11f_262-top.webp",{},"\u002Fblog\u002Fapache-pulsar-2-6-2",{"title":79730,"description":80463},"blog\u002Fapache-pulsar-2-6-2",[302,821],"lDvgpgeAilldd6cEDFa6uO9DqElvdVPno3Pjg7vFRQ8",{"id":80472,"title":80473,"authors":80474,"body":80475,"category":821,"createdAt":290,"date":81423,"description":81424,"extension":8,"featured":294,"image":290,"isDraft":294,"link":290,"meta":81425,"navigation":7,"order":296,"path":81426,"readingTime":81427,"relatedResources":290,"seo":81428,"stem":81429,"tags":81430,"__hash__":81431},"blogs\u002Fblog\u002Fbenchmarking-pulsar-and-kafka-report-2020.md","Benchmarking Pulsar and Kafka - The Full Benchmark Report - 2020",[808,806],{"type":15,"value":80476,"toc":81396},[80477,80483,80489,80492,80494,80497,80500,80503,80506,80580,80583,80586,80589,80615,80618,80621,80624,80632,80638,80641,80644,80650,80653,80656,80667,80670,80676,80679,80682,80688,80691,80694,80697,80705,80708,80714,80717,80722,80725,80728,80730,80741,80744,80750,80753,80759,80765,80768,80770,80773,80792,80794,80797,80800,80803,80809,80811,80817,80820,80826,80830,80833,80839,80845,80849,80852,80858,80864,80867,80873,80877,80880,80885,80890,80893,80896,80899,80907,80913,80917,80920,80926,80929,80932,80938,80944,80947,80950,80956,80958,80964,80967,80970,80976,80978,80984,80987,80990,80993,81004,81010,81014,81017,81023,81025,81031,81037,81043,81046,81052,81058,81061,81075,81079,81082,81088,81091,81097,81103,81105,81116,81120,81123,81129,81131,81137,81140,81146,81152,81155,81169,81173,81176,81182,81188,81191,81197,81203,81206,81219,81222,81224,81227,81245,81248,81251,81253,81257,81260,81263,81274,81280,81284,81287,81290,81301,81306,81309,81311,81314,81329,81332,81334,81338,81341,81344,81350,81353,81355,81358,81364,81368,81371,81374,81379,81382,81385],[48,80478,80479],{},[384,80480],{"alt":80481,"src":80482},"illustration of pulsar vs kafka","\u002Fimgs\u002Fblogs\u002F63a383dd44faf8eda315ccc9_benchmark-pulsar-kafka-top.jpeg",[40,80484,80486,80487,190],{"id":80485},"get-the-updated-apache-pulsar-vs-apache-kafka-2022-benchmark-here","Get the updated Apache Pulsar vs. Apache Kafka 2022 Benchmark ",[55,80488,267],{"href":21458},[48,80490,80491],{},"Read below for details on the 2020 Benchmark.",[40,80493,46],{"id":42},[48,80495,80496],{},"Having identified multiple issues in Confluent’s approach to evaluating various performance factors, we decided to repeat their benchmark on Pulsar and Kafka with some adjustments. We wanted to improve the accuracy of the test to facilitate more meaningful comparisons between the two systems. We also wanted to get a more comprehensive view, so we broadened the scope of our test to include additional performance measures and simulated real-world scenarios.",[48,80498,80499],{},"Our benchmark repeated Confluent’s original tests with the appropriate corrections and included all the durability levels supported by Pulsar and Kafka. As a result, we were able to compare throughput and latency at equivalent levels of durability. In addition, we benchmarked new performance factors and conditions, such as varying numbers of partitions, subscriptions, and clients. 
We also emulated real-world use cases by testing mixed workloads containing writes, tailing-reads, and catch-up reads.",[48,80501,80502],{},"In this report, we describe the tests we performed in detail and share our results and conclusions.",[40,80504,80505],{"id":79112},"Content",[321,80507,80508,80514,80520,80526,80532,80538,80544,80550,80556,80562,80568,80574],{},[324,80509,80510],{},[55,80511,80513],{"href":80512},"\u002Fblog\u002Ftech\u002F2020-11-09-benchmark-pulsar-kafka-performance-report\u002F#maximum-throughput-test","Maximum Throughput Test",[324,80515,80516],{},[55,80517,80519],{"href":80518},"\u002Fblog\u002Ftech\u002F2020-11-09-benchmark-pulsar-kafka-performance-report\u002F#1-100-partitions-1-subscription-2-producers-and-2-consumers","#1 100 partitions, 1 subscription, 2 producers and 2 consumers",[324,80521,80522],{},[55,80523,80525],{"href":80524},"\u002Fblog\u002Ftech\u002F2020-11-09-benchmark-pulsar-kafka-performance-report\u002F#2-2000-partitions-1-subscription-2-producers-and-2-consumers","#2 2000 partitions, 1 subscription, 2 producers and 2 consumers",[324,80527,80528],{},[55,80529,80531],{"href":80530},"\u002Fblog\u002Ftech\u002F2020-11-09-benchmark-pulsar-kafka-performance-report\u002F#3-1-partition-1-subscription-2-producers-and-2-consumers","#3 1 partition, 1 subscription, 2 producers and 2 consumers",[324,80533,80534],{},[55,80535,80537],{"href":80536},"\u002Fblog\u002Ftech\u002F2020-11-09-benchmark-pulsar-kafka-performance-report\u002F#4-1-partition-1-subscription-1-producer-and-1-consumer","#4 1 partition, 1 subscription, 1 producer and 1 consumer",[324,80539,80540],{},[55,80541,80543],{"href":80542},"\u002Fblog\u002Ftech\u002F2020-11-09-benchmark-pulsar-kafka-performance-report\u002F#publish-and-end-to-end-latency-test","Publish and End-to-End Latency Test",[324,80545,80546],{},[55,80547,80549],{"href":80548},"\u002Fblog\u002Ftech\u002F2020-11-09-benchmark-pulsar-kafka-performance-report\u002F#1-100-partitions-1-subscription","#1 100 partitions, 1 subscription",[324,80551,80552],{},[55,80553,80555],{"href":80554},"\u002Fblog\u002Ftech\u002F2020-11-09-benchmark-pulsar-kafka-performance-report\u002F#2-100-partitions-10-subscriptions","#2 100 partitions, 10 subscriptions",[324,80557,80558],{},[55,80559,80561],{"href":80560},"\u002Fblog\u002Ftech\u002F2020-11-09-benchmark-pulsar-kafka-performance-report\u002F#3-100-5000-8000-10000-partitions","#3 100, 5000, 8000, 10000 partitions",[324,80563,80564],{},[55,80565,80567],{"href":80566},"\u002Fblog\u002Ftech\u002F2020-11-09-benchmark-pulsar-kafka-performance-report\u002F#catch-up-read-test","Catch-up Read Test",[324,80569,80570],{},[55,80571,80573],{"href":80572},"\u002Fblog\u002Ftech\u002F2020-11-09-benchmark-pulsar-kafka-performance-report\u002F#mixed-workload-test","Mixed Workload Test",[324,80575,80576],{},[55,80577,80579],{"href":80578},"\u002Fblog\u002Ftech\u002F2020-11-09-benchmark-pulsar-kafka-performance-report\u002F#conclusions","Conclusions",[40,80581,80513],{"id":80582},"maximum-throughput-test",[48,80584,80585],{},"The following is the test setup.",[48,80587,80588],{},"We designed this test to determine the maximum throughput Pulsar and Kafka can achieve when processing workloads that consist of publish and tailing-reads. We varied the number of partitions to see how each change impacted throughput. 
Our test strategy included the following principles and expected guarantees:",[321,80590,80591,80594,80597,80600,80603,80606,80609,80612],{},[324,80592,80593],{},"Each message was replicated three times to ensure fault tolerance.",[324,80595,80596],{},"We varied the number of acknowledgements to determine the maximum throughput of each system under various replication durability guarantees.",[324,80598,80599],{},"We enabled batching for Kafka and Pulsar, batching up to 1 MB of data for a maximum of 10 ms.",[324,80601,80602],{},"We tested varying numbers of partitions—specifically, 1, 100, and 2000—to measure the maximum throughput for each condition.",[324,80604,80605],{},"When benchmarking the maximum throughput for 100 and 2000 partitions, we ran 2 producers and 2 consumers.",[324,80607,80608],{},"When benchmarking the maximum throughput for a single partition, we varied the number of producers and consumers to measure changes in throughput under different conditions.",[324,80610,80611],{},"We used a message size of 1 KB.",[324,80613,80614],{},"For each scenario, we tested the maximum throughput under various durability levels.",[48,80616,80617],{},"The following is the result for each test.",[32,80619,80519],{"id":80620},"_1-100-partitions-1-subscription-2-producers-and-2-consumers",[48,80622,80623],{},"Our first test benchmarked maximum throughput on Pulsar and Kafka with 100 partitions under two different durability guarantees. We used one subscription, two producers, and two consumers for each system. Our test results are described below.",[321,80625,80626,80629],{},[324,80627,80628],{},"When configured to provide Level-1 durability guarantees (sync replication durability, sync local durability), Pulsar achieved a maximum throughput of ~300 MB\u002Fs, which reached the physical limit of the journal disk’s bandwidth. Kafka was able to achieve ~420 MB\u002Fs with 100 partitions. It should be noted that when providing level-1 durability, Pulsar was configured to use one disk as journal disk for writes and the other disk as ledger disk for reads, comparing to Kafka use both disks for writes and reads. While Pulsar's setup is able to provide better I\u002FO isolation, its throughput was also limited by the maximum bandwidth of a single disk (~300 MB\u002Fs). Alternative disk configurations can be beneficial to Pulsar and allow for more cost effective operation, which will be discussed in a later blog post.",[324,80630,80631],{},"When configured to provide Level-2 durability guarantees (sync replication durability, async local durability), Pulsar and Kafka each achieved a maximum throughput of ~600 MB\u002Fs. 
Both systems reached the physical limit of disk bandwidth.",[48,80633,80634],{},[384,80635],{"alt":80636,"src":80637},"graph pulsar and Kafka with 100 partitions","\u002Fimgs\u002Fblogs\u002F63a383dcf609c7a4a2ac8a9d_image19.png",[48,80639,80640],{},"Figure 1: Maximum throughput with 100 partitions on Pulsar and Kafka (sync local durability)",[48,80642,80643],{},"Figure 2 shows the maximum throughput on Pulsar and Kafka with 100 partitions under async local durability.",[48,80645,80646],{},[384,80647],{"alt":80648,"src":80649},"graph of Maximum throughput with 100 partitions on Pulsar and Kafka","\u002Fimgs\u002Fblogs\u002F63a383dc44faf875a515ccc8_image21.png",[32,80651,80525],{"id":80652},"_2-2000-partitions-1-subscription-2-producers-and-2-consumers",[48,80654,80655],{},"Our second test benchmarked maximum throughput using the same durability guarantees (acks = 2) on Pulsar and Kafka. However, we increased the number of partitions from 100 to 2000. We used one subscription, two producers, and two consumers. Our test results are described below.",[321,80657,80658,80661,80664],{},[324,80659,80660],{},"Pulsar’s maximum throughput remained at ~300 MB\u002Fs under a Level-1 durability guarantee and increased to ~600 MB\u002Fs under Level-2 durability.",[324,80662,80663],{},"Kafka’s maximum throughput decreased from 600 MB\u002Fs (at 100 partitions) to ~300 MB\u002Fs when flushing data for each message individually (kafka-ack-all-sync).",[324,80665,80666],{},"Kafka’s maximum throughput decreased from ~500 MB\u002Fs (at 100 partitions) to ~300 MB\u002Fs when using the system’s default durability settings (kafka-ack-all-nosync).",[48,80668,80669],{},"To understand why Kafka’s throughput dropped, we plotted the average publish latency for each system under each durability guarantee tested. As you can see in Figure 3, when the number of partitions increased to 2000, Kafka’s average publish latency increased to 200 ms and its 99th percentile publish latency increased to 1200 ms.",[48,80671,80672],{},[384,80673],{"alt":80674,"src":80675},"graph of Maximum throughput with 2000 partitions on Pulsar and Kafka","\u002Fimgs\u002Fblogs\u002F63a383dc1157b728cb2b51f2_image20.png",[48,80677,80678],{},"Increased publish latency can significantly impact throughput. Latency did not affect throughput on Pulsar because Pulsar clients leverage Netty’s powerful asynchronous networking framework. However, latency did impact throughput on Kafka because Kafka clients use synchronous implementation. We were able to improve throughput on Kafka by doubling the number of producers. When we increased the number of producers to four, Kafka achieved a throughput of ~600 MB\u002Fs.",[48,80680,80681],{},"Figure 4 shows the publish latency for Pulsar and Kafka with 2000 partitions.",[48,80683,80684],{},[384,80685],{"alt":80686,"src":80687},"graph of publish latency with 2000 partitions on Pulsar and Kafka","\u002Fimgs\u002Fblogs\u002F63a383dc3eafb5bd2f4ff50a_image23.png",[32,80689,80531],{"id":80690},"_3-1-partition-1-subscription-2-producers-and-2-consumers",[48,80692,80693],{},"Adding more brokers and partitions helps increase throughput on both Pulsar and Kafka. To gain a better understanding of each system’s efficiency, we benchmarked maximum throughput using only one partition. 
For this test, we used one subscription, two producers, and two consumers.",[48,80695,80696],{},"We observed the following:",[321,80698,80699,80702],{},[324,80700,80701],{},"Pulsar achieved a maximum throughput of ~300 MB\u002Fs at all levels of durability.",[324,80703,80704],{},"Kafka achieved a maximum throughput of ~300 MB\u002Fs under async replication durability, but only ~160 MB\u002Fs under sync replication durability.",[48,80706,80707],{},"Figure 5 shows the maximum throughput on Pulsar and Kafka with one partition under sync local durability.",[48,80709,80710],{},[384,80711],{"alt":80712,"src":80713},"graph of Maximum throughput with 1 partition on Pulsar and Kafka","\u002Fimgs\u002Fblogs\u002F63a383dd65aad4bdd64fca1b_image22.png",[48,80715,80716],{},"Figure 6 shows the maximum throughput on Pulsar and Kafka with one partition under async local durability.",[48,80718,80719],{},[384,80720],{"alt":80712,"src":80721},"\u002Fimgs\u002Fblogs\u002F63a383ddc62d6ea186f1d2a7_image25.png",[32,80723,80537],{"id":80724},"_4-1-partition-1-subscription-1-producer-and-1-consumer",[48,80726,80727],{},"We benchmarked maximum throughput on Pulsar and Kafka using only one partition and one subscription, as in the previous test. However, for this test, we used only one producer and one consumer (instead of two of each).",[48,80729,80696],{},[321,80731,80732,80735,80738],{},[324,80733,80734],{},"Pulsar sustained a maximum throughput of ~300 MB\u002Fs at all durability levels.",[324,80736,80737],{},"Kafka’s maximum throughput decreased from ~300 MB\u002Fs (in Test #3) to ~230 MB\u002Fs under async replication durability.",[324,80739,80740],{},"Kafka’s throughput was dropped from ~160 MB\u002Fs (in Test #3) to ~100 MB\u002Fs under sync replication durability.",[48,80742,80743],{},"Figure 7 shows the maximum throughput Pulsar and Kafka achieved with one partition, one producer, and one consumer under sync local durability.",[48,80745,80746],{},[384,80747],{"alt":80748,"src":80749},"Graph of maximum throughput with 1 partition, 1 producer, 1 consumer on Pulsar and Kafka","\u002Fimgs\u002Fblogs\u002F63a383dd1157b77ef12b51f3_image24.png",[48,80751,80752],{},"To understand why Kafka’s throughput dropped, we plotted the average publish latency (see Figure 8) and end-to-end latency (see Figure 9) for each system under different durability guarantees. As you can see from the graphics below, even with just one partition, Kafka’s publish and end-to-end latency increased from single-digit values to multiple hundreds of milliseconds. Reducing the number of producers and consumers greatly impacted Kafka’s throughput. In contrast, Pulsar consistently offered predictable low single-digit latency.",[48,80754,80755],{},[384,80756],{"alt":80757,"src":80758},"graph of publish latency with 1 partition and 1 producer and 1 consumer on Pulsar and Kafka","\u002Fimgs\u002Fblogs\u002F63a383dd9245eb04e0ae0373_image27.png",[48,80760,80761],{},[384,80762],{"alt":80763,"src":80764},"graph of End-to-end latency with 1 partition, 1 producer, and 1 consumer on Pulsar and Kafka","\u002Fimgs\u002Fblogs\u002F63a383dd944d6ad6c7c09c4e_image26.png",[40,80766,80543],{"id":80767},"publish-and-end-to-end-latency-test",[48,80769,80585],{},[48,80771,80772],{},"The test was designed to determine the lowest latency each system can achieve when processing workloads that consist of publish and tailing-reads. We varied the number of subscriptions and the number of partitions to see how each change impacted both publish and end-to-end latency. 
Our test strategy included the following principles and expected guarantees:",[321,80774,80775,80777,80780,80783,80786,80789],{},[324,80776,80593],{},[324,80778,80779],{},"We varied the number of acknowledgments to measure variances in throughput using different replication durability guarantees.",[324,80781,80782],{},"We varied the number of subscriptions (from 1 to 10) to measure latency for each.",[324,80784,80785],{},"We varied the number of partitions (from 100 to 10000) to measure latency for each.",[324,80787,80788],{},"We used a message size of 1KB.",[324,80790,80791],{},"The producer sent messages at a fixed rate of 200000\u002Fs (~200 MB\u002Fs) and the tailing-read consumers processed the messages while the producer continued to send them.",[48,80793,80617],{},[32,80795,80549],{"id":80796},"_1-100-partitions-1-subscription",[48,80798,80799],{},"We started with 100 partitions and 1 subscription to benchmark the lowest latency Pulsar and Kafka can achieve under all different durability guarantees.",[48,80801,80802],{},"Our test showed Pulsar’s publish and end-to-end latency to be two to five times lower than Kafka’s at all levels of durability. You can see the actual test results in Table 1.",[48,80804,80805],{},[384,80806],{"alt":80807,"src":80808},"graph of Publish latency on Pulsar and Kafka","\u002Fimgs\u002Fblogs\u002F63a385731ef736229cded551_image30.png",[48,80810,3931],{},[48,80812,80813],{},[384,80814],{"alt":80815,"src":80816}," table actual publish latency test results on Pulsar and Kafka","\u002Fimgs\u002Fblogs\u002F63a385b21157b71beb2bceaa_-Actual-publish-latency-test-results-on-Pulsar-and-Kafka.webp",[48,80818,80819],{},"To gain a better understanding of how latency changes over the time, we plotted the 99th percentile publish latency for Pulsar and Kafka using various replication durability settings. As you can see in Figure 11, Pulsar’s latency stayed consistent (~5 ms) but Kafka’s latency was spiky. Stable and consistently low latency is crucial to mission-critical services.",[48,80821,80822],{},[384,80823],{"alt":80824,"src":80825},"99th percentile publish latency on Pulsar and Kafka","\u002Fimgs\u002Fblogs\u002F63a385c6f366f794c8b3f81b_image28.png",[3933,80827,80829],{"id":80828},"end-to-end-latency-sync-local-durability","End-to-End Latency - Sync Local Durability",[48,80831,80832],{},"Figure 12 shows the differences in end-to-end latency between Pulsar and Kafka using two replication durability settings (ack-1 and ack-2, respectively) and sync local durability. Table 3 shows the exact latency numbers for each case. 
As you can see, Pulsar’s 99th percentile end-to-end latency was three times lower than Kafka’s under async replication durability (ack-1) and five times lower under sync replication durability (ack-2).",[48,80834,80835],{},[384,80836],{"alt":80837,"src":80838},"graph End-to-end latency with 1 subscription on Pulsar and Kafka","\u002Fimgs\u002Fblogs\u002F63a385c79c01af43582a3970_image29.png",[48,80840,80841],{},[384,80842],{"alt":80843,"src":80844},"table Actual end-to-end latency test results with 1 subscription on Pulsar and Kafka","https:\u002F\u002Fuploads-ssl.webflow.com\u002F639226d67b0d723af8e7ca56\u002F63a386279c01af2c622a3c41_image%201%20(7).webp",[3933,80846,80848],{"id":80847},"publish-latency-async-local-durability","Publish Latency - Async Local Durability",[48,80850,80851],{},"Figure 13 shows the differences in publish latency between Pulsar and Kafka using two replication durability settings (ack-1 and ack-2, respectively) and async local durability. Table 4 shows the exact latency numbers for each case. As you can see, Kafka performed better in the async replication durability (ack-1) case. But Pulsar’s 99th percentile publish latency stayed consistent (below 5 ms) and increasing the replication durability guarantee (from ack-1 to ack-2) did not impact latency. However, Kafka’s 99th percentile publish latency with sync replication durability (ack-2) was much higher than Pulsar’s.",[48,80853,80854],{},[384,80855],{"alt":80856,"src":80857},"graph Publish latency on Pulsar and Kafka","\u002Fimgs\u002Fblogs\u002F63a385c77d7fba7e0e8d280d_image31.png",[48,80859,80860],{},[384,80861],{"alt":80862,"src":80863},"table actual publish latency test results on Pulsar and Kafka (without data sync)","\u002Fimgs\u002Fblogs\u002F63a386b26559a475f920b5e3_Actual-publish-latency-test-results-on-Pulsar-and-Kafka.webp",[48,80865,80866],{},"To gain a better understanding of how publish latency changes over the time, we plotted the 99th percentile publish latency for Pulsar and Kafka under various replication durability settings. As you can see in Figure 14, Pulsar’s latency stayed consistently low (below 5 ms) and Kafka’s was about two times of Pulsar’s with sync replication durability.",[48,80868,80869],{},[384,80870],{"alt":80871,"src":80872},"graph 99th percentile publish latency on Pulsar and Kafka","\u002Fimgs\u002Fblogs\u002F63a386c47ccf6353e2ce36b0_image32.png",[3933,80874,80876],{"id":80875},"end-to-end-latency-async-local-durability","End-to-End Latency - Async Local Durability",[48,80878,80879],{},"Figure 15 shows the differences in end-to-end latency between Pulsar and Kafka under two replication durability settings (ack-1 and ack-2, respectively) and async local durability. Table 5 shows the exact latency numbers for each case. As you can see, Pulsar performed consistently better than Kafka in all cases. Pulsar’s 99th percentile end-to-end latency stayed consistent (~ 5 ms) and varying the replication durability setting had no impact. 
Kafka’s 99th percentile end-to-end latency was 1.5 times higher than Pulsar’s for ack-1 and 2 times higher for ack-2.",[48,80881,80882],{},[384,80883],{"alt":80837,"src":80884},"\u002Fimgs\u002Fblogs\u002F63a386c406e3a143c4914254_image33.png",[48,80886,80887],{},[384,80888],{"alt":80843,"src":80889},"https:\u002F\u002Fuploads-ssl.webflow.com\u002F639226d67b0d723af8e7ca56\u002F63a387117b38f7d4842a7a09_Actual%20end-to-end%20latency%20test%20results%20with%201%20subscription%20on%20Pulsar%20and%20Kafka%20(1).webp",[32,80891,80555],{"id":80892},"_2-100-partitions-10-subscriptions",[48,80894,80895],{},"Once we understood how Pulsar and Kafka performed with just one subscription, we wanted to see how varying the number of subscriptions affected publish and end-to-end latency. So, we increased the number of subscriptions from 1 to 10 and assigned 2 consumers to each subscription.",[48,80897,80898],{},"As you can see from the details in Table 6, our test results showed the following:",[321,80900,80901,80904],{},[324,80902,80903],{},"Pulsar’s 99th percentile publish and end-to-end latency stayed between 5 and 10 ms.",[324,80905,80906],{},"Kafka’s 99th percentile publish and end-to-end latency were greatly impacted by increasing the number of subscriptions and went up to multiple seconds.",[48,80908,80909],{},[384,80910],{"alt":80911,"src":80912},"table Publish and end-to-end latency test results with 10 subscriptions","\u002Fimgs\u002Fblogs\u002F63a38755864b1d449a14e1ce_Publish-and-end-to-end-latency-test-results-with-10-subscriptions.webp",[3933,80914,80916],{"id":80915},"publish-latency-sync-local-durability","Publish Latency: Sync Local Durability",[48,80918,80919],{},"Figure 16 shows the differences in publish latency between Pulsar and Kafka under two replication durability settings (ack-1 and ack-2, respectively) and sync local durability. Table 7 shows the exact latency numbers for each case. As you can see, Pulsar’s 99th percentile publish latency was still three times lower than Kafka’s under async replication durability (ack-1). But under sync replication durability (ack-2), Pulsar’s publish latency was 160 times lower than Kafka’s (as compared to 5 times lower with only one subscription).",[48,80921,80922],{},[384,80923],{"alt":80924,"src":80925},"table actual publish latency test results with 10 subscriptions on Pulsar and Kafka (with data sync)","\u002Fimgs\u002Fblogs\u002F63a387a87d7fba66e08db8cd_Actual-publish-latency-test-results-with-10-subscriptions-on-Pulsar-and-Kafka.webp",[3933,80927,80829],{"id":80928},"end-to-end-latency-sync-local-durability-1",[48,80930,80931],{},"Figure 17 shows the differences in end-to-end latency between Pulsar and Kafka under two replication durability settings (ack-1 and ack-2, respectively) and sync local durability. Table 8 shows the exact latency numbers for each case. 
As you can see, Pulsar’s 99th percentile latency was 20 times lower than Kafka’s under async replication durability (ack-1) and 110 times lower under sync replication durability (ack-2).",[48,80933,80934],{},[384,80935],{"alt":80936,"src":80937},"graph end-to-end latency with 10 subscriptions on Pulsar and Kafka (with data sync)","\u002Fimgs\u002Fblogs\u002F63a387c5c62d6ee7d6f564c7_image35.png",[48,80939,80940],{},[384,80941],{"alt":80942,"src":80943},"tabs actual end-to-end latency test results with 10 subscriptions on Pulsar and Kafka (with data sync)","\u002Fimgs\u002Fblogs\u002F63a3881cf86febf8c3b30b47_Actual-end-to-end-latency-test-results-with-10-subscriptions-on-Pulsar-and-Kafka.webp",[3933,80945,80848],{"id":80946},"publish-latency-async-local-durability-1",[48,80948,80949],{},"Figure 18 shows the differences in publish latency between Pulsar and Kafka under two replication durability settings (ack-1 and ack-2, respectively) and async local durability. Table 9 shows the exact latency numbers for each case. As you can see, Pulsar outperformed Kafka significantly. Pulsar’s average publish latency was ~3 ms and its 99th percentile latency was within 5 ms. Kafka’s performance was satisfactory under async replication durability (ack-1), but significantly worse under sync replication durability (ack-2). Kafka’s 99th percentile publish latency under sync replication durability was 270 times higher than Pulsar’s.",[48,80951,80952],{},[384,80953],{"alt":80954,"src":80955},"graph publish latency quantiles","\u002Fimgs\u002Fblogs\u002F63a38838c62d6e349bf58606_image36.png",[48,80957,3931],{},[48,80959,80960],{},[384,80961],{"alt":80962,"src":80963},"tabs actual publish latency test results with 10 subscriptions on Pulsar and Kafka","https:\u002F\u002Fuploads-ssl.webflow.com\u002F639226d67b0d723af8e7ca56\u002F63a388917d7fba668e8f2410_Actual%20publish%20latency%20test%20results%20with%2010%20subscriptions%20on%20Pulsar%20and%20Kafka%20(1).webp",[3933,80965,80876],{"id":80966},"end-to-end-latency-async-local-durability-1",[48,80968,80969],{},"Figure 19 shows the differences in end-to-end latency between Pulsar and Kafka under two replication durability settings (ack-1 and ack-2, respectively) and async local durability. Table 10 shows the exact latency numbers for different cases. As you can see, Pulsar performed consistently better than Kafka in all cases. Pulsar’s end-to-end latency consistently stayed between 4 and 7 ms and varying the replication durability setting had no impact. Kafka’s 99th percentile end-to-end latency was 13 times higher than Pulsar’s for ack-1 and 187 times higher for ack-2.",[48,80971,80972],{},[384,80973],{"alt":80974,"src":80975},"graph end to end latency","\u002Fimgs\u002Fblogs\u002F63a388b3ac43eff0270cb9f1_image37.png",[48,80977,3931],{},[48,80979,80980],{},[384,80981],{"alt":80982,"src":80983},"tabs of Actual end-to-end latency test results with 10 subscriptions","\u002Fimgs\u002Fblogs\u002F63a3893cac43ef7b2c0d5b69_Actual-end-to-end-latency-test-results-with-10-subscriptions.webp",[32,80985,80561],{"id":80986},"_3-100-5000-8000-10000-partitions",[48,80988,80989],{},"Having learned how varying the number of subscriptions affects publish latency in both Pulsar and Kafka, we wanted to vary the number of partitions and observe the effects. 
So, we increased the number of partitions in increments from 100 to 10000 and looked for changes.",[48,80991,80992],{},"As you can see from the details in Table 11, our test results showed the following:",[321,80994,80995,80998,81001],{},[324,80996,80997],{},"Pulsar’s 99th percentile publish latency remained stable at ~5 ms when the number of partitions increased.",[324,80999,81000],{},"Kafka’s 99th percentile publish latency was greatly impacted by incremental increases in the number of partitions and went up to multiple seconds.",[324,81002,81003],{},"When the number of partitions exceeded 5000, Kafka’s consumer was unable to keep up with the publish throughput.",[48,81005,81006],{},[384,81007],{"alt":81008,"src":81009},"tabs ctual publish latency test results with varying acks and durability","\u002Fimgs\u002Fblogs\u002F63a389833eafb5612a525792_Actual-publish-latency-test-results-with-varying-acks-and-durability.webp",[3933,81011,81013],{"id":81012},"ack-1-sync-local-durability","Ack = 1, Sync local durability",[48,81015,81016],{},"Figure 20 and Figure 21 show the differences in publish and end-to-end latency, respectively, between Pulsar and Kafka when varying the number of partitions under sync local durability and async replication durability (ack = 1).",[48,81018,81019],{},[384,81020],{"alt":81021,"src":81022},"graph Publish latency quantiles","\u002Fimgs\u002Fblogs\u002F63a3899b46c3eab3c419b745_image38.png",[48,81024,3931],{},[48,81026,81027],{},[384,81028],{"alt":81029,"src":81030},"graph end to end latency quantiles","\u002Fimgs\u002Fblogs\u002F63a3899bf86feb2ef3b445ff_image9.png",[48,81032,81033],{},[384,81034],{"alt":81035,"src":81036},"actual publish latency test results with varying numbers of partitions and one acknowledgement","\u002Fimgs\u002Fblogs\u002F63a38a4d7bdd4387cc098676_actual-publish-latency-test-results-with-varying-numbers-of-partitions-and-one-acknowledgement.webp",[48,81038,81039],{},[384,81040],{"alt":81041,"src":81042},"table actual end-to-end latency test results with varying numbers of partitions","\u002Fimgs\u002Fblogs\u002F63a38a74f6cde82a8255ed1e_actual-end-to-end-latency-test-results-with-varying-numbers-of-partitions.webp",[48,81044,81045],{},"Figure 22 shows end-to-end latency with varying numbers of partitions and one acknowledgment on Pulsar. Figure 23 shows end-to-end latency with varying numbers of partitions and one acknowledgement on Kafka.",[48,81047,81048],{},[384,81049],{"alt":81050,"src":81051},"graph End-to-end latency with varying numbers of partitions and 1 ack on Pulsar ","\u002Fimgs\u002Fblogs\u002F63a38a931ef736ca72e51144_image10.png",[48,81053,81054],{},[384,81055],{"alt":81056,"src":81057},"End-to-end latency with varying numbers of partitions and 1 ack on Kafka","\u002Fimgs\u002Fblogs\u002F63a38a937b38f76fbe2d4bea_image11.png",[48,81059,81060],{},"As you can see from the figures and tables above,",[321,81062,81063,81066,81069,81072],{},[324,81064,81065],{},"Pulsar’s 99th percentile publish latency remained stable at ~5 ms. Varying the number of partitions had no effect.",[324,81067,81068],{},"Pulsar’s 99th percentile end-to-end latency remained stable at ~6 ms. Varying the number of partitions had no effect.",[324,81070,81071],{},"Kafka’s 99th percentile publish latency degraded incrementally as the number of partitions increased and was 5 times higher at 10000 partitions (as compared to 100). 
This was 10 times higher than Pulsar’s.",[324,81073,81074],{},"Kafka’s 99th percentile end-to-end latency degraded incrementally as the number of partitions increased and was 10000 times higher at 10000 partitions (as compared to 100). Kafka’s 99th percentile end-to-end latency at 10000 partitions increased to 180 s and was 280000 times higher than Pulsar’s.",[3933,81076,81078],{"id":81077},"ack-2-sync-local-durability","Ack = 2, Sync local durability",[48,81080,81081],{},"Figure 24 shows the differences in publish latency between Pulsar and Kafka when varying the number of partitions under sync local durability and sync replication durability (ack = 2). Table 14 shows the exact latency numbers for each case.",[48,81083,81084],{},[384,81085],{"alt":81086,"src":81087},"table Actual publish latency test results with varying numbers of partitions and all\u002F2 ack","\u002Fimgs\u002Fblogs\u002F63a38af435743d08e932c3f2_Actual-publish-latency-test-results.webp",[48,81089,81090],{},"Figure 25 and Figure 26 show how varying the number of partitions affects end-to-end latency in Pulsar and Kafka, respectively.",[48,81092,81093],{},[384,81094],{"alt":81095,"src":81096},"graph  End-to-end latency with varying numbers of partitions and 2 acks on Pulsar ","\u002Fimgs\u002Fblogs\u002F63a38b14ac43ef4c560f0f41_image13.png",[48,81098,81099],{},[384,81100],{"alt":81101,"src":81102},"graph End-to-end latency with varying numbers of partitions and 2 acks on Kafka ","\u002Fimgs\u002Fblogs\u002F63a38b14f86febb0b5b5488c_image14.png",[48,81104,81060],{},[321,81106,81107,81110,81113],{},[324,81108,81109],{},"Pulsar’s 99th percentile publish latency remained stable at ~10 ms. Increasing the number of partitions had no effect. Kafka’s 99th percentile publish latency degraded incrementally as the number of partitions increased and was 30 times higher at 10000 partitions (as compared to 100). Kafka’s 99th percentile publish latency at 10000 partitions increased to 1.7 s and was 126 times higher than Pulsar’s.",[324,81111,81112],{},"Pulsar’s 99th percentile end-to-end latency remained stable at ~10 ms. Increasing the number of partitions had only a slight impact on Pulsar’s 99th percentile end-to-end latency. But even at 10000 partitions, it remained relatively low at ~50 ms.",[324,81114,81115],{},"Kafka’s 99th percentile end-to-end latency degraded incrementally as the number of partitions increased. At 10000 partitions, - Kafka’s 99th percentile end-to-end latency increased to 200 s and was 14771 times higher than Pulsar’s.",[3933,81117,81119],{"id":81118},"ack-1-async-local-durability","Ack = 1, Async local durability",[48,81121,81122],{},"Figure 27 shows the publish latency difference between Pulsar and Kafka when varying the number of partitions under async local durability and async replication durability (ack = 1). 
Table 15 shows the exact latency numbers for different cases.",[48,81124,81125],{},[384,81126],{"alt":81127,"src":81128},"graph Publish latency with varying numbers of partitions and 1 ack","\u002Fimgs\u002Fblogs\u002F63a38b15192232cb66cd6f6e_image15.png",[48,81130,3931],{},[48,81132,81133],{},[384,81134],{"alt":81135,"src":81136},"Actual publish latency test results with varying numbers of partitions and 1 ack","\u002Fimgs\u002Fblogs\u002F63a38b5e7bdd437a430ad550_Actual-publish-latency-test-results-with-varying-numbers-of-partitions-and-1-ack.webp",[48,81138,81139],{},"Figure 28 and Figure 29 show how varying the number of partitions affected end-to-end latency in Pulsar and Kafka, respectively.",[48,81141,81142],{},[384,81143],{"alt":81144,"src":81145},"graph  End-to-end latency with varying numbers of partitions and 1 ack on Pulsar","\u002Fimgs\u002Fblogs\u002F63a38b751157b7d7273110ba_image16.png",[48,81147,81148],{},[384,81149],{"alt":81150,"src":81151},"graph End-to-end latency with varying numbers of partitions and 1 ack on Kafka","\u002Fimgs\u002Fblogs\u002F63a38b767d7fba7ce09166bd_image17.png",[48,81153,81154],{},"As you can see from the above figures and tables,",[321,81156,81157,81160,81163,81166],{},[324,81158,81159],{},"Pulsar’s 99th percentile publish latency remained stable at ~4 to ~5 ms. Increasing the number of partitions had no impact.",[324,81161,81162],{},"Kafka’s 99th percentile publish latency degraded incrementally as the number of partitions increased and was 13 times higher at 10000 partitions (as compared to 100). Kafka’s 99th percentile publish latency at 10000 partitions increased to 41 ms and was 8 times higher than Pulsar’s.",[324,81164,81165],{},"Pulsar’s 99th percentile end-to-end latency remained stable at ~4 to ~6 ms. Increasing the number of partitions had a slight impact on Pulsar’s 99.9th percentile end-to-end latency but, even at 10000 partitions, it stayed relatively low (within 24 ms).",[324,81167,81168],{},"Kafka’s 99th percentile end-to-end latency degraded incrementally as the number of partitions increased. At 10000 partitions, Kafka’s 99th percentile end-to-end latency went up to 180 s and was 34416 times higher than Pulsar’s.",[3933,81170,81172],{"id":81171},"ack-2-async-local-durability","Ack = 2, Async local durability",[48,81174,81175],{},"Figure 30 shows the differences in publish latency between Pulsar and Kafka when varying the number of partitions under async local durability and sync replication durability (ack = 2). 
Table 16 shows the exact latency numbers for each case.",[48,81177,81178],{},[384,81179],{"alt":81180,"src":81181},"graph Publish latency with varying numbers of partitions and all\u002F2 ack","\u002Fimgs\u002Fblogs\u002F63a38b7665aad404da54f54f_image18.png",[48,81183,81184],{},[384,81185],{"alt":81186,"src":81187},"Actual publish latency test results with varying numbers of partitions and all\u002F2 ack","\u002Fimgs\u002Fblogs\u002F63a38bf0a089770738b473aa_Actual-publish-latency-test-results-with-varying-numbers-of-partitions-and-all:2-ack.jpg",[48,81189,81190],{},"Figure 31 and Figure 32 show how varying the number of partitions affected end-to-end latency on Pulsar and Kafka, respectively.",[48,81192,81193],{},[384,81194],{"alt":81195,"src":81196},"graph  End-to-end latency with varying numbers of partitions and 2 acks on Pulsar","\u002Fimgs\u002Fblogs\u002F63a38c08ac43ef73d80fda3c_image1.png",[48,81198,81199],{},[384,81200],{"alt":81201,"src":81202},"graph End-to-end latency with varying numbers of partitions with 2 acks on Kafka","\u002Fimgs\u002Fblogs\u002F63a38c0865aad4d14e552638_image2.png",[48,81204,81205],{},"As you can see:",[321,81207,81208,81210,81213,81216],{},[324,81209,81159],{},[324,81211,81212],{},"Kafka’s 99th percentile publish latency degraded incrementally as the number of partitions increased and was 202 times higher at 10000 partitions (as compared to 100). At 10000 partitions, Kafka’s 99th percentile publish latency increased to 1.7 s and was 278 times higher than Pulsar’s.",[324,81214,81215],{},"Pulsar’s 99th percentile end-to-end latency remained stable at ~4 to ~6 ms. Increasing the number of partitions had a slight impact on Pulsar’s 99.9th percentile end-to-end latency, but it remained relatively low within 28 ms.",[324,81217,81218],{},"Kafka’s 99th percentile end-to-end latency degraded incrementally as the number of partitions increased. At 10000 partitions, Kafka’s 99th percentile end-to-end latency increased to 200 s and was 32362 times higher than Pulsar’s.",[40,81220,80567],{"id":81221},"catch-up-read-test",[48,81223,80585],{},[48,81225,81226],{},"The test was designed to determine the maximum throughput Pulsar and Kafka can achieve when processing workloads that contain catch-up reads only. Our test strategy included the following principles and expected guarantees:",[321,81228,81229,81231,81234,81237,81240,81243],{},[324,81230,80593],{},[324,81232,81233],{},"We varied the number of acknowledgements to measure changes in throughput under various replication durability guarantees.",[324,81235,81236],{},"We enabled batching on Kafka and Pulsar, batching up to 1 MB of data for a maximum of 10 ms.",[324,81238,81239],{},"We benchmarked both systems with 100 partitions.",[324,81241,81242],{},"We ran a total of four clients—two producers and two consumers.",[324,81244,80611],{},[48,81246,81247],{},"At the beginning of the test, the producer started sending messages at a fixed rate of 200 K\u002Fs. When 512 GB of data had accumulated in a queue, the consumers began processing. The consumers first read the accumulated data from beginning to end, and then went on to process incoming data as it arrived. The producer continued to send messages at the same rate for the duration of the test.",[48,81249,81250],{},"We evaluated how quickly each system was able to read the 512 GB of backlog data. 
We compared Kafka and Pulsar under different durability settings.",[48,81252,80617],{},[32,81254,81256],{"id":81255},"_1-async-local-durability-with-pulsars-journal-bypassing-feature-enabled","#1 Async local durability with Pulsar’s journal bypassing feature enabled",[48,81258,81259],{},"In this test, we used equivalent async local durability guarantees on Pulsar and Kafka. We enabled Pulsar’s new journal bypassing feature to match the local durability guarantee provided by Kafka’s default fsync settings.",[48,81261,81262],{},"As you can see from our results shown in Figure 33 below,",[321,81264,81265,81268,81271],{},[324,81266,81267],{},"Pulsar’s maximum throughput reached 3.7 million messages\u002Fs (3.5 GB\u002Fs) when processing catch-up reads only.",[324,81269,81270],{},"Kafka’s only reached a maximum throughput of 1 million messages\u002Fs (1 GB\u002Fs).",[324,81272,81273],{},"Pulsar processed catch-up reads 75% faster than Kafka.",[48,81275,81276],{},[384,81277],{"alt":81278,"src":81279},"graph Catch-up read throughput on Pulsar and Kafka","\u002Fimgs\u002Fblogs\u002F63a38c0824d535608ed41949_image3.png",[32,81281,81283],{"id":81282},"_2-async-local-durability-with-pulsars-journal-bypassing-feature-disabled","#2 Async local durability with Pulsar’s journal bypassing feature disabled",[48,81285,81286],{},"In this test, we used async local durability guarantees on Pulsar and Kafka, but disabled Pulsar’s journal bypassing feature.",[48,81288,81289],{},"As you can see from our results shown in Figure 34 below,",[321,81291,81292,81295,81298],{},[324,81293,81294],{},"Pulsar’s maximum throughput when processing catch-up reads reached 1.8 million messages\u002Fs (1.7 GB\u002Fs).",[324,81296,81297],{},"Kafka only reached a maximum throughput of 1 million messages\u002Fs (1 GB\u002Fs).",[324,81299,81300],{},"Pulsar processed catch-up reads twice as fast as Kafka.",[48,81302,81303],{},[384,81304],{"alt":81278,"src":81305},"\u002Fimgs\u002Fblogs\u002F63a38c08690dcd5f32b71a96_image5.png",[40,81307,80573],{"id":81308},"mixed-workload-test",[48,81310,80585],{},[48,81312,81313],{},"This test was designed to evaluate how catch-up reads affect publish and tailing-reads in mixed workloads. Our test strategy included the following principles and expected guarantees:",[321,81315,81316,81318,81320,81322,81325,81327],{},[324,81317,80593],{},[324,81319,80599],{},[324,81321,81239],{},[324,81323,81324],{},"We compared Kafka and Pulsar under different durability settings.",[324,81326,81242],{},[324,81328,80611],{},[48,81330,81331],{},"At the beginning of the test, both producers started sending data at a fixed rate of 200 K\u002Fs and one of the two consumers began processing tailing-reads immediately. When 512 GB of data had accumulated in a queue, the other (catch-up) consumer began reading the accumulated data from beginning to end, and then went on to process incoming data as it arrived. For the duration of the test, both producers continued to publish at the same rate and the tailing-read consumer continued to consume data at the same rate.",[48,81333,80617],{},[32,81335,81337],{"id":81336},"_1-async-local-durability-with-pulsar-enabled-bypass-journal-feature","#1 Async local durability with Pulsar enabled bypass-journal feature",[48,81339,81340],{},"In this test, we compared Kafka and Pulsar with equivalent async local durability guarantees. 
We enabled Pulsar’s new journal bypassing feature to match the local durability guarantee provided by Kafka’s default fsync settings.",[48,81342,81343],{},"As you can see in Figure 36 below, catch-up reads caused significant write delays in Kafka, but had little impact on Pulsar. Kafka’s 99th percentile publishing latency increased to 1-3 seconds while Pulsar’s remained steady at several milliseconds to tens of milliseconds.",[48,81345,81346],{},[384,81347],{"alt":81348,"src":81349},"graph Effect of catch-up reads on publish latency on Pulsar and Kafka","\u002Fimgs\u002Fblogs\u002F63a38c0863ad9e04e4c1a617_image6.png",[32,81351,81283],{"id":81352},"_2-async-local-durability-with-pulsars-journal-bypassing-feature-disabled-1",[48,81354,81286],{},[48,81356,81357],{},"As you can see in Figure 37 below, catch-up reads caused significant write delays on Kafka, but had little impact on Pulsar. Kafka’s 99th percentile publishing latency increased to 2-3 seconds while Pulsar’s remained steady at several milliseconds to tens of milliseconds.",[48,81359,81360],{},[384,81361],{"alt":81362,"src":81363},"graph Effect of catch-up reads on publish latency on Pulsar and Kafka ","\u002Fimgs\u002Fblogs\u002F63a38c087d7fba37b9922636_image7.png",[32,81365,81367],{"id":81366},"_3-sync-local-durability","#3 Sync local durability",[48,81369,81370],{},"In this test, we compared Kafka and Pulsar with equivalent sync local durability guarantees.",[48,81372,81373],{},"As you can see in Figure 38 below, catch-up reads caused significant write delays on Kafka, but had little impact on Pulsar. Kafka’s 99th percentile publishing latency increased to ~1.2 to ~1.4 seconds while Pulsar’s remained steady at several milliseconds to tens of milliseconds.",[48,81375,81376],{},[384,81377],{"alt":81348,"src":81378},"\u002Fimgs\u002Fblogs\u002F63a38c0881ff9184eab90d4f_image8.png",[40,81380,80579],{"id":81381},"conclusions",[48,81383,81384],{},"Below is a summary of our findings based on the results of our benchmark.",[321,81386,81387,81390,81393],{},[324,81388,81389],{},"After the configuration and tuning errors were corrected, Pulsar matched the end-to-end latency Kafka had achieved in Confluent’s limited use case.",[324,81391,81392],{},"Under equivalent durability guarantees, Pulsar outperformed Kafka in workloads that simulated real-world use cases.",[324,81394,81395],{},"Pulsar delivered significantly better latency and better I\u002FO isolation than Kafka in every test, regardless of the durability guarantee settings used or the numbers of subscriptions, partitions, or clients specified.",{"title":18,"searchDepth":19,"depth":19,"links":81397},[81398,81400,81401,81402,81408,81413,81417,81422],{"id":80485,"depth":19,"text":81399},"Get the updated Apache Pulsar vs. 
Apache Kafka 2022 Benchmark here.",{"id":42,"depth":19,"text":46},{"id":79112,"depth":19,"text":80505},{"id":80582,"depth":19,"text":80513,"children":81403},[81404,81405,81406,81407],{"id":80620,"depth":279,"text":80519},{"id":80652,"depth":279,"text":80525},{"id":80690,"depth":279,"text":80531},{"id":80724,"depth":279,"text":80537},{"id":80767,"depth":19,"text":80543,"children":81409},[81410,81411,81412],{"id":80796,"depth":279,"text":80549},{"id":80892,"depth":279,"text":80555},{"id":80986,"depth":279,"text":80561},{"id":81221,"depth":19,"text":80567,"children":81414},[81415,81416],{"id":81255,"depth":279,"text":81256},{"id":81282,"depth":279,"text":81283},{"id":81308,"depth":19,"text":80573,"children":81418},[81419,81420,81421],{"id":81336,"depth":279,"text":81337},{"id":81352,"depth":279,"text":81283},{"id":81366,"depth":279,"text":81367},{"id":81381,"depth":19,"text":80579},"2020-11-09","This is the complete 2020 benchmark report of Pulsar and Kafka.",{},"\u002Fblog\u002Fbenchmarking-pulsar-and-kafka-report-2020","35 min read",{"title":80473,"description":81424},"blog\u002Fbenchmarking-pulsar-and-kafka-report-2020",[799,821,10503],"vNYcOsHR_eiHLEqO8Tj3xUyLvYykdTVhce-Vs3_c6v0",{"id":81433,"title":81434,"authors":81435,"body":81436,"category":821,"createdAt":290,"date":81423,"description":81873,"extension":8,"featured":294,"image":290,"isDraft":294,"link":290,"meta":81874,"navigation":7,"order":296,"path":62288,"readingTime":81875,"relatedResources":290,"seo":81876,"stem":81877,"tags":81878,"__hash__":81879},"blogs\u002Fblog\u002Fperspective-on-pulsars-performance-compared-to-kafka.md","A More Accurate Perspective on Pulsar’s Performance Compared to Kafka",[808,806],{"type":15,"value":81437,"toc":81854},[81438,81442,81465,81467,81470,81473,81476,81479,81484,81487,81490,81493,81496,81499,81510,81513,81516,81520,81524,81530,81536,81542,81546,81549,81552,81558,81564,81567,81570,81577,81581,81584,81587,81590,81594,81597,81601,81604,81608,81611,81617,81626,81629,81632,81638,81642,81649,81651,81659,81662,81665,81685,81687,81695,81698,81701,81715,81717,81725,81728,81731,81733,81741,81744,81747,81750,81812,81819,81821,81824,81827,81830,81833,81835,81852],[48,81439,81440],{},[384,81441],{"alt":78983,"src":80482},[916,81443,81444],{},[48,81445,81446,81447,81452,81453,81456,81457,1186,81461,20571],{},"Note: This post was published in 2020 presenting StreamNative’s response to Confluent’s article ",[55,81448,81451],{"href":81449,"rel":81450},"https:\u002F\u002Fwww.confluent.io\u002Fblog\u002Fkafka-fastest-messaging-system\u002F#benchmarking-framework",[264],"“Benchmarking Apache Kafka, Apache Pulsar, and RabbitMQ: Which is the Fastest?” (2020)",". For the latest Pulsar vs. Kafka performance comparison, read our ",[55,81454,81455],{"href":27690},"2022 Benchmark Report",". For a brief overview of these systems, see our review of Pulsar vs. Kafka (",[55,81458,81460],{"href":81459},"\u002Fblog\u002Ftech\u002F2020-07-08-pulsar-vs-kafka-part-1","part 1",[55,81462,81464],{"href":81463},"\u002Fblog\u002Ftech\u002F2020-07-22-pulsar-vs-kafka-part-2","part 2",[40,81466,45531],{"id":45530},[48,81468,81469],{},"Today, many companies are looking at real-time data streaming applications to develop new products and services. 
Organizations must first understand the advantages and differentiators of the different event streaming systems before they can select the technology best-suited to meet their business needs.",[48,81471,81472],{},"Benchmarks are one method organizations use to compare and measure the performance of different technologies. In order for these benchmarks to be meaningful, they must be done correctly and provide accurate information. Unfortunately, it is all too easy for benchmarks to fail to provide accurate insights due to any number of issues.",[48,81474,81475],{},"Confluent recently ran a benchmark to evaluate how Kafka, Pulsar, and RabbitMQ compare in terms of throughput and latency. According to Confluent’s blog, Kafka was able to achieve the “best throughput” with “low latency” and RabbitMQ was able to provide “low latency” at “lower throughputs”. Overall, their benchmark declared Kafka the clear winner in terms of “speed”.",[48,81477,81478],{},"While Kafka is an established technology, Pulsar is the top streaming technology of choice for many companies today, from global corporations to innovative start-ups. In fact, at the recent Splunk summit, conf20, Sendur Sellakumar, Splunk’s Chief Product Officer, discussed their decision to adopt Pulsar over Kafka:",[916,81480,81481],{},[48,81482,81483],{},"\"... we've shifted to Apache Pulsar as our underlying streaming. It is our bet on the long term architecture for enterprise-grade multi-tenant streaming.\"    - Sendur Sellakumar, CPO, Splunk",[48,81485,81486],{},"This is just one of many examples of companies adopting Pulsar. These companies choose Pulsar because it provides the ability to horizontally and cost effectively scale to massive data volumes, with no single point of failure, in modern elastic cloud environments, like Kubernetes. At the same time, built-in features like automatic data rebalancing, multi-tenancy, geo-replication, and tiered storage with infinite retention, simplify operations and make it easier for teams to focus on business goals.",[48,81488,81489],{},"Ultimately, developers are adopting Pulsar for its features, performance, and because all of the unique aspects of Pulsar, mentioned above, make it well suited to be the backbone for streaming data.",[48,81491,81492],{},"Knowing what we know, we had to take a closer look at Confluent’s benchmark to try to understand their results. We found two issues that were highly problematic. First, and the largest source of inaccuracy, is Confluent’s limited knowledge of Pulsar. Without understanding the technology, they were not able to set-up the test in a way that could accurately measure Pulsar’s performance.",[48,81494,81495],{},"Second, their performance measurements were based on a narrow set of test parameters. This limited the applicability of the results and failed to provide readers with an accurate picture of the technologies’ capabilities across different workloads and real-world use cases.",[48,81497,81498],{},"In order to provide the community a more accurate picture, we decided to address these issues and repeat the test. Key updates included:",[1666,81500,81501,81504,81507],{},[324,81502,81503],{},"We updated the benchmark setup to include all of the durability levels supported by Pulsar and Kafka. 
This allowed us to compare throughput and latency at the same level of durability.",[324,81505,81506],{},"We fixed the OpenMessaging Benchmark (OMB) framework to eliminate the variants introduced by using different instances, and corrected configuration errors in their OMB Pulsar driver.",[324,81508,81509],{},"Finally, we measured additional performance factors and conditions, such as varying numbers of partitions and mixed workloads that contain writes, tailing-reads, and catch-up reads to provide a more comprehensive view of performance.",[48,81511,81512],{},"With these updates made, we repeated the test. The result - Pulsar significantly outperformed Kafka in scenarios that more closely resembled real-world workloads and matched Kafka’s performance in the basic scenario Confluent used.",[48,81514,81515],{},"The following section highlights the most important findings. A more comprehensive performance report in the section StreamNative Benchmark Results also gives detail of our test setup and additional commentary.",[40,81517,81519],{"id":81518},"streamnative-benchmark-result-highlights","StreamNative Benchmark Result Highlights",[32,81521,81523],{"id":81522},"_1-with-the-same-durability-guarantee-as-kafka-pulsar-achieves-605-mbs-publish-and-end-to-end-throughput-same-as-kafka-and-35-gbs-catch-up-read-throughput-35-times-higher-than-kafka-increasing-the-number-of-partitions-and-changing-durability-levels-have-no-impact-on-pulsars-throughput-however-the-kafkas-throughput-was-severely-impacted-when-changing-the-number-of-partitions-or-changing-durability-levels","#1 With the same durability guarantee as Kafka, Pulsar achieves 605 MB\u002Fs publish and end-to-end throughput (same as Kafka), and 3.5 GB\u002Fs catch-up read throughput (3.5 times higher than Kafka). Increasing the number of partitions and changing durability levels have no impact on Pulsar's throughput. However, the Kafka's throughput was severely impacted when changing the number of partitions or changing durability levels.",[48,81525,81526],{},[384,81527],{"alt":81528,"src":81529},"Table 2: End-to-End P99 Latency between Pulsar and Kafka of different number of subscriptions with different durability guarantees","\u002Fimgs\u002Fblogs\u002F63bf3f5d66c03cd868633ff4_image-1.png",[48,81531,81532],{},[384,81533],{"alt":81534,"src":81535},"Table 7: Durability Configuration Settings in Pulsar","\u002Fimgs\u002Fblogs\u002F63bf409667bf2934a199ec57_image-1.png",[48,81537,81538],{},[384,81539],{"alt":81540,"src":81541},"Table 9: Pulsar’s Local Durability Mode Parameters","\u002Fimgs\u002Fblogs\u002F63bf40f4cdf00fc12079341e_image-1.png",[32,81543,81545],{"id":81544},"durability-in-kafka","Durability in Kafka",[48,81547,81548],{},"Kafka offers three durability levels: Level 1, Level 2 and Level 4. Kafka can provide replication durability at Level 2 (default settings), but offers no durability guarantees at Level 4 because it lacks the ability to fsync data to disks before acknowledging writes. Kafka can be configured to operate as a Level 1 system by setting flush.messages to 1 and flush.ms to 0. However such setup is rarely seen in Kafka production deployments.",[48,81550,81551],{},"Kafka’s ISR replication protocol controls replication durability. You can tune Kafka’s replication durability mode by adjusting the acks and min.insync.replicas parameters associated with this protocol. The settings for these parameters are described in Table 10 below. The durability levels supported by Kafka are described in Table 11 below. 
(A detailed explanation of Kafka’s replication protocol is beyond the scope of this article; however, we will explore how Kafka’s protocol differs from Pulsar’s in a future blog post.)",[48,81553,81554],{},[384,81555],{"alt":81556,"src":81557},"Table 10: Durability Configuration Settings in Kafka","\u002Fimgs\u002Fblogs\u002F63bf411467bf29eaa099fa66_image-1.png",[48,81559,81560],{},[384,81561],{"alt":81562,"src":81563},"Table 11: Durability Levels in Kafka","\u002Fimgs\u002Fblogs\u002F63bf413a67bf296d819a2434_image-1.png",[48,81565,81566],{},"Unlike Pulsar, Kafka does not write data to a separate journal disk(s). Instead, Kafka acknowledges writes before fsyncing data to disks. This operation minimizes I\u002FO contention between writes and reads, and prevents performance degradation.",[48,81568,81569],{},"Kafka does offer the ability to fsync after every message by setting flush.messages = 1 and flush.ms = 0, as described above. While this greatly reduces the likelihood of message loss, it severely impacts throughput and latency, which is why these settings are rarely used in production deployments.",[48,81571,81572,81573,190],{},"Kafka’s inability to journal data makes it vulnerable to data loss in the event of a machine failure or power outage. This is a significant weakness, and one of the main reasons why ",[55,81574,81576],{"href":81575},"\u002Fwhitepaper\u002Fcase-study-apache-pulsar-tencent-billing","Tencent chose Pulsar for their new billing system",[32,81578,81580],{"id":81579},"durability-differences-between-pulsar-and-kafka","Durability Differences Between Pulsar and Kafka",[48,81582,81583],{},"Pulsar’s durability settings are highly configurable and allow users to optimize durability settings to meet the requirements of an individual application, use case, or hardware configuration.",[48,81585,81586],{},"Because Kafka offers less flexibility, depending on the scenario, it is not always possible to establish equivalent durability settings in both systems. This makes benchmarking difficult. To address this, the OMB Framework recommends using the closest settings available.",[48,81588,81589],{},"With this background, we can now describe the gaps in Confluent’s benchmark. Confluent attempted to simulate Pulsar’s fsyncing behavior. In Kafka, the settings Confluent chose provide async durability. However, the settings they chose for Pulsar provide sync durability. This discrepancy produced flawed test results that inaccurately portrayed Pulsar’s performance as inferior. As you will see when we review the results of our own benchmark later, Pulsar performs as well as or better than Kafka, while offering stronger durability guarantees.",[40,81591,81593],{"id":81592},"streamnative-benchmark","StreamNative Benchmark",[48,81595,81596],{},"To get a more accurate picture of Pulsar’s performance, we needed to address the issues with the Confluent benchmark. We focused on tuning Pulsar’s configuration, ensuring the durability settings on both systems were equivalent, and including additional performance factors and conditions, such as varying numbers of partitions and mixed workloads, to enable us to measure performance across different use cases. The following sections explain the changes we made in detail.",[32,81598,81600],{"id":81599},"streamnative-setup","StreamNative Setup",[48,81602,81603],{},"Our benchmarking setup included all the durability levels supported by Pulsar and Kafka. 
This allowed us to compare throughput and latency at the same level of durability. The durability settings we used are described below.",[3933,81605,81607],{"id":81606},"replication-durability-setup","Replication Durability Setup",[48,81609,81610],{},"Our replication durability setup was identical to Confluent’s. Although we made no changes, we are sharing the specific settings we used in Table 12 for completeness.",[48,81612,81613],{},[384,81614],{"alt":81615,"src":81616},"Table 12: Replication Durability Setup Settings","\u002Fimgs\u002Fblogs\u002F63bf416e8a4172dfe064079f_image-1.png",[48,81618,81619,81620,81625],{},"A new Pulsar ",[55,81621,81624],{"href":81622,"rel":81623},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fbookkeeper\u002Fpull\u002F2401",[264],"feature"," gives applications the option to skip journaling, which relaxes the local durability guarantee, avoids write amplification, and improves write throughput. (This feature will be available in the next release of Apache BookKeeper). However, this feature will not be made the default, nor do we recommend it for most scenarios, as it still introduces the potential for message loss.",[48,81627,81628],{},"We used this feature in our benchmark to ensure an accurate performance comparison between the two systems. Bypassing journaling on Pulsar provides the same local durability guarantee as Kafka’s default fsync settings.",[48,81630,81631],{},"Pulsar’s new feature includes a new local durability mode (Async - Bypass journal). We used this mode to configure Pulsar to match Kafka’s default level of local durability. Table 13 shows the specific settings for our benchmark.",[48,81633,81634],{},[384,81635],{"alt":81636,"src":81637},"Figure 14: 99th percentile fsync latency on 3 different instances","\u002Fimgs\u002Fblogs\u002F63bf4219d0ea431959552fdd_figure14.png",[40,81639,81641],{"id":81640},"streamnative-benchmark-results","StreamNative Benchmark Results",[48,81643,81644,81645,190],{},"We have summarized our benchmark results below. You can find our ",[55,81646,81648],{"href":81647},"\u002Fblog\u002Ftech\u002F2020-11-09-benchmark-pulsar-kafka-performance-report","complete benchmark report",[32,81650,80513],{"id":80582},[916,81652,81653],{},[48,81654,81655,81656,190],{},"See the full report of “Maximum Throughput Test” ",[55,81657,267],{"href":81658},"\u002Fblog\u002Ftech\u002F2020-11-09-benchmark-pulsar-kafka-performance-report#maximum-throughput-test",[48,81660,81661],{},"The Maximum Throughput Test was designed to determine the maximum throughput each system can achieve when processing workloads that include publish and tailing-reads under different durability guarantees. We also varied the number of topic partitions to see how each change impacted the maximum throughput.",[48,81663,81664],{},"We found that:",[1666,81666,81667,81670,81673,81676,81679,81682],{},[324,81668,81669],{},"When configured to provide level-1 durability (sync replication durability and sync local durability), Pulsar achieved a throughput of ~300 MB\u002Fs, which reached the physical limit of the journal disk’s bandwidth. Pulsar is implemented on top of a scalable and durable log storage (Apache BookKeeper) to make maximum use of disk bandwidth without sacrificing durability guarantees. Kafka was able to achieve ~420 MB\u002Fs with 100 partitions. 
It should be noted that when providing level-1 durability, Pulsar was configured to use one disk as journal disk for writes and the other disk as ledger disk for reads, comparing to Kafka use both disks for writes and reads. While Pulsar's setup is able to provide better I\u002FO isolation, its throughput was also limited by the maximum bandwidth of a single disk (~300 MB\u002Fs). Alternative disk configurations can be beneficial to Pulsar and allow for more cost effective operation, which will be discussed in a later blog post.",[324,81671,81672],{},"When configured to provide level-2 durability (sync replication durability and async local durability), Pulsar and Kafka each achieved a max throughput of ~600 MB\u002Fs. Both systems reached the physical limit of disk bandwidth.",[324,81674,81675],{},"The maximum throughput of Kafka on one partition is only ½ of the max throughput of Pulsar.",[324,81677,81678],{},"Varying the number of partitions had no effect on Pulsar’s throughput, but it did affect Kafka’s.",[324,81680,81681],{},"Pulsar sustained maximum throughput (~300 MB\u002Fs under a level-1 durability guarantee and ~600 MB\u002Fs under a level-2 durability guarantee) as the number of partitions was increased from 100 to 2000.",[324,81683,81684],{},"Kafka’s throughput decreased by half as the number of partitions was increased from 100 to 2000.",[32,81686,80543],{"id":80767},[916,81688,81689],{},[48,81690,81691,81692,190],{},"See the full report of “Publish and End-to-End Latency Test” ",[55,81693,267],{"href":81694},"\u002Fblog\u002Ftech\u002F2020-11-09-benchmark-pulsar-kafka-performance-report#publish-and-end-to-end-latency-test",[48,81696,81697],{},"The Publish and End-to-End Latency Test was designed to determine the lowest latency each system can achieve when processing workloads that consist of publish and tailing-reads under different durability guarantees. We varied the number of subscriptions and the number of partitions to see how each change impacted both publish and end-to-end latency.",[48,81699,81700],{},"We found that",[1666,81702,81703,81706,81709,81712],{},[324,81704,81705],{},"Pulsar’s publish and end-to-end latency were significantly (up to hundreds of times) lower than Kafka’s in all test cases, which evaluated various durability guarantees and varying numbers of partitions and subscriptions. Pulsar’s 99th percentile publish latency and end-to-end latency stayed within 10 milliseconds, even as the number of partitions was increased from 100 to 10000 or as the number of subscriptions was increased from 1 to 10.",[324,81707,81708],{},"Kafka’s publish and end-to-end latency was greatly affected by variations in the numbers of subscriptions and partitions.",[324,81710,81711],{},"Both publish and end-to-end latency increased from ~5 milliseconds to ~13 seconds as the number of subscriptions was increased from 1 to 10.",[324,81713,81714],{},"Both publish and end-to-end latency increased from ~5 milliseconds to ~200 seconds as the number of topic partitions was increased from 100 to 10000.",[32,81716,80567],{"id":81221},[916,81718,81719],{},[48,81720,81721,81722,190],{},"See the full report of “Catch-up Read Test” ",[55,81723,267],{"href":81724},"\u002Fblog\u002Ftech\u002F2020-11-09-benchmark-pulsar-kafka-performance-report#catch-up-read-test",[48,81726,81727],{},"The Catch-up Read Test was designed to determine the maximum throughput each system can achieve when processing workloads that contain catch-up reads only. 
At the beginning of the test, a producer sent messages at a fixed rate of 200K per second. When the producer had sent 512GB of data, consumers began to read the messages that had been received. The consumers processed the accumulated messages and had no difficulty keeping up with the producer, which continued to send new messages at the same speed.",[48,81729,81730],{},"When processing catch-up reads, Pulsar’s maximum throughput was 3.5 times faster than Kafka’s. Pulsar achieved a maximum throughput of 3.5 GB\u002Fs (3.5 million messages\u002Fsecond) while Kafka achieved a throughput of only 1 GB\u002Fs (1 million messages\u002Fsecond).",[32,81732,80573],{"id":81308},[916,81734,81735],{},[48,81736,81737,81738,190],{},"See the full report of “Mixed Workload Test” ",[55,81739,267],{"href":81740},"\u002Fblog\u002Ftech\u002F2020-11-09-benchmark-pulsar-kafka-performance-report#mixed-workload-test",[48,81742,81743],{},"This Mixed Workload Test was designed to determine the impact of catch-up reads on publish and tailing reads in mixed workloads. At the beginning of the test, producers sent messages at a fixed rate of 200K per second and consumers consume messages in tailing mode. After the producer produces 512GB of messages, it will start a new set of catch-up consumers to read all the messages from the beginning. At the same time, producers and existing tailing-read consumers continued to publish and consume messages at the same speed.",[48,81745,81746],{},"We tested Kafka and Pulsar using different durability settings and found that catch-up reads seriously affected Kafka’s publish latency, but had little impact on Pulsar. Kafka’s 99th percentile publish latency increased from 5 milliseconds to 1-3 seconds. However, Pulsar maintained a 99th percentile publish latency ranging from several milliseconds to tens of milliseconds.",[48,81748,81749],{},"The links below provide convenient access to individual sections of our benchmark report.",[321,81751,81752,81757,81763,81769,81775,81781,81785,81791,81797,81803,81808],{},[324,81753,81754],{},[55,81755,81756],{"href":81658},"Max Throughput Test",[324,81758,81759],{},[55,81760,81762],{"href":81761},"\u002Fblog\u002Ftech\u002F2020-11-09-benchmark-pulsar-kafka-performance-report#1-100-partitions-1-subscription-2-producers-and-2-consumers","100 partitions, 1 subscription, 2 producers \u002F 2 consumers",[324,81764,81765],{},[55,81766,81768],{"href":81767},"\u002Fblog\u002Ftech\u002F2020-11-09-benchmark-pulsar-kafka-performance-report#2-2000-partitions-1-subscription-2-producers-and-2-consumers","2000 partitions, 1 subscription, 2 producers \u002F 2 consumers",[324,81770,81771],{},[55,81772,81774],{"href":81773},"\u002Fblog\u002Ftech\u002F2020-11-09-benchmark-pulsar-kafka-performance-report#3-1-partition-1-subscription-2-producers-and-2-consumers","1 partition, 1 subscription, 2 producers \u002F 2 consumers",[324,81776,81777],{},[55,81778,81780],{"href":81779},"\u002Fblog\u002Ftech\u002F2020-11-09-benchmark-pulsar-kafka-performance-report#4-1-partition-1-subscription-1-producer-and-1-consumer","1 partition, 1 subscription, 1 producer \u002F 1 consumer",[324,81782,81783],{},[55,81784,80543],{"href":81694},[324,81786,81787],{},[55,81788,81790],{"href":81789},"\u002Fblog\u002Ftech\u002F2020-11-09-benchmark-pulsar-kafka-performance-report#1-100-partitions-1-subscription","100 partitions, 1 
subscription",[324,81792,81793],{},[55,81794,81796],{"href":81795},"\u002Fblog\u002Ftech\u002F2020-11-09-benchmark-pulsar-kafka-performance-report#2-100-partitions-10-subscriptions","100 partitions, 10 subscriptions",[324,81798,81799],{},[55,81800,81802],{"href":81801},"\u002Fblog\u002Ftech\u002F2020-11-09-benchmark-pulsar-kafka-performance-report#3-100-5000-8000-10000-partitions","Different partitions: 100, 1000, 2000, 5000, 8000, 10000",[324,81804,81805],{},[55,81806,81807],{"href":81724},"Catchup Read Throughput Test",[324,81809,81810],{},[55,81811,80573],{"href":81740},[48,81813,81814,81815,190],{},"All the raw data of the benchmark results are also available at ",[55,81816,267],{"href":81817,"rel":81818},"https:\u002F\u002Fgithub.com\u002Fstreamnative\u002Fopenmessaging-benchmark\u002Ftree\u002Fblog\u002Fresults",[264],[40,81820,2125],{"id":2122},[48,81822,81823],{},"A tricky aspect of benchmarks is that they often represent only a narrow combination of business logic and configuration options, which may or may not reflect real-world use cases or best practices. Benchmarks can further be compromised by issues in their framework, set-up, and methodology. We noted all of these issues in the recent Confluent benchmark.",[48,81825,81826],{},"At the community’s request, the team at StreamNative set out to run this benchmark in order to provide knowledge, insights, and transparency into Pulsar’s true performance capabilities. In order to run a more accurate benchmark, we identified and fixed the issues with the Confluent benchmark, and also added new test parameters that would provide insights into how the technologies compared in more real-world use cases.",[48,81828,81829],{},"The results to our benchmark showed that, with the same durability guarantee as Kafka, Pulsar is able to outperform Kafka in workloads resembling real-world use cases and to achieve the same end-to-end through as Kafka in Confluent’s limited use case. Furthermore, Pulsar delivers significantly better latency than Kafka in each of the different test cases, including varying subscriptions, topics, and durability guarantees, and better I\u002FO isolation than Kafka.",[48,81831,81832],{},"As noted, no benchmark can replace testing done on your own hardware with your own workloads. We encourage you to test Pulsar and Kafka using your own setups and workloads in order to understand how each system performs in your particular production environment.",[40,81834,40413],{"id":36476},[321,81836,81837,81842,81847],{},[324,81838,81839,81840,190],{},"Get the 2022 Pulsar vs. Kafka Benchmark Report: Read the latest performance comparison on maximum throughput, publish latency, and historical read rate ",[55,81841,267],{"href":27690},[324,81843,81844,81845,47757],{},"Make an inquiry: Have questions on benchmark results? Interested in a fully-managed Pulsar offering built by the original creators of Pulsar? 
",[55,81846,38404],{"href":45219},[324,81848,45223,81849,45227],{},[55,81850,31914],{"href":31912,"rel":81851},[264],[48,81853,3931],{},{"title":18,"searchDepth":19,"depth":19,"links":81855},[81856,81857,81862,81865,81871,81872],{"id":45530,"depth":19,"text":45531},{"id":81518,"depth":19,"text":81519,"children":81858},[81859,81860,81861],{"id":81522,"depth":279,"text":81523},{"id":81544,"depth":279,"text":81545},{"id":81579,"depth":279,"text":81580},{"id":81592,"depth":19,"text":81593,"children":81863},[81864],{"id":81599,"depth":279,"text":81600},{"id":81640,"depth":19,"text":81641,"children":81866},[81867,81868,81869,81870],{"id":80582,"depth":279,"text":80513},{"id":80767,"depth":279,"text":80543},{"id":81221,"depth":279,"text":80567},{"id":81308,"depth":279,"text":80573},{"id":2122,"depth":19,"text":2125},{"id":36476,"depth":19,"text":40413},"Learn about how Pulsar compares to Kafka in terms of durability, maximum throughput, end-to-end latency, and catch-up reads.",{},"24 min read",{"title":81434,"description":81873},"blog\u002Fperspective-on-pulsars-performance-compared-to-kafka",[799,7347],"RRqm3vqAZr_Vdu5bitz9KFDwBD3L-rbwx7pUCrT0Vis",{"id":81881,"title":81882,"authors":81883,"body":81884,"category":7338,"createdAt":290,"date":81999,"description":82000,"extension":8,"featured":294,"image":82001,"isDraft":294,"link":290,"meta":82002,"navigation":7,"order":296,"path":82003,"readingTime":11508,"relatedResources":290,"seo":82004,"stem":82005,"tags":82006,"__hash__":82007},"blogs\u002Fblog\u002Fpulsar-summit-asia-2020-schedule-is-now-online.md","Pulsar Summit Asia 2020 Schedule is Now Online",[69353,69515],{"type":15,"value":81885,"toc":81995},[81886,81894,81901,81904,81962,81965,81972,81975,81977,81983,81986,81988,81993],[48,81887,81888,81889,81893],{},"The Pulsar Summit is a global conference dedicated to sharing best practices, project updates, and insights across the Apache Pulsar community. Pulsar’s inaugural global summit, the ",[55,81890,81892],{"href":76832,"rel":81891},[264],"Pulsar Summit Virtual Conference 2020",", took place in June 2020 and featured more than 30 sessions from top Pulsar experts, developers and thought-leaders from companies such as Salesforce, Verizon Media, and Splunk, and the conference attracted 600+ attendees.",[48,81895,81896,81897,81900],{},"The rapid adoption of Apache Pulsar over the past few years has led to a high demand for Pulsar events. Today, StreamNative, a cloud-native event streaming company powered by Apache Pulsar, and also the host of ",[55,81898,76841],{"href":76839,"rel":81899},[264],", announced more details on the upcoming event. Taking place on November 28th & 29th, the two-day event will feature more than 30 live sessions by tech leads, open-source developers, software engineers, and software architects from Splunk, Yahoo! 
JAPAN, TIBCO, China Mobile, Tencent, Dada Group, KingSoft Cloud, Tuya Smart, and PingCAP, and will include sessions on Pulsar use cases, its ecosystem, operations, and technology deep dives.",[48,81902,81903],{},"See below for some of our featured sessions, which include both English and Mandarin tracks:",[321,81905,81906,81914,81922,81930,81938,81946,81954],{},[324,81907,81908,81913],{},[55,81909,81912],{"href":81910,"rel":81911},"https:\u002F\u002Fpulsar-summit.org\u002Fen\u002Fevent\u002Fasia-2020\u002Fsessions\u002Fhow-splunk-is-using-pulsar-io",[264],"How Splunk is using Pulsar IO （English）"," - In this talk, Jerry Peng, Principal Software Engineer at Splunk will share insights on Splunk’s evaluation and decision to adopt the Pulsar IO framework, details on how Splunk's DSP product leverages the Pulsar IO framework, and insights on batch sources, a feature that was recently added to Pulsar IO.",[324,81915,81916,81921],{},[55,81917,81920],{"href":81918,"rel":81919},"https:\u002F\u002Fpulsar-summit.org\u002Fen\u002Fevent\u002Fasia-2020\u002Fsessions\u002Fapache%E2%80%93pulsar%E2%80%93at%E2%80%93yahoo%E2%80%93japan%E2%80%93adoption%E2%80%93operational%E2%80%93experiences%E2%80%93and%E2%80%93future",[264],"Apache Pulsar at Yahoo! JAPAN - Adoption, Operational Insights and the Future（English）"," - In this talk, Nozomi Kurihara, Manager of the Messaging Platform team in Yahoo!Japan Corporation will share practical use cases of Apache Pulsar on production and insights on how to operate Apache Pulsar for large scale data streams.",[324,81923,81924,81929],{},[55,81925,81928],{"href":81926,"rel":81927},"https:\u002F\u002Fpulsar-summit.org\u002Fen\u002Fevent\u002Fasia-2020\u002Fsessions\u002Frunning-apache-pulsar-on-tencent-cloud-new-challenges-discussion-practice",[264],"Running Apache Pulsar on Tencent Cloud: New Challenges, Discussion, Practice (Mandarin)"," - In this talk, Lin Lin, senior engineer of Tencent Cloud will address how Pulsar helps solve challenges with message queues on Tencent Cloud, such as dynamic expansion and contraction, and large numbers of partitions.",[324,81931,81932,81937],{},[55,81933,81936],{"href":81934,"rel":81935},"https:\u002F\u002Fpulsar-summit.org\u002Fen\u002Fevent\u002Fasia-2020\u002Fsessions\u002Fhow-bigo-builds-real-time-message-system-with-apache-pulsar-and-flink",[264],"How BIGO built a Real-Time Message System with Apache Pulsar and Flink (Mandarin) ","- In this talk, Hang Chen, Leader of the Messaging Platform team from BIGO will share how BIGO leveraged Apache Pulsar to build a real-time message system and how they tune Pulsar for production.",[324,81939,81940,81945],{},[55,81941,81944],{"href":81942,"rel":81943},"https:\u002F\u002Fpulsar-summit.org\u002Fen\u002Fevent\u002Fasia-2020\u002Fsessions\u002Fa-daredevil-story-apache-pulsar-in-zhaopin-com",[264],"A Daredevil' Story: Apache Pulsar in Zhaopin.com (Mandarin)"," - In this talk, Shunli Gao, Senior Engineer at Zhaopin will share details on the development and future prospects of Apache Pulsar at Zhaopin.",[324,81947,81948,81953],{},[55,81949,81952],{"href":81950,"rel":81951},"https:\u002F\u002Fpulsar-summit.org\u002Fen\u002Fevent\u002Fasia-2020\u002Fsessions\u002Ftransactional-event-streaming-with-apache-pulsar",[264],"Transactional Event Streaming with Apache Pulsar (Mandarin)"," - In this talk, Bo Cong, software engineer at StreamNative will share how Pulsar transaction works and how it is supported by Pulsar 
Functions.",[324,81955,81956,81961],{},[55,81957,81960],{"href":81958,"rel":81959},"https:\u002F\u002Fpulsar-summit.org\u002Fen\u002Fevent\u002Fasia-2020\u002Fsessions\u002Fbenchmarking-pulsar-vs-kafka-on-aws-process-results",[264],"Benchmarking Pulsar vs. Kafka on AWS: Process & Results (Mandarin) ","- In this talk, Penghui Li, the Apache Pulsar PMC member and software engineer at StreamNative will share the results of a benchmark test comparing Pulsar and Kafka that was run on AWS. The test ran Pulsar and Kafka under the same hardware environments on the write throughput, tailing read throughput, catchup read throughput, publish latency, and end-to-end latency of these two systems.",[48,81963,81964],{},"More featured talks coming soon!",[48,81966,81967,81968,35539],{},"The number and diversity of the sessions demonstrate the accelerated adoption of Pulsar in PoC and production environments, as well as the rapid development in functionalities and diverse ecosystems. To learn more about how companies leverage Pulsar for messaging and event streaming, serverless computing, real-time analytics, event-driven applications, and mission-critical deployment management in production, ",[55,81969,81971],{"href":79682,"rel":81970},[264],"RSVP",[48,81973,81974],{},"We would like to say special thanks to the speakers for sharing their Pulsar expertise and experience with the community.",[40,81976,39828],{"id":39827},[48,81978,81979,81982],{},[55,81980,821],{"href":23526,"rel":81981},[264]," is a cloud-native, distributed messaging and streaming platform that manages hundreds of billions of events per day. Pulsar was originally developed at Yahoo! as the unified messaging platform connecting critical Yahoo applications such as Yahoo Finance, Yahoo Mail, and Flickr to data.",[48,81984,81985],{},"Today, Pulsar is used for real-time event streaming use cases, including data pipelines, microservices, and stream processing. Its cloud-native architecture and built-in multi-tenancy differentiate it from its predecessors and uniquely position it as an enterprise-ready, event streaming platform. Pulsar's multi-layer architecture enables stability, reliability, scalability, and high performance, simplifies management and reduces costs. Its built-in multi-tenancy and geo-replication ensure that companies are able to build applications with disaster recovery.",[40,81987,10248],{"id":10247},[48,81989,81990,81992],{},[55,81991,4496],{"href":62260},", founded by the original developers of Apache Pulsar and Apache BookKeeper, enables organizations to build the next generation of messaging and event streaming applications. Leveraging Apache Pulsar and BookKeeper, we optimize for scalability and resiliency while reducing the overhead management and complexity required by incumbent technologies. We do this by offering Pulsar and StreamNative’s ‘products as a service’. 
StreamNative is building a world-class team that is passionate about building amazing products and committed to customer success.",[48,81994,3931],{},{"title":18,"searchDepth":19,"depth":19,"links":81996},[81997,81998],{"id":39827,"depth":19,"text":39828},{"id":10247,"depth":19,"text":10248},"2020-11-07","Pulsar Summit Asia 2020 Schedule is Now Online, featuring more than 30 sessions on Pulsar use cases, its ecosystem, operations, and technology deep dives.","\u002Fimgs\u002Fblogs\u002F63d796afedaf2b5e74d59892_63a38e12e1d5c07e52fde0b1_pulsar-summit-asia-2020-schedule-top.webp",{},"\u002Fblog\u002Fpulsar-summit-asia-2020-schedule-is-now-online",{"title":81882,"description":82000},"blog\u002Fpulsar-summit-asia-2020-schedule-is-now-online",[5376,821],"aX5_lXR15lm6J9XG913M87PJ6gS17KWOYxNMhOKMqDs",{"id":82009,"title":82010,"authors":82011,"body":82012,"category":821,"createdAt":290,"date":82207,"description":82026,"extension":8,"featured":294,"image":82208,"isDraft":294,"link":290,"meta":82209,"navigation":7,"order":296,"path":82210,"readingTime":42793,"relatedResources":290,"seo":82211,"stem":82212,"tags":82213,"__hash__":82214},"blogs\u002Fblog\u002Fintroducing-cloud-storage-sink-connector.md","Introducing Cloud Storage Sink Connector - Streaming Data From Apache Pulsar to Cloud Objects",[78658],{"type":15,"value":82013,"toc":82195},[82014,82017,82020,82024,82027,82030,82044,82048,82051,82054,82057,82061,82064,82066,82087,82091,82103,82109,82114,82120,82124,82129,82135,82138,82143,82149,82153,82156,82162,82166,82169,82175,82177,82180,82187],[48,82015,82016],{},"Exporting data to objects in cloud storage is ubiquitous and key to almost every software architecture. Cloud storage can help to save costs by reducing on-premise hardware and software management, simplifying monitoring, and reducing the need for extensive capacity planning. Cloud storage can also protect data against ransomware by offering backup security advantages.",[48,82018,82019],{},"Pulsar users commonly store data on cloud platforms such as Amazon Simple Storage Service (Amazon S3) or Google Cloud Storage (Google GCS). Without a unified application to migrate topic-level data to cloud storage, users must write custom solutions, which can be a cumbersome task. Today, we are excited to announce the launch of the Cloud Storage sink connector, which provides users a simple and reliable way to stream their data from Apache Pulsar to objects in cloud storage.",[40,82021,82023],{"id":82022},"what-is-cloud-storage-sink-connector","What is Cloud Storage Sink Connector",[48,82025,82026],{},"The Cloud Storage sink connector periodically polls data from Pulsar and in turn moves it to objects in cloud storage (AWS S3, Google GCS, etc.) in either Avro, JSON, or Parquet formats without duplicates. Depending on your environment, the Cloud Storage sink connector can export data by guaranteeing exactly-once delivery semantics to its consumers.",[48,82028,82029],{},"The Cloud Storage sink connector provides partitioners that support default partitioning based on Pulsar partitions and time-based partitioning in days or hours. A partitioner is used to split the data of every Pulsar partition into chunks. Each chunk of data acts as an object whose virtual path encodes the Pulsar partition and the start offset of this data chunk. The size of each data chunk is determined by the number of records written to objects in cloud storage and by schema compatibility. 
If no partitioner is specified in the configuration, the default partitioner, which preserves Pulsar partitioning, is used. The Cloud Storage sink connector provides the following features:",[321,82031,82032,82035,82038,82041],{},[324,82033,82034],{},"Ensure exactly-once delivery. Records, which are exported using a deterministic partitioner, are delivered with exactly-once semantics regardless of the eventual consistency of cloud storage.",[324,82036,82037],{},"Support data formats with or without a Schema. The Cloud Storage sink connector supports writing data to objects in cloud storage in either Avro, JSON, or Parquet format. Generally, the Cloud Storage sink connector may accept any data format that provides an implementation of the Format interface.",[324,82039,82040],{},"Support time-based partitioner. The Cloud Storage sink connector supports the TimeBasedPartitioner class based on the publishTime timestamp of Pulsar messages. Time-based partitioning options are daily or hourly.",[324,82042,82043],{},"Support more kinds of object storage. The Cloud Storage sink connector uses jclouds as an implementation of cloud storage. You can use the JAR package of the jclouds object storage to connect to more types of object storage. If you need to customize credentials, you can register ʻorg.apache.pulsar.io.jcloud.credential.JcloudsCredential` via the Service Provider Interface (SPI).",[40,82045,82047],{"id":82046},"why-cloud-storage-sink-connector","Why Cloud Storage Sink Connector",[48,82049,82050],{},"Pulsar has a rich connector ecosystem, connecting Pulsar with other data systems. In August 2018, Pulsar IO was released to enable users to ingress or egress data from and to Pulsar and the external systems (such as MySQL, Kafka) by using the existing Pulsar Functions framework. Yet, there was still a strong demand from those looking to export data from Apache Pulsar to cloud storage. These users were forced to build custom solutions and manually run them.",[48,82052,82053],{},"To address these challenges and simplify the process, the Cloud Storage sink connector was developed. With the Cloud Storage sink connector, all the benefits of Pulsar IO, such as fault tolerance, parallelism, elasticity, load balancing, on-demand updates, and much more, can be used by applications that export data from Pulsar.",[48,82055,82056],{},"A key benefit of the Cloud Storage sink connector is ease of use. It enables users to run an object storage connector which supports multiple object storage service providers, flexible data format, and custom data partitioning.",[40,82058,82060],{"id":82059},"try-it-out","Try it Out",[48,82062,82063],{},"In this section we’ll walk you through an exercise to set up the Cloud Storage sink connector and use the connector to export data to the cloud objects. This demonstration leverages AWS S3 as an example. In this demo, we run the cloud storage sink connector by using time-based partitioning and therefore group Pulsar records in Parquet format to AWS S3.",[32,82065,10104],{"id":10103},[321,82067,82068,82071,82079],{},[324,82069,82070],{},"Create an AWS account and sign in to the AWS Management Console.",[324,82072,82073,82074,190],{},"Create an AWS S3 bucket. For details, see ",[55,82075,82078],{"href":82076,"rel":82077},"https:\u002F\u002Fdocs.aws.amazon.com\u002FAmazonS3\u002Flatest\u002Fgsg\u002FCreatingABucket.html",[264],"Creating a bucket",[324,82080,82081,82082,190],{},"Obtain the credentials for the AWS S3 bucket. 
For details, see ",[55,82083,82086],{"href":82084,"rel":82085},"https:\u002F\u002Fdocs.aws.amazon.com\u002FIAM\u002Flatest\u002FUserGuide\u002Fgetting-started_create-admin-group.html",[264],"Creating an administrator IAM user and group (console)",[32,82088,82090],{"id":82089},"step-1-install-cloud-storage-sink-connector-and-run-pulsar-broker","Step 1: Install Cloud Storage Sink Connector and Run Pulsar Broker",[1666,82092,82093,82100],{},[324,82094,82095,82099],{},[55,82096,54401],{"href":82097,"rel":82098},"https:\u002F\u002Fgithub.com\u002Fstreamnative\u002Fpulsar-io-cloud-storage\u002Freleases",[264]," the NAR file of the Cloud Storage sink connector.",[324,82101,82102],{},"Add this to the connector path in your Pulsar broker configuration file.",[8325,82104,82107],{"className":82105,"code":82106,"language":8330},[8328],"\ncp pulsar-io-cloud-storage-2.5.1.nar apache-pulsar-2.6.1\u002Fconnectors\u002Fpulsar-io-cloud-storage-2.5.1.nar\n\n",[4926,82108,82106],{"__ignoreMap":18},[1666,82110,82111],{},[324,82112,82113],{},"Start Pulsar broker with the configuration file.",[8325,82115,82118],{"className":82116,"code":82117,"language":8330},[8328],"\ncd apache-pulsar-2.6.1\nbin\u002Fpulsar standalone\n\n",[4926,82119,82117],{"__ignoreMap":18},[32,82121,82123],{"id":82122},"step-2-configure-and-start-cloud-storage-sink-connector","Step 2: Configure and Start Cloud Storage Sink Connector",[1666,82125,82126],{},[324,82127,82128],{},"Define the Cloud Storage connector by creating a manifest file and save the manifest file cloud-storage-sink-config.yaml.",[8325,82130,82133],{"className":82131,"code":82132,"language":8330},[8328],"\ntenant: \"public\"\nnamespace: \"default\"\nname: \"cloud-storage-sink\"\ninputs: \n- \"user-avro-topic\"\narchive: \"connectors\u002Fpulsar-io-cloud-storage-2.5.1.nar\"\nparallelism: 1\n\nconfigs:\nprovider: \"aws-s3\",\naccessKeyId: \"accessKeyId\"\nsecretAccessKey: \"secretAccessKey\"\nrole: \"\"\nroleSessionName: \"\"\nbucket: \"s3-sink-test\"\nregion: \"\"\nendpoint: \"us-standard\"\nformatType: \"parquet\"\npartitionerType: \"time\"\ntimePartitionPattern: \"yyyy-MM-dd\"\ntimePartitionDuration: \"1d\"\nbatchSize: 10\nbatchTimeMs: 1000\n\n",[4926,82134,82132],{"__ignoreMap":18},[48,82136,82137],{},"Replace the accessKeyId and secretAccessKey with your AWS credentials. If you need to further control permissions, you can set the roleand roleSessionName fields.",[1666,82139,82140],{},[324,82141,82142],{},"Start Pulsar sink locally.",[8325,82144,82147],{"className":82145,"code":82146,"language":8330},[8328],"\n$PULSAR_HOME\u002Fbin\u002Fpulsar-admin sink localrun --sink-config-file cloud-storage-sink-config.yaml\n\n",[4926,82148,82146],{"__ignoreMap":18},[32,82150,82152],{"id":82151},"step-3-send-pulsar-messages","Step 3: Send Pulsar Messages",[48,82154,82155],{},"Run the following command to send Pulsar messages with the Avro schema. 
Currently, only Avro schema and JSON schema are available for Pulsar messages.",[8325,82157,82160],{"className":82158,"code":82159,"language":8330},[8328],"\ntry (\n           PulsarClient pulsarClient = PulsarClient.builder()\n                   .serviceUrl(\"pulsar:\u002F\u002Flocalhost:6650\")\n                   .build();\n           Producer producer = pulsarClient.newProducer(Schema.AVRO(TestRecord.class))\n                   .topic(\"public\u002Fdefault\u002Ftest-parquet-avro\")\n                   .create();\n           ) {\n           List testRecords = Arrays.asList(\n                   new TestRecord(\"key1\", 1, null),\n                   new TestRecord(\"key2\", 1, new TestRecord.TestSubRecord(\"aaa\"))\n           );\n           for (TestRecord record : testRecords) {\n               producer.send(record);\n           }\n       }\n \n",[4926,82161,82159],{"__ignoreMap":18},[32,82163,82165],{"id":82164},"step-4-validate-cloud-storage-data","Step 4: Validate Cloud Storage Data",[48,82167,82168],{},"The view on the AWS S3 Management Console confirms the real-time upload from Pulsar to objects in AWS S3.",[48,82170,82171],{},[384,82172],{"alt":82173,"src":82174},"AWS S3 Management Console illustration","\u002Fimgs\u002Fblogs\u002F63a3821c1ef7364612dba625_check-data-through-aws-console.png",[40,82176,2125],{"id":2122},[48,82178,82179],{},"We hope to have piqued your interest in the Cloud Storage sink connector and convinced you that this is a super easy way to egress data from Pulsar to objects in cloud storage.",[48,82181,82182,82183,82186],{},"For any problems in the use of the Cloud Storage sink connector, you can create an issue in the connector’s ",[55,82184,39680],{"href":57353,"rel":82185},[264]," repo. We will reply to you as soon as possible. Meanwhile, we look forward to your contribution to the Cloud Storage sink connector.",[48,82188,82189,82190,1154,82193,190],{},"Have something to say about this article? 
Share it with us on ",[55,82191,39691],{"href":33664,"rel":82192},[264],[55,82194,24379],{"href":45219},{"title":18,"searchDepth":19,"depth":19,"links":82196},[82197,82198,82199,82206],{"id":82022,"depth":19,"text":82023},{"id":82046,"depth":19,"text":82047},{"id":82059,"depth":19,"text":82060,"children":82200},[82201,82202,82203,82204,82205],{"id":10103,"depth":279,"text":10104},{"id":82089,"depth":279,"text":82090},{"id":82122,"depth":279,"text":82123},{"id":82151,"depth":279,"text":82152},{"id":82164,"depth":279,"text":82165},{"id":2122,"depth":19,"text":2125},"2020-10-20","\u002Fimgs\u002Fblogs\u002F63d797063a45cd7ee3ccea76_63a38126e7434918626d0036_s3-connector-top.webp",{},"\u002Fblog\u002Fintroducing-cloud-storage-sink-connector",{"title":82010,"description":82026},"blog\u002Fintroducing-cloud-storage-sink-connector",[302,28572],"xlFB7PAxGzg4kcKf0EvwbsOQc14N-5qlGuTxFPuFDDU",{"id":82216,"title":82217,"authors":82218,"body":82220,"category":821,"createdAt":290,"date":82361,"description":82362,"extension":8,"featured":294,"image":82363,"isDraft":294,"link":290,"meta":82364,"navigation":7,"order":296,"path":82365,"readingTime":4475,"relatedResources":290,"seo":82366,"stem":82367,"tags":82368,"__hash__":82369},"blogs\u002Fblog\u002Fpulsar-functions-deep-dive.md","Pulsar Functions Deep Dive",[82219],"Sanjeev Kulkarni",{"type":15,"value":82221,"toc":82353},[82222,82233,82237,82240,82243,82246,82252,82255,82258,82262,82265,82271,82274,82277,82283,82286,82290,82293,82296,82302,82306,82309,82315,82319,82322,82325,82328,82334,82338,82341],[48,82223,82224,82225,82228,82229,82232],{},"The open source data technology framework ",[55,82226,821],{"href":75279,"rel":82227},[264]," provides a built-in stream processor for lightweight computations, called Pulsar Functions. I recently gave a talk at ",[55,82230,81892],{"href":76832,"rel":82231},[264]," on Pulsar Functions. In this post, I will provide a deep dive into its architecture and implementation details.",[40,82234,82236],{"id":82235},"a-brief-introduction-on-pulsar-functions","A Brief Introduction on Pulsar Functions",[48,82238,82239],{},"Pulsar Functions are the core computing infrastructure of the Pulsar messaging system. They enable the creation of complex processing logic on a per message basis and bring simplicity and serverless concepts to event streaming, thereby eliminating the need to deploy a separate system such as Apache Storm or Apache Heron.",[48,82241,82242],{},"These lightweight compute functions consume messages from one or more Pulsar topics, apply user-supplied processing logic to each message, and publish computation results to other topics. The benefits of Pulsar Functions include increased developer productivity, easier troubleshooting and operational simplicity because there is no need for an external processing system.",[48,82244,82245],{},"Moreover, developers do not need to learn new APIs with Pulsar Functions. Any developer that knows the Java programming language, for instance, can use the Java SDK to write a function. For example:",[8325,82247,82250],{"className":82248,"code":82249,"language":8330},[8328],"\nimport java.util.function.Function; \npublic class ExclamationFunction implements Function\n  @Override \n  public String apply(String input) { \n      return input + \"!\"; \n  }\n\n} \n",[4926,82251,82249],{"__ignoreMap":18},[48,82253,82254],{},"The goal of Pulsar Functions is not to replace heavyweight streaming engines, such as Spark or Flink. 
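(As printed above, the example has lost its generic type parameters and the opening brace of the class body. A compilable version of the same exclamation function, still using java.util.function.Function as in the original, looks roughly like this:)

```java
import java.util.function.Function;

// Minimal sketch of the exclamation function shown above, with the
// type parameters and class-body brace restored so it compiles.
public class ExclamationFunction implements Function<String, String> {

    @Override
    public String apply(String input) {
        return input + "!";
    }
}
```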
It's to use a simple API and execution framework for the most common streaming use cases, such as filtering, routing and enrichment.",[48,82256,82257],{},"Anyone can write a Pulsar Functions, submit it to the Pulsar cluster and use it right away with the built in full lifecycle management capabilities of Pulsar Functions. With its CRUD-based REST API, a developer can also submit functions from any workflow and have them up and running immediately.",[40,82259,82261],{"id":82260},"submission-workflows","Submission Workflows",[48,82263,82264],{},"The workflow process of submitting a function is called a Function Representation. It's structure, called a FunctionConfig, has a tenant, a namespace and a name. The functions consume inputs and outputs, user configurations, secret management support by submitting a JAR file or a Python, for example. You can run one function, or even ten, at once.",[8325,82266,82269],{"className":82267,"code":82268,"language":8330},[8328],"\npublic class FunctionConfig { \n       private String tenant; \n       private String namespace; \n       private String name; \n       private String className; \n       private Collection inputs; \n       private String output; \n       private ProcessingGuarantees processingGuarantees; \n       private Map userConfig; \n       private Map secrets; \n       private Integer parallelism; \n       private Resources resources; \n       ... \n} \n \n",[4926,82270,82268],{"__ignoreMap":18},[48,82272,82273],{},"After submitting the function, a Submission Check or validations are run to verify that the user has privileges to submit the function to the specific namespace and tenant. For Java, the classes are loaded at the submission time to make sure that the specified classes are actually in the JAR file. All of this is done so that the user gets an error message as quickly as possible, avoiding having to look at the error logs themselves.",[48,82275,82276],{},"The next step is copying the code to the BookKeeper. All of the parameters of the code are represented as FunctionMetaData in a protocol buffer structure as below:",[8325,82278,82281],{"className":82279,"code":82280,"language":8330},[8328],"\nmessage FunctionMetaData {\n    FunctionDetails functionDetails ;\n    PackageLocationMetaData packageLocation;\n    uint64 version ;\n    uint64 createTime;\n    map instanceStates ;\n    FunctionAuthenticationSpec functionAuthSpec ;\n}\n \n",[4926,82282,82280],{"__ignoreMap":18},[48,82284,82285],{},"This FunctionMetaData structure is all managed by the Function MetaData Manager. From a Worker perspective, the Function MetaData Manager maintains the system of record. It maps from the Fully Qualified Function Name (FQFN) to the Function MetaData that is all backed by the information in the Pulsar Topic with the namespace and function information. It also updates and manages the machine states, as well as any conflicts when multiple Workers are submitted, based on what is submitted, and writes the metadata to the Topic.",[40,82287,82289],{"id":82288},"scheduling-workflows","Scheduling Workflows",[48,82291,82292],{},"Once the system has accepted a function, it gets scheduled using the Pluggable Scheduler. It's invoked once a new function is submitted and executed only by a Leader. A Leader is elected by having a Failover Subscription on a Coordination Topic. The consumer of the Topic is then elected as the Leader.",[48,82294,82295],{},"The Leader writes assignments to the Topics, known as Assignment Topics. 
They exist within a particular namespace within Pulsar and are assigned to individual Workers. All Workers know about all Assignments which are compacted and include all system logic such as the FQFN and Instance ID within the Assignment Tables.",[48,82297,82298],{},[384,82299],{"alt":82300,"src":82301},"illustration of Scheduling Workflows","\u002Fimgs\u002Fblogs\u002F63a3804565aad42fdb49e1d8_pulsar-functions-deep-dive-1.png",[40,82303,82305],{"id":82304},"execution-workflows","Execution Workflows",[48,82307,82308],{},"The execution workflow is triggered by changes to the Assignment Table. The components within the worker, called the Function RunTime Manager, manage the function lifecycle assignment such as starting or stopping a message using a Spawner.",[48,82310,82311],{},[384,82312],{"alt":82313,"src":82314},"illustration of Execution Workflows","\u002Fimgs\u002Fblogs\u002F63a38045f609c751faa8c6cc_pulsar-functions-deep-dive-2.png",[40,82316,82318],{"id":82317},"java-instances-and-pulsar-io","Java Instances and Pulsar IO",[48,82320,82321],{},"The Pulsar Java Instance itself is encapsulated as a Source, a function, which is the actual logic, and a Sink ensemble. Source is a construct that abstracts reading from input Topics and Sink abstracts writing from Topics.",[48,82323,82324],{},"With a regular function, the \"Source\" is a Pulsar Source that is reading from Pulsar, and a \"Sink\" is a Pulsar Sink because it writes to a Pulsar Topic.",[48,82326,82327],{},"However, if a non-Pulsar Source is submitted, such as a Google Pub Sub, that becomes a connector using Pulsar IO which acts like a Pulsar Function. The function is an Identity Function and lets data pass through the system. The Pulsar Sink then writes it to a Topic. A non-Pulsar Sink writes to an external system. The ability to consume external data is the reason Pulsar IO is written on top of Pulsar Functions.",[48,82329,82330],{},[384,82331],{"alt":82332,"src":82333},"Java Instances and Pulsar IO illustration","\u002Fimgs\u002Fblogs\u002F63a380457b38f766b6264366_pulsar-functions-deep-dive-3.png",[40,82335,82337],{"id":82336},"getting-started-on-pulsar-functions","Getting Started on Pulsar Functions",[48,82339,82340],{},"Pulsar Functions increase developer productivity, provide easier troubleshooting and operational simplicity because there is no need for an external processing system. They use a simple, lightweight SDK-less API and execution framework for the ninety percent of streaming use cases which are filtering, routing and enrichment. Anyone can write a Pulsar function, submit it to a Pulsar cluster and use it right away with the built in full lifecycle management capabilities of Pulsar Functions. 
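(If you would rather submit a function programmatically than through the pulsar-admin CLI, the Java admin client exposes the same CRUD operations. The sketch below is illustrative only; the admin URL, tenant, namespace, class name, topics, and JAR path are placeholders, and the FunctionConfig fields mirror the structure shown earlier.)

```java
import java.util.Collections;
import org.apache.pulsar.client.admin.PulsarAdmin;
import org.apache.pulsar.common.functions.FunctionConfig;

public class SubmitFunctionExample {
    public static void main(String[] args) throws Exception {
        // Placeholder admin endpoint; use your cluster's HTTP service URL.
        try (PulsarAdmin admin = PulsarAdmin.builder()
                .serviceHttpUrl("http://localhost:8080")
                .build()) {
            FunctionConfig config = new FunctionConfig();
            config.setTenant("public");
            config.setNamespace("default");
            config.setName("exclamation");
            config.setClassName("ExclamationFunction");   // the class shown earlier
            config.setInputs(Collections.singletonList("persistent://public/default/in"));
            config.setOutput("persistent://public/default/out");
            config.setParallelism(1);

            // Uploads the JAR and registers the function with the cluster,
            // the same operation the CRUD-based REST API performs.
            admin.functions().createFunction(config, "/path/to/exclamation-function.jar");
        }
    }
}
```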
Moreover, with Pulsar IO, non-Pulsar sources can be processed and written to external systems.",[48,82342,82343,82344,75419,82348,82352],{},"To find out more, you can view my presentation ",[55,82345,267],{"href":82346,"rel":82347},"https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=EHoh_0Inxk8&list=PLqRma1oIkcWjVlPfaWlf3VO9W-XWsF_4-&index=12&t=27s",[264],[55,82349,82351],{"href":57760,"rel":82350},[264],"Apache Pulsar Slack channel"," to engage directly with the community.",{"title":18,"searchDepth":19,"depth":19,"links":82354},[82355,82356,82357,82358,82359,82360],{"id":82235,"depth":19,"text":82236},{"id":82260,"depth":19,"text":82261},{"id":82288,"depth":19,"text":82289},{"id":82304,"depth":19,"text":82305},{"id":82317,"depth":19,"text":82318},{"id":82336,"depth":19,"text":82337},"2020-10-06","The open source data technology framework Apache Pulsar provides a built-in stream processor for lightweight computations, called Pulsar Functions. Sanjeev Kulkarni recently gave a talk at Pulsar Summit Virtual Conference 2020 on Pulsar Functions. In this post, he will provide a deep dive into its architecture and implementation details.","\u002Fimgs\u002Fblogs\u002F63d79736cac36f4537985cca_63a37fdead154ce0fe7a72cc_pulsar-functions-deep-dive-head.webp",{},"\u002Fblog\u002Fpulsar-functions-deep-dive",{"title":82217,"description":82362},"blog\u002Fpulsar-functions-deep-dive",[9636,821,5376],"c012Ja0pfXBFkLvwKPMmNAsoufj9-4VW4CReYFmV9Dc",{"id":82371,"title":82372,"authors":82373,"body":82374,"category":821,"createdAt":290,"date":82567,"description":82568,"extension":8,"featured":294,"image":290,"isDraft":294,"link":290,"meta":82569,"navigation":7,"order":296,"path":82570,"readingTime":3556,"relatedResources":290,"seo":82571,"stem":82572,"tags":82573,"__hash__":82574},"blogs\u002Fblog\u002Fannouncing-mqtt-on-pulsar.md","Announcing MQTT-on-Pulsar: Bringing Native MQTT Protocol Support to Apache Pulsar",[808,53434],{"type":15,"value":82375,"toc":82556},[82376,82382,82394,82397,82400,82403,82408,82412,82415,82419,82433,82436,82439,82453,82456,82460,82463,82469,82473,82477,82480,82491,82494,82498,82501,82504,82507,82521,82526,82533,82535,82547],[48,82377,82378],{},[384,82379],{"alt":82380,"src":82381},"MQTT and MoP Proxy with pulsar illustration","\u002Fimgs\u002Fblogs\u002F63a37eccc8ac503908ab1c10_mop-proxy.png",[48,82383,82384,82385,82388,82389,82393],{},"We are excited to announce that StreamNative is open-sourcing \"MQTT on Pulsar\" (MoP). MoP brings the native MQTT protocol support to Apache Pulsar by introducing an MQTT protocol handler on Pulsar brokers. Similar to ",[55,82386,35093],{"href":29592,"rel":82387},[264],", MoP is also an implementation of the ",[55,82390,82392],{"href":67379,"rel":82391},[264],"pluggable protocol handler",". By adding the MoP protocol handler in your existing Pulsar cluster, you can migrate your existing MQTT applications and services to Pulsar without modifying the code. This enables MQTT applications to leverage Pulsar’s multi-layer system architecture (separation of compute and storage) and powerful features, such as infinite event stream retention with Apache BookKeeper and tiered storage.",[40,82395,82396],{"id":62870},"What is Apache Pulsar",[48,82398,82399],{},"Apache Pulsar is a cloud-native, distributed messaging and streaming platform that manages hundreds of billions of events per day. 
Pulsar was originally developed and deployed inside Yahoo as the consolidated messaging platform connecting critical Yahoo applications such as Yahoo Finance, Yahoo Mail, and Flickr, to data. Pulsar was contributed to open source by Yahoo in 2016 and became a top-level Apache Software Foundation project in 2018.",[48,82401,82402],{},"Apache Pulsar is a multi-tenant, high-performance solution for server-to-server messaging, including features such as native support for multiple clusters in a Pulsar instance, seamless geo-replication of messages across clusters, low publish and end-to-end latency, seamless scalability to over a million topics, and guaranteed message delivery with persistent message storage provided by Apache BookKeeper, among others.",[48,82404,82405,82406,190],{},"Currently, Apache Pulsar is used in a wide variety of industries. Enterprises, such as Tencent, Verizon Media, Splunk, ChinaMobile, and BIGO, have deployed Apache Pulsar to achieve their business goals. For more use cases, click ",[55,82407,267],{"href":10293},[40,82409,82411],{"id":82410},"what-is-mqtt","What is MQTT",[48,82413,82414],{},"Message Queuing Telemetry Transport (MQTT) is a lightweight publish-subscribe messaging transport protocol. MQTT is built on the TCP\u002FIP protocol and was created by IBM in 1999. It is ideal for connecting remote devices with a small code footprint and minimal network bandwidth to provide real-time and reliable messaging services. Today, as a low-overhead and low-bandwidth real-time communication protocol, MQTT today is adopted widely across multiple industries, such as the Internet of Things (IoT), small microcontrollers, automotive, etc.",[40,82416,82418],{"id":82417},"why-mop","Why MoP",[48,82420,82421,82422,82426,82427,82432],{},"Apache Pulsar provides a unified messaging model for both queueing and streaming workloads. Pulsar implemented its own Protobuf-based binary protocol to provide high performance and low latency. This choice of Protobuf makes it convenient to implement ",[55,82423,82425],{"href":67133,"rel":82424},[264],"Pulsar clients"," and the project already supports Java, Go, Python, and C++ languages alongside ",[55,82428,82431],{"href":82429,"rel":82430},"https:\u002F\u002Fpulsar.apache.org\u002Fdocs\u002Fen\u002Fclient-libraries\u002F#thirdparty-clients",[264],"third-party clients"," provided by the Pulsar community.",[48,82434,82435],{},"Because Apache Pulsar’s multi-tenancy and the overall architecture with Apache Bookkeeper is able to simplify operations, an increasing number of companies are exploring a way to shift and build the foundation of their services on Pulsar. However, to adopt Pulsar’s new unified messaging protocol, existing applications written using other messaging protocols have to be rewritten.",[48,82437,82438],{},"To address this, StreamNative has been working on a number of new projects to make this transition easier for organizations adopting Pulsar. Earlier this year, StreamNative announced KoP (Kafka-on-Pulsar) and AoP (AMQP-on-Pulsar) protocol handlers to facilitate the migration to Pulsar from Kafka and AMQP.",[321,82440,82441,82447],{},[324,82442,82443,82444,190],{},"KoP brings the native Apache Kafka protocol support to Apache Pulsar by introducing a Kafka protocol handler on Pulsar brokers. 
For details, see ",[55,82445,267],{"href":82446},"\u002Fblog\u002Ftech\u002F2020-03-24-bring-native-kafka-protocol-support-to-apache-pulsar",[324,82448,82449,82450,190],{},"AoP brings the native AMQP protocol support to Apache Pulsar by introducing an AMQP protocol handler on Pulsar brokers. For details, see ",[55,82451,267],{"href":82452},"\u002Fblog\u002Ftech\u002F2020-06-15-announcing-aop-on-pulsar",[48,82454,82455],{},"Over the past several months, StreamNative received a lot of inbound requests for help migrating services from MQTT to Pulsar and recognized the need to also support MQTT protocol natively on Pulsar. StreamNative invested engineering time and effort to introduce a general protocol handler framework in Pulsar that would allow developers who use the MQTT protocol to use Pulsar.",[40,82457,82459],{"id":82458},"mop-architecture","MoP Architecture",[48,82461,82462],{},"MoP is implemented as a pluggable protocol handler that can support native MQTT protocol on Pulsar by leveraging Pulsar features such as Pulsar topics, cursors etc. The diagram below illustrates a Pulsar cluster with the MoP protocol handler. Both the MQTT Proxy and MQTT protocol handler can run along with Pulsar brokers.",[48,82464,82465],{},[384,82466],{"alt":82467,"src":82468},"Illustration of MoP Architecture","\u002Fimgs\u002Fblogs\u002F63a37ecc172b6b2b9141645b_mop-architecture.png",[40,82470,82472],{"id":82471},"mop-concepts","MoP Concepts",[32,82474,82476],{"id":82475},"qos-levels","QoS Levels",[48,82478,82479],{},"MQTT has defined three Quality of Service (QoS) levels:",[321,82481,82482,82485,82488],{},[324,82483,82484],{},"QoS level 0 (at most once): This service level guarantees a best-effort delivery. There is no guarantee of delivery. The receiver does not acknowledge receipt of the message and the message is not stored and re-transmitted by the sender.",[324,82486,82487],{},"QoS level 1 (at least once): This service level guarantees that the message is transferred successfully to the receiver. The sender stores the message until it gets a PUBACK packet from the receiver that acknowledges receipt of the message. If the sender does not receive an acknowledgement, it will resend the message with the duplicate (DUP) flag set. It is possible for a message to be delivered multiple times.",[324,82489,82490],{},"QoS level 2 (exactly once): This service level guarantees that each message is received only once by the receiver. The guarantee is provided by a sequence of four messages between the sender and the receiver to ensure that the message has been sent and that the acknowledgement has been received. QoS level 2 is the highest service level in MQTT.",[48,82492,82493],{},"Currently, the MoP protocol handler only supports QoS level 0 and QoS level 1. In future releases, QoS level 2 will be supported.",[32,82495,82497],{"id":82496},"mop-proxy","MoP Proxy",[48,82499,82500],{},"The MoP Proxy is an optional component for MoP. It extends MoP to multiple nodes to realize horizontal expansion of services. The MoP Proxy is mainly used to forward messages delivered between the MQTT Client and the Pulsar broker. Therefore, the MQTT Client only needs to connect to the MoP Proxy to send and receive data, regardless of the Pulsar broker to which topics are dispatched.",[48,82502,82503],{},"The MoP Proxy can sense the status of Pulsar brokers. 
Once a Pulsar broker is disconnected or unreachable, the MoP Proxy will send messages from the MQTT Client to a new Pulsar broker.",[48,82505,82506],{},"The following figure illustrates the MoP Proxy service workflow.",[1666,82508,82509,82512,82515,82518],{},[324,82510,82511],{},"The MQTT client creates a connection with the MoP Proxy.",[324,82513,82514],{},"The MoP Proxy service sends a lookup request to Pulsar cluster to find out the owner broker URL of the topic along with the connection.",[324,82516,82517],{},"The Pulsar cluster returns the owner broker URL to the MoP Proxy.",[324,82519,82520],{},"The MoP Proxy builds a connection to the owner broker and starts to transfer data between the MQTT client and the owner Broker.",[48,82522,82523],{},[384,82524],{"alt":82525,"src":82381},"Illstration of MoP Proxy",[48,82527,82528,82529,190],{},"At present, the MoP Proxy works with the Pulsar broker. Users could choose whether to start the MoP Proxy service through the related configuration. For details, see ",[55,82530,267],{"href":82531,"rel":82532},"https:\u002F\u002Fgithub.com\u002Fstreamnative\u002Fmop#how-to-use-proxy",[264],[40,82534,82060],{"id":82059},[48,82536,82537,82538,82542,82543,190],{},"MoP is open-sourced under Apache License V2. You can ",[55,82539,36195],{"href":82540,"rel":82541},"https:\u002F\u002Fgithub.com\u002Fstreamnative\u002Fmop\u002Freleases\u002F",[264]," the MoP protocol handler to try out all the features of MoP. For details about how to use the MoP protocol handler, see ",[55,82544,267],{"href":82545,"rel":82546},"https:\u002F\u002Fgithub.com\u002Fstreamnative\u002Fmop\u002Fblob\u002Fmaster\u002FREADME.md",[264],[48,82548,82549,82550,82555],{},"For any problems in the use of the MoP protocol handler, you can create an issue in the ",[55,82551,82554],{"href":82552,"rel":82553},"https:\u002F\u002Fgithub.com\u002Fstreamnative\u002Fmop\u002Fissues",[264],"MoP repository",". We will reply to you as soon as possible. Meanwhile, we look forward to your contribution to MoP.",{"title":18,"searchDepth":19,"depth":19,"links":82557},[82558,82559,82560,82561,82562,82566],{"id":62870,"depth":19,"text":82396},{"id":82410,"depth":19,"text":82411},{"id":82417,"depth":19,"text":82418},{"id":82458,"depth":19,"text":82459},{"id":82471,"depth":19,"text":82472,"children":82563},[82564,82565],{"id":82475,"depth":279,"text":82476},{"id":82496,"depth":279,"text":82497},{"id":82059,"depth":19,"text":82060},"2020-09-28","MoP is also an implementation of the pluggable protocol handler. By adding the MoP protocol handler in your existing Pulsar cluster, you can migrate your existing MQTT applications and services to Pulsar without modifying the code. 
This enables MQTT applications to leverage Pulsar’s powerful features, such as infinite event stream retention with Apache BookKeeper and tiered storage.",{},"\u002Fblog\u002Fannouncing-mqtt-on-pulsar",{"title":82372,"description":82568},"blog\u002Fannouncing-mqtt-on-pulsar",[51871,821],"aintNm5gDWWRFU-Oq7S6dRTyhlRaHp_ZW3u44B1cb8c",{"id":82576,"title":82577,"authors":82578,"body":82579,"category":821,"createdAt":290,"date":82768,"description":82769,"extension":8,"featured":294,"image":82770,"isDraft":294,"link":290,"meta":82771,"navigation":7,"order":296,"path":82772,"readingTime":11180,"relatedResources":290,"seo":82773,"stem":82774,"tags":82775,"__hash__":82776},"blogs\u002Fblog\u002Fpulsar-flink-connector-2-5-0.md","Pulsar Flink Connector 2.5.0",[78658],{"type":15,"value":82580,"toc":82759},[82581,82589,82592,82596,82599,82613,82616,82627,82630,82641,82645,82648,82651,82654,82658,82661,82675,82682,82686,82689,82696,82700,82708,82712,82720,82723,82731,82735,82738,82743,82745],[48,82582,82583,82584,82588],{},"Pulsar Flink connector 2.5.0 is released on August 28, 2020, thank Pulsar community for the great efforts. The ",[55,82585,76361],{"href":82586,"rel":82587},"https:\u002F\u002Fgithub.com\u002Fstreamnative\u002Fpulsar-flink\u002Ftree\u002Frelease-2.5.0",[264]," integrates Apache Pulsar and Apache Flink (the data processing engine), allowing Apache Flink to read\u002Fwrite data from\u002Fto Apache Pulsar.",[48,82590,82591],{},"I will introduce some major features in Pulsar Flink connector 2.5.0.",[40,82593,82595],{"id":82594},"backgrounds","Backgrounds",[48,82597,82598],{},"Apache Flink is a distributed computing engine that is upgraded rapidly. In version 1.11, Apache Flink supports the following new features:",[321,82600,82601,82604,82607,82610],{},[324,82602,82603],{},"The core engine introduces unaligned checkpoints, which improves the fault tolerance mechanism of Flink, and improves checkpointing performance under heavy backpressure.",[324,82605,82606],{},"A new Source API that simplifies the implementation of (custom) sources by unifying batch and streaming execution, as well as offloading internals such as event-time handling, watermark generation or idleness detection to Flink.",[324,82608,82609],{},"Flink SQL supports Change Data Capture (CDC) to easily consume and interpret database changelogs from tools like Debezium. The renewed FileSystem Connector also expands the set of use cases and formats supported in the Table API\u002FSQL, enabling scenarios like streaming data directly from Kafka to Hive.",[324,82611,82612],{},"Multiple performance optimizations to PyFlink, including support for vectorized User-defined Functions (Python UDFs). This improves interoperability with libraries like Pandas and NumPy, making Flink more powerful for data science and ML workloads.",[48,82614,82615],{},"After Apache Flink 1.11 was released, we upgraded the Pulsar Flink connector to support Apache Flink 1.11. We met some difficulties in upgrade:",[321,82617,82618,82621,82624],{},[324,82619,82620],{},"Public APIs supported by Apache Flink 1.11 are changed greatly.",[324,82622,82623],{},"Schema, which is originally checked through Table API, is checked at the start-up stage.",[324,82625,82626],{},"The connector is converted to Catalog at runtime.",[48,82628,82629],{},"The new Pulsar Flink connector is not compatible with the previous versions. 
Therefore, we decided to upgrade Pulsar Flink connector through the following two components:",[321,82631,82632,82635,82638],{},[324,82633,82634],{},"pulsar-flink-1.11 module",[324,82636,82637],{},"Pulsar Schema",[324,82639,82640],{},"Pulsar Schema contains the type structure information of the message. Therefore, Pulsar Schema works well with Flink Table. In Apache Flink 1.9, the SQL type is bound to the physical type and used as Pulsar SchemaType. However, in Apache Flink 1.11, after the Table is changed, the SQL type can only use the default physical type, and Pulsar SchemaType does not support the default physical type of the Apache Flink date and event. We added new native types to Pulsar Schema so that Pulsar Schema can work with the Flink SQL type system.",[40,82642,82644],{"id":82643},"major-features","Major features",[48,82646,82647],{},"Here are some major features introduced in Pulsar Flink connector 2.5.0.",[32,82649,82634],{"id":82650},"pulsar-flink-111-module",[48,82652,82653],{},"This section describes some new features about the pulsar-flink-1.11 module.",[3933,82655,82657],{"id":82656},"support-apache-flink-111-and-flink-sql-ddl","Support Apache Flink 1.11 and Flink SQL DDL",[48,82659,82660],{},"In Apache Flink 1.11, some public APIs are added or deleted, causing the Pulsar Flink connectors of Apache Flink 1.9 and Apache Flink 1.11 incompatible. Therefore, the project is divided into two modules to support different Apache Flink versions.",[321,82662,82663,82666,82669,82672],{},[324,82664,82665],{},"Support Apache Flink 1.11.",[324,82667,82668],{},"Support Flink SQL Data Definition Language (DDL).",[324,82670,82671],{},"Update the topic partition policy to consume\u002Fdispatch messages evenly.",[324,82673,82674],{},"Make Apache Flink 1.11 compatible with Pulsar Schema.",[48,82676,79754,82677,190],{},[55,82678,82681],{"href":82679,"rel":82680},"https:\u002F\u002Fgithub.com\u002Fstreamnative\u002Fpulsar-flink\u002Fpull\u002F115",[264],"PR-115",[3933,82683,82685],{"id":82684},"support-pulsardeserializationschema-interface","Support pulsardeserializationSchema interface",[48,82687,82688],{},"Add a PulsarDeserializationSchema interface between actual deserialization and user-defined deserialization schema, so users can use the custom deserialization schema to consume messages.",[48,82690,79754,82691,190],{},[55,82692,82695],{"href":82693,"rel":82694},"https:\u002F\u002Fgithub.com\u002Fstreamnative\u002Fpulsar-flink\u002Fpull\u002F95",[264],"PR-95",[3933,82697,82699],{"id":82698},"support-json-schema-in-flink-sink","Support JSON Schema in Flink Sink",[48,82701,82702,82703,190],{},"Flink Sink supports JSON schema. For more information about implementation, see ",[55,82704,82707],{"href":82705,"rel":82706},"https:\u002F\u002Fgithub.com\u002Fstreamnative\u002Fpulsar-flink\u002Fpull\u002F116",[264],"PR-116",[3933,82709,82711],{"id":82710},"implement-pulsarcatalog-based-on-genericinmemorycatalog","Implement PulsarCatalog based on GenericInMemoryCatalog",[48,82713,82714,82715,190],{},"Implement PulsarCatalog based on GenericInMemoryCatalog by extending PulsarCatalog from in-memory catalog. For more information about implementation, see ",[55,82716,82719],{"href":82717,"rel":82718},"https:\u002F\u002Fgithub.com\u002Fstreamnative\u002Fpulsar-flink\u002Fpull\u002F91",[264],"PR-91",[32,82721,82637],{"id":82722},"pulsar-schema",[48,82724,82725,82726,190],{},"Add Java 8 time and date type to Pulsar primitive Schema. 
Support Instant, LocalDate, LocalTime, LocalDateTime types in Pulsar primitive Schema. For more information about implementation, see ",[55,82727,82730],{"href":82728,"rel":82729},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F7874",[264],"PR-7874",[40,82732,82734],{"id":82733},"thanks","Thanks",[48,82736,82737],{},"The release of Pulsar Flink connector 2.5.0 is a big milestone for this rapidly developing project. Special thanks to Hang Chen, Zhanpeng Wu, Sijie Guo, and Jianyun Zhao who contributed to this release.",[48,82739,78633,82740,190],{},[55,82741,76361],{"href":55929,"rel":82742},[264],[40,82744,52473],{"id":52472},[321,82746,82747,82753],{},[324,82748,82749],{},[55,82750,82752],{"href":82586,"rel":82751},[264],"Pulsar Flink connector v2.5.0",[324,82754,82755],{},[55,82756,82758],{"href":78942,"rel":82757},[264],"Pulsar Flink issues",{"title":18,"searchDepth":19,"depth":19,"links":82760},[82761,82762,82766,82767],{"id":82594,"depth":19,"text":82595},{"id":82643,"depth":19,"text":82644,"children":82763},[82764,82765],{"id":82650,"depth":279,"text":82634},{"id":82722,"depth":279,"text":82637},{"id":82733,"depth":19,"text":82734},{"id":52472,"depth":19,"text":52473},"2020-09-17","Learn the most interesting and major features added to Pulsar Flink connector 2.5.0.","\u002Fimgs\u002Fblogs\u002F63d797663a45cdb17dcd23bd_63a37dbd2be9e6050c3fb18e_flink-top.webp",{},"\u002Fblog\u002Fpulsar-flink-connector-2-5-0",{"title":82577,"description":82769},"blog\u002Fpulsar-flink-connector-2-5-0",[302,28572],"6cBgzqpAyP0PnrCtkn83xMrikDs3trf_vA1hDhubjtg",{"id":82778,"title":82779,"authors":82780,"body":82781,"category":3550,"createdAt":290,"date":82976,"description":82977,"extension":8,"featured":294,"image":290,"isDraft":294,"link":290,"meta":82978,"navigation":7,"order":296,"path":82979,"readingTime":4475,"relatedResources":290,"seo":82980,"stem":82981,"tags":82982,"__hash__":82983},"blogs\u002Fblog\u002Fstreamnative-cloud-getting-started.md","StreamNative Cloud - Getting Started",[60441,69353],{"type":15,"value":82782,"toc":82967},[82783,82789,82793,82812,82816,82819,82822,82826,82829,82849,82852,82869,82872,82875,82879,82882,82888,82892,82895,82908,82911,82915,82918,82941,82945,82953,82965],[48,82784,82785],{},[384,82786],{"alt":82787,"src":82788},"image of getting started on streamnative cloud","\u002Fimgs\u002Fblogs\u002F63a37bf1ba4f90290e52696e_getting-started-on-cloud-2.jpeg",[40,82790,82792],{"id":82791},"guide-overview","Guide Overview",[1666,82794,82795,82797,82800,82803,82806,82809],{},[324,82796,77553],{},[324,82798,82799],{},"Before You Get Started",[324,82801,82802],{},"Notes On Beta",[324,82804,82805],{},"How to Use StreamNative Cloud",[324,82807,82808],{},"A Step-by-Step on Getting Started",[324,82810,82811],{},"StreamNative Getting Started Video Tutorial",[40,82813,82815],{"id":82814},"_1-why-streamnative-cloud","1. Why StreamNative Cloud?",[48,82817,82818],{},"StreamNative Cloud offers a simple and fast solution for companies looking to adopt, or even just to test out Apache Pulsar. StreamNative Cloud works just like the open-source Apache Pulsar, with the same APIs and open-source clients being used to send and receive messages.",[48,82820,82821],{},"StreamNative Cloud makes adopting Pulsar simple and fast. The StreamNative team manages the heavy lifting of operations to ensure your cluster is running and optimized to meet the demands of your application. 
As the core contributors to Pulsar, the StreamNative team is well-versed in the technology and the perfect partner for you.",[40,82823,82825],{"id":82824},"_2-before-you-get-started","2. Before You Get Started",[48,82827,82828],{},"There are two things you need to do before you get started:",[1666,82830,82831,82834,82837,82840,82843,82846],{},[324,82832,82833],{},"Identify Your Cluster Needs",[324,82835,82836],{},"Setting up a cluster is simple and fast, you can do it in just a few minutes. There are a few pieces of data you will want to gather before you start:",[324,82838,82839],{},"Availability requirements",[324,82841,82842],{},"Peak write rate",[324,82844,82845],{},"Peak read rate",[324,82847,82848],{},"Estimated storage capacity",[48,82850,82851],{},"Once you have this data ready, you will want to choose an application, more on that below.",[1666,82853,82854,82857],{},[324,82855,82856],{},"Connect an Application to StreamNative Cloud",[324,82858,82859,82860,82864,82865,190],{},"The next step is to select an application to connect to StreamNative Cloud in order to start sending messages. Apache Pulsar has clients for a variety of languages and all are compatible with StreamNative Cloud, but for a quick test, we recommend you start with the included tools in the latest release of the open-source Pulsar distribution, which you can download and simply unpack from ",[55,82861,82862],{"href":82862,"rel":82863},"http:\u002F\u002Fpulsar.apache.org\u002Fdownload\u002F",[264]," or use our homebrew formula at ",[55,82866,82867],{"href":82867,"rel":82868},"https:\u002F\u002Fgithub.com\u002Fstreamnative\u002Fhomebrew-streamnative",[264],[48,82870,82871],{},"With the pular-admin and pulsar-client CLI tools installed, our Cloud Manager UI will help you send your first message by generating commands customized for your cluster.",[48,82873,82874],{},"Once you are ready to connect your application, the Cloud Manager UI can also help you get connected with recipes for using the Pulsar Java, Go, C++, Python, and NodeJS clients.",[40,82876,82878],{"id":82877},"_3-notes-on-beta","3. Notes on Beta",[48,82880,82881],{},"We are excited to launch StreamNative Cloud beta! In addition to the options and features available today, we have a lot more coming soon. See below for more details.",[48,82883,82884],{},[384,82885],{"alt":82886,"src":82887},"table with notes on beta streamnative cloud","\u002Fimgs\u002Fblogs\u002F63a37c46ba4f90315652a212_Notes-on-Beta.webp",[40,82889,82891],{"id":82890},"_4-how-to-use-streamnative-cloud","4. How to Use StreamNative Cloud",[48,82893,82894],{},"StreamNative Cloud provides a fully-managed instance of Apache Pulsar along with a suite of tools to help administrate your cluster, with support for managing:",[321,82896,82897,82899,82901,82904,82906],{},[324,82898,42839],{},[324,82900,42846],{},[324,82902,82903],{},"Namespace policies",[324,82905,42853],{},[324,82907,42860],{},[48,82909,82910],{},"StreamNative Cloud works by creating a cluster exposed on the public internet, secured with TLS encryption and Oauth2 authentication, that your applications can connect and use.",[40,82912,82914],{"id":82913},"_5-a-step-by-step-on-getting-started","5. A Step-by-Step On Getting Started",[48,82916,82917],{},"The Cloud Management UI has a built-in tour to help you create and connect to your first StreamNative Cloud cluster. 
We have also included a step by step below to give you a preview of the process.",[1666,82919,82920,82923,82926,82929,82932,82935,82938],{},[324,82921,82922],{},"First, create an organization. An organization allows you to invite team members to help manage your cluster.",[324,82924,82925],{},"Create a Pulsar instance, which can either be single- or multi-zone. Single-zone clusters are a cost-effective option for most production workloads. For enhanced availability, we recommend a multi-zone cluster that can withstand a zone-wide outage in the underlying cloud provider. (Additional features will come in the future, for example, instances consisting of multiple Pulsar clusters across geographic regions.)",[324,82927,82928],{},"Create a Pulsar cluster for your given throughput and storage needs. (You should have these numbers from the prep work you did above.) The Cloud Management UI provides guidance on the capacity you can expect from a given configuration, as well as an estimate of the costs.",[324,82930,82931],{},"Create a Service Account and download the credentials. The credentials will be used to authenticate against the Pulsar Cluster.",[324,82933,82934],{},"Grant permissions to your Service Account. A newly created Service Account doesn’t have any permissions, so these need to be added. The Cloud Management UI will walk you through authorizing a new role in the default namespace in your Pulsar cluster.",[324,82936,82937],{},"Connect and publish your first messages with the pulsar-client CLI tool. The StreamNative Cloud console provides a quickstart of the commands you need to run to publish messages to a test topic.",[324,82939,82940],{},"Congratulations! Your Pulsar cluster is provisioned and ready to start processing messages for your applications.",[40,82942,82944],{"id":82943},"_6-streamnative-cloud-getting-started-video","6. StreamNative Cloud Getting Started Video",[48,82946,82947,82948,82952],{},"For your convenience, we have also created a video tutorial. This 9-minute video will show you everything you need to get up and running on StreamNative Cloud. Click ",[55,82949,267],{"href":82950,"rel":82951},"https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=4SuyItkbOB4",[264]," to play the video.",[48,82954,82955,82956,82959,82960,82964],{},"Now you’re ready to get started on StreamNative Cloud. Click ",[55,82957,267],{"href":17075,"rel":82958},[264]," to get started. 
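If you prefer to connect from code rather than the pulsar-client CLI, the sketch below shows roughly what the "publish your first messages" step looks like with the Pulsar Java client and the OAuth2 credentials file downloaded for your service account. The service URL, issuer URL, audience, and topic below are placeholders; the Cloud Manager UI displays the actual values for your cluster.

```java
import java.net.URL;

import org.apache.pulsar.client.api.Producer;
import org.apache.pulsar.client.api.PulsarClient;
import org.apache.pulsar.client.api.Schema;
import org.apache.pulsar.client.impl.auth.oauth2.AuthenticationFactoryOAuth2;

public class CloudQuickstartProducer {
    public static void main(String[] args) throws Exception {
        // Placeholder values: copy the real ones from the Cloud Manager UI.
        String serviceUrl = "pulsar+ssl://your-cluster.streamnative.cloud:6651";
        URL issuerUrl = new URL("https://auth.streamnative.cloud/");
        URL credentialsUrl = new URL("file:///path/to/service-account-key-file.json");
        String audience = "urn:sn:pulsar:your-org:your-instance";

        PulsarClient client = PulsarClient.builder()
                .serviceUrl(serviceUrl)
                // OAuth2 client-credentials flow using the downloaded key file.
                .authentication(AuthenticationFactoryOAuth2.clientCredentials(
                        issuerUrl, credentialsUrl, audience))
                .build();

        Producer<String> producer = client.newProducer(Schema.STRING)
                .topic("persistent://public/default/test-topic")
                .create();

        producer.send("Hello from StreamNative Cloud");

        producer.close();
        client.close();
    }
}
```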
If you have questions, contact us via the ",[55,82961,82963],{"href":82962},"\u002Fcloud\u002Fsupport","support portal"," or Live chat.",[48,82966,3931],{},{"title":18,"searchDepth":19,"depth":19,"links":82968},[82969,82970,82971,82972,82973,82974,82975],{"id":82791,"depth":19,"text":82792},{"id":82814,"depth":19,"text":82815},{"id":82824,"depth":19,"text":82825},{"id":82877,"depth":19,"text":82878},{"id":82890,"depth":19,"text":82891},{"id":82913,"depth":19,"text":82914},{"id":82943,"depth":19,"text":82944},"2020-09-10","The Getting Started Guide provides a quick and easy overview of StreamNative Cloud.",{},"\u002Fblog\u002Fstreamnative-cloud-getting-started",{"title":82779,"description":82977},"blog\u002Fstreamnative-cloud-getting-started",[3550,821],"9IdaAYGfv9sQ81IsYGM6qksjF1PiFyi89ZjrDejEMDs",{"id":82985,"title":82986,"authors":82987,"body":82988,"category":7338,"createdAt":290,"date":83136,"description":83137,"extension":8,"featured":294,"image":290,"isDraft":294,"link":290,"meta":83138,"navigation":7,"order":296,"path":83139,"readingTime":7986,"relatedResources":290,"seo":83140,"stem":83141,"tags":83142,"__hash__":83143},"blogs\u002Fblog\u002Fpulsar-summit-asia-2020-cfp-is-open.md","Pulsar Summit Asia 2020 CFP is open",[78659],{"type":15,"value":82989,"toc":83126},[82990,82996,82999,83003,83027,83038,83040,83054,83058,83061,83075,83079,83084,83087,83095,83098,83100,83105,83112,83115,83117,83119,83121,83124],[48,82991,82992,82995],{},[384,82993],{"alt":18,"src":82994},"\u002Fimgs\u002Fblogs\u002F63d7978ca532651e6c1d5b97_63a37b7ee1d5c08aa7ed7928_pulsar-summit-asia-2020-top.webp","\nThe Pulsar Summit is an annual conference dedicated to the Apache Pulsar community. The summit brings together an international audience of CTOs\u002FCIOs, developers, data architects, data scientists, Apache Pulsar committers\u002Fcontributors, and the messaging and streaming community. Together, they share experiences, ideas, and insights on Pulsar and its growing community, and receive hands-on training sessions led by Pulsar experts.",[48,82997,82998],{},"After a very successful Pulsar Summit Virtual Conference in June, we have decided to present our Pulsar Summit Asia 2020 in the same way on November 28-29, 2020. The two-day conference will be free to attend! Are you interested in presenting? Suggested topics include Pulsar use cases, operations, technology deep dive, and ecosystem. CFP and registration are now open!",[40,83000,83002],{"id":83001},"speak-at-pulsar-summit","Speak at Pulsar Summit",[48,83004,83005,83006,83008,83009,83014,83015,83020,83021,83026],{},"The opportunity to speak at the second global Pulsar Summit is a great chance to participate in the rapidly growing Apache Pulsar community. Join us for the opportunity to be on stage with top Pulsar thought-leaders, including Apache Pulsar PMC members Sijie Guo and Jia Zhai from ",[55,83007,4496],{"href":10259},", Penghui Li from Zhaopin.com, Nozomi Kurihara from ",[55,83010,83013],{"href":83011,"rel":83012},"https:\u002F\u002Fabout.yahoo.co.jp\u002F",[264],"Yahoo Japan Corporation",", and other community leaders such as Dezhi Liu from ",[55,83016,83019],{"href":83017,"rel":83018},"https:\u002F\u002Fwww.tencent.com\u002Fen-us",[264],"Tencent",", Vincent Xie from ",[55,83022,83025],{"href":83023,"rel":83024},"https:\u002F\u002Fwww.bestpay.com.cn\u002F",[264],"Orange Finance",". Proposals for speaker presentations are currently being accepted. Suggested topics include Pulsar use cases, operations, technology deep dive, and ecosystem. 
Submissions are open until October 14, 2020.",[48,83028,83029,83030,83032,83033,190],{},"If you have questions about submitting a proposal, or want some feedback or advice in general, please do not hesitate to reach out to ",[55,83031,39814],{"href":39813},". We are happy to help out! Details are available on the ",[55,83034,83037],{"href":83035,"rel":83036},"https:\u002F\u002Fpulsar-summit.org\u002Fen\u002Fevent\u002Fasia-2020\u002Fcfp",[264],"CFP website",[40,83039,56358],{"id":56357},[321,83041,83042,83045,83048,83051],{},[324,83043,83044],{},"CFP opens: September 1, 2020",[324,83046,83047],{},"CFP closes: October 21, 2020 - 23:59 (CST: China Standard Time\u002FUTC+8 time zone)",[324,83049,83050],{},"CFP notification: October 28, 2020",[324,83052,83053],{},"Schedule announcement: November 4, 2020",[40,83055,83057],{"id":83056},"speaker-benefits","Speaker benefits",[48,83059,83060],{},"When your speaking proposal is approved, you will enjoy the following benefits:",[321,83062,83063,83066,83068,83071,83073],{},[324,83064,83065],{},"The opportunity to expand your network and raise your profile in the Apache Pulsar community.",[324,83067,77406],{},[324,83069,83070],{},"Your name, title, company, and bio will be featured on the Pulsar Summit Asia 2020 website.",[324,83072,69684],{},[324,83074,77414],{},[40,83076,83078],{"id":83077},"speaker-requirements","Speaker requirements",[48,83080,83081,83082,76292],{},"In addition to your talk, we ask that you actively participate in promoting the event via your personal and company channels. These include posting on your Twitter, LinkedIn, WeChat, Weibo, blog and other channels. We would also like to work directly with your marketing team on co-marketing opportunities. These include, but are not limited to, posting to your company’s Twitter, LinkedIn, WeChat and other developer communities and sending a dedicated Pulsar Summit email to your company’s email list. Contact us at ",[55,83083,39814],{"href":39813},[40,83085,58582],{"id":83086},"registration",[48,83088,83089,83090,83094],{},"If you are interested in attending Pulsar Summit Asia 2020, please sign in Hopin and ",[55,83091,83093],{"href":79682,"rel":83092},[264],"checkout our event",". Your ideas are very important to us, and we will prepare the content accordingly.",[48,83096,83097],{},"After you checkout the event in Hopin, you will be notified with the event update at the first time when announcing.",[40,83099,56379],{"id":56378},[48,83101,83102,83103,38617],{},"Pulsar Summit is a community run conference and your support is needed. Sponsoring this event will provide a great opportunity for your organization to further engage with the Apache Pulsar community. ",[55,83104,38404],{"href":77457},[48,83106,83107,83108,83111],{},"Help us make #PulsarSummit 2020 a big success by spreading the word and submitting your proposal! Follow us on Twitter (",[55,83109,39823],{"href":39821,"rel":83110},[264],") to receive the latest updates of the conference!",[48,83113,83114],{},"Hope to see you at Pulsar Summit Asia 2020!",[40,83116,39828],{"id":39827},[48,83118,82399],{},[40,83120,10248],{"id":10247},[48,83122,83123],{},"StreamNative is the organizer of Pulsar Summit Asia 2020. StreamNative is enabling organizations to build the next generation of messaging and event streaming applications. Leveraging Apache Pulsar and BookKeeper, we optimize for scalability and resiliency while reducing the overhead management and complexity required by incumbent technologies. 
We do this by offering Pulsar and StreamNative’s \"products as a service\". StreamNative is building a world-class team that is passionate about building amazing products and committed to customer success.",[48,83125,3931],{},{"title":18,"searchDepth":19,"depth":19,"links":83127},[83128,83129,83130,83131,83132,83133,83134,83135],{"id":83001,"depth":19,"text":83002},{"id":56357,"depth":19,"text":56358},{"id":83056,"depth":19,"text":83057},{"id":83077,"depth":19,"text":83078},{"id":83086,"depth":19,"text":58582},{"id":56378,"depth":19,"text":56379},{"id":39827,"depth":19,"text":39828},{"id":10247,"depth":19,"text":10248},"2020-09-01","Pulsar Summit Asia 2020 CFP and sign-up are now open",{},"\u002Fblog\u002Fpulsar-summit-asia-2020-cfp-is-open",{"title":82986,"description":83137},"blog\u002Fpulsar-summit-asia-2020-cfp-is-open",[5376,821],"WCaVbh309-yJlS_m2ElytfwDQH3JA-VI3UxExxlIkLg",{"id":83145,"title":83146,"authors":83147,"body":83148,"category":3550,"createdAt":290,"date":83226,"description":83227,"extension":8,"featured":294,"image":83228,"isDraft":294,"link":290,"meta":83229,"navigation":7,"order":296,"path":83230,"readingTime":11180,"relatedResources":290,"seo":83231,"stem":83232,"tags":83233,"__hash__":83234},"blogs\u002Fblog\u002Fstreamnative-announces-free-cloud-offering.md","StreamNative Announces Free Cloud Offering",[69353],{"type":15,"value":83149,"toc":83224},[83150,83154,83157,83160,83164,83166,83169,83172,83175,83178,83183,83187,83202,83207,83210,83218],[8300,83151,83153],{"id":83152},"streamnative-cloud-accelerates-application-development","StreamNative Cloud Accelerates Application Development",[48,83155,83156],{},"In August 2020 we announced the launch of our Cloud Hosted offering. The goal with StreamNative Cloud was to make it easier for companies to adopt Pulsar by taking on the heavy lifting of operations and cluster management. Without having to dedicate time and bandwidth to operations, developers would be able to focus their time and attention on developing applications to meet their core business needs.",[48,83158,83159],{},"For many companies, StreamNative Cloud is doing just that. BestPay is an early user of StreamNative Cloud and Weisheng Xie, Chief Data Scientist at Bestpay, shares how it has enabled his team to launch Pulsar clusters with ease.",[916,83161,83162],{},[48,83163,79702],{},[48,83165,79705],{},[48,83167,83168],{},"StreamNative Cloud is the industry’s only fully-managed, cloud-native messaging and event streaming platform powered by the original developers of Apache Pulsar. It works just like the open-source Apache Pulsar, with the same APIs and open-source clients being used to send and receive messages.",[8300,83170,83146],{"id":83171},"streamnative-announces-free-cloud-offering",[48,83173,83174],{},"While StreamNative Cloud is a great fit for companies that have already decided on Apache Pulsar, we wanted to make it easier for new companies and developers to try Pulsar.",[48,83176,83177],{},"For Sijie Guo, CEO and co-founder of StreamNative, and, also, one of the original developers of Apache Pulsar and Apache Bookkeeper, the Free Cloud Offering is an opportunity to enable more developers and organizations to try Pulsar.",[916,83179,83180],{},[48,83181,83182],{},"“When we launched StreamNative Cloud, we wanted to remove barriers to adoption for Pulsar. While it was a great fit for companies that were further along in their adoption journey, there was high demand from people who were just getting started. 
Namely, developers who wanted to try Pulsar without having to put down a credit card or get approvals. The free offering makes this possible.”",[321,83184,83185],{},[324,83186,79639],{},[48,83188,83189,83190,83195,83196,83201],{},"You can read more about StreamNative’s Free Cloud Offering in our ",[55,83191,83194],{"href":83192,"rel":83193},"https:\u002F\u002Fwww.prnewswire.com\u002Fnews-releases\u002Fpulsar-adoption-continues--free-pulsar-as-a-service-launches-301160854.html",[264],"press release"," and in this ",[55,83197,83200],{"href":83198,"rel":83199},"https:\u002F\u002Fwww.datanami.com\u002F2020\u002F10\u002F27\u002Ffree-apache-pulsar-cloud-offered-by-streamnative\u002F",[264],"Datanami"," feature.",[48,83203,83204],{},[384,83205],{"alt":66921,"src":83206},"\u002Fimgs\u002Fblogs\u002F63a383443eafb5b10c4f57df_free-cloud-figure-1.png",[48,83208,83209],{},"Built and operated by the original developers of Apache Pulsar and Apache BookKeeper, StreamNative Cloud provides a scalable, resilient, and secure messaging and event streaming platform for enterprises.",[48,83211,83212,83213,190],{},"With StreamNative’s Free Cloud Offering, you can spin up a fully-functional Pulsar cluster in minutes and with no credit card required. Get started ",[55,83214,83217],{"href":83215,"rel":83216},"https:\u002F\u002Fauth.streamnative.cloud\u002Flogin?state=g6Fo2SA0RldsMUdpNTRlV21wVEMzMXNmZU8tdlRIMHJkMUdxSaN0aWTZIGRMYkxnR3BiSUdYdm5iMnpPOWxESElXOVk5SHZvMUpuo2NpZNkgNmVyNzNxS3E0MnFCMHdic3IxU09NYVliYXU3S2hsZXc&client=6er73qKq42qB0wbsr1SOMaYbau7Khlew&protocol=oauth2&audience=https%3A%2F%2Fapi.streamnative.cloud&redirect_uri=https%3A%2F%2Fconsole.streamnative.cloud%2Fcallback&scope=openid%20profile%20email%20offline_access&response_type=code&response_mode=query&nonce=TkxBVndZNFlBbWJiZV8tOEV1aHJITEk5RElzNHhmRVV0cWh0cElCUjVjbQ%3D%3D&code_challenge=9xmOoOMKM1COH7R-8NBmFFBHlamonfQxJ_NUI2cPfRQ&code_challenge_method=S256&auth0Client=eyJuYW1lIjoiYXV0aDAtc3BhLWpzIiwidmVyc2lvbiI6IjEuMTIuMSJ9",[264],"today",[48,83219,83220,83221,190],{},"You can view details on the StreamNative Free Cloud Offering ",[55,83222,267],{"href":83223},"\u002Fdocs\u002Fcloud\u002Fstable\u002Fconcepts\u002Fconcepts#cluster-type",{"title":18,"searchDepth":19,"depth":19,"links":83225},[],"2020-08-28","We’re excited to announce the launch of StreamNative’s Free Cloud Offering. Launched in August, StreamNative Cloud provides a simple, fast, reliable, and cost-effective way to run Pulsar in the cloud, and is the industry’s only fully-managed, cloud-native messaging and event streaming platform powered by the original developers of Apache Pulsar. 
Our free offering allows you to get started with exploring Pulsar or even running a small scale production Pulsar in just a few clicks.","\u002Fimgs\u002Fblogs\u002F63d796e62a4dc2e829bf2cd2_63a383142221ae3b8344f1c8_top-free-cloud.webp",{},"\u002Fblog\u002Fstreamnative-announces-free-cloud-offering",{"title":83146,"description":83227},"blog\u002Fstreamnative-announces-free-cloud-offering",[302,3550,821],"138E2NXLH0nee3VZUUfsRe7mcakFgVblJvOyXfadspo",{"id":83236,"title":83237,"authors":83238,"body":83239,"category":7338,"createdAt":290,"date":83551,"description":83552,"extension":8,"featured":294,"image":83553,"isDraft":294,"link":290,"meta":83554,"navigation":7,"order":296,"path":83555,"readingTime":11508,"relatedResources":290,"seo":83556,"stem":83557,"tags":83558,"__hash__":83559},"blogs\u002Fblog\u002Fapache-pulsar-celebrates-300th-contributor.md","Apache Pulsar Celebrates 300th contributor",[78659,69353],{"type":15,"value":83240,"toc":83541},[83241,83244,83247,83255,83258,83264,83267,83271,83286,83325,83333,83336,83339,83342,83361,83369,83371,83374,83378,83389,83393,83404,83410,83413,83415,83418,83500,83507,83509,83512,83534],[48,83242,83243],{},"Dear Pulsar community,",[48,83245,83246],{},"Over the last few years, the shift to real-time streaming technologies has bolstered the adoption of Pulsar and there has been a major increase in both the interest and adoption of Pulsar in 2020 alone. With Pulsar being sought out by companies developing messaging and event-streaming applications — from Fortune 100 companies to forward-thinking start-ups — the community is growing quickly.",[48,83248,83249,83250,83254],{},"This community growth has contributed to a new milestone - our ",[55,83251,83253],{"href":69560,"rel":83252},[264],"300th contributor"," to the Pulsar repository. The number of contributors in an open-source project is meaningful because it signals project adoption and more contributors means accelerated development of the open-source technology. This milestone is even more exciting given that we added 100 contributors in the last 8 months alone!",[48,83256,83257],{},"As many of you know, Apache Pulsar is a cloud-native messaging and event streaming platform that has experienced rapid growth since it was committed to open source in 2016. Pulsar graduated as a Top-Level Project (TLP) in September 2018, has launched 92 releases, attracted 5100+ commits from 300 contributors, received 6.5k+ stars, 1.6k+ forks, and 2.2k+ Slack users.",[48,83259,83260],{},[384,83261],{"alt":83262,"src":83263},"apache pulsar interface","\u002Fimgs\u002Fblogs\u002F63a37ae91ef7363dc1d5d174_p-300-commits.jpeg",[48,83265,83266],{},"The influx of developers joining the Pulsar community is in large part due to the high market demand for next-generation messaging technologies, big-data insights, and real-time streaming. Top developers and industry leaders are joining the Pulsar community for the opportunity to help shape the future of this technology.",[40,83268,83270],{"id":83269},"community-events","Community Events",[48,83272,83273,83274,83279,83280,83285],{},"To meet the high demand for education and training in the Pulsar community, the community has launched some key initiatives this year. We host weekly TGIP (Thank Goodness It's Pulsar) training, which features Pulsar thought-leaders and Pulsar PMC Members. To meet global demand, we currently host two different weekly trainings. 
One ",[55,83275,83278],{"href":83276,"rel":83277},"https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=Vc_a2ppRzlI&list=PLqRma1oIkcWhWAhKgImEeRiQi5vMlqTc-",[264],"TGIP training"," runs on Pacific Time, and the other ",[55,83281,83284],{"href":83282,"rel":83283},"https:\u002F\u002Fgithub.com\u002Fstreamnative\u002Ftgip-cn",[264],"TGIP-CN training"," runs on Beijing Time.",[48,83287,83288,83289,83294,83295,1186,83298,83302,83303,1186,83307,1186,83311,1186,83316,1186,83320,83324],{},"We also host monthly ",[55,83290,83293],{"href":83291,"rel":83292},"https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=mncXc_T6JkU&list=PLqRma1oIkcWhfmUuJrMM5YIG8hjju62Ev",[264],"webinars"," to bring together Pulsar and messaging community thought-leaders to share best practices, insights and product news. Thank ",[55,83296,807],{"href":72179,"rel":83297},[264],[55,83299,60441],{"href":83300,"rel":83301},"https:\u002F\u002Ftwitter.com\u002Faddisonjh",[264],", Joe Francis, ",[55,83304,48929],{"href":83305,"rel":83306},"https:\u002F\u002Ftwitter.com\u002FShivjiJha",[264],[55,83308,77592],{"href":83309,"rel":83310},"https:\u002F\u002Ftwitter.com\u002FDevinBost",[264],[55,83312,83315],{"href":83313,"rel":83314},"https:\u002F\u002Ftwitter.com\u002FPierreZ",[264],"Pierre Zemb",[55,83317,28870],{"href":83318,"rel":83319},"https:\u002F\u002Ftwitter.com\u002Fjessetanderson",[264],[55,83321,806],{"href":83322,"rel":83323},"https:\u002F\u002Ftwitter.com\u002Fsijieg",[264]," and other speakers their time and insights.",[48,83326,83327,83328,83332],{},"This year also marked our first global summit, held in June 2020. Hosted by StreamNative and Splunk, the first-ever ",[55,83329,83331],{"href":35357,"rel":83330},[264],"Pulsar Summit Virtual Conference"," featured 30+ talks from 20+ organizations. We would like to thank all of the speakers for sharing their stories about Pulsar, and thank all of the attendees for joining the event.",[48,83334,83335],{},"To meet the global demands of the Pulsar community, we are excited to announce that we will be hosting Pulsar Summit Asia 2020 on November 28th and 29th. The call for presentations for this event will be coming soon. You can sign up for the Pulsar Newsletter or join the Pulsar Slack channel to receive updates regarding this event.",[40,83337,83338],{"id":48197},"Pulsar Adoption",[48,83340,83341],{},"In addition to the growth in contributors, we are excited to see accelerated adoption of Pulsar in PoC and production environments. Pulsar is helping companies globally to unlock the power of real-time data and to grow their businesses with efficiency and simplicity.",[48,83343,83344,83345,83349,83350,83355,83356,190],{},"Key adoption stories illustrate Pulsar's ability to handle mission-critical applications. These include ",[55,83346,83348],{"href":83347},"\u002Fsuccess-stories\u002Ftencent","Tencent’s adoption"," of Pulsar for its transactional billing system, which processes more than 10 billion transactions and 10+ TBs of data daily. ",[55,83351,83354],{"href":83352,"rel":83353},"https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=FXQvsHz_S1A",[264],"Verizon Media is another success story",", having operated Pulsar in production for more than 5 years, managing millions of write requests\u002Fsecond, and supporting the business across six global data centers. 
Most recently Splunk, which had used Kafka in production environments for years, ",[55,83357,83360],{"href":83358,"rel":83359},"https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=_q8s3_0-BRQ",[264],"adopted Pulsar for their new data processor",[48,83362,83363,83364,190],{},"For more insights on Pulsar adoption, you can find a list for companies using or contributing to Apache Pulsar on ",[55,83365,83368],{"href":83366,"rel":83367},"http:\u002F\u002Fpulsar.apache.org\u002Fen\u002Fpowered-by\u002F",[264],"Pulsar Powered by page",[40,83370,71556],{"id":71555},[48,83372,83373],{},"Committed community partners have also contributed to key project advancements. Below, we look at two recent product launches.",[32,83375,83377],{"id":83376},"ovhcloud-helps-companies-move-from-kafka-to-pulsar","OVHCloud Helps Companies Move from Kafka to Pulsar",[48,83379,83380,83381,83384,83385,83388],{},"In March 2020, ",[55,83382,83383],{"href":82446},"OVHCloud and StreamNative launched Kafka-on-Pulsar (KoP)",", the result of the two companies working closely in partnership. ",[55,83386,35093],{"href":29592,"rel":83387},[264]," enables Kafka users to migrate their existing Kafka applications and services to Pulsar without modifying the code. Although only recently released, KoP has already been adopted by several organizations and is being used in production environments. Moreover, KoP's availability is helping to expand Pulsar's adoption.",[32,83390,83392],{"id":83391},"china-mobile-helps-companies-move-from-rabbitmq-to-pulsar","China Mobile Helps Companies Move from RabbitMQ to Pulsar",[48,83394,83395,83396,83399,83400,83403],{},"In June 2020, ",[55,83397,83398],{"href":82452},"China Mobile and StreamNative announced the launch of another major platform upgrade, AMQP-on-Pulsar (AoP)",". Similar to KoP, ",[55,83401,37239],{"href":37237,"rel":83402},[264]," allows organizations currently using RabbitMQ (or other AMQP message brokers) to migrate existing applications and services to Pulsar without code modification. Again, this is a key initiative that will help drive the adoption and usage of Pulsar.",[48,83405,83406,83407,190],{},"You can find a number of other connections and integrations, such as MQTT-on-Pulsar for building IoT applications, in the ",[55,83408,38697],{"href":35258,"rel":83409},[264],[48,83411,83412],{},"These events and initiatives illustrate the Pulsar community's firm commitment to education and ecosystem development. More importantly, they demonstrate the momentum and growth we can expect in the future.",[40,83414,79225],{"id":79577},[48,83416,83417],{},"We would like to thank the Pulsar community, contributors and committers, who have helped to drive development, growth and adoption for Pulsar. 
We would especially like to recognize our distinguished contributors and committers (including but not limited to):",[321,83419,83420,83431,83443,83451,83460,83470,83480,83491],{},[324,83421,83422,24268,83426],{},[55,83423,807],{"href":83424,"rel":83425},"https:\u002F\u002Fgithub.com\u002Fmerlimat",[264],[55,83427,83430],{"href":83428,"rel":83429},"https:\u002F\u002Fwww.splunk.com\u002F",[264],"Splunk",[324,83432,83433,24268,83438],{},[55,83434,83437],{"href":83435,"rel":83436},"https:\u002F\u002Fgithub.com\u002Frdhabalia",[264],"Rajan Dhabalia",[55,83439,83442],{"href":83440,"rel":83441},"https:\u002F\u002Fwww.verizonmedia.com\u002F",[264],"Verizon Media",[324,83444,83445,24268,83449],{},[55,83446,806],{"href":83447,"rel":83448},"https:\u002F\u002Fgithub.com\u002Fsijie",[264],[55,83450,4496],{"href":10259},[324,83452,83453,24268,83457],{},[55,83454,82219],{"href":83455,"rel":83456},"https:\u002F\u002Fgithub.com\u002Fsrkukarni",[264],[55,83458,83430],{"href":83428,"rel":83459},[264],[324,83461,83462,24268,83467],{},[55,83463,83466],{"href":83464,"rel":83465},"https:\u002F\u002Fgithub.com\u002Fjerrypeng",[264],"Boyang Jerry Peng",[55,83468,83430],{"href":83428,"rel":83469},[264],[324,83471,83472,24268,83477],{},[55,83473,83476],{"href":83474,"rel":83475},"https:\u002F\u002Fgithub.com\u002Fivankelly",[264],"Ivan Brendan Kelly",[55,83478,83430],{"href":83428,"rel":83479},[264],[324,83481,83482,24268,83486],{},[55,83483,808],{"href":83484,"rel":83485},"https:\u002F\u002Fgithub.com\u002Fcodelipenghui",[264],[55,83487,83490],{"href":83488,"rel":83489},"http:\u002F\u002Fwww.zhaopin.com\u002F",[264],"Zhaopin.com",[324,83492,83493,24268,83498],{},[55,83494,83497],{"href":83495,"rel":83496},"https:\u002F\u002Fgithub.com\u002Fjiazhai",[264],"Jia Zhai",[55,83499,4496],{"href":10259},[48,83501,83502,83503,190],{},"To view other contributors, see ",[55,83504,83506],{"href":69560,"rel":83505},[264],"Pulsar contributor list",[40,83508,66091],{"id":39646},[48,83510,83511],{},"We invite you to join this fast-growing community. 
Together, we will continue to develop technology to meet today’s most innovative messaging and event-streaming use cases and to help companies unlock the value of real-time data.",[48,83513,83514,83515,83518,83519,83523,83524,10259,83529,83533],{},"Whether it is joining our ",[55,83516,31906],{"href":36242,"rel":83517},[264],", sharing your Pulsar story via a sponsored ",[55,83520,83522],{"href":83291,"rel":83521},[264],"webinar"," or case study, joining a ",[55,83525,83528],{"href":83526,"rel":83527},"https:\u002F\u002Fgithub.com\u002Fstreamnative\u002Ftgip",[264],"TGIP",[55,83530,83532],{"href":83282,"rel":83531},[264],"TGIP-CN",", or attending or speaking at the next Pulsar Summit, we look forward to connecting with you.",[48,83535,83536,83537,4003,83539,190],{},"You can also subscribe our mailing lists: ",[55,83538,78612],{"href":48254},[55,83540,78618],{"href":48257},{"title":18,"searchDepth":19,"depth":19,"links":83542},[83543,83544,83545,83549,83550],{"id":83269,"depth":19,"text":83270},{"id":48197,"depth":19,"text":83338},{"id":71555,"depth":19,"text":71556,"children":83546},[83547,83548],{"id":83376,"depth":279,"text":83377},{"id":83391,"depth":279,"text":83392},{"id":79577,"depth":19,"text":79225},{"id":39646,"depth":19,"text":66091},"2020-08-24","Apache Pulsar celebrates 300th contributor!","\u002Fimgs\u002Fblogs\u002F63d797be61830d71f325530c_63a37ae994af5327cc666af6_pulsar-300-contributors-top.webp",{},"\u002Fblog\u002Fapache-pulsar-celebrates-300th-contributor",{"title":83237,"description":83552},"blog\u002Fapache-pulsar-celebrates-300th-contributor",[821,303],"arQmLGKHKCWMVyG7j0ZI9FIDjRbuRT1BHCt7pGlgm20",{"id":83561,"title":83562,"authors":83563,"body":83564,"category":821,"createdAt":290,"date":84175,"description":84176,"extension":8,"featured":294,"image":84177,"isDraft":294,"link":290,"meta":84178,"navigation":7,"order":296,"path":84179,"readingTime":62820,"relatedResources":290,"seo":84180,"stem":84181,"tags":84182,"__hash__":84183},"blogs\u002Fblog\u002Fapache-pulsar-2-6-1.md","Apache Pulsar 2.6.1",[53434],{"type":15,"value":83565,"toc":84132},[83566,83569,83572,83574,83578,83589,83596,83600,83603,83606,83613,83617,83620,83626,83629,83636,83640,83643,83646,83653,83657,83660,83666,83669,83676,83680,83683,83690,83694,83697,83700,83711,83718,83722,83725,83732,83736,83739,83742,83749,83753,83756,83762,83765,83772,83776,83779,83786,83789,83793,83801,83804,83807,83813,83816,83823,83825,83829,83832,83839,83843,83846,83853,83857,83860,83863,83866,83873,83877,83880,83883,83886,83893,83897,83906,83914,83916,83920,83923,83930,83934,83937,83944,83948,83951,83954,83961,83965,83968,83975,83979,83982,83989,83991,83995,83998,84005,84009,84018,84024,84027,84034,84036,84040,84043,84050,84052,84055,84076,84079,84094,84099,84107,84110,84121,84129],[48,83567,83568],{},"We are very glad to see that the Apache Pulsar community has successfully released 2.6.1 version after a lot of hard work. It is a great milestone for this fast-growing project and the Pulsar community. 
Pulsar 2.6.1 is the result of a big effort from the community, with over 100 commits and a long list of improvements and bug fixes.",[48,83570,83571],{},"Here are some highlights and major features added in Pulsar 2.6.1.",[40,83573,61065],{"id":61064},[32,83575,83577],{"id":83576},"limit-the-batch-size-to-the-minimum-of-the-maxnumberofmessages-and-maxsizeofmessages","Limit the batch size to the minimum of the maxNumberOfMessages and maxSizeOfMessages",[1666,83579,83580,83583,83586],{},[324,83581,83582],{},"Batch size is not limited to the minimum of the maxNumberOfMessages and maxSizeOfMessages from the BatchReceive policy.",[324,83584,83585],{},"When the batch size is greater than the receiveQ of the consumer (for example, the batch size is 3000 and a receiveQ is 500), the following issue occurs:",[324,83587,83588],{},"In a multi-topic (pattern) consumer, the client stops receiving any messages. The client gets paused and never resumed when setting a timeout in the batch policy. Only one batch is fetched and the client is never resumed.",[48,83590,79754,83591,190],{},[55,83592,83595],{"href":83593,"rel":83594},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F6865",[264],"PR-6865",[32,83597,83599],{"id":83598},"fix-hash-range-conflict-issue-in-key_shared-subscription-with-sticky-hash-range","Fix hash range conflict issue in Key_Shared subscription with sticky hash range",[48,83601,83602],{},"In Key_Shared subscription where the stickyHashRange is used, consumers are not allowed to use interleaving hashes.",[48,83604,83605],{},"The pull request fixes the hash range conflict issue in Key_Shared with sticky hash range.",[48,83607,79754,83608,190],{},[55,83609,83612],{"href":83610,"rel":83611},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F7231",[264],"PR-7231",[32,83614,83616],{"id":83615},"fix-get-lookup-permission-error","Fix get lookup permission error",[48,83618,83619],{},"If the canProduce or canConsume method throws an exception, the canLookup method just throws the exception and does not check other permissions. The code snippet is as follows:",[8325,83621,83624],{"className":83622,"code":83623,"language":8330},[8328],"\ntry {\n    return canLookupAsync(topicName, role, authenticationData)\n            .get(conf.getZooKeeperOperationTimeoutSeconds(), SECONDS);\n}\n",[4926,83625,83623],{"__ignoreMap":18},[48,83627,83628],{},"PR-7234 invokes canLookupAsync. When Pulsar AuthorizationService checks lookup permission, if the user has the canProducer or canConsumer role, the user performs canLookup operations.",[48,83630,79754,83631,190],{},[55,83632,83635],{"href":83633,"rel":83634},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F7234",[264],"PR-7234",[32,83637,83639],{"id":83638},"avoid-introducing-null-read-position-for-the-managed-cursor","Avoid introducing null read position for the managed cursor",[48,83641,83642],{},"Avoid introducing null read position for the managed cursor. The most doubtful thing is the getNextValidPosition method in the ManagedLedgerImpl. 
If a given position is greater than the position added last time, it returns a null value, and the read position is also null.",[48,83644,83645],{},"In this PR, we add a log and print the stack trace to find the root cause and fallback to the next position if the null occurs at the next valid position.",[48,83647,79754,83648,190],{},[55,83649,83652],{"href":83650,"rel":83651},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F7264",[264],"PR-7264",[32,83654,83656],{"id":83655},"fix-error-in-creation-of-non-durable-cursor","Fix error in creation of non-durable cursor",[48,83658,83659],{},"An NPE occurs when we fail to create a non-durable cursor and continue to create the subscription instance.",[8325,83661,83664],{"className":83662,"code":83663,"language":8330},[8328],"\ntry {\n    cursor = ledger.newNonDurableCursor(startPosition, subscriptionName);\n} catch (ManagedLedgerException e) {\n    subscriptionFuture.completeExceptionally(e);\n}\nreturn new PersistentSubscription(this, subscriptionName, cursor, false);\n\n",[4926,83665,83663],{"__ignoreMap":18},[48,83667,83668],{},"Additionally, the NPE leads to the topic usage count increasing to 1. When deleting a topic, the topic cannot be deleted even if you use the force flag.",[48,83670,79754,83671,190],{},[55,83672,83675],{"href":83673,"rel":83674},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F7355",[264],"PR-7355",[32,83677,83679],{"id":83678},"avoid-an-npe-occurs-in-the-managedledgerimplisoffloadedneedsdelete-method","Avoid an NPE occurs in the ManagedLedgerImpl.isOffloadedNeedsDelete method",[48,83681,83682],{},"When the default value of the offload-deletion-lag is set to null, an NPE occurs. To fix the bug, null check is added in the ManagedLedgerImpl.isOffloadedNeedsDelete method.",[48,83684,79754,83685,190],{},[55,83686,83689],{"href":83687,"rel":83688},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F7389",[264],"PR-7389",[32,83691,83693],{"id":83692},"fix-producer-stuck-issue-due-to-npe-when-creating-a-new-ledger","Fix producer stuck issue due to NPE when creating a new ledger",[48,83695,83696],{},"NPE occurs when creating a ledger if the network address is unresolvable. If NPE occurs before adding the timeout task, the timeout mechanism does not work. The unresolvable network address is common in the Kubernetes environment. It happens when a bookie pod or a worker node restarts.",[48,83698,83699],{},"This pull request fixes from the following perspectives:",[1666,83701,83702,83705,83708],{},[324,83703,83704],{},"Catch the NPE when creating a new ledger.",[324,83706,83707],{},"When the timeout task is triggered, it always executes the callback. It is totally fine because we already have the logic to ensure the callback is triggered only once.",[324,83709,83710],{},"Add a mechanism to detect that the CreatingLedger state is not moving.",[48,83712,79754,83713,190],{},[55,83714,83717],{"href":83715,"rel":83716},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F7401",[264],"PR-7401",[32,83719,83721],{"id":83720},"fix-npe-when-using-advertisedlisteners","Fix NPE when using advertisedListeners",[48,83723,83724],{},"The broker failed to acquire ownership for the namespace bundle when using advertisedListeners=internal:pulsar:\u002F\u002Fnode1:6650,external:pulsar:\u002F\u002Fnode1.external:6650 with external listener name. 
Correct BrokerServiceUrlTls when TLS is not enabled.",[48,83726,79754,83727,190],{},[55,83728,83731],{"href":83729,"rel":83730},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F7620",[264],"PR-7620",[32,83733,83735],{"id":83734},"fix-the-issue-that-the-deduplication-cursor-cannot-be-deleted-after-message-deduplication-is-disabled","Fix the issue that the deduplication cursor cannot be deleted after message deduplication is disabled",[48,83737,83738],{},"When enabling the message deduplication in the broker.conf file, disabling it and then restarting the broker, the deduplication cursor is not deleted.",[48,83740,83741],{},"This PR fixes the issue, so when you disable message deduplication, you can delete the deduplication cursor.",[48,83743,79754,83744,190],{},[55,83745,83748],{"href":83746,"rel":83747},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F7656",[264],"PR-7656",[32,83750,83752],{"id":83751},"fix-the-issue-that-getlastentry-reads-entry-1","Fix the issue that GetLastEntry() reads entry -1",[48,83754,83755],{},"Previously, the code does not include a return statement. If the entry is set to -1, after sending code, the response reads the entry and sends a second response, as shown in the following example.",[8325,83757,83760],{"className":83758,"code":83759,"language":8330},[8328],"\n16:34:25.779 [pulsar-io-54-7:org.apache.bookkeeper.client.LedgerHandle@748] ERROR org.apache.bookkeeper.client.LedgerHandle - IncorrectParameterException on ledgerId:0 firstEntry:-1 lastEntry:-1\n16:34:25.779 [pulsar-client-io-82-1:org.apache.pulsar.client.impl.ConsumerImpl@1986] INFO  org.apache.pulsar.client.impl.ConsumerImpl - [persistent:\u002F\u002Fexternal-repl-prop\u002Fpulsar-function-admin\u002Fassignment][c-use-fw-localhost-0-function-assignment-initialize-reader-b21f7607c9] Successfully getLastMessageId 0:-1\n16:34:25.779 [pulsar-client-io-82-1:org.apache.pulsar.client.impl.ClientCnx@602] WARN  org.apache.pulsar.client.impl.ClientCnx - [id: 0xc78f4a0e, L:\u002F127.0.0.1:55657 - R:localhost\u002F127.0.0.1:55615] Received error from server: Failed to get batch size for entry org.apache.bookkeeper.mledger.ManagedLedgerException: Incorrect parameter input\n16:34:25.779 [pulsar-client-io-82-1:org.apache.pulsar.client.impl.ClientCnx@612] WARN  org.apache.pulsar.client.impl.ClientCnx - [id: 0xc78f4a0e, L:\u002F127.0.0.1:55657 - R:localhost\u002F127.0.0.1:55615] Received unknown request id from server: 10\n\n",[4926,83761,83759],{"__ignoreMap":18},[48,83763,83764],{},"PR-7495 adds a return statement to code, so GetLastEntry() reads the last entry, instead of -1.",[48,83766,79754,83767,190],{},[55,83768,83771],{"href":83769,"rel":83770},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F7495",[264],"PR-7495",[32,83773,83775],{"id":83774},"fix-the-error-of-updating-partitions-for-non-persistent-topic","Fix the error of updating partitions for non-persistent topic",[48,83777,83778],{},"When updating partitions on a non-persistent topic, Error 409 is returned. 
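For context, the operation in question is the partition update itself; a minimal sketch with the Java admin client (hypothetical service URL and topic name) is shown below, and before the fix such a call against a non-persistent partitioned topic could come back with the 409 error described above.

```java
import org.apache.pulsar.client.admin.PulsarAdmin;

public class UpdateNonPersistentPartitions {
    public static void main(String[] args) throws Exception {
        // Hypothetical admin service URL for illustration.
        PulsarAdmin admin = PulsarAdmin.builder()
                .serviceHttpUrl("http://localhost:8080")
                .build();

        // Increase the partition count of an existing non-persistent
        // partitioned topic; this is the call that previously failed with 409.
        admin.topics().updatePartitionedTopic(
                "non-persistent://public/default/my-partitioned-topic", 4);

        admin.close();
    }
}
```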
The pull request fixes partitions errors for non-persistent topics.",[48,83780,79754,83781,190],{},[55,83782,83785],{"href":83783,"rel":83784},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F7459",[264],"PR-7459",[40,83787,66218],{"id":83788},"zookeeper",[32,83790,83792],{"id":83791},"use-hostname-for-bookie-rack-awareness-mapping","Use hostname for bookie rack awareness mapping",[48,83794,38720,83795,83800],{},[55,83796,83799],{"href":83797,"rel":83798},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F5607",[264],"PR-5607",", the useHostName() is added with return false. The rack-aware policy passes the Bookie's hostname into an IP address and then uses that IP address to figure out to which rack the bookie belongs.",[48,83802,83803],{},"Then two issues occur: 1. The IP does not match the hostname which is recorded in the \u002Fbookies z-node 2. If there is an error in parsing the bookie hostname (eg: transient DNS error), an NPE is triggered and the BK client never realizes that this bookie is available in the cluster.",[48,83805,83806],{},"The exception is thrown at Line 77(as shown in the following code snippet), since getAddress() returns a null given that the address is parsed.",[8325,83808,83811],{"className":83809,"code":83810,"language":8330},[8328],"\n74        if (dnsResolver.useHostName()) {\n75            names.add(addr.getHostName());\n76        } else {\n77            names.add(addr.getAddress().getHostAddress());\n78        }\n\n",[4926,83812,83810],{"__ignoreMap":18},[48,83814,83815],{},"The default implementation for the DnsResolver.useHostName() returns true.",[48,83817,79754,83818,190],{},[55,83819,83822],{"href":83820,"rel":83821},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F7361",[264],"PR-7361",[40,83824,80071],{"id":72442},[32,83826,83828],{"id":83827},"fix-the-issue-that-the-http-header-used-in-athenz-authentication-can-not-be-renamed","Fix the issue that the HTTP header used in Athenz authentication can not be renamed",[48,83830,83831],{},"The authentication plugin for Athenz allows users to change the name of the HTTP header for sending an authentication token to a broker server with a parameter named roleHeader. The change uses the value of the roleHeader parameter on the AuthenticationAthenz side, and uses it directly as the header name.",[48,83833,79754,83834,190],{},[55,83835,83838],{"href":83836,"rel":83837},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F7311",[264],"PR-7311",[32,83840,83842],{"id":83841},"fix-the-issue-that-batch-ack-set-is-recycled-multiple-times","Fix the issue that batch ack set is recycled multiple times",[48,83844,83845],{},"The batch ack sets are recycled multiple times, due to race condition in group ack flush and cumulative Ack. So we add a recycled state check for the ack set in PR-7409, and fix the recycle issue.",[48,83847,79754,83848,190],{},[55,83849,83852],{"href":83850,"rel":83851},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F7409",[264],"PR-7409",[32,83854,83856],{"id":83855},"add-authentication-client-with-oauth2-support","Add authentication client with OAuth2 support",[48,83858,83859],{},"Pulsar supports authenticating clients using OAuth 2.0 access tokens. 
You can use tokens to identify a Pulsar client and associate with some \"principal\" (or \"role\") that is permitted to do some actions, for example, publish messages to a topic or consume messages from a topic.",[48,83861,83862],{},"This module is to support Pulsar Client Authentication Plugin for OAuth 2.0 directly. The client communicates with the Oauth 2.0 server, gets an access token from the Oauth 2.0 server, and passes the access token to Pulsar broker to do the authentication.",[48,83864,83865],{},"So, the broker can use org.apache.pulsar.broker.authentication.AuthenticationProviderToken, and the user can add their own AuthenticationProvider to work with this module.",[48,83867,79754,83868,190],{},[55,83869,83872],{"href":83870,"rel":83871},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F7420",[264],"PR-7420",[32,83874,83876],{"id":83875},"not-subscribe-to-the-topic-when-the-consumer-is-closed","Not subscribe to the topic when the consumer is closed",[48,83878,83879],{},"Fix race condition on the closed consumer while reconnecting to the broker.",[48,83881,83882],{},"The race condition happens when the consumer reconnects to the broker. The connection of the consumer is set to null when the consumer reconnects to the broker. If the consumer is not connected to broker at this time, the client does not send the consumer command to the broker. So, when the consumer reconnects to the broker, the consumer sends the subscribe command again.",[48,83884,83885],{},"This pull request adds a state check when the connectionOpened() of the consumer opens. If the consumer is in closing or closed state, the consumer does not send the subscribe command.",[48,83887,79754,83888,190],{},[55,83889,83892],{"href":83890,"rel":83891},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F7589",[264],"PR-7589",[32,83894,83896],{"id":83895},"oauth2-authentication-plugin-uses-asynchttpclient","OAuth2 authentication plugin uses AsyncHttpClient",[48,83898,83899,83900,83905],{},"Previously, the OAuth2 client authentication plugin used Apache HTTP client lib to make requests, Apache HTTP client is used to validate hostname. As suggested in ",[55,83901,83904],{"href":83902,"rel":83903},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fissues\u002F7612",[264],"#7612",", we get rid of the dependency of using Apache HTTP client.",[48,83907,83908,83909,190],{},"In PR-7615, OAuth2 client authentication plugin uses AsyncHttpClient, which is used in client and broker. For more information about implementation, see ",[55,83910,83913],{"href":83911,"rel":83912},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F7615",[264],"PR-7615",[40,83915,80126],{"id":80125},[32,83917,83919],{"id":83918},"cpp-oauth2-authentication-client","CPP Oauth2 authentication client",[48,83921,83922],{},"Pulsar supports authenticating clients using OAuth 2.0 access tokens. You can use tokens to identify a Pulsar client and associate with some \"principal\" (or \"role\") that is permitted to do some actions (eg: publish messages to a topic or consume messages from a topic). 
This change tries to support it in cpp client.",[48,83924,79754,83925,190],{},[55,83926,83929],{"href":83927,"rel":83928},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F7467",[264],"PR-7467",[32,83931,83933],{"id":83932},"fix-partition-index-error-in-close-callback","Fix partition index error in close callback",[48,83935,83936],{},"In partitioned producer\u002Fconsumer's close callback, the partition index is always 0. The ProducerImpl\u002FConsumerImpl internal partition index field should be passed to PartitionedProducerImpl\u002FPartitionedConsumerImpl close callback.",[48,83938,79754,83939,190],{},[55,83940,83943],{"href":83941,"rel":83942},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F7282",[264],"PR-7282",[32,83945,83947],{"id":83946},"fix-segment-crashes-caused-by-race-condition-of-timer-in-cpp-client","Fix segment crashes caused by race condition of timer in CPP client",[48,83949,83950],{},"Segment crashes occur in a race condition: - The close operation calls the keepAliveTimer_.reset(). - The keepAliveTimer is called by startConsumerStatsTimer and handleKeepAliveTimeout methods. Actually, the keepAliveTimer should not be called by those two methods.",[48,83952,83953],{},"This pull request fixes those issues.",[48,83955,79754,83956,190],{},[55,83957,83960],{"href":83958,"rel":83959},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F7572",[264],"PR-7572",[32,83962,83964],{"id":83963},"add-support-to-read-credentials-from-file","Add support to read credentials from file",[48,83966,83967],{},"Support reading credentials from a file to make it align with the Java client.",[48,83969,79754,83970,190],{},[55,83971,83974],{"href":83972,"rel":83973},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F7606",[264],"PR-7606",[32,83976,83978],{"id":83977},"fix-multi-topic-consumer-segfault-on-connection-error","Fix multi-topic consumer segfault on connection error",[48,83980,83981],{},"The multi-topic consumer triggers a segfault when an error occurs in creating a consumer. This is due to the calls to close the partial consumers with a null callback.",[48,83983,79754,83984,190],{},[55,83985,83988],{"href":83986,"rel":83987},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F7588",[264],"PR-7588",[40,83990,9636],{"id":80225},[32,83992,83994],{"id":83993},"use-fully-qualified-hostname-as-default-to-advertise-worker","Use fully qualified hostname as default to advertise worker",[48,83996,83997],{},"There is a difference in getting hostnames between Java 8 and Java 11. In Java 8, InetAddress.getLocalHost().getHostName() returns the fully qualified hostname; in Java 11, it returns a simple hostname. In this case, we should rather use the getCanonicalHostName(), which returns the fully qualified hostname. This is the same method to get the advertised address for workers as well.",[48,83999,79754,84000,190],{},[55,84001,84004],{"href":84002,"rel":84003},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F7360",[264],"PR-7360",[32,84006,84008],{"id":84007},"fix-the-function-bc-issue-introduced-in-release-260","Fix the function BC issue introduced in release 2.6.0",[48,84010,84011,84012,84017],{},"A backwards compatibility breakage is introduced in ",[55,84013,84016],{"href":84014,"rel":84015},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F5985",[264],"PR-5985",". 
When the running function workers are separated from brokers, updating workers and brokers independently from release 2.5.0 to 2.6.0 results in the following error:",[8325,84019,84022],{"className":84020,"code":84021,"language":8330},[8328],"\njava.lang.NullPointerException: null\\n\\tat java.net.URI$Parser.parse(URI.java:3104) ~[?:?]\njava.net.URI.(URI.java:600) ~[?:?]\\n\\tat java.net.URI.create(URI.java:881) ~[?:?]\norg.apache.pulsar.functions.worker.WorkerUtils.initializeDlogNamespace(WorkerUtils.java:160) ~[org.apache.pulsar-pulsar-functions-worker-2.7.0-SNAPSHOT.jar:2.7.0-SNAPSHOT]\norg.apache.pulsar.functions.worker.Worker.initialize(Worker.java:155) ~[org.apache.pulsar-pulsar-functions-worker-2.7.0-SNAPSHOT.jar:2.7.0-SNAPSHOT] \norg.apache.pulsar.functions.worker.Worker.start(Worker.java:69) ~[org.apache.pulsar-pulsar-functions-worker-2.7.0-SNAPSHOT.jar:2.7.0-SNAPSHOT] \norg.apache.pulsar.functions.worker.FunctionWorkerStarter.main(FunctionWorkerStarter.java:67) [org.apache.pulsar-pulsar-functions-worker-2.7.0-SNAPSHOT.jar:2.7.0-SNAPSHOT]\n \n",[4926,84023,84021],{"__ignoreMap":18},[48,84025,84026],{},"This is because the broker 2.5.0 supports \"bookkeeperMetadataServiceUri\" and the admin client returns a null field, thus causing the NPE.",[48,84028,79754,84029,190],{},[55,84030,84033],{"href":84031,"rel":84032},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F7528",[264],"PR-7528",[40,84035,80341],{"id":80341},[32,84037,84039],{"id":84038},"support-tlsallowinsecureconnection-in-pulsar-perf-produceconsumeread-performance-tests","Support tlsAllowInsecureConnection in pulsar-perf produce\u002Fconsume\u002Fread performance tests",[48,84041,84042],{},"Add tlsAllowInsecureConnection config to the CLI tool pulsar-perf, to support produce\u002Fconsume\u002Fread performance tests to clusters with insecure TLS connections.",[48,84044,79754,84045,190],{},[55,84046,84049],{"href":84047,"rel":84048},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F7300",[264],"PR-7300",[40,84051,52473],{"id":52472},[32,84053,75345],{"id":84054},"pulsar",[321,84056,84057,84063],{},[324,84058,84059,84060,190],{},"To download Apache Pulsar 2.6.1, click ",[55,84061,267],{"href":53730,"rel":84062},[264],[324,84064,84065,84066,4003,84071,190],{},"For more information about Apache Pulsar 2.6.1, see ",[55,84067,84070],{"href":84068,"rel":84069},"https:\u002F\u002Fpulsar.apache.org\u002Frelease-notes\u002F#2.6.1",[264],"2.6.1 release notes",[55,84072,84075],{"href":84073,"rel":84074},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpulls?q=is%3Apr+label%3Arelease%2F2.6.1+is%3Aclosed",[264],"2.6.1 PR list",[48,84077,84078],{},"If you have any questions or suggestions, contact us by Pulsar mailing list or Slack.",[321,84080,84081,84088],{},[324,84082,84083,84084,4003,84086,190],{},"Pulsar mailing list: ",[55,84085,78612],{"href":78611},[55,84087,78618],{"href":78617},[324,84089,84090,84091,190],{},"Pulsar Slack: ",[55,84092,36242],{"href":36242,"rel":84093},[264],[48,84095,78633,84096,190],{},[55,84097,75345],{"href":36230,"rel":84098},[264],[48,84100,84101,84102,190],{},"This post was originally published by Xiaolong Ran on ",[55,84103,84106],{"href":84104,"rel":84105},"https:\u002F\u002Fpulsar.apache.org\u002Fblog\u002F2020\u002F08\u002F21\u002FApache-Pulsar-2-6-1\u002F",[264],"Apache Pulsar blog",[32,84108,4496],{"id":84109},"streamnative",[48,84111,84112,84113,84116,84117,190],{},"If you are interested in Pulsar community news, Pulsar development details, and Pulsar 
user stories on production, navigate to ",[55,84114,84115],{"href":10259},"StreamNative website"," or follow ",[55,84118,84120],{"href":33664,"rel":84119},[264],"@streamnativeio on Twitter",[48,84122,84123,84124,190],{},"If you are interested in Pulsar examples, demos, tools, and extensions, check out ",[55,84125,84128],{"href":84126,"rel":84127},"https:\u002F\u002Fgithub.com\u002Fstreamnative",[264],"StreamNative GitHub",[48,84130,84131],{},"Start your journey now with StreamNative!",{"title":18,"searchDepth":19,"depth":19,"links":84133},[84134,84147,84150,84157,84164,84168,84171],{"id":61064,"depth":19,"text":61065,"children":84135},[84136,84137,84138,84139,84140,84141,84142,84143,84144,84145,84146],{"id":83576,"depth":279,"text":83577},{"id":83598,"depth":279,"text":83599},{"id":83615,"depth":279,"text":83616},{"id":83638,"depth":279,"text":83639},{"id":83655,"depth":279,"text":83656},{"id":83678,"depth":279,"text":83679},{"id":83692,"depth":279,"text":83693},{"id":83720,"depth":279,"text":83721},{"id":83734,"depth":279,"text":83735},{"id":83751,"depth":279,"text":83752},{"id":83774,"depth":279,"text":83775},{"id":83788,"depth":19,"text":66218,"children":84148},[84149],{"id":83791,"depth":279,"text":83792},{"id":72442,"depth":19,"text":80071,"children":84151},[84152,84153,84154,84155,84156],{"id":83827,"depth":279,"text":83828},{"id":83841,"depth":279,"text":83842},{"id":83855,"depth":279,"text":83856},{"id":83875,"depth":279,"text":83876},{"id":83895,"depth":279,"text":83896},{"id":80125,"depth":19,"text":80126,"children":84158},[84159,84160,84161,84162,84163],{"id":83918,"depth":279,"text":83919},{"id":83932,"depth":279,"text":83933},{"id":83946,"depth":279,"text":83947},{"id":83963,"depth":279,"text":83964},{"id":83977,"depth":279,"text":83978},{"id":80225,"depth":19,"text":9636,"children":84165},[84166,84167],{"id":83993,"depth":279,"text":83994},{"id":84007,"depth":279,"text":84008},{"id":80341,"depth":19,"text":80341,"children":84169},[84170],{"id":84038,"depth":279,"text":84039},{"id":52472,"depth":19,"text":52473,"children":84172},[84173,84174],{"id":84054,"depth":279,"text":75345},{"id":84109,"depth":279,"text":4496},"2020-08-21","Learn the most interesting and major features added to Pulsar 2.6.1.","\u002Fimgs\u002Fblogs\u002F63d797d7acb94784fd9d9a37_63a379687ccf63ef4dbed36a_261-top.webp",{},"\u002Fblog\u002Fapache-pulsar-2-6-1",{"title":83562,"description":84176},"blog\u002Fapache-pulsar-2-6-1",[302,821],"3mLi76TWFK1ssHOd3T90eky-zmufOkQtQkuuakl9YH8",{"id":84185,"title":84186,"authors":84187,"body":84189,"category":3550,"createdAt":290,"date":84316,"description":84317,"extension":8,"featured":294,"image":84318,"isDraft":294,"link":290,"meta":84319,"navigation":7,"order":296,"path":84320,"readingTime":11508,"relatedResources":290,"seo":84321,"stem":84322,"tags":84323,"__hash__":84324},"blogs\u002Fblog\u002Fannouncing-streamnative-cloud-apache-pulsar-as-a-service.md","Announcing StreamNative Cloud - Apache Pulsar as a Service",[84188,806],"Eron Wright",{"type":15,"value":84190,"toc":84306},[84191,84194,84197,84200,84203,84206,84208,84211,84214,84217,84220,84224,84227,84230,84244,84247,84251,84254,84257,84261,84264,84267,84270,84274,84277,84281,84294,84304],[40,84192,84193],{"id":36542},"INTRO",[48,84195,84196],{},"As companies look to develop real-time data streaming capabilities, they must find new technologies to support these initiatives. 
With Pulsar’s ability to provide a unified messaging model, including both streaming and long-term storage capabilities, built-in multi-tenancy, and instant scalability via its multilayer architecture, it is increasingly the top choice for companies looking to build next-generation messaging and event streaming applications.",[48,84198,84199],{},"We’re excited to announce StreamNative Cloud, providing Apache Pulsar®-as-a-Service. For organizations looking to leverage Pulsar as the backbone for their real-time data, for core business applications, or as their microservice messaging platform, StreamNative Cloud is the simple, fast, reliable, and cost-effective way to run Pulsar in the cloud.",[48,84201,84202],{},"With StreamNative Cloud, we provide a turnkey solution to help organizations make the transition to a “streaming first” architecture. StreamNative Cloud enables developers to focus on building applications, instead of managing and maintaining complex systems and data services. Now, developers can spin up a Pulsar-based messaging and event streaming service in the public cloud in minutes.",[48,84204,84205],{},"As the company behind Apache Pulsar, StreamNative’s mission is to empower Pulsar users and to help drive adoption in this fast-growing open-source community. StreamNative Cloud helps companies to adopt and integrate Pulsar, without the heavy-lifting.",[32,84207,77553],{"id":77552},[48,84209,84210],{},"StreamNative Cloud enables users to enjoy all the benefits that make Pulsar a next-generation cloud-native messaging and event streaming technology. This out-of-the-box solution helps organizations by accelerating application development and improving time-to-market.",[48,84212,84213],{},"By taking on the responsibility of cluster management, StreamNative’s Cloud offering enables teams to focus on building the applications and products needed to achieve their business goals, without having to worry about the management and maintenance of the messaging platform.",[48,84215,84216],{},"With StreamNative Cloud, users can get started on Pulsar in just minutes by creating a cluster in the web UI or by automating the creation of clusters through StreamNative’s command line tools. All other details, including software updates, configurations, and security patches, will be taken care of by the StreamNative team.",[48,84218,84219],{},"In addition to cluster management, StreamNative Cloud also offers a robust UI for managing the complete set of core Pulsar features, such as tenants, namespaces, topics, schemas, and more.",[32,84221,84223],{"id":84222},"support-for-pulsar-and-the-pulsar-ecosystem","Support for Pulsar and the Pulsar Ecosystem",[48,84225,84226],{},"StreamNative Cloud is built on the open-source foundation of Apache Pulsar, providing full compatibility with Pulsar’s APIs and protocol. This enables users to adopt everything in Pulsar’s thriving open-source ecosystem and to work seamlessly with all open-source tools.",[48,84228,84229],{},"Over the last year, StreamNative has created several tools to help developers leverage Pulsar’s powerful messaging and streaming ecosystem. 
Some highlights include:",[321,84231,84232,84235,84238,84241],{},[324,84233,84234],{},"Kafka-on-Pulsar, which provides seamless Kafka integration for Kafka 2.0 protocol",[324,84236,84237],{},"AMQP-on-Pulsar, which provides support for applications written in AMQP-0.9.1 protocol",[324,84239,84240],{},"MQTT-on-Pulsar for building IoT applications",[324,84242,84243],{},"StreamNative Hub, which hosts several connections and integrations",[48,84245,84246],{},"While support for Pulsar Functions, IO Connectors, Pulsar SQL and some of the StreamNative tools are still in progress, these will all be available in StreamNative Cloud in the future.",[32,84248,84250],{"id":84249},"committed-to-open-source","Committed To Open-Source",[48,84252,84253],{},"StreamNative is committed to open source and to providing companies with the flexibility to avoid vendor lock-in. Instead of worrying about lock-in via cloud-provider messaging services, proprietary APIs, or closed source extensions that are critical to a messaging solution, StreamNative Cloud provides Pulsar without the restrictions that make migration difficult.",[48,84255,84256],{},"Whether you want to move cloud providers with StreamNative Cloud, or even move away from StreamNative Cloud to a self-hosted Pulsar instance, the core features of Pulsar that make it powerful and easy to operate will continue to be available. Pulsar’s unified messaging model, built-in multi-tenancy, tiered storage and much more, will always be integral to core Pulsar and StreamNative will continue to improve these features in the open-source distribution.",[32,84258,84260],{"id":84259},"streamnatives-cloud-offering","StreamNative’s Cloud Offering",[48,84262,84263],{},"StreamNative’s Cloud provides flexible options to make getting started easy, including both Cloud-Hosted and Cloud-Managed options:",[48,84265,84266],{},"The Cloud-Hosted service provides the ability to spin up a StreamNative-hosted Pulsar cluster on a cloud provider of your choice in just minutes. (Today, we support Google Cloud, with AWS and more cloud providers coming soon!) The cluster provisioning is fully automated. We take care of managing both the infrastructure and software to ensure a scalable, resilient, and secure messaging and event streaming platform so that teams can focus on building applications.",[48,84268,84269],{},"With our Cloud-Managed service, StreamNative offers a fully-managed Pulsar cluster deployable to a public or private cloud environment, fully customized to meet user needs. With this option, users can ensure the data stays in their environment in order to meet any security and compliance requirements. We manage Pulsar so that users don’t have to spend time or resources to deploy, upgrade, and maintain clusters.",[32,84271,84273],{"id":84272},"powered-by-apache-pulsars-core-developers","Powered By Apache Pulsar’s Core Developers",[48,84275,84276],{},"As the core developers of Pulsar, the StreamNative team is deeply versed in the technology, the community, and the use cases, and has experience operating Pulsar in large scale production environments, including at both Twitter and Yahoo!. 
The StreamNative team’s unmatched operational experience on Pulsar and Bookkeeper is now available to you through StreamNative Cloud.",[32,84278,84280],{"id":84279},"beta-access-available-now","Beta Access Available Now",[48,84282,84283,84284,84288,84289,84293],{},"StreamNative Cloud is a fully managed, scalable messaging and event streaming service that provides a turnkey solution for enterprise companies looking to build and launch event streaming applications in the cloud. We are offering Beta Access of the ",[55,84285,84287],{"href":84286},"\u002Fcloud\u002Fhosted","StreamNative Cloud-Hosted service"," to a handful of users first before the general release, ",[55,84290,29176],{"href":84291,"rel":84292},"https:\u002F\u002Fconsole.streamnative.cloud\u002F?defaultMethod=singup",[264]," now to be part of the StreamNative Cloud Beta Access program.",[48,84295,84296,84297,84300,84301,84303],{},"If you are interested in a ",[55,84298,84299],{"href":66747},"StreamNative-Managed cluster"," in your own environment, please ",[55,84302,24379],{"href":6392},". We can get you started immediately.",[48,84305,3931],{},{"title":18,"searchDepth":19,"depth":19,"links":84307},[84308],{"id":36542,"depth":19,"text":84193,"children":84309},[84310,84311,84312,84313,84314,84315],{"id":77552,"depth":279,"text":77553},{"id":84222,"depth":279,"text":84223},{"id":84249,"depth":279,"text":84250},{"id":84259,"depth":279,"text":84260},{"id":84272,"depth":279,"text":84273},{"id":84279,"depth":279,"text":84280},"2020-08-18","Announce StreamNative Cloud, providing Apache Pulsar®-as-a-Service. For organizations looking to leverage Pulsar as the backbone for their real-time data, for core business applications, or as their microservice messaging platform, StreamNative Cloud is the simple, fast, reliable, and cost-effective way to run Pulsar in the cloud.","\u002Fimgs\u002Fblogs\u002F63d797fb8be3dd802d4db67d_63a378da641e2f20abdccbf2_cloud-announcement.webp",{},"\u002Fblog\u002Fannouncing-streamnative-cloud-apache-pulsar-as-a-service",{"title":84186,"description":84317},"blog\u002Fannouncing-streamnative-cloud-apache-pulsar-as-a-service",[302,3550,821,27847,303],"9L5sknphdr9E7QYfpF6abK8P7VKRdQYZm55ECKwUauA",{"id":84326,"title":84327,"authors":84328,"body":84329,"category":821,"createdAt":290,"date":84687,"description":84688,"extension":8,"featured":294,"image":84689,"isDraft":294,"link":290,"meta":84690,"navigation":7,"order":296,"path":84691,"readingTime":31039,"relatedResources":290,"seo":84692,"stem":84693,"tags":84694,"__hash__":84695},"blogs\u002Fblog\u002Fapache-pulsar-adoption-why-companies-use-streaming-messaging-platform.md","Apache Pulsar Adoption: Why Companies Use the Streaming and Messaging Platform",[69353,60441,806],{"type":15,"value":84330,"toc":84665},[84331,84338,84340,84343,84346,84349,84352,84355,84358,84362,84365,84373,84376,84379,84383,84390,84394,84402,84406,84414,84417,84421,84424,84427,84431,84434,84437,84440,84444,84447,84450,84453,84455,84463,84467,84470,84473,84476,84479,84482,84485,84489,84492,84496,84527,84530,84533,84537,84546,84560,84569,84577,84579,84581,84584,84586,84589,84592,84594,84597,84600,84603,84606,84609,84614,84618,84630,84637,84639,84642,84646,84663],[48,84332,84333,84334,84337],{},"This is Part 2 of a two-part series in which we share our perspectives on Pulsar vs. Kafka. In ",[55,84335,47846],{"href":84336},"\u002Fblog\u002Ftech\u002Fpulsar-vs-kafka-part-1",", we compared Pulsar and Kafka from an engineering perspective and discussed performance, architecture, and features. 
In Part 2, we aim to provide a broader business perspective by sharing insights into Pulsar's rapidly growing popularity.",[40,84339,84193],{"id":36542},[48,84341,84342],{},"Data is transforming the business landscape with major industry leaders like Amazon, Uber, and Netflix demonstrating how access to real-time data, data messaging, and processing capabilities can translate to better products and customer experiences, disrupt entire industries, and generate billions in revenue. This need for real-time insights across industries is driving adoption and innovation in the messaging space.",[48,84344,84345],{},"As companies look to adopt real-time streaming solutions for new, innovative applications and to improve their existing systems, business leaders are seeking to better understand the respective advantages and disadvantages associated with the top technologies in the space, namely Pulsar, Kafka, and RabbitMQ.",[48,84347,84348],{},"Today, companies' messaging needs are increasingly complex and many organizations require a more comprehensive solution than RabbitMQ or Kafka can provide on their own. While RabbitMQ is best suited for message queueing and Kafka can manage data pipelines, Pulsar can accomplish both.",[48,84350,84351],{},"Companies that have a need for both types of messaging are increasingly choosing Pulsar for its flexibility, scalability, and ability to simplify operations by delivering multiple messaging functions on the same platform. Pulsar provides unique, sought-after capabilities, such as unified messaging and the ability to build streaming-first applications, which are powering some of today's most advanced companies.",[48,84353,84354],{},"However, because Pulsar is a younger technology, some are less familiar with its capabilities. In this post, we will address some common misconceptions about Pulsar and show Pulsar's growing popularity as evidenced by its rapid growth in adoption, an increase in the number and variety of use cases, and its ever-expanding community. We will also address the risks associated with adopting a new technology and explain why maintaining the status quo presents the risk of being left behind in a quickly changing landscape.",[48,84356,84357],{},"We have chosen to frame our discussion around commonly asked questions.",[40,84359,84361],{"id":84360},"_1-how-mature-is-pulsars-technology-and-has-it-been-tested-in-real-world-applications","#1: How mature is Pulsar's technology and has it been tested in real-world applications?",[48,84363,84364],{},"To provide some insight into Pulsar's maturity and real-world use cases, we'll start with a brief background on its origin and development.",[48,84366,84367,84372],{},[55,84368,84371],{"href":84369,"rel":84370},"https:\u002F\u002Fyahooeng.tumblr.com\u002Fpost\u002F150078336821\u002Fopen-sourcing-pulsar-pub-sub-messaging-at-scale#notes?ref_url=https:\u002F\u002Fyahooeng.tumblr.com\u002Fpost\u002F150078336821\u002Fopen-sourcing-pulsar-pub-sub-messaging-at-scale\u002Fembed#_=_",[264],"Pulsar's development began within Yahoo"," in 2012. It was committed to open source in 2016 and became a top-level Apache project in 2018. It has enterprise support from StreamNative. Pulsar enjoys several advantages as a newer entrant into the messaging space. Specifically, its developers at Yahoo had worked on Kafka and other traditional messaging technologies previously and knew the shortcomings associated with these platforms first-hand. 
As a result, they designed Pulsar with some distinct advantages that make it easier to operate as well as to provide features - such as unified messaging and tiered storage - which introduce new capabilities that are well-suited for emerging use cases.",[48,84374,84375],{},"By comparison, Kafka originated within LinkedIn. It was committed to open source in 2011 and became a top-level Apache project in 2012. As the first major event-streaming platform on the market, it is widely recognized and widely adopted. Kafka receives enterprise support from a number of companies, including Confluent. Compared to Pulsar, Kafka is a more mature technology that is popular, has a bigger community, and a more advanced ecosystem.",[48,84377,84378],{},"Pulsar has seen tremendous growth, particularly over the past 18 months. It has been adopted by a growing list of global media companies, technology companies, and financial institutions. Below are examples of significant enterprise-level use cases that illustrate Pulsar's ability to handle mission-critical applications.",[32,84380,84382],{"id":84381},"tencent-builds-their-payment-platform-on-pulsar","Tencent Builds Their Payment Platform on Pulsar",[48,84384,84385,84389],{},[55,84386,84388],{"href":84387},"\u002Fblog\u002Ftech\u002F2019-10-22-powering-tencent-billing-platform-with-apache-pulsar\u002F","Tencent's adoption"," of Pulsar for their transactional billing system, Midas, demonstrates Pulsar's ability to handle mission-critical applications and provides compelling evidence that the technology has been rigorously tested and performs well in demanding environments. Midas operates at a massive scale, processing more than 10 billion transactions and 10+ TBs of data daily. The billing system is a critical piece of infrastructure for a company with over $50 billion in annual revenue.",[32,84391,84393],{"id":84392},"five-years-of-success-at-verizon-media","Five Years of Success at Verizon Media",[48,84395,84396,84397,84401],{},"Verizon Media provides another compelling use case, having successfully operated Pulsar in production for over five years. Verizon Media, via its acquisition of Yahoo, is the original developer of Pulsar. In their recent ",[55,84398,84400],{"href":83352,"rel":84399},[264],"Pulsar Summit talk",", Joe Francis and Ludwig Pummer of Verizon Media described Pulsar as a \"battle-tested\" system that is being used throughout the Verizon Media landscape. They shared that Pulsar routinely handles up to 3 million write requests\u002Fsecond on more than 2.8 million distinct topics. Pulsar has satisfied Verizon Media's need for a low-latency, highly available system that can be scaled easily and has the ability to support a business that operates across six global data centers.",[32,84403,84405],{"id":84404},"splunk-adopts-pulsar-for-their-data-stream-processor","Splunk Adopts Pulsar for Their Data Stream Processor",[48,84407,84408,84409,84413],{},"Another key adoption story comes from Splunk, a company that has used Kafka in production environments for years. During a recent Pulsar Summit talk, \"",[55,84410,84412],{"href":83358,"rel":84411},[264],"Why Splunk Chose Pulsar","\", Karthik Ramasamy shared Splunk's reasons for choosing Pulsar to power its next-generation analytics product, Splunk DSP, which handles billions of events per day. 
Ramasamy explained that Pulsar was able to meet 18 key requirements and cited its ease of scalability, lower operating costs, better performance, and strong open-source community as major factors in their decision to adopt Pulsar.",[48,84415,84416],{},"The above use cases clearly demonstrate that Pulsar is a powerful solution that many industry leaders are choosing to power critical business infrastructure. Although Kafka is more mature and more widely used, Pulsar's rapid rate of adoption is evidence of its strong capabilities and readiness for mission-critical use cases.",[40,84418,84420],{"id":84419},"_2-what-are-the-key-differences-between-the-competing-technologies-and-what-business-advantages-are-associated-with-each","#2: What are the key differences between the competing technologies, and what business advantages are associated with each?",[48,84422,84423],{},"While major technology and media companies, such as Uber and Netflix, have been able to successfully build unified batch and stream processing and streaming-first applications to power their real-time data needs, most companies lack the vast engineering and financial resources these applications typically require. However, Pulsar offers advanced messaging capabilities that enable companies to overcome many of these challenges.",[48,84425,84426],{},"Below, we highlight three unique capabilities - some current and others still in development - that distinctly set Pulsar apart from its competitors.",[32,84428,84430],{"id":84429},"unified-messaging-model","Unified Messaging Model",[48,84432,84433],{},"Two of the most common types of messaging used today are application messaging (traditional queuing systems) and data pipelines. Application messaging is used to enable asynchronous communications (often developed on platforms such as RabbitMQ, AMQP, JMS, among others), while data pipelines are used to move high volumes of data between different systems (such as Apache Kafka or AWS Kinesis). Because these two types of messaging are performed on different systems and serve different functions, companies often need to operate both. Developing and managing separate systems is not only expensive and complex, but can also make it difficult to integrate systems and centralize data.",[48,84435,84436],{},"Pulsar's core technology gives users the ability both to deploy it as a traditional queuing system and use it in data pipelines, uniquely positioning Pulsar as the ideal platform to provide unified messaging capabilities. Unified messaging makes it easier for organizations to capture and distribute their data, which facilitates the use of real-time data to drive business innovation.",[48,84438,84439],{},"Pulsar also recently added tools - Kafka-on-Pulsar (KoP) and AMQP-on-Pulsar (AoP) - that make it even easier for companies to leverage these unified messaging capabilities. (We discuss KoP and AoP in more detail below.)",[32,84441,84443],{"id":84442},"batch-and-event-stream-storage","Batch and Event-Stream Storage",[48,84445,84446],{},"Because companies today need to be able to make timely decisions and react to change quickly, the need for real-time, meaningful data has never been more critical. At the same time, it is crucial to be able to integrate and understand large amounts of historical data in order to gain a complete picture of a business.",[48,84448,84449],{},"Traditional Big Data systems (such as Hadoop) facilitate decision-making by allowing organizations to analyze massive historical data sets. 
However, as these systems can take minutes, hours, or even days to process data, they struggle to integrate real-time data and the results they produce are often of limited value.",[48,84451,84452],{},"Stream processors, such as Kafka Streams, are adept at processing streaming data and computing answers closer to real-time, but are not a good fit for processing large historical datasets. Many organizations need to run both batch and streaming data processors in order to gain the insights they need for their business. However, maintaining multiple systems is expensive and each system has its own respective challenges.",[48,84454,71872],{},[48,84456,84457,84458,84462],{},"By contrast, ",[55,84459,84461],{"href":75514,"rel":84460},[264],"Pulsar's tiered storage model"," provides the batch storage capabilities needed to support batch processing in Flink. In the near future, Flink's batch processing capabilities will be integrated with Pulsar, enabling companies to query both historical and real-time data quickly and more easily, unlocking a unique competitive advantage.",[32,84464,84466],{"id":84465},"streaming-first-applications","\"Streaming-First\" Applications",[48,84468,84469],{},"Web application development is in the midst of a major transformation as companies look to develop more sophisticated software. The traditional application model that pairs a single monolithic application with a large SQL database is giving way to applications composed of many, smaller components, or \"microservices.\"",[48,84471,84472],{},"Many organizations are now adopting microservices because they offer greater flexibility to meet changing business needs and help facilitate development across growing engineering teams. However, microservices introduce new challenges, such as the need to enable communication among various components and keep them synchronized.",[48,84474,84475],{},"With a newer microservices technique called \"event sourcing,\" applications produce and broadcast streams of events into a shared messaging system which captures the event history in a centralized log. This improves the flow of data and helps keep applications in sync.",[48,84477,84478],{},"But event sourcing can be difficult to implement as it requires both traditional messaging capabilities and the ability to store event history for long periods of time. While Kafka is capable of storing streams of events for days or weeks, event sourcing typically requires longer retention times. This added challenge often requires users to build multiple tiers of Kafka clusters to manage the growth of event data, plus additional systems to manage and track data collectively.",[48,84480,84481],{},"By contrast, Pulsar's unified messaging model is a natural fit, as it can easily distribute events to other components and effectively store event streams for indefinite periods of time. This unique design feature makes Pulsar especially attractive to companies looking to acquire dynamic, streaming-first capabilities.",[48,84483,84484],{},"While unified messaging, combined batch and event-streaming storage, and a \"streaming-first\" approach might be feasible to achieve with other systems, these features would be complex to implement and would require a great deal of effort and investment. 
In contrast, Pulsar's design includes all of these features, enabling users to adapt to the changing technology landscape easily and with far less complexity.",[40,84486,84488],{"id":84487},"_3-does-pulsar-have-the-community-and-enterprise-support-it-needs-to-continue-to-develop-and-garner-further-adoption","#3: Does Pulsar have the community and enterprise support it needs to continue to develop and garner further adoption?",[48,84490,84491],{},"A snapshot comparison of the Pulsar and Kafka communities today reflects that Kafka's is larger overall, with more Slack users and more stack overflow questions. While Pulsar's community is currently smaller, it is highly engaged and rapidly growing. Below are some highlights of its recent momentum.",[32,84493,84495],{"id":84494},"pulsars-first-global-summit","Pulsar's First Global Summit",[48,84497,84498,84499,84504,84505,84509,84510,1186,84514,1186,84518,5422,84522,190],{},"In June, ",[55,84500,84503],{"href":84501,"rel":84502},"https:\u002F\u002Ffinance.yahoo.com\u002Fnews\u002Frise-apache-pulsar-first-ever-162100598.html",[264],"Pulsar held its first global event"," - the ",[55,84506,81892],{"href":84507,"rel":84508},"https:\u002F\u002Fpulsar-summit.org\u002Fschedule\u002Ffirst-day",[264],". The event featured more than 30 speaker sessions from Pulsar's top contributors, thought leaders, and developers. We heard real-world Pulsar adoption stories and received insights from companies such as ",[55,84511,83442],{"href":84512,"rel":84513},"https:\u002F\u002Fwww.linkedin.com\u002Fcompany\u002Fverizon-media\u002F",[264],[55,84515,83430],{"href":84516,"rel":84517},"https:\u002F\u002Fwww.linkedin.com\u002Fcompany\u002Fsplunk\u002F",[264],[55,84519,96],{"href":84520,"rel":84521},"https:\u002F\u002Fwww.linkedin.com\u002Fcompany\u002Fiterable\u002F",[264],[55,84523,84526],{"href":84524,"rel":84525},"https:\u002F\u002Fwww.linkedin.com\u002Fcompany\u002Fovhgroup\u002F",[264],"OVHcloud",[48,84528,84529],{},"With more than 600 sign-ups - including attendees from top internet, technology, and financial institutions such as Google, Microsoft, AMEX, Salesforce, Disney, and Paypal - the event revealed a highly engaged and global Pulsar community and demonstrated that interest in Pulsar is burgeoning.",[48,84531,84532],{},"In fact, the global Pulsar community subsequently asked us to host dedicated regional events in Asia and Europe soon. To meet this growing demand, we have scheduled Pulsar Summit Asia 2020 in October and are currently planning Pulsar Summit Europe.",[32,84534,84536],{"id":84535},"community-support-training-and-events","Community Support - Training and Events",[48,84538,84539,84540,84545],{},"In addition to facilitating large, widely attended summits, the Pulsar community is focusing on interactive training and online events. For example, earlier this year, the community, led by StreamNative, launched a weekly ",[55,84541,84544],{"href":84542,"rel":84543},"https:\u002F\u002Fwww.youtube.com\u002Fplaylist?list=PLqRma1oIkcWhWAhKgImEeRiQi5vMlqTc-",[264],"live-streaming, interactive tutorial"," called TGIP (Thank Goodness It's Pulsar) that provides technology updates and hands-on tutorials highlighting various operational aspects. 
TGIP sessions are available on YouTube and StreamNative.io and are helping to augment Pulsar's growing knowledge base.",[48,84547,84548,84549,84553,84554,84559],{},"In 2020, the Pulsar community also launched ",[55,84550,84552],{"href":76852,"rel":84551},[264],"monthly webinars"," to share best practices, new use cases, and technology updates. Recent webinars have been hosted by strategic commercial and open-source partners such as OVHCloud, Overstock, and Nutanix. On July 28th, StreamNative will be hosting ",[55,84555,84558],{"href":84556,"rel":84557},"https:\u002F\u002Fus02web.zoom.us\u002Fwebinar\u002Fregister\u002FWN_xMt6QBJ9TWiyeVdifqKITg",[264],"Operating Pulsar in Production"," as a panel discussion with additional participants from Verizon Media and Splunk.",[48,84561,84562,84563,84568],{},"Pulsar's ecosystem has further evolved with the expansion of professional training, which is available through StreamNative and other partners. In fact, Pulsar and Kafka expert Jesse Anderson recently led an in-depth training session on ",[55,84564,84567],{"href":84565,"rel":84566},"https:\u002F\u002Fgumroad.com\u002Fl\u002FsuukG",[264],"Developing Pulsar Applications",". Professional training sessions help to enlarge the pool of Pulsar-trained engineers and allow Pulsar users to accelerate their messaging and streaming platform development initiatives.",[48,84570,84571,84572,84576],{},"In addition, an increase in the ",[55,84573,84575],{"href":84574},"\u002Fresource","publication of whitepapers"," is helping to expand Pulsar's knowledge base.",[48,84578,83373],{},[3933,84580,83377],{"id":83376},[48,84582,84583],{},"In March 2020, OVHCloud and StreamNative launched Kafka-on-Pulsar (KoP), the result of the two companies working closely in partnership. KoP enables Kafka users to migrate their existing Kafka applications and services to Pulsar without modifying the code. Although only recently released, KoP has already been adopted by several organizations and is being used in production environments. Moreover, KoP's availability is helping to expand Pulsar's adoption.",[3933,84585,83392],{"id":83391},[48,84587,84588],{},"In June 2020, China Mobile and StreamNative announced the launch of another major platform upgrade, AMQP on Pulsar (AoP). Similar to KoP, AoP allows organizations currently using RabbitMQ (or other AMQP message brokers) to migrate existing applications and services to Pulsar without code modification. Again, this is a key initiative that will help drive the adoption and usage of Pulsar.",[48,84590,84591],{},"The events and initiatives described above illustrate the Pulsar community's firm commitment to education and ecosystem development. More importantly, they demonstrate the momentum and growth we can expect in the future.",[40,84593,2125],{"id":2122},[48,84595,84596],{},"In today's ever-changing business landscape, access to data can unlock innovative business opportunities, define new categories, and propel companies ahead of the competition. As a result, organizations are increasingly seeking to leverage their data and the insights that can be gained from it to develop competitive advantages, and they are seeking new technologies to help them achieve these goals.",[48,84598,84599],{},"In this post, we set out to address some common business concerns organizations face when evaluating a new technology. 
These include the technology's proven capabilities, its ability to enable in-demand business use cases, and, in the case of open-source technologies, the size and level of engagement within the project's community.",[48,84601,84602],{},"The Tencent, Verizon Media, and Splunk use cases described earlier demonstrate Pulsar's ability to deliver mission-critical applications in the real world. Beyond its proven capabilities, Pulsar's ability to deliver unified messaging and streaming-first applications provides a marked advantage by enabling organizations to build disruptive, competitive technologies without requiring extensive resources. Pulsar's integration with Flink, which is currently in development, will provide yet another competitive advantage: the ability to perform both batch and stream processing on the same platform.",[48,84604,84605],{},"While the Pulsar community and a few other key areas, such as documentation, are still small, their growth has increased considerably in the past 18 months. Pulsar's highly engaged and quickly growing community and ecosystem are committed to contributing to the ongoing expansion of Pulsar's knowledge base and training materials, while also accelerating the development of key capabilities.",[48,84607,84608],{},"Disruption can happen quickly and organizations evaluating any technology need to consider not only the strengths and weaknesses it has today, but also how the technology will continue to grow and evolve to meet business needs in the future. The combination of Pulsar's enhanced messaging offering and unique capabilities make it a strong alternative that should be considered by any company looking to develop real-time data streaming capabilities.",[48,84610,84611,84612,190],{},"For a deeper dive into Pulsar vs. Kafka — A More Accurate Perspective on Performance, Architecture, and Features, please read Part 1 of this series ",[55,84613,267],{"href":84336},[32,84615,84617],{"id":84616},"learn-more-about-pulsar","Learn More About Pulsar",[48,84619,84620,84621,84625,84626,190],{},"We encourage you to sign up for the ",[55,84622,84624],{"href":34070,"rel":84623},[264],"Pulsar Newsletter"," to stay up-to-date on upcoming events and technology updates. If you would like to chat with current Pulsar users, you can join the ",[55,84627,84629],{"href":57760,"rel":84628},[264],"Pulsar Slack Channel",[48,84631,84632,84633,84636],{},"And don't forget to join our webinar, ",[55,84634,84558],{"href":84556,"rel":84635},[264],", on Tuesday, July 28th at 10 am. 
This will be a highly interactive roundtable discussion with additional participants from Verizon Media, Splunk, and StreamNative.",[32,84638,79225],{"id":79577},[48,84640,84641],{},"We would like to thank the many members of the Pulsar community who contributed to this article - especially, Jerry Peng, Jesse Anderson, Joe Francis, Matteo Merli, Sanjeev Kulkarni, and Addison Higham.",[32,84643,84645],{"id":84644},"links-resources","Links & Resources",[321,84647,84648,84655],{},[324,84649,84650,84651,190],{},"For more on Pulsar documentation and training, visit ",[55,84652,84654],{"href":84653},"\u002Fresources","StreamNative's Resources page",[324,84656,84657,84658,84662],{},"You can also access ",[55,84659,84661],{"href":84660},"\u002Fwhitepapers","recent whitepapers"," from Tuya, OVHCloud, Tencent, Yahoo!Japan, and more.",[48,84664,3931],{},{"title":18,"searchDepth":19,"depth":19,"links":84666},[84667,84668,84673,84678,84682],{"id":36542,"depth":19,"text":84193},{"id":84360,"depth":19,"text":84361,"children":84669},[84670,84671,84672],{"id":84381,"depth":279,"text":84382},{"id":84392,"depth":279,"text":84393},{"id":84404,"depth":279,"text":84405},{"id":84419,"depth":19,"text":84420,"children":84674},[84675,84676,84677],{"id":84429,"depth":279,"text":84430},{"id":84442,"depth":279,"text":84443},{"id":84465,"depth":279,"text":84466},{"id":84487,"depth":19,"text":84488,"children":84679},[84680,84681],{"id":84494,"depth":279,"text":84495},{"id":84535,"depth":279,"text":84536},{"id":2122,"depth":19,"text":2125,"children":84683},[84684,84685,84686],{"id":84616,"depth":279,"text":84617},{"id":79577,"depth":279,"text":79225},{"id":84644,"depth":279,"text":84645},"2020-07-22","In this blog, we look at emerging adoption stories, new messaging trends, technology differentiators, and community growth to better understand the key advantages and disadvantages of the top technologies in the messaging and event streaming space.","\u002Fimgs\u002Fblogs\u002F63c7c024fe34bd0cf4bdc094_63a377b02be9e607dd3ac516_top-1.jpeg",{},"\u002Fblog\u002Fapache-pulsar-adoption-why-companies-use-streaming-messaging-platform",{"title":84327,"description":84688},"blog\u002Fapache-pulsar-adoption-why-companies-use-streaming-messaging-platform",[35559,799,821],"Jufz8Lw-0o4g4ueGFu5a25l21aI2eWq63wx8nrL0doE",{"id":84697,"title":84698,"authors":84699,"body":84700,"category":821,"createdAt":290,"date":84687,"description":84941,"extension":8,"featured":294,"image":290,"isDraft":294,"link":290,"meta":84942,"navigation":7,"order":296,"path":84943,"readingTime":42793,"relatedResources":290,"seo":84944,"stem":84945,"tags":84946,"__hash__":84947},"blogs\u002Fblog\u002Fpulsar-vs-kafka-part-2-adoption-use-cases-differentiators-and-community.md","Pulsar vs Kafka - Part 2 - Adoption, Use Cases, Differentiators, and Community",[69353,60441,806],{"type":15,"value":84701,"toc":84919},[84702,84708,84713,84715,84717,84719,84721,84723,84725,84727,84729,84731,84736,84738,84740,84742,84746,84748,84753,84755,84760,84762,84764,84766,84768,84770,84772,84774,84776,84778,84780,84782,84784,84786,84791,84793,84795,84797,84799,84801,84803,84805,84807,84809,84811,84831,84833,84835,84837,84842,84850,84855,84859,84861,84863,84865,84867,84869,84871,84873,84875,84877,84879,84881,84883,84887,84889,84897,84902,84904,84906,84908],[48,84703,84704],{},[384,84705],{"alt":84706,"src":84707},"pulsar and kafka logo on blue and black 
background","\u002Fimgs\u002Fblogs\u002F63a377b02be9e607dd3ac516_top.jpeg",[48,84709,84333,84710,84337],{},[55,84711,47846],{"href":84712},"\u002Fblog\u002Fguide-apache-pulsar-compare-features-architecture-to-apache-kafka",[40,84714,84193],{"id":36542},[48,84716,84342],{},[48,84718,84345],{},[48,84720,84348],{},[48,84722,84351],{},[48,84724,84354],{},[48,84726,84357],{},[40,84728,84361],{"id":84360},[48,84730,84364],{},[48,84732,84733,84372],{},[55,84734,84371],{"href":84369,"rel":84735},[264],[48,84737,84375],{},[48,84739,84378],{},[32,84741,84382],{"id":84381},[48,84743,84744,84389],{},[55,84745,84388],{"href":84387},[32,84747,84393],{"id":84392},[48,84749,84396,84750,84401],{},[55,84751,84400],{"href":83352,"rel":84752},[264],[32,84754,84405],{"id":84404},[48,84756,84408,84757,84413],{},[55,84758,84412],{"href":83358,"rel":84759},[264],[48,84761,84416],{},[40,84763,84420],{"id":84419},[48,84765,84423],{},[48,84767,84426],{},[32,84769,84430],{"id":84429},[48,84771,84433],{},[48,84773,84436],{},[48,84775,84439],{},[32,84777,84443],{"id":84442},[48,84779,84446],{},[48,84781,84449],{},[48,84783,84452],{},[48,84785,71872],{},[48,84787,84457,84788,84462],{},[55,84789,84461],{"href":75514,"rel":84790},[264],[32,84792,84466],{"id":84465},[48,84794,84469],{},[48,84796,84472],{},[48,84798,84475],{},[48,84800,84478],{},[48,84802,84481],{},[48,84804,84484],{},[40,84806,84488],{"id":84487},[48,84808,84491],{},[32,84810,84495],{"id":84494},[48,84812,84498,84813,84504,84816,84509,84819,1186,84822,1186,84825,5422,84828,190],{},[55,84814,84503],{"href":84501,"rel":84815},[264],[55,84817,81892],{"href":84507,"rel":84818},[264],[55,84820,83442],{"href":84512,"rel":84821},[264],[55,84823,83430],{"href":84516,"rel":84824},[264],[55,84826,96],{"href":84520,"rel":84827},[264],[55,84829,84526],{"href":84524,"rel":84830},[264],[48,84832,84529],{},[48,84834,84532],{},[32,84836,84536],{"id":84535},[48,84838,84539,84839,84545],{},[55,84840,84544],{"href":84542,"rel":84841},[264],[48,84843,84548,84844,84553,84847,84559],{},[55,84845,84552],{"href":76852,"rel":84846},[264],[55,84848,84558],{"href":84556,"rel":84849},[264],[48,84851,84562,84852,84568],{},[55,84853,84567],{"href":84565,"rel":84854},[264],[48,84856,84571,84857,84576],{},[55,84858,84575],{"href":84574},[48,84860,83373],{},[3933,84862,83377],{"id":83376},[48,84864,84583],{},[3933,84866,83392],{"id":83391},[48,84868,84588],{},[48,84870,84591],{},[40,84872,2125],{"id":2122},[48,84874,84596],{},[48,84876,84599],{},[48,84878,84602],{},[48,84880,84605],{},[48,84882,84608],{},[48,84884,84611,84885,190],{},[55,84886,267],{"href":84336},[32,84888,84617],{"id":84616},[48,84890,84620,84891,84625,84894,190],{},[55,84892,84624],{"href":34070,"rel":84893},[264],[55,84895,84629],{"href":57760,"rel":84896},[264],[48,84898,84632,84899,84636],{},[55,84900,84558],{"href":84556,"rel":84901},[264],[32,84903,79225],{"id":79577},[48,84905,84641],{},[32,84907,84645],{"id":84644},[321,84909,84910,84915],{},[324,84911,84650,84912,190],{},[55,84913,84654],{"href":84914},"\u002Fresource#pulsar",[324,84916,84657,84917,84662],{},[55,84918,84661],{"href":10293},{"title":18,"searchDepth":19,"depth":19,"links":84920},[84921,84922,84927,84932,84936],{"id":36542,"depth":19,"text":84193},{"id":84360,"depth":19,"text":84361,"children":84923},[84924,84925,84926],{"id":84381,"depth":279,"text":84382},{"id":84392,"depth":279,"text":84393},{"id":84404,"depth":279,"text":84405},{"id":84419,"depth":19,"text":84420,"children":84928},[84929,84930,84931],{"id":84429,"depth":279,"text":84430},{"id":8
4442,"depth":279,"text":84443},{"id":84465,"depth":279,"text":84466},{"id":84487,"depth":19,"text":84488,"children":84933},[84934,84935],{"id":84494,"depth":279,"text":84495},{"id":84535,"depth":279,"text":84536},{"id":2122,"depth":19,"text":2125,"children":84937},[84938,84939,84940],{"id":84616,"depth":279,"text":84617},{"id":79577,"depth":279,"text":79225},{"id":84644,"depth":279,"text":84645},"Emerging adoption stories, new messaging trends, technology differentiators, and community growth to better understand the key advantages and disadvantages of the top technologies in the messaging and event streaming space",{},"\u002Fblog\u002Fpulsar-vs-kafka-part-2-adoption-use-cases-differentiators-and-community",{"title":84698,"description":84941},"blog\u002Fpulsar-vs-kafka-part-2-adoption-use-cases-differentiators-and-community",[799,35559],"xnq9Tdmmhp0Z0NN3l_1_m535tr-hx_8LAcvws1QXtE8",{"id":84949,"title":84950,"authors":84951,"body":84952,"category":821,"createdAt":290,"date":85524,"description":85525,"extension":8,"featured":294,"image":85526,"isDraft":294,"link":290,"meta":85527,"navigation":7,"order":296,"path":84712,"readingTime":38438,"relatedResources":290,"seo":85528,"stem":85529,"tags":85530,"__hash__":85531},"blogs\u002Fblog\u002Fguide-apache-pulsar-compare-features-architecture-to-apache-kafka.md","A Guide to Apache Pulsar: Compare Features and Architecture to Apache Kafka",[69353,806,60441],{"type":15,"value":84953,"toc":85493},[84954,84960,84963,84984,84987,84993,84998,85002,85006,85009,85015,85018,85021,85025,85028,85034,85037,85040,85043,85047,85050,85054,85058,85061,85075,85078,85081,85084,85127,85130,85134,85137,85181,85185,85188,85191,85194,85198,85201,85208,85230,85234,85247,85266,85269,85282,85286,85290,85293,85302,85309,85313,85316,85327,85335,85338,85342,85345,85348,85351,85354,85358,85366,85374,85378,85381,85384,85388,85392,85395,85425,85429,85432,85435,85439,85442,85445,85447,85450,85453,85456,85459,85461,85464,85466],[48,84955,84956],{},[384,84957],{"alt":84958,"src":84959},"logo pulsar and kafka on blue an black background","\u002Fimgs\u002Fblogs\u002F63a3758fe1d5c02d6ae8dc76_top.png",[48,84961,84962],{},"The shift to real-time streaming technologies has bolstered the adoption of Pulsar and there has been a marked increase in both the interest and adoption of Pulsar. With Pulsar being sought out by companies developing messaging and event-streaming applications — from Fortune 100 companies to forward-thinking start-ups — and so much growth around the Pulsar project, it has garnered a lot of recent press and attention.",[48,84964,84965,84966,1186,84969,1186,84973,5422,84978,84983],{},"For the most part, the recent press and articles have helped to provide valuable education and transparency into Pulsar’s use cases and capabilities. 
Companies such as ",[55,84967,83442],{"href":83352,"rel":84968},[264],[55,84970,96],{"href":84971,"rel":84972},"https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=NrDvSNewNT0",[264],[55,84974,84977],{"href":84975,"rel":84976},"https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=zAHxgG_U67Q",[264],"Nutanix",[55,84979,84982],{"href":84980,"rel":84981},"https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=pmaCG1SHAW8",[264],"Overstock.com",", are just a handful of companies who have recently presented their Pulsar use cases and shared insights into how they are leveraging Pulsar to achieve their business goals.",[48,84985,84986],{},"However, not all recent press has been entirely accurate and we have received a number of requests from the Pulsar community to address a recent Confluent blog comparing Kafka, Pulsar, and RabbitMQ. We appreciate that Pulsar is a quickly growing and evolving technology and we would like to take this opportunity to provide a deep dive into Pulsar’s capabilities.",[48,84988,84989,84990,84992],{},"In today’s post, we will leverage in-depth knowledge of the Pulsar technology, community, and ecosystem to provide a more balanced and holistic picture of the event-streaming landscape. This post will be the first in a two-part series and here we will concentrate on the differences between Pulsar and Kafka in terms of performance, architecture, and features. In the ",[55,84991,15616],{"href":84943},", we focus on adoption, use cases, support, and community.",[916,84994,84995],{},[48,84996,84997],{},"Note Given that Kafka is more widely-known and has widespread documentation available, we will focus our efforts on providing education and transparency into the lesser-known Pulsar technology.",[40,84999,85001],{"id":85000},"pulsar-fundamentals","Pulsar Fundamentals",[32,85003,85005],{"id":85004},"components-of-a-pulsar-cluster","Components of a Pulsar Cluster",[48,85007,85008],{},"Pulsar is composed of 3 main components: a broker, which is a stateless service that clients connect to for core messaging, and two stateful services, Apache BookKeeper and Apache ZooKeeper. BookKeeper nodes (bookies) store the actual messages and cursor positions while ZooKeeper is used strictly for metadata storage by both brokers and bookies. Additionally, BookKeeper leverages RocksDB as an embedded database, which is used to store internal indices, but it is not managed independently of BookKeeper.",[48,85010,85011],{},[384,85012],{"alt":85013,"src":85014},"An illustration of main components that pulsar is composed","\u002Fimgs\u002Fblogs\u002F63a3758ff0583901aeac52c6_1.png",[48,85016,85017],{},"Unlike Kafka, which employs a monolithic architecture model that tightly couples serving and storage, Pulsar leverages a multi-layer design which allows it to manage these functions in separate layers. Pulsar’s broker performs computing on one layer and the bookie manages stateful storage on another.",[48,85019,85020],{},"While, on the surface, it may seem like Pulsar’s architecture is more complicated compared with Kafka’s, the reality is more nuanced. Architectural decisions come with trade-offs and Pulsar’s inclusion of BookKeeper enables it to provide more flexible scalability, lower operational burden, faster, and more consistent performance. We will talk in more detail about each of these benefits later on.",[32,85022,85024],{"id":85023},"pulsars-storage-architecture","Pulsar's Storage Architecture",[48,85026,85027],{},"The architectural differences in Pulsar also extend to how Pulsar stores data. 
Pulsar breaks topic partitions into segments and then distributes the segments across the storage nodes in Apache BookKeeper to get better performance, scalability, and availability.",[48,85029,85030],{},[384,85031],{"alt":85032,"src":85033},"storage architecture of pulsar","\u002Fimgs\u002Fblogs\u002F63a1eb9b2deb275c1af75e32_pulsar-partition-log-segment.png",[48,85035,85036],{},"Pulsar’s infinite distributed log is segment centric and implemented by leveraging scale-out log storage (via Apache BookKeeper) with built-in tiered storage support which enables segments to be distributed evenly across storage nodes. Because the data associated with any given topic is not tied to any specific storage node, it is easy to replace nodes and to scale up or down. Moreover, the smallest or slowest node in the cluster cannot impose any storage or bandwidth limitations.",[48,85038,85039],{},"Pulsar’s partition-rebalance-free architecture ensures instant scalability and higher availability. Both of these factors are extremely important and make Pulsar well-suited for building mission-critical services such as billing platforms for financial use cases, transaction processing systems for e-commerce and retailers, and real-time risk control systems for financial institutions.",[48,85041,85042],{},"By leveraging the powerful Netty framework, data is zero-copied when it is transferred from producers to brokers to bookies. This works extremely well for all streaming use cases because the data is transferred directly over the network or to disk without any performance penalties.",[32,85044,85046],{"id":85045},"message-consumption-on-pulsar","Message Consumption on Pulsar",[48,85048,85049],{},"Pulsar’s consumption model takes a streaming-pull approach. This is an enhanced version of long-polling as it eliminates the wait time between individual calls and requests and provides bi-directional message streaming. The streaming-pull model enables Pulsar to achieve lower end-to-end latency than any other existing long-polling-based messaging solutions, such as Kafka.",[40,85051,85053],{"id":85052},"ease-of-use","Ease of Use",[32,85055,85057],{"id":85056},"operational-simplicity","Operational Simplicity",[48,85059,85060],{},"When evaluating the operational simplicity for a given technology, it’s important to consider not only the initial set-up but also its long-term maintenance and scalability. Helpful questions to consider include:",[321,85062,85063,85066,85069,85072],{},[324,85064,85065],{},"How quickly and simply can you scale your cluster to keep up with your business growth?",[324,85067,85068],{},"Does your cluster provide out-of-the-box features for multi-tenancy that map well to multiple teams and users?",[324,85070,85071],{},"Will the operational tasks, such as replacing hardware, require maintenance that potentially can impact the availability and reliability of your business?",[324,85073,85074],{},"Can your system easily replicate data for geographic redundancy or different access patterns?",[48,85076,85077],{},"Long-time Kafka users will know these are not easy questions to answer when operating Kafka. Most of these tasks require a suite of tools external to Kafka, such as cruise control for managing rebalancing of clusters and Kafka mirror-maker\u002Freplicator for any replication needs.",[48,85079,85080],{},"Many organizations also develop tooling for provisioning and managing multiple distinct clusters as Kafka can be difficult to share across teams. 
These types of tools are critical to run Kafka at scale successfully but also add to its complexity. The most capable tools for managing Kafka clusters have been developed as proprietary, closed source tooling. It is no surprise that Kafka’s complex overhead and operations have pushed many businesses to use Confluent.",[48,85082,85083],{},"By contrast, Pulsar’s goal is to streamline operations and scalability. Below we respond to the same questions with respect to Pulsar’s capabilities:",[321,85085,85086,85088,85096,85098,85107,85109,85116,85118],{},[324,85087,85065],{},[324,85089,85090,85091,85095],{},"New compute and storage capacity is automatically and immediately utilized with Pulsar’s automatic ",[55,85092,36113],{"href":85093,"rel":85094},"https:\u002F\u002Fpulsar.apache.org\u002Fdocs\u002Fen\u002Fadministration-load-balance\u002F",[264],". This allows migrating topics to equalize load among brokers and new bookie nodes immediately receiving write traffic for new segments, with no manual rebalancing or broker management required.",[324,85097,85068],{},[324,85099,85100,85101,85106],{},"Pulsar provides a hierarchical structure of ",[55,85102,85105],{"href":85103,"rel":85104},"https:\u002F\u002Fpulsar.apache.org\u002Fdocs\u002Fen\u002Fnext\u002Fconcepts-multi-tenancy\u002F",[264],"tenants and namespaces"," which map logically to organizations and teams, with these same constructs allowing for simple ACLs, quotas, self-service controls, and even resources isolation to allow cluster operators to confidently manage shared clusters.",[324,85108,85071],{},[324,85110,85111,85112,85115],{},"The stateless ",[55,85113,61064],{"href":66205,"rel":85114},[264]," of Pulsar is able to be replaced easily, as there is no risk of data loss. Bookie nodes will automatically replicate any under-replicated segments of data and tools for decommissioning and replacing nodes is built-in and easily automatable.",[324,85117,85074],{},[324,85119,85120,85121,85126],{},"Pulsar has built-in replication, which can be used to seamlessly ",[55,85122,85125],{"href":85123,"rel":85124},"https:\u002F\u002Fpulsar.apache.org\u002Fdocs\u002Fen\u002Fadministration-geo\u002F",[264],"span geographic regions"," or replicate data to additional clusters for other purposes (disaster recovery, analytics, and so on.)",[48,85128,85129],{},"In comparison to Kafka, Pulsar’s batteries included approach provides a more complete solution to the real-world problems of streaming data. With this added perspective, the overall simplicity of use favors Pulsar as it offers a more complete core feature set and allows operators and developers to focus on the core needs of their business.",[32,85131,85133],{"id":85132},"documentation-and-learning","Documentation and Learning",[48,85135,85136],{},"Pulsar has been rapidly building out its documentation and training resources. 
Here are some of the most notable accomplishments:",[321,85138,85139,85146,85153,85159,85167,85173],{},[324,85140,85141,85145],{},[55,85142,85144],{"href":35357,"rel":85143},[264],"7 Pulsar Summits"," across North America, Asia, and Europe, featured hundreds of sessions with speakers from top companies such as Google, AWS, Intuit, and Databricks, attracting thousands of attendees sign-ups.",[324,85147,85148,85152],{},[55,85149,85151],{"href":31912,"rel":85150},[264],"On-demand Pulsar courses, tutorials, and hands-on labs"," for developers, operators, and business leaders.",[324,85154,85155,85158],{},[55,85156,85157],{"href":43640},"Instructor-led, hands-on Pulsar training"," for developers and operators.",[324,85160,85161,85166],{},[55,85162,85165],{"href":85163,"rel":85164},"https:\u002F\u002Fyoutube.com\u002Fplaylist?list=PLqRma1oIkcWhfmUuJrMM5YIG8hjju62Ev",[264],"Meetups and webinars"," featuring speakers from adjacent communities like Flink and Nifi and companies including Splunk, Uber, and Elastic.",[324,85168,85169,85172],{},[55,85170,85171],{"href":84653},"eBooks, whitepapers, and case studies"," from Iterable, Tencent, Weibo, Tuya, and more.",[324,85174,85175,85180],{},[55,85176,85179],{"href":85177,"rel":85178},"https:\u002F\u002Fpulsar.apache.org\u002Fdocs\u002F2.11.x\u002F",[264],"Documentation portal"," that holds a variety of topics, tutorials, guides, and reference material to help you work with Pulsar.",[32,85182,85184],{"id":85183},"enterprise-support","Enterprise Support",[48,85186,85187],{},"Kafka and Pulsar both have enterprise-grade support offerings. Kafka has enterprise-grade support offerings from multiple large vendors, including Confluent. Pulsar has enterprise-grade support from StreamNative, a newer entrant on the scene. StreamNative offers fully managed Pulsar services for enterprises as well as enterprise-grade support for Pulsar.",[48,85189,85190],{},"StreamNative has a fast-growing and highly-experienced team with deep roots in the messaging and event-streaming space. StreamNative was founded by the core team of Pulsar and BookKeeper. In just a few short years, StreamNative has helped to significantly grow the Pulsar ecosystem — more on this in our next post — including garnering the support of committed strategic partners who are helping to further Pulsar development to meet the needs of a wide number of use cases.",[48,85192,85193],{},"Some major recent developments include the launch of Kafka-on-Pulsar, or KoP, which was launched in March 2020 by OVHCloud and StreamNative. By adding the KoP protocol handler to an existing Pulsar cluster, you can now migrate existing Kafka applications and services to Pulsar without modifying the code. In June 2020, China Mobile and StreamNative announced the launch of another major platform upgrade, AMQP on Pulsar (AoP). This enables RabbitMQ applications to leverage Pulsar’s powerful features, such as infinite event stream retention with Apache BookKeeper and tiered storage. We will talk about each of these in more detail in our next post.",[32,85195,85197],{"id":85196},"integrations","Integrations",[48,85199,85200],{},"Alongside the rapid growth in the number of Pulsar adoptions, we have seen the Pulsar community develop into a large, highly-engaged, and global user community. This active Pulsar community has played a key role in driving growth in the number of integrations in the ecosystem. 
In just the past six months, the number of officially supported connectors in the Pulsar ecosystem has grown tremendously.",[48,85202,85203,85204,85207],{},"To further support this community effort, StreamNative recently launched ",[55,85205,38697],{"href":35258,"rel":85206},[264],", which provides a convenient central location where users can find and download integrations. This resource will help accelerate the growth of Pulsar’s connector and plug-in ecosystem.",[48,85209,85210,85211,85214,85215,5157,85220,85224,85225,85229],{},"The Pulsar community has also been actively working with other communities on integrating with their projects. For example, Pulsar has been working closely with the Flink community on developing the ",[55,85212,55931],{"href":55929,"rel":85213},[264]," as part of ",[55,85216,85219],{"href":85217,"rel":85218},"https:\u002F\u002Fcwiki.apache.org\u002Fconfluence\u002Fdisplay\u002FFLINK\u002FFLIP-72%3A+Introduce+Pulsar+Connector",[264],"FLIP-72",[55,85221,24177],{"href":85222,"rel":85223},"https:\u002F\u002Fgithub.com\u002Fstreamnative\u002Fpulsar-spark",[264]," provides developers the capability of using Apache Spark to process events in Apache Pulsar. ",[55,85226,85228],{"href":85227},"\u002Fblog\u002Fuse-apache-skywalking-to-trace-apache-pulsar-messages","SkyWalking Pulsar Plugin"," integrates Apache SkyWalking with Apache Pulsar, allowing people to trace Pulsar messages using SkyWalking. These are just a few examples of a large collection of integrations the Pulsar community is currently working on.",[32,85231,85233],{"id":85232},"client-library-diversity","Client Library Diversity",[48,85235,85236,85237,85242,85243,4031],{},"Pulsar currently supports 7 languages officially, compared with Kafka’s 1 language. While the Confluent post reported that Kafka currently supports 22 languages, it is important to note that most of the 22 languages Confluent referred to are not official clients, and many are no longer actively maintained. At last count, ",[55,85238,85241],{"href":85239,"rel":85240},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fkafka\u002Ftree\u002Ftrunk\u002Fclients\u002Fsrc\u002Fmain\u002Fjava\u002Forg\u002Fapache\u002Fkafka\u002Fclients",[264],"the Apache Kafka project had only one officially released client",", compared with the ",[55,85244,85246],{"href":77720,"rel":85245},[264],"seven officially supported by Apache Pulsar",[321,85248,85249,85251,85254,85256,85258,85260,85263],{},[324,85250,11285],{},[324,85252,85253],{},"C",[324,85255,43705],{},[324,85257,11288],{},[324,85259,43713],{},[324,85261,85262],{},".NET",[324,85264,85265],{},"Node",[48,85267,85268],{},"Pulsar also supports a rapidly growing list of community developed clients, which includes the following:",[321,85270,85271,85274,85276,85279],{},[324,85272,85273],{},"Rust",[324,85275,78317],{},[324,85277,85278],{},"Ruby",[324,85280,85281],{},"Erlang",[40,85283,85285],{"id":85284},"performance-and-availability","Performance and Availability",[32,85287,85289],{"id":85288},"throughput-latency-and-scale","Throughput, Latency, and Scale",[48,85291,85292],{},"Both Pulsar and Kafka have successfully been leveraged in a number of enterprise use cases and each system has its advantages, with both systems being capable of handling large amounts of traffic with similar amounts of hardware. One common misconception of Pulsar is that because it has more components, it must require more servers to achieve the same performance. 
While this may be true in some hardware configurations, in many configurations Pulsar can get more from the same resources.",[48,85294,85295,85296,85301],{},"As an example, Splunk recently shared that one of the reasons they choose Pulsar over Kafka is that ",[55,85297,85300],{"href":85298,"rel":85299},"https:\u002F\u002Fwww.slideshare.net\u002Fstreamnative\u002Fwhy-splunk-chose-pulsarkarthik-ramasamy",[264],"Pulsar is 1.5x - 2x lower in CAPEX cost with 5x - 50x improvement in latency and 2x - 3x lower in OPEX due to layered architecture"," (from slide 34). They found this was due to Pulsar being better able to utilize disk IO with lower CPU utilization and better control over memory.",[48,85303,85304,85305,85308],{},"More generally, companies such as Tencent have chosen Pulsar in large part due to its performance attributes. As discussed in a recent whitepaper ",[55,85306,85307],{"href":81575},"Tencent’s billing platform, which serves over a million merchants and manages 30 billion escrow accounts",", is currently using Pulsar to process hundreds of millions of dollars in revenue per day. Tencent chose Pulsar over Kafka for its predictable low latency, stronger consistency, and durability guarantees.",[32,85310,85312],{"id":85311},"ordering-guarantees","Ordering Guarantees",[48,85314,85315],{},"Apache Pulsar offers four distinct subscription modes. The four modes and their associated ordering guarantees are described below. An individual application’s ordering and consumption scalability requirements determine which subscription mode is appropriate for that application.",[321,85317,85318,85321,85324],{},[324,85319,85320],{},"Both the Exclusive and Failover subscription modes provide very strong ordering guarantees at a partition level even when consuming a topic in parallel across many consumers.",[324,85322,85323],{},"Shared mode allows you to scale the number of consumers beyond the number of partitions, thus making this mode well-suited for worker queue use cases.",[324,85325,85326],{},"Key_Shared mode combines the advantages of the other subscription modes. It allows scaling the number of consumers beyond the number of partitions and provides a strong ordering guarantee at a key level.",[48,85328,85329,85330,190],{},"For more information about Pulsar’s subscription types and their associated ordering guarantees, see ",[55,85331,85334],{"href":85332,"rel":85333},"http:\u002F\u002Fpulsar.apache.org\u002Fdocs\u002Fen\u002Fconcepts-messaging\u002F#subscriptions",[264],"subscriptions",[40,85336,85337],{"id":81624},"Feature",[32,85339,85341],{"id":85340},"built-in-stream-processing","Built-In Stream Processing",[48,85343,85344],{},"Pulsar and Kafka have two different goals when it comes to built-in stream processing. Pulsar integrates with Flink and Spark, two mature, full-fledged stream processing frameworks, for more complex stream processing needs and developed Pulsar Functions to focus on lightweight computation. Kafka developed Kafka Streams with the goal of providing a full-fledged stream processing engine.",[48,85346,85347],{},"As a result, Kafka Streams is more complex. Users need to figure out where and how to run the KStreams application and it is unnecessarily complicated for most lightweight computing use cases.",[48,85349,85350],{},"Pulsar Functions, on the other hand, makes lightweight computing use cases easy to implement and enables developers to create complex processing logic without deploying a separate neighboring system. 
Additionally, it provides language-native and easy-to-use API. Developers don’t have to learn a complicated API in order to start writing event streaming applications.",[48,85352,85353],{},"A Pulsar Improvement Proposal (PIP) was recently submitted to the Pulsar project to introduce Function Mesh. Function Mesh is a serverless event-streaming framework that combines multiple Pulsar Functions together to facilitate building complex event-streaming applications.",[32,85355,85357],{"id":85356},"exactly-once-processing","Exactly-Once Processing",[48,85359,85360,85361,85365],{},"Pulsar currently supports exactly-once producers via ",[55,85362,85364],{"href":71250,"rel":85363},[264],"broker-side deduplication"," and we are happy to share a major upgrade is presently in development and will be available soon!",[48,85367,85368,85369,85373],{},"Support for transactional message streaming started in ",[55,85370,71440],{"href":85371,"rel":85372},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fwiki\u002FPIP-31:-Transaction-Support",[264]," and is currently in development. This feature will improve Pulsar’s message delivery semantics and processing guarantees. With transactional streaming, each message is written or processed exactly once with no duplication or data loss, even when a broker or function instance fails. Transactional messaging not only makes it easier to write applications using Pulsar or Pulsar Functions, but it also expands the scope of the use cases that Pulsar can support. We are making rapid progress on this feature and it will be included in Pulsar 2.7.0 which is scheduled for release in September 2020.",[32,85375,85377],{"id":85376},"topic-log-compaction","Topic (Log) Compaction",[48,85379,85380],{},"Pulsar was designed to provide users a choice of formats for consuming data. Applications can choose to consume either raw data or compacted data, as appropriate. By doing this, Pulsar allows for non-compacted data to have a retention policy, keeping control over unbounded growth, but still allowing periodic compaction to generate the most recent materialized view around. The built-in tiered storage feature also allows Pulsar to offload the non-compacted data from BookKeeper to cloud storage and makes it much cheaper to store events for a much longer period.",[48,85382,85383],{},"Unlike Pulsar, Kafka does not offer users the option to consume raw data. Kafka removes raw data immediately after it is compacted.",[40,85385,85387],{"id":85386},"use-case","Use Case",[32,85389,85391],{"id":85390},"event-streaming","Event Streaming",[48,85393,85394],{},"Pulsar was originally developed as a unified pub\u002Fsub messaging platform in Yahoo! (known as Cloud Messaging). However, Pulsar has grown beyond a messaging platform and become a unified messaging and event streaming platform. Pulsar includes a complete set of tools as part of the platform, to provide all the fundamentals necessary for building event streaming applications. Pulsar encompasses the following event streaming capabilities:",[321,85396,85397,85400,85403,85413,85416,85419,85422],{},[324,85398,85399],{},"Infinite event stream storage makes it possible to store events at scale by leveraging scale-out log storage (via Apache BookKeeper) with built-in tiered storage support to cost-effective systems like S3, HDFS, and so on.",[324,85401,85402],{},"Unified pub\u002Fsub messaging model allows developers to add messaging to their applications easily. 
This model can be scaled both based on traffic and on the user’s needs.",[324,85404,85405,85406,85408,85409,85412],{},"Protocol handler framework and protocol compatibility with Kafka (via ",[55,85407,70645],{"href":82446},") and AMQP (via ",[55,85410,85411],{"href":82452},"AMQP-on-Pulsar",") allow applications to produce and consume events from anywhere using any existing protocols.",[324,85414,85415],{},"Pulsar IO provides a set of connectors integrating larger ecosystems, allowing users to ingest data from external systems without writing any code.",[324,85417,85418],{},"Integration with Flink enables comprehensive event processing.",[324,85420,85421],{},"Pulsar Functions offers a lightweight serverless framework for processing events as they arrive.",[324,85423,85424],{},"Integration with Presto (Pulsar SQL) allows data scientists and developers to use ANSI-compliant SQL to gain insights into their data and business.",[32,85426,85428],{"id":85427},"message-routing","Message Routing",[48,85430,85431],{},"Pulsar provides comprehensive routing capabilities through Pulsar IO, Pulsar Functions, and Pulsar Protocol Handler. Pulsar’s routing capabilities include content-based routing, message transformation, and message enrichment.",[48,85433,85434],{},"Pulsar has more robust routing capabilities compared with Kafka. Pulsar provides a flexible deployment model for connectors and functions. These can be run within a broker, allowing for easy deployment. Alternatively, they can be run in a dedicated pool of nodes (similar to Kafka Streams) which allows for massive scale-out. Pulsar also integrates natively with Kubernetes. In addition, Pulsar can be configured to schedule function and connector workloads as pods, thus fully leveraging the elasticity of Kubernetes.",[32,85436,85438],{"id":85437},"message-queuing","Message Queuing",[48,85440,85441],{},"As noted above, Pulsar was originally developed as a unified pub\u002Fsub messaging platform. The Pulsar team learned a lot of the pros and cons of operating existing open-source messaging systems and applied their experiences to designing Pulsar’s unified messaging model. The Pulsar messaging API combines both queueing and streaming capabilities. It not only allows implementing a worker queue that delivers messages round-robin to competing consumers (via Shared subscription) but also supports event streaming by delivering messages based on the order of messages in a partition (via Failover subscription) or a key range (via Key_Shared subscription). Developers are able to build both messaging and event streaming applications on the same set of data without duplicating it to different siloed systems.",[48,85443,85444],{},"Additionally, The Pulsar community is also working on bringing the native support of different messaging protocols (such as AoP and KoP) to Apache Pulsar to extend Pulsar’s messaging capabilities.",[40,85446,2125],{"id":2122},[48,85448,85449],{},"This is a very exhilarating time marked by tremendous growth and change in the Pulsar community. Pulsar’s ecosystem is developing and expanding as its technology continues to evolve and new use cases are added.",[48,85451,85452],{},"Pulsar offers many advantages that make it an attractive choice for companies seeking to adopt a unified messaging and event streaming platform. 
Compared with Kafka, Pulsar is more resilient and less complex to operate and scale.",[48,85454,85455],{},"Like any new technology, it can take time to roll-out and adopt, however, Pulsar provides a turnkey solution that is ready for production upon installation with lower ongoing maintenance costs. Pulsar covers all the fundamentals necessary for building event streaming applications and incorporates many built-in features, including a rich set of tools. Pulsar’s tools are available for immediate use and do not require additional installation steps.",[48,85457,85458],{},"At StreamNative, we are continuously working on developing new features and enhancements to strengthen Pulsar’s capabilities and grow the community.",[40,85460,79225],{"id":79577},[48,85462,85463],{},"We would be remiss not to thank the many members across the Pulsar community who contributed to this article. Namely, Jerry Peng, Jesse Anderson, Joe Francis, Matteo Merli, Sanjeev Kulkarni, and Addison Higham.",[40,85465,36477],{"id":36476},[321,85467,85468,85473,85477,85483,85488],{},[324,85469,36219,85470,38411],{},[55,85471,85472],{"href":21458},"2022 Pulsar vs. Kafka benchmark",[324,85474,38414,85475,38418],{},[55,85476,38417],{"href":35424},[324,85478,55539,85479,85482],{},[55,85480,84629],{"href":31692,"rel":85481},[264]," to connect with the community.",[324,85484,85485,62245],{},[55,85486,10265],{"href":45212,"rel":85487},[264],[324,85489,85490,62252],{},[55,85491,62251],{"href":31912,"rel":85492},[264],{"title":18,"searchDepth":19,"depth":19,"links":85494},[85495,85500,85507,85511,85516,85521,85522,85523],{"id":85000,"depth":19,"text":85001,"children":85496},[85497,85498,85499],{"id":85004,"depth":279,"text":85005},{"id":85023,"depth":279,"text":85024},{"id":85045,"depth":279,"text":85046},{"id":85052,"depth":19,"text":85053,"children":85501},[85502,85503,85504,85505,85506],{"id":85056,"depth":279,"text":85057},{"id":85132,"depth":279,"text":85133},{"id":85183,"depth":279,"text":85184},{"id":85196,"depth":279,"text":85197},{"id":85232,"depth":279,"text":85233},{"id":85284,"depth":19,"text":85285,"children":85508},[85509,85510],{"id":85288,"depth":279,"text":85289},{"id":85311,"depth":279,"text":85312},{"id":81624,"depth":19,"text":85337,"children":85512},[85513,85514,85515],{"id":85340,"depth":279,"text":85341},{"id":85356,"depth":279,"text":85357},{"id":85376,"depth":279,"text":85377},{"id":85386,"depth":19,"text":85387,"children":85517},[85518,85519,85520],{"id":85390,"depth":279,"text":85391},{"id":85427,"depth":279,"text":85428},{"id":85437,"depth":279,"text":85438},{"id":2122,"depth":19,"text":2125},{"id":79577,"depth":19,"text":79225},{"id":36476,"depth":19,"text":36477},"2020-07-08","Learn the differences between Pulsar and Kafka in architecture, ease of use, performance and availability, and use cases.","\u002Fimgs\u002Fblogs\u002F63d0699ba635bf2a05fe2088_Screen-Shot-2023-01-24-at-3.28.16-PM.png",{},{"title":84950,"description":85525},"blog\u002Fguide-apache-pulsar-compare-features-architecture-to-apache-kafka",[799,7347],"DR1YpMnF5cqiU7ZaBqBheTrRQZUc6mK8uPsBg0nVnoo",{"id":85533,"title":85534,"authors":85535,"body":85536,"category":821,"createdAt":290,"date":86514,"description":86515,"extension":8,"featured":294,"image":86516,"isDraft":294,"link":290,"meta":86517,"navigation":7,"order":296,"path":86518,"readingTime":47804,"relatedResources":290,"seo":86519,"stem":86520,"tags":86521,"__hash__":86522},"blogs\u002Fblog\u002Fapache-pulsar-2-6-0.md","Apache Pulsar 
2.6.0",[808],{"type":15,"value":85537,"toc":86448},[85538,85541,85544,85548,85555,85558,85564,85580,85587,85590,85593,85608,85614,85622,85637,85644,85647,85653,85669,85676,85679,85682,85688,85704,85711,85714,85717,85733,85740,85743,85749,85752,85758,85774,85781,85784,85800,85807,85810,85816,85819,85825,85834,85841,85844,85847,85853,85856,85865,85871,85874,85877,85880,85886,85899,85905,85908,85911,85917,85925,85929,85932,85941,85950,85954,85957,85960,85966,85973,85977,85980,85983,85989,85998,86002,86011,86020,86024,86027,86030,86036,86045,86049,86052,86055,86058,86064,86073,86077,86080,86086,86094,86098,86101,86107,86116,86123,86132,86136,86139,86142,86151,86153,86157,86160,86163,86172,86176,86180,86183,86192,86196,86199,86208,86210,86214,86217,86226,86230,86239,86243,86246,86255,86259,86268,86272,86275,86284,86286,86290,86293,86296,86305,86309,86312,86321,86323,86327,86330,86333,86342,86346,86349,86352,86361,86365,86368,86371,86380,86382,86384,86405,86407,86420,86425,86432,86434,86441,86446],[48,85539,85540],{},"We are very glad to see the Apache Pulsar community has successfully released the wonderful 2.6.0 version after accumulated hard work. It is a great milestone for this fast-growing project and the whole Pulsar community. This is the result of a huge effort from the community, with over 450 commits and a long list of new features, improvements, and bug fixes.",[48,85542,85543],{},"Here is a selection of some of the most interesting and major features added to Pulsar 2.6.0.",[40,85545,85547],{"id":85546},"core-pulsar","Core Pulsar",[32,85549,85551,85554],{"id":85550},"pip-37-large-message-size-support",[2628,85552,85553],{},"PIP-37"," Large message size support",[48,85556,85557],{},"This PIP adds support for producing and consuming large size messages by splitting the large message into multiple chunks. This is a very powerful feature for sending and consuming very large messages. Currently, this feature only works for the non-shared subscription and it has client-side changes. You need to upgrade the Pulsar client version to 2.6.0. You can enable the message trunk at the producer side as below.",[8325,85559,85562],{"className":85560,"code":85561,"language":8330},[8328],"\nclient.newProducer()\n    .topic(\"my-topic\")\n    .enableChunking(true)\n    .create();\n\n",[4926,85563,85561],{"__ignoreMap":18},[321,85565,85566,85572],{},[324,85567,85568,85569,190],{},"For more information about PIP-37, see ",[55,85570,267],{"href":60382,"rel":85571},[264],[324,85573,85574,85575,190],{},"For more information about implementation details, see ",[55,85576,85579],{"href":85577,"rel":85578},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F4400",[264],"PR-4440",[32,85581,85583,85586],{"id":85582},"pip-39-namespace-change-events-system-topic",[2628,85584,85585],{},"PIP-39"," Namespace change events (system topic)",[48,85588,85589],{},"This PIP introduces the system topic to store namespace change events.",[48,85591,85592],{},"Previously, Pulsar only allowed you to set the namespace policy, all topics under the namespace followed the namespace policy. Many users want to set the policy for topics. The main reason for not using the same way as namespace level policy is to avoid introducing more workload on ZooKeeper. The original intention of the system topic is to be able to store topic policy in a topic rather than ZooKeeper. So this is the first step to achieve topic level policy. 
And we can easily add support for the topic level policy with this feature.",[321,85594,85595,85601],{},[324,85596,85597,85598,190],{},"For more information about PIP-39, see ",[55,85599,267],{"href":78478,"rel":85600},[264],[324,85602,85574,85603,190],{},[55,85604,85607],{"href":85605,"rel":85606},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F4955",[264],"PR-4955",[32,85609,85611,85613],{"id":85610},"pip-45-pluggable-metadata-interface",[2628,85612,41945],{}," Pluggable metadata interface",[48,85615,85616,85617,190],{},"We have been advancing to enable Pulsar to use other metastore services rather than ZooKeeper. This PIP converts ManagedLedger to use the MetadataStore interface. This facilitates the metadata server plug-in process. Through the MetadataStore interface, it is easy to add other metadata servers into Pulsar such as ",[55,85618,85621],{"href":85619,"rel":85620},"https:\u002F\u002Fgithub.com\u002Fetcd-io\u002Fetcd",[264],"etcd",[321,85623,85624,85630],{},[324,85625,85626,85627,190],{},"For more information about PIP-45, see ",[55,85628,267],{"href":26433,"rel":85629},[264],[324,85631,85574,85632,190],{},[55,85633,85636],{"href":85634,"rel":85635},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F5358",[264],"PR-5358",[32,85638,85640,85643],{"id":85639},"pip-54-support-acknowledgment-at-the-batch-index-level",[2628,85641,85642],{},"PIP-54"," Support acknowledgment at the batch index level",[48,85645,85646],{},"Previously, the broker only tracked the acknowledged state in the batch message level. If a subset of the batch messages was acknowledged, the consumer could still get the acknowledged message of that batch message while the batch message redelivery happened. This PIP adds support for acknowledging the local batch index of a batch. This feature is not enabled by default. You can enable it in the broker.conf as below.",[8325,85648,85651],{"className":85649,"code":85650,"language":8330},[8328],"\nbatchIndexAcknowledgeEnable=true\n\n",[4926,85652,85650],{"__ignoreMap":18},[321,85654,85655,85662],{},[324,85656,85657,85658,190],{},"For more information about PIP-54, see ",[55,85659,267],{"href":85660,"rel":85661},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fwiki\u002FPIP-54:-Support-acknowledgment-at-batch-index-level",[264],[324,85663,85574,85664,190],{},[55,85665,85668],{"href":85666,"rel":85667},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F6052",[264],"PR-6052",[32,85670,85672,85675],{"id":85671},"pip-58-support-consumers-setting-custom-message-retry-delay",[2628,85673,85674],{},"PIP-58"," Support consumers setting custom message retry delay",[48,85677,85678],{},"For many online business systems, various exceptions usually occur in business logic processing, so the message needs to be re-consumed, but users hope that this delay time can be controlled flexibly. Previously, processing methods were usually to send messages to special retry topics, because production can specify any delay, so consumers subscribe to the business topic and retry topic at the same time.",[48,85680,85681],{},"Previously, Pulsar redelivered messages immediately when clients sent negative acknowledgments to brokers. 
Now you can set a retry delay for each message as below.",[8325,85683,85686],{"className":85684,"code":85685,"language":8330},[8328],"\nConsumer consumer = pulsarClient.newConsumer(Schema.BYTES)\n                .enableRetry(true)\n                .receiverQueueSize(100)\n                .deadLetterPolicy(DeadLetterPolicy.builder()\n                    .maxRedeliverCount(maxRedeliveryCount)\n                   .retryLetterTopic(\"persistent:\u002F\u002Fmy-property\u002Fmy-ns\u002Fmy-subscription-custom-Retry\")\n                        .build())\n                .subscribe();\n\nconsumer.reconsumeLater(message, 10 , TimeUnit.SECONDS);\n \n",[4926,85687,85685],{"__ignoreMap":18},[321,85689,85690,85697],{},[324,85691,85692,85693,190],{},"For more information about PIP-58, see ",[55,85694,267],{"href":85695,"rel":85696},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fwiki\u002FPIP-58-%3A-Support-Consumers--Set-Custom-Retry-Delay",[264],[324,85698,85574,85699,190],{},[55,85700,85703],{"href":85701,"rel":85702},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F6449",[264],"PR-6449",[32,85705,85707,85710],{"id":85706},"pip-60-support-sni-routing-to-support-various-proxy-servers",[2628,85708,85709],{},"PIP-60"," Support SNI routing to support various proxy servers",[48,85712,85713],{},"Previously, Pulsar did not provide support to use other proxies, such as Apache Traffic Server (ATS), HAProxy, Nginx, and Envoy, which are more scalable and secured. Most of these proxy servers support SNI routing which can route traffic to a destination without having to terminate the SSL connection.",[48,85715,85716],{},"This PIP adds SNI routing and makes changes to the Pulsar client.",[321,85718,85719,85726],{},[324,85720,85721,85722,190],{},"For more information about PIP-60, see ",[55,85723,267],{"href":85724,"rel":85725},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fwiki\u002FPIP-60:-Support-Proxy-server-with-SNI-routing",[264],[324,85727,85574,85728,190],{},[55,85729,85732],{"href":85730,"rel":85731},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F6566",[264],"PR-6566",[32,85734,85736,85739],{"id":85735},"pip-61-advertise-multiple-addresses",[2628,85737,85738],{},"PIP-61"," Advertise multiple addresses",[48,85741,85742],{},"This PIP allows the broker to expose multiple advertised listeners and to support the separation of internal and external network traffic. 
You can specify multiple advertised listeners in broker.conf as below.",[8325,85744,85747],{"className":85745,"code":85746,"language":8330},[8328],"\nadvertisedListeners=internal:pulsar:\u002F\u002F192.168.1.11:6660,external:pulsar:\u002F\u002F110.95.234.50:6650\n\n",[4926,85748,85746],{"__ignoreMap":18},[48,85750,85751],{},"From the client side, you can specify the listener name for the client as below.",[8325,85753,85756],{"className":85754,"code":85755,"language":8330},[8328],"\nPulsarClient.builder()\n.serviceUrl(url)\n.listenerName(\"internal\")\n.build();\n\n",[4926,85757,85755],{"__ignoreMap":18},[321,85759,85760,85767],{},[324,85761,85762,85763,190],{},"For more information about PIP-61, see ",[55,85764,267],{"href":85765,"rel":85766},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fwiki\u002FPIP-61%3A-Advertised-multiple-addresses",[264],[324,85768,85574,85769,190],{},[55,85770,85773],{"href":85771,"rel":85772},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F6903",[264],"PR-6903",[32,85775,85777,85780],{"id":85776},"pip-65-adapt-pulsar-io-sources-to-support-batchsources",[2628,85778,85779],{},"PIP-65"," Adapt Pulsar IO sources to support BatchSources",[48,85782,85783],{},"This PIP introduces BatchSource as a new interface for writing batch-based connectors. It also introduces BatchSourceTriggerer as an interface to trigger the data collection of a BatchSource. It then provides system implementation in BatchSourceExecutor.",[321,85785,85786,85793],{},[324,85787,85788,85789,190],{},"For more information about PIP-65, see ",[55,85790,267],{"href":85791,"rel":85792},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fwiki\u002FPIP-65%3A-Adapting-Pulsar-IO-Sources-to-support-Batch-Sources",[264],[324,85794,85574,85795,190],{},[55,85796,85799],{"href":85797,"rel":85798},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F7090",[264],"PR-7090",[32,85801,85803,85806],{"id":85802},"load-balancer-add-thresholdshedder-strategy-for-the-load-balancer",[2628,85804,85805],{},"Load balancer"," Add ThresholdShedder strategy for the load balancer",[48,85808,85809],{},"The ThresholdShedder strategy is more flexible than LoadSheddingStrategy for Pulsar. The ThresholdShedder calculates the average resource usage of the brokers, and individual broker resource usage compares with the average value. If it is greater than the average value plus threshold, the overload shedder is triggered. 
You can enable it in broker.conf as below.",[8325,85811,85814],{"className":85812,"code":85813,"language":8330},[8328],"\nloadBalancerLoadSheddingStrategy=org.apache.pulsar.broker.loadbalance.impl.ThresholdShedder\n\n",[4926,85815,85813],{"__ignoreMap":18},[48,85817,85818],{},"You can customize more parameters for the ThresholdShedder if needed as below.",[8325,85820,85823],{"className":85821,"code":85822,"language":8330},[8328],"\n# The broker resource usage threshold.\n# When the broker resource usage is greater than the pulsar cluster average resource usage,\n# the threshold shedder will be triggered to offload bundles from the broker.\n# It only takes effect in ThresholdSheddler strategy.\nloadBalancerBrokerThresholdShedderPercentage=10\n\n# When calculating new resource usage, the history usage accounts for.\n# It only takes effect in ThresholdSheddler strategy.\nloadBalancerHistoryResourcePercentage=0.9\n\n# The BandWithIn usage weight when calculating new resource usage.\n# It only takes effect in ThresholdShedder strategy.\nloadBalancerBandwidthInResourceWeight=1.0\n\n# The BandWithOut usage weight when calculating new resource usage.\n# It only takes effect in ThresholdShedder strategy.\nloadBalancerBandwidthOutResourceWeight=1.0\n\n# The CPU usage weight when calculating new resource usage.\n# It only takes effect in ThresholdShedder strategy.\nloadBalancerCPUResourceWeight=1.0\n\n# The heap memory usage weight when calculating new resource usage.\n# It only takes effect in ThresholdShedder strategy.\nloadBalancerMemoryResourceWeight=1.0\n\n# The direct memory usage weight when calculating new resource usage.\n# It only takes effect in ThresholdShedder strategy.\nloadBalancerDirectMemoryResourceWeight=1.0\n\n# Bundle unload minimum throughput threshold (MB), avoiding bundle unload frequently.\n# It only takes effect in ThresholdShedder strategy.\nloadBalancerBundleUnloadMinThroughputThreshold=10\n\n",[4926,85824,85822],{"__ignoreMap":18},[321,85826,85827],{},[324,85828,85574,85829,190],{},[55,85830,85833],{"href":85831,"rel":85832},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F6772",[264],"PR-6772",[32,85835,85837,85840],{"id":85836},"key-shared-add-consistent-hashing-in-the-key_shared-distribution",[2628,85838,85839],{},"Key Shared"," Add consistent hashing in the Key_Shared distribution",[48,85842,85843],{},"Previously, the implementation of the Key_Shared subscription used a mechanism to divide their hash space across the available consumers. This was based on dividing the currently assigned hash ranges when a new consumer joined or left.",[48,85845,85846],{},"Pulsar 2.6.0 introduces a new consistent hash distribution for the Key_Shared subscription. 
You can enable the consistent hash distribution in broker.conf and the auto split approach is still selected by default.",[8325,85848,85851],{"className":85849,"code":85850,"language":8330},[8328],"\n# On KeyShared subscriptions, with default AUTO_SPLIT mode, use splitting ranges or\n# consistent hashing to reassign keys to new consumers\nsubscriptionKeySharedUseConsistentHashing=false\n\n# On KeyShared subscriptions, number of points in the consistent-hashing ring.\n# The higher the number, the more equal the assignment of keys to consumers\nsubscriptionKeySharedConsistentHashingReplicaPoints=100\n\n",[4926,85852,85850],{"__ignoreMap":18},[48,85854,85855],{},"We plan to use consistent hash distribution by default in the subsequent versions.",[321,85857,85858],{},[324,85859,85574,85860,190],{},[55,85861,85864],{"href":85862,"rel":85863},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F6791",[264],"PR-6791",[32,85866,85868,85870],{"id":85867},"key-shared-fix-ordering-issue-in-keyshared-dispatcher-when-adding-consumers",[2628,85869,85839],{}," Fix ordering issue in KeyShared dispatcher when adding consumers",[48,85872,85873],{},"This is a great fix for the Key_Shared subscription. Previously, ordering was broken in a KeyShared dispatcher if a new consumer c2 came in and an existing consumer c1 went out. This was because messages with keys previously assigned to c1 may route to c2, which might break the message ordering dispatch guarantee in the Key_Shared subscription.",[48,85875,85876],{},"This PR introduces new consumers joining in a \"paused\" state until the previous messages are acknowledged to ensure the messages are dispatched orderly.",[48,85878,85879],{},"If you still want the relaxed ordering, you can set up at the consumer side as below.",[8325,85881,85884],{"className":85882,"code":85883,"language":8330},[8328],"\npulsarClient.newConsumer()\n    .keySharedPolicy(KeySharedPolicy.autoSplitHashRange().setAllowOutOfOrderDelivery(true))\n    .subscribe();\n\n",[4926,85885,85883],{"__ignoreMap":18},[321,85887,85888],{},[324,85889,85574,85890,4003,85894,190],{},[55,85891,85893],{"href":79820,"rel":85892},[264],"PR-7106",[55,85895,85898],{"href":85896,"rel":85897},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F7108",[264],"PR-7108",[32,85900,85902,85904],{"id":85901},"key-shared-add-support-for-key-hash-range-reading",[2628,85903,85839],{}," Add support for key hash range reading",[48,85906,85907],{},"This PR supports sticky key hash range reader. A broker only dispatches messages whose hash of the message key contains by a specified key hash range.",[48,85909,85910],{},"Besides, multiple key hash ranges can be specified on a reader.",[8325,85912,85915],{"className":85913,"code":85914,"language":8330},[8328],"\npulsarClient.newReader()\n                    .topic(topic)\n                    .startMessageId(MessageId.earliest)\n                    .keyHashRange(Range.of(0, 10000), Range.of(20001, 30000))\n                    .create();\n\n",[4926,85916,85914],{"__ignoreMap":18},[321,85918,85919],{},[324,85920,85574,85921,190],{},[55,85922,85924],{"href":79557,"rel":85923},[264],"PR-5928",[32,85926,85928],{"id":85927},"use-pure-java-air-compressor-instead-of-jni-based-libraries","Use pure-java Air-Compressor instead of JNI based libraries",[48,85930,85931],{},"Previously, JNI based libraries were used to perform data compression. 
While these libraries do have an overhead in terms of size and affect the JNI overhead which is typically measurable when compressing many small payloads.",[48,85933,85934,85935,85940],{},"This PR replaces compression libraries for LZ4, ZStd, and Snappy with ",[55,85936,85939],{"href":85937,"rel":85938},"https:\u002F\u002Fgithub.com\u002Fairlift\u002Faircompressor",[264],"AirCompressor",", which is a pure Java compression library used by Presto.",[321,85942,85943],{},[324,85944,85574,85945,190],{},[55,85946,85949],{"href":85947,"rel":85948},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F5390",[264],"PR-5390",[32,85951,85953],{"id":85952},"support-multiple-pulsar-clusters-using-the-same-bookkeeper-cluster","Support multiple Pulsar clusters using the same BookKeeper cluster",[48,85955,85956],{},"This PR allows multiple Pulsar clusters to use the specified BookKeeper cluster by pointing BookKeeper client to the ZooKeeper connection string of BookKeeper cluster.",[48,85958,85959],{},"This PR adds a configuration (bookkeeperMetadataServiceUri) to discover BookKeeper cluster metadata store and uses metadata service URI to initialize BookKeeper clients.",[8325,85961,85964],{"className":85962,"code":85963,"language":8330},[8328],"\n# Metadata service uri that bookkeeper is used for loading corresponding metadata driver\n# and resolving its metadata service location.\n# This value can be fetched using `bookkeeper shell whatisinstanceid` command in BookKeeper cluster.\n# For example: zk+hierarchical:\u002F\u002Flocalhost:2181\u002Fledgers\n# The metadata service uri list can also be semicolon separated values like below:\n# zk+hierarchical:\u002F\u002Fzk1:2181;zk2:2181;zk3:2181\u002Fledgers\nbookkeeperMetadataServiceUri=\n\n",[4926,85965,85963],{"__ignoreMap":18},[321,85967,85968],{},[324,85969,85574,85970,190],{},[55,85971,84016],{"href":84014,"rel":85972},[264],[32,85974,85976],{"id":85975},"support-deleting-inactive-topics-when-subscriptions-are-caught-up","Support deleting inactive topics when subscriptions are caught up",[48,85978,85979],{},"Previously, Pulsar supported deleting inactive topics which do not have active producers and subscriptions. This PR supports deleting inactive topics when all subscriptions of the topic are caught up and when there are no active producers or consumers.",[48,85981,85982],{},"This PR exposes inactive topic delete mode in broker.conf. In the future, we can support a namespace level configuration for the inactive topic delete mode.",[8325,85984,85987],{"className":85985,"code":85986,"language":8330},[8328],"\n# Set the inactive topic delete mode. Default is delete_when_no_subscriptions\n# 'delete_when_no_subscriptions' mode only delete the topic which has no subscriptions and no active producers\n# 'delete_when_subscriptions_caught_up' mode only delete the topic that all subscriptions has no backlogs(caught up)\n# and no active producers\u002Fconsumers\nbrokerDeleteInactiveTopicsMode=delete_when_no_subscriptions\n\n",[4926,85988,85986],{"__ignoreMap":18},[321,85990,85991],{},[324,85992,85574,85993,190],{},[55,85994,85997],{"href":85995,"rel":85996},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F6077",[264],"PR-6077",[32,85999,86001],{"id":86000},"add-a-flag-to-skip-broker-shutdown-on-transient-oom","Add a flag to skip broker shutdown on transient OOM",[48,86003,86004,86005,86010],{},"A high dispatch rate on one of the topics may cause a broker to go OOM temporarily. 
It is a transient error and the broker can recover within a few seconds as soon as some memory gets released. However, in 2.4 release (",[55,86006,86009],{"href":86007,"rel":86008},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F4196",[264],"#4196","), the “restarted broker on OOM” feature can cause huge instability in a cluster, where a topic moves from one broker to another and restarts multiple brokers and disrupt other topics as well. So this PR provides a dynamic flag to skip broker shutdown on OOM to avoid instability in a cluster.",[321,86012,86013],{},[324,86014,85574,86015,190],{},[55,86016,86019],{"href":86017,"rel":86018},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F6634",[264],"PR-6634",[32,86021,86023],{"id":86022},"make-zookeeper-cache-expiry-time-configurable","Make ZooKeeper cache expiry time configurable",[48,86025,86026],{},"Previously, ZooKeeper cache expiry time was hardcoded and it needed to be configurable to refresh value based on various requirements, for example, refreshing value quickly in case of zk-watch miss, avoiding frequent cache refresh to avoid zk-read or avoiding issue due to zk read timeout, and so on.",[48,86028,86029],{},"Now you can configure ZooKeeper cache expiry time in broker.conf as below.",[8325,86031,86034],{"className":86032,"code":86033,"language":8330},[8328],"\n# ZooKeeper cache expiry time in seconds\nzooKeeperCacheExpirySeconds=300\n\n",[4926,86035,86033],{"__ignoreMap":18},[321,86037,86038],{},[324,86039,85574,86040,190],{},[55,86041,86044],{"href":86042,"rel":86043},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F6668",[264],"PR-6668",[32,86046,86048],{"id":86047},"optimize-consumer-fetch-messages-in-case-of-batch-message","Optimize consumer fetch messages in case of batch message",[48,86050,86051],{},"When a consumer sends a fetch request to a broker server, it contains a fetch message number telling the server how many messages should be pushed to a consumer client. However, the broker server stores data in BookKeeper or broker cache according to entry rather than a single message if the producer produces messages using the batch feature. There is a gap to map the number of messages to the number of entries when dealing with consumer fetch requests.",[48,86053,86054],{},"This PR adds a variable avgMessagesPerEntry to record average messages stored in one entry. It updates when a broker server pushes messages to a consumer. When dealing with consumer fetch requests, it maps fetch request number to entry number. Additionally, this PR exposes the avgMessagePerEntry static value to consumer stat metric json.",[48,86056,86057],{},"You can enable preciseDispatcherFlowControl in broker.conf as below.",[8325,86059,86062],{"className":86060,"code":86061,"language":8330},[8328],"\n# Precise dispatcher flow control according to history message number of each entry\npreciseDispatcherFlowControl=false\n\n",[4926,86063,86061],{"__ignoreMap":18},[321,86065,86066],{},[324,86067,85574,86068,190],{},[55,86069,86072],{"href":86070,"rel":86071},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F6719",[264],"PR-6719",[32,86074,86076],{"id":86075},"introduce-precise-topic-publish-rate-limiting","Introduce precise topic publish rate limiting",[48,86078,86079],{},"Previously, Pulsar supported the publish rate limiting but it is not a precise control. 
Now, for some use cases that need precise control, you can enable it in broker.conf as below.",[8325,86081,86084],{"className":86082,"code":86083,"language":8330},[8328],"\npreciseTopicPublishRateLimiterEnable=true\n\n",[4926,86085,86083],{"__ignoreMap":18},[321,86087,86088],{},[324,86089,85574,86090,190],{},[55,86091,86093],{"href":79477,"rel":86092},[264],"PR-7078",[32,86095,86097],{"id":86096},"expose-check-delay-of-new-entries-in-brokerconf","Expose check delay of new entries in broker.conf",[48,86099,86100],{},"Previously, the check delay of new entries was 10 ms and could not be changed by users. Currently, for consumption latency sensitive scenarios, you can set the value of check delay of new entries to a smaller value or 0 in broker.conf as below. Using a smaller value may degrade consumption throughput.",[8325,86102,86105],{"className":86103,"code":86104,"language":8330},[8328],"\nmanagedLedgerNewEntriesCheckDelayInMillis=10\n\n",[4926,86106,86104],{"__ignoreMap":18},[321,86108,86109],{},[324,86110,85574,86111,190],{},[55,86112,86115],{"href":86113,"rel":86114},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F7154",[264],"PR-7154",[32,86117,86119,86122],{"id":86118},"schema-support-null-key-and-null-value-in-keyvalue-schema",[2628,86120,86121],{},"Schema"," Support null key and null value in KeyValue schema",[321,86124,86125],{},[324,86126,85574,86127,190],{},[55,86128,86131],{"href":86129,"rel":86130},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F7139",[264],"PR-7139",[32,86133,86135],{"id":86134},"support-triggering-ledger-rollover-when-maxledgerrollovertimeminutes-is-met","Support triggering ledger rollover when maxLedgerRolloverTimeMinutes is met",[48,86137,86138],{},"This PR implements a monitoring thread to check if the current topic ledger meets the constraint of managedLedgerMaxLedgerRolloverTimeMinutes and triggers a rollover to make the configuration take effect. Another important idea is that if you trigger a rollover, you can close the current ledger so that you can release the storage of the current ledger. For some less commonly used topics, the current ledger data is likely to be expired and the current rollover logic is only triggered when adding a new entry. Obviously, this results in a waste of disk space.",[48,86140,86141],{},"The monitoring thread is scheduled at a fixed time interval and the interval is set to managedLedgerMaxLedgerRolloverTimeMinutes. Each inspection makes two judgments at the same time, for example, currentLedgerEntries > 0 and currentLedgerIsFull(). When the number of current entries is equal to 0, it does not trigger a new rollover and you can use this to reduce the ledger creation.",[321,86143,86144],{},[324,86145,85574,86146,190],{},[55,86147,86150],{"href":86148,"rel":86149},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F7111",[264],"PR-7116",[40,86152,68241],{"id":68240},[32,86154,86156],{"id":86155},"add-rest-api-to-get-connection-and-topic-stats","Add REST API to get connection and topic stats",[48,86158,86159],{},"Previously, Pulsar proxy did not have useful stats to get internal information about the proxy. 
It is better to have internal-stats of proxy to get information, such as live connections, topic stats (with higher logging level), and so on.",[48,86161,86162],{},"This PR adds REST API to get stats for connection and topics served by proxy.",[321,86164,86165],{},[324,86166,85574,86167,190],{},[55,86168,86171],{"href":86169,"rel":86170},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F6473",[264],"PR-6473",[40,86173,86175],{"id":86174},"admin","Admin",[32,86177,86179],{"id":86178},"support-getting-a-message-by-message-id-in-pulsar-admin","Support getting a message by message ID in pulsar-admin",[48,86181,86182],{},"This PR adds a new command get-message-by-id to the pulsar-admin. It allows users to check a single message by providing ledger ID and entry ID.",[321,86184,86185],{},[324,86186,85574,86187,190],{},[55,86188,86191],{"href":86189,"rel":86190},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F6331",[264],"PR-6331",[32,86193,86195],{"id":86194},"support-deleting-subscriptions-forcefully","Support deleting subscriptions forcefully",[48,86197,86198],{},"This PR adds the method deleteForcefully to support force deleting subscriptions.",[321,86200,86201],{},[324,86202,85574,86203,190],{},[55,86204,86207],{"href":86205,"rel":86206},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F6383",[264],"PR-6383",[40,86209,9636],{"id":80225},[32,86211,86213],{"id":86212},"built-in-functions","Built-in functions",[48,86215,86216],{},"This PR implements the possibility of creating built-in functions in the same way as adding built-in connectors.",[321,86218,86219],{},[324,86220,85574,86221,190],{},[55,86222,86225],{"href":86223,"rel":86224},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F6895",[264],"PR-6895",[32,86227,86229],{"id":86228},"add-go-function-heartbeat-and-grpc-service-for-production-usage","Add Go Function heartbeat (and gRPC service) for production usage",[321,86231,86232],{},[324,86233,85574,86234,190],{},[55,86235,86238],{"href":86236,"rel":86237},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F6031",[264],"PR-6031",[32,86240,86242],{"id":86241},"add-custom-property-options-to-functions","Add custom property options to functions",[48,86244,86245],{},"This PR allows users to set custom system properties while submitting functions. This can be used to pass credentials via a system property.",[321,86247,86248],{},[324,86249,85574,86250,190],{},[55,86251,86254],{"href":86252,"rel":86253},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F6348",[264],"PR-6348",[32,86256,86258],{"id":86257},"separate-tls-configurations-of-function-worker-and-broker","Separate TLS configurations of function worker and broker",[321,86260,86261],{},[324,86262,85574,86263,190],{},[55,86264,86267],{"href":86265,"rel":86266},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F6602",[264],"PR-6602",[32,86269,86271],{"id":86270},"add-the-ability-to-build-consumers-in-functions-and-sources","Add the ability to build consumers in functions and sources",[48,86273,86274],{},"Previously, function and source context give their writers an ability to create publishers but not consumers. 
This PR fixes this issue.",[321,86276,86277],{},[324,86278,85574,86279,190],{},[55,86280,86283],{"href":86281,"rel":86282},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F6954",[264],"PR-6954",[40,86285,74373],{"id":74372},[32,86287,86289],{"id":86288},"support-keyvalue-schema","Support KeyValue schema",[48,86291,86292],{},"Previously, Pulsar SQL could not read the KeyValue schema data.",[48,86294,86295],{},"This PR adds KeyValue schema support for Pulsar SQL. It adds the prefix key. for the key field name and value. for the value field name.",[321,86297,86298],{},[324,86299,85574,86300,190],{},[55,86301,86304],{"href":86302,"rel":86303},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F6325",[264],"PR-6325",[32,86306,86308],{"id":86307},"support-multiple-avro-schema-versions","Support multiple Avro schema versions",[48,86310,86311],{},"Previously, if you have multiple Avro schema versions for a topic, using Pulsar SQL to query data from this topic introduces some problems. With this change, You can evolve the schema of the topic and keep transitive backward compatibility of all schemas of the topic if you want to query data from this topic.",[321,86313,86314],{},[324,86315,85574,86316,190],{},[55,86317,86320],{"href":86318,"rel":86319},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F4847",[264],"PR-4847",[40,86322,72443],{"id":72442},[32,86324,86326],{"id":86325},"support-waiting-for-inflight-messages-while-closing-a-producer","Support waiting for inflight messages while closing a producer",[48,86328,86329],{},"Previously, when you closed a producer, pulsar-client immediately failed inflight messages even if they persisted successfully at the broker. Most of the time, users want to wait for those inflight messages rather than fail them. While pulsar-client lib did not provide a way to wait for inflight messages before closing the producer.",[48,86331,86332],{},"This PR supports closing API with the flag where you can control waiting for inflight messages. With this change, you can close a producer by waiting for inflight messages and pulsar-client does not fail those messages immediately.",[321,86334,86335],{},[324,86336,85574,86337,190],{},[55,86338,86341],{"href":86339,"rel":86340},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F6648",[264],"PR-6648",[32,86343,86345],{"id":86344},"support-loading-tls-certskey-dynamically-from-input-stream","Support loading TLS certs\u002Fkey dynamically from input stream",[48,86347,86348],{},"Previously, pulsar-client provided TLS authentication support and default TLS provider AuthenticationTls expected file path of cert and key files. 
However, there were use cases where it was difficult for user applications to store certs\u002Fkey files locally for TLS authentication.",[48,86350,86351],{},"This PR adds stream support in AuthenticationTls to provide X509Certs and PrivateKey which also performs auto-refresh when streaming changes in a given provider.",[321,86353,86354],{},[324,86355,85574,86356,190],{},[55,86357,86360],{"href":86358,"rel":86359},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F6760",[264],"PR-6760",[32,86362,86364],{"id":86363},"support-returning-sequence-id-when-throwing-an-exception-for-async-send-messages","Support returning sequence ID when throwing an exception for async send messages",[48,86366,86367],{},"Previously, when sending messages asynchronously failed, an exception was thrown, but did not know which message was abnormal, and users did not know which messages needed to be retried.",[48,86369,86370],{},"This PR makes changes supported on the client side. When throwing an exception, the sequenceId org.apache.pulsar.client.api.PulsarClientException is set.",[321,86372,86373],{},[324,86374,85574,86375,190],{},[55,86376,86379],{"href":86377,"rel":86378},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F6825",[264],"PR-6825",[40,86381,52473],{"id":52472},[32,86383,75345],{"id":84054},[321,86385,86386,86392],{},[324,86387,86388,86389,190],{},"To download Apache Pulsar 2.6.0, click ",[55,86390,267],{"href":53730,"rel":86391},[264],[324,86393,86394,86395,4003,86400,190],{},"For more information about Apache Pulsar 2.6.0, see ",[55,86396,86399],{"href":86397,"rel":86398},"https:\u002F\u002Fpulsar.apache.org\u002Frelease-notes\u002F#2.6.0",[264],"2.6.0 release notes",[55,86401,86404],{"href":86402,"rel":86403},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpulls?q=milestone%3A2.6.0+-label%3Arelease%2F2.5.2+-label%3Arelease%2F2.5.1+",[264],"2.6.0 PR list",[48,86406,84078],{},[321,86408,86409,86415],{},[324,86410,84083,86411,4003,86413,190],{},[55,86412,78612],{"href":78611},[55,86414,78618],{"href":78617},[324,86416,84090,86417,190],{},[55,86418,36242],{"href":36242,"rel":86419},[264],[48,86421,78633,86422,190],{},[55,86423,75345],{"href":36230,"rel":86424},[264],[48,86426,86427,86428,190],{},"This post was originally published by Penghui Li on ",[55,86429,84106],{"href":86430,"rel":86431},"https:\u002F\u002Fpulsar.apache.org\u002Fblog\u002F2020\u002F06\u002F18\u002FApache-Pulsar-2-6-0\u002F",[264],[32,86433,4496],{"id":84109},[48,86435,84112,86436,84116,86438,190],{},[55,86437,84115],{"href":10259},[55,86439,84120],{"href":33664,"rel":86440},[264],[48,86442,84123,86443,190],{},[55,86444,84128],{"href":84126,"rel":86445},[264],[48,86447,84131],{},{"title":18,"searchDepth":19,"depth":19,"links":86449},[86450,86487,86490,86494,86501,86505,86510],{"id":85546,"depth":19,"text":85547,"children":86451},[86452,86454,86456,86458,86460,86462,86464,86466,86468,86470,86472,86474,86476,86477,86478,86479,86480,86481,86482,86483,86484,86486],{"id":85550,"depth":279,"text":86453},"PIP-37 Large message size support",{"id":85582,"depth":279,"text":86455},"PIP-39 Namespace change events (system topic)",{"id":85610,"depth":279,"text":86457},"PIP-45 Pluggable metadata interface",{"id":85639,"depth":279,"text":86459},"PIP-54 Support acknowledgment at the batch index level",{"id":85671,"depth":279,"text":86461},"PIP-58 Support consumers setting custom message retry delay",{"id":85706,"depth":279,"text":86463},"PIP-60 Support SNI routing to support various proxy 
servers",{"id":85735,"depth":279,"text":86465},"PIP-61 Advertise multiple addresses",{"id":85776,"depth":279,"text":86467},"PIP-65 Adapt Pulsar IO sources to support BatchSources",{"id":85802,"depth":279,"text":86469},"Load balancer Add ThresholdShedder strategy for the load balancer",{"id":85836,"depth":279,"text":86471},"Key Shared Add consistent hashing in the Key_Shared distribution",{"id":85867,"depth":279,"text":86473},"Key Shared Fix ordering issue in KeyShared dispatcher when adding consumers",{"id":85901,"depth":279,"text":86475},"Key Shared Add support for key hash range reading",{"id":85927,"depth":279,"text":85928},{"id":85952,"depth":279,"text":85953},{"id":85975,"depth":279,"text":85976},{"id":86000,"depth":279,"text":86001},{"id":86022,"depth":279,"text":86023},{"id":86047,"depth":279,"text":86048},{"id":86075,"depth":279,"text":86076},{"id":86096,"depth":279,"text":86097},{"id":86118,"depth":279,"text":86485},"Schema Support null key and null value in KeyValue schema",{"id":86134,"depth":279,"text":86135},{"id":68240,"depth":19,"text":68241,"children":86488},[86489],{"id":86155,"depth":279,"text":86156},{"id":86174,"depth":19,"text":86175,"children":86491},[86492,86493],{"id":86178,"depth":279,"text":86179},{"id":86194,"depth":279,"text":86195},{"id":80225,"depth":19,"text":9636,"children":86495},[86496,86497,86498,86499,86500],{"id":86212,"depth":279,"text":86213},{"id":86228,"depth":279,"text":86229},{"id":86241,"depth":279,"text":86242},{"id":86257,"depth":279,"text":86258},{"id":86270,"depth":279,"text":86271},{"id":74372,"depth":19,"text":74373,"children":86502},[86503,86504],{"id":86288,"depth":279,"text":86289},{"id":86307,"depth":279,"text":86308},{"id":72442,"depth":19,"text":72443,"children":86506},[86507,86508,86509],{"id":86325,"depth":279,"text":86326},{"id":86344,"depth":279,"text":86345},{"id":86363,"depth":279,"text":86364},{"id":52472,"depth":19,"text":52473,"children":86511},[86512,86513],{"id":84054,"depth":279,"text":75345},{"id":84109,"depth":279,"text":4496},"2020-06-18","Learn the most interesting and major features added to Pulsar 2.6.0.","\u002Fimgs\u002Fblogs\u002F63d798346c268e4f9c96ec60_63a3726cefd8ad294839867a_260-top.webp",{},"\u002Fblog\u002Fapache-pulsar-2-6-0",{"title":85534,"description":86515},"blog\u002Fapache-pulsar-2-6-0",[302,821],"pL2-555qS0x1R2hUHczXRabTqZlhN9sjS_KS4S9XSXs",{"id":86524,"title":86525,"authors":86526,"body":86528,"category":821,"createdAt":290,"date":86761,"description":86762,"extension":8,"featured":294,"image":290,"isDraft":294,"link":290,"meta":86763,"navigation":7,"order":296,"path":86764,"readingTime":11508,"relatedResources":290,"seo":86765,"stem":86766,"tags":86767,"__hash__":86768},"blogs\u002Fblog\u002Famqp-pulsar-bring-native-amqp-protocol-support-to-apache-pulsar.md","Announcing AMQP-on-Pulsar: bring native AMQP protocol support to Apache Pulsar",[808,58855,86527],"Zongtang Hu",{"type":15,"value":86529,"toc":86748},[86530,86534,86536,86547,86549,86552,86556,86566,86580,86584,86587,86590,86593,86597,86601,86604,86607,86612,86616,86619,86624,86638,86642,86645,86649,86652,86656,86659,86663,86666,86669,86674,86676,86679,86682,86696,86701,86710,86712,86733,86736,86738,86741],[916,86531,86532],{},[48,86533,53634],{},[48,86535,53637],{},[48,86537,86538,86539,86542,86543,86546],{},"We are excited to announce that StreamNative and ChinaMobile are open-sourcing \"AMQP on Pulsar\" (AoP). AoP brings the native AMQP protocol support to Apache Pulsar by introducing an AMQP protocol handler on Pulsar brokers. 
Similar to ",[55,86540,35093],{"href":29592,"rel":86541},[264],", AoP is also an implementation of the ",[55,86544,82392],{"href":67379,"rel":86545},[264],". By adding the AoP protocol handler in your existing Pulsar cluster, you can migrate your existing RabbitMQ applications and services to Pulsar without modifying the code. This enables RabbitMQ applications to leverage Pulsar’s powerful features, such as infinite event stream retention with Apache BookKeeper and tiered storage.",[40,86548,62871],{"id":62870},[48,86550,86551],{},"Apache Pulsar is an event streaming platform designed from the ground up to be cloud-native- deploying a multi-layer and segment-centric architecture. The architecture separates serving and storage into different layers, making the system container-friendly. The cloud-native architecture provides scalability, availability, and resiliency and enables companies to expand their offerings with real-time data-enabled solutions. Pulsar has gained wide adoption since it was open-sourced in 2016 and was designated an Apache Top-Level project in 2018.",[40,86553,86555],{"id":86554},"the-need-for-aop","The Need For AoP",[48,86557,86558,86559,82426,86562,86565],{},"Pulsar provides a unified messaging model for both queueing and streaming workloads. Pulsar implemented its own protobuf-based binary protocol to provide high performance and low latency. This choice of protobuf makes it convenient to implement ",[55,86560,82425],{"href":67133,"rel":86561},[264],[55,86563,82431],{"href":82429,"rel":86564},[264]," provided by the community. However, existing applications written using other messaging protocols have to be rewritten to adopt Pulsar’s new unified messaging protocol.",[48,86567,86568,86569,4003,86574,86579],{},"To address this, the Pulsar community developed applications to facilitate the migration to Pulsar from other messaging systems. For example, Pulsar provides the ",[55,86570,86573],{"href":86571,"rel":86572},"https:\u002F\u002Fhub.streamnative.io\u002Fconnectors\u002Frabbitmq-source\u002F2.5.1",[264],"RabbitMQ Source connector",[55,86575,86578],{"href":86576,"rel":86577},"https:\u002F\u002Fhub.streamnative.io\u002Fconnectors\u002Frabbitmq-sink\u002F2.5.1",[264],"RabbitMQ Sink connector"," to get through the data transfer between Pulsar and RabbitMQ. Yet, there was still a strong demand from those looking to switch from other AMQP applications to Pulsar.",[40,86581,86583],{"id":86582},"streamnative-and-chinamobile-collaboration","StreamNative and ChinaMobile collaboration",[48,86585,86586],{},"StreamNative was receiving a lot of inbound requests for help migrating from other messaging systems to Pulsar and recognized the need to support other messaging protocols (such as AMQP and Kafka) natively on Pulsar. StreamNative began working on introducing a general protocol handler framework in Pulsar that would allow developers using other messaging protocols to use Pulsar.",[48,86588,86589],{},"ChinaMobile is the Gold Member of OpenStack Foundation and has the largest OpenStack cluster deployment practice in the world. RabbitMQ is the default integration of the message middleware in OpenStack, and ChinaMobile has encountered great challenges in the deployment and maintenance of RabbitMQ. In the OpenStack system, RabbitMQ, as an RPC communication component, has a large number of messages flowing in and out. During the operation process, there is often a backlog of messages. This will cause memory exceptions, and processes will often be stuck due to memory exceptions. 
On the other hand, RabbitMQ's mirrored queues are used to ensure high availability of data, but when a node enters an abnormal state, the entire cluster regularly becomes unavailable. Moreover, RabbitMQ is written in Erlang, which is obscure and difficult to troubleshoot. In summary, given the instability of RabbitMQ clusters and the difficulty of operation, maintenance, and troubleshooting, ChinaMobile decided to develop a middleware product that could replace RabbitMQ.",[48,86591,86592],{},"At the same time, many customers in ChinaMobile's public cloud need AMQP message queues, but RabbitMQ does not meet the requirements for cloud access. Therefore, ChinaMobile's middleware team began to investigate building its own AMQP message queue. After comparing Qpid, RocketMQ, and Pulsar, ChinaMobile was attracted by Pulsar's unique architecture, which decouples data serving and data storage into separate layers. After investigating Pulsar for a period of time, ChinaMobile found that StreamNative had open-sourced KoP, which convinced ChinaMobile that it was feasible to build AMQP support on top of Pulsar. ChinaMobile and StreamNative then began to collaborate on the development of AMQP on Pulsar.",[40,86594,86596],{"id":86595},"implementations","Implementations",[32,86598,86600],{"id":86599},"aop-architecture-overview","AoP architecture overview",[48,86602,86603],{},"AoP is implemented as a pluggable protocol handler that supports the native AMQP wire protocol on Pulsar by leveraging Pulsar features such as topics and cursors.",[48,86605,86606],{},"The diagram below illustrates a Pulsar cluster with the AoP protocol handler. Both the AMQP proxy and the AMQP protocol handler can run alongside Pulsar brokers. Currently, AoP is based on the AMQP 0.9.1 wire protocol, and we are considering adding support for the AMQP 1.0 wire protocol.",[48,86608,86609],{},[384,86610],{"alt":86600,"src":86611},"\u002Fimgs\u002Fblogs\u002F63a3712363ad9e780baa19f1_aop-overview-1.png",[32,86613,86615],{"id":86614},"aop-basic-concepts","AoP basic concepts",[48,86617,86618],{},"AMQP 0.9.1 introduces some basic concepts, such as Exchange, Queue, and Router, which are quite different from the Pulsar model, so we needed an approach that leverages Pulsar’s existing features and maps the two models together. The following figure illustrates the message flow in AoP; the details of message persistence, message routing, and message delivery are discussed below.",[48,86620,86621],{},[384,86622],{"alt":18,"src":86623},"\u002Fimgs\u002Fblogs\u002F63a371210e673ba57a202260_aop-overview-2.png",[1666,86625,86626,86629,86632,86635],{},[324,86627,86628],{},"When a producer sends a message to the AmqpExchange, the AmqpExchange persists the message into a Pulsar topic (original message topic).",[324,86630,86631],{},"The replicator in the AmqpExchange replicates messages to the message routers.",[324,86633,86634],{},"The message router decides whether to route this message to the AmqpQueue. If yes, the AmqpQueue persists the message ID into a Pulsar topic (index message topic).",[324,86636,86637],{},"The AmqpQueue delivers the messages to the consumer.",[3933,86639,86641],{"id":86640},"amqpexchange","AmqpExchange",[48,86643,86644],{},"An AmqpExchange has an original message topic that stores the messages produced by the AMQP producer, and a replicator that replicates messages to the AMQP queues. 
The replicator is backed by a Pulsar durable cursor, which can ensure that the messages can be replicated to the queue and not be lost.",[3933,86646,86648],{"id":86647},"amqpmessagerouter","AmqpMessageRouter",[48,86650,86651],{},"The AmqpMessageRouter maintains the message routing type and the routing rules from an AmqpExchange to an AmqpQueue. The routing type and the routing rules also persist into the Pulsar storage layer. So we can recover the message router even if the broker is restarted.",[3933,86653,86655],{"id":86654},"amqpqueue","AmqpQueue",[48,86657,86658],{},"An AmqpQueue has an index message Topic that stores IndexMessages that are routed to this queue. The IndexMessage consists of a message ID of the original message and the exchange name where the message comes from. When the AmqpQueue delivers messages to the consumers, it will read the original message data by the IndexMessage and dispatch the original message data to the consumers.",[32,86660,86662],{"id":86661},"vhost-assignment","Vhost assignment",[48,86664,86665],{},"In AoP, an AMQP Vhost can be served by a Pulsar broker and a Pulsar broker can serve multiple Vhosts. So adding more Vhosts and brokers can achieve horizontal expansion. This allows you to set up a larger AoP cluster with many Vhosts.",[48,86667,86668],{},"A Vhost in AoP is backed by a Pulsar namespace with a single bundle. If a broker crashes, other brokers can take over the Vhosts that are maintained by this crashed broker. Also the Vhosts can leverage the broker load balance mechanism. The broker can relocate Vhost with a high workload to an idle broker. The following figure illustrates the Vhost assignment.",[48,86670,86671],{},[384,86672],{"alt":86662,"src":86673},"\u002Fimgs\u002Fblogs\u002F63a371225bbfd73c8e93941e_aop-overview-3.png",[32,86675,68241],{"id":68240},[48,86677,86678],{},"The AoP Proxy is for finding the owner broker responsible for the Vhost when the client connects to the AMQP server and transfers data between the client and the owner broker. As described in the above section, the target Vhost is served by a broker in the cluster. This can be achieved by the topic lookup mechanism in Pulsar. This is why a Vhost can only be backed by a namespace with a single bundle. If the namespace has multiple bundles, we cannot find the owner broker by the Vhost name.",[48,86680,86681],{},"The following figure illustrates the AoP Proxy service workflow.",[1666,86683,86684,86687,86690,86693],{},[324,86685,86686],{},"The AMQP client creates a connection with the AoP Proxy.",[324,86688,86689],{},"The Proxy service sends a lookup request to Pulsar cluster to find out the owner broker URL of the Vhost along with the connection.",[324,86691,86692],{},"The Pulsar cluster returns the owner broker URL to the AoP Proxy.",[324,86694,86695],{},"The AoP Proxy creates a connection to the owner broker and starts to transfer data between the AMQP client and the owner Broker.",[48,86697,86698],{},[384,86699],{"alt":18,"src":86700},"\u002Fimgs\u002Fblogs\u002F63a37121f058399fa3a6e544_aop-overview-4.png",[48,86702,86703,86704,86709],{},"Currently, the AoP Proxy service works with the Pulsar Broker together. 
Users could choose whether to start the Proxy service by the ",[55,86705,86708],{"href":86706,"rel":86707},"https:\u002F\u002Fgithub.com\u002Fstreamnative\u002Faop\u002Fwiki",[264],"configuration","(amqpProxyEnable).",[40,86711,75990],{"id":82059},[48,86713,86714,86715,86718,86719,86724,86725,86728,86729,86732],{},"AoP is open sourced under Apache License V2 in ",[55,86716,37237],{"href":37237,"rel":86717},[264],", which is available in the StreamNative Hub. You can download the AoP protocol handler through this ",[55,86720,86723],{"href":86721,"rel":86722},"https:\u002F\u002Fgithub.com\u002Fstreamnative\u002Faop\u002Freleases\u002Fdownload\u002Fv0.1.0\u002Fpulsar-protocol-handler-amqp-0.1.0-SNAPSHOT.nar",[264],"link",". For details about how to install and use the AoP protocol handler, see ",[55,86726,267],{"href":71133,"rel":86727},[264],". In future, the AoP protocol handler will be embedded in the StreamNative Platform Version 1.1. You can also download the ",[55,86730,86731],{"href":37361},"StreamNative platform"," to try out all the features of AoP. If you already have a Pulsar cluster running and would like to enable AMQP protocol support on it, you can follow the instructions to install the AoP protocol handler to your existing Pulsar cluster.",[48,86734,86735],{},"Here is more information on AoP's code and document. We are looking forward to your issues, and PRs. You can also join #aop channel in Pulsar Slack to discuss all things about AMQP-on-Pulsar.",[40,86737,82734],{"id":82733},[48,86739,86740],{},"The AoP project was originally initiated by StreamNative. The ChinaMobile development team played a very important role in the development process. Many thanks to Zongtang Hu, Shaojie Wang and Hao Zhang from ChinaMobile for their contributions to this project!",[48,86742,82189,86743,1154,86746,190],{},[55,86744,39691],{"href":33664,"rel":86745},[264],[55,86747,24379],{"href":45219},{"title":18,"searchDepth":19,"depth":19,"links":86749},[86750,86751,86752,86753,86759,86760],{"id":62870,"depth":19,"text":62871},{"id":86554,"depth":19,"text":86555},{"id":86582,"depth":19,"text":86583},{"id":86595,"depth":19,"text":86596,"children":86754},[86755,86756,86757,86758],{"id":86599,"depth":279,"text":86600},{"id":86614,"depth":279,"text":86615},{"id":86661,"depth":279,"text":86662},{"id":68240,"depth":279,"text":68241},{"id":82059,"depth":19,"text":75990},{"id":82733,"depth":19,"text":82734},"2020-06-15","AoP is also an implementation of the pluggable protocol handler. 
By adding the AoP protocol handler in your existing Pulsar cluster, you can migrate your existing RabbitMQ applications and services to Pulsar without modifying the code.",{},"\u002Fblog\u002Famqp-pulsar-bring-native-amqp-protocol-support-to-apache-pulsar",{"title":86525,"description":86762},"blog\u002Famqp-pulsar-bring-native-amqp-protocol-support-to-apache-pulsar",[11043,3550,821,9144],"NC7_f3dqM6jAa4RwanYIWy6PBQW_I_tWiYy00zmgIrI",{"id":86770,"title":86771,"authors":86772,"body":86773,"category":821,"createdAt":290,"date":86997,"description":86998,"extension":8,"featured":294,"image":290,"isDraft":294,"link":290,"meta":86999,"navigation":7,"order":296,"path":87000,"readingTime":3556,"relatedResources":290,"seo":87001,"stem":87002,"tags":87003,"__hash__":87004},"blogs\u002Fblog\u002Fhow-to-trace-pulsar-messages-with-opentracing-jaeger.md","How to trace Pulsar messages with OpenTracing and Jaeger",[808],{"type":15,"value":86774,"toc":86989},[86775,86798,86801,86805,86813,86817,86822,86828,86831,86846,86856,86862,86866,86884,86887,86893,86897,86900,86903,86914,86918,86921,86927,86930,86933,86936,86939,86942,86945,86951,86954,86956,86958,86961,86964,86970,86983],[48,86776,86777,86782,86783,1186,86788,4003,86793,190],{},[55,86778,86781],{"href":86779,"rel":86780},"https:\u002F\u002Fopentracing.io\u002F",[264],"OpenTracing"," is an open distributed tracing standard for applications and OSS packages. Many tracing backend services support OpenTracing APIs, such as ",[55,86784,86787],{"href":86785,"rel":86786},"https:\u002F\u002Fwww.jaegertracing.io\u002F",[264],"Jaeger",[55,86789,86792],{"href":86790,"rel":86791},"https:\u002F\u002Fzipkin.io\u002F",[264],"Zipkin",[55,86794,86797],{"href":86795,"rel":86796},"https:\u002F\u002Fskywalking.apache.org\u002F",[264],"SkyWalking",[48,86799,86800],{},"This blog guides you through every step of how to trace Pulsar messages by Jaeger through OpenTracing API.",[40,86802,86804],{"id":86803},"prerequisite","Prerequisite",[48,86806,86807,86808,86812],{},"Before getting started, make sure you have installed JDK 8, Maven 3, and Pulsar (cluster or standalone). 
If you do not have an available Pulsar, follow the ",[55,86809,41409],{"href":86810,"rel":86811},"http:\u002F\u002Fpulsar.apache.org\u002Fdocs\u002Fen\u002Fstandalone\u002F",[264]," to install one.",[40,86814,86816],{"id":86815},"step-1-start-a-jaeger-backend","Step 1: start a Jaeger backend",[1666,86818,86819],{},[324,86820,86821],{},"Start a Jaeger backend in Docker.",[8325,86823,86826],{"className":86824,"code":86825,"language":8330},[8328],"docker run -d -p 6831:6831\u002Fudp -p 16686:16686 jaegertracing\u002Fall-in-one:latest\n",[4926,86827,86825],{"__ignoreMap":18},[48,86829,86830],{},"If you have successfully started Jaeger, you can open the Jaeger UI website successfully.",[916,86832,86833],{},[48,86834,86835,86836,1154,86841,190],{},"Tip : If you do not have a Jager Docker environment, you can ",[55,86837,86840],{"href":86838,"rel":86839},"https:\u002F\u002Fwww.jaegertracing.io\u002Fdownload\u002F",[264],"download the binaries",[55,86842,86845],{"href":86843,"rel":86844},"https:\u002F\u002Fwww.jaegertracing.io\u002Fdocs\u002F1.17\u002Fgetting-started\u002F#from-source",[264],"build from source",[1666,86847,86848],{},[324,86849,86850,86851,86855],{},"Visit ",[55,86852,86853],{"href":86853,"rel":86854},"http:\u002F\u002Flocalhost:16686",[264]," to open the Jaeger UI website without a username or password.",[48,86857,86858],{},[384,86859],{"alt":86860,"src":86861},"jager ui interface","\u002Fimgs\u002Fblogs\u002F63a36f2e2df0e2c291b3527c_jaeger-ui.png",[40,86863,86865],{"id":86864},"step-2-add-maven-dependencies","Step 2: add maven dependencies",[48,86867,86868,86869,86874,86875,86880,86881,190],{},"This step uses ",[55,86870,86873],{"href":86871,"rel":86872},"https:\u002F\u002Fhub.streamnative.io\u002Fmonitoring\u002Fopentracing-pulsar-client\u002F0.1.0",[264],"OpenTracing Pulsar Client",", which is integrated with the Pulsar Client and OpenTracing APIs based on ",[55,86876,86879],{"href":86877,"rel":86878},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fwiki\u002FPIP-23:-Message-Tracing-By-Interceptors",[264],"Pulsar Client Interceptors",", to trace Pulsar messages. Developed by StreamNative, the OpenTracing Pulsar Client acts as a monitoring tool in the ",[55,86882,38697],{"href":35258,"rel":86883},[264],[48,86885,86886],{},"Add Jaeger client dependency to connect to Jaeger backend.",[8325,86888,86891],{"className":86889,"code":86890,"language":8330},[8328],"\n org.apache.pulsar\n pulsar-client\n 2.5.1\n\n io.streamnative\n opentracing-pulsar-client\n 0.1.0\n\n  io.jaegertracing\n  jaeger-client\n  1.2.0\n\n",[4926,86892,86890],{"__ignoreMap":18},[40,86894,86896],{"id":86895},"step-3-use-opentracing-pulsar-client","Step 3: use OpenTracing Pulsar Client",[48,86898,86899],{},"For easier understanding, this blog takes a usage scenario as an example. Suppose that you have three jobs and two topics. Job-1 publishes messages to the topic-A and Job-2 consumes messages from the topic-A. When Job-2 receives a message from topic-A, Job-2 sends a message to the topic-B, and then Job-3 consumes messages from topic-B. 
So there are two topics, two producers and two consumers in this scenario.",[48,86901,86902],{},"According to the scenario described previously, you need to start three applications to finish this job.",[321,86904,86905,86908,86911],{},[324,86906,86907],{},"Job-1: publish messages to topic-A",[324,86909,86910],{},"Job-2: consume messages from topic-A and publish messages to topic-B",[324,86912,86913],{},"Job-3: consume messages from topic-B",[32,86915,86917],{"id":86916},"job-1","Job-1",[48,86919,86920],{},"This example shows how to publish messages to topic-A in Java.",[8325,86922,86925],{"className":86923,"code":86924,"language":8330},[8328],"Configuration.SamplerConfiguration samplerConfig = Configuration.SamplerConfiguration.fromEnv().withType(\"const\").withParam(1);\nConfiguration.ReporterConfiguration reporterConfig = Configuration.ReporterConfiguration.fromEnv().withLogSpans(true);\nConfiguration configuration = new Configuration(\"Job-1\").withSampler(samplerConfig).withReporter(reporterConfig);\n\nTracer tracer = configuration.getTracer();\nGlobalTracer.registerIfAbsent(tracer);\n\nPulsarClient client = PulsarClient.builder()\n        .serviceUrl(\"pulsar:\u002F\u002Flocalhost:6650\")\n        .build();\n\nProducer producerA = client.newProducer(Schema.STRING)\n        .topic(\"topic-A\")\n        .intercept(new TracingProducerInterceptor())\n        .create();\n\nfor (int i = 0; i \n### Job-2\n\nThis example shows how to consume messages from topic-A and publish messages to topic-B in Java.\n\n",[4926,86926,86924],{"__ignoreMap":18},[48,86928,86929],{},"Configuration.SamplerConfiguration samplerConfig = Configuration.SamplerConfiguration.fromEnv().withType(\"const\").withParam(1);\nConfiguration.ReporterConfiguration reporterConfig = Configuration.ReporterConfiguration.fromEnv().withLogSpans(true);\nConfiguration configuration = new Configuration(\"Job-2\").withSampler(samplerConfig).withReporter(reporterConfig);",[48,86931,86932],{},"Tracer tracer = configuration.getTracer();\nGlobalTracer.registerIfAbsent(tracer);",[48,86934,86935],{},"PulsarClient client = PulsarClient.builder()\n       .serviceUrl(\"pulsar:\u002F\u002Flocalhost:6650\")\n       .build();",[48,86937,86938],{},"Consumer consumer = client.newConsumer(Schema.STRING)\n       .topic(\"topic-A\")\n       .subscriptionName(\"open-tracing\")\n       .subscriptionType(SubscriptionType.Shared)\n       .intercept(new TracingConsumerInterceptor\u003C>())\n       .subscribe();",[48,86940,86941],{},"Producer producerB = client.newProducer(Schema.STRING)\n       .topic(\"topic-B\")\n       .intercept(new TracingProducerInterceptor())\n       .create();",[48,86943,86944],{},"while (true) {\n   Message received = consumer.receive();\n   SpanContext context = TracingPulsarUtils.extractSpanContext(received, tracer);\n   TypedMessageBuilder messageBuilder = producerB.newMessage();\n   messageBuilder.value(received.getValue() + \" Pulsar and OpenTracing!\");\n   \u002F\u002F Inject parent span context\n   tracer.inject(context, Format.Builtin.TEXT_MAP, new TypeMessageBuilderInjectAdapter(messageBuilder));\n   messageBuilder.send();\n   consumer.acknowledge(received);\n}",[8325,86946,86949],{"className":86947,"code":86948,"language":8330},[8328],"\n### Job-3\n\nThis example shows how to consume messages from topic-B in Java.\n\n",[4926,86950,86948],{"__ignoreMap":18},[48,86952,86953],{},"Configuration.SamplerConfiguration samplerConfig = 
Configuration.SamplerConfiguration.fromEnv().withType(\"const\").withParam(1);\nConfiguration.ReporterConfiguration reporterConfig = Configuration.ReporterConfiguration.fromEnv().withLogSpans(true);\nConfiguration configuration = new Configuration(\"Job-3\").withSampler(samplerConfig).withReporter(reporterConfig);",[48,86955,86932],{},[48,86957,86935],{},[48,86959,86960],{},"Consumer consumer = client.newConsumer(Schema.STRING)\n       .topic(\"topic-B\")\n       .subscriptionName(\"open-tracing\")\n       .subscriptionType(SubscriptionType.Shared)\n       .intercept(new TracingConsumerInterceptor\u003C>())\n       .subscribe();",[48,86962,86963],{},"while (true) {\n   Message received = consumer.receive();\n   System.out.println(received.getValue());\n   consumer.acknowledge(received);\n}",[8325,86965,86968],{"className":86966,"code":86967,"language":8330},[8328],"\nNow, you can run Job-3, Job-2 and Job-1 one by one. You can see the Job-3 receives logs in the console as below:\n\n",[4926,86969,86967],{"__ignoreMap":18},[48,86971,86972,86975,86976,86978,86979,86982],{},[2628,86973,86974],{},"0"," Hello Pulsar and OpenTracing!\n",[2628,86977,42523],{}," Hello Pulsar and OpenTracing!\n...\n",[2628,86980,86981],{},"9"," Hello Pulsar and OpenTracing!",[8325,86984,86987],{"className":86985,"code":86986,"language":8330},[8328],"\nCongratulations, your jobs work well. Now you can open the Jaeger UI again and there are ten traces in the Jaeger.\n\n![jager ui interface search](\u002Fimgs\u002Fblogs\u002F63a36ff024d53576f2be91e7_traces.png)\n\nYou can click a job name to view the details of a trace.\n\n![jager ui interface Job 1](\u002Fimgs\u002Fblogs\u002F63a36ff0e7434951005ebe5b_trace-details.png)\n\nThe span name is formatted as To__\u003Ctopic-name> and From__\u003Ctopic-name>__\u003Csubscription_name>, which makes it easy to tell whether it is a producer or a consumer.\n\n## Summary\n\nAs you can see, [OpenTracing Pulsar Client](https:\u002F\u002Fhub.streamnative.io\u002Fmonitoring\u002Fopentracing-pulsar-client\u002F0.1.0) integrates Pulsar client and OpenTracing to trace Pulsar messages easily. If you are using Pulsar and OpenTracing in your application, do not hesitate to try it out!\n\nAdditionally, I also wrote a tech blog for How to Use Apache SkyWalking to Trace Apache Pulsar Messages. 
For the complete content, see [here](\u002Fblog\u002Ftech\u002F2019-10-10-use-apache-skywalking-to-trace-apache-pulsar\u002F).\n",[4926,86988,86986],{"__ignoreMap":18},{"title":18,"searchDepth":19,"depth":19,"links":86990},[86991,86992,86993,86994],{"id":86803,"depth":19,"text":86804},{"id":86815,"depth":19,"text":86816},{"id":86864,"depth":19,"text":86865},{"id":86895,"depth":19,"text":86896,"children":86995},[86996],{"id":86916,"depth":279,"text":86917},"2020-06-11","The OpenTracing Pulsar Client is an integration of the Pulsar Client and OpenTracing APIs which are based on Pulsar Client Interceptors, a monitoring tool in the StreamNative Hub.",{},"\u002Fblog\u002Fhow-to-trace-pulsar-messages-with-opentracing-jaeger",{"title":86771,"description":86998},"blog\u002Fhow-to-trace-pulsar-messages-with-opentracing-jaeger",[38442,821,26747],"Sz6aeWCu6_N3UJN_SfKu2baaERBMLwoC4S6qLNl21Pg",{"id":87006,"title":87007,"authors":87008,"body":87009,"category":821,"createdAt":290,"date":87064,"description":87065,"extension":8,"featured":294,"image":290,"isDraft":294,"link":290,"meta":87066,"navigation":7,"order":296,"path":87067,"readingTime":20144,"relatedResources":290,"seo":87068,"stem":87069,"tags":87070,"__hash__":87071},"blogs\u002Fblog\u002Fintroducing-streamnative-hub-extend-pulsar-capabilities-with-rich-integrations.md","Introducing StreamNative Hub — Extend Pulsar Capabilities with Rich Integrations",[61300],{"type":15,"value":87010,"toc":87062},[87011,87018,87024,87027,87038,87041,87053,87059],[48,87012,87013,87014,87017],{},"Today, we are proud to announce the launch of ",[55,87015,38697],{"href":35258,"rel":87016},[264],", an online service hosting a handful of plugins and integrations around the Pulsar ecosystem.",[48,87019,87020],{},[384,87021],{"alt":87022,"src":87023},"picture of streamnative hub","\u002Fimgs\u002Fblogs\u002F63a3515c50fde04214aee3a9_1.png",[48,87025,87026],{},"As Pulsar continues to evolve and is adopted by a broader user base, we noted the need for more integrations to facilitate the building of better streaming data pipelines and event-driven applications. We introduced the StreamNative Hub to provide a single experience for finding, downloading, using, storing, and sharing Pulsar related extensions, getting involved in data processing, logging, monitoring, authentication, deployment, and offering a broad spectrum of Pulsar integrations. Let’s take a look at some of the StreamNative Hub’s key components:",[321,87028,87029,87032,87035],{},[324,87030,87031],{},"Connector: allows you to move streaming data in and out of Pulsar, which simplifies integration for enterprises bringing Pulsar into their existing infrastructure. All Pulsar built-in connectors are shipped in the StreamNative Hub.",[324,87033,87034],{},"Offloader: allows you to offload the majority of the data from BookKeeper to external remote storage, which provides a cheaper form of storage that readily scales with the volume of data. Pulsar is able to retain both historic and real-time data and provides a unified view as infinite event streams, which can be easily reprocessed or backloaded into new systems. Companies can integrate Pulsar with a unified data processing engine (such as Flink or Spark) to unlock new use cases stemming from infinite data retention. 
AWS S3, GCS, and filesystem offloaders are supported.",[324,87036,87037],{},"Protocol handler: allows you to support other messaging protocols natively and dynamically in Pulsar brokers on runtime, which streamlines operations with Pulsar’s enterprise-grade features without modifying code. Kafka, AMQP, and MQTT are supported.",[48,87039,87040],{},"With StreamNative Hub, whether you have already run on Pulsar and just want to use several plugins, or you are eager to try plugins that are out of the Pulsar community, you can use pre-built plugins directly and quickly and do not need to build plugins from sources. Additionally, by simplifying the installation process and reducing deployment time, the StreamNative Hub allows you to focus on how to maximize business value from living data in a more efficient manner. Now, you can get started in just a few clicks with intuitive website design and complete user guides.",[48,87042,87043,87044,87047,87048,87052],{},"The StreamNative Hub is now available for everyone to use and contribute to. Want to get involved and create Pulsar integrations to this ever-growing list of extensions? Checkout ",[55,87045,267],{"href":35258,"rel":87046},[264]," to explore how to use a plugin and visit ",[55,87049,267],{"href":87050,"rel":87051},"https:\u002F\u002Fgithub.com\u002Fstreamnative\u002Fpulsar-hub",[264]," to learn how to submit a plugin (whether it is a connector, offloader, protocol handler, or something different) with a simple pull request. The StreamNative Hub was a community effort with involvement from numerous contributors and organizations. We look forward to your contributions to make the Pulsar community and the StreamNative Platform more productive and sustainable.",[48,87054,87055],{},[384,87056],{"alt":87057,"src":87058},"illustration of human building ","\u002Fimgs\u002Fblogs\u002F63a3515cd1dcd6cbdde6be37_2.jpeg",[48,87060,87061],{},"Start your journey now with StreamNative Hub!",{"title":18,"searchDepth":19,"depth":19,"links":87063},[],"2020-05-26","Learn about how StreamNative Hub looks like and the key components. We look forward to your contributions! Try it now!",{},"\u002Fblog\u002Fintroducing-streamnative-hub-extend-pulsar-capabilities-with-rich-integrations",{"title":87007,"description":87065},"blog\u002Fintroducing-streamnative-hub-extend-pulsar-capabilities-with-rich-integrations",[28572,821,8058],"oT0B3_2rl0xV4dja6fyiOKZe-CKb_6zUPapMRbkiIPo",{"id":87073,"title":87074,"authors":87075,"body":87076,"category":821,"createdAt":290,"date":87346,"description":87347,"extension":8,"featured":294,"image":290,"isDraft":294,"link":290,"meta":87348,"navigation":7,"order":296,"path":87349,"readingTime":3556,"relatedResources":290,"seo":87350,"stem":87351,"tags":87352,"__hash__":87353},"blogs\u002Fblog\u002Fapache-pulsar-2-5-2.md","Apache Pulsar 2.5.2",[83497],{"type":15,"value":87077,"toc":87320},[87078,87081,87093,87096,87100,87103,87107,87110,87114,87117,87121,87124,87128,87131,87135,87138,87142,87145,87149,87152,87156,87159,87163,87166,87174,87178,87181,87185,87188,87192,87195,87199,87202,87206,87209,87213,87216,87227,87231,87234,87238,87241,87245,87248,87254,87257,87261,87264,87268,87271,87275,87278,87282,87285,87287,87293,87295,87315],[48,87079,87080],{},"We are very glad to see the Apache Pulsar community has successfully released the 2.5.2 version. 
This is the result of a huge effort from the community, with over 56 commits, general improvements and bug fixes.",[48,87082,87083,87084,32795,87088,190],{},"For detailed changes related to 2.5.2 release, refer to the ",[55,87085,23976],{"href":87086,"rel":87087},"http:\u002F\u002Fpulsar.apache.org\u002Frelease-notes\u002F#252",[264],[55,87089,87092],{"href":87090,"rel":87091},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpulls?q=is%3Apr+label%3Arelease%2F2.5.2+is%3Aclosed",[264],"PR list for Pulsar 2.5.2",[48,87094,87095],{},"The following highlights some improved features and fixed bugs in this release.",[40,87097,87099],{"id":87098},"implement-autotopiccreation-by-namespace-level-override","Implement AutoTopicCreation by namespace level override",[48,87101,87102],{},"Introduce a new namespace policy autoTopicCreationOverride, which enables an override of broker autoTopicCreation settings on the namespace level. You can disable autoTopicCreation for the broker while allowing it on a specific namespace.",[40,87104,87106],{"id":87105},"add-customized-deletionlag-and-threshold-for-offloading-policies-per-namespace","Add customized deletionLag and threshold for offloading policies per namespace",[48,87108,87109],{},"Support configuring deletionLag and threshold in the offloading policy on the namespace level to remove data from the offloaded tiered storage.",[40,87111,87113],{"id":87112},"invalidate-managed-ledgers-zookeeper-cache-instead-of-reloading-on-watcher-triggered","Invalidate managed ledgers ZooKeeper cache instead of reloading on watcher triggered",[48,87115,87116],{},"The ZooKeeper children cache is reloaded for z-nodes when topics are frequently created or deleted. This creates additional load on the ZooKeeper and the broker, slows down brokers and makes them less stable. In this release, ZooKeeperManagedLedgerCache is introduced to invalidate instead of reloading the ZooKeeper cache, when topics are created or deleted. This helps reduce pressures on the ZooKeeper.",[40,87118,87120],{"id":87119},"respect-retention-policy-when-there-is-no-traffic","Respect retention policy when there is no traffic",[48,87122,87123],{},"In previous releases, retention is checked when the ledger rollover happens. So if the traffic is stopped, the ledgers are not cleaned up even if all the messages are already acknowledged. In Pulsar 2.5.2, retentionCheckIntervalInSeconds is introduced to check if consumed ledgers need to be trimmed between intervals. If the value is set to 0 or a negative number, the system does not check the consumed ledgers.",[40,87125,87127],{"id":87126},"bump-netty-version-to-4148final","Bump Netty version to 4.1.48.Final",[48,87129,87130],{},"The ZlibDecoders in Netty 4.1.x (before 4.1.46) allow for unbounded memory allocation while decoding a ZlibEncoded byte stream. An attacker could send a large ZlibEncoded byte stream to the Netty server, forcing the server to allocate all of its free memory to a single decoder. The bug is fixed in Netty 4.1.48.Final .",[40,87132,87134],{"id":87133},"increase-timeout-for-loading-topics","Increase timeout for loading topics",[48,87136,87137],{},"Loading replicated topics is quite an expensive operation and involves global ZooKeeper lookups and the start of many sub-processes. 
In Pulsar 2.5.2, we increase the timeout for loading topics which have many replicated clusters to 60 seconds.",[40,87139,87141],{"id":87140},"fix-incorrect-cursor-state-for-cursor-without-consumers","Fix incorrect cursor state for cursor without consumers",[48,87143,87144],{},"If consumers of a subscription are closed, the cursor is set to inactive. But the cursor is set to active during PulsarStats.updateStats() when the backlog size is less than backloggedCursorThresholdEntries. In Pulsar 2.5.2, we move the checkBackloggedCursors() from ManagedLedger to Topic and check the consumer list to fix this bug.",[40,87146,87148],{"id":87147},"change-non-durable-cursor-to-active-to-improve-performance","Change non-durable cursor to active to improve performance",[48,87150,87151],{},"In non-durable subscription mode, the cursor is not active, which leads to the written entries not being put into cache. This would degrade the reading performance. In Pulsar 2.5.2, we set the NonDurableCursorImpl to active and remove three override methods setActive(), isActive(), setInactive() to improve the reading performance.",[40,87153,87155],{"id":87154},"add-keystore-configurations-to-tls","Add keystore configurations to TLS",[48,87157,87158],{},"In Pulsar 2.5.2, we add keystore configurations to the TLS to allow users to define their own CA certificates while the internal communication uses an internal CA certificate. This change keeps the original TLS settings untouched, and adds new configurations in needed paths.",[40,87160,87162],{"id":87161},"close-producer-when-the-topic-does-not-exists","Close producer when the topic does not exists",[48,87164,87165],{},"In previous releases, when we create a producer for a non-existent topic, the ProducerImpl object is hanging in the dump. This leads to OOM in micro-service which by mistake tries to produce consistently to a non-existent topic. In Pulsar 2.5.2, we fix the bug in the following two aspects:",[321,87167,87168,87171],{},[324,87169,87170],{},"Fix the exception handle for a non-existent topic.",[324,87172,87173],{},"Change state to Close when the producer gets the TopicDoesNotExists exception.",[40,87175,87177],{"id":87176},"fix-topicpublishratelimiter-not-effective-after-restarting-broker","Fix topicPublishRateLimiter not effective after restarting broker",[48,87179,87180],{},"In previous releases, when a publishing rate is configured on the namespace, it can limit the publishing rate. But when the broker is restarted, the limit expires. In Pulsar 2.5.2, this bug is fixed.",[40,87182,87184],{"id":87183},"expose-pulsar_out_bytes_total-and-pulsar_out_messages_total-for-namespacesubscriptionconsumer","Expose pulsar_out_bytes_total and pulsar_out_messages_total for namespace\u002Fsubscription\u002Fconsumer",[48,87186,87187],{},"Add pulsar_out_bytes_total and pulsar_out_messages_total for the namespace, subscription, and consumer. This helps to avoid missing the rate to be computed in Prometheus or missing change of rates within the scraping interval.",[40,87189,87191],{"id":87190},"fix-ttldurationdefaultinseconds-policy","Fix ttlDurationDefaultInSeconds policy",[48,87193,87194],{},"The TTL for namespaces should be retrieved from the broker configuration if it is not configured at namespace policies. In previous releases, the code only returns the value stored in namespace policies directly without judging if the TTL is configured or not. In Pulsar 2.5.2, we add a condition to test if TTL is configured at namespace policies. 
If not, the broker retrieves value stored in broker configuration and returns it as the output.",[40,87196,87198],{"id":87197},"fix-long-field-parse-in-genericjsonrecord","Fix long field parse in GenericJsonRecord",[48,87200,87201],{},"For messages sent in JSON schema, the long field is decoded as int if its value is smaller than Integer.MAX_VALUE. Otherwise, the long field is decoded as a string. Pulsar 2.5.2 introduces a field type check in GenericJsonRecord to fix this bug.",[40,87203,87205],{"id":87204},"fix-the-leak-of-cursor-reset-if-message-encode-fails-in-avro-schema","Fix the leak of cursor reset if message encode fails in Avro schema",[48,87207,87208],{},"If the Avro encode for a message fails after a few bytes are written, the cursor in the stream is not reset. The following flush(), which normally resets the cursor, is skipped if there is an exception. In Pulsar 2.5.2, we introduced a flush() in the finally block to fix this bug.",[40,87210,87212],{"id":87211},"update-topic-partitions-automatically","Update topic partitions automatically",[48,87214,87215],{},"In Pulsar 2.5.2, the C++ client supports previously-created producers and consumers to automatically update partitions when the partitions for a topic are updated.",[321,87217,87218,87221,87224],{},[324,87219,87220],{},"Add a boost::asio::deadline_timer to PartitionedConsumerImpl and PartitionedProducerImpl to register a lookup task to detect partition changes periodically.",[324,87222,87223],{},"Add an unsigned int configuration parameter to indicate the period of detecting partition changes.",[324,87225,87226],{},"Unlock the mutex_ in PartitionedConsumerImpl::receive after state_ were checked.",[40,87228,87230],{"id":87229},"fix-default-message-id-in-sent-callback","Fix default message ID in sent callback",[48,87232,87233],{},"In previous releases, the MessageId in the callback is always the default value (-1, -1, -1, -1). In Pulsar 2.5.2, we remove the useless field messageId of BatchMessageContainer::MessageContainer and add the const MessageId& argument to batchMessageCallBack. Therefore, we can get the correct message ID in the callback if the message is sent successfully.",[40,87235,87237],{"id":87236},"fix-message-id-error-if-messages-are-sent-to-partitioned-topics","Fix message ID error if messages are sent to partitioned topics",[48,87239,87240],{},"If messages are sent to a partitioned topic, the partition field of the message ID is always set to -1 because the SendReceipt command only contains the ledger ID and the entry ID. In Pulsar 2.5.2, we fix this bug by adding a partition field to ProducerImpl and setting the partition field of the message ID with it in the ackReceived method.",[40,87242,87244],{"id":87243},"support-async-mode-for-pulsar-functions","Support Async mode for Pulsar Functions",[48,87246,87247],{},"In previous releases, Pulsar Functions does not support the Async mode for Pulsar Functions, such as the user passed in a Function in the following format:",[8325,87249,87252],{"className":87250,"code":87251,"language":8330},[8328],"\nFunction>\n \n",[4926,87253,87251],{"__ignoreMap":18},[48,87255,87256],{},"This kind of function is useful if the Pulsar Functions use RPCs to call external systems. 
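For illustration only — this sketch is not taken from the release notes, and the helper method is hypothetical — such an async function can be expressed with the standard java.util.function.Function interface returning a CompletableFuture:

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

// Sketch of an async function: it returns a CompletableFuture instead of the
// result itself, so a slow external call (e.g. an RPC) does not block the
// function's processing thread.
public class AsyncEnrichFunction implements Function<String, CompletableFuture<String>> {

    @Override
    public CompletableFuture<String> apply(String input) {
        return CompletableFuture.supplyAsync(() -> lookupExternalSystem(input));
    }

    private String lookupExternalSystem(String input) {
        // Stand-in for a real RPC to an external service.
        return input + "-enriched";
    }
}
```
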
Therefore, in Pulsar 2.5.2, we introduce Async mode support for Pulsar Functions.",[40,87258,87260],{"id":87259},"fix-localrunner-netty-dependency-issue","Fix localrunner netty dependency issue",[48,87262,87263],{},"In Pulsar 2.5.2, we add a Log4j2 configuration file for pulsar-functions-local-runner to log to console by default. This helps troubleshoot the problem that Netty libraries are missing and the class is not found, when pulling in pulsar-functions-local-runner as a dependency and attempting to run Pulsar Functions locally.",[40,87265,87267],{"id":87266},"fix-serde-validation-of-pulsar-functions-update","Fix SerDe validation of Pulsar Functions update",[48,87269,87270],{},"In previous releases, the outputSchemaType field is improperly used to validate parameters for Pulsar Function updates. In fact, the outputSerdeClassName parameter should be used. In Pulsar 2.5.2, we fix this bug.",[40,87272,87274],{"id":87273},"avoid-pre-fetching-too-much-data-when-offloading-data-to-hdfs","Avoid pre-fetching too much data when offloading data to HDFS",[48,87276,87277],{},"If too much data is pre-fetched when data is offloaded to HDFS, it may cause severe OOM. In Pulsar 2.5.2, the managedLedgerOffloadPrefetchRounds is introduced, which is used to set the maximum pre-fetch rounds for ledger reading for offloading data.",[40,87279,87281],{"id":87280},"jdbc-sink-handles-null-fields-in-schema","JDBC sink handles null fields in schema",[48,87283,87284],{},"JDBC sink does not handle null fields. The schema registered in Pulsar allows for it and the table schema in MySQL has a column of the same name. When messages are sent to the JDBC sink without that field, an exception is thrown. In Pulsar 2.5.2, the JDBC sink uses the setColumnNull method to properly reflect the null field value in the database row.",[40,87286,52473],{"id":52472},[48,87288,87289,87290,190],{},"To download Apache Pulsar 2.5.2, click ",[55,87291,267],{"href":53730,"rel":87292},[264],[48,87294,78604],{},[321,87296,87297,87301,87305,87310],{},[324,87298,87299],{},[55,87300,78612],{"href":78611},[324,87302,87303],{},[55,87304,78618],{"href":78617},[324,87306,78621,87307],{},[55,87308,36242],{"href":36242,"rel":87309},[264],[324,87311,78627,87312],{},[55,87313,57760],{"href":57760,"rel":87314},[264],[48,87316,78633,87317,190],{},[55,87318,75345],{"href":36230,"rel":87319},[264],{"title":18,"searchDepth":19,"depth":19,"links":87321},[87322,87323,87324,87325,87326,87327,87328,87329,87330,87331,87332,87333,87334,87335,87336,87337,87338,87339,87340,87341,87342,87343,87344,87345],{"id":87098,"depth":19,"text":87099},{"id":87105,"depth":19,"text":87106},{"id":87112,"depth":19,"text":87113},{"id":87119,"depth":19,"text":87120},{"id":87126,"depth":19,"text":87127},{"id":87133,"depth":19,"text":87134},{"id":87140,"depth":19,"text":87141},{"id":87147,"depth":19,"text":87148},{"id":87154,"depth":19,"text":87155},{"id":87161,"depth":19,"text":87162},{"id":87176,"depth":19,"text":87177},{"id":87183,"depth":19,"text":87184},{"id":87190,"depth":19,"text":87191},{"id":87197,"depth":19,"text":87198},{"id":87204,"depth":19,"text":87205},{"id":87211,"depth":19,"text":87212},{"id":87229,"depth":19,"text":87230},{"id":87236,"depth":19,"text":87237},{"id":87243,"depth":19,"text":87244},{"id":87259,"depth":19,"text":87260},{"id":87266,"depth":19,"text":87267},{"id":87273,"depth":19,"text":87274},{"id":87280,"depth":19,"text":87281},{"id":52472,"depth":19,"text":52473},"2020-05-20","Learn improvements and bug fixes in Apache Pulsar 2.5.2 
release.",{},"\u002Fblog\u002Fapache-pulsar-2-5-2",{"title":87074,"description":87347},"blog\u002Fapache-pulsar-2-5-2",[302,821],"ECdU6o1csaCbC16sGZBcYv6PGXWun6H6qDrTSK3SFH0",{"id":87355,"title":87356,"authors":87357,"body":87359,"category":821,"createdAt":290,"date":87819,"description":87820,"extension":8,"featured":294,"image":290,"isDraft":294,"link":290,"meta":87821,"navigation":7,"order":296,"path":87822,"readingTime":47804,"relatedResources":290,"seo":87823,"stem":87824,"tags":87825,"__hash__":87826},"blogs\u002Fblog\u002Fhow-to-build-distributed-database-apache-bookkeeper-part-3.md","How to Build a Distributed Database with Apache BookKeeper — Part 3",[87358],"Enrico Olivelli",{"type":15,"value":87360,"toc":87806},[87361,87368,87377,87381,87390,87409,87423,87426,87435,87439,87442,87451,87460,87463,87466,87469,87483,87487,87490,87493,87501,87504,87512,87515,87526,87529,87540,87549,87553,87556,87559,87562,87565,87568,87574,87577,87580,87584,87587,87622,87625,87628,87631,87635,87638,87641,87644,87661,87664,87667,87670,87678,87681,87684,87687,87690,87693,87696,87707,87710,87713,87717,87720,87723,87726,87729,87740,87743,87746,87749,87752,87754,87757,87760,87763,87769,87772,87776,87779,87782,87785,87788,87791,87794,87798,87801,87804],[48,87362,87363,87364,87367],{},"In this series of posts, I want to share some basic architectural concepts about the possible anatomy of a distributed database with a shared-nothing architecture. In the first part, we see how you can design a database as a replicated state machine. In the second part, we see how ",[55,87365,862],{"href":23555,"rel":87366},[264]," can help us by providing powerful mechanisms to build the write ahead log of our database.",[48,87369,87370,87371,87376],{},"Now we are going to dig into ",[55,87372,87375],{"href":87373,"rel":87374},"https:\u002F\u002Fherddb.org\u002F",[264],"HerdDB",", a distributed database that relies on BookKeeper on implementing its own journal and deals with all of the problems we discussed in the previous posts.",[40,87378,87380],{"id":87379},"why-herddb","Why HerdDB ?",[48,87382,87383,87384,87389],{},"We started HerdDB at ",[55,87385,87388],{"href":87386,"rel":87387},"https:\u002F\u002Femailsuccess.com\u002F",[264],"EmailSuccess.com",", a Java application that uses an SQL database to store status of email messages to deliver. 
EmailSuccess is an MTA (Mail Transfer Agent) that is able to handle thousands of queues with millions of messages, even with a single machine.",[48,87391,87392,87393,1154,87397,87402,87403,87408],{},"Now HerdDB is used in other applications, like ",[55,87394,87396],{"href":78358,"rel":87395},[264],"Apache Pulsar Manager",[55,87398,87401],{"href":87399,"rel":87400},"https:\u002F\u002Fgithub.com\u002Fdiennea\u002Fcarapaceproxy\u002Fwiki",[264],"CarapaceProxy",", and we are using it in ",[55,87404,87407],{"href":87405,"rel":87406},"https:\u002F\u002Fmagnews.com\u002F",[264],"MagNews.com"," platform.",[48,87410,87411,87412,87417,87418,190],{},"Initially EmailSuccess used ",[55,87413,87416],{"href":87414,"rel":87415},"https:\u002F\u002Fwww.mysql.com\u002F",[264],"MySQL"," but we need a database that runs inside the same process of the Java application for ease deployment and management, like ",[55,87419,87422],{"href":87420,"rel":87421},"https:\u002F\u002Fwww.sqlite.org\u002F",[264],"SQLLite",[48,87424,87425],{},"Also we needed a database that could span multiple machines and possibly leverage the intrinsic multi-tenant architecture of the system: thousands of independent queues on a few machines (usually from 1 to 10).",[48,87427,87428,87429,87434],{},"So here it is HerdDB ! A distributed embeddable database written in Java. Please refer to the ",[55,87430,87433],{"href":87431,"rel":87432},"https:\u002F\u002Fgithub.com\u002Fdiennea\u002Fherddb\u002Fwiki",[264],"wiki"," and the available documentation if you want to know more about this story.",[40,87436,87438],{"id":87437},"herddb-data-model","HerdDB Data Model",[48,87440,87441],{},"An HerdDB database is made of tablespaces. Each tablespace is a set of tables and it is independent from the other tablespaces. Each table is a key-value store that maps a key (byte array) to a value (byte array).",[48,87443,87444,87445,87450],{},"In order to fully replace MySQL, HerdDB comes with a built in and efficient SQL Layer and it is able to support most of the features you expect from a SQL database, but with some trade-offs. 
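Because that SQL layer is exposed over JDBC, an application can talk to HerdDB with nothing but the standard java.sql API. The following is a minimal sketch under stated assumptions — the connection URL, credentials, and column types are placeholders, and the herddb-jdbc driver is assumed to be on the classpath; check the HerdDB documentation for the exact values:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HerdDbJdbcSketch {
    public static void main(String[] args) throws Exception {
        // Placeholder URL and credentials; consult the HerdDB wiki for the
        // exact JDBC URL format of your deployment.
        try (Connection con = DriverManager.getConnection(
                "jdbc:herddb:server:localhost:7000", "sa", "hdb");
             Statement statement = con.createStatement()) {
            // Create a table, insert a row, and read it back through plain SQL.
            statement.executeUpdate("CREATE TABLE mytable (k string primary key, v string)");
            statement.executeUpdate("INSERT INTO mytable (k, v) VALUES ('key-1', 'value-1')");
            try (ResultSet rs = statement.executeQuery("SELECT v FROM mytable WHERE k = 'key-1'")) {
                while (rs.next()) {
                    System.out.println(rs.getString(1));
                }
            }
        }
    }
}
```
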
We are using ",[55,87446,87449],{"href":87447,"rel":87448},"https:\u002F\u002Fcalcite.apache.org\u002F",[264],"Apache Calcite"," for all of the SQL parsing and planning.",[48,87452,87453,87454,87459],{},"When you are using the JDBC Driver, you have only access to the SQL API, but there are other applications of HerdDB, for instance ",[55,87455,87458],{"href":87456,"rel":87457},"https:\u002F\u002Fgithub.com\u002Fdiennea\u002Fherddb\u002Ftree\u002Fmaster\u002Fherddb-collections",[264],"HerdDB Collections Framework"," uses directly the lower level model.",[48,87461,87462],{},"Each table has an SQL schema that describes columns, the primary key, constraints and indexes.",[48,87464,87465],{},"Within a tablespace, HerdDB supports joins between tables and transactions that span multiple rows and multiple tables.",[48,87467,87468],{},"For the rest of this post, we are going to talk about only the low level model:",[321,87470,87471,87474,87477,87480],{},[324,87472,87473],{},"A tablespace is made of tables.",[324,87475,87476],{},"A table is a dictionary that maps a binary key to a binary value.",[324,87478,87479],{},"Supported operations on a table are: INSERT, UPDATE, UPSERT, DELETE, GET and SCAN.",[324,87481,87482],{},"Transactions can touch multiple tables and rows within one single tablespace.",[32,87484,87486],{"id":87485},"data-and-metadata","Data and Metadata",[48,87488,87489],{},"We have several layers of data and metadata.",[48,87491,87492],{},"Cluster metadata are about the overall system:",[321,87494,87495,87498],{},[324,87496,87497],{},"Nodes metadata (discovery service)",[324,87499,87500],{},"Tablespace metadata (nodes assigned to it, replication settings)",[48,87502,87503],{},"Tablespace metadata:",[321,87505,87506,87509],{},[324,87507,87508],{},"Tables and indexes metadata",[324,87510,87511],{},"Checkpoints metadata",[48,87513,87514],{},"Tablespace data:",[321,87516,87517,87520,87523],{},[324,87518,87519],{},"Snapshots of uncommitted transactions",[324,87521,87522],{},"Data for temporary operations",[324,87524,87525],{},"Write ahead log",[48,87527,87528],{},"Table data:",[321,87530,87531,87534,87537],{},[324,87532,87533],{},"Records",[324,87535,87536],{},"Indexes",[324,87538,87539],{},"Checkpoint metadata",[48,87541,87542,87543,87548],{},"When HerdDB runs in cluster mode, we are storing cluster metadata on ",[55,87544,87547],{"href":87545,"rel":87546},"https:\u002F\u002Fzookeeper.apache.org\u002F",[264],"Apache ZooKeeper"," and tablespace metadata and data on local disks; the write ahead log is on Apache BookKeeper.",[32,87550,87552],{"id":87551},"discovery-service-and-network-architecture","Discovery service and network architecture",[48,87554,87555],{},"We have a set of machines that participate in the cluster, and we call them nodes. Each node has an ID that identifies it uniquely in the cluster.",[48,87557,87558],{},"For each node, we store on ZooKeeper all the information useful in order to locate the node, like the current network address and supported protocols (TLS availability). In this way, network addresses can easily change, in case you do not have fixed iPhone addresses or DNS names.",[48,87560,87561],{},"For each tablespace, we define a set of replica nodes. Each node stores a copy of the whole tablespace. 
This happens because we support queries and transactions that span over multiple records: you may access or modify many records in many tables in a single operation and this must be very efficient.",[48,87563,87564],{},"One of the nodes is elected as leader and other nodes are then named followers.",[48,87566,87567],{},"When a client issues an operation to a tablespace, it locates the leader node using tablespace metadata and the current network address using the discovery service. All of this information is on ZooKeeper and it is cached locally.",[48,87569,87570],{},[384,87571],{"alt":87572,"src":87573},"illustration of ","\u002Fimgs\u002Fblogs\u002F63a34f899ed3df1cbf974750_bk3-1.png",[48,87575,87576],{},"Server nodes do not talk to each other but all of the updates pass through BookKeeper.",[48,87578,87579],{},"Server-to-server communication is needed only in case of a follower node that bootstraps and it has not enough local data to recover from BookKeeper.",[40,87581,87583],{"id":87582},"the-write-path","The Write Path",[48,87585,87586],{},"A write operation follows this flow:",[321,87588,87589,87592,87595,87598,87601,87604,87607,87610,87613,87616,87619],{},[324,87590,87591],{},"Client locates the leader node for the tablespace (metadata are cached locally).",[324,87593,87594],{},"Client establishes a connection (connections are pooled).",[324,87596,87597],{},"Client sends the write request.",[324,87599,87600],{},"The node parses the SQL and plans the execution operation.",[324,87602,87603],{},"Operation is validated and prepared for execution (row level locks, tablespace checkpoint lock, constraint validation, computation of the new value…).",[324,87605,87606],{},"A log entry is enqueued for write to the log.",[324,87608,87609],{},"BookKeeper sends the entry to a quorum of bookies (write quorum size configuration parameter).",[324,87611,87612],{},"The configured number of bookies (ack quorum size configuration parameter) acknowledges the write, the BookKeeper client wakes up.",[324,87614,87615],{},"The effects of the operation are applied to the local in memory copy of the table.",[324,87617,87618],{},"Clean up operations are executed (release row level locks, tablespace checkpoint lock…).",[324,87620,87621],{},"The write is acknowledged to the client.",[48,87623,87624],{},"Most of these steps are asynchronous, and this allows better throughput.",[48,87626,87627],{},"Follower nodes are continuously tailing the log: they listen for new entries from BookKeeper and they apply the same changes to the local in memory copy of the table.",[48,87629,87630],{},"We are using long-poll read mode in order to save resources. Please check BookKeeper documentation for ReadHandle#readLastAddConfirmedAndEntry.",[40,87632,87634],{"id":87633},"switch-to-a-new-leader","Switch to a new leader",[48,87636,87637],{},"BookKeeper guarantees that followers will be eventually up to date with the latest version of the table but we have to implement ourselves all of the rest of the story.",[48,87639,87640],{},"We have multiple nodes. One node is the leader and the other ones are the followers. But how can we guarantee that only one node is the leader ? 
We must deal with network partitions.",[48,87642,87643],{},"For each tablespace, we store in ZooKeeper a structure (Tablespace metadata) that describes all of these metadata, in particular:",[321,87645,87646,87649,87652,87655,87658],{},[324,87647,87648],{},"Set of nodes that hold the data",[324,87650,87651],{},"Current leader node",[324,87653,87654],{},"Replication parameters:",[324,87656,87657],{},"Expected number of replicas",[324,87659,87660],{},"Maximum leader activity timeout",[48,87662,87663],{},"We are not digging into how leader election works in HerdDB. Let’s focus on the mechanism that guarantees consistency of the system.",[48,87665,87666],{},"The structure above is useful for clients and for management of the system, but we need another data structure that holds the current set of ledgers that make the log and this structure will also be another key of leadership enforcement.",[48,87668,87669],{},"We have the LedgersInfo structure:",[321,87671,87672,87675],{},[324,87673,87674],{},"The list of the ledger that build the log (activeledgers)",[324,87676,87677],{},"The ID of the first ledger in the history of the tablespace (firstledgerid)",[48,87679,87680],{},"The leader node keeps only one ledger open for writing and this is always the ledger in the tail of the lactiveledgers list.",[48,87682,87683],{},"BookKeeper guarantees that the leader is the only one that can write to the log, as ledgers can be written only once and from one client.",[48,87685,87686],{},"Each follower node uses LedgerInfo to look for data on BookKeeper.",[48,87688,87689],{},"When a new follower node starts, it checks the ID of the first ledger. If the first ledger is still on the list of active ledgers, then it can perform recovery just by reading the sequence of ledgers for the first to the latest.",[48,87691,87692],{},"If this ledger is no more in the list of active ledgers, it must locate the leader and download a full snapshot of data.",[48,87694,87695],{},"When a follower node is promoted to the leader role, it performs two steps:",[321,87697,87698,87701,87704],{},[324,87699,87700],{},"It opens all of the ledgers in the activeledgers list with the \"recovery\" flag, and this will in turn fence off the current leader.",[324,87702,87703],{},"It opens a new ledger for writing.",[324,87705,87706],{},"It adds it to the list of activeledgers.",[48,87708,87709],{},"All of the writes to the LedgersInfo are performed using ZooKeeper compare-and-set built in feature, and this guarantees that only one node is enforce its leadership.",[48,87711,87712],{},"In case of two concurrent new leaders that try to append their own ledger ID to the list, one of them will fail the write to ZooKeeper and it will fail the bootstrap.",[40,87714,87716],{"id":87715},"checkpoints","Checkpoints",[48,87718,87719],{},"HerdDB cannot keep all of the ledgers forever, because they will prevent the Bookies from reclaiming space, so we must delete them when it is possible.",[48,87721,87722],{},"Each node, leader or follower, performs periodically a checkpoint operation:",[48,87724,87725],{},"During a checkpoint, the server consolidates its own local copy of data and the current position on the log: from this point in time the portion of log up to this position is useless and it could be deleted.",[48,87727,87728],{},"But in cluster mode you cannot do it naively:",[321,87730,87731,87734,87737],{},[324,87732,87733],{},"You cannot delete only a part of a ledger, but only whole ledgers.",[324,87735,87736],{},"Followers are still tailing the 
logs, and the leader can’t delete precious data that has not already been consumer and checkpointed on every other node.",[324,87738,87739],{},"Follower are not allowed to alter the log (only the leader can touch the LedgerInfo structure).",[48,87741,87742],{},"Current HerdDB approach is to have a configuration parameter that defines a maximum time to live of a ledger. After that time, all of the old ledgers that are useless during the checkpoint of the leader node are simply dropped.",[48,87744,87745],{},"This approach works well if you have a small set of followers (usually two) and they are up and running, which is the very case of most of HerdDB installations currently. It is not expected that a follower node is down for more than the log time to live period.",[48,87747,87748],{},"By that way, if it happens, the booting follower can connect to the leader and then download a snapshot of a recent checkpoint.",[48,87750,87751],{},"Usually an HerdDB cluster of very few machines holds tens to hundreds of tablespaces and each node is leader for some tablespaces and follower for the other ones.",[40,87753,9144],{"id":53272},[48,87755,87756],{},"Every operation must be acknowledged by BookKeeper before applying it to the local memory and returning a response to the client. BookKeeper sends the write to every bookie in the quorum and waits. This can be very slow!",[48,87758,87759],{},"Within a transaction, the client expects that the results of the operations of the transaction will be atomically applied to the table if and only if the transaction is committed successfully.",[48,87761,87762],{},"There is no need to wait for bookies to acknowledge every write belonging to a transaction, you only have to wait and check the result of the write of the final commit operation, because BookKeeper guarantees that all of the writes before it are persisted durably and successfully.",[48,87764,87765],{},[384,87766],{"alt":87767,"src":87768},"illustration of two series of writes with and without transaction","\u002Fimgs\u002Fblogs\u002F63a34f895c19935913e1aafe_bk3-2.png",[48,87770,87771],{},"This is not simple as you could expect, for instance you must deal with the fact that a client could send a long running operation in the context of a transaction and issue a rollback command due to some application level timeout: the rollback must be written to the log after all of the other operations of the transaction, otherwise followers (and the leader node itself during self recovery) will see a weird sequence of events: \"begin transaction\", operations, \"rollback transaction\" and then other operations for a transaction that does not exist anymore.",[40,87773,87775],{"id":87774},"roll-new-ledgers","Roll new ledgers",[48,87777,87778],{},"If the leader does not come into trouble and it is never restarted, it can keep only one single ledger open and continue to write to it.",[48,87780,87781],{},"In practice, this is not a good idea, because that ledger will grow without bounds and with BookKeeper you cannot delete parts of one ledger but only full ledgers, so Bookies won’t be able to reclaim space.",[48,87783,87784],{},"In HerdDB, we are rolling a new ledger after a configured amount of bytes. By this way, you can reclaim space quickly in case of continuous writes to the log.",[48,87786,87787],{},"But BookKeeper guarantees apply only while dealing with a single ledger. 
It guarantees that each write is acknowledged to the writer if and only if every other entry with an ID less than the ID of the entry has been successfully written.",[48,87789,87790],{},"When you start a new ledger, you must wait and check the outcome of all of the writes issued to the previous ledger (or at least the last one).",[48,87792,87793],{},"You could also take a look to Apache DistributedLog, that is a higher level API to Apache BookKeeper and it solves many of the problems I have discussed in this post.",[40,87795,87797],{"id":87796},"wrap-up","Wrap up",[48,87799,87800],{},"We have seen a real application of BookKeeper, and how you can use it in order to implement the write-ahead-log of a distributed database. Apache BookKeeper and Apache ZooKeeper provide all of the tools you need to deal with consistency of data and metadata. Dealing with asynchronous operations can be tricky and you will have to deal with lots of corner cases. You also have to design your log and let it reclaim disk space without preventing the correct behaviour of follower nodes.",[48,87802,87803],{},"HerdDB is still a young project but it is running in production in mission critical applications, the community and the product grow as much as new users propose their use cases.",[48,87805,3931],{},{"title":18,"searchDepth":19,"depth":19,"links":87807},[87808,87809,87813,87814,87815,87816,87817,87818],{"id":87379,"depth":19,"text":87380},{"id":87437,"depth":19,"text":87438,"children":87810},[87811,87812],{"id":87485,"depth":279,"text":87486},{"id":87551,"depth":279,"text":87552},{"id":87582,"depth":19,"text":87583},{"id":87633,"depth":19,"text":87634},{"id":87715,"depth":19,"text":87716},{"id":53272,"depth":19,"text":9144},{"id":87774,"depth":19,"text":87775},{"id":87796,"depth":19,"text":87797},"2020-05-12","In this part we are going to dig into HerdDB, a distributed database that relies on BookKeeper on implementing its own journal and deals with all of the problems we discussed in the previous posts.",{},"\u002Fblog\u002Fhow-to-build-distributed-database-apache-bookkeeper-part-3",{"title":87356,"description":87820},"blog\u002Fhow-to-build-distributed-database-apache-bookkeeper-part-3",[38442,12106],"b1FvKJBu8QmraqmaT-OEzLH7eQU8NIiHwYThqAQg21c",{"id":87828,"title":76981,"authors":87829,"body":87830,"category":821,"createdAt":290,"date":88042,"description":88043,"extension":8,"featured":294,"image":290,"isDraft":294,"link":290,"meta":88044,"navigation":7,"order":296,"path":88045,"readingTime":38438,"relatedResources":290,"seo":88046,"stem":88047,"tags":88048,"__hash__":88049},"blogs\u002Fblog\u002Fhow-apache-pulsar-helps-streamline-message-system-reduces-o-m-costs-at-tuya-smart.md",[48575],{"type":15,"value":87831,"toc":88030},[87832,87835,87838,87841,87845,87848,87851,87854,87860,87864,87867,87873,87875,87878,87892,87894,87897,87901,87904,87945,87949,87952,87958,87961,87964,87967,87973,87976,87979,87981,87984,87990,87993,88001,88005,88008,88011,88014,88022,88024,88027],[48,87833,87834],{},"Tuya Smart had outgrown its existing message system which was based on Kafka. Rapidly increasing numbers of users, topics, and messages had led to mounting costs associated with storage, processing, labor, and time.",[48,87836,87837],{},"To improve system performance, Tuya required a more flexible message delivery system that could meet their need for persistence. 
They also needed a solution that would make it easy to classify messages by user.",[48,87839,87840],{},"After evaluating several different options, Tuya settled on Apache Pulsar because it proved to be the most adept at handling the accumulation of messages and repeated consumption. The addition of Pulsar has made Tuya’s message system much more efficient, resulting in lower operational and maintenance costs.",[40,87842,87844],{"id":87843},"introduction-to-tuya-smart","Introduction to Tuya Smart",[48,87846,87847],{},"Tuya Smart is a global, intelligent platform; an “AI+IoT” developer platform; and, the world’s leading voice AI interaction platform. Intelligently connected with the needs of consumers, production brands, OEMs, and retail chains, Tuya provides customers with a one-stop, artificial intelligence IoT solution.",[48,87849,87850],{},"Tuya offers hardware intervention, cloud services, and application software development to form a closed ecosystem of artificial intelligence plus manufacturing services. This closed ecosystem provides business-side technology and business model upgrade services for IoT smart devices for consumers, thereby meeting customers’ higher demands for hardware products.",[48,87852,87853],{},"Figure 1 shows Tuya’s current ecological model, including Tuya Cloud, Tuya OS, and Tuya APP, which together form a closed ecological loop. The IoT ecosystem on the right illustrates some of Tuya’s application scenarios, such as smart hotels, smart security, smart homes, and so on.",[48,87855,87856],{},[384,87857],{"alt":87858,"src":87859},"The Current Ecological Model of Tuya Smart","\u002Fimgs\u002Fblogs\u002F63a34d3c61e2f4e2ee3fbcfe_tuya-case.png",[40,87861,87863],{"id":87862},"tuya-smarts-message-architecture-before-pulsar","Tuya Smart’s Message Architecture Before Pulsar",[48,87865,87866],{},"Figure 2 illustrates Tuya Smart’s message architecture before the addition of Pulsar. The upper layer includes a suite of devices that are independent of IoT, such as power switches, projectors, and so on. Messages from these devices were being reported to the message system through the MQTT gateway. In addition, other IoT devices, such as sensors, were transmitting messages through the MQTT gateway before reporting them to the message system.",[48,87868,87869],{},[384,87870],{"alt":87871,"src":87872},"the Links in Tuya's Previous Message System","\u002Fimgs\u002Fblogs\u002F63a34d3cd1dcd6e008e3e354_3.png",[40,87874,19190],{"id":19189},[48,87876,87877],{},"Tuya’s previous architectural pattern had been causing the company the following pain points:",[1666,87879,87880,87883,87886,87889],{},[324,87881,87882],{},"The most glaring issue was that the HTTP delivery method was not flexible. If a user wanted to re-consume messages after the service was restarted, additional processing was required to meet the demand for message persistence. Specifically, any messages that users did not receive needed to be saved in the database.",[324,87884,87885],{},"The persistence problem could have been solved under Tuya’s existing Kafka subscription model; however, the company had additional architectural challenges that called for a different solution.",[324,87887,87888],{},"In Kafka’s delivery mode, every user is associated with a unique topic. Therefore, the number of topics increases as the number of users increases. As a result of a sharp rise in the numbers of users, topics, and messages, operation and maintenance had become costly and stressful over time. 
The cost of labor and time had gradually gone up as well.",[324,87890,87891],{},"Tenants were interacting with each other because messages were classified by category. The interrelationship between the tenants was greatly affected by the message distribution through Kafka.",[40,87893,79151],{"id":50969},[48,87895,87896],{},"Tuya ultimately chose Pulsar for two main reasons. First, Pulsar has unique features, such as multi-tenancy, which offer the company distinct advantages. And second, Pulsar performed better than its competitors during testing. In this section, we’ll examine which Pulsar features were the most attractive to Tuya and review the results of the performance tests.",[32,87898,87900],{"id":87899},"pulsars-advantages","Pulsar’s Advantages",[48,87902,87903],{},"The following features played a key role in Tuya’s decision to adopt Pulsar:",[321,87905,87906,87909,87912,87915,87918,87921,87924,87927,87930,87933,87936,87939,87942],{},[324,87907,87908],{},"Rich Delivery\u002FSubscription Strategy",[324,87910,87911],{},"Pulsar unifies the queue model and the stream model. A copy of the data does not need to be stored at the topic level because a piece of data can be consumed multiple times. Flexibility can be improved significantly by calculating different subscription models in streaming, queuing, and so on.",[324,87913,87914],{},"Ease of Operation and Maintenance (Compared to Kafka); Inclined to Automation",[324,87916,87917],{},"Apache Pulsar is a flexible publish-subscribe message system with a multi-layered and segmented architecture. Its main advantage is in geo-replication. With Pulsar’s cloud-native architecture that separates computing from storage, data is moved away from the broker and into shared storage. The upper layer is a stateless broker that replicates message distribution and services. The lower layer is a persistent storage layer called the bookie cluster.",[324,87919,87920],{},"With its segmented storage architecture, Pulsar allows data to expand independently and recover quickly without being constrained by scaling.",[324,87922,87923],{},"Multi-Tenancy Isolation",[324,87925,87926],{},"Multi-tenancy is the ability of a single instance of software to serve multiple tenants. A tenant is a group of users that shares a common access with specific privileges to the software instance. Tenants and namespace are two core Pulsar resources that support multi-tenancy as follows:",[324,87928,87929],{},"At the tenant level, Pulsar reserves appropriate storage space, application authorization, and authentication mechanisms for specific tenants.",[324,87931,87932],{},"At the namespace level, Pulsar has a series of configuration policies, including storage quotas, flow control, message expiration policies, and isolation policies between namespaces.",[324,87934,87935],{},"Messages can be classified either by category or by tenant (user). When Tuya started to classify messages by tenant instead of category, the interaction between tenants was resolved automatically.",[324,87937,87938],{},"Although Pulsar is not the only platform that can solve this problem, the desired outcome is difficult to achieve in Kafka because Kafka is a single-tenant system. 
Pulsar’s multi-tenancy feature serves Tuya’s real scenarios much better.",[324,87940,87941],{},"Excellent Online Community",[324,87943,87944],{},"The Pulsar community is very active and responsive to both technical and documentation issues.",[32,87946,87948],{"id":87947},"pulsar-performed-better-than-its-competitors","Pulsar Performed Better Than Its Competitors",[48,87950,87951],{},"Tuya certainly did their due diligence in comparing multiple message queues from the perspectives of performance, scaling, operation, and maintenance. A summary of their findings is shown in Table 1.",[48,87953,87954],{},[384,87955],{"alt":87956,"src":87957},"Performance Comparison: Pulsar vs. Its Competitors","\u002Fimgs\u002Fblogs\u002F63a34d7d9ed3dfef9a956bdf_Performance-Comparison-Pulsar-vs.-Its-Competitors.webp",[48,87959,87960],{},"LeviMQ is a MQTT-protocol-based message queue developed by Tuya.\nNSQ is a popular open-source message middleware product in Go.",[48,87962,87963],{},"Firstly, Kafka had shortcomings in scaling, especially in scaling down. Secondly, from the perspective of operation and maintenance, Kafka, as mentioned above, is more costly in terms of labor and time. Finally, with regard to ecology, LeviMQ was developed by Tuya, but it is not an open-source solution. Therefore, LeviMQ has inherent ecological limitations to some extent. NSQ is, as well, an excellent message queue—open-sourced, with changes based on the advantages of Kafka. However, the documentation for Pulsar is more complete.",[48,87965,87966],{},"After comparing various aspects of message queue performance, the advantages and disadvantages of each platform were analyzed. The results are shown in Figure 4.",[48,87968,87969],{},[384,87970],{"alt":87971,"src":87972},"Image of he Advantages and Disadvantages of LeviMQ, NSQ, Pulsar, and Kafka","\u002Fimgs\u002Fblogs\u002F63a34da21177565cd67f9a95_5.png",[48,87974,87975],{},"As Figure 4 illustrates, Pulsar is better at scaling and application scenarios. It is also more flexible than Kafka.",[48,87977,87978],{},"And, although the documentation for Pulsar is less complete than Kafka’s, the Pulsar community has been working hard to fill in the gaps, and they are making good progress.",[32,87980,36878],{"id":36877},[48,87982,87983],{},"With the addition of Apache Pulsar, Tuya’s message system architecture changed as shown in Figure 5.",[48,87985,87986],{},[384,87987],{"alt":87988,"src":87989},"Tuya's New Message System Architecture After the Addition of Apache Pulsar","\u002Fimgs\u002Fblogs\u002F63a34da2de2013577457618f_6.png",[48,87991,87992],{},"The most notable changes in the architecture were as follows:",[321,87994,87995,87998],{},[324,87996,87997],{},"A Pulsar layer was added between Kafka and message distribution. The most obvious improvement in this new architecture is the resolution of tenant isolation. Tuya now creates a new tenant for each user when Pulsar is deployed.",[324,87999,88000],{},"The software development kit (SDK) that Tuya’s customers use to subscribe to messages now supports Pulsar.",[40,88002,88004],{"id":88003},"current-and-future-plans-for-pulsar","Current and Future Plans for Pulsar",[48,88006,88007],{},"Tuya successfully applied Apache Pulsar to various application layers and the system performed well overall. 
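Circling back to the tenant-per-user change above, here is a hedged sketch of what that provisioning could look like with the Pulsar admin client. The endpoint, tenant name, and cluster are made up, and it assumes a recent client where `TenantInfo` exposes a builder.

```java
import java.util.Collections;

import org.apache.pulsar.client.admin.PulsarAdmin;
import org.apache.pulsar.common.policies.data.TenantInfo;

public class TenantPerUserSketch {
    public static void main(String[] args) throws Exception {
        try (PulsarAdmin admin = PulsarAdmin.builder()
                .serviceHttpUrl("http://localhost:8080") // assumed admin endpoint
                .build()) {

            String tenant = "user-42"; // hypothetical per-user tenant name

            // One tenant per user keeps that user's streams securely isolated.
            admin.tenants().createTenant(tenant, TenantInfo.builder()
                    .adminRoles(Collections.singleton(tenant + "-admin"))
                    .allowedClusters(Collections.singleton("standalone"))
                    .build());

            // A namespace under the tenant carries quotas, retention, and isolation policies.
            admin.namespaces().createNamespace(tenant + "/messages");
        }
    }
}
```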
They have been very satisfied with the results and are now working on implementing their short- and long-term plans.",[48,88009,88010],{},"Currently, the company is in the process of applying a set of rule engines to Pulsar to meet the growing demand for message subscriptions.",[48,88012,88013],{},"For the future, Tuya anticipates more extended business support functions that provide richer usage scenarios. Specifically,",[321,88015,88016,88019],{},[324,88017,88018],{},"At the technical level, Tuya awaits more O&M (operation and maintenance) APIs, such as the ability to view the broker and bookie associated with a specific topic.",[324,88020,88021],{},"As for documentation, Tuya would like to see more official Pulsar design documents to aid in their understanding.",[40,88023,2125],{"id":2122},[48,88025,88026],{},"With the advent of 5G, the IoT industry is facing myriad challenges and opportunities. As a global intelligent platform, Tuya Smart not only links vendors of various sales platforms, but also connects users in countless ways. Driven by the theme “Intelligence for All Things”, Tuya had a pressing need for a message system with high performance and stability.",[48,88028,88029],{},"After comparing a variety of different message systems such as Kafka and LeviMQ, Tuya finally chose Apache Pulsar. With its excellent performance and features such as geo-replication and multi-tenancy isolation, Pulsar solved many of the pain points in Tuya’s previous message system, such as inflexible delivery, increased operation costs due to a rapidly growing number of topics, interaction between tenants, and so on. This implementation has proven that Apache Pulsar has a promising future as an application for the IoT industry.",{"title":18,"searchDepth":19,"depth":19,"links":88031},[88032,88033,88034,88035,88040,88041],{"id":87843,"depth":19,"text":87844},{"id":87862,"depth":19,"text":87863},{"id":19189,"depth":19,"text":19190},{"id":50969,"depth":19,"text":79151,"children":88036},[88037,88038,88039],{"id":87899,"depth":279,"text":87900},{"id":87947,"depth":279,"text":87948},{"id":36877,"depth":279,"text":36878},{"id":88003,"depth":19,"text":88004},{"id":2122,"depth":19,"text":2125},"2020-05-08","Tuya chose Apache Pulsar because of its excellent performance and features such as geo-replication and multi-tenancy isolation; Pulsar solved many of the pain points in Tuya’s message system.",{},"\u002Fblog\u002Fhow-apache-pulsar-helps-streamline-message-system-reduces-o-m-costs-at-tuya-smart",{"title":76981,"description":88043},"blog\u002Fhow-apache-pulsar-helps-streamline-message-system-reduces-o-m-costs-at-tuya-smart",[35559,799,821,5954],"f1ouYo5geCybqeS7XdCZzEibBMe2ajIbWNgcvUeI_uo",{"id":88051,"title":88052,"authors":88053,"body":88054,"category":821,"createdAt":290,"date":88317,"description":88318,"extension":8,"featured":294,"image":290,"isDraft":294,"link":290,"meta":88319,"navigation":7,"order":296,"path":88320,"readingTime":11508,"relatedResources":290,"seo":88321,"stem":88322,"tags":88323,"__hash__":88324},"blogs\u002Fblog\u002Fapache-pulsar-2-5-1.md","Apache Pulsar 
2.5.1",[73997],{"type":15,"value":88055,"toc":88297},[88056,88062,88065,88077,88080,88084,88087,88090,88094,88097,88103,88108,88112,88115,88119,88122,88125,88131,88134,88140,88144,88147,88151,88154,88158,88161,88169,88173,88176,88180,88183,88189,88193,88196,88199,88205,88209,88212,88216,88219,88223,88226,88230,88233,88237,88240,88244,88247,88251,88254,88262,88264,88270,88272,88292],[48,88057,88058],{},[384,88059],{"alt":88060,"src":88061},"image of apache pulsar release 2.5.1","\u002Fimgs\u002Fblogs\u002F63a342e4ba8ed2b6eb859cee_pulsar-251-image.png",[48,88063,88064],{},"We are very glad to see the Apache Pulsar community has successfully released 2.5.1 version. This is the result of a huge effort from the community, with over 130 commits and a long list of new features, general improvements and bug fixes.",[48,88066,88067,88068,32795,88072,190],{},"For detailed changes related to 2.5.1 release, refer to the ",[55,88069,23976],{"href":88070,"rel":88071},"https:\u002F\u002Fpulsar.apache.org\u002Frelease-notes\u002F#2.5.1",[264],[55,88073,88076],{"href":88074,"rel":88075},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpulls?q=is%3Apr+label%3Arelease%2F2.5.1+is%3Aclosed(%E8%BF%99%E4%B8%AA%E5%BA%94%E8%AF%A5%E6%98%AF%E6%9C%80%E6%98%8E%E7%BB%86%E7%9A%84)",[264],"PR list for Pulsar 2.5.1",[48,88078,88079],{},"The following highlights a tiny subset of new features.",[40,88081,88083],{"id":88082},"refresh-authentication-credentials","Refresh authentication credentials",[48,88085,88086],{},"In Pulsar 2.5.1, two more methods are introduced in the single AuthenticationState interface credentials holder. This helps enhance the Pulsar authentication framework to support credentials that expire over time and need to be refreshed by forcing clients to re-authenticate.",[48,88088,88089],{},"Existing authentication plugins are unaffected. If a new plugin wants to support expiration, it just overrides the isExpired() method. The Pulsar broker ensures to periodically check the expiration status for the AuthenticationState of every ServerCnx object. You can also use the authenticationRefreshCheckSeconds setting to control the frequency of the expiration check.",[40,88091,88093],{"id":88092},"upgrade-avro-to-191","Upgrade Avro to 1.9.1",[48,88095,88096],{},"The library used to handle logical datetime values has been changed from Joda-Time to JSR-310. For keeping forward compatibility, Pulsar java client uses Joda-Time conversion for logical datetime. To use JSR-310 conversion, you can enable it in the schema definition.",[8325,88098,88101],{"className":88099,"code":88100,"language":8330},[8328],"\nAvroSchema.of(SchemaDefinition.builder()\n.withJSR310ConversionEnabled(true)\n.build()\n\n",[4926,88102,88100],{"__ignoreMap":18},[916,88104,88105],{},[48,88106,88107],{},"By default, Avro 1.9.1 enables the JSR310 datetimes, which might introduce some regression problems if users use source codes generated by Avro compiler 1.8.x and the source codes contain datetimes fields. It is recommended to use Avro 1.9.x compiler to recompile. And, Avro may remove the Joda time support in the future. This may also be deleted in Pulsar in the future.",[40,88109,88111],{"id":88110},"support-unloading-all-partitions-of-a-partitioned-topic","Support unloading all partitions of a partitioned topic",[48,88113,88114],{},"Before Pulsar 2.5.1, Pulsar supports unloading a non-partitioned topic or a partition of a partitioned topic. 
If there is a partitioned topic with too many partitions, users need to get all partitions and unload them one by one. In Pulsar 2.5.1, we support unloading all partitions of a partitioned topic.",[40,88116,88118],{"id":88117},"supports-evenly-distributing-topics-count-when-splitting-bundle","Supports evenly distributing topics count when splitting bundle",[48,88120,88121],{},"In Pulsar 2.5.1, we introduce an option(-balance-topic-count) for bundle split. When setting this option to true, the given bundle is split into two parts and each part has the same amount of topics. In addition, we bring in a new Load Manager implementation named org.apache.pulsar.broker.loadbalance.impl.BalanceTopicCountModularLoadManager. The new Load Manager implementation splits the bundle with balance topics count.",[48,88123,88124],{},"You can enable this feature in the broker.conf:",[8325,88126,88129],{"className":88127,"code":88128,"language":8330},[8328],"\ndefaultNamespaceBundleSplitAlgorithm=topic_count_equally_divide\n\n",[4926,88130,88128],{"__ignoreMap":18},[48,88132,88133],{},"If you use the Pulsar Admin to split a bundle, you can use following command to split bundle based on topics count:",[8325,88135,88138],{"className":88136,"code":88137,"language":8330},[8328],"\nbin\u002Fpulsar-admin namespaces split-bundle -b 0x00000000_0xffffffff --split-algorithm-name topic_count_equally_divide public\u002Fdefault\n\n",[4926,88139,88137],{"__ignoreMap":18},[40,88141,88143],{"id":88142},"support-keyvalue-schema-for-pulsar-sql","Support KeyValue schema for Pulsar SQL",[48,88145,88146],{},"Before Pulsar 2.5.1, Pulsar SQL cannot read the keyValue schema data. In Pulsar 2.5.1, we add the prefix key. for the key field name, add the prefix value. for the value field name. Therefore, Pulsar SQL can read the keyValue schema data.",[40,88148,88150],{"id":88149},"update-netty-version-to-4145final","Update Netty version to 4.1.45.Final",[48,88152,88153],{},"Netty 4.1.43 has a bug, which prevents it from using Linux native Epoll transport. This makes Pulsar brokers fail over to NioEventLoopGroup even when running on Linux. The bug is fixed in Netty 4.1.45.Final .",[40,88155,88157],{"id":88156},"improve-key_shared-subscription-message-dispatching-performance","Improve Key_Shared subscription message dispatching performance",[48,88159,88160],{},"In Pulsar 2.5.1, to improve Key_Shared subscription message dispatching performance, we make the following operations for saving CPU usage which can improve non-batched message dispatch performance:",[321,88162,88163,88166],{},[324,88164,88165],{},"Reduce making hash for the message key.",[324,88167,88168],{},"Reduce the number of finding consumers for message keys..",[40,88170,88172],{"id":88171},"add-joda-time-logical-type-conversion","Add Joda time logical type conversion",[48,88174,88175],{},"In Pulsar 2.5.1, Avro is upgraded to 1.9.x and the default time conversion is changed to JSR-310. For forwarding compatibility, we add the Joda time conversion in Pulsar 2.5.1 and enable it by default",[40,88177,88179],{"id":88178},"support-deleting-inactive-topic-when-subscriptions-caught-up","Support deleting inactive topic when subscriptions caught up",[48,88181,88182],{},"Before Pulsar 2.5.1, Pulsar supported deleting inactive topics that have no active producers or subscriptions. In Pulsar 2.5.1, we expose inactive topic delete mode in broker.conf to delete inactive topics that have no active producers or consumers but all subscriptions of the topic are caught up. 
You can enable this feature in the broker.conf:",[8325,88184,88187],{"className":88185,"code":88186,"language":8330},[8328],"\nbrokerDeleteInactiveTopicsMode=delete_when_subscriptions_caught_up\n\n",[4926,88188,88186],{"__ignoreMap":18},[40,88190,88192],{"id":88191},"introduce-maxmessagepublishbuffersizeinmb-configuration-to-avoid-broker-oom","Introduce maxMessagePublishBufferSizeInMB configuration to avoid broker OOM",[48,88194,88195],{},"Before Pulsar 2.5.1, if a broker has a smaller direct memory (e.g. 2G) and runs pulsar-perf to write messages, the broker becomes unstable. Because the broker reads messages from the channel automatically and the ByteBuf cannot be released until the entry is written to Bookie successfully or the timeout expires.",[48,88197,88198],{},"In Pulsar 2.5.1, we introduce the maxMessagePublishBufferSizeInMB configuration to avoid broker OOM (Out of Memory). If the processing message size exceeds this value, the broker stops reading data from the connection. When the available size is greater than half of the maxMessagePublishBufferSizeInMB, the broker starts automatically reading data from the connection. You can set up the publish buffer size in broker.conf:",[8325,88200,88203],{"className":88201,"code":88202,"language":8330},[8328],"\n# Max memory size for broker handling messages sending from producers.\n# If the processing message size exceed this value, broker will stop read data\n# from the connection. The processing messages means messages are sends to broker\n# but broker have not send response to client, usually waiting to write to bookies.\n# It's shared across all the topics running in the same broker.\n# Use -1 to disable the memory limitation. Default is 1\u002F2 of direct memory.\nmaxMessagePublishBufferSizeInMB=\n\n",[4926,88204,88202],{"__ignoreMap":18},[40,88206,88208],{"id":88207},"support-bouncycastle-fips-provider","Support BouncyCastle FIPS provider",[48,88210,88211],{},"In Pulsar 2.5.1, Pulsar supports BC-FIPS (BouncyCastle FIPS) provider. Before Pulsar 2.5.1, Pulsar only supported BouncyCastle (BC) provider, and BC JARs are tied strongly into both the broker and the client code. Users fail to change from the BC provider to the BC-FIPS provider. This feature splits the BC dependency out into a separate module. Therefore, users can freely switch between the BC provider and the BC-FIPS provider.",[40,88213,88215],{"id":88214},"allow-tenant-admin-to-manage-subscription-permission","Allow tenant Admin to manage subscription permission",[48,88217,88218],{},"In previous releases, we have added support to grant subscriber-permission to manage subscription based APIs. However, grant-subscription-permission API requires super-user access and it creates too much dependency on system-admin when many tenants want to grant subscription permission. In Pulsar 2.5.1, through the Restful API or the Pulsar Admin, we allow each tenant Admin to manage subscription permission in order to reduce administrative efforts for super users.",[40,88220,88222],{"id":88221},"allow-to-enabledisable-delayed-delivery-for-messages-on-namespace","Allow to enable\u002Fdisable delayed delivery for messages on namespace",[48,88224,88225],{},"In Pulsar 2.5.1, we add the set-delayed-delivery and set-delayed-delivery-time policies for the namespace. 
Therefore, Pulsar 2.5.1 allows to enable or disable delayed delayed delivery for messages on namespace.",[40,88227,88229],{"id":88228},"support-offloader-at-namespace-level","Support offloader at namespace level",[48,88231,88232],{},"In previous releases, the offload operation only had the cluster-level configuration. Users cannot set the offload configuration at the namespace level. In Pulsar 2.5.1, we support using the Pulsar Admin to set the offloader at the namespace level.",[40,88234,88236],{"id":88235},"disallow-sub-auto-creation-by-admin-when-disabling-topic-auto-creation","Disallow sub auto creation by Admin when disabling topic auto creation",[48,88238,88239],{},"In previous releases, when Auto topic creation is disabled in KoP, non-partitioned topics are created with Flink Pulsar Source. To fix this bug, in Pulsar 2.5.1, we change the admin code to disable sub auto creation by the Admin when Auto topic creation is disabled.",[40,88241,88243],{"id":88242},"support-python-38-for-pulsar-client","Support Python 3.8 for Pulsar client",[48,88245,88246],{},"In pulsar 2.5.1, we add 3.8 cp38-cp38 to support Python 3.8 for the Pulsar client. Therefore, users can install the Pulsar client on Python 3.8 .",[40,88248,88250],{"id":88249},"provide-another-libpulsarwithdepsa-in-debianrpm-cpp-client-library","Provide another libpulsarwithdeps.a in Debian\u002FRPM cpp client library",[48,88252,88253],{},"Pulsar 2.5.1 mainly provides 2 additional pulsar c++ client libraries in Debian\u002FRPM:",[321,88255,88256,88259],{},[324,88257,88258],{},"pulsarSharedNossl (libpulsarnossl.so): it is similar to pulsarShared(libpulsar.so), and has no SSL statically linked.",[324,88260,88261],{},"pulsarStaticWithDeps(libpulsarwithdeps.a): it is similar to pulsarStatic(libpulsar.a), and is archived in the dependencies libraries of libboost_regex, libboost_system, libcurl, libprotobuf, libzstd and libz statically.",[40,88263,52473],{"id":52472},[48,88265,88266,88267,190],{},"To download Apache Pulsar 2.5.1, click ",[55,88268,267],{"href":53730,"rel":88269},[264],[48,88271,78604],{},[321,88273,88274,88278,88282,88287],{},[324,88275,88276],{},[55,88277,78612],{"href":78611},[324,88279,88280],{},[55,88281,78618],{"href":78617},[324,88283,78621,88284],{},[55,88285,36242],{"href":36242,"rel":88286},[264],[324,88288,78627,88289],{},[55,88290,57760],{"href":57760,"rel":88291},[264],[48,88293,78633,88294,190],{},[55,88295,75345],{"href":36230,"rel":88296},[264],{"title":18,"searchDepth":19,"depth":19,"links":88298},[88299,88300,88301,88302,88303,88304,88305,88306,88307,88308,88309,88310,88311,88312,88313,88314,88315,88316],{"id":88082,"depth":19,"text":88083},{"id":88092,"depth":19,"text":88093},{"id":88110,"depth":19,"text":88111},{"id":88117,"depth":19,"text":88118},{"id":88142,"depth":19,"text":88143},{"id":88149,"depth":19,"text":88150},{"id":88156,"depth":19,"text":88157},{"id":88171,"depth":19,"text":88172},{"id":88178,"depth":19,"text":88179},{"id":88191,"depth":19,"text":88192},{"id":88207,"depth":19,"text":88208},{"id":88214,"depth":19,"text":88215},{"id":88221,"depth":19,"text":88222},{"id":88228,"depth":19,"text":88229},{"id":88235,"depth":19,"text":88236},{"id":88242,"depth":19,"text":88243},{"id":88249,"depth":19,"text":88250},{"id":52472,"depth":19,"text":52473},"2020-04-23","Learn improvements and bug fixes in Apache Pulsar 2.5.1 
release",{},"\u002Fblog\u002Fapache-pulsar-2-5-1",{"title":88052,"description":88318},"blog\u002Fapache-pulsar-2-5-1",[302,821],"okeQIu7FP_Zr46eU33STIgiv2z-2JtuFexAKZpGTUKY",{"id":88326,"title":76993,"authors":88327,"body":88329,"category":821,"createdAt":290,"date":88643,"description":88644,"extension":8,"featured":294,"image":290,"isDraft":294,"link":290,"meta":88645,"navigation":7,"order":296,"path":88646,"readingTime":33204,"relatedResources":290,"seo":88647,"stem":88648,"tags":88649,"__hash__":88650},"blogs\u002Fblog\u002Fmoved-from-apache-kafka-to-apache-pulsar.md",[88328],"Simba Khadder",{"type":15,"value":88330,"toc":88610},[88331,88336,88339,88342,88346,88350,88356,88359,88362,88364,88367,88381,88385,88391,88394,88398,88403,88406,88409,88413,88418,88421,88425,88430,88433,88437,88442,88445,88449,88452,88455,88457,88460,88464,88470,88473,88476,88479,88482,88486,88492,88495,88498,88501,88505,88511,88514,88517,88520,88524,88530,88533,88537,88542,88545,88548,88551,88554,88557,88560,88564,88566,88569,88572,88576,88583,88586,88590,88593,88595,88597,88600,88603,88607],[48,88332,88333],{},[384,88334],{"alt":78983,"src":88335},"\u002Fimgs\u002Fblogs\u002F63a3281bf1bef7e7cf3c86c1_pulsar-based.png",[48,88337,88338],{},"Apache Kafka and event streaming are practically synonymous today. Event streaming is a core part of our platform, and we recently swapped Kafka out for Pulsar. We’ve spoken about it in-person with our clients and at conferences. Recently, a friend in the Apache Pulsar community recommended that I write a post to share our experience and our reasons for switching.",[48,88340,88341],{},"We built our platform on Kafka and found ourselves writing tons of code to make the system behave as we wanted. We decided that Kafka was not the right tool for the job. Obviously, this won’t be true for many, many use cases and, even when it is, it may make sense to use it anyway instead of Pulsar. Through the rest of the post, I’ll describe the solution we built with Kafka, and why we decided to move to Pulsar.",[40,88343,88345],{"id":88344},"our-problem-statement","Our Problem Statement",[32,88347,88349],{"id":88348},"what-is-streamsql","What is StreamSQL?",[48,88351,88352],{},[384,88353],{"alt":88354,"src":88355},"StreamSQL-arch","\u002Fimgs\u002Fblogs\u002F63a3281b3e90e4edb32ba4d1_streamsql-arch.png",[48,88357,88358],{},"StreamSQL is a data storage system built around event-sourcing. Three components make up StreamSQL: Event Storage, Transformations, and Materialized State. The event storage is an immutable ledger of every domain event sent to our system. We serve the materialized state with similar APIs to Cassandra, Redis, and CockroachDB. Transformations are pure functions that map the events into the state. Every event that we receive is processed and applied to the materialized state, according to the transformation.",[48,88360,88361],{},"StreamSQL runs new transformations retroactively across all data. The end state is a true materialized of the entire event stream. Furthermore, you can generate a \"virtual\" state by rollbacking back and replaying events. 
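As a toy illustration of that model (none of these types are StreamSQL's API), a materialized view is just a fold of the immutable event ledger through a pure transformation, and replaying only a prefix of the ledger yields the "virtual" state at that point:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MaterializeSketch {

    // Hypothetical domain event; every type in this sketch is illustrative only.
    record Event(long id, String userId, String action) {}

    // A pure transformation: events in, state out. Here: count actions per user.
    static Map<String, Long> materialize(List<Event> ledger, long upToEventId) {
        Map<String, Long> state = new HashMap<>();
        for (Event e : ledger) {
            if (e.id() > upToEventId) {
                break; // replaying only a prefix gives the "virtual" state at that point
            }
            state.merge(e.userId(), 1L, Long::sum);
        }
        return state;
    }
}
```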
The virtual state can be used to train and validate machine learning models, and for debugging purposes (like Redux for frontend development).",[32,88363,50937],{"id":50936},[48,88365,88366],{},"The system needs to be able to do the following:",[321,88368,88369,88372,88375,88378],{},[324,88370,88371],{},"Store every domain event in a system forever.",[324,88373,88374],{},"Keep materialized state consistent by guaranteeing exactly-once processing of each incoming event.",[324,88376,88377],{},"Be able to run transformations on all historical events in the same order that we received them.",[324,88379,88380],{},"Rollback and replay the event ledger and materialize the views at that point.",[40,88382,88384],{"id":88383},"the-original-kafka-based-solution","The Original Kafka-Based Solution",[48,88386,88387],{},[384,88388],{"alt":88389,"src":88390},"Kafka-arch","\u002Fimgs\u002Fblogs\u002F63a3281bf489b791590bea01_kafka-arch-full.png",[48,88392,88393],{},"The original Kafka-Based solution consisted of a stitched-together set of big data tools. The system stored past events in S3 and processed them with Spark. For streaming data, it used Kafka and Flink. Keeping events and the materialized views consistent required complex coordination between each system.",[32,88395,88397],{"id":88396},"storing-every-domain-event-indefinitely","Storing Every Domain Event Indefinitely",[48,88399,88400],{},[384,88401],{"alt":88389,"src":88402},"\u002Fimgs\u002Fblogs\u002F63a3281bb4dbad74deb2b244_kafka-arch-upload.png",[48,88404,88405],{},"Every domain event would enter the system through Kafka which would then save it into S3. This allowed us to store large amounts of seldom used data with high durability and low cost.",[48,88407,88408],{},"We attempted to use Kafka’s infinite retention on streams but found it expensive and unmaintainable. We started to see performance degradation and volatile latencies on our larger topics. We did not investigate further since we were almost entirely moved onto to Pulsar.",[32,88410,88412],{"id":88411},"bootstrapping-a-materialized-view-from-batch-data","Bootstrapping a Materialized View from Batch Data",[48,88414,88415],{},[384,88416],{"alt":88389,"src":88417},"\u002Fimgs\u002Fblogs\u002F63a3281bf1bef745603c86c2_kafka-arch-batch.png",[48,88419,88420],{},"We materialize a view by processing every event in order. We use Spark to crunch through the majority of the historical data that's stored in S3. If we could pause events while this was happening, it would simplify things. In that situation, we could read all S3 data, then switch to processing Kafka at the head of the topic. In reality, there is a delay between events persisting into S3 from Kafka, and another one between swapping the large batch processing cluster to the smaller stream processing one. We can't afford to miss processing any events, so we use Spark to process as many events as possible in S3 and then have it return the last event's ID. Since we've configured Kafka to retain the last couple weeks of data, we can backfill the rest of the events off of Kafka.",[32,88422,88424],{"id":88423},"backfilling-from-kafka","Backfilling from Kafka",[48,88426,88427],{},[384,88428],{"alt":88389,"src":88429},"\u002Fimgs\u002Fblogs\u002F63a3281b2cf67d2a984c1b9d_kafka-arch-stream.png",[48,88431,88432],{},"Spark was able to crunch through the majority of past events, but it does not get us to the latest state. 
To process the final set of past events, we've configured our Kafka cluster to retain the last two weeks of acknowledged events. We run a Flink job to continue the SQL transformation that Spark started. We point Flink at the first event in Kafka and have it read through, doing nothing until it reaches the messageID where Spark left off. From that point on, it continues to update the materialized view until it reaches the head of the stream. Finally, it notifies the Transformation API that the materialized view is up to date and ready to serve.",[32,88434,88436],{"id":88435},"updating-on-incoming-events","Updating on Incoming Events",[48,88438,88439],{},[384,88440],{"alt":88389,"src":88441},"\u002Fimgs\u002Fblogs\u002F63a3281ba8bd8676c8c1939a_kafka-arch-stream-full.png",[48,88443,88444],{},"StreamSQL must keep the materialized views up to date once they are bootstrapped. At this point, the problem is trivial. Kafka passes each incoming event directly to Flink which then performs the necessary updates. The Transformation API and Spark are idle at this point. However, we still persist each incoming event into S3 in case a user updates or creates a transformation.",[32,88446,88448],{"id":88447},"multi-tenancy-rollback-replay-error-handling-etc","Multi-Tenancy, Rollback & Replay, Error Handling, etc.",[48,88450,88451],{},"We coordinate Flink and Kafka to work together in keeping snapshots of materialized views. With proper coordination, we can allow seamless rollback and replay functionality. Describing this process would require a blog post to itself (which we expect to write in the near future).",[48,88453,88454],{},"In this blog post, we also won't cover how we scaled our Flink and Kafka clusters, how we handled service failures, or how we were able to have secure multi-tenancy across all these different services (hint: each solution has a different answer). If you have a pressing need to know any of the above, feel free to reach out. We're happy to share.",[40,88456,72144],{"id":72143},[48,88458,88459],{},"Pulsar was built to store events forever, rather than streaming them between systems. Furthermore, Pulsar was built at Yahoo! for teams building a wide variety of products across the globe. It natively supports geo-distribution and multi-tenancy. Performing complex deployments, such as keeping dedicated servers for certain tenants, becomes easy. We leverage these features wherever we can. This allowed us to hand off a significant portion of our custom logic into Pulsar.",[32,88461,88463],{"id":88462},"tiered-storage-into-s3","Tiered Storage into S3",[48,88465,88466],{},[384,88467],{"alt":88468,"src":88469},"Pulsar-tiered","\u002Fimgs\u002Fblogs\u002F63a3281b339d9c735aa4fe57_pulsar-tiered-storage.png",[48,88471,88472],{},"StreamSQL users can create a new materialized view at any time. These views must be a projection of all events, so every transformation processes each historical event in-order. In our Kafka-based solution, we streamed all acknowledged events into S3 or GCS. Then, a batch pipeline in Spark processed those events. The system as a whole required us to coordinate an event stream, batch storage, batch compute, stream compute, and stateful storage. In the real world, coordinating these systems is error-prone, expensive, and hard to automate.",[48,88474,88475],{},"If we could configure our event storage to keep events forever, it would allow us to merge together our batch and streaming pipelines. Both Pulsar and Kafka allow this; however, Kafka does not have tiered storage. 
This means all events would have to be kept on the Kafka nodes’ disks. The event ledger monotonically increases, so we would have to constantly add storage. Most historical events aren’t read very often, so the majority of our expensive disk storage sits dormant.",[48,88477,88478],{},"On the other hand, Apache Pulsar has built-in tiered storage. Pulsar breaks down every event log into segments, and offloads inactive segments to S3. This means that we get infinite, cheap storage with a simple configuration change to Kafka. We don’t have to constantly increase the size of our cluster, and we can merge our batch and stream pipelines.",[48,88480,88481],{},"We can configure Pulsar to offload events when a topic hits a specific size or we can run it manually. This gives us flexibility to set the right offload policy to balance cost and speed. We’re building machine learning models to fit our offload policy to each individual topic's specific needs.",[32,88483,88485],{"id":88484},"separate-compute-and-storage-scaling","Separate Compute and Storage Scaling",[48,88487,88488],{},[384,88489],{"alt":88490,"src":88491},"Pulsar-arch","\u002Fimgs\u002Fblogs\u002F63a3281b2a1e8c523963226b_pulsar-bookie-arch.png",[48,88493,88494],{},"Our event volume and usage patterns vary widely throughout the day and across users. Each user's different usage patterns result in either heavier storage or compute usage. Luckily, Pulsar separates its brokers from its storage layer.",[48,88496,88497],{},"There are three different operations that Pulsar can perform: tail writes, tail reads, and historical reads. Pulsar writes, like Kafka’s, always go to the end of the stream. For Pulsar there are three steps in a write. First, the broker receives the request, then the broker writes it to Bookkeeper, and finally, it caches it for subsequent tail reads. That means that tail reads are very fast and don’t touch the storage layer at all. In comparison, historical reads are very heavy on the storage layer.",[48,88499,88500],{},"Adding storage nodes is relatively easy for both Kafka and Pulsar, but it is a very expensive operation. Data must be shuffled around and copied to properly balance the storage nodes. In Kafka’s case, brokers and storage exist on the same nodes, so any scaling operation is expensive. Contrastly, in Pulsar, brokers are stateless and are easy and cheap to scale. That means that tail reads do not pose a significant scale issue. we can fit our cluster to the current usage pattern of historical reads and tail reads.",[32,88502,88504],{"id":88503},"builtin-multi-tenancy","Builtin Multi-tenancy",[48,88506,88507],{},[384,88508],{"alt":88509,"src":88510},"Pulsar-multitenancy","\u002Fimgs\u002Fblogs\u002F63a3281b3e90e458722ba541_pulsar-multitenancy.png",[48,88512,88513],{},"Pulsar was built with multi-tenancy baked in. At Yahoo!, many geographically distributed teams working on different products shared the same Pulsar cluster. The system had to handle keeping track of different budgets and various SLAs. It has a feature set that allows us to run all users on the same Pulsar cluster while maintaining performance, reliability, and security.",[48,88515,88516],{},"Every Pulsar topic belongs to a namespace, and each namespace belongs to a tenant. Every StreamSQL account maps to a tenant. Tenants are securely isolated from eachother. There is no way for one user to ever touch a different user’s streams.",[48,88518,88519],{},"The namespaces provide other interesting dynamics around isolation from a performance standpoint. 
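To make the offload policy from the tiered-storage section above concrete, here is a small sketch using the Pulsar admin client; the service URL, namespace name, and size threshold are assumptions, not our production values:

```java
import org.apache.pulsar.client.admin.PulsarAdmin;

public class OffloadSketch {
    public static void main(String[] args) throws Exception {
        try (PulsarAdmin admin = PulsarAdmin.builder()
                .serviceHttpUrl("http://localhost:8080") // assumed admin endpoint
                .build()) {

            // Offload older segments to tiered storage (e.g. S3) once the topics in
            // this (hypothetical) namespace hold more than ~10 GiB on the bookies.
            // Offloads can also be triggered manually per topic through the admin API or CLI.
            admin.namespaces().setOffloadThreshold("streamsql/events", 10L * 1024 * 1024 * 1024);
        }
    }
}
```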
We can isolate a user’s namespace to a specific set of brokers and storage nodes. This limits the effect a single user can have on a whole system. At the same time, we can set up automatic load shedding on the brokers so that a spike in a single client can be absorbed by the larger system.",[32,88521,88523],{"id":88522},"active-and-responsive-community","Active and Responsive Community",[48,88525,88526],{},[384,88527],{"alt":88528,"src":88529},"Pulsar slack","\u002Fimgs\u002Fblogs\u002F63a3281be1f54e27dcf7a0a3_pulsar-slack.png",[48,88531,88532],{},"The Pulsar community slack channel has been amazing. I receive answers to most of my questions almost immediately, and I’m always learning new things by keeping an eye on it. There are a handful of meetups and a Pulsar Summit as well for in-person learning and networking. We knew, in the worst case, we could reach out to relevant people and get help with even our most niche questions. The community gave us the confidence to move forward with Pulsar.",[40,88534,88536],{"id":88535},"the-pulsar-based-solution","The Pulsar-Based Solution",[48,88538,88539],{},[384,88540],{"alt":88541,"src":88335},"Pulsar-based",[32,88543,88397],{"id":88544},"storing-every-domain-event-indefinitely-1",[48,88546,88547],{},"Pulsar allows us to store the entire immutable ledger in a Pulsar topic. We treat it as if it's all in Pulsar, but, under the hood, Pulsar offloads events into S3. We get the simplicity benefit of working with an event ledger, with the cost and maintenance benefit of putting events in S3. It all behaves better than our Kafka system without us having to maintain any of the complexity.",[32,88549,88412],{"id":88550},"bootstrapping-a-materialized-view-from-batch-data-1",[48,88552,88553],{},"The Pulsar architecture merges our streaming and batch capabilities. This allows us to remove Spark and all the coordination code between Spark and Flink. The Pulsar -> Flink connector seamlessly swaps between batch and stream processing modes. The architecture's simplicity eliminates tons of edge cases, error handling, and maintenance costs that were present in the Kafka-based version.",[32,88555,88436],{"id":88556},"updating-on-incoming-events-1",[48,88558,88559],{},"We write one job to handle both batch and streaming data. Without any coordination from us, Flink maintains exactly-once processing and swaps between its batch and streaming modes.",[40,88561,88563],{"id":88562},"drawbacks-of-pulsar","Drawbacks of Pulsar",[32,88565,85197],{"id":85196},[48,88567,88568],{},"Pulsar has been around for almost as long as Kafka and was proven in production at Yahoo. We view Pulsar’s core as stable and reliable. Integrations are a different problem. There are a never-ending list of integrations to write. In most cases, the Pulsar community builds and maintains its integrations. For example, we wanted to set S3 as a sink and learned that no open-source connector existed. We built our own are open-sourcing our solution to push the community forward, but we expect to find into missing integrations in the future.",[48,88570,88571],{},"Given that Pulsar is nowhere near as popular as Kafka to date, a majority of the Pulsar integrations are built and maintained in the Pulsar repo. For example, the Flink connector that we use is in the Pulsar repo, but there is also an open Apache Flink ticket to build an one on their side as well. 
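To show what treating a Pulsar topic as the whole immutable ledger looks like from application code, here is a minimal sketch (hypothetical topic and service URL) using the Pulsar Reader API: it starts from the earliest retained message, which may live in offloaded segments, reads forward exactly as a bootstrap pass would, and then keeps tailing new events.

```java
import org.apache.pulsar.client.api.Message;
import org.apache.pulsar.client.api.MessageId;
import org.apache.pulsar.client.api.PulsarClient;
import org.apache.pulsar.client.api.Reader;
import org.apache.pulsar.client.api.Schema;

public class LedgerBootstrapSketch {
    public static void main(String[] args) throws Exception {
        try (PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650") // assumed broker URL
                .build();
             Reader<byte[]> reader = client.newReader(Schema.BYTES)
                     .topic("persistent://streamsql/events/ledger") // hypothetical topic
                     .startMessageId(MessageId.earliest)            // replay from the very first event
                     .create()) {

            while (reader.hasMessageAvailable()) {
                Message<byte[]> event = reader.readNext();
                // apply the transformation to the materialized view here
            }
            // From here on, the same reader keeps tailing new events as they arrive.
        }
    }
}
```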
Until Pulsar becomes mainstream enough, there will continue to be missing integrations.",[32,88573,88575],{"id":88574},"lack-of-public-case-studies","Lack of Public Case Studies",[48,88577,88578,88579,88582],{},"Almost all Pulsar content is published by hosted Pulsar providers like Streamlio (acq. by Splunk), Stream Native, and Kafkaesque. It’s quite uncommon to see a Pulsar case study by a company that’s using it in production at scale with no commercial ties to Pulsar. There are many large companies using it in production with it, but they seldom publish their experiences to the public. StreamNative posted a list of them ",[55,88580,267],{"href":88581},"\u002Fsuccess-stories\u002F",". Public case studies allow us to pick up tricks and gotchas without having to reinvent the wheel.",[48,88584,88585],{},"In comparison, there are plenty of case studies on Kafka. Kafka is the most prominent event streaming platform and continuing to gain popularity, so most companies that write about their data platform will go in-depth about how they use Kafka.",[32,88587,88589],{"id":88588},"infrastructure-liability","Infrastructure Liability",[48,88591,88592],{},"Our Pulsar deployment requires a Zookeeper cluster for metadata, a Bookkeeper cluster for storage, a broker cluster, and a proxy cluster. Even with AWS and Google Cloud services, this is a lot of maintenance liability. There are a huge number of configuration possibilities for Pulsar alone, but, when you look at the lower layers, it can call for multiple specialized engineers to maintain and optimize.",[40,88594,13565],{"id":1727},[32,88596,15627],{"id":34962},[48,88598,88599],{},"Currently, we use Flink to process streaming events and update our materialized views. Flink doesn't allow new nodes to be appended to a cluster. Instead, we have to save a checkpoint and restart the cluster with a larger size. Conversely, Pulsar functions are run in a separate compute cluster that we can be dynamically resized.",[48,88601,88602],{},"Flink's processing engine is much more expressive and powerful, but is far more complex to scale. Pulsar's is easy to scale, but far more limited. We will soon be able to categorize transformations and decide where to run them with a tendency towards Pulsar functions.",[32,88604,88606],{"id":88605},"streaming-dag","Streaming DAG",[48,88608,88609],{},"StreamSQL doesn't currently allow transformations to use materialized views as state. We are working on modeling the system as a DAG (Directed Acyclic Graph), as Airflow does. Unlike Airflow, the dependencies cannot be be performed in steps, every event would have to go through the entire DAG. 
Pulsar will make it much easier to maintain this guarantee as each event goes through the DAG.",{"title":18,"searchDepth":19,"depth":19,"links":88611},[88612,88616,88623,88629,88634,88639],{"id":88344,"depth":19,"text":88345,"children":88613},[88614,88615],{"id":88348,"depth":279,"text":88349},{"id":50936,"depth":279,"text":50937},{"id":88383,"depth":19,"text":88384,"children":88617},[88618,88619,88620,88621,88622],{"id":88396,"depth":279,"text":88397},{"id":88411,"depth":279,"text":88412},{"id":88423,"depth":279,"text":88424},{"id":88435,"depth":279,"text":88436},{"id":88447,"depth":279,"text":88448},{"id":72143,"depth":19,"text":72144,"children":88624},[88625,88626,88627,88628],{"id":88462,"depth":279,"text":88463},{"id":88484,"depth":279,"text":88485},{"id":88503,"depth":279,"text":88504},{"id":88522,"depth":279,"text":88523},{"id":88535,"depth":19,"text":88536,"children":88630},[88631,88632,88633],{"id":88544,"depth":279,"text":88397},{"id":88550,"depth":279,"text":88412},{"id":88556,"depth":279,"text":88436},{"id":88562,"depth":19,"text":88563,"children":88635},[88636,88637,88638],{"id":85196,"depth":279,"text":85197},{"id":88574,"depth":279,"text":88575},{"id":88588,"depth":279,"text":88589},{"id":1727,"depth":19,"text":13565,"children":88640},[88641,88642],{"id":34962,"depth":279,"text":15627},{"id":88605,"depth":279,"text":88606},"2020-04-21","Pulsar was built to store events forever, rather than streaming them between systems. It natively supports geo-distribution and multi-tenancy. Performing complex deployments, such as keeping dedicated servers for certain tenants, becomes easy.",{},"\u002Fblog\u002Fmoved-from-apache-kafka-to-apache-pulsar",{"title":76993,"description":88644},"blog\u002Fmoved-from-apache-kafka-to-apache-pulsar",[799,35559],"9CXME1JsexPrLH9CSMDOnJt7AeQre0q6IlOgiAHtbLM",{"id":88652,"title":88653,"authors":88654,"body":88655,"category":821,"createdAt":290,"date":88882,"description":88883,"extension":8,"featured":294,"image":88884,"isDraft":294,"link":290,"meta":88885,"navigation":7,"order":296,"path":88886,"readingTime":3556,"relatedResources":290,"seo":88887,"stem":88888,"tags":88889,"__hash__":88890},"blogs\u002Fblog\u002Fhow-to-build-distributed-database-apache-bookkeeper-part-2.md","How to Build a Distributed Database with Apache BookKeeper — Part 2",[87358],{"type":15,"value":88656,"toc":88873},[88657,88660,88663,88666,88669,88672,88676,88679,88682,88699,88702,88708,88712,88715,88718,88729,88732,88743,88746,88749,88752,88755,88759,88762,88765,88768,88771,88774,88777,88780,88783,88786,88789,88793,88796,88799,88802,88805,88808,88811,88817,88820,88823,88831,88834,88838,88841,88844,88847,88850,88854,88857,88860,88862,88865,88868,88871],[48,88658,88659],{},"In this series of posts, I want to share some basic architectural concepts about the possible anatomy of a distributed database with a shared-nothing architecture. In the first part, we see how you can design a database as a replicated state machine.",[48,88661,88662],{},"We have a cluster of machines that do not share disks and they are only connected using the network. 
We have a table of records, the state of our machine is the content of the table.",[48,88664,88665],{},"Each modification to the table, including INSERT, UPDATE and DELETE operations, is written to a write ahead log and then applied to the local copy of the data.",[48,88667,88668],{},"At any time only one machine among the group is elected to be the leader and clients perform writes and reads only by issuing requests to that machine, only this node is able to modify the contents of the table.",[48,88670,88671],{},"The other machines are the so-called followers and they are continuously tailing the log: they read operations from the log and they apply them to their own local copy, exactly in the same order as they have been written by the leader.",[40,88673,88675],{"id":88674},"enter-apache-bookkeeper","Enter Apache BookKeeper",[48,88677,88678],{},"In the beginning, Apache BookKeeper was designed as a distributed write ahead log for the Hadoop HDFS Namenode as part of the Apache Zookeeper project, but soon it started its own life as an independent product.",[48,88680,88681],{},"It comes with most of the features we need to support our replicated state machine, the main features of our interests are:",[321,88683,88684,88687,88690,88693,88696],{},[324,88685,88686],{},"Decentralized architecture: all of the logic runs on a rich client model, BookKeeper servers are only containers of data, this allows us to completely scale out.",[324,88688,88689],{},"Shared nothing storage model: clients only use network, no shared disks; servers do not know about each other.",[324,88691,88692],{},"Support for fencing: BookKeeper guarantees that only one machine is able to write to the log.",[324,88694,88695],{},"Last Add Confirmed Protocol: BookKeeper allows the readers to follow the log consistently.",[324,88697,88698],{},"Automatic re-replication of data on lost storage nodes: self healing in case of lost machines and network partitions.",[48,88700,88701],{},"Let's look through all of these key features, we will see how our database is able to address all of the challenges of a distributed system.",[48,88703,88704],{},[384,88705],{"alt":88706,"src":88707},"illustration of  Apache BookKeeper","\u002Fimgs\u002Fblogs\u002F63a320fc588fca900d0f3d0c_bk2-1.png",[40,88709,88711],{"id":88710},"rich-client-model","Rich client model",[48,88713,88714],{},"The leader node runs a BookKeeper writer (WriteHandle) and it creates a ledger. A ledger is a write only segment of our log. It can be opened for writing only once and it can be read as many times as you like. 
You can not append more data to a ledger once the writer closes it or in case it dies.",[48,88716,88717],{},"At creation time, you need to set three parameters about replication:",[321,88719,88720,88723,88726],{},[324,88721,88722],{},"Ensemble size (ES): the number of bookies that will store ledger's data.",[324,88724,88725],{},"Write quorum size (WQ): the number of copies for each entry.",[324,88727,88728],{},"Ack quorum size (AQ): the number of required copies to be acknowledged before considering a write as successful.",[48,88730,88731],{},"For instance, if you have ES=3, WQ=2 and AQ=1 for each entry, BookKeeper will:",[321,88733,88734,88737,88740],{},[324,88735,88736],{},"Spread copies over 3 bookies",[324,88738,88739],{},"Write 2 copies of each entry",[324,88741,88742],{},"Wait only for 1 acknowledgement in order to declare an entry to be written.",[48,88744,88745],{},"If you have very few bookies (like 3), I suggest you start with 2–2–2, it is a good trade-off, you are guaranteed to have at least two copies of each entry.",[48,88747,88748],{},"Having ES > WQ is known as striping, this helps in boosting performances because writes and reads are spread to more bookies.",[48,88750,88751],{},"The writer reacts to bookie failures and chooses new bookies to use as storage, this mechanism is known as ensemble change, if you have enough bookie this is totally transparent to the application and you do not have to care about this situation.",[48,88753,88754],{},"At every ensemble change, we start a new segment of the ledger, the readers watch changes on ledger metadata and are able to automatically connect to the new bookies.",[40,88756,88758],{"id":88757},"fencing","Fencing",[48,88760,88761],{},"In our replicated state machine model, we have only one leader that is allowed to perform changes to the state of the table. Its leadership role must be supported by all of the other peers, for instance, you could use some leader election recipe with ZooKeeper, but this won’t be enough to guarantee the overall consistency of data.",[48,88763,88764],{},"In theory, you should implement some kind of low level distributed consensus protocol (like ZAB in ZooKeeper), but this would be overkilling, it will be really slow.",[48,88766,88767],{},"Here comes BookKeeper to the rescue.",[48,88769,88770],{},"When a node starts to act as a leader it performs a “recovery read” over every ledger supposed to be open by the previous leader.",[48,88772,88773],{},"This operation connects to every bookie that contains that ledger’s data and flags each ledger as fenced and if the previous leader is still alive it will receive a specific write error during the next write that tells that he has been fenced off.",[48,88775,88776],{},"BookKeeper handles every corner case, like network errors during recovery or multiple concurrent recovery operations.",[48,88778,88779],{},"You are guaranteed that only one machine will succeed in the recovery and then it can start to perform new changes to the status of the database.",[48,88781,88782],{},"But BookKeeper deals only with ledgers and you have to store somewhere the list of ledgers that are building up your write ahead log. This secondary metadata storage must handle some sort of fencing as well.",[48,88784,88785],{},"One option is to store this list on ZooKeeper and leverage its built-in distributed compare-and-set facility to deal with concurrent leaders that want to add a new ledger to the list of active ledgers. 
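Putting the pieces above together, here is a minimal sketch of how a leader might create its next log segment with the BookKeeper 4.x `client.api` surface; the ZooKeeper URI, password, and payload are illustrative, and this is not HerdDB code:

```java
import org.apache.bookkeeper.client.api.BookKeeper;
import org.apache.bookkeeper.client.api.DigestType;
import org.apache.bookkeeper.client.api.WriteHandle;
import org.apache.bookkeeper.conf.ClientConfiguration;

public class CreateLedgerSketch {
    public static void main(String[] args) throws Exception {
        ClientConfiguration conf = new ClientConfiguration()
                .setMetadataServiceUri("zk+hierarchical://localhost:2181/ledgers"); // assumed ZooKeeper

        try (BookKeeper bk = BookKeeper.newBuilder(conf).build();
             WriteHandle log = bk.newCreateLedgerOp()
                     .withEnsembleSize(3)     // ES: spread data over 3 bookies
                     .withWriteQuorumSize(2)  // WQ: 2 copies of each entry
                     .withAckQuorumSize(2)    // AQ: wait for 2 acks per entry
                     .withDigestType(DigestType.CRC32C)
                     .withPassword("herd".getBytes())
                     .execute()
                     .get()) {

            long entryId = log.append("INSERT INTO mytable ...".getBytes());
            System.out.println("written entry " + entryId + " to ledger " + log.getId());
        }
    }
}
```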
Please refer to the BookKeeper Tutorial for an example of how to deal with this part of the story.",[48,88787,88788],{},"BookKeeper provides a higher-level API, DistributedLog, that does this part for you and adds a lot of built-in features to BookKeeper low-level API.",[40,88790,88792],{"id":88791},"last-add-confirmed-protocol","Last add confirmed protocol",[48,88794,88795],{},"Each node is now using only Zookeeper and BookKeeper in order to communicate with the other peers. Let’s see how can a follower know that the leader is making progress and it is time to look for new entries.",[48,88797,88798],{},"Every time the writer adds an entry to a ledger, it also writes the greatest entry id for which it has received an acknowledgment of successful store by an AQ of bookies, we name this id the Last-Add-Confirmed entry (LAC).",[48,88800,88801],{},"Usually, the writer is faster in sending writes than the bookie to persist them and to send back an acknowledgment message, so entries would be potentially available for readers even if the writer is not considering them as persisted.",[48,88803,88804],{},"This is very dangerous because this way follower nodes may have a future view of data that has not still been accepted by the leader.",[48,88806,88807],{},"In order to address this case BookKeeper readers can read only entries up to the LAC entry. You are guaranteed that the reader is always one step behind the writer.",[48,88809,88810],{},"Readers get this LAC during reads (it is piggy backed) by reading ledger’s metadata and asking the bookies associated with the ledger.",[48,88812,88813],{},[384,88814],{"alt":88815,"src":88816},"illustration of Last add confirmed protocol BookKeeper","\u002Fimgs\u002Fblogs\u002F63a320fc5984dc2335254830_bk2-2.png",[48,88818,88819],{},"Let’s see a usual tricky case for new users of BookKeeper: the writer writes entry X and X-1 as LAC, so readers can see only up to X-1, if no more entries are written, that the follower is not able to be up to date. This is not a real problem in production, especially under heavy load, but it is difficult to understand especially the first time you play with BookKeeper.",[48,88821,88822],{},"You have to ways to fix this issue:",[321,88824,88825,88828],{},[324,88826,88827],{},"Periodically write a dummy entry if you haven't written anything",[324,88829,88830],{},"Use the ExplicitLAC feature, this basically stores a secondary LAC pointer out of the regular piggybacking mechanism",[48,88832,88833],{},"There are ways to bypass LAC protocol but this blog does not introduce them because they are not useful in our use case.",[40,88835,88837],{"id":88836},"close-a-ledger","Close a ledger",[48,88839,88840],{},"BookKeeper is optimized for high throughput but the most important features are about the consistency guarantees and this is mostly about metadata management both using ZooKeeper and the built-in fencing mechanisms.",[48,88842,88843],{},"One critical point is when you have to define the actual series of valid entries visible to readers, especially the id of the last entry.",[48,88845,88846],{},"There are several mechanics that come into play because the writer may fail and also network may fail but in the end, the writer or the recovery procedure will come to seal this range of valid entry ids.",[48,88848,88849],{},"We call this operation ‘closing’ a ledger and basically it is about writing to Zookeeper the final state of the ledger and this happens when out ‘close’ your WriteHandle. 
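Putting the last-add-confirmed and closing rules together, a follower's tailing loop might look like the following sketch. It is only an illustration built on the assumed ReadHandle API; the apply step is a hypothetical stub standing in for "mutate the local copy of the table".

```java
import org.apache.bookkeeper.client.api.LedgerEntries;
import org.apache.bookkeeper.client.api.LedgerEntry;
import org.apache.bookkeeper.client.api.ReadHandle;

public class FollowerTailer {
    // Apply every entry up to the last-add-confirmed id, in order.
    // 'rh' is a ReadHandle opened WITHOUT recovery, so the writer is not fenced.
    static long tailOnce(ReadHandle rh, long nextEntry) throws Exception {
        long lac = rh.readLastAddConfirmed();  // ask the bookies for the current LAC;
        if (nextEntry > lac) {                 // readers never read past it
            return nextEntry;                  // nothing new yet, try again later
        }
        try (LedgerEntries entries = rh.read(nextEntry, lac)) {
            for (LedgerEntry entry : entries) {
                apply(entry.getEntryBytes());  // replay against the local table copy
            }
        }
        return lac + 1;
    }

    static void apply(byte[] operation) {
        // Hypothetical: decode the operation and mutate the in-memory table.
    }
}
```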
After closing a ledger, the reader is able to read up the last written entry.",[40,88851,88853],{"id":88852},"replication-in-case-of-lost-bookies","Replication in case of lost bookies",[48,88855,88856],{},"When you lose a bookie, BookKeeper is able to enforce the original replication factor (Write Quorum Size) by detecting the failure and replicate again the data supposed to be stored on the dead bookie.",[48,88858,88859],{},"This can be done manually by using BookKeeper tools, but it can also be performed by the auto-recovery daemon.",[40,88861,87797],{"id":87796},[48,88863,88864],{},"It is a good choice to use BookKeeper as a distributed write ahead log since it deals with many aspects of distributed systems. If you want to write your own log, you will fall into a lot of corner cases only when it will be too late.",[48,88866,88867],{},"Therefore, BookKeeper was designed from the ground up to be a high-performance storage system, with lots of improvements and tricks about local disk storage management, network usage, and JVM performance.",[48,88869,88870],{},"In the next part of this series, you can learn how HerdDB uses Apache BookKeeper as write ahead log and fill in the gaps of this story: how to store local data, coordinate replicas and perform checkpoints.",[48,88872,3931],{},{"title":18,"searchDepth":19,"depth":19,"links":88874},[88875,88876,88877,88878,88879,88880,88881],{"id":88674,"depth":19,"text":88675},{"id":88710,"depth":19,"text":88711},{"id":88757,"depth":19,"text":88758},{"id":88791,"depth":19,"text":88792},{"id":88836,"depth":19,"text":88837},{"id":88852,"depth":19,"text":88853},{"id":87796,"depth":19,"text":87797},"2020-04-14","It is a good choice to use BookKeeper as a distributed write ahead log since it deals with many aspects of distributed systems.","\u002Fimgs\u002Fblogs\u002F63d7990c61830d02b126a528_63a320fc5984dc2335254830_bk2-2.webp",{},"\u002Fblog\u002Fhow-to-build-distributed-database-apache-bookkeeper-part-2",{"title":88653,"description":88883},"blog\u002Fhow-to-build-distributed-database-apache-bookkeeper-part-2",[38442,12106],"BhP9npuBsEw-7SNXZdPUVmiQ9oMSt49Du4sQPZRrpQc",{"id":88892,"title":88893,"authors":88894,"body":88895,"category":3550,"createdAt":290,"date":89156,"description":89157,"extension":8,"featured":294,"image":89158,"isDraft":294,"link":290,"meta":89159,"navigation":7,"order":296,"path":89160,"readingTime":7986,"relatedResources":290,"seo":89161,"stem":89162,"tags":89163,"__hash__":89164},"blogs\u002Fblog\u002Fkafka-on-pulsar-bring-native-kafka-protocol-support-to-apache-pulsar.md","Announcing Kafka-on-Pulsar: Bring Native Kafka Protocol Support to Apache Pulsar (KoP)",[806,83497,83315],{"type":15,"value":88896,"toc":89138},[88897,88904,88914,88916,88919,88923,88926,88941,88945,88947,88950,88953,88956,88962,88965,88969,88982,88985,88987,88990,89004,89007,89010,89016,89019,89022,89025,89029,89032,89036,89039,89043,89046,89050,89053,89057,89060,89064,89067,89070,89074,89077,89080,89082,89097,89113,89120,89126,89128,89131],[48,88898,88899,88900,88903],{},"We are excited to announce that StreamNative and OVHcloud are open-sourcing “Kafka on Pulsar\" (KoP). ",[55,88901,35093],{"href":29592,"rel":88902},[264]," brings the native Apache Kafka protocol support to Apache Pulsar by introducing a Kafka protocol handler on Pulsar brokers. By adding the KoP protocol handler to your existing Pulsar cluster, you can now migrate your existing Kafka applications and services to Pulsar without modifying the code. 
This enables Kafka applications to leverage Pulsar’s powerful features, such as:",[321,88905,88906,88908,88910,88912],{},[324,88907,32501],{},[324,88909,32504],{},[324,88911,32510],{},[324,88913,32513],{},[40,88915,62871],{"id":62870},[48,88917,88918],{},"Apache Pulsar is an event streaming platform designed from the ground up to be cloud-native deploying a multi-layer and segment-centric architecture. The architecture separates serving and storage into different layers, making the system container-friendly. The cloud-native architecture provides scalability, availability and resiliency and enables companies to expand their offerings with real-time data-enabled solutions. Pulsar has gained wide adoption since it was open-sourced in 2016 and was designated an Apache Top-Level project in 2018.",[40,88920,88922],{"id":88921},"the-need-for-kop","The Need for KoP",[48,88924,88925],{},"Pulsar provides a unified messaging model for both queueing and streaming workloads. Pulsar implemented its own protobuf-based binary protocol to provide high performance and low latency. This choice of protobuf makes it convenient to implement Pulsar clients and the project already supports Java, Go, Python and C++ languages alongside third-party clients provided by the community. However, existing applications written using other messaging protocols had to be rewritten to adopt Pulsar’s new unified messaging protocol.",[48,88927,88928,88929,88934,88935,88940],{},"To address this, the Pulsar community developed applications to facilitate the migration to Pulsar from other messaging systems. For example, Pulsar provides a ",[55,88930,88933],{"href":88931,"rel":88932},"https:\u002F\u002Fpulsar.apache.org\u002Fdocs\u002Fen\u002Fadaptors-kafka",[264],"Kafka wrapper"," on Kafka Java API, allows existing applications that already use Kafka Java Client to switch from Kafka to Pulsar ",[55,88936,88939],{"href":88937,"rel":88938},"https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=Cy9ev9nAZpI",[264],"without code change",". Pulsar also has a rich connector ecosystem, connecting Pulsar with other data systems. Yet, there was still a strong demand from those looking to switch from other Kafka applications to Pulsar.",[40,88942,88944],{"id":88943},"streamnative-and-ovhclouds-collaboration","StreamNative and OVHcloud's collaboration",[48,88946,86586],{},[48,88948,88949],{},"Internally, OVHcloud had been running Apache Kafka for years. Despite their experience in operating multiple clusters with millions of messages per second on Kafka, they met painful operational challenges. For example, putting thousands of topics from thousands of users into a single cluster was difficult without multi-tenancy.",[48,88951,88952],{},"As a result, OVHcloud decided to shift and build the foundation of their topic-as-a-service product, called ioStream, on Pulsar instead of Kafka. Pulsar’s multi-tenancy and the overall architecture with Apache Bookkeeper simplified operations compared to Kafka.",[48,88954,88955],{},"After spawning the first region, OVHcloud decided to implement it as a proof-of-concept proxy capable of transforming the Kafka protocol to Pulsar on the fly. They encountered some issues, mainly on how to simulate and manipulate offsets and consumer groups as they did not have access to low-level storage details. 
During this process, OVHcloud discovered that StreamNative was working on bringing the Kafka protocol natively to Pulsar, and they joined forces to develop KoP.",[48,88957,88958],{},[384,88959],{"alt":88960,"src":88961},"drawing of map, streamnative logo and ovhcloud logo","\u002Fimgs\u002Fblogs\u002F63a31bb8e1c4bf3a3b655b16_streamnative-ovh-with-work.png",[48,88963,88964],{},"KoP was developed to provide a streamlined and comprehensive solution leveraging Pulsar and BookKeeper’s event stream storage infrastructure and Pulsar’s pluggable protocol handler framework. KoP is implemented as a protocol handler plugin with protocol name \"kafka\". It can be installed and configured to run as part of Pulsar brokers.",[40,88966,88968],{"id":88967},"distributed-log","Distributed log",[48,88970,88971,88972,88976,88977,190],{},"Both Pulsar and Kafka share a very similar data model around log for both pub\u002Fsub messaging and event streaming. For example, both are built on top of a distributed log. A key difference between these two systems is how they implement the distributed log. Kafka implements the distributed log in a partition-basis architecture, where a distributed log (a partition in Kafka) is designated to store in a set of brokers, while Pulsar deploys a segment-based architecture to implement its distributed log by leveraging Apache BookKeeper as its scale-out segment storage layer. Pulsar’s segment based architecture provides benefits such as rebalance-free, instant scalability, and infinite event stream storage. You can learn more about the key differences between Pulsar and Kafka in ",[55,88973,88975],{"href":50009,"rel":88974},[264],"this Splunk blog"," and in ",[55,88978,88981],{"href":88979,"rel":88980},"http:\u002F\u002Fbookkeeper.apache.org\u002Fdistributedlog\u002Ftechnical-review\u002F2016\u002F09\u002F19\u002Fkafka-vs-distributedlog.html",[264],"this blog from the Bookkeeper project",[48,88983,88984],{},"Since both of the systems are built on a similar data model, a distributed log, it is very simple to implement a Kafka-compatible protocol handler by leveraging Pulsar’s distributed log storage and its pluggable protocol handler framework (introduced in the 2.5.0 release).",[40,88986,68842],{"id":68841},[48,88988,88989],{},"The implementation is done by comparing the protocols between Pulsar and Kafka. We found that there are a lot of similarities between these two protocols. Both protocols are comprised of the following operations:",[321,88991,88992,88995,88998,89001],{},[324,88993,88994],{},"Topic Lookup: All the clients connect to any broker to lookup the metadata (i.e. the owner broker) of the topics. After fetching the metadata, the clients establish persistent TCP connections to the owner brokers. Produce: The clients talk to the owner broker of a topic partition to append the messages to a distributed log.",[324,88996,88997],{},"Consume: The clients talk to the owner broker of a topic partition to read the messages from a distributed log.",[324,88999,89000],{},"Offset: The messages produced to a topic partition are assigned with an offset. The offset in Pulsar is called MessageId. Consumers can use offsets to seek to a given position within the log to read messages.",[324,89002,89003],{},"Consumption State: Both systems maintain the consumption state for consumers within a subscription (or a consumer group in Kafka). 
The consumption state is stored in __offsets topic in Kafka, while the consumption state is stored as cursors in Pulsar.",[48,89005,89006],{},"As you can see, these are all the primitive operations provided by a scale-out distributed log storage such as Apache BookKeeper. The core capabilities of Pulsar are implemented on top of Apache BookKeeper. Thus it is pretty easy and straightforward to implement the Kafka concepts by using the existing components that Pulsar has developed on BookKeeper.",[48,89008,89009],{},"The following figure illustrates how we add the Kafka protocol support within Pulsar. We are introducing a new Protocol Handler which implements the Kafka wire protocol by leveraging the existing components (such as topic discovery, the distributed log library - ManagedLedger, cursors and etc) that Pulsar already has.",[48,89011,89012],{},[384,89013],{"alt":89014,"src":89015},"drawing of pulsar architecture","\u002Fimgs\u002Fblogs\u002F63a31bb7569f8c5c3fa7c4ae_pulsar-architecture.jpeg",[32,89017,89018],{"id":9857},"Topic",[48,89020,89021],{},"In Kafka, all the topics are stored in one flat namespace. But in Pulsar, topics are organized in hierarchical multi-tenant namespaces. We introduce a setting kafkaNamespace in broker configuration to allow the administrator configuring to map Kafka topics to Pulsar topics.",[48,89023,89024],{},"In order to let Kafka users leverage the multi-tenancy feature of Apache Pulsar, a Kafka user can specify a Pulsar tenant and namespace as its SASL username when it uses SASL authentication mechanism to authenticate a Kafka client.",[32,89026,89028],{"id":89027},"message-id-and-offset","Message ID and offset",[48,89030,89031],{},"In Kafka, each message is assigned with an offset once it is successfully produced to a topic partition. In Pulsar, each message is assigned with a MessageID. The message id consists of 3 components, ledger-id, entry-id, and batch-index. We are using the same approach in Pulsar-Kafka wrapper to convert a Pulsar MessageID to an offset and vice versa.",[32,89033,89035],{"id":89034},"message","Message",[48,89037,89038],{},"Both a Kafka message and a Pulsar message have key, value, timestamp, and headers (note: this is called ‘properties’ in Pulsar). We convert these fields automatically between Kafka messages and Pulsar messages.",[32,89040,89042],{"id":89041},"topic-lookup","Topic lookup",[48,89044,89045],{},"We use the same topic lookup approach for the Kafka request handler as the Pulsar request handler. The request handler does topic discovery to lookup all the ownerships for the requested topic partitions and responds with the ownership information as part of Kafka TopicMetadata back to Kafka clients.",[32,89047,89049],{"id":89048},"produce-messages","Produce Messages",[48,89051,89052],{},"When the Kafka request handler receives produced messages from a Kafka client, it converts Kafka messages to Pulsar messages by mapping the fields (i.e. key, value, timestamp and headers) one by one, and uses the ManagedLedger append API to append those converted Pulsar messages to BookKeeper. Converting Kafka messages to Pulsar messages allows existing Pulsar applications to consume messages produced by Kafka clients.",[32,89054,89056],{"id":89055},"consume-messages","Consume Messages",[48,89058,89059],{},"When the Kafka request handler receives a consumer request from a Kafka client, it opens a non-durable cursor to read the entries starting from the requested offset. 
The Kafka request handler converts the Pulsar messages back to Kafka messages to allow existing Kafka applications to consume the messages produced by Pulsar clients.",[32,89061,89063],{"id":89062},"group-coordinator-offsets-management","Group coordinator & offsets management",[48,89065,89066],{},"The most challenging part is to implement the group coordinator and offsets management. Because Pulsar doesn’t have a centralized group coordinator for assigning partitions to consumers of a consumer group and managing offsets for each consumer group. In Pulsar, the partition assignment is managed by broker on a per-partition basis, and the offset management is done by storing the acknowledgements in cursors by the owner broker of that partition.",[48,89068,89069],{},"It is difficult to align the Pulsar model with the Kafka model. Hence, for the sake of providing full compatibility with Kafka clients, we implemented the Kafka group coordinator by storing the coordinator group changes and offsets in a system topic calledpublic\u002Fkafka\u002F__offsets in Pulsar. This allows us to bridge the gap between Pulsar and Kafka and allows people to use existing Pulsar tools and policies to manage subscriptions and monitor Kafka consumers. We add a background thread in the implemented group coordinator to periodically sync offset updates from the system topic to Pulsar cursors. Hence a Kafka consumer group is effectively treated as a Pulsar subscription. All the existing Pulsar toolings can be used for managing Kafka consumer groups as well.",[40,89071,89073],{"id":89072},"bridge-two-popular-messaging-ecosystems","Bridge two popular messaging ecosystems",[48,89075,89076],{},"At both companies, we value customer success. We believe that providing a native Kafka protocol on Apache Pulsar will reduce the barriers for people adopting Pulsar to achieve their business success. By integrating two popular event streaming ecosystems, KoP unlocks new use cases. Customers can leverage advantages from each ecosystem and build a truly unified event streaming platform with Apache Pulsar to accelerate the development of real-time applications and services.",[48,89078,89079],{},"With KoP, a log collector can continue collecting log data from its sources and producing messages to Apache Pulsar using existing Kafka integrations. The downstream applications can use Pulsar Functions to process the events arriving in the system to do serverless event streaming.",[40,89081,75990],{"id":82059},[48,89083,89084,89085,89088,89089,89091,89092,89096],{},"KoP is open sourced under Apache License V2 in ",[55,89086,29592],{"href":29592,"rel":89087},[264],". It is available as part of StreamNative Platform. You can download the ",[55,89090,86731],{"href":44437}," to try out all the features of KoP. If you already have a Pulsar cluster running and would like to enable Kafka protocol support on it, you can follow the ",[55,89093,41409],{"href":89094,"rel":89095},"http:\u002F\u002Fstreamnative.io\u002Fdocs\u002Fkop\u002F",[264]," to install the KoP protocol handler to your existing Pulsar cluster.",[48,89098,89099,89100,4003,89103,89108,89109,89112],{},"Here is more information on KoP ",[55,89101,4926],{"href":29592,"rel":89102},[264],[55,89104,89107],{"href":89105,"rel":89106},"http:\u002F\u002Fstreamnative.io\u002Fdocs\u002Fv1.0.0\u002Fconnect\u002Fkop\u002Foverview",[264],"document",". We are looking forward to your issues, and PRs. 
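To give a feel for what "without modifying the code" means in practice, here is a minimal sketch of a plain Apache Kafka producer pointed at a broker that has the KoP handler enabled. The listener address and topic name are illustrative assumptions; the client code itself is unchanged stock Kafka.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class KafkaToPulsarDemo {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Kafka listener exposed by the Pulsar broker running the KoP handler (illustrative).
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        // The producer code is plain Kafka: no Pulsar-specific changes are needed.
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("payments", "order-1", "created"));
            producer.flush();
        }
    }
}
```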
You can also join #kop channel in ",[55,89110,57762],{"href":57760,"rel":89111},[264]," to discuss all things about Kafka-on-Pulsar.",[48,89114,89115,89116,190],{},"StreamNative and OVHcloud are also hosted a webinar on KoP. You can watch the recording ",[55,89117,267],{"href":89118,"rel":89119},"https:\u002F\u002Fwww.streamnative.io\u002Fwebinars\u002Fintroducing-kafka-on-pulsar-bring-native-kafka-protocol-support-to-apache-pulsar",[264],[48,89121,89122],{},[384,89123],{"alt":89124,"src":89125},"drawing with map and ovh streamnative logos","\u002Fimgs\u002Fblogs\u002F63a31bb89fa3322eaf2cdb7e_kop-webinar.png",[40,89127,82734],{"id":82733},[48,89129,89130],{},"The KoP project was originally initiated by StreamNative. The OVHcloud team joined the project to collaborate on the development of the KoP project. Many thanks to Pierre Zemb and Steven Le Roux from OVHcloud for their contributions to this project!",[48,89132,82189,89133,1154,89136,190],{},[55,89134,39691],{"href":33664,"rel":89135},[264],[55,89137,24379],{"href":45219},{"title":18,"searchDepth":19,"depth":19,"links":89139},[89140,89141,89142,89143,89144,89153,89154,89155],{"id":62870,"depth":19,"text":62871},{"id":88921,"depth":19,"text":88922},{"id":88943,"depth":19,"text":88944},{"id":88967,"depth":19,"text":88968},{"id":68841,"depth":19,"text":68842,"children":89145},[89146,89147,89148,89149,89150,89151,89152],{"id":9857,"depth":279,"text":89018},{"id":89027,"depth":279,"text":89028},{"id":89034,"depth":279,"text":89035},{"id":89041,"depth":279,"text":89042},{"id":89048,"depth":279,"text":89049},{"id":89055,"depth":279,"text":89056},{"id":89062,"depth":279,"text":89063},{"id":89072,"depth":19,"text":89073},{"id":82059,"depth":19,"text":75990},{"id":82733,"depth":19,"text":82734},"2020-03-24","Announcing Kafka on Pulsar: bring native Kafka protocol support to Apache Pulsar (KoP)","\u002Fimgs\u002Fblogs\u002F63d7994ca532650a911e26b0_63a31bb79fa33256fc2cdb7d_ovh-streamnative.webp",{},"\u002Fblog\u002Fkafka-on-pulsar-bring-native-kafka-protocol-support-to-apache-pulsar",{"title":88893,"description":89157},"blog\u002Fkafka-on-pulsar-bring-native-kafka-protocol-support-to-apache-pulsar",[302,799],"npzV4NAXhdmqCsPm7q8doi_tXv5xt6jNJeQIrMRBsWk",{"id":89166,"title":77005,"authors":89167,"body":89168,"category":7338,"createdAt":290,"date":89219,"description":89220,"extension":8,"featured":294,"image":290,"isDraft":294,"link":290,"meta":89221,"navigation":7,"order":296,"path":89222,"readingTime":11180,"relatedResources":290,"seo":89223,"stem":89224,"tags":89225,"__hash__":89226},"blogs\u002Fblog\u002Fapache-pulsar-2020-user-survey-report.md",[806],{"type":15,"value":89169,"toc":89217},[89170,89173,89178,89181,89184,89191,89195,89202,89207,89210],[48,89171,89172],{},"For the first time ever, the Apache Pulsar PMC team is publishing a user survey report. The 2020 Apache Pulsar User Survey Report reveals Pulsar’s accelerating rate of global adoption, details how organizations are leveraging Pulsar to build real-time streaming applications, and highlights key features on Pulsar’s product roadmap.",[48,89174,89175],{},[34077,89176],{"value":89177},"cta_blog",[48,89179,89180],{},"Pulsar adoption has largely been driven by the market’s increased demand for real-time, data-enabled technologies. While companies have tried to leverage monolithic messaging systems to build-out real-time offerings, they’ve hit major roadblocks. 
Ultimately, these technologies are not equipped to provide the scale or reliability that mission-critical applications require.",[48,89182,89183],{},"As a result, companies have sought-out Apache Pulsar for its cloud-native, distributed messaging and streaming platform capabilities. From asynchronous applications to core business applications to ETL, companies are increasingly leveraging Pulsar to develop real-time applications.",[48,89185,89186,89187,89190],{},"Pulsar has received global adoption from major technology companies such as Verizon Media, Narvar, Overstock, Nutanix, Yahoo! JAPAN, Tencent, OVHCloud, and Clever Cloud, who rely on its ability to deliver on performance, scalability, and resiliency. As the Pulsar project and community garner increasing attention, we’re excited to share the 2020 Apache Pulsar User Survey Report.\n",[384,89188],{"alt":18,"src":89189},"\u002Fimgs\u002Fblogs\u002F63a31a957deb3d3a1402eb8c_pulsar-adoption.png","\nIn the 2020 Apache Pulsar User Survey Report, we hear from 165 users and learn how their companies are leveraging Pulsar’s cloud-native, multi-layer design architecture, built-in multi-tenancy, and multi-cluster replication, to build scalable real-time offerings. This report details insights and use cases on how organizations are deploying Pulsar today.",[48,89192,89193],{},[34077,89194],{"value":89177},[48,89196,89197,89198,4031],{},"The report also reveals Pulsar’s top-used features, its most popular applications, and how it is delivering scalable, reliable, real-time streaming solutions for organizations. In this quotation from Qiang Fei, Tech Lead for Tencent, we see how ",[55,89199,89201],{"href":89200},"\u002Fwhitepaper\u002Fcase-studay-apache-pulsar-tencent-billing\u002F","one organization is leveraging Pulsar to improve their offering",[916,89203,89204],{},[48,89205,89206],{},"Pulsar provides us with a highly consistent and highly reliable distributed message queue that fits well in our financial use cases. Multi-tenant and storage separation architecture design greatly reduces our operational and maintenance overhead. We have used Pulsar on a very large scale in our organization and we are impressed that Pulsar is able to provide high consistency while supporting high concurrent client connections.",[48,89208,89209],{},"Qiang Fei, Tech Lead at Tencent",[48,89211,89212,89213,190],{},"From its built-in multi-tenancy, which reduces architectural complexity and enables organizations to scale, to its multi-datacenter replication, which allows Pulsar to handle datacenter failures, we see how Pulsar has evolved into a robust and differentiated messaging and streaming platform. The report also reveals some of the community-driven features on Pulsar’s product roadmap for 2020 and beyond. 
To find out more, ",[55,89214,89216],{"href":89215},"\u002Fwhitepaper\u002Fsn-apache-pulsar-user-survey-report-2020\u002F","download the report today",{"title":18,"searchDepth":19,"depth":19,"links":89218},[],"2020-03-17","The 2020 Apache Pulsar User Survey Report reveals Pulsar’s accelerating rate of global adoption, details how organizations are leveraging Pulsar to build real-time streaming applications, and highlights key features on Pulsar’s product roadmap.",{},"\u002Fblog\u002Fapache-pulsar-2020-user-survey-report",{"title":77005,"description":89220},"blog\u002Fapache-pulsar-2020-user-survey-report",[821,303],"mWBN4um43dgxicoWSE_ZeYUbuzPQlhxzc7IjWR9XL5g",{"id":89228,"title":89229,"authors":89230,"body":89232,"category":821,"createdAt":290,"date":89687,"description":89688,"extension":8,"featured":294,"image":89689,"isDraft":294,"link":290,"meta":89690,"navigation":7,"order":296,"path":89691,"readingTime":89692,"relatedResources":290,"seo":89693,"stem":89694,"tags":89695,"__hash__":89696},"blogs\u002Fblog\u002Fapache-pulsar-helps-tencent-process-tens-of-billions-of-financial-transactions.md","Apache Pulsar® Helps Tencent Process Tens of Billions of Financial Transactions Efficiently with Virtually No Data Loss",[89231],"Dezhi Liu",{"type":15,"value":89233,"toc":89663},[89234,89237,89240,89243,89247,89256,89265,89271,89275,89278,89281,89284,89287,89291,89294,89300,89304,89307,89310,89324,89328,89331,89334,89337,89340,89343,89346,89349,89355,89359,89362,89368,89372,89375,89378,89381,89385,89388,89391,89394,89398,89401,89404,89408,89411,89417,89420,89437,89439,89442,89453,89456,89460,89463,89466,89472,89475,89478,89482,89485,89488,89491,89494,89499,89503,89506,89517,89520,89526,89530,89533,89539,89542,89553,89557,89560,89566,89569,89583,89586,89597,89601,89604,89624,89630,89633,89644,89647,89649,89652],[40,89235,89236],{"id":45530},"Executive summary",[48,89238,89239],{},"As the largest provider of Internet products and services in China, Tencent serves billions of users and over a million merchants—and these numbers are growing fast! Tencent's enterprises generate a huge volume of financial transactions, placing a tremendous load on their billing service, which processes hundreds of millions of dollars in revenue each day.",[48,89241,89242],{},"Because Tencent had been unable to scale its current billing service to handle their rapidly growing business, the possibility of data loss had become an escalating concern. To ensure data consistency, the company decided to redesign their system's transaction processing pipeline. After evaluating the pros and cons of several messaging systems, Tencent chose to implement Apache Pulsar. As a result, Tencent can now run their billing service on a very large scale with virtually no data loss.",[40,89244,89246],{"id":89245},"customer-background","Customer background",[48,89248,89249,89250,89255],{},"Tencent Holdings Limited is a multinational conglomerate holding company based in",[55,89251,89254],{"href":89252,"rel":89253},"https:\u002F\u002Fwww.google.com\u002Fsearch?rlz=1C1CHBF_enUS813US813&sxsrf=ACYBGNQ4k93MEkBznOhAuMNdkKN2gNfgEQ:1577016259811&q=Shenzhen&stick=H4sIAAAAAAAAAOPgE-LSz9U3MDEwLivJU-IAsXOScsu0tLKTrfTzi9IT8zKrEksy8_NQOFYZqYkphaWJRSWpRcWLWDmCM1LzqoAYALKaliFPAAAA&sa=X&ved=2ahUKEwjG_eGvm8nmAhWWvp4KHcYnB-oQmxMoATAeegQIExAL",[264]," Shenzhen",", China. It has hundreds of subsidiaries located in China and elsewhere around the globe. 
Tencent is considered to be one of the most innovative technology companies in the world specializing in internet-related products and services such as entertainment (gaming), financial services (e-commerce, payment systems), business services, a social networking platform (WeChat), and more.",[48,89257,89258,89259,89264],{},"Tencent uses an Internet billing platform internally known as ",[55,89260,89263],{"href":89261,"rel":89262},"https:\u002F\u002Fcloud.tencent.com\u002Fproduct\u002Fmidas",[264],"Midas"," to handle the enormous volume of transactions that flow through all of its businesses. Midas integrates both domestic and international payment channels and provides various services such as account management, precision marketing, security risk control, auditing and accounting, billing analysis, and more. On a typical day, Midas processes hundreds of millions of dollars in revenue which amount to hundreds of billions of dollars per year. Midas handles more than 30 billion escrow accounts and provides comprehensive billing services for more than 180 countries (regions), 10,000+ companies, and over 1 million merchants doing business in a variety of industries (see Figure 1).",[48,89266,89267],{},[384,89268],{"alt":89269,"src":89270}," illustration of Midas environment","\u002Fimgs\u002Fblogs\u002F63a2cc38a1ed687ae658a676_image10.png",[40,89272,89274],{"id":89273},"figure-1-industries-and-business-platforms-supported-by-midas","Figure 1 Industries and business platforms supported by Midas",[32,89276,33090],{"id":89277},"challenge",[48,89279,89280],{},"Tencent's enterprises continuously generate massive transaction volumes and their numbers are steadily growing. To handle this increased activity, the company needed a robust billing platform that could be scaled as their business grows.",[48,89282,89283],{},"Because Midas supports mission-critical services like billing and payments, the most essential challenges were to ensure data consistency and prevent data loss in transactions.",[48,89285,89286],{},"In addition, it was also very important to develop a solution that could handle high throughput with minimal delays in processing.",[32,89288,89290],{"id":89289},"a-closer-look-at-midas","A closer look at Midas",[48,89292,89293],{},"Figure 2 provides a high-level overview of Midas. This diagram illustrates the technical design of the entire platform and shows how the underlying layers work together to support the merchant side, the user side, and the various payment channels.",[48,89295,89296],{},[384,89297],{"alt":89298,"src":89299},"Technical overview of the Midas platform ","\u002Fimgs\u002Fblogs\u002F63a2cc37b8ed98554a05b6a3_image1.png",[40,89301,89303],{"id":89302},"requirement","Requirement",[48,89305,89306],{},"To meet its need for a more elastic and scalable billing platform, Tencent decided to redesign Midas's transaction processing pipeline. The company believed the problem could be solved by implementing a new messaging system, but which one?",[48,89308,89309],{},"Before evaluating the various available options, Tencent defined a set of requirements. To be a viable solution, the new messaging system would need to score high in all of the following areas:",[321,89311,89312,89315,89318,89321],{},[324,89313,89314],{},"Consistency: A billing service cannot tolerate data loss. This is a basic requirement—and the most essential one.",[324,89316,89317],{},"Availability: It must have failover capability. 
And, it must be able to recover from a failure automatically.",[324,89319,89320],{},"Massive storage: Mobile applications generate copious amounts of transaction data, so massive storage capacity is also a must.",[324,89322,89323],{},"Low latency: A payment service that handles billions of transactions per day must be able to process them with minimal delay (typically, less than 10 milliseconds per transaction).",[40,89325,89327],{"id":89326},"evaluation-phase","Evaluation phase",[48,89329,89330],{},"With the above requirements in mind, Tencent evaluated several Apache open-source, streaming platforms for Midas—specifically, Kafka®, RocketMQ™, and Pulsar. Here's what they found.",[48,89332,89333],{},"Apache Kafka aims to provide a unified, high-throughput, low-latency platform for handling real-time data feeds. It is a popular choice for log collection and processing. However, Kafka can be unreliable when it comes to data consistency and durability (data loss). Therefore, Tencent deemed it unsuitable for mission-critical financial applications like Midas.",[48,89335,89336],{},"Apache RocketMQ is a distributed messaging and streaming platform with low latency, high performance and reliability, trillion-level capacity, and flexible scalability. Unfortunately, its application program interface (API) is limited in that there is no user-friendly way to delete invalid messages by topic. Moreover, RocketMQ's open-source version does not provide the needed failover capability, making it a poor choice for Midas.",[48,89338,89339],{},"Apache Pulsar is an enterprise-grade publish\u002Fsubscribe (aka pub\u002Fsub) messaging system. Pulsar provides highly available storage through its Apache Bookkeeper service. Because Pulsar uses a decoupled architecture, its storage and processing layers can be scaled independently.",[48,89341,89342],{},"Message streaming and queuing are necessary for an event-driven system and Pulsar supports both of these consumption modes. Streaming is strictly ordered (that is, exclusive to one consumer) whereas queueing is unordered (shared by many).",[48,89344,89345],{},"Another key Pulsar feature, geo replication, helps improve application response time by adjusting the distribution of data across geographically distributed data networks.",[48,89347,89348],{},"Tencent ultimately chose Apache Pulsar as a service for its native high consistency, durability, low latency, scalability, and general flexibility.",[48,89350,89351],{},[384,89352],{"alt":89353,"src":89354},"Table summarizes Tencent's comparison of Kafka, RocketMQ, and Pulsar","\u002Fimgs\u002Fblogs\u002F63a3181323d4115d2167d342_Tencent's-comparison-of-Kafka,-RocketMQ,-and-Pulsar..webp",[40,89356,89358],{"id":89357},"solution","Solution",[48,89360,89361],{},"Tencent solved their scalability problem by integrating Pulsar into a distributed transaction framework called TDXA. TDXA leverages a message queue in both online transaction processing (OLTP) and real-time data processing to ensure consistency and prevent data loss. The message queue also handles, in a highly reliable way, any failures that might occur during transaction processing. 
Thus, the new solution is able to manage very high throughput with minimal delays.",[48,89363,89364],{},[384,89365],{"alt":89366,"src":89367},"examples of some of Tencent’s most common online transaction processing and real-time data processing activities ","\u002Fimgs\u002Fblogs\u002F63a2cc375623a8863ed010d9_image3.png",[32,89369,89371],{"id":89370},"online-transaction-processing","Online transaction processing",[48,89373,89374],{},"In online transaction processing, the workflow associated with any given payment often involves multiple internal and external systems. This can lead to longer RPC chains (that is, communications) and more numerous failures—in particular, network timeouts (for example, when interacting with overseas payment services).",[48,89376,89377],{},"By integrating with a local transaction state, TDXA is able to recover automatically in the event of a failure. It then systematically resumes processing, thus ensuring the consistency of billions of transactions daily.",[48,89379,89380],{},"An automated teller machine (ATM) for a bank is an example of a commercial OLTP application. OLTP applications have high throughput and are insert- or update- intensive in database management. These applications are used concurrently by hundreds of users. The key goals of OLTP applications are availability, speed, concurrency, and recoverability. OLTP applications help simplify business in various ways—for example, by reducing paper trails and providing faster, more accurate forecasts for revenues and expenses.",[32,89382,89384],{"id":89383},"real-time-data-processing","Real-time data processing",[48,89386,89387],{},"To overcome the challenge of validating data consistency in Midas, Tencent implemented a reconciliation system to authenticate data. This enabled the company to shorten reconciliation time and detect problems much sooner",[48,89389,89390],{},"For mobile payments, real-time user experience is critical. For example, if a player purchases a hero in a mobile game like \"King of Glory\" and the hero is not delivered in a timely manner, it will inevitably affect the user's experience negatively and result in complaints.",[48,89392,89393],{},"With TDXA, Tencent can reconcile billing transactions in real time using a stream computing framework to process the transactions produced in the message queue.",[3933,89395,89397],{"id":89396},"other-significant-benefits-in-real-time","Other significant benefits in real-time",[48,89399,89400],{},"During peak times (for example, a King of Glory anniversary celebration event), the transaction traffic in Midas can surge to more than ten times the average rate. The Pulsar message queue can buffer waves of high traffic to reduce the demand on the core transaction system for requests such as transaction inquiries, delivery notifications, and tips notifications.",[48,89402,89403],{},"Also, with the ability to process messages in a message queue in real-time, Tencent can offer real-time data analysis and provide precise marketing services to its customers and subsidiaries. Examples of typical services include transaction and balance reconciliation, fraud detection, and real-time risk control.",[32,89405,89407],{"id":89406},"a-deeper-dive-into-tdxa","A deeper dive into TDXA",[48,89409,89410],{},"TDXA is a distributed transaction framework designed to solve the data consistency and durability problems associated with processing huge transaction volumes in the application layer. 
Figure 4 provides a technical diagram of Midas.",[48,89412,89413],{},[384,89414],{"alt":89415,"src":89416}," Figure of technical diagram of Midas","\u002Fimgs\u002Fblogs\u002F63a2cc37c98edb2fe5b202e6_image2.png",[48,89418,89419],{},"The TDF network manages the flow of traffic through the billing transaction system. These are the main components of the TDF network:",[321,89421,89422,89425,89428,89431,89434],{},[324,89423,89424],{},"Distributed transaction manager(TM): The distributed transaction manager serves as the control center for TDXA. It uses a decentralized approach that will allow Tencent to scale the system as their business grows, offers necessary services, and ensures that systems are running and available 99.999% of the time. TM supports both the REST API-based Try-Confirm\u002FCancel (TCC) approach and hybrid DB transactions.",[324,89426,89427],{},"With TDF (which is an asynchronous coroutine framework and asynchronous transaction processing in TDSQL), TM is able to support the entire company's billing business in a highly efficient manner.",[324,89429,89430],{},"Configuration manager (CM): TDXA's configuration manager provides a flexible mechanism for registering, managing, and updating transaction processing flow at runtime. CM automatically checks the accuracy and completeness of the transaction flow. It also displays the transaction flow in a GUI console where users have the ability to manage it.",[324,89432,89433],{},"Distributed transactional database (TDSQL): A distributed transactional database which features high consistency, high availability, global deployment architecture, distributed horizontal scalability, high performance, enterprise-grade security support, and more. TDSQL provides a comprehensive distributed database solution.",[324,89435,89436],{},"Message queue (MQ): A highly consistent and available message queue that enables TDXA to handle various failure scenarios during transaction processing. A robust message queue plays a vital role in processing transactions for Midas.",[40,89438,68842],{"id":68841},[48,89440,89441],{},"In the process of adopting Pulsar, Tencent needed to make certain changes to Pulsar in order to meet their own unique requirements. In general, these changes provided support for the following:",[321,89443,89444,89447,89450],{},[324,89445,89446],{},"Delayed messaging and delayed retries (supported in v2.4.0)",[324,89448,89449],{},"An improved management console",[324,89451,89452],{},"An improved monitoring and alert system",[48,89454,89455],{},"Each of these system enhancements is described in greater detail below.",[40,89457,89459],{"id":89458},"delayed-message-delivery","Delayed message delivery",[48,89461,89462],{},"Delayed message delivery is a common requirement in a billing service. This feature is used for handling timeouts in transaction processing. In the event of a service failure or timeout, it makes little sense to retry a transaction many times within a short period of time because it is likely to fail again. Instead, it is better to retry by leveraging Pulsar's delayed message delivery feature.",[48,89464,89465],{},"Delayed message delivery can be implemented in two different ways. One is by organizing messages by different topics based on the time delay interval (see Figure 5). 
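As an aside, the delayed delivery capability referenced above (the feature that eventually shipped in Pulsar 2.4.0) is exposed to producers roughly as in the sketch below. The service URL and topic name are illustrative assumptions, and broker-side delayed dispatch applies to shared subscriptions.

```java
import java.util.concurrent.TimeUnit;
import org.apache.pulsar.client.api.Producer;
import org.apache.pulsar.client.api.PulsarClient;
import org.apache.pulsar.client.api.Schema;

public class DelayedRetryExample {
    public static void main(String[] args) throws Exception {
        try (PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650")    // illustrative address
                .build();
             Producer<String> producer = client.newProducer(Schema.STRING)
                     .topic("billing-retries")             // illustrative topic
                     .create()) {
            // Ask the broker to hold the message back for 10 minutes before
            // dispatching it to consumers on a shared subscription.
            producer.newMessage()
                    .value("retry payment tx-42")
                    .deliverAfter(10, TimeUnit.MINUTES)
                    .send();
        }
    }
}
```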
Pulsar's internal broker checks those delay topics periodically and delivers the delayed messages accordingly.",[48,89467,89468],{},[384,89469],{"alt":89470,"src":89471},"Figure of a messages organized by different topics based on the time delay interval ","\u002Fimgs\u002Fblogs\u002F63a2cc375623a84f39d010e7_image4.png",[48,89473,89474],{},"The above approach satisfies most requirements, except when you want to specify an arbitrary time delay. An arbitrary time delay can be implemented using a time wheel, which can support a finer level of granularity. However, for this approach, the system needs to maintain an index for the time wheel, thus rendering this method unsuitable when there is a large volume of delayed messages.",[48,89476,89477],{},"While keeping Pulsar's internal storage unchanged, Tencent implemented both of the above approaches to support bargaining activities in the King of Glory game.",[32,89479,89481],{"id":89480},"secondary-tag","Secondary tag",[48,89483,89484],{},"To ensure security across the tens of thousands of businesses it supports, Midas must synchronize transaction flow for each business.",[48,89486,89487],{},"Suppose you were to create a unique topic for each business. You would need to create tens of thousands of topics. This would greatly increase the burden of topic management. For example, if a consumer needed to consume messages from all the businesses involved in a given transaction flow, Midas would have to maintain tens of thousands of subscriptions.",[48,89489,89490],{},"To solve this problem, Tencent introduced an attribute called \"Tag\" to the metadata associated with a Pulsar message. Users can set multiple tags while producing a Pulsar message queue. When messages are consumed, the broker filters out the desired tags.",[48,89492,89493],{},"The example below illustrates how the tags \"King of Glory,\" \"Wechat Payment,\" and \"Successful Payment\" could be used in a payment message. Here, the tags indicate where the transaction originated from (King of Glory game vs. Wechat Payment) and what the status of the transaction is (success vs. failure).",[48,89495,89496],{},[384,89497],{"alt":758,"src":89498},"\u002Fimgs\u002Fblogs\u002F63a2cc38c88838ec7d61fb3c_image5.png",[32,89500,89502],{"id":89501},"management-console","Management console",[48,89504,89505],{},"You need to have a robust management console if you plan to use message queues on a large scale. Tencent needed the Midas management console to be able to handle the following requests from its users.",[321,89507,89508,89511,89514],{},[324,89509,89510],{},"What is the content of this message?",[324,89512,89513],{},"Who produced this message?",[324,89515,89516],{},"Will this message be consumed? If so, by whom?",[48,89518,89519],{},"To service these types of requests, Tencent added life-cycle-related information to Pulsar's message metadata. Doing so enabled Midas to track messages throughout their entire life cycle (from production to consumption). The numbered arrows in Figure 7 show the various stages in the life cycle of a message.",[48,89521,89522],{},[384,89523],{"alt":89524,"src":89525}," illustration of The life cycle of a message","\u002Fimgs\u002Fblogs\u002F63a2cc38c7e2b101da1cf7f9_image6.png",[32,89527,89529],{"id":89528},"monitor-and-alert","Monitor and alert",[48,89531,89532],{},"Figure 8 shows how Tencent uses Pulsar to monitor and alert on various metrics. Monitoring is accomplished using a series of user-defined alert rules. 
The metrics are collected and stored in Midas's Eagle-Eye monitoring platform.",[48,89534,89535],{},[384,89536],{"alt":89537,"src":89538}," illiustration of how Midas uses Pulsar to alert on various metrics","\u002Fimgs\u002Fblogs\u002F63a2cc382c0c6bb8d61a4d9a_image7.png",[48,89540,89541],{},"Tencent monitors and alerts on the following key metrics:",[321,89543,89544,89547,89550],{},[324,89545,89546],{},"Backlog: If a massive amount of information accumulates for online services, it means that consumption has become a bottleneck. When this happens, the system provides a timely alert so the appropriate personnel can deal with the problem.",[324,89548,89549],{},"Delay: The system should be able to search a purchase record within one second. By matching the production flow and consumption flow collected by the monitoring component, Tencent can calculate the end-to-end latency of each message.",[324,89551,89552],{},"Failure: The Midas Eagle-Eye platform maintains statistics of errors in the pipeline, monitoring and alerting from various dimensions such as business, IP, and others.",[32,89554,89556],{"id":89555},"tencents-new-midas-implementation-with-pulsar","Tencent's new Midas implementation with Pulsar",[48,89558,89559],{},"After making the enhancements described above, Tencent deployed Apache Pulsar with the architecture shown in Figure 9.",[48,89561,89562],{},[384,89563],{"alt":89564,"src":89565},"Illustration of the new architecture of midas with Pulsar ","\u002Fimgs\u002Fblogs\u002F63a2cc38d1dcd680a97ae68f_image8.png",[48,89567,89568],{},"Pulsar has greatly enhanced Midas by providing the following components and capabilities:",[321,89570,89571,89574,89577,89580],{},[324,89572,89573],{},"Broker, the message queue proxy layer, is responsible for message production and consumption requests. Broker supports horizontal scalability and rebalances partitions automatically by topic based on the throughput.",[324,89575,89576],{},"BookKeeper serves as the distributed storage for message queues. You can configure multiple replicas of messages in BookKeeper. BookKeeper is enabled with automatic failover capability under exceptional circumstances—for example, when a storage node has broken disks.",[324,89578,89579],{},"ZooKeeper serves as the metadata and cluster configuration for message queues.",[324,89581,89582],{},"Some Midas businesses are written in JavaScript while others are in PHP. The HTTP proxy provides a unified access endpoint and failure retry capability for clients that use other languages. When the production cluster fails, the proxy will downgrade processing and route messages to other clusters for disaster recovery.",[48,89584,89585],{},"In addition, Pulsar lets Tencent designate how each subscription is to consume messages. A subscription is a consumer group associated with a topic. Three types of subscriptions can be used in streaming:",[321,89587,89588,89591,89594],{},[324,89589,89590],{},"A shared subscription allows you to scale consumption beyond the number of partitions.",[324,89592,89593],{},"A failover subscription works well for stream processing in transaction cleanup workflow.",[324,89595,89596],{},"An exclusive subscription is used when only one consumer in a subscription is allowed to consume a topic partition at any given time.",[40,89598,89600],{"id":89599},"result","Result",[48,89602,89603],{},"After successfully adopting Pulsar, Tencent can now run their billing and transaction framework on a very large scale. 
With Pulsar's help, Midas now efficiently supports the following:",[321,89605,89606,89609,89612,89615,89618,89621],{},[324,89607,89608],{},"More than 80 payment channels with various characteristics",[324,89610,89611],{},"More than 300 different business processing units",[324,89613,89614],{},"Up to 8 clusters",[324,89616,89617],{},"More than 600 topics",[324,89619,89620],{},"Throughput rates of 50w+ queries per second",[324,89622,89623],{},"Data consumption rates averaging 10T+ per day",[48,89625,89626],{},[384,89627],{"alt":89628,"src":89629},"illustration of the combined power of Midas and Pulsar ","\u002Fimgs\u002Fblogs\u002F63a2cc38f41b5147a2370638_image9.png",[48,89631,89632],{},"As a result of implementing Pulsar, Tencent can now:",[321,89634,89635,89638,89641],{},[324,89636,89637],{},"Handle tens of billions of transactions during peak time.",[324,89639,89640],{},"Guarantee data consistency in processing transactions.",[324,89642,89643],{},"Provide 99.999% availability for the services it supports.",[48,89645,89646],{},"In summary, Pulsar's high consistency, availability, stability, and flexible framework have solved Tencent's biggest transaction processing challenges. By redesigning their transaction processing pipeline, Tencent can now scale Midas to handle the increased billing volume demands associated with their growing business.",[40,89648,39828],{"id":39827},[48,89650,89651],{},"Apache Pulsar is a young open-source project with attractive features. The Apache Pulsar community is growing rapidly with new adoptions in a variety of industries. We look forward to further collaborations with the Apache Pulsar community. We like to share advances with the greater community, and work with other users on making continuous improvements to Pulsar.",[321,89653,89654,89657,89660],{},[324,89655,89656],{},"Tencent is a trademark of Tencent Holdings Limited.",[324,89658,89659],{},"Apache and Kafka are registered trademarks of The Apache Software Foundation.",[324,89661,89662],{},"Pulsar and RocketMQ are trademarks of The Apache Software Foundation.",{"title":18,"searchDepth":19,"depth":19,"links":89664},[89665,89666,89667,89671,89672,89673,89678,89679,89685,89686],{"id":45530,"depth":19,"text":89236},{"id":89245,"depth":19,"text":89246},{"id":89273,"depth":19,"text":89274,"children":89668},[89669,89670],{"id":89277,"depth":279,"text":33090},{"id":89289,"depth":279,"text":89290},{"id":89302,"depth":19,"text":89303},{"id":89326,"depth":19,"text":89327},{"id":89357,"depth":19,"text":89358,"children":89674},[89675,89676,89677],{"id":89370,"depth":279,"text":89371},{"id":89383,"depth":279,"text":89384},{"id":89406,"depth":279,"text":89407},{"id":68841,"depth":19,"text":68842},{"id":89458,"depth":19,"text":89459,"children":89680},[89681,89682,89683,89684],{"id":89480,"depth":279,"text":89481},{"id":89501,"depth":279,"text":89502},{"id":89528,"depth":279,"text":89529},{"id":89555,"depth":279,"text":89556},{"id":89599,"depth":19,"text":89600},{"id":39827,"depth":19,"text":39828},"2020-02-18","An inside look at why and how Tencent uses Apache Pulsar messaging to power its billing platform for processing tens of billions of transactions every day.","\u002Fimgs\u002Fblogs\u002F63d79974d2a5679026c47152_63a317e28f20527970e09257_tencent-top-background-1.webp",{},"\u002Fblog\u002Fapache-pulsar-helps-tencent-process-tens-of-billions-of-financial-transactions","14 
min",{"title":89229,"description":89688},"blog\u002Fapache-pulsar-helps-tencent-process-tens-of-billions-of-financial-transactions",[35559,821,9144],"ucXWExXA7vuZKDzSYsCKR_RjgrSswFyupDEMcgt3tF4",{"id":89698,"title":89699,"authors":89700,"body":89701,"category":821,"createdAt":290,"date":89869,"description":89708,"extension":8,"featured":294,"image":89870,"isDraft":294,"link":290,"meta":89871,"navigation":7,"order":296,"path":89872,"readingTime":11508,"relatedResources":290,"seo":89873,"stem":89874,"tags":89875,"__hash__":89876},"blogs\u002Fblog\u002Fhow-to-build-distributed-database-apache-bookkeeper-part-1.md","How to Build a Distributed Database with Apache BookKeeper - Part 1",[87358],{"type":15,"value":89702,"toc":89863},[89703,89706,89709,89716,89720,89723,89743,89746,89749,89763,89767,89770,89773,89776,89782,89785,89788,89791,89794,89797,89800,89802,89805,89808,89814,89817,89820,89823,89827,89830,89833,89836,89842,89845,89848,89851,89854,89857,89860],[48,89704,89705],{},"In this series of posts, I want to share some basic architectural concepts about possible anatomy of a distributed database with a shared-nothing architecture.",[48,89707,89708],{},"You can see how to leverage Apache BookKeeper features to resolve most of the challenges that come into play in a distributed system.",[48,89710,89711,89712,89715],{},"In the end, you can learn how this architecture has been adopted in ",[55,89713,87375],{"href":87373,"rel":89714},[264],", a distributed embeddable SQL database written in Java.",[40,89717,89719],{"id":89718},"system-overview","System overview",[48,89721,89722],{},"Let's start from a high-level view of what we want to build and what properties are we requiring.",[321,89724,89725,89728,89731,89734,89737,89740],{},[324,89726,89727],{},"We want a database, a storage that holds data durably and it is accessible from remote clients",[324,89729,89730],{},"We are storing data on a cluster of machines",[324,89732,89733],{},"Our machines do not share disks or use shared mounts, only network connections (LAN or WAN)",[324,89735,89736],{},"Machines are expected to fail, disks can be lost at any time, but we want the service to be available to clients until a part of them is up and running",[324,89738,89739],{},"We want to be able to add and remove machines without service interruption",[324,89741,89742],{},"We want to have complete control over consistency",[48,89744,89745],{},"This list sounds pretty generic and there are several ways of designing systems with such capabilities.",[48,89747,89748],{},"In order to make it more concrete, let's create a concrete scenario:",[321,89750,89751,89754,89757,89760],{},[324,89752,89753],{},"We have a SQL database with one table",[324,89755,89756],{},"Data is replicated over N machines",[324,89758,89759],{},"No shared disks, only network connections among the servers and between servers and clients",[324,89761,89762],{},"We adopt the architectural pattern of a replicated state machine",[40,89764,89766],{"id":89765},"write-ahead-logging","Write-ahead logging",[48,89768,89769],{},"In order to support the ACID (Atomicity, Consistency, Isolation, Durability) semantics, databases use write-ahead logging.",[48,89771,89772],{},"Let's assume that a database stores a copy of a table in local volatile memory (RAM). 
When a client requests a write (like an UPDATE operation), the database 'logs' the action to persistent storage and writes a new value for the record to the log (WAL).",[48,89774,89775],{},"When the storage acknowledges the write (fsync), the change is applied to the in-memory copy of the table.Then the storage acknowledges the result to the client. As soon as we update the in-memory copy, other clients are able to read the new value.",[48,89777,89778],{},[384,89779],{"alt":89780,"src":89781},"wall stream and table contents write ahead logging","\u002Fimgs\u002Fblogs\u002F63a2e28e1930fe7e8c975bfc_buildab-0.png",[48,89783,89784],{},"Write-ahead log stream and table contents",[48,89786,89787],{},"In the example above, the table starts empty, then we have a first write operation, INSERT (record1), that happens at LSN (log sequence number) 1. Our table now contains record1. Then we log at LSN2 another modification, INSERT (record2), now the table contains record1 and record2, then we log a DELETE (record1) at LSN3, and the table holds only record2.",[48,89789,89790],{},"When the server restarts, it performs a recovery operation, reading all of the logs and reconstructing the contents of the table, so we end up in having only record2 in the table.",[48,89792,89793],{},"If a value is on the log, we are sure that it won't be lost and any client that reads such value before any restart event are able to read again the same value.",[48,89795,89796],{},"This could not happen if we had applied the change in memory before writing to the log.",[48,89798,89799],{},"Please note that only operations that alter the contents of the table are written to the write-ahead log: we aren't logging reads.",[40,89801,87716],{"id":87715},[48,89803,89804],{},"You can always reconstruct the contents of the table from the log, but you cannot store an infinite log, so it comes the time for the log to be truncated. In order to release space, this operation is usually called a checkpoint.",[48,89806,89807],{},"When your database performs a checkpoint, it flushes on durable storage the contents of the table at a given LSN.",[48,89809,89810],{},[384,89811],{"alt":89812,"src":89813},"wall stream and table contents checkpoints","\u002Fimgs\u002Fblogs\u002F63a2e28ef7b78a3741a43085_buildab-2.png",[48,89815,89816],{},"A checkpoint happens at LSN3",[48,89818,89819],{},"Now that we have persisted durably the table at LSN3, we can save resources and drop the part of the log from LSN1 to LSN3. Therefore when the server performs recovery, it has only to replay LSN4, and this in turn allows a faster start up sequence.",[48,89821,89822],{},"Where are you storing the contents of the table during the checkpoint ? You can have several strategies, for instance, you can store the contents on some local disk (remember to fsync). 
But if the contents of the table are really small in respect to the number of writes to the WAL (so you have many changes on the same little set of records), you can think about dumping the contents of the table to the WAL itself.",[40,89824,89826],{"id":89825},"replicated-state-machines","Replicated state machines",[48,89828,89829],{},"A replicated state machine is an entity, in this case the table, that is at a given state (the contents of the table) at given time (log sequence number) and the sequence of changes to the state is the same over a set of interconnected machines, so eventually each machine holds the same state.",[48,89831,89832],{},"Whenever you change one record on a machine, you must apply the same change on every other copy.",[48,89834,89835],{},"We need a total order of the changes to the state of the machine, and our write-ahead log is perfect for this purpose.",[48,89837,89838],{},[384,89839],{"alt":89840,"src":89841},"illustration of Replicated state machines","\u002Fimgs\u002Fblogs\u002F63a2e28e5623a86552e29ab3_buildab-3.png",[48,89843,89844],{},"Each node has a copy of the table and the WAL is shared",[48,89846,89847],{},"In our architecture, only one node is able to change the state of the system, that is to alter the contents of the table: let's call this one the leader.",[48,89849,89850],{},"Every node has a copy of the entire table in memory. When a write occurs, it is written to the WAL and then it is made visible to clients for reads.",[48,89852,89853],{},"Other non leader nodes, the followers, tail the log, it continuously reads the changes from the log, exactly in the same order as they are issued by the leader.",[48,89855,89856],{},"Followers apply every change to their own local copy, this way they will see the same history for the table.",[48,89858,89859],{},"It is also important that every change is applied by the follower only after the same change has been acknowledged by the WAL, otherwise, followers would be in the future in respect to the leader.",[48,89861,89862],{},"Apache BookKeeper is the write-ahead log we need: it is durable and distributed. It doesn't need shared disks or remote storages, guarantee a total order of the items, and support fencing. 
In the next posts, I will show you how Apache Bookkeeper guarantees fulfills our needs.",{"title":18,"searchDepth":19,"depth":19,"links":89864},[89865,89866,89867,89868],{"id":89718,"depth":19,"text":89719},{"id":89765,"depth":19,"text":89766},{"id":87715,"depth":19,"text":87716},{"id":89825,"depth":19,"text":89826},"2020-02-04","\u002Fimgs\u002Fblogs\u002F63d7998e4d8472734cc95d9c_63a2e28e570e54635c7f34e5_bulidab-1.webp",{},"\u002Fblog\u002Fhow-to-build-distributed-database-apache-bookkeeper-part-1",{"title":89699,"description":89708},"blog\u002Fhow-to-build-distributed-database-apache-bookkeeper-part-1",[38442,12106],"iA9BJQax8AXyIl4r12mcDfZM95PwRzBmPM7XMrXDnls",{"id":89878,"title":89879,"authors":89880,"body":89881,"category":7338,"createdAt":290,"date":89975,"description":89976,"extension":8,"featured":294,"image":89977,"isDraft":294,"link":290,"meta":89978,"navigation":7,"order":296,"path":89979,"readingTime":20144,"relatedResources":290,"seo":89980,"stem":89981,"tags":89982,"__hash__":89983},"blogs\u002Fblog\u002Fpulsar-summit-san-francisco-2020-cfp-is-now-open.md","Pulsar Summit San Francisco 2020 CFP is now open",[78659],{"type":15,"value":89882,"toc":89968},[89883,89886,89889,89891,89894,89902,89904,89918,89920,89922,89940,89944,89947,89949,89953,89958,89961],[48,89884,89885],{},"Pulsar Summit is an annual conference dedicated to Apache Pulsar community, bringing together an international audience of CTOs\u002FCIOs, developers, data architects, data scientists, Apache Pulsar committers\u002Fcontributors, and the messaging and streaming community, to share experiences, exchange ideas and knowledge about Pulsar and its growing community, and receive hands-on training sessions led by Pulsar experts.",[48,89887,89888],{},"We are excited to announce that the first Pulsar Summit will be held in San Francisco in April, 2020. Talk submissions, pre-registration, and sponsorship opportunities are now open for the conference!",[40,89890,83002],{"id":83001},[48,89892,89893],{},"Presentations and lightning talks are accepted for speaking proposals. Suggested topics cover Pulsar use cases, operations, technology deep dive, and ecosystem. Submissions are open until January 31, 2020.",[48,89895,89896,89897,190],{},"If you are unsure about your proposal, or want some feedback or advice in general, we are happy to help out! Further details are available on the ",[55,89898,89901],{"href":89899,"rel":89900},"https:\u002F\u002Fpulsar-summit.org\u002Fcall-for-presentations\u002F",[264],"Pulsar Summit Website",[40,89903,56358],{"id":56357},[321,89905,89906,89909,89912,89915],{},[324,89907,89908],{},"CFP opens: December 15, 2019",[324,89910,89911],{},"CFP closes: January 31, 2020 - 23:59 PST",[324,89913,89914],{},"CFP notification: February 21, 2020",[324,89916,89917],{},"Schedule announcement: February 24, 2020",[40,89919,83057],{"id":83056},[48,89921,83060],{},[321,89923,89924,89927,89930,89933,89935,89938],{},[324,89925,89926],{},"Full conference pass.",[324,89928,89929],{},"Exclusive swag only available to speakers.",[324,89931,89932],{},"Expand your network and raise your profile in the Pulsar community.",[324,89934,77414],{},[324,89936,89937],{},"Your name, title, company, and bio will be featured on the Pulsar Summit San Francisco 2020 website.",[324,89939,69684],{},[40,89941,89943],{"id":89942},"pre-registration","Pre-registration",[48,89945,89946],{},"If you are interested in attending Pulsar Summit San Francisco 2020, we’d like to hear from you. 
Your ideas are very important to us, and we will prepare the content accordingly.",[40,89948,56379],{"id":56378},[48,89950,83102,89951,38617],{},[55,89952,38404],{"href":77457},[48,89954,83107,89955,83111],{},[55,89956,39823],{"href":39821,"rel":89957},[264],[48,89959,89960],{},"Hope to see you at Pulsar Summit San Francisco 2020!",[48,89962,89963,89964,190],{},"This post was originally published by Jennifer Huang on ",[55,89965,84106],{"href":89966,"rel":89967},"http:\u002F\u002Fpulsar.apache.org\u002Fblog\u002F2019\u002F12\u002F18\u002FPulsar-summit-cfp\u002F",[264],{"title":18,"searchDepth":19,"depth":19,"links":89969},[89970,89971,89972,89973,89974],{"id":83001,"depth":19,"text":83002},{"id":56357,"depth":19,"text":56358},{"id":83056,"depth":19,"text":83057},{"id":89942,"depth":19,"text":89943},{"id":56378,"depth":19,"text":56379},"2019-12-26","Pulsar Summit San Francisco 2020 CFP is now open!","\u002Fimgs\u002Fblogs\u002F63d799a74d8472cb4cc95da3_63a2e1d05164011ebdc6ed17_pulsar-summit-sf-2020.webp",{},"\u002Fblog\u002Fpulsar-summit-san-francisco-2020-cfp-is-now-open",{"title":89879,"description":89976},"blog\u002Fpulsar-summit-san-francisco-2020-cfp-is-now-open",[5376,821],"uv8ZBaC56hjFQrJ2gesGyAjlF4gMCwBdWzA6DT7IRms",{"id":89985,"title":89986,"authors":89987,"body":89988,"category":7338,"createdAt":290,"date":90040,"description":90041,"extension":8,"featured":294,"image":90042,"isDraft":294,"link":290,"meta":90043,"navigation":7,"order":296,"path":90044,"readingTime":20144,"relatedResources":290,"seo":90045,"stem":90046,"tags":90047,"__hash__":90048},"blogs\u002Fblog\u002Fpulsar-milestone-celebration-200-contributors.md","Pulsar Milestone Celebration — 200 Contributors!",[61300],{"type":15,"value":89989,"toc":90038},[89990,89993,89996,89999,90002,90008,90011,90014,90020,90023,90036],[48,89991,89992],{},"Dear Apache Pulsar enthusiast,",[48,89994,89995],{},"As we know, when assessing the health of an open-source community, it is tempting to focus on various quantitative metrics, for example, activity, size (contributors), demographics, diversity, and so on, among which the number of contributors is a key metric for measuring the health and popularity of a project and a way to inform the trends.",[48,89997,89998],{},"And today, we are very proud to see that Apache Pulsar has attracted its 200th contributor! It is an important milestone for our community growth.",[48,90000,90001],{},"Over the years, there’s been an upward trend that more organizations embracing real-time data and stream processing, and Pulsar is the key component of that shift. As an open-source distributed pub-sub messaging system originally created at Yahoo! and graduated as a Top-Level Project (TLP) in September 2018, Pulsar has launched 79 releases, attracted 4100+ commits from 200 contributors, and received 4.6k+ stars, 1.2k+ forks, and 1.3k+ Slack users up to now.",[48,90003,90004],{},[384,90005],{"alt":90006,"src":90007},"apache pulsar interface with number of contributors","\u002Fimgs\u002Fblogs\u002F63a2e1172c0c6b91982b4aff_p-200-1.png",[48,90009,90010],{},"This achievement is worth celebrating, and at the same time, we would like to express sincere gratitude to you for making what Pulsar is today and shape what Pulsar will be tomorrow.",[48,90012,90013],{},"Pulsar aims to empower the next generation of event streaming systems by delivering a unified solution that connects, stores and processes real-time event streams. 
Going forward, we will be continuously dedicated to making Pulsar as a highly flexible, scalable and reliable product and creating a welcoming and sustainable community where Pulsar and you can thrive together.",[48,90015,90016],{},[384,90017],{"alt":90018,"src":90019},"illustration with rocket and peoples","\u002Fimgs\u002Fblogs\u002F63a2e1178970e680a319a01a_p-200-cooperation-1.png",[48,90021,90022],{},"P.S. want to be a Pulsar contributor?",[48,90024,90025,90026,4003,90031,90035],{},"Get started today by ",[55,90027,90030],{"href":90028,"rel":90029},"http:\u002F\u002Fpulsar.apache.org\u002Fen\u002Fcontributing\u002F",[264],"reading contribution guidelines",[55,90032,90034],{"href":36230,"rel":90033},[264],"submitting a PR",", any contribution on codes, docs or other is highly appreciated. Thank you.",[48,90037,3931],{},{"title":18,"searchDepth":19,"depth":19,"links":90039},[],"2019-12-20","We are very proud to see that Apache Pulsar has attracted its 200th contributor! We would like to express sincere gratitude to all contributors for making what Pulsar is today and shape what Pulsar will be tomorrow.","\u002Fimgs\u002Fblogs\u002F63d799c5cdd0792344bdf181_63a2e1173f201112232d8dc8_p-200-head.webp",{},"\u002Fblog\u002Fpulsar-milestone-celebration-200-contributors",{"title":89986,"description":90041},"blog\u002Fpulsar-milestone-celebration-200-contributors",[302,821,26747],"0nxkMGVZF_OARJKxFQjwabUbaoNbjq5DuunX0lM1m3A",{"id":90050,"title":90051,"authors":90052,"body":90053,"category":821,"createdAt":290,"date":90269,"description":90270,"extension":8,"featured":294,"image":290,"isDraft":294,"link":290,"meta":90271,"navigation":7,"order":296,"path":90272,"readingTime":11508,"relatedResources":290,"seo":90273,"stem":90274,"tags":90275,"__hash__":90276},"blogs\u002Fblog\u002Fwhats-new-in-apache-pulsar-2-4-2.md","What's New in Apache Pulsar 2.4.2",[53434],{"type":15,"value":90054,"toc":90253},[90055,90058,90065,90068,90072,90075,90078,90086,90090,90093,90097,90100,90104,90107,90115,90119,90122,90126,90129,90133,90136,90140,90143,90149,90153,90156,90167,90171,90174,90178,90184,90188,90191,90195,90198,90206,90209,90211,90217,90219,90240,90245,90251],[48,90056,90057],{},"We are very glad to see the Apache Pulsar community has successfully released 2.4.2 version. Thank the great efforts from Apache Pulsar community with over 110 commits, covering improvements and bug fixes.",[48,90059,90060,90061,190],{},"For detailed changes related to 2.4.2 release, refer to ",[55,90062,23976],{"href":90063,"rel":90064},"https:\u002F\u002Fpulsar.apache.org\u002Frelease-notes\u002F#2.4.2",[264],[48,90066,90067],{},"I will highlight some improvements and bug fixes in this blog.",[40,90069,90071],{"id":90070},"use-classloaders-to-load-java-functions","Use classLoaders to load Java functions",[48,90073,90074],{},"In Pulsar 2.4.2, windowed functions can work well whether Java Functions instances use shaded JAR or classLoaders, and functionClassLoader is set correctly when the --output-serde-classname option is enabled.",[48,90076,90077],{},"Before Pulsar 2.4.2, Java Functions instances are started with a shaded JAR, and different classLoaders are used to load the internal Pulsar code, user code, and the interfaces that the two interacts with each other. 
This change results in two issues:",[321,90079,90080,90083],{},[324,90081,90082],{},"The windowed functions do not work well if Java Functions instances use classLoaders.",[324,90084,90085],{},"When using the --output-serde-classname option, functionClassLoader is not set correctly.",[40,90087,90089],{"id":90088},"start-broker-with-functions-worker","Start Broker with Functions worker",[48,90091,90092],{},"In Pulsar 2.4.2, we can start Broker with Functions worker when broker client is enabled with TLS. Before Pulsar 2.4.2, when we run Functions worker with the broker, it checks whether TLS is enabled in the function_worker.yml file. If TLS is enabled, it uses TLS port. However, when TLS is enabled on Functions worker, it checks the broker.conf. Since Functions worker runs with the broker, it makes sense to check the broker.conf as the single source of truth about whether or not to use TLS.",[40,90094,90096],{"id":90095},"add-error-code-and-error-message-when-a-key-does-not-exist","Add error code and error message when a key does not exist",[48,90098,90099],{},"In Pulsar Functions, BookKeeper is supported to store the state of Functions. When users attempt to fetch a key that does not exist from function state, an NPE(NullPointerException) error occurs. In Pulsar 2.4.2, we add error code and error message for the case when a key does not exist.",[40,90101,90103],{"id":90102},"deduplication","Deduplication",[48,90105,90106],{},"Deduplication removes messages based on the the largest sequence ID that pre-persisted. If an error is persisted to BookKeeper, a retry attempt is “deduplicated” with no message ever getting persisted. In version 2.4.2, we fix the issue from the following two aspects:",[321,90108,90109,90112],{},[324,90110,90111],{},"Double check the pending messages and return error to the producer when the duplication status is uncertain. For example, when a message is still pending.",[324,90113,90114],{},"Sync back the lastPushed map with the lastStored map after failures.",[40,90116,90118],{"id":90117},"consume-data-from-the-earliest-location","Consume data from the earliest location",[48,90120,90121],{},"In Pulsar 2.4.2, we add --subs-position for Pulsar Sinks, so users can consume data from the latest and earliest locations. Before 2.4.2 release, data in topics is consumed from the latest location in Pulsar Sinks by default, and users can not consume the earliest data in sink topic.",[40,90123,90125],{"id":90124},"close-previous-dispatcher-when-the-subscription-type-changes","Close previous dispatcher when the subscription type changes",[48,90127,90128],{},"In Pulsar 2.4.2, when the type of a subscription changes, a new dispatcher is created, and the old dispatcher is closed, thus avoiding memory leaks. Before 2.4.2, when the subscription type of a topic changes, a new dispatcher is created and the old one is discarded, yet not closed, which causes memory leaks. If the cursor is not durable, the subscription is closed and removed from the topic when all consumers are removed. The dispatcher should be closed at this time. Otherwise, RateLimiter instances are not garbage collected, which results in a memory leak.",[40,90130,90132],{"id":90131},"select-an-active-consumer-based-on-the-subscription-order","Select an active consumer based on the subscription order",[48,90134,90135],{},"In Pulsar 2.4.2, the active consumer is selected based on the subscription order. The first consumer in the consumer list is selected as an active consumer without sorting. 
Before 2.4.2, the active consumer is selected based on the priority level and consumer name. In this case, the active consumer joins and leaves, and no consumer is actually elected as \"active\" or consumes messages.",[40,90137,90139],{"id":90138},"remove-failed-stale-producer-from-the-connection","Remove failed stale producer from the connection",[48,90141,90142],{},"In Pulsar 2.4.2, failed producer is removed correctly from the connection. Before Pulsar 2.4.2, broker cannot clean up the old failed producer correctly from the connection. When broker tries to clean up producer-future in the failed producer, it removes the newly created producer-future rather than the old failed producer, and the following error occurs in broker.",[8325,90144,90147],{"className":90145,"code":90146,"language":8330},[8328],"\n17:22:00.700 [pulsar-io-21-26] WARN  org.apache.pulsar.broker.service.ServerCnx - [\u002F1.1.1.1:1111][453] Producer with id persistent:\u002F\u002Fprop\u002Fcluster\u002Fns\u002Ftopic is already present on the connection  \n\n",[4926,90148,90146],{"__ignoreMap":18},[40,90150,90152],{"id":90151},"add-new-apis-for-schema","Add new APIs for schema",[48,90154,90155],{},"In Pulsar 2.4.2, we add the following APIs for schema:",[321,90157,90158,90161,90164],{},[324,90159,90160],{},"getAllVersions: return the list of schema versions for a given topic.",[324,90162,90163],{},"testCompatibility: be able to test the compatibility for a schema without registering it.",[324,90165,90166],{},"getVersionBySchema: provide a schema definition and provide the schema version for it.",[40,90168,90170],{"id":90169},"expose-getlastmessageid-method-in-consumerimpl","Expose getLastMessageId() method in consumerImpl",[48,90172,90173],{},"In Pulsar 2.4.2, we expose getLastMessageId() method in consumerImpl. It benefits users when they want to know the lag messages, or only consume messages before the current time.",[40,90175,90177],{"id":90176},"add-new-send-interface-in-cgo","Add new send() interface in C++\u002FGo",[48,90179,90180,90181,90183],{},"In Pulsar 2.4.2, we add new send() interface in C++\u002FGo, so the MessageID will be returned to users. The logic is consistent with that in Java. In Java client, the MessageId send(byte",[2628,90182],{}," message) returns MessageId for users.",[40,90185,90187],{"id":90186},"consumer-background-tasks-are-cancelled-after-subscription-failures","Consumer background tasks are cancelled after subscription failures",[48,90189,90190],{},"In Pulsar 2.4.2, we ensure that consumer background tasks are cancelled after subscription failures. Before 2.4.2, some background consumer tasks are started in the ConsumerImpl constructor though these tasks are not cancelled if the consumer creation fails, leaving active references to these objects.",[40,90192,90194],{"id":90193},"delete-topics-attached-with-regex-consumers","Delete topics attached with regex consumers",[48,90196,90197],{},"In Pulsar 2.4.2, we can delete topics attached with a regex consumer. The followings are detailed methods.",[321,90199,90200,90203],{},[324,90201,90202],{},"Add a flag in CommandSubscribe so that a regex consumer will never trigger the creation of a topic.",[324,90204,90205],{},"Subscribe to a non-existing topic. When a specific error occurs, the consumer is interpreted as a permanent failure and thus stopping retrying.",[48,90207,90208],{},"Before 2.4.2, it's not possible to delete topics when there is a regex consumer attached to them. 
The reason is that the regex consumer will immediately reconnect and re-create the topic.",[40,90210,52473],{"id":52472},[48,90212,90213,90214,190],{},"Download Pulsar 2.4.2 ",[55,90215,267],{"href":53730,"rel":90216},[264],[48,90218,78604],{},[321,90220,90221,90225,90229,90234],{},[324,90222,90223],{},[55,90224,78612],{"href":78611},[324,90226,90227],{},[55,90228,78618],{"href":78617},[324,90230,78621,90231],{},[55,90232,36242],{"href":36242,"rel":90233},[264],[324,90235,90236,90237],{},"You can self-register at ",[55,90238,57760],{"href":57760,"rel":90239},[264],[48,90241,78633,90242,190],{},[55,90243,75345],{"href":36230,"rel":90244},[264],[48,90246,84101,90247,190],{},[55,90248,90250],{"href":90249},"\u002Fcontent-type-filtring-system\u002Fblog","Apache Pusar blog",[48,90252,3931],{},{"title":18,"searchDepth":19,"depth":19,"links":90254},[90255,90256,90257,90258,90259,90260,90261,90262,90263,90264,90265,90266,90267,90268],{"id":90070,"depth":19,"text":90071},{"id":90088,"depth":19,"text":90089},{"id":90095,"depth":19,"text":90096},{"id":90102,"depth":19,"text":90103},{"id":90117,"depth":19,"text":90118},{"id":90124,"depth":19,"text":90125},{"id":90131,"depth":19,"text":90132},{"id":90138,"depth":19,"text":90139},{"id":90151,"depth":19,"text":90152},{"id":90169,"depth":19,"text":90170},{"id":90176,"depth":19,"text":90177},{"id":90186,"depth":19,"text":90187},{"id":90193,"depth":19,"text":90194},{"id":52472,"depth":19,"text":52473},"2019-12-04","Learn improvements and bug fixes in Apache Pulsar 2.4.2 release.",{},"\u002Fblog\u002Fwhats-new-in-apache-pulsar-2-4-2",{"title":90051,"description":90270},"blog\u002Fwhats-new-in-apache-pulsar-2-4-2",[302,821,4301],"OfvTS_E6rr7nrhx_BMaftIr1kOrALRfeynOrBKTB-DM",{"id":90278,"title":90279,"authors":90280,"body":90282,"category":821,"createdAt":290,"date":90555,"description":90556,"extension":8,"featured":294,"image":290,"isDraft":294,"link":290,"meta":90557,"navigation":7,"order":296,"path":90558,"readingTime":33204,"relatedResources":290,"seo":90559,"stem":90560,"tags":90561,"__hash__":90562},"blogs\u002Fblog\u002Fhow-orange-financial-combats-financial-fraud-using-apache-pulsar.md","How Orange Financial combats financial fraud in over 50M transactions a day using Apache Pulsar",[90281],"Vincent Xie",{"type":15,"value":90283,"toc":90544},[90284,90287,90291,90294,90300,90304,90307,90313,90316,90319,90325,90328,90331,90334,90345,90349,90352,90355,90358,90361,90367,90371,90374,90377,90380,90384,90387,90390,90401,90407,90418,90424,90456,90462,90470,90472,90475,90489,90492,90495,90509,90511,90514,90517,90519],[48,90285,90286],{},"Mobile payment has achieved great success in China. These days, transactions can be completed within seconds simply by scanning a QR code. While that undoubtedly brings convenience to our daily lives, mobile payment also brings huge challenges to the risk control infrastructure. In September 2019, I gave a talk at the O’Reilly Strata Data Conference in New York and shared how our company leveraged Apache Pulsar to boost the efficiency of risk indicator development within Orange Financial.",[40,90288,90290],{"id":90289},"about-orange-financial","About Orange Financial",[48,90292,90293],{},"Orange Financial (also known as China Telecom Bestpay Co., Ltd), is an affiliate company of China Telecom. Established in March 2011, Orange Financial quickly received a “payment business license” issued by the People’s Bank of China. 
The subsidiaries of Orange Financial include Bestpay, Orange Wealth, Orange Insurance, Orange Credit, Orange Financial Cloud, and others. Bestpay, in particular, has become the third-largest payment provider in China, closely following Alipay and WeChat Payment. With 500 million registered users and 41.9 million active users, Orange Financial’s transaction volume reached 1.13 trillion CNY ($18.37 billion USD) in 2018.",[48,90295,90296],{},[384,90297],{"alt":90298,"src":90299},"image of Orange Finance companies","\u002Fimgs\u002Fblogs\u002F63a2cf8f032c7fac4f34f8cf_bestpay-business.png",[40,90301,90303],{"id":90302},"mobile-payment-in-china","Mobile payment in China",[48,90305,90306],{},"China currently has the largest mobile payment market in the world and it continues to grow year after year. According to data from a research institute, China had 462 million mobile payment users in 2016. That number reached 733 million in 2019. In 2020 and beyond, it will grow larger still. The total value of mobile payment transactions was $22 trillion (USD) in 2016. By the end of 2019, it was expected to hit $45 trillion USD.",[48,90308,90309],{},[384,90310],{"alt":90311,"src":90312},"2 graph of china's mobile payment user and payment transaction from 2015 to 2020","\u002Fimgs\u002Fblogs\u002F63a2cf8f470f443672d0fe63_mobile-payment.png",[48,90314,90315],{},"In China, the number of economic activities carried out through mobile payment is surging, as people are less likely to use cash or credit cards than ever before. The high industry penetration rate of mobile payment in China indicates that mobile payment is closely related to our daily life. You can do almost everything with a QR code on your smartphone -- order food, take taxi and metro, rent a bike, buy coffee and so on.",[48,90317,90318],{},"Mobile payment has achieved great success in China because it is convenient and fast. A transaction can be completed within seconds simply by scanning the QR code. That speed and convenience accelerates the adoption of mobile payments in e-commerce, financial services, transport, retail, and other businesses.",[48,90320,90321],{},[384,90322],{"alt":90323,"src":90324},"graph of industry penetration rate of mobile payment in china in 2018","\u002Fimgs\u002Fblogs\u002F63a2cf8f6ffab73016cbef58_payment-china.png",[40,90326,90327],{"id":50905},"Our challenges",[48,90329,90330],{},"With greater ease of use comes greater threats. While mobile payment brings convenience to our daily lives, it also brings huge challenges to the risk control infrastructure. An instant transaction involves thousands of rules running against the transaction to prevent potential financial frauds. RSA reports that the top fraud types include phishing, rogue applications, Trojan attack, and brand abuse. Financial threats in the mobile payment era are more than that. They include account or identity theft, merchant frauds, and money laundering, just to name a few. According to a survey from China UnionPay, 60% of the 105,000 interviewees reported that they had encountered mobile-payment security threats.",[48,90332,90333],{},"We have a robust risk management system that helps us detect and prevent these attacks. Yet, even though we have been doing quite well in protecting the assets of our customers in recent years, we still face many challenges:",[321,90335,90336,90339,90342],{},[324,90337,90338],{},"High concurrency: our system deals with over 50 million transactions and 1 billion events every day. 
The peak traffic can reach 35 thousand transactions per second.",[324,90340,90341],{},"Low latency demand: we require our system to respond to a transaction within 200 milliseconds.",[324,90343,90344],{},"A large number of batch jobs and streaming jobs.",[40,90346,90348],{"id":90347},"lambda-architecture","Lambda Architecture",[48,90350,90351],{},"The core of any risk management system is the decision. A decision is a combination of one or more indicators, such as geographic coordinates of a user’s login or the transaction volume of a retailer. Suspicions are raised, for example, if the geographic coordinates of a user’s recent logins are always the same. This tells us the transactions are likely being initiated by a bot or simulator. Similarly, if the transaction volume of a fruit stall is around $300 a day, when the volume suddenly rises up to $3000, our alert system is triggered.",[48,90353,90354],{},"Developing risk control indicators requires both historical data and real-time data. For example, when we sum up the total transaction volume of a merchant for the past month (30 days), we have to calculate the volume of the last 29 days in batch mode, and then sum up with the value returned by a streaming task on data collected on the current day since 12 am.",[48,90356,90357],{},"Most internet companies deploy a Lambda Architecture to solve similar challenges. Lambda is effective and keeps a good balance of speed and reliability. Previously, we also adopted the Lambda Architecture, which has three layers: (1) the batch layer, (2) the streaming layer, and (3) the serving layer.",[48,90359,90360],{},"The batch layer is for historical data computation, with data stored in Hive. Spark is the predominant batch computation engine. The streaming layer is for real-time computation, with Flink is the computing engine consuming data persisted in Kafka. The serving layer retrieves the final result for serving.",[48,90362,90363],{},[384,90364],{"alt":90365,"src":90366},"illustration of lambda architecture","\u002Fimgs\u002Fblogs\u002F63a2cf8ff86fd21f9b66c47d_lambda-arch.png",[40,90368,90370],{"id":90369},"the-problems-with-lambda-architecture","The problems with Lambda Architecture",[48,90372,90373],{},"Our experience has shown, however, that Lambda Architecture is problematic because it is complex and hard to maintain. First of all, we have to split our business logic into many segments. This increases our communication overhead and creates maintenance difficulties. Secondly, the data is duplicated in two different systems, requiring that we move data among different systems for processing.",[48,90375,90376],{},"As the business grows, our data processing stack becomes very complex because we constantly have to maintain all three software stacks (batch layer, streaming layer and serving layer). It also means we have to maintain multiple clusters: Kafka, Hive, Spark, Flink, and HBase, as well as a diverse engineering team with different skill sets. This makes the cost of maintaining that data-processing stack prohibitively expensive.",[48,90378,90379],{},"In seeking more efficient alternatives, we found Apache Pulsar. With Apache Pulsar, we made a bold attempt to re-factor our data processing stack. 
The goal is to simplify the stack, improve production efficiency, reduce cost, and accelerate decision-making in our risk management system.",[40,90381,90383],{"id":90382},"why-apache-pulsar-works-best","Why Apache Pulsar works best",[48,90385,90386],{},"Recognizing the unique challenges of simplifying business processes and keeping good control of financial risks, we began to investigate Apache Pulsar.",[48,90388,90389],{},"Apache Pulsar is an open-source distributed pub-sub messaging system originally created at Yahoo!. Today, it is a part of the Apache Software Foundation. After thorough investigation, we determined that Pulsar is the best fit for our businesses. We have summarized the reasons for this conclusion below:",[1666,90391,90392,90395,90398],{},[324,90393,90394],{},"Cloud-native architecture and segment-centric storage",[324,90396,90397],{},"Apache Pulsar adopts layered architecture and segment based storage (using Apache BookKeeper). An Apache Pulsar cluster is composed of two layers: (a) a stateless serving layer, comprised of a set of brokers that receive and deliver messages; and (b) a stateful persistence layer, comprised of a set of Apache BookKeeper storage nodes (called “bookies”) that store messages durably. Apache Pulsar is enabled with high availability, strong consistency, and low latency features.",[324,90399,90400],{},"Pulsar stores messages based on topic partitions. Each topic partition is assigned to one of the living brokers within the system (called the “owner broker” of that topic partition). The owner broker serves message-reads from the partition and message-writes to the partition. If a broker fails, Pulsar automatically moves the topic partitions that were owned by it to the remaining available brokers in the cluster. Since brokers are “stateless”, Pulsar only transfers ownership from one broker to another during node failure or broker cluster expansion. Importantly, no data copying occurs during this process.",[48,90402,90403],{},[384,90404],{"alt":90405,"src":90406},"illustration of topic repartition Pulsar stores messages ","\u002Fimgs\u002Fblogs\u002F63a2cf8ff41b51f85e39fde2_pulsar-arch.png",[1666,90408,90409,90412,90415],{},[324,90410,90411],{},"Messages on a Pulsar topic partition are stored in a distributed log. That log is further divided into segments. Each segment is stored as an Apache BookKeeper ledger that is distributed and stored in multiple bookies within the cluster. A new segment is created in one of three situations: (1) after a previous segment has been written for longer than a configured interval (time-based rolling); (2) if the size of the previous segment has reached a configured threshold (size-based rolling); or (3) whenever the ownership of topic partition is changed.",[324,90413,90414],{},"With segmentation, the messages in a topic partition can be evenly distributed and balanced across all the bookies in the cluster. This means the capacity of a topic partition is not solely limited by the capacity of one node. Instead, it can scale up to the total capacity of the whole BookKeeper cluster.",[324,90416,90417],{},"Layered architecture and segment-centric storage (with Apache BookKeeper) are two key design philosophies. 
These attributes provide Apache Pulsar with several significant benefits, including unlimited topic partition storage, instant scaling without data rebalancing, and independent scalability of serving and storage clusters.",[48,90419,90420],{},[384,90421],{"alt":90422,"src":90423},"illustration Layered architecture and segment-centric storage","\u002Fimgs\u002Fblogs\u002F63a2cf8f1f91d5f19cfba137_segment.png",[1666,90425,90426,90429,90432,90435,90438,90441,90444,90447,90450,90453],{},[324,90427,90428],{},"Apache Pulsar provides two types of read API: pub-sub for streaming and segment for batch processing",[324,90430,90431],{},"Apache Pulsar follows the general pub-sub pattern. (a) a producer publishes a message to a topic; and (b) a consumer subscribes to the topic, processes a received message, and sends a confirmation after the message is processed (Ack).",[324,90433,90434],{},"A subscription is a named configuration rule that determines how messages are delivered to consumers. Pulsar enables four types of subscriptions that can coexist on the same topic, distinguished by subscription name:",[324,90436,90437],{},"Exclusive subscriptions: only a single consumer is allowed to attach to the subscription.",[324,90439,90440],{},"Shared subscriptions: multiple consumers can subscribe and each consumer receives a portion of the messages.",[324,90442,90443],{},"Failover subscriptions: multiple consumers can attach to the same subscription, but only one consumer can receive messages. Only when the current consumer fails, the next consumer in line begins to receive messages.",[324,90445,90446],{},"Key-shared subscriptions: multiple consumers can attach to the same subscription, and messages with the same key or same ordering key are delivered to only one consumer.",[324,90448,90449],{},"In a batch process, Pulsar adopts segment-centric storage and reads data from the storage layer (BookKeeper or tiered storage).",[324,90451,90452],{},"Building a unified data processing stack using Pulsar and Spark",[324,90454,90455],{},"Once we understood Apache Pulsar, we chose that product to build a new unified data processing stack using Pulsar as the unified data store and Spark as the unified computing engine.",[48,90457,90458],{},[384,90459],{"alt":90460,"src":90461},"illustration pulsar data store","\u002Fimgs\u002Fblogs\u002F63a2cf8fd38afc00cee00617_pulsar-bestpay-arch.png",[1666,90463,90464,90467],{},[324,90465,90466],{},"Spark 2.2.0 Structured Streaming provides a solid foundation for batch and streaming processes. You can read data in Pulsar through Spark Structured Streaming, and query historical data in Pulsar through Spark SQL.",[324,90468,90469],{},"Apache Pulsar addresses the messy operational problems of other systems by storing data in segmented streams. The data is appended to topics (streams) as they arrive, then segmented and stored in scalable log storage, Apache BookKeeper. Since data is stored as only one copy (the “source of truth”), it solves the inconsistency problems in Lambda Architecture. Meanwhile, we can access data in streams via unified pub-sub messaging and segments for elastic parallel batch processing. Together with a unified computing engine like Spark, Apache Pulsar is a perfect unified messaging and storage solution for building the unified data processing stack. 
In light of all this, we decided to adopt Apache Pulsar to re-architect our stack for our business.",[40,90471,78265],{"id":78264},[48,90473,90474],{},"To enable Apache Pulsar for our business, we need to upgrade our data processing stack. The upgrading is done with two steps.",[48,90476,90477,90478,90483,90484,90488],{},"First, import data from the old Lambda based data processing stack. Our data is comprised of historic data and real-time streaming data. For real-time streaming data, we leverage ",[55,90479,90482],{"href":90480,"rel":90481},"https:\u002F\u002Fgithub.com\u002Fstreamnative\u002Fpulsar-io-kafka",[264],"pulsar-io-kafka"," to read data from Kafka and then write to Pulsar while keeping schema information unchanged. For historic data, we use ",[55,90485,90487],{"href":85222,"rel":90486},[264],"pulsar-spark"," to query data stored in Hive by Spark, and store the results with Schema (AVRO) format into Pulsar. Both pulsar-io-kafka and pulsar-spark are already open-sourced by StreamNative.",[48,90490,90491],{},"Second, move our computation jobs to process the records stored in Pulsar. We use Spark Structured Streaming for real-time processing, and Spark SQL for batch processing and interactive queries.",[48,90493,90494],{},"The new Apache Pulsar based solution unifies the computing engine, data storage, and programming language. Compared with Lambda Architecture, the new solution reduces complexity dramatically:",[321,90496,90497,90500,90503,90506],{},[324,90498,90499],{},"Reduce complexity by 33% (the number of clusters is reduced from six to four);",[324,90501,90502],{},"Save storage space by 8.7% (expected: 28%);",[324,90504,90505],{},"Improve production efficiency by 11 times (support SQL);",[324,90507,90508],{},"Higher stability due to the unified architecture.",[40,90510,2125],{"id":2122},[48,90512,90513],{},"Apache Pulsar is a cloud-native messaging system with layered architecture and segment-centric storage. Pulsar is a perfect choice for building our unified data processing stack. Together with a unified computing engine like Spark, Apache Pulsar is able to boost the efficiency of our risk-control decision deployment. Thus, we are able to provide merchants and consumers with safe, convenient, and efficient services.",[48,90515,90516],{},"Pulsar is a young and promising project and the Apache Pulsar community is growing fast. We have invested heavily in the new Pulsar based unified data stack. 
We’d like to contribute our practices back to the Pulsar community and help companies with similar challenges to solve their problems.",[40,90518,52473],{"id":52472},[321,90520,90521,90528,90533,90538],{},[324,90522,90523],{},[55,90524,90527],{"href":90525,"rel":90526},"https:\u002F\u002Fwww.slideshare.net\u002Fstreamnative\u002Fhow-orange-financial-combat-financial-frauds-over-50m-transactions-a-day-using-apache-pulsar-176284080",[264],"Slides of How Orange Financial combats financial fraud over 50M transactions a day using Apache Pulsar",[324,90529,90530],{},[55,90531,90482],{"href":90480,"rel":90532},[264],[324,90534,90535],{},[55,90536,90487],{"href":85222,"rel":90537},[264],[324,90539,90540],{},[55,90541,90543],{"href":90542},"\u002Fblog\u002Ftech\u002F2019-07-16-one-storage-system-for-both-real-time-and-historical-data-analysis-pulsar-story\u002F","Apache Pulsar as One Storage System for Both Real-time and Historical Data Analysis",{"title":18,"searchDepth":19,"depth":19,"links":90545},[90546,90547,90548,90549,90550,90551,90552,90553,90554],{"id":90289,"depth":19,"text":90290},{"id":90302,"depth":19,"text":90303},{"id":50905,"depth":19,"text":90327},{"id":90347,"depth":19,"text":90348},{"id":90369,"depth":19,"text":90370},{"id":90382,"depth":19,"text":90383},{"id":78264,"depth":19,"text":78265},{"id":2122,"depth":19,"text":2125},{"id":52472,"depth":19,"text":52473},"2019-11-11","Orange Financial leveraged Apache Pulsar to boost the efficiency of risk indicator development.",{},"\u002Fblog\u002Fhow-orange-financial-combats-financial-fraud-using-apache-pulsar",{"title":90279,"description":90556},"blog\u002Fhow-orange-financial-combats-financial-fraud-using-apache-pulsar",[35559,821,9144],"EWAAVEffencRr-bvbQX2cPgNMSC_-ryC8T8-dCbV9XU",{"id":90564,"title":90565,"authors":90566,"body":90567,"category":821,"createdAt":290,"date":90782,"description":90783,"extension":8,"featured":294,"image":290,"isDraft":294,"link":290,"meta":90784,"navigation":7,"order":296,"path":90785,"readingTime":7986,"relatedResources":290,"seo":90786,"stem":90787,"tags":90788,"__hash__":90789},"blogs\u002Fblog\u002Fintroduction-to-pulsarctl.md","Introduction to Pulsarctl",[48575],{"type":15,"value":90568,"toc":90769},[90569,90572,90575,90578,90581,90584,90587,90595,90599,90602,90608,90617,90620,90626,90630,90633,90636,90642,90645,90651,90656,90662,90667,90673,90676,90680,90686,90690,90696,90699,90705,90708,90712,90718,90722,90728,90732,90738,90742,90745,90767],[48,90570,90571],{},"We're excited to announce that StreamNative open sourced Pulsarctl: an awesome admin tool in Apache Pulsar.",[48,90573,90574],{},"Apache Pulsar is a distributed pub-sub messaging system designed for scalability, flexibility, and no data loss, and it is a top-level project of the Apache Software Foundation. Currently, Apache Pulsar enjoys rapid growth and development. To make it better and more user-friendly, we’ve designed Pulsarctl for Go users.",[40,90576,90565],{"id":90577},"introduction-to-pulsarctl",[48,90579,90580],{},"Pulsarctl is an alternative tool of pulsar-admin, used to manage clients in Apache Pulsar. Pulsarctl is written in Go, based on Pulsar REST API. It provides Go developer with API interface and user-friendly commands, making it easier to interact with Pulsar Broker.",[48,90582,90583],{},"Compared with pulsar-admin, Pulsarctl is more user-friendly: Pulsarctl requires less dependencies to use commands, and provides more comprehensive description and usage for commands. 
With Pulsarctl, users can find and resolve issues faster when errors occur.",[48,90585,90586],{},"You can use Pulsarctl in the following two ways:",[1666,90588,90589,90592],{},[324,90590,90591],{},"Use it in Go project and interact with Pulsar brokers. The Admin API is developed by Go.",[324,90593,90594],{},"Use it as pulsar-admin in the command line.",[40,90596,90598],{"id":90597},"how-to-use-pulsar-go-admin-api","How to use Pulsar Go Admin API",[48,90600,90601],{},"Pulsarctl provides Admin API based on Go, and make it easier to interact with Pulsar Broker. Pulsarctl Admin API provides the following interface.",[8325,90603,90606],{"className":90604,"code":90605,"language":8330},[8328],"\n\u002F\u002F Client provides a client to the Pulsar Restful API\n    type Client interface {\n        Clusters() Clusters\n        Functions() Functions\n        Tenants() Tenants\n        Topics() Topics\n        Subscriptions() Subscriptions\n        Sources() Sources\n        Sinks() Sinks\n        Namespaces() Namespaces\n        Schemas() Schema\n        Brokers() Brokers\n        BrokerStats() BrokerStats\n    }\n\n",[4926,90607,90605],{"__ignoreMap":18},[916,90609,90610],{},[48,90611,90612,90613,190],{},"Note: For more information on Admin API interfaces, refer to ",[55,90614,90615],{"href":90615,"rel":90616},"https:\u002F\u002Fgodoc.org\u002Fgithub.com\u002Fstreamnative\u002Fpulsarctl",[264],[48,90618,90619],{},"The following example demonstrates how to use Pulsarctl Admin API.",[8325,90621,90624],{"className":90622,"code":90623,"language":8330},[8328],"\nconfig := &pulsar.Config{\n        WebServiceURL: “http:\u002F\u002Flocalhost:8080”,\n        HTTPClient:    http.DefaultClient,\n\n        \u002F\u002F If the server enable the TLSAuth\n        \u002F\u002F Auth: auth.NewAuthenticationTLS()\n\n        \u002F\u002F If the server enable the TokenAuth\n        \u002F\u002F TokenAuth: auth.NewAuthenticationToken()\n    }\n    \u002F\u002F the default NewPulsarClient will use v2 APIs. If you need to request other version APIs,\n    \u002F\u002F you can specified the API version like this:\n    \u002F\u002F admin := cmdutils.NewPulsarClientWithAPIVersion(pulsar.V2)\n    admin, err := pulsar.New(config)\n    if err != nil {\n        \u002F\u002F handle the err\n        return\n    }\n\n    \u002F\u002F more APIs, you can find them in the pkg\u002Fpulsar\u002Fadmin.go\n    \u002F\u002F You can find all the method in the pkg\u002Fpulsar\n    clusters, err := admin.Clusters().List()\n    if err != nil {\n        \u002F\u002F handle the error\n    }\n\n    \u002F\u002F handle the result\n    fmt.Println(clusters)\n\n",[4926,90625,90623],{"__ignoreMap":18},[40,90627,90629],{"id":90628},"how-to-use-pulsarctl-in-the-command-line","How to use Pulsarctl in the command line",[48,90631,90632],{},"Pulsarctl commands provide comprehensive description and usage.",[48,90634,90635],{},"Take create topic as an example, the following is the output of create topic.",[48,90637,90638],{},[384,90639],{"alt":90640,"src":90641},"pulsarctl-commands-doc","\u002Fimgs\u002Fblogs\u002F63a2d3c32c0c6b3efc211261_create-topics-doc.png",[48,90643,90644],{},"Pulsarctl unifies partitioned-topics and topics commands, and delivers clear and detailed output.",[48,90646,90647],{},[384,90648],{"alt":90649,"src":90650},"topic-list-show","\u002Fimgs\u002Fblogs\u002F63a2d3c356d9634bc11bd549_topic-list-show.png",[321,90652,90653],{},[324,90654,90655],{},"In Pulsarctl, all commands related to subscription is grouped in subscription commands. 
In pulsar-admin, all commands related to subscription is used as subcommands of topics, which is not convenient to use.",[48,90657,90658],{},[384,90659],{"alt":90660,"src":90661},"sub-commands-list","\u002Fimgs\u002Fblogs\u002F63a2d3c4a8019d18111b0494_sub-commands-list.png",[321,90663,90664],{},[324,90665,90666],{},"Pulsarctl improves the usage of special characters. In pulsar-admin, users are required to enter json-string in shell, which complicates the usage. Take functions putstate as an example, the following table lists the output comparison between pulsar-admin and pulsarctl.",[48,90668,90669],{},[384,90670],{"alt":90671,"src":90672},"comparison between pulsar-admin and pulsarctl","\u002Fimgs\u002Fblogs\u002F63a2dc615164013492c3dd98_comparison-between-pulsar-admin-and-pulsarctl.webp",[48,90674,90675],{},"The following examples illustrate differences in using pulsar-admin and pulsarctl.",[32,90677,90679],{"id":90678},"query-all-commands","Query all commands",[48,90681,90682],{},[384,90683],{"alt":90684,"src":90685},"pulsarctl-commands-list","\u002Fimgs\u002Fblogs\u002F63a2d3c4f86fd2ed5b69a4b5_commands-list.gif",[32,90687,90689],{"id":90688},"create-non-partitioned-topic","Create non-partitioned topic",[48,90691,90692],{},[384,90693],{"alt":90694,"src":90695},"pulsarctl-create-non-partitioned-topic","\u002Fimgs\u002Fblogs\u002F63a2d3c460d03704e32448c9_create-non-partitioned-topic.gif",[32,90697,16412],{"id":90698},"create-partitioned-topic",[48,90700,90701],{},[384,90702],{"alt":90703,"src":90704},"pulsarctl-create-partitioned-topic","\u002Fimgs\u002Fblogs\u002F63a2d3c4adb6f618f48dbe22_create-partitioned-topic.gif",[48,90706,90707],{},"To query topics with pulsar-admin, you have to separate partitioned topic and non-partitioned topic.",[32,90709,90711],{"id":90710},"query-non-partitioned-topic","Query non-partitioned topic",[48,90713,90714],{},[384,90715],{"alt":90716,"src":90717},"pulsar-admin-list-non-partitioned-topics","\u002Fimgs\u002Fblogs\u002F63a2d3c4f86fd24f2a69a4d3_pulsar-admin-list-non-partitioned-topics.gif",[32,90719,90721],{"id":90720},"query-partitioned-topics","Query partitioned topics",[48,90723,90724],{},[384,90725],{"alt":90726,"src":90727},"pulsar-admin-list-partitioned-topics","\u002Fimgs\u002Fblogs\u002F63a2d3c456d963fc5a1bd566_pulsar-admin-list-partitioned-topics.gif",[32,90729,90731],{"id":90730},"query-topics-with-pulsarctl","Query topics with Pulsarctl",[48,90733,90734],{},[384,90735],{"alt":90736,"src":90737},"pulsarctl-list-topics","\u002Fimgs\u002Fblogs\u002F63a2d3c4f489b783b8c3ceb9_pulsarctl-list-topics.gif",[40,90739,90741],{"id":90740},"contribute-to-pulsarctl","Contribute to Pulsarctl",[48,90743,90744],{},"If you have any issues with Pulsarctl, feel free to contact us. Any of your contributions to code or documentation is highly appreciated. The more you give, the more you get. 
In the contribution journey, you will learn more about Pulsarctl and Apache Pulsar.",[321,90746,90747,90753,90760],{},[324,90748,90749,90750],{},"Github: ",[55,90751,42821],{"href":42821,"rel":90752},[264],[324,90754,90755,90756],{},"Contribution Guide: ",[55,90757,90758],{"href":90758,"rel":90759},"https:\u002F\u002Fgithub.com\u002Fstreamnative\u002Fpulsarctl\u002Fblob\u002Fmaster\u002FCONTRIBUTING.md",[264],[324,90761,90762,90763],{},"Developer Guide: ",[55,90764,90765],{"href":90765,"rel":90766},"https:\u002F\u002Fgithub.com\u002Fstreamnative\u002Fpulsarctl\u002Fblob\u002Fmaster\u002Fdocs\u002Fen\u002Fdeveloper-guide.md",[264],[48,90768,3931],{},{"title":18,"searchDepth":19,"depth":19,"links":90770},[90771,90772,90773,90781],{"id":90577,"depth":19,"text":90565},{"id":90597,"depth":19,"text":90598},{"id":90628,"depth":19,"text":90629,"children":90774},[90775,90776,90777,90778,90779,90780],{"id":90678,"depth":279,"text":90679},{"id":90688,"depth":279,"text":90689},{"id":90698,"depth":279,"text":16412},{"id":90710,"depth":279,"text":90711},{"id":90720,"depth":279,"text":90721},{"id":90730,"depth":279,"text":90731},{"id":90740,"depth":19,"text":90741},"2019-11-06","Learn how to use Pulsar Go Admin API and use Pulsarctl in the command line.",{},"\u002Fblog\u002Fintroduction-to-pulsarctl",{"title":90565,"description":90783},"blog\u002Fintroduction-to-pulsarctl",[7347,821],"4P-9B9J8NnGYqXqmR_xwDeo2aImb6kyd6XeskMxf8kk",{"id":90791,"title":90792,"authors":90793,"body":90794,"category":821,"createdAt":290,"date":91031,"description":91032,"extension":8,"featured":294,"image":290,"isDraft":294,"link":290,"meta":91033,"navigation":7,"order":296,"path":91034,"readingTime":63265,"relatedResources":290,"seo":91035,"stem":91036,"tags":91037,"__hash__":91038},"blogs\u002Fblog\u002Fpowering-tencent-billing-platform-with-apache-pulsar.md","Powering Tencent Billing Platform with Apache Pulsar",[89231],{"type":15,"value":90795,"toc":91016},[90796,90798,90804,90808,90811,90816,90819,90833,90836,90840,90843,90847,90849,90852,90855,90857,90866,90869,90872,90876,90879,90882,90884,90887,90901,90904,90907,90912,90916,90919,90933,90935,90938,90941,90945,90947,90950,90954,90957,90968,90972,90975,90980,90997,91000,91004,91006,91009,91014],[40,90797,19156],{"id":19155},[48,90799,90800,90803],{},[55,90801,89263],{"href":89261,"rel":90802},[264]," is a billing platform that supports Tencent's businesses in handling its revenue of hundreds of billions dollars. It integrates domestic and international payment channels, provides various services such as account management, precision marketing, security risk control, auditing and accounting, billing analysis and so on. The platform carries daily revenue of hundreds of millions of dollars. It provides services for 180+ countries (regions), 10,000+ businesses and more than 1 million settlers. Working as an all-round one-stop billing platform, the total number of its escrow accounts is more than 30 billions.",[48,90805,90806],{},[384,90807],{"alt":21101,"src":89299},[48,90809,90810],{},"In Midas, the most critical challenge is to ensure the data consistency in transactions. We have developed a distributed transaction engine TDXA to handle the challenge. TDXA is a distributed transaction framework, which is designed to solve the consistency problem in application layer. 
The architecture of TDXA is as follows.",[48,90812,90813],{},[384,90814],{"alt":90815,"src":89416},"illustration transaction flow ",[48,90817,90818],{},"The main components are described as follows.",[321,90820,90821,90824,90827,90830],{},[324,90822,90823],{},"TM: A distributed transaction manager. As the control center of TDXA, it deploys a decentralized approach to offer services of high availability. TM supports both REST API based Try-Confirm\u002FCancel (TCC) and hybrid DB transactions. With TDF (an asynchronous coroutine framework) and asynchronous transaction processing in TDSQL, TM is able to support the billing business of the whole company in a high efficient way.",[324,90825,90826],{},"CM: The configuration center of TDXA. CM provides a flexible mechanism to register, manage and update transaction processing flow at runtime. It automatically checks the correctness and completeness of the transaction flow, and visualize it in a GUI console for users. Users can manage the flow in the GUI console.",[324,90828,90829],{},"TDSQL: A distributed transactional database, with characteristics of strong consistency, high availability, global deployment architecture, distributed horizontal scalability, high performance, enterprise-grade security support and so on. TDSQL provides a comprehensive distributed database solution.",[324,90831,90832],{},"MQ: A highly consistent and available message queue is required to enable TDXA to handle various failures during processing transactions.",[48,90834,90835],{},"As you can see, a highly consistent and available message queue plays a mission critical role in processing transactions for our billing service.",[40,90837,90839],{"id":90838},"message-queue-in-billing-service","Message queue in billing service",[48,90841,90842],{},"The usage of a message queue in our billing service can be divided into two categories: online transaction processing, and real-time data processing.",[48,90844,90845],{},[384,90846],{"alt":758,"src":89367},[32,90848,89371],{"id":89370},[48,90850,90851],{},"There are more than 80 channels with various characteristics, and more than 300 different business processing logic within Midas. One single payment workflow often involves many internal and external systems. This leads to longer RPC chains, and more failures, especially network timeouts (e.g. when interacting with oversea payment services).",[48,90853,90854],{},"TDXA leverages a message queue for handling failures occurred in processing transactions in a reliable way. Integrating with a local transaction state, TDXA is able to resume the transaction process from failures and ensure the consistency of billions of transactions daily.",[32,90856,89384],{"id":89383},[48,90858,90859,90860,90865],{},"The second challenge in a billing platform is, how to prove the data consistency of the transactions? We verify it by using a reconciliation system. The shorter the reconciliation time is, the sooner the problem is detected. For mobile payments, real-time user experience is critical. 
For example, if the hero is not delivered in time after purchasing in the ",[55,90861,90864],{"href":90862,"rel":90863},"https:\u002F\u002Fen.wikipedia.org\u002Fwiki\u002FWangzhe_Rongyao",[264],"King of Glory"," game, it will inevitably affect user experience, and thus result in complaints.",[48,90867,90868],{},"We reconcile the billing transactions in real time by using a stream computing engine to process the transactions produced in the message queue.",[48,90870,90871],{},"TDXA leverages the message queue in both online transaction processing and real-time data processing to ensure the effectiveness and consistency of a transaction.",[32,90873,90875],{"id":90874},"other-scenarios","Other scenarios",[48,90877,90878],{},"During peak periods (for example, during a King of Glory anniversary celebration event), the transaction traffic of Midas can burst to more than 10 times the average. The message queue is able to buffer such peak traffic to reduce the pressure on the core transaction system for requests such as transaction inquiries, delivery, and tips notifications.",[48,90880,90881],{},"Also, with the ability to process messages in a message queue in real time, we are able to offer real-time data analysis and provide precise marketing services for our customers.",[40,90883,79151],{"id":50969},[48,90885,90886],{},"The requirements of a distributed message queue for our billing platform are summarized as follows:",[321,90888,90889,90892,90895,90898],{},[324,90890,90891],{},"Strong consistency: A billing service tolerates no data loss, which is the basic requirement.",[324,90893,90894],{},"High availability: It must have failover capability and be able to perform automatic recovery on failures.",[324,90896,90897],{},"Massive storage: Mobile applications generate a large amount of transaction data, so massive storage capacity is required.",[324,90899,90900],{},"Low latency: A payment service handling billions of transactions requires receiving messages with predictably low latency (less than 10 ms).",[48,90902,90903],{},"We have evaluated many open source solutions for Midas. Kafka is popular in log collection and processing; however, it is rarely used in mission-critical financial use cases due to its problems with data consistency and durability. RocketMQ doesn't provide a user-friendly API for administering topics (e.g. you cannot delete invalid messages on a per-topic basis), and it doesn't provide failover capability in its open source version. We evaluated and chose Pulsar because of its native strong consistency. Apache Pulsar provides highly available storage services based on Apache BookKeeper, and adopts a decoupled architecture, so the storage and processing layers can scale independently. Pulsar also supports several consumption modes and geo-replication.",[48,90905,90906],{},"The following is a summary of our comparison of Kafka, RocketMQ, and Pulsar.",[48,90908,90909],{},[384,90910],{"alt":758,"src":90911},"\u002Fimgs\u002Fblogs\u002F63a2cc37f41b5198c0370637_table.png",[40,90913,90915],{"id":90914},"adopting-pulsar","Adopting Pulsar",[48,90917,90918],{},"In the process of adopting Pulsar at Tencent, we made changes to Pulsar in order to meet our requirements. 
The changes are summarized as follows.",[1666,90920,90921,90924,90927,90930],{},[324,90922,90923],{},"Support delayed message delivery and delayed retries (supported in version 2.4.0).",[324,90925,90926],{},"Support secondary tag.",[324,90928,90929],{},"Improve the management console, supporting message query and consumption tracking.",[324,90931,90932],{},"Improve the monitoring and alerting system.",[32,90934,89459],{"id":89458},[48,90936,90937],{},"Delayed message delivery is a common requirement in billing services. For example, it can be used for handling timeouts in transaction processing. When a service fails or times out, there is no need to retry a transaction many times in a short period, because it is likely to fail again. It makes more sense to retry in a backoff manner by leveraging delayed message delivery in Pulsar.",[48,90939,90940],{},"Delayed message delivery can be implemented in two different approaches. One is segregating messages into different topics based on the delay time interval; the broker checks those delay topics periodically based on their time interval and delivers the delayed messages accordingly.",[48,90942,90943],{},[384,90944],{"alt":758,"src":89525},[32,90946,89529],{"id":89528},[48,90948,90949],{},"We collect metrics from Pulsar and store them in our Eagle-Eye ops platform. Thus, we can write alert rules to monitor the system.",[48,90951,90952],{},[384,90953],{"alt":758,"src":89538},[48,90955,90956],{},"We monitor and alert on the following metrics:",[321,90958,90959,90962,90965],{},[324,90960,90961],{},"Backlog: If a massive backlog accumulates for online services, it means that consumption has become a bottleneck. In this case, it is necessary to raise a timely alert and inform the relevant personnel to deal with it.",[324,90963,90964],{},"End-to-end latency: In the transaction record query scenario, a purchase record must be searchable within a second. By matching the production flow and consumption flow collected by the monitoring component, we can measure the end-to-end latency of each message.",[324,90966,90967],{},"Failures: The Eagle-Eye ops platform aggregates statistics of errors in the pipeline, monitoring and alerting along various dimensions such as business, IP, and others.",[40,90969,90971],{"id":90970},"pulsar-in-midas","Pulsar in Midas",[48,90973,90974],{},"With the enhancements we made in Pulsar, we deployed Pulsar in the following architecture.",[48,90976,90977],{},[384,90978],{"alt":90979,"src":89565}," Pulsar in Midas http proxy",[321,90981,90982,90985,90988,90991,90994],{},[324,90983,90984],{},"As the message queue proxy layer, the broker is responsible for message production and consumption requests. Brokers support horizontal scalability and rebalance automatically by topic according to the load.",[324,90986,90987],{},"BookKeeper serves as the distributed storage for message queues. You can configure multiple replicas of messages in BookKeeper. BookKeeper provides failover capability under exceptional circumstances.",[324,90989,90990],{},"ZooKeeper serves as the metadata and cluster configuration center for message queues.",[324,90992,90993],{},"Some Midas businesses are written in JS and PHP. The HTTP proxy provides a unified access endpoint and retry capability for clients using other languages. When the production cluster fails, the proxy will degrade and route messages to other clusters for disaster recovery.",[324,90995,90996],{},"Pulsar supports various consumption modes. 
Shared subscription allows scaling up the consumption beyond the number of partitions, and Failover subscription works well for stream processing in transaction cleanup workflow.",[48,90998,90999],{},"We have successfully adopted and run Pulsar at a very large scale. It handles tens of billions of transactions during peak time, guarantees data consistency in processing transactions, and provides 99.999% high availability for our services. The high consistency, availability and stability of Pulsar helps our billing and transaction engine run very efficiently.",[48,91001,91002],{},[384,91003],{"alt":18,"src":89629},[40,91005,319],{"id":316},[48,91007,91008],{},"Apache Pulsar is a young open source project with attractive features. Apache Pulsar community is growing fast with new adoptions in different industries. We’d like to develop further collaborations with Apache Pulsar community, contribute our improvements back to the community, and work with other users to further improve Pulsar.",[48,91010,91011],{},[384,91012],{"alt":91013,"src":89270}," pulsar environment",[48,91015,3931],{},{"title":18,"searchDepth":19,"depth":19,"links":91017},[91018,91019,91024,91025,91029,91030],{"id":19155,"depth":19,"text":19156},{"id":90838,"depth":19,"text":90839,"children":91020},[91021,91022,91023],{"id":89370,"depth":279,"text":89371},{"id":89383,"depth":279,"text":89384},{"id":90874,"depth":279,"text":90875},{"id":50969,"depth":19,"text":79151},{"id":90914,"depth":19,"text":90915,"children":91026},[91027,91028],{"id":89458,"depth":279,"text":89459},{"id":89528,"depth":279,"text":89529},{"id":90970,"depth":19,"text":90971},{"id":316,"depth":19,"text":319},"2019-10-22","Tencent adopted and run Pulsar at a very large scale. It handles tens of billions of transactions during peak time.",{},"\u002Fblog\u002Fpowering-tencent-billing-platform-with-apache-pulsar",{"title":90792,"description":91032},"blog\u002Fpowering-tencent-billing-platform-with-apache-pulsar",[35559,821,4301],"yMKrswn0xDJc6XJhHawWDtmzUF6ldx-oIlgrbITZWaE",{"id":91040,"title":91041,"authors":91042,"body":91043,"category":821,"createdAt":290,"date":91291,"description":91292,"extension":8,"featured":294,"image":290,"isDraft":294,"link":290,"meta":91293,"navigation":7,"order":296,"path":85227,"readingTime":4475,"relatedResources":290,"seo":91294,"stem":91295,"tags":91296,"__hash__":91297},"blogs\u002Fblog\u002Fuse-apache-skywalking-to-trace-apache-pulsar-messages.md","Use Apache SkyWalking to Trace Apache Pulsar Messages",[808],{"type":15,"value":91044,"toc":91278},[91045,91047,91050,91053,91056,91058,91065,91069,91072,91077,91083,91088,91091,91097,91100,91103,91107,91116,91119,91126,91129,91133,91138,91144,91149,91152,91155,91161,91166,91169,91172,91187,91193,91199,91203,91206,91212,91215,91219,91222,91228,91232,91235,91241,91245,91248,91251,91262,91268,91270,91273,91276],[40,91046,33228],{"id":33227},[48,91048,91049],{},"Apache Pulsar is a distributed messaging platform and a fast-growing alternative to Kafka. 
Apache SkyWalking is a popular application performance monitoring tool for distributed systems, specially designed for microservices, cloud-native, and container-based (Docker, K8s, and Mesos) architectures.",[48,91051,91052],{},"Message tracing is a very useful feature which helps engineers troubleshoot problems related to message publishing and receiving.",[48,91054,91055],{},"This tutorial shares how to track Pulsar messages by using SkyWalking.",[40,91057,86804],{"id":86803},[48,91059,91060,91061,91064],{},"Before getting started, make sure you have installed Git, JDK 8, Maven 3, and Pulsar (cluster or standalone). If you do not have an available Pulsar, follow the instruction (",[55,91062,86810],{"href":86810,"rel":91063},[264],") to install.",[40,91066,91068],{"id":91067},"build-pulsar-agent-from-skywalking-source","Build Pulsar agent from SkyWalking source",[48,91070,91071],{},"The Apache Pulsar agent will be officially released in the SkyWalking 6.5.0, since it has not been released yet, you need to build a Pulsar agent from the source of SkyWalking.",[1666,91073,91074],{},[324,91075,91076],{},"Download the SkyWalking source and build the Pulsar agent plugin.",[8325,91078,91081],{"className":91079,"code":91080,"language":8330},[8328],"\n$ git clone https:\u002F\u002Fgithub.com\u002Fapache\u002FSkyWalking.git\n    $ cd SkyWalking\n    $ git submodule init\n    $ git submodule update\n    $ .\u002Fmvnw clean package -DskipTests\n\n",[4926,91082,91080],{"__ignoreMap":18},[1666,91084,91085],{"start":19},[324,91086,91087],{},"Decompress the file apache-SkyWalking-apm-bin.tar.gz.",[48,91089,91090],{},"After the decompression, all packages are in the directory apm-dist\u002Ftarget.",[8325,91092,91095],{"className":91093,"code":91094,"language":8330},[8328],"\n$ tar -xf apache-SkyWalking-apm-bin.tar.gz\n\n",[4926,91096,91094],{"__ignoreMap":18},[48,91098,91099],{},"You can find the Pulsar agent plugin is in the directory agent\u002Fplugins.",[48,91101,91102],{},"Congratulations, you have successfully built the Pulsar agent plugin.",[40,91104,91106],{"id":91105},"start-a-skywalking-backend","Start a SkyWalking backend",[48,91108,91109,91110,91115],{},"If you already have an available SkyWalking backend, you can skip this step. 
If not, here is a ",[55,91111,91114],{"href":91112,"rel":91113},"https:\u002F\u002Fgithub.com\u002Fapache\u002FSkyWalking\u002Fblob\u002Fmaster\u002Fdocs\u002Fen\u002Fsetup\u002Fbackend\u002Fbackend-ui-setup.md#deploy-backend-and-ui",[264],"quick start"," for running a SkyWalking backend locally.",[48,91117,91118],{},"Tip: if you run the SkyWalking backend and the Pulsar broker on the same machine, you need to change the web service port of SkyWalking or Pulsar.",[48,91120,91121,91122,190],{},"For how to change the web service port of SkyWalking UI, see ",[55,91123,267],{"href":91124,"rel":91125},"https:\u002F\u002Fgithub.com\u002Fapache\u002FSkyWalking\u002Fblob\u002Fmaster\u002Fdocs\u002Fen\u002Fsetup\u002Fbackend\u002Fui-setup.md",[264],[48,91127,91128],{},"If you want to change the web service port of the Pulsar broker, edit the file conf\u002Fbroker.conf.",[40,91130,91132],{"id":91131},"download-test-project-and-set-up","Download test project and set up",[1666,91134,91135],{},[324,91136,91137],{},"Download the SkyWalking integration test for Apache Pulsar.",[8325,91139,91142],{"className":91140,"code":91141,"language":8330},[8328],"\n$ git clone https:\u002F\u002Fgithub.com\u002FSkyAPMTest\u002Fagent-auto-integration-testcases.git\n# In this repo, you can find a project named pulsar-scenario.\n\n",[4926,91143,91141],{"__ignoreMap":18},[1666,91145,91146],{"start":19},[324,91147,91148],{},"Import the project pulsar-scenario to your IDE.",[48,91150,91151],{},"Here we take IntelliJ IDEA as an example.",[48,91153,91154],{},"As shown in the image below, the project pulsar-scenario is a Spring Boot application and has a CaseController.",[48,91156,91157],{},[384,91158],{"alt":91159,"src":91160},"developer interface to show how to Import the project pulsar-scenario","\u002Fimgs\u002Fblogs\u002F63a2ca69f76c8216c8dc57e0_pulsar-skywalking-2.png",[1666,91162,91163],{"start":279},[324,91164,91165],{},"Set up the Pulsar agent plugin.",[48,91167,91168],{},"Before starting the Spring Boot application, you need to set up the Pulsar agent plugin.",[48,91170,91171],{},"Tip:",[321,91173,91174,91181,91184],{},[324,91175,91176,91177,190],{},"For how to set up a Java agent and its properties, see ",[55,91178,267],{"href":91179,"rel":91180},"https:\u002F\u002Fgithub.com\u002Fapache\u002FSkyWalking\u002Fblob\u002Fmaster\u002Fdocs\u002Fen\u002Fsetup\u002Fservice-agent\u002Fjava-agent\u002FREADME.md",[264],[324,91182,91183],{},"By default, the project pulsar-scenario uses port 8082.",[324,91185,91186],{},"The VM options are as follows (also shown in the image below):",[8325,91188,91191],{"className":91189,"code":91190,"language":8330},[8328],"\n-javaagent:\u002Fapm-dist\u002Ftarget\u002Fapache-SkyWalking-apm-bin\u002Fagent\u002FSkyWalking-agent.jar -DSW_AGENT_COLLECTOR_BACKEND_SERVICES=:11800 -DSW_AGENT_NAME=pulsar-demo -Dservice.url=pulsar:\u002F\u002F:6650\n\n",[4926,91192,91190],{"__ignoreMap":18},[48,91194,91195],{},[384,91196],{"alt":91197,"src":91198},"image of pulsar agent plugin","\u002Fimgs\u002Fblogs\u002F63a2caf74b652c0d8ed573d8_pulsar-skywalking-3.png",[40,91200,91202],{"id":91201},"test-and-view-in-skywalking","Test and view in SkyWalking",[48,91204,91205],{},"After performing the steps stated previously, you have prepared all the environments. Next, you can simulate some requests and view them in the SkyWalking UI. 
Download the pulsar-case.",[8325,91207,91210],{"className":91208,"code":91209,"language":8330},[8328],"\n$ curl http:\u002F\u002Flocalhost:8082\u002Fpulsar-scenario\u002Fcase\u002Fpulsar-case\n\n",[4926,91211,91209],{"__ignoreMap":18},[48,91213,91214],{},"By executing the HTTP request above, some traces are created in SkyWalking. Let’s go to the SkyWalking UI to check them.",[32,91216,91218],{"id":91217},"dashboard-view","Dashboard view",[48,91220,91221],{},"It shows there are 2 endpoints, 1 service, and 1 MQ.",[48,91223,91224],{},[384,91225],{"alt":91226,"src":91227},"skywalking interface with graph","\u002Fimgs\u002Fblogs\u002F63a2cb38d1dcd60c82797c88_pulsar-skywalking-4.png",[32,91229,91231],{"id":91230},"topology-view","Topology view",[48,91233,91234],{},"It shows that a user sends a request to the web service (that is, your test web application) and the web service sends to and receives messages from a Pulsar broker.",[48,91236,91237],{},[384,91238],{"alt":91239,"src":91240},"skywalking interface with workflow apache","\u002Fimgs\u002Fblogs\u002F63a2cb38a8019d4464159a22_pulsar-skywalking-5.png",[32,91242,91244],{"id":91243},"trace-view","Trace view",[48,91246,91247],{},"It shows the trace details of each request.",[48,91249,91250],{},"Currently, the Pulsar agent plugin has 3 types of spans as below:",[321,91252,91253,91256,91259],{},[324,91254,91255],{},"Producer span, which records messages sent by producers.",[324,91257,91258],{},"Producer callback span, which records messages are already sent.",[324,91260,91261],{},"Consumer span, which records messages are received by consumers.",[48,91263,91264],{},[384,91265],{"alt":91266,"src":91267},"skywalking interface with pulsar demo","\u002Fimgs\u002Fblogs\u002F63a2cb38880d37c30edfd313_pulsar-skywalking-6.png",[40,91269,319],{"id":316},[48,91271,91272],{},"As you can see, SkyWalking UI is pretty cool! 
If you are still worried about how to track Pulsar messages, try this integration of Pulsar and SkyWalking.",[48,91274,91275],{},"Thanks for the SkyWalking community who gives me a lot of help for the integration of Pulsar and SkyWalking.",[48,91277,3931],{},{"title":18,"searchDepth":19,"depth":19,"links":91279},[91280,91281,91282,91283,91284,91285,91290],{"id":33227,"depth":19,"text":33228},{"id":86803,"depth":19,"text":86804},{"id":91067,"depth":19,"text":91068},{"id":91105,"depth":19,"text":91106},{"id":91131,"depth":19,"text":91132},{"id":91201,"depth":19,"text":91202,"children":91286},[91287,91288,91289],{"id":91217,"depth":279,"text":91218},{"id":91230,"depth":279,"text":91231},{"id":91243,"depth":279,"text":91244},{"id":316,"depth":19,"text":319},"2019-10-10","A step-by-step tutorial of using Apache SkyWalking to trace messages in Apache Pulsar.",{},{"title":91041,"description":91292},"blog\u002Fuse-apache-skywalking-to-trace-apache-pulsar-messages",[38442,799,821,16985,8058,26747],"bnlwW6LysXLWDyZyFK2G1Yp_TiDHuCG4PuLi5tLAkA0",{"id":91299,"title":91300,"authors":91301,"body":91302,"category":7338,"createdAt":290,"date":91590,"description":91591,"extension":8,"featured":294,"image":290,"isDraft":294,"link":290,"meta":91592,"navigation":7,"order":296,"path":91593,"readingTime":4475,"relatedResources":290,"seo":91594,"stem":91595,"tags":91596,"__hash__":91597},"blogs\u002Fblog\u002Fstreamnative-open-sourced-and-contributed-apache-pulsar-manager-to-asf.md","StreamNative open sourced and contributed Apache Pulsar Manager to ASF",[73997],{"type":15,"value":91303,"toc":91572},[91304,91307,91310,91314,91317,91320,91329,91332,91336,91339,91343,91346,91350,91353,91359,91363,91366,91372,91376,91379,91385,91389,91392,91398,91402,91405,91411,91415,91418,91424,91428,91431,91437,91441,91444,91450,91454,91457,91468,91474,91478,91481,91504,91508,91511,91514,91517,91553,91555,91561],[48,91305,91306],{},"We’re thrilled to announce that StreamNative open sourced and contributed Apache Pulsar Manager to ASF as a part of Apache Pulsar!",[48,91308,91309],{},"Source code and document have been transferred to GitHub that ties together all of our initiatives with information on why we design it, how to use, contribute and develop, what future plans are, and so on.",[40,91311,91313],{"id":91312},"why-design-apache-pulsar-manager","Why design Apache Pulsar Manager",[48,91315,91316],{},"Apache Pulsar is a next-generation streaming and messaging system designed for scalability, flexibility, and no data loss, and it is a top-level project of Apache Software Foundation.",[48,91318,91319],{},"Currently, Apache Pulsar enjoys rapid growth and development. However, as an infrastructure, it still needs a better and comprehensive ecosystem.",[48,91321,91322,91323,91328],{},"Apache Pulsar has a monitoring tool, which is called ",[55,91324,91327],{"href":91325,"rel":91326},"http:\u002F\u002Fpulsar.apache.org\u002Fdocs\u002Fen\u002Fadministration-dashboard\u002F",[264],"Apache Pulsar Dashboard",", focusing on simple collecting and displaying information of Pulsar, such as show statistics of tenants, namespaces, topics, subscriptions, and so on.",[48,91330,91331],{},"However, Pulsar Dashboard lacks the ability to manage Pulsar, such as add, delete and update tenants, namespaces, topics, and so on. When a cluster is expanded, using the Pulsar Admin tool to manage Pulsar can not satisfy demands. 
Consequently, Pulsar needs a simple and easy-to-use management console for users.",[40,91333,91335],{"id":91334},"what-is-apache-pulsar-manager","What is Apache Pulsar Manager?",[48,91337,91338],{},"Apache Pulsar Manager is a web-based GUI management and monitoring tool that manages tenants, namespaces, topics, subscriptions, brokers, clusters, and supports dynamic configurations of multiple environments.",[40,91340,91342],{"id":91341},"feature-preview-of-apache-pulsar-manager","Feature preview of Apache Pulsar Manager",[48,91344,91345],{},"The following images show feature preview of Apache Pulsar Manager.",[32,91347,91349],{"id":91348},"log-in","Log in",[48,91351,91352],{},"You can use the default account (pulsar) and the default password (pulsar) to log in.",[48,91354,91355],{},[384,91356],{"alt":91357,"src":91358},"pulsar-manager-login","\u002Fimgs\u002Fblogs\u002F63a2c81fea2197dc2c0a33e5_pulsar-manager-login.gif",[32,91360,91362],{"id":91361},"configure-multiple-environments","Configure multiple environments",[48,91364,91365],{},"You can configure dynamic environments with multiple service URLs.",[48,91367,91368],{},[384,91369],{"alt":91370,"src":91371},"pulsar-manager-environments","\u002Fimgs\u002Fblogs\u002F63a2c81f1603a7dec87d42a0_pulsar-manager-environments.gif",[32,91373,91375],{"id":91374},"manage-tenants","Manage tenants",[48,91377,91378],{},"You can add, modify, delete, configure, and perform other operations on tenants.",[48,91380,91381],{},[384,91382],{"alt":91383,"src":91384},"pulsar-manager-tenants","\u002Fimgs\u002Fblogs\u002F63a2c81fecfd4d19d6a58be6_pulsar-manager-tenants.gif",[32,91386,91388],{"id":91387},"manage-namespaces","Manage namespaces",[48,91390,91391],{},"You can add and delete namespaces, modify namespace policies, and perform other operations on namespaces.",[48,91393,91394],{},[384,91395],{"alt":91396,"src":91397},"pulsar-manager-namespaces","\u002Fimgs\u002Fblogs\u002F63a2c81f4b652c02d1d47d1b_pulsar-manager-namespaces.gif",[32,91399,91401],{"id":91400},"manage-topics","Manage topics",[48,91403,91404],{},"You can add, delete, offload, and perform other operations on partitioned topics, non-partitioned topics, persistent topics, non-persistent topics, and so on.",[48,91406,91407],{},[384,91408],{"alt":91409,"src":91410},"pulsar-manager-topics","\u002Fimgs\u002Fblogs\u002F63a2c81f8f2052591b9f25d6_pulsar-manager-topics.gif",[32,91412,91414],{"id":91413},"manage-subscriptions","Manage subscriptions",[48,91416,91417],{},"You can skip, expire, clear, reset, and perform other operations on subscriptions.",[48,91419,91420],{},[384,91421],{"alt":91422,"src":91423},"pulsar-manager-subscriptions","\u002Fimgs\u002Fblogs\u002F63a2c81f772535d05aa91771_pulsar-manager-subscriptions.gif",[32,91425,91427],{"id":91426},"manage-clusters","Manage clusters",[48,91429,91430],{},"You can view, configure, and perform other operations on clusters.",[48,91432,91433],{},[384,91434],{"alt":91435,"src":91436},"pulsar-manager-clusters","\u002Fimgs\u002Fblogs\u002F63a2c820c5e6197cfba99156_pulsar-manager-clusters.gif",[32,91438,91440],{"id":91439},"manage-brokers","Manage brokers",[48,91442,91443],{},"You can view, configure, and run health checks on brokers.",[48,91445,91446],{},[384,91447],{"alt":91448,"src":91449},"pulsar-manager-brokers","\u002Fimgs\u002Fblogs\u002F63a2c8204b652ca2b6d47d1c_pulsar-manager-brokers.gif",[32,91451,91453],{"id":91452},"monitor-topics-and-subscriptions","Monitor topics and subscriptions",[48,91455,91456],{},"The figure below 
shows:",[321,91458,91459,91462,91465],{},[324,91460,91461],{},"One non-partitioned topic (data-technology)",[324,91463,91464],{},"Two partitioned topics (data-export-to-db and data-import-from-db) Partitioned topics are divided into two dimensions, and you can see the subscriptions of each topic and each subscription belongs to which topic(s).",[324,91466,91467],{},"Statistics, such as the number of messages sent and received per second, throughput and storage used per second, and so on.",[48,91469,91470],{},[384,91471],{"alt":91472,"src":91473},"pulsar-manager-topics-monitors","\u002Fimgs\u002Fblogs\u002F63a2c820a4e6a6d8759626cb_pulsar-manager-topics-monitors.gif",[40,91475,91477],{"id":91476},"future-plan-of-apache-pulsar-manager","Future plan of Apache Pulsar Manager",[48,91479,91480],{},"An effective management tool is a must-have for Apache Pulsar, so we plan to add the following features in the next release.",[321,91482,91483,91486,91489,91492,91495,91498,91501],{},[324,91484,91485],{},"Support authentication and authorization",[324,91487,91488],{},"Support schema management",[324,91490,91491],{},"Support function management",[324,91493,91494],{},"Support connector management",[324,91496,91497],{},"Support bookie management",[324,91499,91500],{},"Support peek messages, including single and batch",[324,91502,91503],{},"Optimize backend, including querying data, paging and filtering results",[40,91505,91507],{"id":91506},"get-involved-in-apache-pulsar-community","Get involved in Apache Pulsar community",[48,91509,91510],{},"Alongside this announcement, we have not only published various documents describing R&D details but also shared blogs clarifying the development progress and detailing our intent to contribute Apache Pulsar Manager to Apache Pulsar community back. We’ve been long-time contributors to Apache Pulsar and Apache BookKeeper, and share the best practices we developed, push the industry forward and enjoy being a part of the open-source community.",[48,91512,91513],{},"And now, we encourage everyone to participate in the development of Apache Pulsar Manager and welcome any contributions including code and documentation. 
Through this project, you can learn from Apache Pulsar Manager’s full development lifecycle, as well as reuse the code to build your own experiences.",[48,91515,91516],{},"To get started, check out the Apache Pulsar Manager project on GitHub:",[321,91518,91519,91525,91532,91539,91546],{},[324,91520,91521],{},[55,91522,91524],{"href":78358,"rel":91523},[264],"Readme",[324,91526,91527],{},[55,91528,91531],{"href":91529,"rel":91530},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fwiki\u002FPIP-40%3A-Pulsar-Manager",[264],"Design proposal",[324,91533,91534],{},[55,91535,91538],{"href":91536,"rel":91537},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar-manager\u002Fblob\u002Fmaster\u002FCONTRIBUTING.md",[264],"Contribution guide",[324,91540,91541],{},[55,91542,91545],{"href":91543,"rel":91544},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar-manager\u002Fblob\u002Fmaster\u002Fdocs\u002Fdeveloper-guide.md",[264],"Developer's guide",[324,91547,91548],{},[55,91549,91552],{"href":91550,"rel":91551},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar-manager\u002Fissues\u002F152",[264],"Roadmap",[40,91554,78580],{"id":78579},[48,91556,91557,91558,190],{},"If you want to have real-time discussions with developers on Pulsar issues, join ",[55,91559,57762],{"href":57760,"rel":91560},[264],[48,91562,91563,91564,1154,91569,190],{},"If you are interested in Pulsar user stories on production, Pulsar development details, Pulsar community news, and so on, follow ",[55,91565,91568],{"href":91566,"rel":91567},"https:\u002F\u002Fmedium.com\u002Fstreamnative",[264],"StreamNative on Medium",[55,91570,84120],{"href":33664,"rel":91571},[264],{"title":18,"searchDepth":19,"depth":19,"links":91573},[91574,91575,91576,91587,91588,91589],{"id":91312,"depth":19,"text":91313},{"id":91334,"depth":19,"text":91335},{"id":91341,"depth":19,"text":91342,"children":91577},[91578,91579,91580,91581,91582,91583,91584,91585,91586],{"id":91348,"depth":279,"text":91349},{"id":91361,"depth":279,"text":91362},{"id":91374,"depth":279,"text":91375},{"id":91387,"depth":279,"text":91388},{"id":91400,"depth":279,"text":91401},{"id":91413,"depth":279,"text":91414},{"id":91426,"depth":279,"text":91427},{"id":91439,"depth":279,"text":91440},{"id":91452,"depth":279,"text":91453},{"id":91476,"depth":19,"text":91477},{"id":91506,"depth":19,"text":91507},{"id":78579,"depth":19,"text":78580},"2019-09-24","StreamNative open sourced and contributed Pulsar Manager - a React based pulsar management console - to the ASF.",{},"\u002Fblog\u002Fstreamnative-open-sourced-and-contributed-apache-pulsar-manager-to-asf",{"title":91300,"description":91591},"blog\u002Fstreamnative-open-sourced-and-contributed-apache-pulsar-manager-to-asf",[302,821],"Zo-NaF-awbzgpNJHCjVSPMGWwqo4ielQlynkXpKMnlc",{"id":91599,"title":91600,"authors":91601,"body":91603,"category":821,"createdAt":290,"date":91781,"description":91782,"extension":8,"featured":294,"image":290,"isDraft":294,"link":290,"meta":91783,"navigation":7,"order":296,"path":91784,"readingTime":11180,"relatedResources":290,"seo":91785,"stem":91786,"tags":91787,"__hash__":91788},"blogs\u002Fblog\u002Fapache-pulsar-adoption-story-in-actorcloud-iot-platform.md","Apache Pulsar Adoption Story in ActorCloud (IoT Platform)",[91602],"Rocky 
Jin",{"type":15,"value":91604,"toc":91771},[91605,91607,91610,91613,91619,91623,91626,91632,91636,91639,91650,91653,91657,91660,91666,91672,91676,91679,91685,91702,91706,91709,91717,91723,91726,91758,91760,91763,91769],[40,91606,19156],{"id":19155},[48,91608,91609],{},"EMQ is an open-source software company providing highly-scalable, real-time messaging and streaming engine for IoT platforms & applications in the 5G era. Currently, EMQ is one of the most widely used MQTT message brokers in the world and has successfully supported various global clients, including HPE, Ericsson, Huawei, China Mobile, China UnionPay, and so on.",[48,91611,91612],{},"ActorCloud is an open-source IoT platform launched by EMQ, which provides multiple protocol access, message flow management, data parsing, and data processing capabilities for devices on a secure and reliable basis. ActorCloud uses Apache Pulsar to store and process streaming data, leverages Apache Pulsar Functions to handle data faster and analyzes IoT data through the SQL engine exposed to the upper layer.",[48,91614,91615],{},[384,91616],{"alt":91617,"src":91618},"graph MQTT Broker Cluster","\u002Fimgs\u002Fblogs\u002F63a2c6080f50d1de904d95c3_emq-2.png",[40,91620,91622],{"id":91621},"problem","Problem",[48,91624,91625],{},"As an IoT platform, ActorCloud needs to access data, manage devices, store and analyze data, and provide programming interfaces to the upper layer so that developers can develop applications conveniently. Since ActorCloud has plenty of devices and large amounts of data, it needs the ability to scale out horizontally to meet business needs.",[48,91627,91628],{},[384,91629],{"alt":91630,"src":91631},"illustration of scale out","\u002Fimgs\u002Fblogs\u002F63a2c6082cf67d1e32fb42fa_emq-3.png",[40,91633,91635],{"id":91634},"why-pulsar-fits-best","Why Pulsar fits best",[48,91637,91638],{},"To solve the problem stated previously, we need a highly available, distributed, and scalable messaging platform, which is Apache Pulsar.",[321,91640,91641,91644,91647],{},[324,91642,91643],{},"Apache Pulsar is a highly available and scalable messaging platform with easy deployment and maintenance.",[324,91645,91646],{},"Apache Pulsar achieves high throughput with 1.8M messages per second for a partition, which fully satisfies our needs for a large volume of data.",[324,91648,91649],{},"Apache Pulsar Functions are lightweight compute processes that consume messages from one or more Pulsar topics, apply user-supplied processing logics to each message, and publish the results of the computation to another topic. Apache Pulsar Functions support three kinds of runtimes: thread, process, and Kubernetes, which provides high flexibility for writing, running, and deploying Functions. Consequently, we need only focus on computation logic rather than dealing with complicated configuration and management, which helps us build a streaming platform easily.",[48,91651,91652],{},"With the highly available and scalable ability, Functions, and connectors, Pulsar helps us develop ActorCloud faster so that we select Pulsar as our messaging platform finally.",[40,91654,91656],{"id":91655},"how-we-use-pulsar-at-actorcloud","How we use Pulsar at ActorCloud",[48,91658,91659],{},"ActorCloud transfers business logic written in SQL to an engine through API and translate business rules to connectors and Functions in Pulsar. 
Sources consume the data from EMQ X Brokers through shared subscriptions, then Pulsar persists these data and processes them with Functions in real time, and sends them to external systems through sinks.",[8325,91661,91664],{"className":91662,"code":91663,"language":8330},[8328],"\n{ \n        \"id\": \"mailTest\",\n        \"sql\": \"SELECT temp FROM sensor WHERE temp > 0\",\n        \"enabled\": true,\n        \"actions\": [{\n            \"mail\": {\n                \"title\": \"temperature warning\",\n                \"content\": \"temperature is ${temp} degrees, please take action promptly\",\n                \"emails\": [ \"alert@emqx.io\" ]\n            }\n        }]\n    }\n\n",[4926,91665,91663],{"__ignoreMap":18},[48,91667,91668],{},[384,91669],{"alt":91670,"src":91671},"graph of EMQ and external systems with pulsar","\u002Fimgs\u002Fblogs\u002F63a2c6348b65c310ffce96d2_emq-4.png",[32,91673,91675],{"id":91674},"how-we-use-pulsar-functions-at-actorcloud","How we use Pulsar Functions at ActorCloud",[48,91677,91678],{},"Apache Pulsar provides native support for serverless functions where data is processed as soon as it arrives in a streaming fashion and gives flexible deployment options (thread, process, container). We need only focus on computation logic rather than dealing with complicated configuration or management, which helps us build a streaming platform faster and conveniently.",[48,91680,91681],{},[384,91682],{"alt":91683,"src":91684},"illustration to explain pulsar function","\u002Fimgs\u002Fblogs\u002F63a2c63477253501f3a7c055_emq-5.png",[321,91686,91687,91690,91693,91696,91699],{},[324,91688,91689],{},"To better support batch and stream processing scenarios, ActorCloud uses Pulsar window functions. Currently, Pulsar supports the count-based window and the time-based window.",[324,91691,91692],{},"ActorCloud uses Pulsar Functions API and Pulsar admin tool (create, delete, update, restart, stop, get, and so on) to manipulate functions, which simplifies the deployment and management.",[324,91694,91695],{},"ActorCloud uses Pulsar's shared subscription mode to extend the ability of data consumption. Besides, Pulsar supports exclusive, failover, and key_shared subscription modes.",[324,91697,91698],{},"Pulsar Functions provides three messaging semantics, that is, at-most-once delivery, at-least-once delivery, and effectively-once delivery. ActorCloud uses the at-least-once delivery to ensure each message sent to a function is processed at least once.",[324,91700,91701],{},"Pulsar deploys a multi-layer architecture of separating computation and storage. When storing data, ActorCloud configures message retention policies to select data retention periods. At the same time, Pulsar integrates with Presto SQL, allowing users to use Presto SQL to query data stored in BookKeeper. 
Consequently, ActorCloud uses Presto SQL to query real-time and historical data to deal with analytical tasks.",[32,91703,91705],{"id":91704},"how-we-use-pulsar-io-connectors-at-actorcloud","How we use Pulsar IO connectors at ActorCloud",[48,91707,91708],{},"Pulsar IO connectors come in two types:",[321,91710,91711,91714],{},[324,91712,91713],{},"Sources feed data into Pulsar from other systems, and common sources include other messaging systems and firehose-style data pipeline APIs.",[324,91715,91716],{},"Sinks are fed data from Pulsar, and common sinks include other messaging systems, SQL and NoSQL databases.",[48,91718,91719],{},[384,91720],{"alt":91721,"src":91722},"illustration pulsar IO connectors","\u002Fimgs\u002Fblogs\u002F63a2c6355623a84f02ca209b_emq-6.png",[48,91724,91725],{},"ActorCloud takes full advantage of Pulsar IO connectors and creates various sources and sinks to meet different needs.",[1666,91727,91728,91731,91734,91737,91740,91743,91746,91749,91752,91755],{},[324,91729,91730],{},"EMQ source",[324,91732,91733],{},"Read data from EMQ and write data to Pulsar topics (sync data from EMQ with Pulsar).",[324,91735,91736],{},"Mail sink",[324,91738,91739],{},"Receive data from Pulsar topics and send emails.",[324,91741,91742],{},"Publish sink",[324,91744,91745],{},"Receive data from Pulsar topics and send data to external systems using initialized HttpClient.",[324,91747,91748],{},"DB sink",[324,91750,91751],{},"Receive data from Pulsar topics and send data to external systems. DB sink encapsulates JDBC and supports sending data from Pulsar topics to SQLite, MySQL, and PostgreSQL.",[324,91753,91754],{},"EMQ sink",[324,91756,91757],{},"Receive data from Pulsar topics and send data to EMQ X topics.",[40,91759,319],{"id":316},[48,91761,91762],{},"With both EMQ X and Apache Pulsar, ActorCloud implements IoT device data access, device management, data storage, data analysis, and provides a flexible programming interface to develop IoT applications that meet specific needs of IoT vertical industries, and it enables horizontal expansion of device access and data processing.",[48,91764,91765],{},[384,91766],{"alt":91767,"src":91768},"illustration actor cloud EMQ","\u002Fimgs\u002Fblogs\u002F63a2c63548a256b4d588b171_emq-7.png",[48,91770,3931],{},{"title":18,"searchDepth":19,"depth":19,"links":91772},[91773,91774,91775,91776,91780],{"id":19155,"depth":19,"text":19156},{"id":91621,"depth":19,"text":91622},{"id":91634,"depth":19,"text":91635},{"id":91655,"depth":19,"text":91656,"children":91777},[91778,91779],{"id":91674,"depth":279,"text":91675},{"id":91704,"depth":279,"text":91705},{"id":316,"depth":19,"text":319},"2019-09-09","An inside look at why ActorCloud chooses Apache Pulsar over other messaging systems.",{},"\u002Fblog\u002Fapache-pulsar-adoption-story-in-actorcloud-iot-platform",{"title":91600,"description":91782},"blog\u002Fapache-pulsar-adoption-story-in-actorcloud-iot-platform",[35559,821,51871,303],"NNt0_6SXRBuetm5ASoQoGxLgUlACiQ7nR8weARjDsnQ",{"id":91790,"title":91791,"authors":91792,"body":91794,"category":821,"createdAt":290,"date":91948,"description":91949,"extension":8,"featured":294,"image":290,"isDraft":294,"link":290,"meta":91950,"navigation":7,"order":296,"path":91951,"readingTime":39247,"relatedResources":290,"seo":91952,"stem":91953,"tags":91954,"__hash__":91955},"blogs\u002Fblog\u002Fuse-apache-pulsar-as-streaming-table-with-8-lines-of-code.md","Use Apache Pulsar as Streaming Table with 8 Lines of Code",[91793],"Yijie 
Shen",{"type":15,"value":91795,"toc":91936},[91796,91800,91803,91806,91814,91817,91821,91824,91835,91838,91844,91847,91858,91861,91867,91871,91874,91877,91880,91883,91887,91890,91894,91897,91901,91904,91908,91911,91915,91918,91920],[40,91797,91799],{"id":91798},"why-redesign-pulsar-flink-connector","Why redesign Pulsar Flink connector",[48,91801,91802],{},"In our previous post, we presented Apache Pulsar as a cloud-native messaging system that is designed for streaming performance and scalability, analyzed the integration status with Apache Flink 1.6, and looked forward to possible future integrations.",[48,91804,91805],{},"Recently, as Apache Flink just released version 1.9.0 with many notable features, we've reconsidered the previous integration and decided to redesign it from the ground up. In the rework, we followed these two principles:",[321,91807,91808,91811],{},[324,91809,91810],{},"Regard Table API as first-class citizens, ease its use without compromising expressiveness.",[324,91812,91813],{},"Resilient to failures with exactly-once source and at-least-once sink.",[48,91815,91816],{},"In the next sections, we would present the use and the design of the new Pulsar Flink connector.",[40,91818,91820],{"id":91819},"register-a-pulsar-table-with-a-minimum-number-of-taps","Register a Pulsar table with a minimum number of taps",[48,91822,91823],{},"For messages sent to Pulsar, we know everything about them:",[321,91825,91826,91829,91832],{},[324,91827,91828],{},"The schema of the data, whether it is a primitive type or a record with multiple fields.",[324,91830,91831],{},"The storage format of messages in Pulsar, whether it's AVRO, JSON or Protobuf.",[324,91833,91834],{},"Your desired metadata, such as event time and publish time.",[48,91836,91837],{},"Therefore, users are supposed to concern less on these storage details and concentrate more on their business logic. A Pulsar streaming table could be instantly composed:",[8325,91839,91842],{"className":91840,"code":91841,"language":8330},[8328],"\nval prop = new Properties()\n    prop.setProperty(\"service.url\", \"pulsar:\u002F\u002F....\")\n    prop.setProperty(\"admin.urrl\", \"http:\u002F\u002F....\")\n    prop.setProperty(\"topicsPattern\", \"topic-*\")\n    tableEnv\n        .connect(new Pulsar().properties(prop))\n        .inAppendMode()\n        .registerTableSource(\"table1\")\n\n",[4926,91843,91841],{"__ignoreMap":18},[48,91845,91846],{},"From now on, you could print the schema of the table1, select desired fields, and build analysis based on the table1. Behind the scenes, we do several tedious works for you:",[321,91848,91849,91852,91855],{},[324,91850,91851],{},"Find all matching topics currently available and keep an eye on any changes while the streaming job is running.",[324,91853,91854],{},"Fetch schemas for each topic, make sure they all share one same schema; otherwise, it is meaningless to go on with analytics.",[324,91856,91857],{},"Build a scheme-specific deserializer on the initializing phase for each read thread. 
The deserializer knows the format of messages and converts to one Flink Row for each message.",[48,91859,91860],{},"The figure below provides some implementation details of a source task:",[48,91862,91863],{},[384,91864],{"alt":91865,"src":91866},"Checkpoint illustration","\u002Fimgs\u002Fblogs\u002F63a2c3abecfd4d29b4a2359e_pulsar-flink-2.png",[40,91868,91870],{"id":91869},"at-least-once-pulsar-sink","At-least-once Pulsar sink",[48,91872,91873],{},"When you send a message to Pulsar using sendAsync, your message will be buffered in a pendingMessages queue, and you will get a CompletableFuture handle. You can register a callback with the handle and get notified once the sending is complete. Another Pulsar producer API flush sends all messages buffered in the client directly and wait until all messages have been successfully persisted.",[48,91875,91876],{},"We use these two APIs in our Pulsar sink implementation to guarantee its at-least-once semantic. For each record we receive in the sink, we send it to Pulsar with sendAsync and maintain a count pendingRecords that has not been persistent.",[48,91878,91879],{},"On each checkpoint, we call flush() manually and wait for message acknowledgments from Pulsar brokers. The checkpoint is considered complete when we get all acknowledgments and pendingRecords decreases to 0, and the checkpoint is regarded as a failure if an exception occurs while persisting messages.",[48,91881,91882],{},"By default, a failing checkpoint in Flink causes an exception that results in an application restart; therefore, messages are guaranteed to be persisted at least once.",[40,91884,91886],{"id":91885},"future-directions","Future directions",[48,91888,91889],{},"We have the following plans under our belts.",[32,91891,91893],{"id":91892},"unified-source-api-for-both-batch-and-streaming-execution","Unified source API for both batch and streaming execution",[48,91895,91896],{},"FLIP-27 is brought up again recently since Flink community starts to prepare its 1.10 features. We would stay tuned to its status and bring our new connector to batch\u002Fstreaming compatible.",[32,91898,91900],{"id":91899},"pulsar-as-a-state-backend","Pulsar as a state backend",[48,91902,91903],{},"Since Pulsar has a layered architecture (Streams and Segmented Streams, powered by Apache Bookkeeper), it becomes natural to use Pulsar as a storage layer and store Flink state.",[32,91905,91907],{"id":91906},"scale-out-source-parallelism","Scale-out source parallelism",[48,91909,91910],{},"Currently, source parallelism has an upper limit to the number of topic partitions. For the upcoming Pulsar 2.5.0, key-shared subscription and sticky consumer enables us to scale-out source parallelism while maintaining the semantics of exact-once.",[32,91912,91914],{"id":91913},"end-to-end-exactly-once","End-to-end exactly-once",[48,91916,91917],{},"One of the vital features in Pulsar 2.5.0 is transaction support. 
Once transactional produce is enabled, we could achieve end-to-end exactly-once with Flink two-phase commit sink.",[40,91919,78580],{"id":78579},[321,91921,91922,91929],{},[324,91923,91924],{},[55,91925,91928],{"href":91926,"rel":91927},"https:\u002F\u002Fflink.apache.org\u002F2019\u002F05\u002F03\u002Fpulsar-flink.html",[264],"When Flink & Pulsar Come Together",[324,91930,91931],{},[55,91932,91935],{"href":91933,"rel":91934},"https:\u002F\u002Fcwiki.apache.org\u002Fconfluence\u002Fdisplay\u002FFLINK\u002FFLIP-27:+Refactor+Source+Interface",[264],"Refactor Source Interface",{"title":18,"searchDepth":19,"depth":19,"links":91937},[91938,91939,91940,91941,91947],{"id":91798,"depth":19,"text":91799},{"id":91819,"depth":19,"text":91820},{"id":91869,"depth":19,"text":91870},{"id":91885,"depth":19,"text":91886,"children":91942},[91943,91944,91945,91946],{"id":91892,"depth":279,"text":91893},{"id":91899,"depth":279,"text":91900},{"id":91906,"depth":279,"text":91907},{"id":91913,"depth":279,"text":91914},{"id":78579,"depth":19,"text":78580},"2019-08-28","Learn the latest updates of integration between Apache Pulsar and Apache Flink.",{},"\u002Fblog\u002Fuse-apache-pulsar-as-streaming-table-with-8-lines-of-code",{"title":91791,"description":91949},"blog\u002Fuse-apache-pulsar-as-streaming-table-with-8-lines-of-code",[28572,821,8057],"jU64mSnxUl4QiA80MF73qVsm962ACwr5KG4_V5W0BzA",{"id":91957,"title":91958,"authors":91959,"body":91961,"category":821,"createdAt":290,"date":92173,"description":92174,"extension":8,"featured":294,"image":290,"isDraft":294,"link":290,"meta":92175,"navigation":7,"order":296,"path":92176,"readingTime":23092,"relatedResources":290,"seo":92177,"stem":92178,"tags":92179,"__hash__":92180},"blogs\u002Fblog\u002Fgetui-push-notification-system.md","Build a Priority-based Push Notification System Using Apache Pulsar at GeTui",[91960],"Zi Xiang",{"type":15,"value":91962,"toc":92162},[91963,91965,91973,91976,91979,91985,91988,91991,91995,91997,92000,92003,92009,92012,92016,92019,92024,92030,92035,92038,92040,92043,92046,92059,92063,92066,92069,92089,92095,92099,92102,92146,92148,92151,92154,92157,92160],[40,91964,19156],{"id":19155},[48,91966,91967,91972],{},[55,91968,91971],{"href":91969,"rel":91970},"https:\u002F\u002Fwww.getui.com\u002F",[264],"GeTui"," is one of the largest third-party push notification service providers in China. It helps mobile application developers set up and send notifications to users across iOS, Android, and other platforms, by leveraging data-driven analysis on user profiles.",[48,91974,91975],{},"Since 2010, GeTui has successfully supported over hundreds of thousands of applications and billions of users, including DiDi, JD.com, Weibo, NetEase, People's Daily, Xinhua News Agency, CCTV, and so on.",[48,91977,91978],{},"As a notification service provider, the message queuing system plays an extremely significant role within GeTui.",[48,91980,91981],{},[384,91982],{"alt":91983,"src":91984},"illustration of a cluster and phone SDK","\u002Fimgs\u002Fblogs\u002F63a2c243f76c82bde7d3f1cc_getui-overview.png",[48,91986,91987],{},"Figure 1 illustrates the overview of GeTui push notification service. When a GeTui customer needs to send push notifications to its end users, it first sends messages to GeTui's push notification service. The push notifications are queued in the service based on their priorities.",[48,91989,91990],{},"However, resource contention increases when the number of push notifications waiting in message queues increases. 
It drives demands for a priority-based push notification design because we need to allocate more resources to customers with high priorities.",[40,91992,91994],{"id":91993},"kafka-solution","Kafka solution",[32,91996,33228],{"id":33227},[48,91998,91999],{},"Our first priority-based push notification solution was implemented by using Apache Kafka.",[48,92001,92002],{},"Kafka is a high-performance distributed streaming platform developed by LinkedIn, which is also widely used within GeTui, from log aggregation to online and offline message distribution, and many other use cases.",[48,92004,92005],{},[384,92006],{"alt":92007,"src":92008},"illustration of priority queue","\u002Fimgs\u002Fblogs\u002F63a2c2435623a8a798c6a74a_getui-queue.png",[48,92010,92011],{},"In this solution, we set the priority of messages into three levels: high, normal, and low. Messages of each priority are stored in a group of topics. The push notification tasks are sent to different topics based on their priorities. Downstream consumers receive messages based on their priorities. The push notification tasks with the same priority are polled in a round-robin way. It guarantees push notifications with higher priorities can be sent as early as possible, and push notifications with low priority can be eventually sent as well.",[32,92013,92015],{"id":92014},"problems","Problems",[48,92017,92018],{},"When the business grows and the number of applications using our service increases, the Kafka solution ran into problems as below:",[321,92020,92021],{},[324,92022,92023],{},"For customers with the same priority levels, their notification tasks pushed at the same time becomes more and more. Later tasks (taskN in the image below) are delayed due to earlier tasks (task1, task2, task3 in the image below) are waiting to be processed. If task1 has a high volume of messages, then taskN will wait until task1 is finished.",[48,92025,92026],{},[384,92027],{"alt":92028,"src":92029},"illustration og problems resolved by kafka sollution","\u002Fimgs\u002Fblogs\u002F63a2c243ecfd4dcfcda158f6_getui-task.png",[321,92031,92032],{},[324,92033,92034],{},"When the number of topics increases from 64 to 256, the throughput of Kafka degrades sharply. Since in Kafka, each topic and partition are stored as one or a few physical files, when the number of topics increases, random IO access introduces lots of IO contentions and consumes lots of I\u002FO resources. Hence, we can not solve the first problem by just increasing the number of topics.",[48,92036,92037],{},"To solve the problems stated previously, we need to evaluate another messaging system that supports a large number of topics while maintaining as high throughput as Kafka. After doing some investigations, Apache Pulsar catches our attention.",[40,92039,91635],{"id":91634},[48,92041,92042],{},"Apache Pulsar is a next-generation distributed messaging system developed at Yahoo, it was developed from the ground up to address several shortcomings of existing open-source messaging systems and has been running in Yahoo's production for three years, powering critical applications like Mail, Finance, Sports, Flickr, the Gemini Ads Platform, and Sherpa (Yahoo's distributed key-value store). 
Besides, Pulsar was open-sourced in 2016 and graduated from the Apache incubator as an Apache top-level project (TLP) in September 2018.",[48,92044,92045],{},"After working closely with the Pulsar community and diving deeper into Pulsar, we decided to adopt Pulsar for the new priority-based push notification solution for the following reasons:",[321,92047,92048,92051,92054,92056],{},[324,92049,92050],{},"Pulsar can scale to millions of topics with high performance, and its segment-based architecture delivers better scalability.",[324,92052,92053],{},"Pulsar provides a simple and flexible messaging model that unifies queuing and streaming, so it can be used for both work queue and pub-sub messaging use cases.",[324,92055,50984],{},[324,92057,92058],{},"Pulsar provides an excellent I\u002FO isolation, which is suitable for both messaging and streaming workloads.",[40,92060,92062],{"id":92061},"pulsar-solution","Pulsar solution",[48,92064,92065],{},"After extensive discussions, we settled down a new solution using Apache Pulsar.",[48,92067,92068],{},"The Pulsar solution is close to the Kafka solution, but it solves the problems we encountered in Kafka by leveraging Pulsar's advantages.",[321,92070,92071,92074,92077,92080,92083,92086],{},[324,92072,92073],{},"In Pulsar solution, we create topics dynamically based on tasks. It guarantees later tasks do not wait due to other tasks are waiting to be processed in a queue.",[324,92075,92076],{},"We create a Pulsar topic for each task with normal-level and high-level priorities and create a fixed number of topics for tasks with low-level priority.",[324,92078,92079],{},"Tasks with the same priorities are polling the topic to read messages, when the quotas are filled up, tasks with the next same priorities move to read messages in the next priority level.",[324,92081,92082],{},"Tasks with the same priority can modify quota, which guarantees they can receive more messages.",[324,92084,92085],{},"Consumers can be added and deleted dynamically using Pulsar's shared subscription without the need to increase and rebalance partitions.",[324,92087,92088],{},"BookKeeper provides the flexibility of adding storage resources online and without rebalancing the old partitions.",[48,92090,92091],{},[384,92092],{"alt":92093,"src":92094},"illustration of pulsar system solution","\u002Fimgs\u002Fblogs\u002F63a2c2434269eefc3563dfc4_getui-pulsar-solution.png",[40,92096,92098],{"id":92097},"best-practice-of-pulsar","Best practice of Pulsar",[48,92100,92101],{},"Pulsar has been successfully running on production for months serving the new priority-based push notification system. During the whole process of adopting and running Pulsar on production, we have collected some best practices on how to make Pulsar work smoothly and efficiently on our production.",[321,92103,92104,92107,92110,92113,92116,92119,92122,92125,92128,92131,92134,92137,92140,92143],{},[324,92105,92106],{},"Different subscriptions are relatively independent. If you want to consume some messages of a topic repeatedly, you need to use different subscriptionName to subscribe. Monitor your backlog when adding new subscriptions. Pulsar uses a subscription-based retention mechanism. If you have an unused subscription, please remove it; otherwise, your backlog will keep growing.",[324,92108,92109],{},"If a topic is not subscribed, messages sent to the topic are dropped by default. 
Consequently, if producers send messages to topics first, and then consumers receive the messages later, you need to make sure the subscription has been created before producers sending messages to topics; otherwise some messages will not be consumed.",[324,92111,92112],{},"If no producers send messages to a topic, or no consumers subscribe to a topic, then the topic is deleted after a period. You can disable this behavior by setting brokerDeleteInactiveTopicsEnabled to false.",[324,92114,92115],{},"TTL and other policies are applied to a whole namespace rather than a topic.",[324,92117,92118],{},"By default, Pulsar stores metadata under root znode of ZooKeeper. It is recommended to configure the Pulsar cluster with a prefix zookeeper path.",[324,92120,92121],{},"Pulsar's Java API is different from Kafka's, that is, messages need to be explicitly acknowledged in Pulsar.",[324,92123,92124],{},"The storage size displayed in Pulsar dashboard is different from the storage size shown in Prometheus. The storage size shown in Prometheus is the total physical storage size, including all replicas.",[324,92126,92127],{},"Increase dbStorage_rocksDB_blockCacheSize to prevent slow-down in reading large volume of backlog.",[324,92129,92130],{},"More partitions lead to higher throughputs.",[324,92132,92133],{},"Use stats and stats-internal to retrieve topic statistics when troubleshooting a problem in your production cluster.",[324,92135,92136],{},"The default backlogQuotaDefaultLimitGB in Pulsar is 10 GB. If you are using Pulsar to store messages for multiple days, it is recommended to increase the amount or set a large quota for your namespaces. Choose a proper backlogQuotaDefaultRetentionPolicy for your use case because the default policy is producer_request_hold, which rejects produce requests when you exhaust the quota.",[324,92138,92139],{},"Set the backlog quota based on your use case.",[324,92141,92142],{},"Since Pulsar reads and dispatches messages in the broker's cache directly, the read time metrics of BookKeeper in Prometheus can be null at most of the time.",[324,92144,92145],{},"Pulsar writes messages to journal files and writes cache synchronously, and the write cache is flushed back to log files and RocksDB asynchronously. It is recommended to use SSD for storing journal files.",[40,92147,319],{"id":316},[48,92149,92150],{},"We have successfully run the new Pulsar based solution on production for some use cases for a few months. Pulsar has shown great stability. 
We keep watching the news, updates, and activities in the Pulsar community and leverage the new features for our use cases.",[48,92152,92153],{},"Graduated from the ASF incubator as a top-level project in 2018, Pulsar has plenty of attractive features and advantages over competitors, such as geo-replication, multi-tenancy, seamless cluster expansion, read-write separation, and so on.",[48,92155,92156],{},"The Pulsar community is still young, but there is already a fast-growing tendency of adopting Pulsar for replacing many legacy messaging systems.",[48,92158,92159],{},"During the process of adopting and running Pulsar, we run into a few problems, and a huge thank you goes to Jia Zhai and Sijie Guo from StreamNative for providing quality support.",[48,92161,3931],{},{"title":18,"searchDepth":19,"depth":19,"links":92163},[92164,92165,92169,92170,92171,92172],{"id":19155,"depth":19,"text":19156},{"id":91993,"depth":19,"text":91994,"children":92166},[92167,92168],{"id":33227,"depth":279,"text":33228},{"id":92014,"depth":279,"text":92015},{"id":91634,"depth":19,"text":91635},{"id":92061,"depth":19,"text":92062},{"id":92097,"depth":19,"text":92098},{"id":316,"depth":19,"text":319},"2019-07-23","Learn why GeTui chooses Pulsar over Kafka as its messaging system for push notification needs and the best practice of running Pulsar on production.",{},"\u002Fblog\u002Fgetui-push-notification-system",{"title":91958,"description":92174},"blog\u002Fgetui-push-notification-system",[35559,821],"RS9aF7oZ8QeoXsrn9VjQ_jIoAh0Keya-WJ_hKDNi_A0",{"id":92182,"title":92183,"authors":92184,"body":92185,"category":821,"createdAt":290,"date":92485,"description":92486,"extension":8,"featured":294,"image":92487,"isDraft":294,"link":290,"meta":92488,"navigation":7,"order":296,"path":92489,"readingTime":38438,"relatedResources":290,"seo":92490,"stem":92491,"tags":92492,"__hash__":92493},"blogs\u002Fblog\u002Fone-storage-system-both-real-time-historical-data-analysis-apache-pulsar-story.md","One Storage System for both Real-time and Historical Data Analysis - An Apache Pulsar story",[91793],{"type":15,"value":92186,"toc":92471},[92187,92191,92194,92197,92200,92203,92211,92215,92218,92232,92235,92237,92240,92244,92247,92251,92256,92259,92262,92265,92269,92272,92276,92279,92284,92290,92295,92301,92306,92312,92317,92323,92327,92330,92334,92340,92343,92372,92376,92379,92385,92391,92394,92397,92411,92415,92418,92420,92446,92448,92469],[40,92188,92190],{"id":92189},"the-state-of-the-art-real-time-data-storage-and-processing-approach","The state-of-the-art real-time data storage and processing approach",[48,92192,92193],{},"In the field of massively parallel data analysis, AMPLab's \"One stack to rule them all\" proposes to use Apache Spark as a unified engine to support all commonly used data processing scenarios, such as batch processing, stream processing, interactive query, and machine learning. Structured streaming is the new Apache Spark API released in Spark 2.2.0 that lets you express computation on streaming data in the same way you express a batch computation on static data, and Spark SQL engine performs a wide range of optimizations for both scenarios internally.",[48,92195,92196],{},"On the other hand, Apache Flink get in the public eye around 2016 with many appealing features, for example, better stream processing support at the time, the built-in watermark support, and exactly-once semantics. Flink has quickly become a strong competitor for Spark. 
Regardless of the platform, users nowadays are more concerned about how to quickly discover the value of data. Streaming data and static data are no longer separate entities, but two different representations through the data lifecycle.",[48,92198,92199],{},"A natural idea arises: can I keep all streaming data in messaging systems as they are collected? For traditional systems, the answer is no. Take Apache Kafka as an example, in Kafka, storage of topics is partition-based—a topic partition is entirely stored within and accessed by a single broker, whose capacity is limited by the capacity of the smallest node. Therefore, as data size grows, capacity expansion can only be achieved by partition rebalancing, which in turn requires recopying the whole partition for balancing both data and traffic to newly added brokers. Recopying data is expensive and error-prone, and it consumes network bandwidth and IO. To make matters worse, Kafka is designed to run on physical machines, as we are moving towards container-based cloud architecture, it lacks many key features such as I\u002FO isolation, multi-tenancy, and scalability.",[48,92201,92202],{},"Due to the limitations of existing streaming platforms, organizations utilize two separate systems for streaming data storage: a messaging system for newly imported data, and later off-load aged data to cold storage for the long-time store. The separation of data store into two systems brought in two main obstacles inevitably:",[321,92204,92205,92208],{},[324,92206,92207],{},"On the one hand, to guarantee the correctness and real-time of analysis results, users are required to be aware of boundaries of each data and need to perform joint queries with data stored in two systems;",[324,92209,92210],{},"On the other hand, dumping streaming data to file or object storage periodically requires additional operation and maintenance costs as well as a considerable consumption of cluster computation resources.",[40,92212,92214],{"id":92213},"a-short-introduction-to-apache-pulsar","A short introduction to Apache Pulsar",[48,92216,92217],{},"Apache Pulsar is an enterprise-grade distributed messaging system created at Yahoo and now it is a top-level open source project in the Apache Software Foundation. Pulsar follows the general pub-sub pattern, where a producer publishes a message to a topic; a consumer can subscribe to the topic, processes a received message, and send a confirmation after the message is processed (Ack). A subscription is a named configuration rule that determines how messages are delivered to consumers. Pulsar enables four types of subscriptions that can coexist on the same topic, distinguished by subscription name:",[321,92219,92220,92223,92226,92229],{},[324,92221,92222],{},"Exclusive subscription—only a single consumer is allowed to attach to the subscription.",[324,92224,92225],{},"Shared subscriptions—can be subscribed by multiple consumers; each consumer receives a portion of the messages.",[324,92227,92228],{},"Failover subscriptions—multiple consumers can attach to the same subscription, but only one consumer can receive messages. Only when the current consumer fails, the next consumer in line begins to receive messages.",[324,92230,92231],{},"Key-shared subscriptions (beta)—Multiple consumers can attach to the same subscription, and messages with the same key or same ordering key are delivered to only one consumer.",[48,92233,92234],{},"Pulsar was created from the ground up as a multi-tenant system. 
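The four subscription modes listed above differ only in the SubscriptionType that a consumer sets when it subscribes. The following Java sketch shows all four side by side; the service URL, topic, and subscription names are assumptions for illustration only.

```java
import org.apache.pulsar.client.api.*;

public class SubscriptionModes {
    public static void main(String[] args) throws Exception {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650")   // assumed broker address
                .build();

        // Exclusive: only a single consumer may attach to this subscription.
        Consumer<byte[]> exclusive = client.newConsumer()
                .topic("my-topic")
                .subscriptionName("sub-exclusive")
                .subscriptionType(SubscriptionType.Exclusive)
                .subscribe();

        // Shared: messages are spread across all consumers attached to this subscription.
        Consumer<byte[]> shared = client.newConsumer()
                .topic("my-topic")
                .subscriptionName("sub-shared")
                .subscriptionType(SubscriptionType.Shared)
                .subscribe();

        // Failover: one active consumer at a time; the next in line takes over on failure.
        Consumer<byte[]> failover = client.newConsumer()
                .topic("my-topic")
                .subscriptionName("sub-failover")
                .subscriptionType(SubscriptionType.Failover)
                .subscribe();

        // Key_Shared (beta): messages with the same key always go to the same consumer.
        Consumer<byte[]> keyShared = client.newConsumer()
                .topic("my-topic")
                .subscriptionName("sub-key-shared")
                .subscriptionType(SubscriptionType.Key_Shared)
                .subscribe();

        client.close();
    }
}
```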
To support multi-tenancy, Pulsar has a concept of tenants. Tenants can be spread across clusters and can have their authentication and authorization scheme applied to them. They are also the administrative unit at which storage quotas, message TTL, and isolation policies can be managed. The multi-tenant nature of Pulsar is reflected mostly visibly in topic URLs, which have this structure: persistent:\u002F\u002Ftenant\u002Fnamespace\u002Ftopic. As you can see, the tenant is the most basic unit of categorization for topics (more fundamental than the namespace and topic name).",[40,92236,91635],{"id":91634},[48,92238,92239],{},"A fundamentally layered architecture and segment-centric storage (with Apache BookKeeper) are two key design philosophies that make Apache Pulsar more advanced compared with other messaging systems. An Apache Pulsar cluster is composed of two layers: a stateless serving layer, comprised of a set of brokers that receive and deliver messages, and a stateful persistence layer, comprised of a set of Apache BookKeeper storage nodes called bookies that durably store messages. Let's investigate the designs one by one:",[32,92241,92243],{"id":92242},"layered-architecture","Layered architecture",[48,92245,92246],{},"Similar to Kafka, Pulsar stores messages based on topic partitions, each topic partition is assigned to one of the living brokers in Pulsar, which is called the owner broker of that topic partition. The owner broker serves message-reads from the partition and message-writes to the partition. If a broker fails, Pulsar automatically moves the topic partitions that were owned by it to the remaining available brokers in the cluster. Since brokers are \"stateless\", Pulsar only transfers ownership from one broker to another during nodes failure or broker cluster expansion, no data copy occurred during this time.",[32,92248,92250],{"id":92249},"segment-centric-storage","Segment-centric storage",[48,92252,92253],{},[384,92254],{"alt":92255,"src":85033},"illustration Segment-centric storage",[48,92257,92258],{},"As shown in Figure 1, messages on a Pulsar topic partition are stored in a distributed log, and the log is further divided into segments. Each segment is stored as an Apache BookKeeper ledger that is distributed and stored in multiple bookies in the cluster. A new segment is created either after a previous segment has been written for longer than a configured interval (aka time-based rolling), or if the size of the previous segment has reached a configured threshold (aka size-based rolling), or whenever the ownership of topic partition is changed. With segmentation, the messages in a topic partition can be evenly distributed and balanced across all the bookies in the cluster, which means the capacity of a topic partition is not limited only by the capacity of one node. Instead, it can scale up to the total capacity of the whole BookKeeper cluster.",[48,92260,92261],{},"The two design philosophies in Apache Pulsar provide several significant benefits such as unlimited topic partition storage, instant scaling without data rebalancing, and independent scalability of serving and storage clusters. Besides, tiered storage brings in Pulsar 2.0 provides an alternative way to reduce storage cost for aged data. 
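The tenant and namespace described above appear directly in every fully qualified topic name, and they are also the level at which policies such as retention, TTL, and tiered-storage offload are applied. A minimal Java sketch, with an assumed tenant, namespace, and topic:

```java
import org.apache.pulsar.client.api.*;

public class TopicNaming {
    public static void main(String[] args) throws Exception {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650")   // assumed broker address
                .build();

        // Fully qualified topic name: persistent://<tenant>/<namespace>/<topic>.
        // "my-tenant" and "my-namespace" are assumptions; namespace-level policies
        // (retention, TTL, offload to tiered storage) apply to every topic under them.
        Producer<String> producer = client.newProducer(Schema.STRING)
                .topic("persistent://my-tenant/my-namespace/clickstream")
                .create();
        producer.send("page-view");

        client.close();
    }
}
```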
With tiered storage, older messages in bookies can be moved to cheaper storage such as HDFS or S3.",[48,92263,92264],{},"Last but not least, Pulsar provides typed messages storage via Pulsar Schema; therefore you can designate data schema while creating a topic, and Pulsar does the rest of the intricate work for you, such as message validation, message serialization to and message deserialization from the wire format.",[40,92266,92268],{"id":92267},"pulsar-spark-connector","Pulsar Spark Connector",[48,92270,92271],{},"We have developed a Pulsar Spark Connector that enables Spark to execute streaming or batch job against messages stored in Pulsar and writes job results back to Pulsar.",[32,92273,92275],{"id":92274},"pulsar-spark-connector-api","Pulsar Spark Connector API",[48,92277,92278],{},"Since the Structured Streaming in Spark 2.2.0, Spark keeps SparkSession as the only entrance to write a program, and you could use the declarative API called DataFrame\u002FDataSet to meet your needs. In such a program, you declare how a DataFrame is generated, transformed, and finally written. Spark SQL engine does other optimizations and runs your code distributedly on a cluster. Take the following codes as illustrative examples to use Pulsar as a data source or data sink:",[321,92280,92281],{},[324,92282,92283],{},"Construct a streaming source using one or more topics.",[8325,92285,92288],{"className":92286,"code":92287,"language":8330},[8328],"val df = spark\n        .readStream\n        .format(\"pulsar\")\n        .option(\"service.url\", \"pulsar:\u002F\u002Flocalhost:6650\")\n        .option(\"admin.url\", \"http:\u002F\u002Flocalhost:8080\")\n        .option(\"topicsPattern\", \"topic.*\") \u002F\u002F Subscribe to a pattern\n        \u002F\u002F .option(\"topics\", \"topic1,topic2\")    \u002F\u002F Subscribe to multiple topics\n        \u002F\u002F .option(\"topic\", \"topic1\"). 
\u002F\u002Fsubscribe to a single topic\n        .option(\"startingOffsets\", startingOffsets)\n        .load()\n    df.selectExpr(\"CAST(__key AS STRING)\", \"CAST(value AS STRING)\").as[(String, String)]\n",[4926,92289,92287],{"__ignoreMap":18},[321,92291,92292],{},[324,92293,92294],{},"Construct a batch source.",[8325,92296,92299],{"className":92297,"code":92298,"language":8330},[8328],"val df = spark\n        .read\n        .format(\"pulsar\")\n        .option(\"service.url\", \"pulsar:\u002F\u002Flocalhost:6650\")\n        .option(\"admin.url\", \"http:\u002F\u002Flocalhost:8080\")\n        .option(\"topicsPattern\", \"topic.*\")\n        .option(\"startingOffsets\", \"earliest\")\n        .option(\"endingOffsets\", \"latest\")\n        .load()\ndf.selectExpr(\"CAST(__key AS STRING)\", \"CAST(value AS STRING)\")\n        .as[(String, String)]\n",[4926,92300,92298],{"__ignoreMap":18},[321,92302,92303],{},[324,92304,92305],{},"Sink streaming results continuously to Pulsar topics",[8325,92307,92310],{"className":92308,"code":92309,"language":8330},[8328],"val ds = df\n        .selectExpr(\"__topic\", \"CAST(__key AS STRING)\", \"CAST(value AS STRING)\") \u002F\u002F the __topic field is used to choose the right topic for each record\n        .writeStream\n        .format(\"pulsar\")\n        .option(\"service.url\", \"pulsar:\u002F\u002Flocalhost:6650\")\n        .start()\n",[4926,92311,92309],{"__ignoreMap":18},[321,92313,92314],{},[324,92315,92316],{},"Write batch results to Pulsar.",[8325,92318,92321],{"className":92319,"code":92320,"language":8330},[8328],"df.selectExpr(\"CAST(__key AS STRING)\", \"CAST(value AS STRING)\")\n        .write\n        .format(\"pulsar\")\n        .option(\"service.url\", \"pulsar:\u002F\u002Flocalhost:6650\")\n        .option(\"topic\", \"topic1\")\n        .save()\n",[4926,92322,92320],{"__ignoreMap":18},[3933,92324,92326],{"id":92325},"tip","Tip",[48,92328,92329],{},"Pulsar Spark Connector support DataSet\u002FDataFrame read from and write to Pulsar messages directly, the metadata fields of a message, such as an event time, message Id, are prefixed with two underscores (e.g.eventTime) to avoid potential naming conflict with messages' payload.",[32,92331,92333],{"id":92332},"pulsar-spark-connector-internals","Pulsar Spark Connector internals",[48,92335,92336],{},[384,92337],{"alt":92338,"src":92339},"illustration Pulsar Spark Connector internals","\u002Fimgs\u002Fblogs\u002F63a1eb9b7b156c1fe28148d5_ssc.png",[48,92341,92342],{},"Figure 2 shows the main components of Structured Streaming (abbreviated as ''SS'' hereafter):",[321,92344,92345,92348,92351,92354,92357,92360,92363,92366,92369],{},[324,92346,92347],{},"Input and Output—provides fault tolerance. SS requires input sources which must be replayable, allowing re-read recent input data if a node crashes (Pulsar Spark Connector guarantees this). 
Output sinks must support idempotent writes to provide ''exactly-once'' (Pulsar Spark Connector cannot do this currently, we provide ''at-least-once'' guarantee and you could deduplicate messages in later Spark jobs through a primary key).",[324,92349,92350],{},"API—batch and streaming programs share Spark SQL batch API, with several API features to support streaming specifically.",[324,92352,92353],{},"Triggers control how often the engine attempts to compute a new result and update the output sink.",[324,92355,92356],{},"Users can use watermark policy to determine when to stop handling late arrived data.",[324,92358,92359],{},"Stateful operators allow users to track and update mutable state by keys in case of complex processing.",[324,92361,92362],{},"Execution layer—On receiving a DataFrame query, SS determines how to run it incrementally (for streaming queries), optimizes it and runs it through one of the execution models:",[324,92364,92365],{},"Microbatch model by default for higher throughput with dynamic load balancing, rescaling, fault recovery, and straggler mitigation.",[324,92367,92368],{},"Continuous model for low-latency circumstances.",[324,92370,92371],{},"Log and state store—Write-ahead-Log is written first for tracking consumed positions in each source. A large scale state store takes snapshots of the internal state of operators and facilitates the recovering procedure during failure.",[3933,92373,92375],{"id":92374},"the-execution-flow-for-a-streaming-job","The execution flow for a streaming job",[48,92377,92378],{},"For Pulsar Spark Connector, source and sink in Spark defines how we should implement read\u002Fwrite logic:",[8325,92380,92383],{"className":92381,"code":92382,"language":8330},[8328],"trait Source {\n        def schema: StructType\n        def getOffset: Option[Offset]\n        def getBatch(start: Option[Offset], end: Offset): DataFrame\n        def commit(end: Offset): Unit\n        def stop(): Unit\n    }\ntrait Sink {\n        def addBatch(batchId: Long, data: DataFrame): Unit\n}\n",[4926,92384,92382],{"__ignoreMap":18},[48,92386,92387],{},[384,92388],{"alt":92389,"src":92390},"Figure 3 The execution flow for a streaming job","\u002Fimgs\u002Fblogs\u002F63a1eb9b8f866f3b3530dc25_flow.png",[48,92392,92393],{},"StreamExecution handles the execution logic internally. Figure 3 shows a microbatch execution flow inside StreamExecution: 1. At the very beginning of each microbatch, SS asks the source for available data (getOffset) and persist it to the WAL. 2. The source then provides the data inside a batch (getBatch) based on the start and end offsets provided by SS. 3. SS triggers the optimization and compilation of the logical plan and writes the calculation result to sink (addBatch). Note: the actual data acquisition and calculation happen here. 4. 
Once the data is successfully written to the sink, SS notifies the source that the data can be discarded (commit) and the successfully executed batchId is written to the internal commitLog.",[48,92395,92396],{},"Back to Pulsar Spark Connector, we do the following stuff:",[321,92398,92399,92402,92405,92408],{},[324,92400,92401],{},"During the query planning phase, topics schema would be fetched from Pulsar, check for compatibility (topics in a query should share the same schema) and transformed to DataFrame schema.",[324,92403,92404],{},"Create a consumer for each topic partition and return data between (start, end].",[324,92406,92407],{},"On receiving commit calls from SS, we resetCursor on a topic partition, notify Pulsar the data could be cleaned.",[324,92409,92410],{},"For DataFrame we received from addBatch, records are sent to corresponding topics through producer send.",[3933,92412,92414],{"id":92413},"topicpartition-adddelete-discovery","Topic\u002Fpartition add\u002Fdelete discovery",[48,92416,92417],{},"Streaming jobs are long-running by nature. During their execution, topics or partitions may be deleted or added. Pulsar Spark Connector enables topic\u002Fpartition discovery at the beginning of each microbatch or each epoch in continuous execution, by listing all partitions available and comparing these with the partitions from the last microbatch or epoch, we could easily find which partitions are newly added or which partitions are gone, and scheduling new tasks or remove existing tasks accordingly.",[40,92419,78580],{"id":78579},[321,92421,92422,92427,92435],{},[324,92423,92424],{},[55,92425,92268],{"href":85222,"rel":92426},[264],[324,92428,4221,92429,92434],{},[55,92430,92433],{"href":92431,"rel":92432},"https:\u002F\u002Fgithub.com\u002Fyjshen\u002Fconnector-test",[264],"tutorial"," on how to set up the development environment and start using the Pulsar Spark Connector.",[324,92436,92437,92438,1154,92442,92445],{},"If you are interested in Pulsar community news, Pulsar development details, and Pulsar user stories on production, follow ",[55,92439,92441],{"href":91566,"rel":92440},[264],"StreamNative Medium",[55,92443,36254],{"href":33664,"rel":92444},[264]," on Twitter.",[40,92447,22673],{"id":22672},[321,92449,92450,92455,92462],{},[324,92451,92452],{},[55,92453,58632],{"href":51111,"rel":92454},[264],[324,92456,92457],{},[55,92458,92461],{"href":92459,"rel":92460},"https:\u002F\u002Fcs.stanford.edu\u002F~matei\u002Fpapers\u002F2018\u002Fsigmod_structured_streaming.pdf",[264],"Structured Streaming: A Declarative API for Real-Time Applications in Apache Spark in SIGMOD 2018",[324,92463,92464],{},[55,92465,92468],{"href":92466,"rel":92467},"https:\u002F\u002Fmp.weixin.qq.com\u002Fs\u002Fh938R5zZrB76OxLSqnCw4g",[264],"From Messaging systems to Data Platform - A great Chinese blog introduces the evolution of the messaging systems",[48,92470,3931],{},{"title":18,"searchDepth":19,"depth":19,"links":92472},[92473,92474,92475,92479,92483,92484],{"id":92189,"depth":19,"text":92190},{"id":92213,"depth":19,"text":92214},{"id":91634,"depth":19,"text":91635,"children":92476},[92477,92478],{"id":92242,"depth":279,"text":92243},{"id":92249,"depth":279,"text":92250},{"id":92267,"depth":19,"text":92268,"children":92480},[92481,92482],{"id":92274,"depth":279,"text":92275},{"id":92332,"depth":279,"text":92333},{"id":78579,"depth":19,"text":78580},{"id":22672,"depth":19,"text":22673},"2019-07-16","Learn how Apache Pulsar stores all real-time data in one single system and how it supports data 
analysis on the full-time range.","\u002Fimgs\u002Fblogs\u002F63c7c05ece7b433ae233424d_63b5c2c60a21cf3fe3c0f561_pulsar-spark-background.png",{},"\u002Fblog\u002Fone-storage-system-both-real-time-historical-data-analysis-apache-pulsar-story",{"title":92183,"description":92486},"blog\u002Fone-storage-system-both-real-time-historical-data-analysis-apache-pulsar-story",[35559,821,303],"6jFVkxk5OPeBNjjduAqBhv1cdAnWDj2PpxkP6nUiYNU",{"id":92495,"title":92496,"authors":92497,"body":92498,"category":821,"createdAt":290,"date":92932,"description":92933,"extension":8,"featured":294,"image":290,"isDraft":294,"link":290,"meta":92934,"navigation":7,"order":296,"path":92935,"readingTime":3556,"relatedResources":290,"seo":92936,"stem":92937,"tags":92938,"__hash__":92939},"blogs\u002Fblog\u002Fnew-in-apache-pulsar-2-4-0.md","What's New in Apache Pulsar 2.4.0",[806],{"type":15,"value":92499,"toc":92911},[92500,92503,92505,92508,92512,92515,92518,92522,92525,92531,92535,92538,92544,92549,92552,92560,92564,92567,92570,92573,92576,92579,92585,92593,92602,92606,92609,92612,92615,92618,92642,92650,92654,92657,92660,92663,92666,92672,92680,92682,92686,92722,92732,92734,92738,92746,92749,92755,92758,92772,92775,92778,92782,92785,92794,92798,92807,92818,92821,92827,92833,92837,92840,92843,92849,92852,92858,92861,92867,92871,92874,92877,92880,92886,92888],[48,92501,92502],{},"We are very glad to see the Apache Pulsar community has successfully released the wonderful 2.4.0 release after a few months of accumulated hard works. It is a great milestone for this fast-growing project and the whole Pulsar community. Here is a selection of some of the most interesting and important features the community added to this new release.",[40,92504,85547],{"id":85546},[48,92506,92507],{},"The following are the core development updates of Pulsar 2.4.0.",[32,92509,92511],{"id":92510},"pip-26-delayed-or-scheduled-message-delivery","PIP-26: Delayed or scheduled message delivery",[48,92513,92514],{},"Delayed message delivery and scheduled message delivery are commonly seen in the traditional messaging systems. A producer can specify a message to be delivered after a given delayed duration or at a scheduled time. The message is only dispatched to a consumer after time criteria is fully satisfied.",[48,92516,92517],{},"Pulsar introduces these two functionalities in 2.4.0 for the consumers of shared subscriptions. 
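On the consumer side, nothing special is needed beyond subscribing with a shared subscription, since delayed and scheduled messages are only held back for shared-subscription consumers. A minimal sketch, assuming a PulsarClient named client as in the other snippets and illustrative topic and subscription names:

```java
import org.apache.pulsar.client.api.*;

// Delayed and scheduled messages are only held back for consumers on a shared subscription.
Consumer<String> consumer = client.newConsumer(Schema.STRING)
        .topic("my-topic")                       // assumed topic
        .subscriptionName("delayed-sub")         // assumed subscription name
        .subscriptionType(SubscriptionType.Shared)
        .subscribe();
```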
The following two examples demonstrate how to use these two features.",[3933,92519,92521],{"id":92520},"example-for-delayed-message-delivery","Example for delayed message delivery",[48,92523,92524],{},"The following example shows how to deliver messages after 3 minutes.",[8325,92526,92529],{"className":92527,"code":92528,"language":8330},[8328],"\nproducer.newMessage()\n        .deliverAfter(3L, TimeUnit.Minute)\n        .value(“Hello Pulsar after 3 minutes!”)\n        .send();\n\n",[4926,92530,92528],{"__ignoreMap":18},[3933,92532,92534],{"id":92533},"example-for-scheduled-message-delivery","Example for scheduled message delivery",[48,92536,92537],{},"The following example shows how to deliver messages at 11pm on 06\u002F27\u002F2019.",[8325,92539,92542],{"className":92540,"code":92541,"language":8330},[8328],"\nproducer.newMessage()\n        .deliverAt(new Date(2019, 06, 27, 23, 00, 00).getTime())\n        .value(“Hello Pulsar at 11pm on 06\u002F27\u002F2019!”)\n        .send();\n\n",[4926,92543,92541],{"__ignoreMap":18},[916,92545,92546],{},[48,92547,92548],{},"Note that the messages sent by deliverAfter or deliverAt will not be batched even batching is enabled at a producer side.",[48,92550,92551],{},"Pulsar broker uses a DelayedDeliveryTracker for tracking the delayed delivery of messages for a particular subscription. The current DelayedDeliveryTracker holds the delayed messages in an in-memory priority queue. So you have to plan for the memory usage when enabling the delayed delivery feature. A persistent hash-wheel based implementation was discussed in the community and is planned to add in the future to support a wider range of delay durations.",[48,92553,92554,92555,190],{},"To learn more about the design of delayed message delivery, see ",[55,92556,92559],{"href":92557,"rel":92558},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fwiki\u002FPIP-26:-Delayed-Message-Delivery",[264],"PIP-26",[32,92561,92563],{"id":92562},"pip-34-key_shared-subscription","PIP-34: Key_Shared subscription",[48,92565,92566],{},"Prior to 2.4.0 release, Pulsar only supports 3 subscription modes, Exclusive, Failover and Shared. Both Exclusive and Failover subscription modes are streaming subscription modes. In such modes, a Pulsar partition can only be assigned to one consumer of the subscription to consume and the messages are dispatched in partition order. In contrast, the Shared subscription mode dispatches the messages of a single partition to multiple consumers in a round-robin fashion. Shared subscription is also known as queuing (or worker-queue) subscription mode.",[48,92568,92569],{},"In Exclusive and Failover subscriptions, the ordering of the messages is guaranteed on per partitions basis. However, the parallelism of the consumption is limited by the number of partitions of the topic. In contrast, the consumption parallelism of a Shared subscription can go beyond the number of partitions, but it doesn’t have any ordering guarantees.",[48,92571,92572],{},"In a lot of use cases such as change data capture (aka CDC) for distributed databases, applications require both the scalability of Shared subscription to increase the number of consumers for high throughput and the ordering guarantees provided in Exclusive or Failover subscription. Key_Shared subscription is introduced in 2.4.0 to meet this requirement.",[48,92574,92575],{},"In Key_Shared subscription, there can be more consumers than partitions. 
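Per-key ordering relies on the producer attaching a key to every message. The following producer-side sketch pairs with the Key_Shared consumer example below; the topic and key values are assumptions, and client is a PulsarClient as in the other snippets:

```java
import org.apache.pulsar.client.api.*;

Producer<String> producer = client.newProducer(Schema.STRING)
        .topic("topic")                    // the same topic the Key_Shared consumer below subscribes to
        .create();

// Messages that share a key (for example, a user id) keep their relative order
// and are always dispatched to the same consumer of the Key_Shared subscription.
producer.newMessage()
        .key("user-123")                   // assumed business key
        .value("profile-updated")
        .send();
```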
And the messages of the same key are routed to one consumer of the subscription.",[48,92577,92578],{},"The following example shows how to use Key_Shared subscription.",[8325,92580,92583],{"className":92581,"code":92582,"language":8330},[8328],"\nclient.newConsumer()\n        .topic(“topic”)\n        .subscriptionType(SubscriptionType.Key_Shared)\n        .subscriptionName(“key-shared-subscription”)\n        .subscribe();\n    \n",[4926,92584,92582],{"__ignoreMap":18},[48,92586,92587,92588,190],{},"If you are interested in learning the design details of Key_Shared subscription, see ",[55,92589,92592],{"href":92590,"rel":92591},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fwiki\u002FPIP-34:-Add-new-subscribe-type-Key_shared",[264],"PIP-34",[48,92594,92595,92596,92601],{},"There are more cool features about Key_Shared subscription planned for 2.5.0 release. If you are interested in this feature or would like to contribute to it, you can follow the GitHub issue ",[55,92597,92600],{"href":92598,"rel":92599},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fissues\u002F4077",[264],"#4077"," and discuss your ideas with Pulsar committers.",[32,92603,92605],{"id":92604},"pip-36-configure-max-message-size-at-broker-side","PIP-36: Configure max message size at broker side",[48,92607,92608],{},"Previously, Pulsar limits the max message size (aka MaxMessageSize) to 5 MB. This setting was hardcoded at Pulsar encoder and decoder. Administrators cannot adjust this setting by modifying the broker configuration. But in some use cases, for example, when capturing change events from databases, a change event might be larger than 5 MB. These change events cannot be produced to Pulsar successfully.",[48,92610,92611],{},"Pulsar introduces a setting at broker configuration in 2.4.0 release. This setting allows administrators to configure a different value for the max message size. Additionally, Pulsar introduces a new field max_message_size in the CommandConnected response that brokers send back to clients when they connect. Then Pulsar clients are able to learn the MaxMessageSize that each broker supports and configure the batching buffer accordingly.",[48,92613,92614],{},"You need 2.4.0 release for both brokers and clients to leverage this feature.",[48,92616,92617],{},"Note that although Pulsar allows configuring max message size, it doesn’t mean it is recommended to configure the setting to an arbitrary large value. Because a very large max message size hurts IO and resource efficiency. There are also multiple PIPs tackling supporting arbitrary large sized messages by chunking the large messages into smaller chunked messages. 
These PIPs are:",[321,92619,92620,92631],{},[324,92621,92622,5410,92627],{},[55,92623,92626],{"href":92624,"rel":92625},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fwiki\u002FPIP-37%3A-Large-message-size-handling-in-Pulsar",[264],"PIP-37: Large message size handling in Pulsar",[55,92628,92630],{"href":85577,"rel":92629},[264],"#4400",[324,92632,92633,5410,92637],{},[55,92634,92636],{"href":71438,"rel":92635},[264],"PIP-31: Transactional Streaming",[55,92638,92641],{"href":92639,"rel":92640},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fissues\u002F2664",[264],"#2664",[48,92643,92644,92645,92649],{},"You can follow the GitHub issue, subscribe the Pulsar mailing lists or join the ",[55,92646,92648],{"href":57760,"rel":92647},[264],"Pulsar slack channels"," to receive development updates about these features.",[32,92651,92653],{"id":92652},"pip-33-replicated-subscription","PIP-33: Replicated subscription",[48,92655,92656],{},"Geo-replication is one of the best features that Pulsar provides outperforming other messaging or streaming systems in the market. In a geo-replicated Pulsar instance, a topic can be configured to be replicated across multiple regions (for example, us-west, us-east and eu-central). The topic is presented as a virtual global entity in which messages can be published and consumed from any of the configured cluster.",[48,92658,92659],{},"However, the only limitation is that subscriptions are currently local to the cluster in which they are created. That says, the subscription state is NOT replicated across regions. If a consumer reconnects to a new region, it triggers the creation of a new unrelated subscription, albeit with the same name. The subscription will be created at the tail of the topic in the new region (or at the beginning, depending on its SubscriptionInitialPosition configuration) and at the same time, the original subscription will be left dangling in the previous region.",[48,92661,92662],{},"Pulsar introduces Replicated Subscription in 2.4.0. It added a mechanism to keep subscription state in-sync between multiple geo-replicated regions, within a sub-second framework.",[48,92664,92665],{},"You can configure your consumer to enable replicated subscription by setting replicateSubscriptionState to be true. 
The code example is shown as below:",[8325,92667,92670],{"className":92668,"code":92669,"language":8330},[8328],"\nConsumer consumer = client.newConsumer(Schema.STRING)\n    .topic(\"my-topic\")\n                .subscriptionName(\"my-subscription\")\n                .replicateSubscriptionState(true)\n                .subscribe();\n \n",[4926,92671,92669],{"__ignoreMap":18},[48,92673,92674,92675,190],{},"If you are interested in learning the design details about replicated subscription, see ",[55,92676,92679],{"href":92677,"rel":92678},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fwiki\u002FPIP-33:-Replicated-subscriptions",[264],"PIP-33",[40,92681,4301],{"id":4298},[32,92683,92685],{"id":92684},"pip-30-mutual-authentication-and-kerberos-support","PIP-30: Mutual authentication and Kerberos support",[48,92687,92688,92689,1186,92694,4003,92699,92704,92705,5157,92710,92715,92716,92721],{},"Pulsar supports pluggable authentication mechanisms, such as ",[55,92690,92693],{"href":92691,"rel":92692},"https:\u002F\u002Fpulsar.apache.org\u002Fdocs\u002Fen\u002Fsecurity-tls-authentication\u002F",[264],"TLS Authentication",[55,92695,92698],{"href":92696,"rel":92697},"https:\u002F\u002Fpulsar.apache.org\u002Fdocs\u002Fen\u002Fsecurity-athenz\u002F",[264],"Athenz",[55,92700,92703],{"href":92701,"rel":92702},"https:\u002F\u002Fpulsar.apache.org\u002Fdocs\u002Fen\u002Fsecurity-token-client\u002F",[264],"JSON Web Tokens",". However all the provided authentication mechanisms are one-step authentication. The current authentication abstraction is not able to support mutual authentication between client and server, such as ",[55,92706,92709],{"href":92707,"rel":92708},"https:\u002F\u002Fen.wikipedia.org\u002Fwiki\u002FSimple_Authentication_and_Security_Layer",[264],"SASL",[55,92711,92714],{"href":92712,"rel":92713},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fwiki\u002FPIP-30%3A-change-authentication-provider-API-to-support-mutual-authentication",[264],"PIP-30"," changes the interface to support mutual authentication. The ",[55,92717,92720],{"href":92718,"rel":92719},"https:\u002F\u002Fpulsar.apache.org\u002Fdocs\u002Fen\u002Fsecurity-kerberos\u002F",[264],"Kerberos Authentication"," was implemented using the newly changed authentication interfaces.",[48,92723,92724,92725,92728,92729,190],{},"If you are interested in learning the implementation details, see ",[55,92726,92714],{"href":92712,"rel":92727},[264],". If you are interested in trying the kerberos authentication, follow the instructions documented at ",[55,92730,40821],{"href":92718,"rel":92731},[264],[40,92733,15627],{"id":34962},[32,92735,92737],{"id":92736},"go-functions","Go Functions",[48,92739,92740,92741,190],{},"Prior to 2.4.0, users can only write Pulsar functions using Java or Python. 
In 2.4.0, Pulsar starts supporting writing Pulsar functions using the popular ",[55,92742,92745],{"href":92743,"rel":92744},"https:\u002F\u002Fgolang.org\u002F",[264],"Golang",[48,92747,92748],{},"The exclamation example of Pulsar Functions written in Golang is shown below.",[8325,92750,92753],{"className":92751,"code":92752,"language":8330},[8328],"\nimport (\n        \"fmt\"\n        \"context\"\n\n        \"github.com\u002Fapache\u002Fpulsar\u002Fpulsar-function-go\u002Fpf\"\n    )\n\n    func HandleRequest(ctx context.Context, in []byte) error {\n        fmt.Println(string(in) + \"!\")\n        return nil\n    }\n\n    func main() {\n        pf.Start(HandleRequest)\n    }\n\n",[4926,92754,92752],{"__ignoreMap":18},[48,92756,92757],{},"Go Function support in 2.4.0 is an MVP (minimum viable product). There are more features planned in 2.5.0 for Go Function to align with the features available in Java\u002FPython Function.",[48,92759,92760,92761,92766,92767,92601],{},"If you are interested in learning the implementation details of Go Function, see ",[55,92762,92765],{"href":92763,"rel":92764},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fwiki\u002FPIP-32%3A-Go-Function-API%2C-Instance-and-LocalRun",[264],"PIP-32",". If you are interested in contributing to Go Function, follow the Github issue ",[55,92768,92771],{"href":92769,"rel":92770},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fissues\u002F3767",[264],"#3767",[40,92773,86121],{"id":92774},"schema",[48,92776,92777],{},"Pulsar introduced native schema support and provided a built-in schema registry since 2.0.0 release. After a few successful releases, Pulsar Schema has become more and more mature. Especially in 2.4.0, there are a lot of changes happen around Pulsar Schema. Here are a few highlights for them.",[32,92779,92781],{"id":92780},"schema-versioning","Schema versioning",[48,92783,92784],{},"Prior to 2.4.0, Pulsar clients only use the latest version of schema or the provided schema for encoding and decoding Pulsar messages. Hence it didn’t handle well on encoding and decoding Pulsar messages with schema evolution.",[48,92786,92787,92788,92793],{},"Issue ",[55,92789,92792],{"href":92790,"rel":92791},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fpull\u002F4646",[264],"#4646"," introduced versioned schema reader to deserialize Pulsar messages using correct version of schema and handle schema evolution properly.",[32,92795,92797],{"id":92796},"transitive-compatibility-check-strategies","Transitive compatibility check strategies",[48,92799,92800,92801,92806],{},"Prior to 2.4.0, Pulsar Schema only supported ALWAYS_COMPATIBLE, ALWAYS_INCOMPATIBLE, BACKWARD, FORWARD and FULL compatibility check strategies. BACKWARD, FORWARD and FULL strategies only check the new schema with the last schema. However, it is not enough. Issue ",[55,92802,92805],{"href":92803,"rel":92804},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fissues\u002F4170",[264],"#4170"," introduced three transitive check strategies to check the compatibility with all existing schemas. These transitive strategies are:",[321,92808,92809,92812,92815],{},[324,92810,92811],{},"BACKWARD_TRANSITIVE: Consumers using the new schema can read messages produced by all previous schemas, not just the last schema. 
For example, if there are three schemas for a topic that change in order V1, V2, and V3, then BACKWARD_TRANSITIVE compatibility ensures that consumers using the new schema V3 can process data written by the producers using the schema V3, V2, or V1.",[324,92813,92814],{},"FORWARD_TRANSITIVE: The messages produced with a new schema can be read by consumers using all previously registered schemas, not just the last schema. For example, if there are three schemas for a topic that change in order V1, V2, and V3, then FORWARD_TRANSITIVE compatibility ensures that data written by the producers using the new schema V3 can be processed by the consumers using the schema V3, V2, or V1.",[324,92816,92817],{},"FULL_TRANSITIVE: The new schema is forward and backward compatible with all previously registered schemas, not just the last one. For example, if there are three schemas for a topic that change in order V1, V2, and V3, then FULL_TRANSITIVE compatibility ensures that the consumers using the new schema V3 can process data written by the producers using the schema V3, V2, and V1, and data written by the producers using the new schema V3 can be processed by the consumers using the schema V3, V2, and V1.",[48,92819,92820],{},"The completed list of compatibility check strategies is shown below.",[48,92822,92823],{},[384,92824],{"alt":92825,"src":92826},"tabs with compatibility check strategy","\u002Fimgs\u002Fblogs\u002F63a1e7e3b18376006d8608b0_compatibility-check-strategies.webp",[48,92828,47112,92829,190],{},[55,92830,82637],{"href":92831,"rel":92832},"https:\u002F\u002Fpulsar.apache.org\u002Fdocs\u002Fen\u002Fnext\u002Fschema-evolution-compatibility\u002F",[264],[32,92834,92836],{"id":92835},"genericschema-and-autoconsume","GenericSchema and AutoConsume",[48,92838,92839],{},"Prior to 2.4.0, Pulsar only supported constructing schemas using static POJOs. This is convenient for applications that can know the schema ahead of time. However, in some use cases (for example, CDC - change data capture), applications don’t know the schema ahead of time. In such use cases, there is no way for applications to declare the schema programmably or dynamically. Pulsar resolves the problem by introducing GenericSchema and GenericRecord in 2.4.0.",[48,92841,92842],{},"You can declare a schema programmably by using GenericSchemaBuilder. The code example of constructing a generic schema is shown below:",[8325,92844,92847],{"className":92845,"code":92846,"language":8330},[8328],"\nRecordSchemaBuilder recordSchemaBuilder = SchemaBuilder.record(\"schemaName\");\n    recordSchemaBuilder\n        .field(\"intField\")\n        .type(SchemaType.INT32);\n    SchemaInfo schemaInfo = recordSchemaBuilder.build(SchemaType.AVRO);\n    Schema schema = Schema.generic(schemaInfo);\n \n",[4926,92848,92846],{"__ignoreMap":18},[48,92850,92851],{},"After you declared a generic schema, you can build the records programmatically. The code example of building a generic record is shown below:",[8325,92853,92856],{"className":92854,"code":92855,"language":8330},[8328],"\nProducer producer = client.newProducer(Schema.generic(schemaInfo)).create();\n\n    producer.newMessage().value(schema.newRecordBuilder()\n                .set(\"intField\", 32)\n                .build()).send();\n \n",[4926,92857,92855],{"__ignoreMap":18},[48,92859,92860],{},"If you don’t know the schema of a topic, you can use AUTO_CONSUME to consume the topic into GenericRecord. The GenericRecord will provide the schema associated with this record. 
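Once a message has been received with AUTO_CONSUME (the example below shows how), individual fields can be read from the GenericRecord by name, without any POJO. A minimal sketch, assuming the intField declared in the generic schema above and a received Message<GenericRecord> named msg:

```java
import org.apache.pulsar.client.api.schema.GenericRecord;

// msg is a Message<GenericRecord> received by an AUTO_CONSUME consumer (see below).
GenericRecord record = msg.getValue();
Object intField = record.getField("intField");   // field name from the schema declared above
System.out.println("intField = " + intField);
```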
The example of using AUTO_CONSUME is shown below:",[8325,92862,92865],{"className":92863,"code":92864,"language":8330},[8328],"\nConsumer pulsarConsumer = client.newConsumer(Schema.AUTO_CONSUME())\n        …\n        .subscribe();\n\n    Message msg = consumer.receive() ;\n    GenericRecord record = msg.getValue(); \n \n",[4926,92866,92864],{"__ignoreMap":18},[32,92868,92870],{"id":92869},"keyvalue-schema","KeyValue Schema",[48,92872,92873],{},"KeyValue Schema was first introduced to Pulsar in 2.3.0 release. The first implementation of KeyValue schema encoded a key\u002Fvalue pair together into the payload of a message and it didn’t store key schema and value schema.",[48,92875,92876],{},"In 2.4.0, Pulsar stores both key and value schemas as the schema data in KeyValue schema, so Pulsar can handle the schema evaluation on both key and value. Additionally, Pulsar introduces a new encoding mode that encodes key into the key part of a message and value into the payload part of a message. This allows leverage Pulsar features related to message keys.",[48,92878,92879],{},"The example of constructing a key\u002Fvalue schema with SEPARATED encoding type is shown below:",[8325,92881,92884],{"className":92882,"code":92883,"language":8330},[8328],"\nSchema> kvSchema = Schema.KeyValue(\n        Schema.INT32,\n        Schema.STRING,\n        KeyValueEncodingType.SEPARATED\n    );       \n \n",[4926,92885,92883],{"__ignoreMap":18},[40,92887,78580],{"id":78579},[321,92889,92890,92897,92905],{},[324,92891,92892,92893,190],{},"Pulsar 2.4.0 release notes, click ",[55,92894,267],{"href":92895,"rel":92896},"http:\u002F\u002Fpulsar.apache.org\u002Frelease-notes\u002F#240-mdash-2019-06-30-a-id-240-a",[264],[324,92898,92437,92899,1154,92902,92445],{},[55,92900,92441],{"href":91566,"rel":92901},[264],[55,92903,36254],{"href":33664,"rel":92904},[264],[324,92906,92907,92908,190],{},"If you are interested in Pulsar examples, demos, tools and extensions, check out ",[55,92909,84128],{"href":84126,"rel":92910},[264],{"title":18,"searchDepth":19,"depth":19,"links":92912},[92913,92919,92922,92925,92931],{"id":85546,"depth":19,"text":85547,"children":92914},[92915,92916,92917,92918],{"id":92510,"depth":279,"text":92511},{"id":92562,"depth":279,"text":92563},{"id":92604,"depth":279,"text":92605},{"id":92652,"depth":279,"text":92653},{"id":4298,"depth":19,"text":4301,"children":92920},[92921],{"id":92684,"depth":279,"text":92685},{"id":34962,"depth":19,"text":15627,"children":92923},[92924],{"id":92736,"depth":279,"text":92737},{"id":92774,"depth":19,"text":86121,"children":92926},[92927,92928,92929,92930],{"id":92780,"depth":279,"text":92781},{"id":92796,"depth":279,"text":92797},{"id":92835,"depth":279,"text":92836},{"id":92869,"depth":279,"text":92870},{"id":78579,"depth":19,"text":78580},"2019-07-09","Learn the new features in Apache Pulsar 2.4.0 including delayed message delivery, key-shared subscription, replicated 
subscription.",{},"\u002Fblog\u002Fnew-in-apache-pulsar-2-4-0",{"title":92496,"description":92933},"blog\u002Fnew-in-apache-pulsar-2-4-0",[302,821],"i3FtrpI8d4QdE8YJ97V9NgLuW3lntN9_5RI5FZeDv00",{"id":22030,"title":22031,"authors":92941,"body":92942,"category":3550,"createdAt":290,"date":22777,"description":22778,"extension":8,"featured":7,"image":22779,"isDraft":294,"link":290,"meta":93463,"navigation":7,"order":296,"path":10357,"readingTime":21788,"relatedResources":290,"seo":93464,"stem":22782,"tags":93465,"__hash__":22784},[6785,810,809,808],{"type":15,"value":92943,"toc":93433},[92944,92946,92948,92950,92952,92954,92956,92958,92964,92968,92970,92976,92980,92982,92986,92988,92990,92996,92998,93000,93004,93006,93008,93010,93012,93014,93019,93021,93023,93031,93033,93035,93037,93039,93047,93049,93051,93053,93059,93061,93063,93068,93085,93087,93089,93093,93095,93097,93099,93101,93103,93107,93112,93116,93118,93120,93128,93130,93132,93140,93142,93144,93146,93150,93152,93169,93171,93173,93179,93181,93185,93189,93191,93193,93203,93205,93213,93215,93227,93232,93241,93243,93245,93247,93249,93251,93259,93261,93263,93265,93269,93273,93275,93277,93289,93291,93300,93304,93306,93314,93316,93328,93330,93337,93341,93345,93351,93353,93355,93357,93361,93363,93365,93370,93372,93378,93384,93391,93398,93405,93412,93419,93426],[48,92945,22037],{},[48,92947,22040],{},[48,92949,22043],{},[48,92951,22046],{},[48,92953,22049],{},[40,92955,22053],{"id":22052},[48,92957,22056],{},[321,92959,92960,92962],{},[324,92961,22061],{},[324,92963,22064],{},[48,92965,92966],{},[384,92967],{"alt":18,"src":22069},[48,92969,22072],{},[321,92971,92972,92974],{},[324,92973,22077],{},[324,92975,22080],{},[48,92977,92978],{},[384,92979],{"alt":18,"src":22085},[40,92981,22089],{"id":22088},[48,92983,92984,22094],{},[55,92985,1332],{"href":10389},[48,92987,22097],{},[48,92989,22100],{},[321,92991,92992,92994],{},[324,92993,22105],{},[324,92995,22108],{},[40,92997,22112],{"id":22111},[48,92999,22115],{},[48,93001,93002],{},[384,93003],{"alt":18,"src":22120},[32,93005,22124],{"id":22123},[48,93007,22127],{},[48,93009,22130],{},[48,93011,22133],{},[48,93013,22136],{},[48,93015,22139,93016,22144],{},[55,93017,5599],{"href":22142,"rel":93018},[264],[32,93020,22148],{"id":22147},[48,93022,22151],{},[321,93024,93025,93027,93029],{},[324,93026,22156],{},[324,93028,22159],{},[324,93030,22162],{},[48,93032,22165],{},[32,93034,22169],{"id":22168},[48,93036,22172],{},[48,93038,22175],{},[321,93040,93041,93043,93045],{},[324,93042,22180],{},[324,93044,22183],{},[324,93046,22186],{},[48,93048,22189],{},[40,93050,22193],{"id":22192},[32,93052,22197],{"id":22196},[321,93054,93055,93057],{},[324,93056,22202],{},[324,93058,22205],{},[48,93060,22208],{},[32,93062,22212],{"id":22211},[48,93064,22215,93065,22220],{},[55,93066,22218],{"href":22218,"rel":93067},[264],[321,93069,93070,93078,93083],{},[324,93071,22225,93072,22231,93075,22237],{},[55,93073,22230],{"href":22228,"rel":93074},[264],[55,93076,22236],{"href":22234,"rel":93077},[264],[324,93079,22240,93080,22246],{},[55,93081,22245],{"href":22243,"rel":93082},[264],[324,93084,22249],{},[40,93086,22253],{"id":22252},[48,93088,22256],{},[48,93090,93091],{},[384,93092],{"alt":18,"src":22261},[40,93094,22265],{"id":22264},[48,93096,22268],{},[32,93098,22272],{"id":22271},[48,93100,22275],{},[48,93102,22278],{},[321,93104,93105],{},[324,93106,22283],{},[48,93108,22286,93109,22292],{},[55,93110,22291],{"href":22289,"rel":93111},[264],[321,93113,93114],{},[324,93115,22297],{},[48,93117,22300],{},[48,93119,
22303],{},[321,93121,93122,93124,93126],{},[324,93123,22308],{},[324,93125,22311],{},[324,93127,22314],{},[32,93129,22318],{"id":22317},[48,93131,22321],{},[321,93133,93134,93136,93138],{},[324,93135,22326],{},[324,93137,22329],{},[324,93139,22332],{},[32,93141,22336],{"id":22335},[48,93143,22339],{},[48,93145,22342],{},[321,93147,93148],{},[324,93149,22347],{},[48,93151,22350],{},[321,93153,93154,93159,93164],{},[324,93155,22355,93156,22361],{},[55,93157,22360],{"href":22358,"rel":93158},[264],[324,93160,22364,93161,22370],{},[55,93162,22369],{"href":22367,"rel":93163},[264],[324,93165,22373,93166,22379],{},[55,93167,22378],{"href":22376,"rel":93168},[264],[32,93170,22383],{"id":22382},[48,93172,22056],{},[321,93174,93175,93177],{},[324,93176,22061],{},[324,93178,22064],{},[48,93180,22394],{},[48,93182,93183],{},[384,93184],{"alt":18,"src":22399},[48,93186,93187],{},[384,93188],{"alt":18,"src":22404},[48,93190,22407],{},[32,93192,22411],{"id":22410},[321,93194,93195,93197,93199,93201],{},[324,93196,22416],{},[324,93198,22419],{},[324,93200,22422],{},[324,93202,22425],{},[32,93204,22429],{"id":22428},[321,93206,93207,93209,93211],{},[324,93208,22434],{},[324,93210,22437],{},[324,93212,22440],{},[32,93214,22444],{"id":22443},[321,93216,93217,93219,93221,93223,93225],{},[324,93218,22416],{},[324,93220,22419],{},[324,93222,22453],{},[324,93224,22456],{},[324,93226,22459],{},[48,93228,22462,93229,22468],{},[55,93230,22467],{"href":22465,"rel":93231},[264],[321,93233,93234,93239],{},[324,93235,93236,22476],{},[55,93237,22360],{"href":22358,"rel":93238},[264],[324,93240,22479],{},[48,93242,22482],{},[48,93244,22485],{},[40,93246,22489],{"id":22488},[48,93248,22492],{},[48,93250,22495],{},[321,93252,93253,93255,93257],{},[324,93254,22500],{},[324,93256,22503],{},[324,93258,22506],{},[48,93260,22509],{},[48,93262,22512],{},[48,93264,22515],{},[48,93266,93267],{},[384,93268],{"alt":18,"src":22520},[48,93270,93271],{},[384,93272],{"alt":18,"src":22525},[32,93274,22529],{"id":22528},[3933,93276,22532],{"id":22410},[321,93278,93279,93281,93283,93285,93287],{},[324,93280,22537],{},[324,93282,22540],{},[324,93284,22543],{},[324,93286,22546],{},[324,93288,22549],{},[3933,93290,22553],{"id":22552},[321,93292,93293,93295],{},[324,93294,22558],{},[324,93296,22561,93297,22567],{},[55,93298,22566],{"href":22564,"rel":93299},[264],[48,93301,93302],{},[384,93303],{"alt":18,"src":22572},[3933,93305,22576],{"id":22575},[321,93307,93308,93310,93312],{},[324,93309,22581],{},[324,93311,22584],{},[324,93313,22587],{},[3933,93315,22444],{"id":22590},[321,93317,93318,93320,93322,93324,93326],{},[324,93319,22595],{},[324,93321,22584],{},[324,93323,22600],{},[324,93325,22603],{},[324,93327,22606],{},[3933,93329,22610],{"id":22609},[321,93331,93332],{},[324,93333,22615,93334,22621],{},[55,93335,22620],{"href":22618,"rel":93336},[264],[48,93338,93339],{},[384,93340],{"alt":18,"src":22626},[48,93342,93343],{},[384,93344],{"alt":18,"src":22631},[321,93346,93347,93349],{},[324,93348,22636],{},[324,93350,22639],{},[40,93352,2125],{"id":2122},[48,93354,22644],{},[32,93356,22648],{"id":22647},[48,93358,93359,22654],{},[55,93360,22653],{"href":18969},[32,93362,22658],{"id":22657},[48,93364,22661],{},[48,93366,22664,93367,22669],{},[55,93368,22668],{"href":17075,"rel":93369},[264],[8300,93371,22673],{"id":22672},[48,93373,93374,758,93376],{},[2628,93375,5599],{},[55,93377,21529],{"href":21529},[48,93379,93380,758,93382],{},[2628,93381,1332],{},[55,93383,10389],{"href":10389},[48,93385,93386,758,93388],{},[2628,93387,22690],{},[55,9
3389,22693],{"href":22693,"rel":93390},[264],[48,93392,93393,758,93395],{},[2628,93394,22699],{},[55,93396,22702],{"href":22702,"rel":93397},[264],[48,93399,93400,758,93402],{},[2628,93401,22708],{},[55,93403,22711],{"href":22711,"rel":93404},[264],[48,93406,93407,758,93409],{},[2628,93408,22717],{},[55,93410,22720],{"href":22720,"rel":93411},[264],[48,93413,93414,758,93416],{},[2628,93415,22726],{},[55,93417,22729],{"href":22729,"rel":93418},[264],[48,93420,93421,758,93423],{},[2628,93422,22735],{},[55,93424,22564],{"href":22564,"rel":93425},[264],[48,93427,93428,758,93430],{},[2628,93429,22743],{},[55,93431,22618],{"href":22618,"rel":93432},[264],{"title":18,"searchDepth":19,"depth":19,"links":93434},[93435,93436,93437,93442,93446,93447,93456,93459],{"id":22052,"depth":19,"text":22053},{"id":22088,"depth":19,"text":22089},{"id":22111,"depth":19,"text":22112,"children":93438},[93439,93440,93441],{"id":22123,"depth":279,"text":22124},{"id":22147,"depth":279,"text":22148},{"id":22168,"depth":279,"text":22169},{"id":22192,"depth":19,"text":22193,"children":93443},[93444,93445],{"id":22196,"depth":279,"text":22197},{"id":22211,"depth":279,"text":22212},{"id":22252,"depth":19,"text":22253},{"id":22264,"depth":19,"text":22265,"children":93448},[93449,93450,93451,93452,93453,93454,93455],{"id":22271,"depth":279,"text":22272},{"id":22317,"depth":279,"text":22318},{"id":22335,"depth":279,"text":22336},{"id":22382,"depth":279,"text":22383},{"id":22410,"depth":279,"text":22411},{"id":22428,"depth":279,"text":22429},{"id":22443,"depth":279,"text":22444},{"id":22488,"depth":19,"text":22489,"children":93457},[93458],{"id":22528,"depth":279,"text":22529},{"id":2122,"depth":19,"text":2125,"children":93460},[93461,93462],{"id":22647,"depth":279,"text":22648},{"id":22657,"depth":279,"text":22658},{},{"title":22031,"description":22778},[5954,799,303],1775716492970]