Pulsar Virtual Summit Europe 2023 brought the Apache Pulsar Community together to share best practices and discuss the future of streaming technologies.
May 23rd witnessed a remarkable milestone as over 400 attendees from 20+ countries joined the virtual stage to explore the cutting-edge advancements in Apache Pulsar and the real-world success stories of Pulsar-powered companies. This record-breaking turnout at the Pulsar Summit not only demonstrates the surging adoption of Pulsar but also highlights the ever-growing enthusiasm and curiosity surrounding this game-changing technology!
Some key facts:
- 400+ attendees representing 20+ countries
- 24 speakers from companies including the LEGO Group, Zafin, VMware, Axon, HSL, and more
In this blog post, I will share the key takeaways I gained from the Pulsar Virtual Summit Europe:
- Pulsar 3.0 makes Pulsar an even better enterprise-ready choice for real-time event-driven architecture, thanks to Long-Term Support (LTS) releases and improvements aimed at supporting millions of topics.
- The Pulsar ecosystem continues to expand, with over 10,000 active Slack users and over 600 contributors.
- Pulsar's developer experience is continuously improving, with recent enhancements including the availability of Docker images for ARM64 architectures and a revamped website.
- Building real-time data pipelines is becoming easier with the availability of low-code transformations and Apache NiFi integration.
- Pulsar is widely adopted across various industries, including finance, telecommunications, transport, and manufacturing, as showcased by real-world examples presented during the Pulsar Summit Europe.
Pulsar as an Enterprise-ready messaging & data-streaming service.
Pulsar has key features that confirm it as an ideal choice for Enterprises, such as multi-tenancy, the ability to handle hundreds of thousands of topics, elasticity, and built-in geo-replication.
At this summit, companies such as the LEGO Group and Raiffeisen Bank International (RBI) share their experience in adopting and operating a Pulsar cluster as a centralized messaging & streaming service at company scale. You can read the details in the sections below.
LTS with Pulsar 3.0
LTS is essential for big companies as it provides stability, security, compatibility, and reduced disruption, enabling them to maintain smooth operations, protect data, and effectively manage their software infrastructure.
Matteo Merli, Apache Pulsar PMC Chair & StreamNative CTO, announced Pulsar's Long Term Support model (LTS), starting at the Pulsar 3.0.x release.
The two main goals of LTS are:
- providing a path for more extended support of releases so that users can upgrade at their pace
- and at the same time, providing a path for fast innovation
Depending on what you need, you can decide which Pulsar version you should be using:
- LTS release for more stability: only fixes are backported
- feature releases, to benefit from improvements and newer features
Using an LTS release, you’ll benefit from bug fixes for up to 24 months & security patches for up to 36 months.
Additionally, feature releases will be published on a predictable schedule: every three months.
Furthermore, LTS will provide a smoother path for upgrades.
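To make the support windows concrete, here is a minimal sketch that computes when bug fixes and security patches would end for an LTS release, using the 24-month and 36-month figures quoted above. The release date used in the example is a placeholder chosen for illustration, not an official date, and the helper names are hypothetical.

```python
from datetime import date

# Support windows quoted in the post for an LTS release.
BUG_FIX_MONTHS = 24
SECURITY_PATCH_MONTHS = 36


def add_months(start: date, months: int) -> date:
    """Return the date `months` later, clamping the day to 28 so it is always valid."""
    total = start.year * 12 + (start.month - 1) + months
    year, month_index = divmod(total, 12)
    return date(year, month_index + 1, min(start.day, 28))


def support_windows(release: date) -> dict:
    """Compute the end of the bug-fix and security-patch windows for a release."""
    return {
        "bug_fixes_until": add_months(release, BUG_FIX_MONTHS),
        "security_patches_until": add_months(release, SECURITY_PATCH_MONTHS),
    }


# Placeholder release date, for illustration only.
windows = support_windows(date(2023, 5, 1))
print(windows)
```

These are upper bounds ("up to" 24 and 36 months), so treat the computed dates as the latest possible end of support rather than a guarantee.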
Quoting Matteo Merli:
Pulsar 3.0 is a new chapter. I truly believe we are now even better positioned to deliver features & improvements fast & safely.
Support of millions of topics at the Enterprise scale
Handling a large volume of topics is a strong requirement for a centralized and multi-tenant messaging platform. Pulsar stands out as the ideal solution for this scenario, as it enables a single cluster to manage over 1 million topics efficiently. This is indeed impressive, but can we push the boundaries even further?
During his captivating keynote presentation at the Pulsar Summit, Matteo Merli, the Apache Pulsar PMC Chair and StreamNative CTO, unveils the exciting potential of Oxia, a new open-source metadata store, to enable Pulsar's remarkable scalability.
In the context of distributed systems, it is crucial to have real-time knowledge of the specific node responsible for serving a particular resource. Furthermore, Pulsar, functioning as a storage system, necessitates the storage of metadata, including ‘pointers’ to other data. To fulfill these requirements, Pulsar heavily relies on a distributed coordination and metadata storage system. This robust system addresses various inquiries, such as determining the assigned broker node for a specific topic or retrieving the data retention policy associated with a given topic, among others.
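The two kinds of questions mentioned above (who serves a topic, and which policy applies to it) can be pictured with a toy in-memory key-value store. This is only a conceptual sketch: the class, the key layout, and the broker names are all hypothetical and do not reflect how Pulsar, ZooKeeper, or etcd actually organize their data.

```python
class ToyMetadataStore:
    """In-memory stand-in for a coordination and metadata service (illustration only)."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key, default=None):
        return self._data.get(key, default)


def owner_of(store, topic):
    """Which broker currently serves this topic? (hypothetical key layout)"""
    return store.get(f"owners/{topic}")


def retention_minutes(store, tenant, namespace):
    """Which retention policy applies to topics in this namespace? (hypothetical key layout)"""
    policy = store.get(f"policies/{tenant}/{namespace}", {})
    return policy.get("retention_minutes")


store = ToyMetadataStore()
store.put("owners/persistent://shop/orders/created", "broker-2")
store.put("policies/shop/orders", {"retention_minutes": 1440})

print(owner_of(store, "persistent://shop/orders/created"))
print(retention_minutes(store, "shop", "orders"))
```

A real metadata service must also keep this information consistent across nodes and notify clients of changes, which is where the scalability limits discussed next come from.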
You can choose which metadata provider to use: since PIP-45, the metadata provider has been pluggable. The primary implementations available are ZooKeeper and etcd. However, both of them have limitations that hinder Pulsar's scalability when it comes to increasing the number of topics:
- They lack horizontal scalability.
- Vertical scaling only offers limited improvements.
- Their data storage capacity is restricted to a few gigabytes.
To address these challenges, StreamNative developed Oxia, which resolves metadata and coordination issues on a large scale. Oxia introduces a novel architecture that leverages modern Kubernetes environments.
With a conventional metadata and coordination provider, a single Pulsar cluster can currently handle over 1 million topics. In contrast, Oxia aims to enable a single Pulsar cluster to support more than 100 million topics, which is truly remarkable.
Oxia is an open-source solution and not limited to Pulsar. It can be employed for other distributed coordination and metadata requirements.
Watch Matteo’s keynote or read this blog article to learn more.
Dealing with metrics from a vast number of topics can pose significant challenges. To address this, Asaf Mesika from StreamNative has put forward an enhancement proposal for the Pulsar metrics system (PIP-264) to facilitate monitoring for a large number of topics. The proposed improvements include:
- Aggregating metrics for groups of topics.
- Implementing fine-grained metrics filtering.
- Unifying metrics using a standardized naming convention.
- Consolidating all existing metrics libraries into a single one, namely OpenTelemetry.
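As a rough illustration of the first idea, aggregating metrics for groups of topics, the sketch below sums a per-topic rate into one value per group. Grouping by tenant and namespace is an assumption made for this example; the post does not describe how PIP-264 defines topic groups, and the function names and sample numbers are hypothetical.

```python
from collections import defaultdict


def topic_group(topic: str) -> str:
    """Derive a group key (tenant/namespace) from 'persistent://tenant/namespace/topic'."""
    _, _, rest = topic.partition("://")
    tenant, namespace, _name = rest.split("/", 2)
    return f"{tenant}/{namespace}"


def aggregate(per_topic_rates: dict) -> dict:
    """Sum per-topic message rates into one value per group."""
    totals = defaultdict(float)
    for topic, rate in per_topic_rates.items():
        totals[topic_group(topic)] += rate
    return dict(totals)


# Made-up sample data: messages per second for three topics.
sample = {
    "persistent://shop/orders/created": 120.0,
    "persistent://shop/orders/cancelled": 30.0,
    "persistent://shop/billing/invoices": 50.0,
}
grouped = aggregate(sample)
print(grouped)
```

With millions of topics, exporting one time series per group instead of one per topic is what keeps the monitoring system manageable.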
For further details, you can watch Asaf's talk at this link.
The proposal shows great promise, and in my opinion, anyone responsible for monitoring a cluster with a high number of topics should closely follow this PIP and consider contributing to it.
Finally, the latest release of Pulsar (Pulsar 3.0 LTS) introduces several enhancements that help you manage a larger number of topics. These improvements include:
- The introduction of a new version of BookKeeper, which greatly enhances throughput and reduces latency, especially in scenarios involving a high number of topics. You can find more information about this in Matteo's keynote announcement.
- A more efficient service discovery and session establishment mechanism, enabling newly connected Pulsar clients to initiate message sending and consumption much more quickly.
These updates in Pulsar 3.0 LTS provide significant benefits for managing a higher volume of topics.
Pulsar performance continuously improves.
In Pulsar 3.0, an upgraded version of BookKeeper is introduced, resulting in significant enhancements in throughput and latency. These improvements are particularly noticeable when dealing with numerous topics or when message batching is disabled or ineffective. For instance, Pulsar 3.0 achieves twice the throughput compared to previous versions when operating with over 10,000 topics. Additionally, the utilization of Direct IO enhances IO speed, especially in containerized environments.
Pulsar has a thriving ecosystem and an engaged community.
During the opening keynote of the Pulsar Virtual Summit Europe 2023, Sijie Guo, the CEO of StreamNative, highlighted the remarkable growth of the Pulsar community. Starting with just a couple of contributors in its early days, Pulsar now boasts more than 600 contributors and is a top-5 Apache Software Foundation project.
Thousands of organizations worldwide have also embraced Pulsar. Moreover, there is a community of over 10,000 Slack members ready to provide assistance.
Pulsar benefits from an extensive range of open-source connectors, offloaders, and protocol adapters, allowing smooth integration with various systems. It also offers a comprehensive collection of client libraries, enabling developers to code event-driven applications in their preferred programming language. Furthermore, Pulsar seamlessly integrates with popular open-source processing engines like Apache Flink and Apache Spark. To explore this ecosystem, you can visit the StreamNative Hub.
During the Pulsar Summit Europe, numerous sessions delve into the seamless integration of Pulsar with Spring, Apache Pinot, RisingWave, NiFi, and other technologies. To learn more, you can access the videos from the Ecosystem track.
Pulsar's elasticity enables achieving optimal performance while maintaining cost efficiency.
Horizontal scalability is a crucial requirement for any data streaming platform, but it should not be confused with elasticity.
Horizontal scalability involves adding more resources to handle increased workloads. On the other hand, elasticity refers to the ability to quickly adapt to changes in workload by efficiently allocating and deallocating resources, thereby achieving optimal performance at the right cost.
While some data streaming platforms lack elasticity and require careful resource allocation planning in advance, Pulsar stands out with its exceptional elasticity.
In the first part of his presentation, Julien Jakubowski, Developer Advocate EMEA at StreamNative, elucidates how Pulsar's sophisticated architecture delivers both scalability and elasticity. He further explores the three levels of elasticity offered by Pulsar.
The load balancer plays a pivotal role in Pulsar's elasticity. In Pulsar 3.0, the community has enhanced the load balancer with a specific focus on elasticity. The new load balancer in Pulsar 3.0 ensures the following:
- Efficiently balancing traffic within the cluster, even during periods of abrupt workload spikes.
- Quickly achieving an optimal state for the cluster.
- Maximizing topic availability during reassignments.
The developer experience with Pulsar is constantly being enhanced.
The developer experience is crucial, and Pulsar has benefited from significant improvements in this area.
In his keynote presentation, Matteo Merli announced that Docker images are now available for both x86-64 and ARM64 architectures starting from Pulsar 3.0. Developers using Apple Silicon-based MacBooks can now enjoy an enhanced experience as Pulsar Docker containers boot and run faster on these devices.
Furthermore, Pulsar has unveiled a brand new website! The credit goes to Emidio Cardeira, Asaf Mesika, Tison Chen from StreamNative, and Kiryl Valkovich from Teal Tools for implementing this update. The Apache Pulsar website now features a refreshed and visually appealing design that perfectly captures the futuristic essence of our dynamic community and next-generation solution.
Building real-time data pipelines with minimal programming skills is becoming easier.
Sijie Guo explains in his opening talk that you don’t need to bring a full-fledged stream processing technology such as Apache Flink or Apache Spark to build all your pipelines. For the less advanced use cases, you can build a pipeline with the comprehensive set of Pulsar IO connectors without writing a single line of code. You can also create Pulsar Functions to write simple and easy-to-deploy processing logic with just a few lines of code.
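To give a sense of what "a few lines of code" can look like, here is a minimal sketch of a Pulsar Function written as a plain Python function named `process`, the style Pulsar's Python runtime accepts without importing the Functions SDK. The transformation itself (appending an exclamation mark) is only an example, and how you package and deploy it depends on your setup.

```python
def process(input):
    """Called once per incoming message; the return value is published to the output topic."""
    return f"{input}!"


# Local check of the logic, outside of Pulsar.
print(process("hello"))
```

Because the function is plain Python, its logic can be unit-tested locally before it is deployed to a cluster.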
During his presentation, Christophe Bornet from DataStax introduced an innovative advancement in Pulsar known as Pulsar Transformations. These transformations enable users to manipulate data through low-code techniques while harnessing the power of existing components within Pulsar.
Additionally, Apache Pulsar and Apache NiFi can be combined to create real-time data pipelines without coding. By using a drag-and-drop interface, users can easily connect different data sources and destinations. This integration allows for seamless data flow and processing, enabling users to handle complex data tasks efficiently. For more information and a demo, you can watch Tim's talk on this topic.
You can build a full data pipeline with minimal or zero lines of code, without setting up and managing a stream processing infrastructure: Pulsar IO connectors, Pulsar Functions, Pulsar Transformations, and Apache NiFi all provide convenient options.
Pulsar supports various use cases in several industries.
Pulsar has already been deployed by thousands of companies across the globe in various industries. Quoting Sijie Guo in the first keynote:
From med-tech to financial services, IoT, manufacturing, e-commerce, gaming, and more, Pulsar became part of the modern data stack.
During the event, numerous industry professionals showcased real-world examples of how Pulsar transforms data streaming applications across various sectors. Engineers representing prominent companies demonstrated their successful implementation of Pulsar, emphasizing the reasons behind their choice and its benefits.
George Orban (Daiwa Capital Markets) shares a "Pulsar love story" where he discusses migrating a pricing engine and trading system to Apache Pulsar. He delves into the motivations behind selecting Pulsar, highlighting its suitability for finance and enterprise applications, and outlines the notable enhancements it brought to their stack in terms of resilience, robustness, and speed.
Raiffeisen Bank International (RBI) is one of the top European banks focusing on digital transformation, sustainability, and customer experience. They help over 60M customers with all kinds of financial services. Their central backbone for all their data integration initiatives is powered by Pulsar. Watch Markus Falkner & Armin Woworsky’s keynote, where they share their experience on Async API & GitOps on this platform.
Zafin offers an enterprise platform that enables banks to separate products and pricing from their core systems and consolidate them into a cross-enterprise product innovation layer.
Zafin selected Pulsar for a complex & sensitive data streaming use case because:
- Pulsar can scale out rapidly and dynamically to increase throughput without restarting the applications.
- Pulsar's geo-replication feature allows them to seamlessly replicate their entire data to a disaster recovery region without experiencing performance drawbacks.
- Pulsar’s Tiered Storage feature allows for multi-year data retention on cheap storage.
- A Pulsar cluster can be upgraded without downtime.
By using StreamNative Cloud to manage Pulsar, Zafin is able to ensure observability with out-of-the-box monitoring. In addition, StreamNative Cloud greatly simplifies Zafin’s cluster management.
Zafin has been partnering with StreamNative for over a year, and its timely delivery has garnered high satisfaction from all parties involved.
Lloyd Chandran & Matt Hefford from Zafin share their experience in this presentation.
Habip Kenan Üsküda (Axon Networks) shares their journey of constructing an observability stack for their Pulsar-based platform in the telecommunications field. This stack empowered their monitoring infrastructure to expand seamlessly, accommodating an impressive scale of 1 million topics.
Jaakko Malkki explains in his presentation how Helsingin Seudun Liikenne (Helsinki Regional Transport Authority) utilizes Transitdata, a microservice application based on Pulsar, to process real-time public transport data like predictions for stop times, vehicle locations, and service notifications. He highlights the difficulties of testing applications with a microservice architecture at the system level and discusses their approach to simplifying the creation of automated tests. This strategy enables the rapid rollout of new features.
At Pulsar Summit Europe, engineers from the LEGO Group discuss their utilization of Pulsar as a messaging and streaming platform, implemented across various domains within the company. They delve into their experiences in hosting and managing this platform at an enterprise level and highlight their successful experience with StreamNative Cloud. Watch the videos of their sessions to learn more.
In conclusion, the Pulsar Virtual Summit Europe was an incredible event that showcased the power and potential of Apache Pulsar. From enlightening keynote sessions to deep-dive technical presentations, attendees gained valuable insights into leveraging Pulsar for their real-time data applications.
But the excitement doesn't end there! The upcoming Pulsar Summit North America is just around the corner, taking place on October 25, 2023, in San Francisco. The summit promises to be a hub of innovation, collaboration, and knowledge sharing among industry experts, developers, and enthusiasts.
If you have valuable insights, experiences, or breakthroughs related to Apache Pulsar, remember to submit your ideas before the Call for Speakers closing date on July 7, 2023. This is your chance to contribute and share your expertise with the Pulsar community. Be part of shaping the future of real-time data processing with Pulsar by submitting your proposal.
Let's come together to learn, network, and take Pulsar to new heights. See you there!