Oct 24, 2024
20 min

Announcing Apache Pulsar™ 4.0: Towards an Open Data Streaming Architecture

Matteo Merli
Co-Founder and CTO, StreamNative

We are excited to announce the release of Apache Pulsar 4.0, the second Long-Term Support (LTS) version, following the successful introduction of LTS with Pulsar 3.0 in May 2023. Pulsar 4.0 represents a pivotal step forward in our mission to make data streaming more accessible, affordable, and scalable. With a focus on modularity, observability, scalability, and security, this release extends Pulsar’s advantages for enterprise deployments, most notably through enhanced Quality of Service (QoS) controls. As data streaming trends towards simplicity and flexibility, this release drives Pulsar closer to becoming the foundation of an Open Data Streaming Architecture.

In this post, we’ll explore the key areas of innovation in Pulsar 4.0 and the Pulsar Improvement Proposals (PIPs) that have been instrumental in shaping this release.

A Modular Data Streaming Architecture

From the start, Pulsar was designed with modularity as a core principle. This philosophy has guided our contributors and committers as they continuously enhance the project to make data streaming available and accessible for organizations of all sizes. The modular architecture allows organizations to opt for deployment models that align with their specific security and infrastructure requirements. Over multiple releases, the Apache Pulsar community has transformed every layer of the Pulsar system—including metadata storage, data storage, protocol handling, and load balancing—into fully pluggable components. This flexibility fosters rapid innovation and allows Pulsar to adapt to the ever-changing demands of IT infrastructure. 

Modularity has enabled the development of reusable components like Oxia for scalable metadata storage and the S3-based Write-Ahead Logging implementation in the Ursa Engine.

This modular architecture is the result of incremental improvements rather than a single large refactor. Below are some notable modularized components:

Metadata Storage

Metadata storage is the most critical component of a data streaming engine, acting as its central nervous system by managing node coordination, consensus, node membership tracking, and more. Historically, both Kafka and Pulsar have relied on ZooKeeper for metadata storage. While Kafka moved to KRaft as an alternative, the Pulsar community took a different approach by making the metadata storage pluggable (see PIP-45). This approach allows Pulsar to integrate alternative metadata implementations, such as RocksDB for local standalone instances, etcd as a ZooKeeper alternative, or more scalable solutions like Oxia.
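To illustrate what a pluggable metadata layer means in practice, here is a minimal Python sketch. It is not Pulsar's actual interface: the `MetadataStore` contract, the registry functions, and the `memory:` URL scheme are all hypothetical names chosen for this example. The idea it shows is that callers depend only on a small contract, and the backend is selected from a URL prefix.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict, Optional


class MetadataStore(ABC):
    """Hypothetical minimal contract that every metadata backend satisfies."""

    @abstractmethod
    def put(self, path: str, value: bytes) -> None: ...

    @abstractmethod
    def get(self, path: str) -> Optional[bytes]: ...


class InMemoryStore(MetadataStore):
    """Stand-in backend used here in place of ZooKeeper, etcd, or Oxia."""

    def __init__(self, address: str) -> None:
        self.address = address
        self._data: Dict[str, bytes] = {}

    def put(self, path: str, value: bytes) -> None:
        self._data[path] = value

    def get(self, path: str) -> Optional[bytes]:
        return self._data.get(path)


# Registry mapping a URL scheme to a backend factory (illustrative only).
_BACKENDS: Dict[str, Callable[[str], MetadataStore]] = {}


def register_backend(scheme: str, factory: Callable[[str], MetadataStore]) -> None:
    _BACKENDS[scheme] = factory


def open_store(url: str) -> MetadataStore:
    """Select a backend from the scheme prefix, e.g. 'memory:local'."""
    scheme, _, address = url.partition(":")
    if scheme not in _BACKENDS:
        raise ValueError(f"no metadata backend registered for '{scheme}'")
    return _BACKENDS[scheme](address)


register_backend("memory", InMemoryStore)

store = open_store("memory:local")
store.put("/brokers/broker-1", b"alive")
print(store.get("/brokers/broker-1"))
```

Swapping the backend then only requires registering another factory under a different scheme; the calling code stays unchanged.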

After the Pulsar community stabilized the metadata interface, we looked into addressing a bigger challenge—scaling the number of topic partitions Pulsar could support. This led to the creation of Oxia, which provides a scalable, distributed metadata storage solution, completely different from KRaft's approach.

Data Storage

Pulsar has supported pluggable data storage from the beginning, implemented through the Managed Ledger. The Managed Ledger represents a segmented stream, with distributed log segments stored in remote storage. This interface was designed specifically for Pulsar's segmented storage architecture, providing an abstraction layer that separates compute from storage. This flexibility also enabled the introduction of Tiered Storage in 2018, making Pulsar the first solution in the market to support this feature.
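The segmented-stream idea can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the Managed Ledger API: `SegmentedLog`, `Segment`, and the fixed entries-per-segment rollover rule are invented for illustration. It shows how appends go to a single open segment, how full segments are sealed, and how a position is a pair of segment ID and entry index.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Segment:
    segment_id: int
    entries: List[bytes] = field(default_factory=list)
    sealed: bool = False


class SegmentedLog:
    """Toy segmented stream: append to the open segment, roll over when full."""

    def __init__(self, max_entries_per_segment: int) -> None:
        self.max_entries = max_entries_per_segment
        self.segments: List[Segment] = [Segment(segment_id=0)]

    def append(self, payload: bytes) -> Tuple[int, int]:
        current = self.segments[-1]
        if len(current.entries) >= self.max_entries:
            current.sealed = True  # sealed segments are never written again
            current = Segment(segment_id=current.segment_id + 1)
            self.segments.append(current)
        current.entries.append(payload)
        # A position is (segment id, index of the entry within the segment).
        return current.segment_id, len(current.entries) - 1

    def read(self, position: Tuple[int, int]) -> bytes:
        segment_id, entry_id = position
        return self.segments[segment_id].entries[entry_id]


log = SegmentedLog(max_entries_per_segment=2)
positions = [log.append(f"msg-{i}".encode()) for i in range(5)]
print(positions)
```

Because sealed segments are immutable, they are natural candidates for being moved to cheaper storage, which is the property that tiered storage builds on.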

While the Managed Ledger is inherently flexible, much of Pulsar's core code has historically depended on specific implementation classes such as ManagedLedgerImpl, ManagedCursorImpl, and PositionImpl. This tight coupling made it difficult to introduce new implementations. Pulsar 4.0 addresses this limitation by performing a major refactor of the Managed Ledger, significantly simplifying the process of integrating new implementations. One example implementation is the S3-based Write-Ahead Log in messaging platforms and services powered by Apache Pulsar, such as the ONE StreamNative Platform.

Another change in Pulsar 4.0 supporting new Managed Ledger implementations is the abstraction of the message ID implementation, eliminating the direct dependency on ledger IDs and entry IDs. By centralizing the sequence generation process, this refactor enables more efficient, incremental sequence generation, which is a key requirement for the S3-based write-ahead log implementation.

These changes in Pulsar 4.0 accelerate innovation at the storage layer, opening the door to future improvements and integrations.

Pluggable Protocols

One of Pulsar’s key differentiators is its ability to store a single copy of data and serve it through multiple subscription models, each tailored to specific business requirements. This flexibility stands out compared to other technologies, but adopting these advanced features often requires development teams to modify their existing applications and learn new APIs - a process that can impact development timelines. Meanwhile, many existing applications are built around protocols and APIs like Kafka or MQTT. Rather than forcing users to migrate to the Pulsar protocol and APIs, Pulsar’s architecture supports the development of additional protocols and APIs that can leverage its underlying data storage capabilities.

This approach led to the creation of KoP (Kafka on Pulsar), MoP (MQTT on Pulsar), and other protocol handlers. These handlers add capabilities to Pulsar, transforming it into an open platform capable of supporting multiple protocols natively. The protocol framework has matured significantly in recent years, particularly with StreamNative driving broader adoption through the Ursa Engine, which is Kafka API-compatible and built on top of Pulsar’s protocol handler framework. The pluggable protocol support in Apache Pulsar allows platforms such as the ONE StreamNative Platform to integrate seamlessly into existing ecosystems while still offering the architectural advantages of Pulsar’s storage layer.

Load Manager

The Load Manager is a key component in distributing workloads across multiple nodes in Pulsar. Over the course of several releases, the load manager has evolved to include various load-shedding strategies tailored to different types of workloads. This flexibility is one of the reasons why Pulsar can support a wide range of organizations, from startups and unicorns to large enterprises and hyperscalers. The Extensible Load Manager, introduced in Pulsar 3.0, has matured significantly by the time of the Pulsar 4.0 release.

Many new functionalities have been built on top of it in the ONE StreamNative Platform, including graceful rollout capabilities and more. At the upcoming Data Streaming Summit, the author of the Extensible Load Manager will provide a deeper dive into its implementation and how it powers features like read-only brokers and graceful rollouts for high availability during cluster upgrades.

Ongoing Modular Approach

In addition to these major components, many other features in Pulsar are also pluggable, which fosters rapid innovation. These pluggable components include, but are not limited to, the delayed delivery queue implementation, topic compaction service, and more. This modularity allows the Pulsar community to evolve Pulsar’s implementations incrementally, without disrupting the core functionality.

With Pulsar 4.0, we are proud to see how this modular approach is shaping the future of data streaming architecture. Pulsar now supports multiple protocols through the protocol handler framework, offers multi-tenancy with workload isolation between tenants, and enables the potential for multi-modality through pluggable storage classes that can be configured at the tenant and namespace levels. This flexibility paves the way for innovations like the StreamNative Ursa Engine, positioning Pulsar as a future-proof solution for modern data streaming needs.

Built-in OpenTelemetry for Comprehensive Observability

As the industry continues to standardize around OpenTelemetry for observability, Pulsar 4.0 embraces this evolution by integrating with the framework. This integration provides robust telemetry data collection, greatly improving debugging, monitoring, and performance insights at scale.

The implementation of PIP-264 (Enhanced OTel-based metric system) introduces key improvements to manage metric cardinality in large-scale deployments where a broker may handle between 10k and 100k topics. A central solution introduced in this PIP is the Topic Metric Group, a new aggregation level for metrics. This feature allows users to assign topics to groups through configuration using wildcards or regular expressions, effectively organizing large topic sets into manageable groups. By offering an aggregation level more granular than namespaces, it lets users control how topics are grouped, reducing the burden of tracking metrics across many topics. This approach strikes a balance between reducing cardinality and maintaining the level of detail needed for observability.
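The effect of grouping on cardinality can be shown with a short Python sketch. The rule format, the group names, and the topic names below are assumptions made for this example and do not reflect the actual PIP-264 configuration syntax. Each topic is matched against regular expressions, and per-topic counters are collapsed into one series per group.

```python
import re
from collections import defaultdict
from typing import Dict

# Hypothetical group rules: (group name, regex over the topic name).
# The first matching rule wins; unmatched topics share a catch-all group.
GROUP_RULES = [
    ("orders", re.compile(r"^persistent://shop/orders/.*")),
    ("telemetry", re.compile(r"^persistent://iot/[^/]+/sensor-\d+$")),
]


def group_for(topic: str) -> str:
    for name, pattern in GROUP_RULES:
        if pattern.match(topic):
            return name
    return "ungrouped"


def aggregate(msg_in_by_topic: Dict[str, int]) -> Dict[str, int]:
    """Collapse per-topic counters into one time series per group."""
    totals: Dict[str, int] = defaultdict(int)
    for topic, count in msg_in_by_topic.items():
        totals[group_for(topic)] += count
    return dict(totals)


per_topic = {
    "persistent://shop/orders/created": 120,
    "persistent://shop/orders/cancelled": 30,
    "persistent://iot/plant-a/sensor-1": 500,
    "persistent://iot/plant-a/sensor-2": 700,
    "persistent://shop/misc/audit": 5,
}
print(aggregate(per_topic))
```

Five per-topic series become three group-level series here; with tens of thousands of topics per broker, the same mechanism keeps the number of exported series bounded by the number of groups.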

Another critical aspect of PIP-264 is the fine-grained filtering mechanism. This rule-based dynamic configuration allows users to specify which metrics should be collected or dropped at the namespace, topic, or group level. By default, only a minimal subset of essential metrics is retained at the group or namespace level, while unnecessary metrics are discarded to maintain efficiency. However, when performance issues or anomalies arise, users can dynamically override these default settings, expanding metric collection at higher granularity levels—down to the topic or even consumer/producer level. This dynamic filtering system allows for real-time responsiveness to issues, similar to adjusting logging levels dynamically. After the need for observing detailed metrics is resolved, users can disable these overrides to return to the default filtering settings, maintaining optimal system performance. As a result, performance remains optimized while still delivering valuable insights when necessary.
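The override workflow described above can be sketched as follows. This is a simplified model, not the PIP-264 rule engine: the `MetricFilter` class, the four level names, and the per-namespace override are assumptions for illustration. By default only coarse levels are emitted; an operator temporarily raises the granularity for one namespace and later removes the override.

```python
from typing import Dict

# Granularity levels, ordered from coarse to fine (illustrative names).
LEVELS = ["namespace", "group", "topic", "consumer"]


class MetricFilter:
    def __init__(self, default_level: str = "group") -> None:
        self.default_level = default_level
        # namespace -> finest level currently allowed for that namespace
        self._overrides: Dict[str, str] = {}

    def set_override(self, namespace: str, level: str) -> None:
        if level not in LEVELS:
            raise ValueError(f"unknown level: {level}")
        self._overrides[namespace] = level

    def clear_override(self, namespace: str) -> None:
        self._overrides.pop(namespace, None)

    def should_emit(self, namespace: str, level: str) -> bool:
        """Emit a metric only if its level is no finer than the allowed one."""
        finest = self._overrides.get(namespace, self.default_level)
        return LEVELS.index(level) <= LEVELS.index(finest)


metric_filter = MetricFilter()
before = metric_filter.should_emit("shop/orders", "topic")
metric_filter.set_override("shop/orders", "topic")   # investigate an incident
during = metric_filter.should_emit("shop/orders", "topic")
metric_filter.clear_override("shop/orders")          # back to the defaults
after = metric_filter.should_emit("shop/orders", "topic")
print(before, during, after)
```

The override affects only the namespace under investigation, so the rest of the cluster keeps its low-cardinality defaults.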

Increased Scalability to Support Demanding Workloads

Enhanced Load Balancing for Millions of Topics

Since the introduction of the Extensible Load Manager in Pulsar 3.0, Pulsar's load management capabilities have evolved significantly with each subsequent release. The Extensible Load Manager has consistently introduced new features designed to handle dynamic workloads efficiently while improving overall system performance and resource utilization. These advancements have made the load manager a crucial component in ensuring scalability and reliability in modern data streaming systems.

One of the load balancing enhancements came with PIP-307, which optimized the bundle transfer protocol for the Extensible Load Manager. This improvement eliminates the need for redundant topic lookups during bundle transfers, reducing publish latency spikes and improving performance when unloading large numbers of topics. The new protocol also introduces graceful Managed Ledger shutdown, minimizing potential race conditions during ownership transfers and ensuring smoother topic transitions between brokers.

Building on that, PIP-354 in Pulsar 4.0 introduces dynamic improvements that further enhance the adaptability of the load manager. A new automatic load-shedding mechanism enables brokers to autonomously adjust to fluctuating workloads by redistributing topics to balance the load across the cluster. This minimizes bottlenecks and prevents individual brokers from becoming overloaded, ensuring that the system remains stable even during periods of high traffic. Additionally, with enhanced metrics integration, the load manager can make more intelligent, real-time decisions on resource allocation, making Pulsar more resilient to sudden changes in workload patterns.
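As a rough illustration of load shedding in general, the Python sketch below moves bundles away from brokers whose load exceeds the cluster average by a threshold. It is a generic threshold-based strategy written for this example, not the algorithm of PIP-354 or of any Pulsar load manager; the function name, the integer load units, and the broker and bundle names are all assumptions.

```python
from typing import Dict, List, Tuple


def shed_load(brokers: Dict[str, Dict[str, int]],
              threshold: int) -> List[Tuple[str, str, str]]:
    """Move bundles off brokers loaded above (cluster average + threshold).

    `brokers` maps broker name -> {bundle name -> load units}; it is
    mutated in place. Returns the moves as (bundle, source, target).
    """
    def load(name: str) -> int:
        return sum(brokers[name].values())

    limit = sum(load(b) for b in brokers) / len(brokers) + threshold
    moves: List[Tuple[str, str, str]] = []

    for source in sorted(brokers, key=load, reverse=True):
        while load(source) > limit and len(brokers[source]) > 1:
            target = min(brokers, key=load)  # least loaded broker
            # Shed the lightest bundle first so that each move stays small.
            bundle = min(brokers[source], key=brokers[source].get)
            if target == source or load(target) + brokers[source][bundle] > limit:
                break  # moving would only shift the overload elsewhere
            brokers[target][bundle] = brokers[source].pop(bundle)
            moves.append((bundle, source, target))
    return moves


cluster = {
    "broker-1": {"bundle-a": 40, "bundle-b": 30, "bundle-c": 20},
    "broker-2": {"bundle-d": 10},
    "broker-3": {"bundle-e": 20},
}
moves = shed_load(cluster, threshold=10)
print(moves)
print({name: sum(bundles.values()) for name, bundles in cluster.items()})
```

The guard before each move prevents a transfer that would push the target over the same limit, which is one simple way to avoid bundles bouncing between brokers.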

From the StreamNative perspective, the introduction of Oxia as a scalable metadata storage backend, the observability improvements that address cardinality issues in metrics collection, and the enhancements to the load balancer together position the ONE StreamNative Platform to support millions of topic partitions and beyond. These improvements ensure that Pulsar can scale effectively, maintaining performance and reliability in even the most demanding data streaming environments.

Enhanced Key_Shared Subscription: Scale Without Compromising Message Order

Key_Shared subscription is one of Pulsar's most valuable features, enabling organizations to scale their message processing capacity by adding multiple consumers while maintaining strict message ordering based on keys. This capability is crucial for applications requiring both high throughput and ordered processing, such as financial transactions, event processing, and real-time analytics.

In Pulsar 4.0, we've significantly improved the Key_Shared subscription implementation with PIP-379. The new design ensures messages with the same key are handled by only one consumer at a time, while eliminating unnecessary message blocking that previously impacted system performance during consumer changes and application restarts.

The enhancement brings business value through improved service reliability and operational efficiency. Organizations can now scale their consumer application count dynamically without worrying about message ordering inconsistencies or system slowdowns. When consumers are added or removed, only the affected message keys are temporarily managed, rather than blocking entire message streams.
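Why only some keys are affected when a consumer joins can be illustrated with a consistent-hash ring, one common way to map keys to consumers. The Python sketch below is a toy model and not the PIP-379 implementation: the `KeyRouter` class, the number of ring points per consumer, and the key names are assumptions for this example.

```python
import bisect
import hashlib
from typing import Dict, List


def _hash(value: str) -> int:
    """Stable 64-bit hash derived from SHA-256."""
    return int.from_bytes(hashlib.sha256(value.encode()).digest()[:8], "big")


class KeyRouter:
    """Toy consistent-hash ring mapping message keys to consumers."""

    def __init__(self, points_per_consumer: int = 100) -> None:
        self.points = points_per_consumer
        self._ring: List[int] = []        # sorted hash points
        self._owner: Dict[int, str] = {}  # hash point -> consumer name

    def add_consumer(self, name: str) -> None:
        for i in range(self.points):
            point = _hash(f"{name}#{i}")
            bisect.insort(self._ring, point)
            self._owner[point] = name

    def consumer_for(self, key: str) -> str:
        # The first ring point clockwise from the key's hash owns the key.
        idx = bisect.bisect_right(self._ring, _hash(key)) % len(self._ring)
        return self._owner[self._ring[idx]]


router = KeyRouter()
router.add_consumer("consumer-1")
router.add_consumer("consumer-2")

keys = [f"order-{i}" for i in range(1000)]
before = {key: router.consumer_for(key) for key in keys}

router.add_consumer("consumer-3")
after = {key: router.consumer_for(key) for key in keys}

moved = [key for key in keys if before[key] != after[key]]
print(f"{len(moved)} of {len(keys)} keys changed owner")
```

Every key that changes owner moves to the newly added consumer, and all other keys keep their assignment, so ordering only needs to be protected for the moved keys rather than for the whole stream.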

Operations teams can quickly identify and resolve Key_Shared ordered message delivery issues through comprehensive troubleshooting metrics in Pulsar topic stats. This translates to reduced system downtime and faster incident resolution, both crucial for maintaining service level agreements in production environments. Future improvements will introduce a REST API that further simplifies troubleshooting by providing direct access to unacknowledged message details and key-based search capabilities. The typical root cause of such delivery issues is an application that doesn't acknowledge a message: because of message ordering constraints, further messages for that key are then blocked. Web-based user interfaces and CLI tools can build upon this REST API, which also allows automated resolution or alerting in operations. Related Key_Shared troubleshooting metrics will also be exposed via Prometheus and OTel interfaces in future updates.

This major improvement positions Pulsar 4.0 as an even more compelling choice for organizations requiring both strict message ordering and high scalability in their data streaming architecture, particularly valuable for businesses processing millions of ordered events across a large number of consumers.

Enhanced Secure Docker Image Runtime Based on Alpine and Java 21

Pulsar 4.0 contains enhancements to its Docker runtime environment, combining the security benefits of Alpine Linux with the performance improvements of the Java 21 runtime. PIP-324, introduced in Pulsar 3.3.0, aligns with our commitment to providing a secure, efficient, and resource-optimized platform for messaging workloads.

The new Docker images are now based on Alpine Linux instead of Ubuntu, reducing the image size while improving the security posture.

A key security enhancement is the elimination of CVEs in the base image. While the previous Ubuntu-based images carried 12 Medium/Low CVEs with no available resolution, the new Alpine-based images start with zero CVEs, providing a more secure foundation for production deployments. This improvement is particularly valuable for organizations with strict security requirements and compliance needs.

The Docker images now include Java 21 with Generational ZGC, bringing significant improvements in garbage collection performance. Generational ZGC provides sub-millisecond pause times, better CPU utilization, and improved memory efficiency compared to previous garbage collectors. This translates to more predictable latencies and better resource utilization for Pulsar deployments.

These improvements make Pulsar 4.0's Docker runtime an even more compelling choice for organizations requiring both security and performance in their messaging infrastructure. The combination of Alpine Linux's minimal attack surface and Java 21's advanced garbage collection provides a robust foundation for running Pulsar in containerized environments.

Enhanced Quality of Service Controls

Apache Pulsar's truly multi-tenant architecture has made it a preferred choice for organizations building messaging-as-a-service platforms rather than operating disparate, siloed clusters. The platform's ability to efficiently manage resources across multiple tenants while maintaining service reliability has proven particularly valuable in demanding enterprise environments.

In Pulsar 4.0, we highlight significant improvements in Quality of Service (QoS) controls, particularly through PIP-322, which was introduced in Pulsar 3.2. This enhancement refactors the rate limiting implementation, addressing critical performance issues that previously impacted service reliability and system performance during high-load scenarios.

Rate limiting serves as the foundation for comprehensive capacity management in multi-tenant environments. One of the key goals of capacity management in a multi-tenant system is to address the "noisy neighbor" problem - where one tenant's workload negatively impacts others - without requiring significant infrastructure overprovisioning to handle peak loads.

The new rate limiting implementation uses an efficient token bucket algorithm that provides accurate and consistent rate limiting across all levels: broker, topic, and resource group. This unified approach eliminates the need for the previously separate "default" and "precise" rate limiters, significantly reducing the CPU overhead and lock contention that previously affected IO threads and added unnecessary latency for resources that weren’t throttled.
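For readers unfamiliar with the token bucket algorithm, here is a minimal single-threaded Python sketch. It illustrates the general technique only and is not the broker's implementation from PIP-322; the class name, parameters, and the injectable clock are choices made for this example. Tokens accumulate at a fixed rate up to a capacity, and each operation consumes tokens or is rejected.

```python
import time
from typing import Callable


class TokenBucket:
    """Minimal token bucket: refill lazily on each call, cap at capacity."""

    def __init__(self, rate_per_sec: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate_per_sec
        self.capacity = capacity
        self.clock = clock
        self.tokens = capacity          # start with a full bucket
        self.last_refill = clock()

    def _refill(self) -> None:
        now = self.clock()
        elapsed = now - self.last_refill
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last_refill = now

    def try_consume(self, amount: float = 1.0) -> bool:
        """Take `amount` tokens if available; otherwise reject the request."""
        self._refill()
        if self.tokens >= amount:
            self.tokens -= amount
            return True
        return False


# A fake clock makes the behaviour deterministic for the demonstration.
fake_now = [0.0]
bucket = TokenBucket(rate_per_sec=10, capacity=5, clock=lambda: fake_now[0])

burst = [bucket.try_consume() for _ in range(6)]
fake_now[0] += 0.25  # a quarter second passes: 2.5 tokens are added
after_wait = [bucket.try_consume() for _ in range(3)]
print(burst)
print(after_wait)
```

The capacity bounds how large a burst is admitted, while the rate bounds sustained throughput. Because refilling is computed lazily from elapsed time, no background timer is needed, which is one reason the approach is cheap for resources that are not being throttled.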

The refactored rate limiting system provides more consistent behavior when handling various throttling scenarios. This ensures more predictable performance in multi-tenant environments where multiple rate limiting conditions may apply simultaneously.

These QoS improvements position Pulsar as an even more robust platform for messaging-as-a-service teams, enabling better service level management and capacity control in large-scale deployments. The enhanced rate limiting system provides a foundation for future QoS features, particularly valuable for organizations requiring precise control over resource utilization and improved service reliability across multiple tenants.

The PIPs Behind Pulsar 4.0

Pulsar 4.0 includes numerous Pulsar Improvement Proposals (PIPs) that have enhanced the platform's capabilities across multiple areas. Here are some of the most significant improvements:

Core Architecture

  • PIP-264: Implements comprehensive OpenTelemetry integration for improved observability
  • PIP-335: Introduces Oxia as a scalable metadata storage solution, offering improved scalability and reliability
  • PIP-376: Makes topic policies service pluggable for better extensibility
  • PIP-379: Enhances Key_Shared subscription with improved message ordering and troubleshooting capabilities
  • PIP-384: Decouples ManagedLedger interfaces for more flexible storage implementations

Performance and Scalability

  • PIP-354: Applies topK mechanism to ModularLoadManagerImpl for better resource utilization
  • PIP-358: Enhances resource weight functionality across load management components
  • PIP-364: Introduces a new AvgShedder load balancing algorithm
  • PIP-378: Adds ServiceUnitStateTableView abstraction for improved state management

Security and Operations

  • PIP-324: Introduces Alpine-based Docker images for reduced attack surface and smaller footprint
  • PIP-337: Adds SSL Factory Plugin for customized SSL Context and Engine generation
  • PIP-347: Adds role field in consumer statistics for better authentication tracking
  • PIP-369: Introduces flag-based selective unload for namespace isolation policy changes
  • PIP-383: Supports granting/revoking permissions for multiple topics

Many of these improvements build upon features introduced in Pulsar 3.x releases, such as PIP-322 (Rate Limiting Refactoring) from 3.2, which laid the groundwork for better multi-tenancy and Quality of Service controls. The combination of these PIPs positions Pulsar 4.0 as a significant step forward in building a more robust, scalable, and secure streaming platform.

Thank You to Apache Pulsar Contributors

Apache Pulsar 4.0 represents the collaborative effort of a vibrant and growing open-source community. This landmark release was made possible through the dedication and contributions of developers, organizations, and users worldwide who share our vision of making data streaming more accessible, affordable, and scalable.

We extend our deepest gratitude to:

  • The individual contributors who developed new features, reported bugs, fixed bugs, and improved documentation
  • The committers and PMC members who guided the project's technical direction
  • The organizations that have deployed Pulsar in production and shared their valuable feedback
  • The users who participated in testing and provided invaluable input during the release process
  • The broader Apache Software Foundation community for their continued support

Your collective efforts have not only shaped this release but continue to strengthen Apache Pulsar's position as a leading data streaming platform. The improvements in Pulsar 4.0 reflect our community's commitment to technical excellence and innovation.

We welcome new contributors to join our community and help us build the future of data streaming technology. Whether through code contributions, documentation improvements, or sharing your Pulsar deployment experiences, every contribution helps make Pulsar better for everyone.

Conclusion

Apache Pulsar 4.0 marks a transformative milestone as our second Long-Term Support (LTS) release, delivering major advancements in modularity, observability, and scalability. This release significantly enhances Pulsar's position as the foundation for an open data streaming architecture, with improvements that address critical enterprise needs:

  • A fully modular architecture that enables flexible deployment models and storage options, from Oxia for metadata to S3-based write-ahead logging
  • Built-in OpenTelemetry integration providing deep insights into system performance and behavior
  • Enhanced Key_Shared subscriptions bringing improved message ordering while maintaining scalability
  • Advanced load balancing capabilities supporting millions of topic partitions
  • Strengthened Quality of Service controls with refined rate limiting
  • A more secure and efficient containerized runtime based on Alpine Linux and Java 21

These enhancements make Pulsar 4.0 a compelling choice for organizations building modern data streaming applications, from startups to global enterprises. The combination of enterprise-grade features, operational simplicity, and robust security positions Pulsar as a foundational technology for the future of data streaming.

We invite the global community of developers, architects, and data engineers to explore Pulsar 4.0's capabilities and join us in advancing the state of the art in data streaming technology. We also invite you to contact us to explore the advantages that we provide on top of Pulsar in our ONE StreamNative Platform, which simplifies your data streaming initiatives.

Matteo Merli
Matteo Merli is the CTO at StreamNative. Prior to this role, Matteo held the title of VP, Apache Pulsar at The Apache Software Foundation since June 2015. Matteo was also a Co-Founder at Streamlio before it was acquired by Splunk, where Matteo worked as a Sr. Principal Software Engineer. With a background in computer engineering, Matteo has a wealth of experience in designing and developing large-scale distributed applications and working on projects involving network protocols and data analysis. Matteo's expertise extends to architecting and leading the development of Pulsar, a distributed pub-sub messaging platform.
