Oct 25, 2023

Q3 '23 StreamNative Cloud Launch: Deliver a modern data streaming platform for enterprises

Vik Narayan
Product Manager, StreamNative
Eric Shen
Product Manager, StreamNative
Sijie Guo
CEO and Co-Founder, StreamNative, Apache Pulsar PMC Member

This Q3 StreamNative Cloud Launch comes to you from Pulsar Summit North America 2023, where the Pulsar community and experts from the messaging and data streaming industry came together to share insights into the future of Apache Pulsar and data streaming and to explore new areas of innovation. This year, we’re introducing StreamNative Private Cloud (StreamNative’s self-managed offering of Apache Pulsar), Kafka on StreamNative, improvements to Apache Pulsar, Pulsar Functions, and Connectors, insights into how Pulsar and the data streaming ecosystem work together, and much more.

This year's Pulsar Summit drew nearly 200 in-person attendees and featured over 20 sessions delivered by industry leaders. The event opened with a welcome from StreamNative's CTO, Matteo Merli, who discussed how to evaluate data streaming platforms for modern enterprises. We then learned about Cisco’s journey deploying Pulsar on Cisco’s Cloud Native IoT Platform from Cisco Senior Director Chandra Ganguly and Principal Engineer Alec Hothan. Additionally, David Christle, Staff Machine Learning Engineer at Discord, explained Discord's transition from Google Pub/Sub to Pulsar for Streaming Machine Learning with Flink and Iceberg.

Stay tuned to our blog for a concise overview of the highlights from Pulsar Summit North America 2023; session recordings will be available on our website shortly.

The Q3 launch in the latest release of StreamNative Cloud aligns our roadmap with the insights shared by these speakers. The new features help customers deliver a modern data streaming platform for enterprises and build mission-critical business applications from end to end.

What defines a modern data streaming platform for enterprises?

In his keynote address, Matteo Merli outlined the criteria for evaluating a data streaming platform for an organization. We summarize them as the following four pillars.

  1. Unified: Modern enterprises need a unified platform that allows them to store a single copy of data and enables different applications and teams to consume that data with different semantics. This includes both queuing and streaming, with support for their preferred APIs and protocols such as Pulsar, Kafka, AMQP, MQTT, and JMS. This integration bridges the operational and analytical domains, providing developers with a suite of tools to create end-to-end real-time data streaming applications.
  2. Multi-tenant: Traditional messaging and data streaming systems were designed for single teams, making them unsuitable for organizations where multiple teams must share data and collaborate on innovation. A modern data streaming platform must be multi-tenant, reducing the need to manage and operate numerous clusters.
  3. Cost-efficient: A modern data streaming platform must be cost-efficient across multi-cloud and hybrid cloud environments, facilitating effective operation even during economic downturns.
  4. Global: Modern enterprises operate globally, spanning multiple regions, jurisdictions, and cloud environments. A modern data streaming solution must be designed to meet data privacy and sovereignty requirements across diverse regions while operating as a unified solution.

Figure 1. Four pillars of a modern data streaming platform

Overview of the latest features

Now, let's explore the latest features that advance our vision of a modern data streaming platform for enterprises:

  • Functions GA on StreamNative Cloud
  • Kafka on StreamNative (KSN)
  • StreamNative Private Cloud (Public Preview)
  • Revocable Cloud API Keys (Public Preview)
  • Broker Autoscaling (Public Preview)
  • Lakehouse tiered storage (Private Preview)

Functions Generally Available on StreamNative Cloud

Pulsar Functions is one of the distinctive features of Apache Pulsar. It provides an efficient way to consume messages from one or more topics, apply user-defined logic, and publish the processed results to other topics. This Pulsar-native, lightweight computing framework lets developers focus on their core business logic, eliminating the need for a complicated stream processing framework in the majority of use cases. Previously, however, Pulsar Functions was enabled on Hosted and BYOC clusters only by specific request, and broader availability has consistently ranked among the most requested features for StreamNative Cloud.
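As a minimal sketch of that consume, transform, and publish pattern, the snippet below uses the plain-function style that the Pulsar Functions Python runtime supports: a function that receives one message payload and returns the value to publish. The transformation itself (trimming and upper-casing text) and the assumption of string payloads are illustrative and not taken from this announcement; input and output topics are supplied when the function is deployed, not in the code.

```python
def process(input):
    """Normalize one message payload; the returned value is what gets published."""
    # Assumes string payloads; the normalization logic is purely illustrative.
    return input.strip().upper()


if __name__ == "__main__":
    # Local check without a Pulsar cluster: call the function directly.
    print(process("  order created "))  # prints ORDER CREATED
```

Because the function is an ordinary Python callable, it can be unit-tested locally before it is submitted to a cluster.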

We are thrilled to announce the General Availability of Pulsar Functions on StreamNative Cloud for all new Hosted and BYOC clusters. So, what does this mean for you?

  • Serverless Computing: With Pulsar Functions seamlessly integrated into StreamNative Cloud, there is no need to concern yourself with setting up and managing a separate stream processing cluster. Everything your Pulsar Functions need is already set up by StreamNative Cloud.
  • Simplified Management: Managing and monitoring your functions is now much simpler. You can submit or manage functions using Terraform or `pulsarctl`, and access function details and logs directly from the StreamNative Console.
  • Continued Support: At StreamNative, we remain committed to providing top-notch support. Whether you represent an enterprise or a digital-native startup, our dedicated team is available to offer guidance and assistance.

For more comprehensive information regarding this announcement and our future plans for Functions on StreamNative Cloud, we invite you to explore our informative blog post. With this announcement, all new Hosted and BYOC clusters will offer access to Pulsar Functions. As for existing clusters, please don't hesitate to reach out to our support team to request activation within your current setup.

Kafka on StreamNative (KSN)

Pulsar represents the future of data streaming technology and has many advantages over Apache Kafka, particularly for enterprises: native multi-tenancy, geo-replication, and unparalleled scalability and elasticity. However, transitioning to a new technology is not a simple decision, especially for enterprises heavily invested in Kafka.

StreamNative has been at the forefront of addressing this challenge. Over the past few years, we introduced Kafka-on-Pulsar (KoP), an open-source project that brings Kafka protocol compatibility to Pulsar. With KoP, Kafka developers can use Pulsar's innovations while retaining the familiarity of Kafka. Yet we recognized the need to close certain gaps to meet the advanced requirements of enterprise Kafka users, including support for additional Kafka features such as KStreams and KSQL.

We are thrilled to introduce Kafka on StreamNative (KSN), a tailor-made enterprise solution designed for Kafka users looking to leverage Pulsar's enhanced capabilities. KSN encompasses the familiar Kafka features you rely on, including but not limited to KStreams, KSQL, KTables with Topic Compaction, Kafka Schema Registry, and Kerberos Authentication. Kafka transactions are also available as part of KSN, currently in private preview. For a more detailed overview, please explore our dedicated blog post on KSN.

Furthermore, we have hardened KSN through rigorous testing to ensure it meets the demands of large-scale Kafka deployments. KSN delivers throughput equivalent to what you can achieve with native Pulsar. Be sure to consult our performance comparison, which examines the differences between native Pulsar and KSN across two entry formats (the entry format determines how KSN stores data published from Kafka clients; KSN supports both Pulsar and Kafka formats).

Figure 2: Max Throughput Test (Pulsar vs KSN)

StreamNative Private Cloud (Public Preview)

In 2021, we introduced the StreamNative Platform, leveraging StreamNative Pulsar Operators. However, as we rolled it out, we identified challenges for our users:

  • Pulsar Operators supported only core components, leaving others to Helm, causing inconsistent user experiences.
  • Separate Pulsar Operators meant difficulty in integrating new components like Oxia, Cloud API Keys, and pfSQL.
  • There was an absence of a centralized resource for managing configurations like authentication.

Recognizing these challenges, we’ve revamped the StreamNative Platform, evolving it into the StreamNative Private Cloud, powered by an all-in-one operator. Central to StreamNative Private Cloud is the StreamNative Operator, which streamlines the deployment, scaling, and management of Pulsar clusters. This allows businesses to orchestrate intricate data streams with less effort and shift their focus to gleaning valuable insights from their data.

Figure 3. StreamNative Private Cloud

Benefits of the StreamNative Private Cloud include:

  • Streamlined Deployment: StreamNative Private Cloud automates the deployment of Pulsar clusters, so teams don't have to worry about manually configuring and managing the individual components.
  • High Availability: StreamNative Private Cloud sets up clusters in a highly available manner by default. It manages replica placement, broker distribution, and failover mechanisms, ensuring that event streams stay reliable even in the face of failures.
  • Declarative Configuration: StreamNative Private Cloud uses a declarative API, so teams can define the Pulsar cluster configuration in Kubernetes manifests. This makes it easy to manage the Pulsar cluster and to roll back changes if necessary.
  • Automated Operation: StreamNative Private Cloud supports auto-scaling, which adjusts resource allocation in response to incoming workloads.
  • Cost Efficiency: StreamNative Private Cloud supports Lakehouse tiered storage, which offloads your cold data to a lakehouse system in Parquet format. This reduces storage costs and also supports efficient analysis of historical data.

For a comprehensive overview of StreamNative Private Cloud, explore our documentation. Interested in experiencing it firsthand? Request a trial license and establish a Pulsar cluster in your private cloud setup.

Revocable Cloud API Keys (Public Preview)

Before you can send or receive a single message from a StreamNative Cloud Pulsar cluster, configuring the authentication mechanism is a prerequisite. By default, StreamNative Cloud employs OAuth2 authentication, considered one of the most advanced methods for authenticating clients accessing your Pulsar clusters. However, despite its sophistication, configuring OAuth2 can be complex, and many clients and integrations may not fully support it. Therefore, it is crucial to strike a balance between flexibility and security, ensuring protection against unauthorized access without confining your development team to a limited set of tools and clients.

In pursuit of this goal, we have introduced StreamNative Cloud API Keys as a new authentication mechanism for your StreamNative Cloud clusters. StreamNative Cloud API Keys are JWT-based authentication tokens that Pulsar clients use to connect to your Pulsar clusters on StreamNative. These keys are long-lived, have configurable expiration dates, and can be revoked at any time through the API or StreamNative Console. This feature is now accessible for both Hosted and BYOC clusters.

The Cloud API Keys feature offers a dual advantage: it provides a flexible authentication solution compatible with a wide range of clients and enables key rotation at regular intervals to bolster security and compliance. Additionally, it allows for immediate revocation in the event of a security breach. For more in-depth information about this feature, please refer to our dedicated announcement blog post. If you wish to explore Cloud API Keys before its general availability, don't hesitate to reach out to us!

Broker Autoscaling (Public Preview)

One of the standout features of Apache Pulsar has always been its ability to decouple storage from computing, enabling stateless serving and stateful storage to scale independently. While this architecture addresses scalability challenges at their core, scaling brokers based on CPU and memory requirements has, until now, often required manual intervention or configuring a Horizontal Pod Autoscaler. This manual approach is inefficient and increases operational costs. But fear not, because we're thrilled to announce the solution: Broker Autoscaling.

This feature brings the following benefits to our valued customers:

  • Broker Autoscaling continually adjusts resource allocation in response to incoming workloads. Whether your system is suddenly inundated with traffic or experiences quieter periods, Pulsar ensures optimal resource utilization, resulting in substantial cost savings and notably improved system performance.
  • Broker Autoscaling balances the message processing load across brokers, proactively preventing any single broker from becoming a bottleneck. This guarantees the system maintains high throughput and low latency, even when dealing with hefty workloads.
  • Broker Autoscaling streamlines resource allocation, ensuring that organizations only pay for the precise resources they need. This strategic approach makes it a cost-effective solution, particularly in the realm of real-time data streaming.
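For intuition about what CPU-based scaling computes, the sketch below implements the replica calculation documented for the Kubernetes Horizontal Pod Autoscaler mentioned above: the desired count is the current count scaled by the ratio of observed to target utilization, rounded up and clamped to configured bounds. This is not a description of how Broker Autoscaling is implemented internally; the function name, parameters, and bounds are illustrative assumptions.

```python
import math


def desired_replicas(current_replicas, current_cpu_pct, target_cpu_pct,
                     min_replicas=1, max_replicas=10):
    """Kubernetes HPA-style rule: ceil(current * observed / target), then clamp."""
    raw = math.ceil(current_replicas * current_cpu_pct / target_cpu_pct)
    return max(min_replicas, min(max_replicas, raw))


if __name__ == "__main__":
    # Three brokers averaging 90% CPU against a 60% target: scale out.
    print(desired_replicas(3, 90, 60))  # prints 5
    # Four brokers averaging 30% CPU against a 60% target: scale in.
    print(desired_replicas(4, 30, 60))  # prints 2
```

The clamp matters in practice: a lower bound keeps enough brokers for availability during quiet periods, and an upper bound caps cost during traffic spikes.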

You can read the documentation for more information. This feature is now available in Public Preview across all Hosted clusters. For our BYOC (Bring Your Own Cloud) users, we encourage you to get in touch with our dedicated support team to experience the benefits of Broker Autoscaling firsthand.

Lakehouse tiered storage (Private Preview)

Apache Pulsar has been a pioneer in introducing the concept of tiered storage. This feature, which has also been adopted by competitors like Kafka, Confluent, and Redpanda, has become a cornerstone for many companies, including tech giants like Tencent, in their pursuit of cost-effective long-term streaming data storage. However, while tiered storage has been a game-changer, it was initially implemented using Pulsar’s proprietary storage format. This approach comes with inherent limitations that restrict the full potential of Apache Pulsar. In response, we’ve taken a bold step by adopting open industry-standard storage formats, a move we believe will greatly benefit Apache Pulsar users and the broader data streaming community. 

We are excited to introduce Lakehouse tiered storage to Apache Pulsar as a Private Preview feature on StreamNative Cloud. With this feature, well-known lakehouse storage options like Delta Lake, Apache Hudi, and Apache Iceberg become the tiered storage layer for Apache Pulsar. This development effectively transforms Apache Pulsar into a Streaming Lakehouse, allowing you to ingest directly into any lakehouse storage using popular messaging and streaming APIs and protocols such as Pulsar, Kafka (via KSN), AMQP (via AoP), and more. Our tests have demonstrated a 5x reduction in storage size compared to retaining data in BookKeeper and in tiered storage that uses the existing Pulsar format.

For an in-depth understanding of the Streaming Lakehouse, we invite you to explore our blog post series. This feature is now available for BYOC customers. If you are interested in trying it out, please contact us. Your feedback will be invaluable as we continue to refine and enhance the tiered storage solution. Whether you're a Lakehouse vendor, a data processing or streaming SQL vendor, or an Apache Pulsar user, we welcome collaboration to define and iterate APIs for processing and querying data in this exciting realm of the "Streaming Lakehouse."

Start building with new StreamNative Cloud features

Are you eager to begin? We're excited to introduce our Quarterly Launch demo webinars. Don't forget to secure your spot for the Q3 '23 Launch demo webinar on November 14. It's an excellent opportunity to gain firsthand insights from our product and developer relations teams on how to effectively leverage these fresh features.

If you haven't already, we encourage you to request a trial license for our new Private Cloud to experience self-management capabilities or sign up for StreamNative Cloud to explore the latest features. Feel free to reach out to us for Proof of Concept (POC) opportunities with cloud credits as well.

Eric Shen
Eric Shen is a Product Manager at StreamNative. He previously worked at Microsoft, Qiniu, PingCAP, and Hikvision, where he focused on Cloud, Storage, and Databases.
Sijie Guo
Sijie’s journey with Apache Pulsar began at Yahoo! where he was part of the team working to develop a global messaging platform for the company. He then went to Twitter, where he led the messaging infrastructure group and co-created DistributedLog and Twitter EventBus. In 2017, he co-founded Streamlio, which was acquired by Splunk, and in 2019 he founded StreamNative. He is one of the original creators of Apache Pulsar and Apache BookKeeper, and remains VP of Apache BookKeeper and PMC Member of Apache Pulsar. Sijie lives in the San Francisco Bay Area of California.
