David Kjerrumgaard, Pulsar In Action Author, Talks All Things Pulsar
August 03, 2021
In this blog we talk to David Kjerrumgaard, long-time Pulsar user and author of the book Pulsar in Action, a Manning Publication, to get his insights on the messaging and streaming space, the trends driving Pulsar adoption, and his new role as a Developer Advocate at StreamNative.
Q: Before we jump in, let’s start with your background.
A: Over the past decade, I have had the opportunity to architect stream processing solutions for Fortune 500 companies across a variety of industries. I did this first as part of the professional services team at Hortonworks, using a combination of Apache NiFi, Storm, and Kafka, and later at a startup called Streamlio that focused on Apache Pulsar and Heron. Streamlio was acquired by Splunk to build out the messaging layer of its stream processing offering that is responsible for processing over 10 terabytes of data per day.
Q: You’ve been working on Pulsar since 2017. Can you tell us about the early days with Pulsar?
A: To give some context, Pulsar was open-sourced by Yahoo in 2016. When I joined the team at Streamlio in 2017, we were initially focused on the Apache Heron distributed computing framework. Based on the feedback from our customers, we quickly pivoted to Apache Pulsar to address the gap in the market for a unified messaging and streaming platform. We spent the next 16 months maturing the Pulsar project and building the community. In 2018, Pulsar became a top-level project at the Apache Software Foundation.
Q: People have strong opinions on the Pulsar versus Kafka debate. What is your perspective?
A: It’s an interesting debate. Some people look at Kafka and think that its widespread adoption is because the tech is better or superior in some way. The reality is that Kafka was released five or six years ahead of Pulsar and the community has had more time to mature. I believe that Pulsar is on the same trajectory that Kafka was at this point in its evolution. In fact, Pulsar growth has skyrocketed over the last few years and if you look at the projects today, the tables are turning.
Kjerrumgaard is supported by the numbers here. In June 2021, Apache Pulsar surpassed Apache Kafka in its number of monthly active contributors.
Q: That is quite a change. Can you share your perspective on why Pulsar is so popular?
A: Pulsar’s cloud-native architecture has many advantages over existing legacy messaging systems that were designed to run on physical servers. The growing popularity of cloud and container-based deployments has accelerated the adoption of Pulsar because it is designed to run in these environments. If you are a Kafka or Confluent organization today and you’re moving to the cloud, you’re going to consider Pulsar.
Q: What are the advantages that Pulsar has in a cloud environment?
A: Every messaging system consists of two distinct “layers”, a serving layer that is responsible for receiving and delivering messages to clients, and a storage layer that retains the messages on disk until they are consumed.
Traditional messaging systems such as Kafka or RabbitMQ are designed to have these two layers running alongside one another on the same physical node in order to eliminate the need for an additional network “hop” to retrieve the data from storage. Today, the minor speed advantage you gain from a single-tier architecture is outweighed by the lack of scalability it imposes.
Apache Pulsar decouples the serving and storage layers, allowing them to run independently inside separate containers which makes it easier to deploy and dynamically scale in the cloud. Separating the layers also allows the serving layer to be completely stateless, meaning that any node can serve any message because the data is located on a different layer, only one network call away.
Pulsar’s independent layers can fully exploit the elasticity of modern cloud computing environments by dynamically adding or removing capacity in either the serving or storage layer. This can be done automatically by leveraging existing tools such as the Kubernetes Horizontal Pod Autoscaler.
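To make the separation of layers concrete, here is a minimal toy model in Python. It is not Pulsar code and does not use any Pulsar API: the `StorageLayer` and `Broker` classes, the topic name, and the broker names are all hypothetical, chosen only to illustrate why a serving node that keeps no message data can be added or removed freely.

```python
from dataclasses import dataclass, field


@dataclass
class StorageLayer:
    """Toy stand-in for the storage layer: holds every message, keyed by topic."""
    logs: dict = field(default_factory=dict)

    def append(self, topic, message):
        self.logs.setdefault(topic, []).append(message)
        return len(self.logs[topic]) - 1  # position of the stored message

    def read(self, topic, position):
        return self.logs[topic][position]


@dataclass
class Broker:
    """Toy stand-in for a serving node: keeps no message data of its own."""
    name: str
    storage: StorageLayer

    def publish(self, topic, message):
        return self.storage.append(topic, message)

    def deliver(self, topic, position):
        return self.storage.read(topic, position)


storage = StorageLayer()
brokers = [Broker("broker-1", storage), Broker("broker-2", storage)]

# Publish through one broker, then read the same message through another.
pos = brokers[0].publish("orders", "order-created")
assert brokers[1].deliver("orders", pos) == "order-created"

# Scale the serving layer out: the new broker needs no data copied to it.
brokers.append(Broker("broker-3", storage))
assert brokers[2].deliver("orders", pos) == "order-created"
```

In this sketch, adding `broker-3` involves no data movement because every read goes to the shared storage object, which mirrors the "one network call away" idea described above. A real deployment would also have to handle replication, topic ownership, and failures, none of which are modeled here.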
Q: If you were to name Pulsar’s biggest differentiator, what would it be?
A: Versatility is a big differentiator for Pulsar. Not only is it the only messaging platform that supports both pub/sub and streaming message consumption patterns, but its pluggable protocol handler allows it to support a variety of common messaging protocols such as AMQP, MQTT, JMS, and Kafka.
All other messaging systems only support one messaging consumption pattern and one binary messaging protocol. A common driver of Pulsar adoption is migration away from multiple messaging systems onto a unified messaging platform based on Apache Pulsar. Companies are looking to eliminate the need to maintain both a system for pub/sub messaging such as RabbitMQ and another one for streaming such as Apache Kafka.
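The difference between the two consumption patterns can be sketched with a small Python example. This is an illustrative model only, not the Pulsar client API: the function names `queue_style` and `stream_style`, the consumer names, and the round-robin dispatch are assumptions made for the sake of the example.

```python
from itertools import cycle

messages = [f"msg-{i}" for i in range(6)]


def queue_style(messages, consumers):
    """Queue-like consumption: each message goes to exactly one consumer."""
    received = {name: [] for name in consumers}
    # Round-robin dispatch; zip stops once the messages run out.
    for name, message in zip(cycle(consumers), messages):
        received[name].append(message)
    return received


def stream_style(messages, start_positions):
    """Stream-like consumption: each reader sees every message, in order,
    starting from its own position in the log."""
    return {name: messages[start:] for name, start in start_positions.items()}


work = queue_style(messages, ["worker-a", "worker-b"])
replay = stream_style(messages, {"analytics": 0, "late-joiner": 4})

print(work)
print(replay)
```

In the queue-style call the two workers split the messages between them, while in the stream-style call each reader receives its own ordered view of the same log. A unified platform, in the sense described above, is one where both behaviors are available over the same stored messages.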
Q: How difficult is it to move from other streaming and messaging technologies to Pulsar?
A: More often than not, the organization has developed several business-critical applications based upon the technology, so they are tied to a particular API, which makes migration difficult.
Apache Pulsar’s ability to support legacy messaging protocols streamlines the migration process by allowing you to run your existing applications with minimal code changes. If you are migrating an application that uses one of the wire protocols that Pulsar supports then the only changes that need to be made to your code are API related.
If you are migrating an existing Kafka application that uses the Java client, you can use Pulsar’s Kafka Adaptor, which provides a 100% Kafka-compatible API. Using this adaptor, any existing Java code will work without any changes needed.
Q: Your passion for the space is apparent. Can you tell me about the decision to join StreamNative?
A: It is great to see the Pulsar market taking off, and multiple companies offering Apache Pulsar as a service. What distinguishes StreamNative from the competition is the caliber of talent. Not only do we have two of the original creators of Apache Pulsar, but we have more Apache committers than anyone else which means we are the center of gravity for the Apache project overall.
Having worked with Matteo Merli [Apache Pulsar Chair and StreamNative CTO] and Sijie Guo [Apache Pulsar Member and StreamNative CEO] in the past, I knew that their technical expertise was second to none in this space and I knew that I couldn’t pass up the opportunity to collaborate with them again. Pivoting from an individual contributor role at Splunk to a Developer Advocate at StreamNative will allow me to have a bigger impact on the Apache Pulsar community at a time when adoption is accelerating.
Q: Can you tell us about the StreamNative offering?
A: StreamNative is powered by Pulsar and provides a cloud-native, real-time messaging and streaming platform to support multi-cloud and hybrid cloud strategies. We offer both Cloud and Platform products so you can choose cloud, on-prem, or a hybrid of both.
Q: Last question, what makes StreamNative Cloud & Platform exciting?
A: The StreamNative products are a game-changer, because they enable organizations to unlock the power of Apache Pulsar with a turnkey, enterprise offering across cloud, hybrid, and on-premise environments without the heavy lift from the DevOps teams.
David’s experience developing real-time messaging, streaming, Edge/IoT, and Big Data solutions for customers across a broad range of industries will benefit the StreamNative team and help ensure the success of StreamNative’s customers.
His upcoming book, Pulsar in Action, will be available in print from Manning Publications in December. For a sneak peek, visit StreamNative.io. We are a proud sponsor of the book and are excited to offer an early release. Check our site in mid-August to get your free download.
To stay up-to-date on Kjerrumgaard’s upcoming talks and webinars, we encourage you to join the StreamNative mailing list, StreamNative Community Slack Channel, and follow us on Twitter at @streamnative.io.