We are excited to announce that StreamNative is open-sourcing "MQTT on Pulsar" (MoP). MoP brings the native MQTT protocol support to Apache Pulsar by introducing an MQTT protocol handler on Pulsar brokers. Similar to KoP, MoP is also an implementation of the pluggable protocol handler. By adding the MoP protocol handler in your existing Pulsar cluster, you can migrate your existing MQTT applications and services to Pulsar without modifying the code. This enables MQTT applications to leverage Pulsar’s multi-layer system architecture (separation of compute and storage) and powerful features, such as infinite event stream retention with Apache BookKeeper and tiered storage.
What is Apache Pulsar
Apache Pulsar is a cloud-native, distributed messaging and streaming platform that manages hundreds of billions of events per day. Pulsar was originally developed and deployed inside Yahoo as the consolidated messaging platform connecting critical Yahoo applications such as Yahoo Finance, Yahoo Mail, and Flickr, to data. Pulsar was contributed to open source by Yahoo in 2016 and became a top-level Apache Software Foundation project in 2018.
Apache Pulsar is a multi-tenant, high-performance solution for server-to-server messaging, including features such as native support for multiple clusters in a Pulsar instance, seamless geo-replication of messages across clusters, low publish and end-to-end latency, seamless scalability to over a million topics, and guaranteed message delivery with persistent message storage provided by Apache BookKeeper, among others.
Currently, Apache Pulsar is used in a wide variety of industries. Enterprises, such as Tencent, Verizon Media, Splunk, ChinaMobile, and BIGO, have deployed Apache Pulsar to achieve their business goals. For more use cases, click here.
What is MQTT
Message Queuing Telemetry Transport (MQTT) is a lightweight publish-subscribe messaging transport protocol. MQTT is built on the TCP/IP protocol and was created by IBM in 1999. It is ideal for connecting remote devices with a small code footprint and minimal network bandwidth to provide real-time and reliable messaging services. Today, as a low-overhead and low-bandwidth real-time communication protocol, MQTT today is adopted widely across multiple industries, such as the Internet of Things (IoT), small microcontrollers, automotive, etc.
Apache Pulsar provides a unified messaging model for both queueing and streaming workloads. Pulsar implemented its own Protobuf-based binary protocol to provide high performance and low latency. This choice of Protobuf makes it convenient to implement Pulsar clients and the project already supports Java, Go, Python, and C++ languages alongside third-party clients provided by the Pulsar community.
Because Apache Pulsar’s multi-tenancy and the overall architecture with Apache Bookkeeper is able to simplify operations, an increasing number of companies are exploring a way to shift and build the foundation of their services on Pulsar. However, to adopt Pulsar’s new unified messaging protocol, existing applications written using other messaging protocols have to be rewritten.
To address this, StreamNative has been working on a number of new projects to make this transition easier for organizations adopting Pulsar. Earlier this year, StreamNative announced KoP (Kafka-on-Pulsar) and AoP (AMQP-on-Pulsar) protocol handlers to facilitate the migration to Pulsar from Kafka and AMQP.
KoP brings the native Apache Kafka protocol support to Apache Pulsar by introducing a Kafka protocol handler on Pulsar brokers. For details, see here.
AoP brings the native AMQP protocol support to Apache Pulsar by introducing an AMQP protocol handler on Pulsar brokers. For details, see here.
Over the past several months, StreamNative received a lot of inbound requests for help migrating services from MQTT to Pulsar and recognized the need to also support MQTT protocol natively on Pulsar. StreamNative invested engineering time and effort to introduce a general protocol handler framework in Pulsar that would allow developers who use the MQTT protocol to use Pulsar.
MoP is implemented as a pluggable protocol handler that can support native MQTT protocol on Pulsar by leveraging Pulsar features such as Pulsar topics, cursors etc. The diagram below illustrates a Pulsar cluster with the MoP protocol handler. Both the MQTT Proxy and MQTT protocol handler can run along with Pulsar brokers.
MQTT has defined three Quality of Service (QoS) levels:
QoS level 0 (at most once): This service level guarantees a best-effort delivery. There is no guarantee of delivery. The receiver does not acknowledge receipt of the message and the message is not stored and re-transmitted by the sender.
QoS level 1 (at least once): This service level guarantees that the message is transferred successfully to the receiver. The sender stores the message until it gets a PUBACK packet from the receiver that acknowledges receipt of the message. If the sender does not receive an acknowledgement, it will resend the message with the duplicate (DUP) flag set. It is possible for a message to be delivered multiple times.
QoS level 2 (exactly once): This service level guarantees that each message is received only once by the receiver. The guarantee is provided by a sequence of four messages between the sender and the receiver to ensure that the message has been sent and that the acknowledgement has been received. QoS level 2 is the highest service level in MQTT.
Currently, the MoP protocol handler only supports QoS level 0 and QoS level 1. In future releases, QoS level 2 will be supported.
The MoP Proxy is an optional component for MoP. It extends MoP to multiple nodes to realize horizontal expansion of services. The MoP Proxy is mainly used to forward messages delivered between the MQTT Client and the Pulsar broker. Therefore, the MQTT Client only needs to connect to the MoP Proxy to send and receive data, regardless of the Pulsar broker to which topics are dispatched.
The MoP Proxy can sense the status of Pulsar brokers. Once a Pulsar broker is disconnected or unreachable, the MoP Proxy will send messages from the MQTT Client to a new Pulsar broker.
The following figure illustrates the MoP Proxy service workflow.
The MQTT client creates a connection with the MoP Proxy.
The MoP Proxy service sends a lookup request to Pulsar cluster to find out the owner broker URL of the topic along with the connection.
The Pulsar cluster returns the owner broker URL to the MoP Proxy.
The MoP Proxy builds a connection to the owner broker and starts to transfer data between the MQTT client and the owner Broker.
At present, the MoP Proxy works with the Pulsar broker. Users could choose whether to start the MoP Proxy service through the related configuration. For details, see here.
Try it Out
MoP is open-sourced under Apache License V2. You can download the MoP protocol handler to try out all the features of MoP. For details about how to use the MoP protocol handler, see here.
For any problems in the use of the MoP protocol handler, you can create an issue in the MoP repository. We will reply to you as soon as possible. Meanwhile, we look forward to your contribution to MoP.
Penghui Li is passionate about helping organizations to architect and implement messaging services. Prior to StreamNative, Penghui was a Software Engineer at Zhaopin.com, where he was the leading Pulsar advocate and helped the company adopt and implement the technology. He is an Apache Pulsar Committer and PMC member. Penghui lives in Beijing, China.
Xiaolong Ran is a Senior Software Engineer at Tencent Cloud, an Apache Pulsar Committer, an RoP maintainer, and an Apache Pulsar Go Client and Go Functions Developer and Maintainer.