Apache Pulsar Adoption Story in ActorCloud (IoT Platform)
Background
EMQ is an open-source software company that provides a highly scalable, real-time messaging and streaming engine for IoT platforms and applications in the 5G era. EMQ is one of the most widely used MQTT message brokers in the world and supports global clients including HPE, Ericsson, Huawei, China Mobile, and China UnionPay.
ActorCloud is an open-source IoT platform launched by EMQ. It provides multi-protocol access, message flow management, data parsing, and data processing for devices on a secure and reliable basis. ActorCloud uses Apache Pulsar to store and process streaming data, uses Apache Pulsar Functions to process data faster, and analyzes IoT data through a SQL engine exposed to the upper layer.
Problem
As an IoT platform, ActorCloud needs to access data, manage devices, store and analyze data, and provide programming interfaces to the upper layer so that developers can develop applications conveniently. Since ActorCloud has plenty of devices and large amounts of data, it needs the ability to scale out horizontally to meet business needs.
Why Pulsar fits best
To solve the problem described above, we need a highly available, distributed, and scalable messaging platform. Apache Pulsar meets these requirements:
- Apache Pulsar is a highly available and scalable messaging platform with easy deployment and maintenance.
- Apache Pulsar achieves high throughput, with 1.8M messages per second on a single partition, which fully satisfies our needs for a large volume of data.
- Apache Pulsar Functions are lightweight compute processes that consume messages from one or more Pulsar topics, apply user-supplied processing logic to each message, and publish the results of the computation to another topic. Apache Pulsar Functions support three kinds of runtimes (thread, process, and Kubernetes), which gives us high flexibility for writing, running, and deploying Functions. Consequently, we only need to focus on computation logic rather than on complicated configuration and management, which helps us build a streaming platform easily.
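To make the consume, process, and publish model concrete, here is a minimal sketch of a Pulsar Function written in the "native" Python style, where a plain `process` function receives each message and its return value is published to the output topic. The message layout (`device_id`, `temperature`) and the Fahrenheit conversion are hypothetical and are not taken from ActorCloud.

```python
import json


def process(input):
    """Enrich one JSON device reading and return it as a JSON string.

    In the native Python function style, the returned value is what gets
    published to the function's output topic. To our understanding, a
    None result is not published, which makes it usable as a simple filter.
    """
    reading = json.loads(input)

    # Hypothetical rule: skip readings that carry no temperature field.
    if "temperature" not in reading:
        return None

    # Hypothetical enrichment: add the temperature in Fahrenheit.
    reading["temperature_f"] = reading["temperature"] * 9 / 5 + 32
    return json.dumps(reading, sort_keys=True)
```

Because the function is plain Python, it can be unit-tested by calling `process` directly before it is deployed to any of the three runtimes.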
Pulsar's high availability, scalability, Functions, and connectors help us develop ActorCloud faster, so we selected Pulsar as our messaging platform.
How we use Pulsar at ActorCloud
ActorCloud passes business logic written in SQL to an engine through an API and translates the business rules into connectors and Functions in Pulsar. Sources consume data from EMQ X Brokers through shared subscriptions. Pulsar then persists the data, processes it with Functions in real time, and sends it to external systems through sinks.
How we use Pulsar Functions at ActorCloud
Apache Pulsar provides native support for serverless functions, where data is processed in a streaming fashion as soon as it arrives, and offers flexible deployment options (thread, process, container). We only need to focus on computation logic rather than on complicated configuration or management, which helps us build a streaming platform faster and more conveniently.
- To better support batch and stream processing scenarios, ActorCloud uses Pulsar window functions. Currently, Pulsar supports count-based windows and time-based windows.
- ActorCloud uses the Pulsar Functions API and the Pulsar admin tool (create, delete, update, restart, stop, get, and so on) to manipulate functions, which simplifies deployment and management.
- ActorCloud uses Pulsar's shared subscription mode to scale out data consumption. Pulsar also supports the exclusive, failover, and key_shared subscription modes.
- Pulsar Functions provides three messaging semantics: at-most-once, at-least-once, and effectively-once delivery. ActorCloud uses at-least-once delivery to ensure that each message sent to a function is processed at least once.
- Pulsar uses a multi-layer architecture that separates computation from storage. When storing data, ActorCloud configures message retention policies to set data retention periods. Pulsar also integrates with Presto SQL, which allows users to query data stored in BookKeeper. Consequently, ActorCloud uses Presto SQL to query real-time and historical data for analytical tasks.
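The count-based window mentioned above can be illustrated with a small stand-alone sketch. This is not Pulsar's window function API; it is a simplified model in plain Python, and the names `CountWindow`, `length`, `slide`, and `on_window` are our own. In this sketch the window keeps the most recent `length` messages and calls the user function after every `slide` messages.

```python
from collections import deque


class CountWindow:
    """Simplified count-based window (illustrative, not the Pulsar API)."""

    def __init__(self, length, slide, on_window):
        # A bounded deque drops the oldest message once `length` is exceeded.
        self.buffer = deque(maxlen=length)
        self.slide = slide
        self.on_window = on_window
        self.since_last_fire = 0

    def add(self, message):
        """Add one message; return the window result when the window fires."""
        self.buffer.append(message)
        self.since_last_fire += 1
        if self.since_last_fire == self.slide:
            self.since_last_fire = 0
            return self.on_window(list(self.buffer))
        return None


def average(values):
    """Example window function: arithmetic mean of the buffered values."""
    return sum(values) / len(values)
```

With `length` equal to `slide`, the window behaves as a tumbling window (each message belongs to exactly one window); with a smaller `slide`, consecutive windows overlap. A time-based window follows the same idea but fires and evicts by timestamp instead of by message count.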
How we use Pulsar IO connectors at ActorCloud
Pulsar IO connectors come in two types:
- Sources feed data into Pulsar from other systems, and common sources include other messaging systems and firehose-style data pipeline APIs.
- Sinks are fed data from Pulsar, and common sinks include other messaging systems, SQL and NoSQL databases.
ActorCloud takes full advantage of Pulsar IO connectors and creates various sources and sinks to meet different needs.
- EMQ source
  - Read data from EMQ and write data to Pulsar topics (sync data from EMQ with Pulsar).
- Mail sink
  - Receive data from Pulsar topics and send emails.
- Publish sink
  - Receive data from Pulsar topics and send data to external systems using initialized HttpClient.
- DB sink
  - Receive data from Pulsar topics and send data to external systems. DB sink encapsulates JDBC and supports sending data from Pulsar topics to SQLite, MySQL, and PostgreSQL.
- EMQ sink
  - Receive data from Pulsar topics and send data to EMQ X topics.
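As a rough illustration of what a DB sink does, the following sketch writes JSON records to SQLite. It is a conceptual model only: ActorCloud's DB sink wraps JDBC, whereas this sketch uses Python's standard `sqlite3` module. The `open`/`write`/`close` shape is chosen to resemble a typical connector lifecycle, and the class name, the `path` setting, and the `readings` table are hypothetical.

```python
import json
import sqlite3


class SqliteSink:
    """Conceptual sink: stores each JSON record in a SQLite table."""

    def open(self, config):
        # `path` and the table layout are assumptions for illustration.
        self.conn = sqlite3.connect(config.get("path", ":memory:"))
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS readings (device_id TEXT, payload TEXT)"
        )

    def write(self, record):
        """Persist one record, given as a JSON string with a device_id."""
        data = json.loads(record)
        # Parameterized query: values are never spliced into the SQL text.
        self.conn.execute(
            "INSERT INTO readings (device_id, payload) VALUES (?, ?)",
            (data["device_id"], record),
        )
        self.conn.commit()

    def close(self):
        self.conn.close()
```

In a real deployment the `write` step would be driven by messages arriving on a Pulsar topic. Because ActorCloud uses at-least-once delivery, a production sink should also tolerate the same record being written more than once.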
Summary
With EMQ X and Apache Pulsar, ActorCloud implements IoT device data access, device management, data storage, and data analysis. It also provides a flexible programming interface for developing IoT applications that meet the specific needs of IoT vertical industries, and it scales device access and data processing horizontally.