
Announcing the Google Cloud Pub/Sub Connector for Apache Pulsar

June 24, 2022

We are excited to announce the general availability of the Google Cloud Pub/Sub connector for Apache Pulsar. The connector enables seamless integration between Google Cloud Pub/Sub and Apache Pulsar and broadens the Apache Pulsar ecosystem.

What is the Google Cloud Pub/Sub connector?

The Google Cloud Pub/Sub connector is a Pulsar IO connector that replicates data between Google Cloud Pub/Sub and Apache Pulsar. It comes in two forms, one for each direction of data flow: a source and a sink.

Source

The Google Cloud Pub/Sub source fetches data from Google Cloud Pub/Sub and writes data to Apache Pulsar topics.

Figure 1. Google Cloud Pub/Sub source

Sink

The Google Cloud Pub/Sub sink pulls data from Apache Pulsar topics and persists data to Google Cloud Pub/Sub.

Figure 2. Google Cloud Pub/Sub sink

Why did StreamNative develop the Google Cloud Pub/Sub connector?

Apache Pulsar and Google Cloud Pub/Sub are two of the most popular and widely used messaging platforms in modern cloud environments. Apache Pulsar's unified platform supports queuing, streaming, and analytics in one underlying system. Google Cloud Pub/Sub is known for efficient performance, a powerful ecosystem in streaming analytics, and in-order delivery at scale.

Historically, however, users did not have a simple and reliable way of performing fully featured messaging and streaming in one cloud pub/sub system, so they compensated by investing significant development effort to bridge the gaps.

The new StreamNative connector gives Google Cloud Pub/Sub users a way to route messages into Pulsar and use features unavailable elsewhere, while avoiding the connectivity problems that can appear when there are intrinsic differences [1] between systems or privacy requirements. The connector does this by integrating fully with the rest of the Pulsar system (including serverless functions, per-message processing, and event-stream processing). It presents a low-code solution with out-of-the-box capabilities such as multi-tenant connectivity, geo-replication, protocols for direct connection to end-user mobile or IoT clients, and more. These features are essential for two-way event traffic.

What are the benefits of using the Google Cloud Pub/Sub connector?

The integration between Google Cloud Pub/Sub and Apache Pulsar brings three key benefits.

  • Easy. You can quickly move data between Apache Pulsar and Google Cloud Pub/Sub without writing any code.
  • Efficient. You spend less time on the data layer and more time extracting business value from real-time data.
  • Scalable. You can run this connector on any node (standalone or distributed), allowing you to build reactive data pipelines that meet your business and operational needs in real time.

How do I start using the Google Cloud Pub/Sub connector?

You can be up and running with the connector in three steps:

  • Configure the services and download the connector
  • Configure the source connector
  • Configure the sink connector

Before you start

First, make sure you have a running Apache Pulsar cluster and access to the Google Cloud Pub/Sub service.

  1. Prepare the Pulsar service. You can quickly run a Pulsar cluster anywhere by running $PULSAR_HOME/bin/pulsar standalone. See Getting Started with Pulsar for details. Alternatively, get started with StreamNative Cloud, which provides an easy-to-use, fully managed Pulsar service in the public cloud.
  2. Prepare the Google Cloud Pub/Sub service. See Getting Started with Google Cloud Pub/Sub for details. Note that you need to install the gcloud CLI and set the GOOGLE_APPLICATION_CREDENTIALS environment variable to access Google Cloud.
  3. Set up the Google Cloud Pub/Sub connector. Download the connector from the Releases page, and then move it to $PULSAR_HOME/connectors.
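
Before moving on, it can help to confirm that the prerequisites above are in place. The following Python sketch is one possible check, not part of the connector. It assumes that PULSAR_HOME is exported as an environment variable and that the downloaded archive follows the pulsar-io-google-pubsub-<version>.nar naming used in the configuration examples in this post. The function name preflight is hypothetical.

```python
import os
from pathlib import Path


def preflight(env, connector_prefix="pulsar-io-google-pubsub"):
    """Return a list of problems; an empty list means the setup looks ready."""
    problems = []

    # Google client libraries read the service-account key path from this variable.
    creds = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not creds:
        problems.append("GOOGLE_APPLICATION_CREDENTIALS is not set")
    elif not Path(creds).is_file():
        problems.append(f"credentials file not found: {creds}")

    # Assumption: PULSAR_HOME is exported, not just set in the current shell.
    pulsar_home = env.get("PULSAR_HOME")
    if not pulsar_home:
        problems.append("PULSAR_HOME is not set")
    else:
        connectors = Path(pulsar_home) / "connectors"
        # Assumption: the archive name matches the pattern used in this post.
        if not list(connectors.glob(f"{connector_prefix}-*.nar")):
            problems.append(f"no {connector_prefix}-*.nar found in {connectors}")

    return problems


if __name__ == "__main__":
    found = preflight(os.environ)
    for problem in found:
        print("NOT READY:", problem)
    if not found:
        print("Prerequisites look fine.")
```

The check only looks at the local environment. It does not verify that the credentials are valid or that the Pulsar cluster is reachable.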

Apache Pulsar provides a Pulsar IO feature to run the connector. Follow the steps below to quickly get the connector up and running.

Configure the source connector

  1. Create a configuration file named google-pubsub-source-config.json. The following configuration sends messages from the test-google-pubsub-source topic in the pulsar-io-google-pubsub project of Google Cloud Pub/Sub to the public/default/test-google-pubsub-source topic of Apache Pulsar:

    {
        "tenant": "public",
        "namespace": "default",
        "name": "google-pubsub-source",
        "topicName": "test-google-pubsub-source",
        "archive": "connectors/pulsar-io-google-pubsub-$VERSION.nar",
        "parallelism": 1,
        "configs": {
            "pubsubProjectId": "pulsar-io-google-pubsub",
            "pubsubTopicId": "test-google-pubsub-source"
        }
    }
    
  2. Run the source connector:

    $PULSAR_HOME/bin/pulsar-admin sources localrun --source-config-file /path/to/google-pubsub-source-config.json
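
A JSON file is not processed by a shell, so the $VERSION in the archive path presumably stands for the release version you downloaded and has to be filled in by hand. As a minimal sketch, the Python script below generates the source configuration with a concrete version. The helper names source_config and write_config are hypothetical, and the field values are copied from the example above.

```python
import json
import sys
from pathlib import Path


def source_config(version, project_id, topic_id, pulsar_topic):
    """Build the source settings from this post with a concrete archive version."""
    return {
        "tenant": "public",
        "namespace": "default",
        "name": "google-pubsub-source",
        "topicName": pulsar_topic,
        "archive": f"connectors/pulsar-io-google-pubsub-{version}.nar",
        "parallelism": 1,
        "configs": {
            "pubsubProjectId": project_id,
            "pubsubTopicId": topic_id,
        },
    }


def write_config(config, path):
    """Write the configuration as indented JSON and return the path."""
    Path(path).write_text(json.dumps(config, indent=4) + "\n")
    return path


if __name__ == "__main__":
    # Usage: python make_source_config.py <connector-version>
    cfg = source_config(
        version=sys.argv[1],
        project_id="pulsar-io-google-pubsub",
        topic_id="test-google-pubsub-source",
        pulsar_topic="test-google-pubsub-source",
    )
    print(write_config(cfg, "google-pubsub-source-config.json"))
```

The generated file can then be passed to the localrun command through --source-config-file.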
    

Configure the sink connector

  1. Create a configuration file named google-pubsub-sink-config.json. The following configuration sends messages from the public/default/test-google-pubsub-sink topic of Apache Pulsar to the test-google-pubsub-sink topic in the pulsar-io-google-pubsub project of Google Cloud Pub/Sub:

    {
        "tenant": "public",
        "namespace": "default",
        "name": "google-pubsub-sink",
        "inputs": [
            "test-google-pubsub-sink"
        ],
        "archive": "connectors/pulsar-io-google-pubsub-$VERSION.nar",
        "parallelism": 1,
        "configs": {
            "pubsubProjectId": "pulsar-io-google-pubsub",
            "pubsubTopicId": "test-google-pubsub-sink"
        }
    }
    
  2. Run the sink connector:

    $PULSAR_HOME/bin/pulsar-admin sinks localrun --sink-config-file /path/to/google-pubsub-sink-config.json
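
A quick sanity check of the sink configuration file can catch simple mistakes before the connector starts. The Python sketch below is an illustration under stated assumptions: the list of required keys is only what the example in this post contains, not an authoritative schema of the connector, and check_sink_config is a hypothetical helper.

```python
import json
import sys

# Assumption: these keys mirror the example in this post, not an official schema.
REQUIRED_TOP_LEVEL = ("tenant", "namespace", "name", "inputs", "archive", "configs")
REQUIRED_CONNECTOR = ("pubsubProjectId", "pubsubTopicId")


def check_sink_config(config):
    """Return a list of problems found in a sink configuration dictionary."""
    problems = []

    for key in REQUIRED_TOP_LEVEL:
        if key not in config:
            problems.append(f"missing top-level key: {key}")

    # "inputs" lists the Pulsar topics the sink reads from.
    if "inputs" in config:
        inputs = config["inputs"]
        if not isinstance(inputs, list) or not inputs:
            problems.append("inputs must be a non-empty list of Pulsar topics")

    # JSON is not shell-expanded, so a literal $VERSION would reach Pulsar as-is.
    if "$VERSION" in str(config.get("archive", "")):
        problems.append("archive still contains the $VERSION placeholder")

    if "configs" in config:
        connector = config["configs"]
        if not isinstance(connector, dict):
            problems.append("configs must be a JSON object")
        else:
            for key in REQUIRED_CONNECTOR:
                if not connector.get(key):
                    problems.append(f"missing configs.{key}")

    return problems


if __name__ == "__main__":
    # Usage: python check_sink_config.py /path/to/google-pubsub-sink-config.json
    with open(sys.argv[1]) as handle:
        for problem in check_sink_config(json.load(handle)):
            print("PROBLEM:", problem)
```

Run against the example file exactly as printed above, this check would flag only the unreplaced $VERSION placeholder.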
    

When you send a message to the public/default/test-google-pubsub-sink topic of Apache Pulsar, this message is persisted to the pulsar-io-google-pubsub/test-google-pubsub-sink topic of Google Cloud Pub/Sub.

For more information, see the demo video.

How can I get involved?

The Google Cloud Pub/Sub connector is a major step in the journey of integrating other messaging systems into the Pulsar ecosystem. To get involved with the Google Cloud Pub/Sub connector for Apache Pulsar, check out the following featured resources:

  • Try out the Google Cloud Pub/Sub connector. To get started, download the connector and refer to the README, which walks you through the whole process.
  • Contact us. Feel free to create an issue on GitHub, send an email to the Pulsar mailing list, or message us on Twitter to get answers from Pulsar experts.
  • Make a contribution. The Google Cloud Pub/Sub connector is a community-driven project, and its source code is hosted in the StreamNative GitHub repository. We would love for you to explore this new connector and contribute to its evolution. If you have feature requests or bug reports, do not hesitate to share your feedback and ideas or to submit a pull request.

[1] Intrinsic differences exist between platforms that have no notion of schema and the ones that have sophisticated schema capabilities because there is no simple way to translate between them. These platform differences range from traditional messaging like Amazon SQS to multi-level hierarchical Avro schema written to a data lake. Distinctions also exist between platforms relying on different data representations, such as Pandas DataFrames and simple messages.

© StreamNative, Inc. 2022. Apache, Apache Pulsar, Apache BookKeeper, Apache Flink, and associated open source project names are trademarks of the Apache Software Foundation.