Native Apache Kafka Service Is Coming Soon to StreamNative Cloud. Join the waitlist and get $1,000 in credits.

Join Waitlist >
StreamNative Logo
BlogOct 12, 20227 min read

Improving Regular Expression-Based Subscriptions in Pulsar Consumers

Improving Regular Expression-Based Subscriptions in Pulsar Consumers

Written by

Andras Beni

Topics

Apache Pulsar

A closer look at regular expression-based subscriptions

Currently, you have two options when creating a consumer with a Pulsar client:

  1. Specify the list of topics you want to consume. Often this list consists of only one topic name, but it does not need to.
  2. Specify a regular expression as a topic pattern. Initially, this is equivalent to listing all topics that match the pattern. But as time goes by, new topics are created, while some may be deleted. The regular expression-based consumer has an auto-discovery mechanism. Under the hood, consumers regularly ask a broker for the current list of topics. Whenever the consumer finds a change in the set of topic names that match the pattern, it subscribes to the new topics. You can set the period at which the consumer will check for updates and refresh its list of topics at creation time. By default, this happens once every minute.

The following image shows a typical interaction between the consumer and the broker:

Figure 2. The broker sends a filtered response.

Skipping updates when nothing has changed

Often there’s no change in the response from the broker between subsequent requests. This is either because no new topic is created at all or because the new topic(s) don’t match the pattern. In such cases, there’s no value in responding with the list of topic names; the broker should indicate that nothing has changed, so that the consumer can continue without updates to its list of topics. One way to enable such a response is for brokers to track the last response to a specific consumer. However, this would put an unnecessary burden on brokers, so we decided to let consumers keep track of this state. Specifically, brokers calculate a hash from the topic list and include it in the response. The next time the consumer requests the list of topics, it adds the hash to the request. If the broker finds that the current hash is the same as the one in the request, it sends a response with a flag instead of topic names to show that the state the consumer knows about is still current.

Figure 3. The broker indicates that no change happened.

Notifications for faster discovery

The above features solved the issue of unnecessary network traffic but did not help discover topics earlier and avoid lags for the first messages. For that, we introduced topic list watchers.

As shown in Figure 4, consumers register as watchers with brokers. The initial exchange resembles what we’ve discussed in the sections above. The difference is that the broker keeps track of watchers. Brokers get notifications from the metadata store whenever a new topic is created (possibly through another broker) and immediately send a message to the consumers that registered with a pattern that the new topic’s name matches. This way, the consumers can know about newly created topics within seconds. Figure 4. The life cycle of topic list watchers

Polling and notifications in parallel

The procedure for topic list updates described above involves multiple services and steps. For example, the metadata store needs to notify every broker, and those brokers need to process the notification and then update every consumer that is interested in the newly created topic. During the process, it is possible that a broker experiences an issue right after the topic is created. In such events, it will be unable to send notifications to consumers even though the topic was successfully created. Put simply, creating a topic and sending notifications is not an atomic operation. As a result, consumers can’t rely on notifications exclusively; they still must use the polling mechanism to make up for updates they did not get. Conveniently, missing notifications do not cause errors or inconsistencies in the consumer; they merely delay message processing in new topics until the consumer polls again for matching topics.

Conclusion

The enhancements described above will be available in Apache Pulsar release 2.11.0. They will address issues that Pulsar users at scale face in connection with regular expression-based subscriptions. First, applying the topic pattern on the broker side and omitting updates altogether in most cases will reduce network utilization significantly. Second, watchers provide an efficient way of discovering topics right after creation, thus eliminating the lag in processing the first messages produced to those topics.

More on Apache Pulsar

Pulsar has become one of the most active Apache projects over the past few years, with a vibrant community driving innovation and improvements to the project. Check out the following resources to learn more about Pulsar.

About author

Andras Beni

Andras Beni Andras Beni is a Platform Engineer at StreamNative. He is focusing on Apache Pulsar and has been working on data platforms based open source software in his latest positions.

newsletter

Keep up with Our Stream

Insights, news, and updates from the heart of our community.

Sign up successful

Welcome to the Stream!

Thank you for your interest. We've sent a confirmation link to your email.