Nov 9, 2020
35 min read

Benchmarking Pulsar and Kafka - The Full Benchmark Report - 2020

Penghui Li
Engineering Lead of Messaging Team, StreamNative
Sijie Guo
CEO and Co-Founder, StreamNative, Apache Pulsar PMC Member

Get the updated Apache Pulsar vs. Apache Kafka 2022 Benchmark here.

Read below for details on the 2020 Benchmark.

Introduction

Having identified multiple issues in Confluent’s approach to evaluating various performance factors, we decided to repeat their benchmark on Pulsar and Kafka with some adjustments. We wanted to improve the accuracy of the test to facilitate more meaningful comparisons between the two systems. We also wanted to get a more comprehensive view, so we broadened the scope of our test to include additional performance measures and simulated real-world scenarios.

Our benchmark repeated Confluent’s original tests with the appropriate corrections and included all the durability levels supported by Pulsar and Kafka. As a result, we were able to compare throughput and latency at equivalent levels of durability. In addition, we benchmarked new performance factors and conditions, such as varying numbers of partitions, subscriptions, and clients. We also emulated real-world use cases by testing mixed workloads containing writes, tailing-reads, and catch-up reads.

In this report, we describe the tests we performed in detail and share our results and conclusions.


Maximum Throughput Test

The following is the test setup.

We designed this test to determine the maximum throughput Pulsar and Kafka can achieve when processing workloads that consist of publish and tailing-reads. We varied the number of partitions to see how each change impacted throughput. Our test strategy included the following principles and expected guarantees:

  • Each message was replicated three times to ensure fault tolerance.
  • We varied the number of acknowledgements to determine the maximum throughput of each system under various replication durability guarantees.
  • We enabled batching for Kafka and Pulsar, batching up to 1 MB of data for a maximum of 10 ms (see the client configuration sketch after this list).
  • We tested varying numbers of partitions—specifically, 1, 100, and 2000—to measure the maximum throughput for each condition.
  • When benchmarking the maximum throughput for 100 and 2000 partitions, we ran 2 producers and 2 consumers.
  • When benchmarking the maximum throughput for a single partition, we varied the number of producers and consumers to measure changes in throughput under different conditions.
  • We used a message size of 1 KB.
  • For each scenario, we tested the maximum throughput under various durability levels.
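
To make the setup concrete, the sketch below shows how the batching and acknowledgement settings listed above might look on hand-written Kafka and Pulsar producer clients. The broker addresses and topic name are placeholders, and the benchmark itself was driven by a benchmarking framework rather than code like this.

```java
import java.util.Properties;
import java.util.concurrent.TimeUnit;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.common.serialization.ByteArraySerializer;
import org.apache.pulsar.client.api.Producer;
import org.apache.pulsar.client.api.PulsarClient;

public class BenchmarkProducerSetup {
    public static void main(String[] args) throws Exception {
        // Kafka producer: batch up to 1 MB for at most 10 ms, wait for all in-sync replicas.
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, ByteArraySerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, ByteArraySerializer.class.getName());
        props.put(ProducerConfig.BATCH_SIZE_CONFIG, 1_048_576);    // 1 MB batches
        props.put(ProducerConfig.LINGER_MS_CONFIG, 10);            // wait at most 10 ms to fill a batch
        props.put(ProducerConfig.ACKS_CONFIG, "all");              // sync replication durability
        KafkaProducer<byte[], byte[]> kafkaProducer = new KafkaProducer<>(props);

        // Pulsar producer: equivalent batching settings.
        PulsarClient pulsarClient = PulsarClient.builder()
                .serviceUrl("pulsar://pulsar:6650")
                .build();
        Producer<byte[]> pulsarProducer = pulsarClient.newProducer()
                .topic("benchmark-topic")
                .enableBatching(true)
                .batchingMaxBytes(1_048_576)                        // 1 MB batches
                .batchingMaxPublishDelay(10, TimeUnit.MILLISECONDS) // wait at most 10 ms
                .create();

        byte[] payload = new byte[1024]; // 1 KB messages (send loops omitted)

        kafkaProducer.close();
        pulsarProducer.close();
        pulsarClient.close();
    }
}
```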

The following is the result for each test.

#1 100 partitions, 1 subscription, 2 producers and 2 consumers

Our first test benchmarked maximum throughput on Pulsar and Kafka with 100 partitions under two different durability guarantees. We used one subscription, two producers, and two consumers for each system. Our test results are described below.

  • When configured to provide Level-1 durability guarantees (sync replication durability, sync local durability), Pulsar achieved a maximum throughput of ~300 MB/s, which reached the physical bandwidth limit of the journal disk. Kafka was able to achieve ~420 MB/s with 100 partitions. It should be noted that, when providing Level-1 durability, Pulsar was configured to use one disk as a journal disk for writes and the other as a ledger disk for reads, whereas Kafka used both disks for both writes and reads. While Pulsar's setup provides better I/O isolation, its throughput was also limited by the maximum bandwidth of a single disk (~300 MB/s). Alternative disk configurations can benefit Pulsar and allow for more cost-effective operation; we will discuss them in a later blog post. (A sketch of how these durability levels map to concrete settings follows this list.)
  • When configured to provide Level-2 durability guarantees (sync replication durability, async local durability), Pulsar and Kafka each achieved a maximum throughput of ~600 MB/s. Both systems reached the physical limit of disk bandwidth.
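
For reference, here is a rough sketch of how the two durability levels above can map to concrete settings. The Kafka topic configs (flush.messages, flush.ms) and the Pulsar/BookKeeper settings named in the comments are real options, but the exact values and topic names are illustrative assumptions, not the benchmark's verbatim configuration.

```java
import java.util.Collection;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.AlterConfigOp;
import org.apache.kafka.clients.admin.ConfigEntry;
import org.apache.kafka.common.config.ConfigResource;

public class DurabilityLevelSketch {
    public static void main(String[] args) throws Exception {
        Properties conf = new Properties();
        conf.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka:9092");

        try (Admin admin = Admin.create(conf)) {
            ConfigResource topic = new ConfigResource(ConfigResource.Type.TOPIC, "benchmark-topic");

            // Level-1 (sync local durability): fsync every message before acknowledging.
            Collection<AlterConfigOp> syncLocal = List.of(
                    new AlterConfigOp(new ConfigEntry("flush.messages", "1"), AlterConfigOp.OpType.SET),
                    new AlterConfigOp(new ConfigEntry("flush.ms", "0"), AlterConfigOp.OpType.SET));

            // Level-2 (async local durability): Kafka's default, rely on the OS page cache.
            Collection<AlterConfigOp> asyncLocal = List.of(
                    new AlterConfigOp(new ConfigEntry("flush.messages", String.valueOf(Long.MAX_VALUE)), AlterConfigOp.OpType.SET),
                    new AlterConfigOp(new ConfigEntry("flush.ms", String.valueOf(Long.MAX_VALUE)), AlterConfigOp.OpType.SET));

            // Apply whichever level is being tested. Replication durability is controlled
            // separately, via the producer's acks setting and the topic's min.insync.replicas.
            admin.incrementalAlterConfigs(java.util.Map.of(topic, syncLocal)).all().get();
        }

        // Pulsar equivalents live in broker/bookie configuration files, not client code:
        //   sync replication durability  -> wait for acknowledgements from 2 bookies
        //                                   (e.g. managedLedgerDefaultAckQuorum=2)
        //   async replication durability -> ack quorum 1
        //   sync local durability        -> journalSyncData=true  (BookKeeper's default)
        //   async local durability       -> journalSyncData=false
    }
}
```

Kafka's defaults leave flush.messages and flush.ms effectively unbounded, which is why its out-of-the-box behavior corresponds to async local durability.
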
Figure 1 shows the maximum throughput on Pulsar and Kafka with 100 partitions under sync local durability.

Figure 1: Maximum throughput with 100 partitions on Pulsar and Kafka (sync local durability)

Figure 2 shows the maximum throughput on Pulsar and Kafka with 100 partitions under async local durability.

Figure 2: Maximum throughput with 100 partitions on Pulsar and Kafka (async local durability)

#2 2000 partitions, 1 subscription, 2 producers and 2 consumers

Our second test benchmarked maximum throughput using the same durability guarantees (acks = 2) on Pulsar and Kafka. However, we increased the number of partitions from 100 to 2000. We used one subscription, two producers, and two consumers. Our test results are described below.

  • Pulsar’s maximum throughput remained at ~300 MB/s under a Level-1 durability guarantee and increased to ~600 MB/s under Level-2 durability.
  • Kafka’s maximum throughput decreased from 600 MB/s (at 100 partitions) to ~300 MB/s when flushing data for each message individually (kafka-ack-all-sync).
  • Kafka’s maximum throughput decreased from ~500 MB/s (at 100 partitions) to ~300 MB/s when using the system’s default durability settings (kafka-ack-all-nosync).

To understand why Kafka’s throughput dropped, we plotted the average publish latency for each system under each durability guarantee tested. As you can see in Figure 4, when the number of partitions increased to 2000, Kafka’s average publish latency increased to 200 ms and its 99th percentile publish latency increased to 1200 ms.

Figure 3: Maximum throughput with 2000 partitions on Pulsar and Kafka

Increased publish latency can significantly impact throughput. Latency did not affect throughput on Pulsar because Pulsar clients leverage Netty’s asynchronous networking framework. However, latency did impact throughput on Kafka because Kafka clients use a synchronous implementation. We were able to improve throughput on Kafka by doubling the number of producers: when we increased the number of producers to four, Kafka achieved a throughput of ~600 MB/s.
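
As an illustration of the asynchronous, pipelined publishing the Pulsar client exposes, the sketch below keeps many sends in flight from a single producer, so per-message latency does not directly cap throughput. The topic name, message count, and service URL are placeholders.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import org.apache.pulsar.client.api.MessageId;
import org.apache.pulsar.client.api.Producer;
import org.apache.pulsar.client.api.PulsarClient;

public class AsyncPublishSketch {
    public static void main(String[] args) throws Exception {
        PulsarClient client = PulsarClient.builder().serviceUrl("pulsar://pulsar:6650").build();
        Producer<byte[]> producer = client.newProducer()
                .topic("benchmark-topic")
                .blockIfQueueFull(true) // apply back-pressure instead of failing when the queue fills
                .create();

        byte[] payload = new byte[1024];
        List<CompletableFuture<MessageId>> pending = new ArrayList<>();

        // sendAsync() returns immediately, so many messages stay in flight at once;
        // higher per-message latency therefore does not cap single-producer throughput.
        for (int i = 0; i < 100_000; i++) {
            pending.add(producer.sendAsync(payload));
        }

        // Wait for every outstanding send to be acknowledged by the broker.
        CompletableFuture.allOf(pending.toArray(new CompletableFuture[0])).join();

        producer.close();
        client.close();
    }
}
```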

Figure 4 shows the publish latency for Pulsar and Kafka with 2000 partitions.

Figure 4: Publish latency with 2000 partitions on Pulsar and Kafka

#3 1 partition, 1 subscription, 2 producers and 2 consumers

Adding more brokers and partitions helps increase throughput on both Pulsar and Kafka. To gain a better understanding of each system’s efficiency, we benchmarked maximum throughput using only one partition. For this test, we used one subscription, two producers, and two consumers.

We observed the following:

  • Pulsar achieved a maximum throughput of ~300 MB/s at all levels of durability.
  • Kafka achieved a maximum throughput of ~300 MB/s under async replication durability, but only ~160 MB/s under sync replication durability.

Figure 5 shows the maximum throughput on Pulsar and Kafka with one partition under sync local durability.

Figure 5: Maximum throughput with 1 partition on Pulsar and Kafka (sync local durability)

Figure 6 shows the maximum throughput on Pulsar and Kafka with one partition under async local durability.

Figure 6: Maximum throughput with 1 partition on Pulsar and Kafka (async local durability)

#4 1 partition, 1 subscription, 1 producer and 1 consumer

We benchmarked maximum throughput on Pulsar and Kafka using only one partition and one subscription, as in the previous test. However, for this test, we used only one producer and one consumer (instead of two of each).

We observed the following:

  • Pulsar sustained a maximum throughput of ~300 MB/s at all durability levels.
  • Kafka’s maximum throughput decreased from ~300 MB/s (in Test #3) to ~230 MB/s under async replication durability.
  • Kafka’s maximum throughput dropped from ~160 MB/s (in Test #3) to ~100 MB/s under sync replication durability.

Figure 7 shows the maximum throughput Pulsar and Kafka achieved with one partition, one producer, and one consumer under sync local durability.

Figure 7: Maximum throughput with 1 partition, 1 producer, 1 consumer on Pulsar and Kafka (sync local durability)

To understand why Kafka’s throughput dropped, we plotted the average publish latency (see Figure 8) and end-to-end latency (see Figure 9) for each system under different durability guarantees. As you can see from the graphics below, even with just one partition, Kafka’s publish and end-to-end latency increased from single-digit values to multiple hundreds of milliseconds. Reducing the number of producers and consumers greatly impacted Kafka’s throughput. In contrast, Pulsar consistently offered predictable low single-digit latency.

Figure 8: Publish latency with 1 partition and 1 producer and 1 consumer on Pulsar and Kafka (sync durability)
Figure 9: End-to-end latency with 1 partition, 1 producer, and 1 consumer on Pulsar and Kafka (sync durability)

Publish and End-to-End Latency Test

The following is the test setup.

The test was designed to determine the lowest latency each system can achieve when processing workloads that consist of publish and tailing-reads. We varied the number of subscriptions and the number of partitions to see how each change impacted both publish and end-to-end latency. Our test strategy included the following principles and expected guarantees:

  • Each message was replicated three times to ensure fault tolerance.
  • We varied the number of acknowledgments to measure variances in throughput using different replication durability guarantees.
  • We varied the number of subscriptions (from 1 to 10) to measure latency for each.
  • We varied the number of partitions (from 100 to 10000) to measure latency for each.
  • We used a message size of 1 KB.
  • The producer sent messages at a fixed rate of 200,000 messages/s (~200 MB/s) and the tailing-read consumers processed the messages while the producer continued to send them (a sketch of such a fixed-rate, latency-measuring workload follows this list).
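
A minimal sketch of such a fixed-rate, latency-measuring workload is shown below. It assumes a Guava RateLimiter for pacing, HdrHistogram for percentiles, and a timestamp embedded in each 1 KB payload for end-to-end latency (which also assumes producer and consumer clocks are synchronized); the actual benchmark was driven by a benchmark framework, not code like this.

```java
import com.google.common.util.concurrent.RateLimiter;
import java.nio.ByteBuffer;
import java.util.concurrent.TimeUnit;
import org.HdrHistogram.ConcurrentHistogram;
import org.apache.pulsar.client.api.Consumer;
import org.apache.pulsar.client.api.Message;
import org.apache.pulsar.client.api.Producer;
import org.apache.pulsar.client.api.PulsarClient;

public class FixedRateLatencySketch {
    public static void main(String[] args) throws Exception {
        PulsarClient client = PulsarClient.builder().serviceUrl("pulsar://pulsar:6650").build();
        Producer<byte[]> producer = client.newProducer().topic("benchmark-topic")
                .blockIfQueueFull(true).create();
        Consumer<byte[]> consumer = client.newConsumer().topic("benchmark-topic")
                .subscriptionName("sub-1").subscribe();

        ConcurrentHistogram publishLatency = new ConcurrentHistogram(3);
        ConcurrentHistogram endToEndLatency = new ConcurrentHistogram(3);
        RateLimiter limiter = RateLimiter.create(200_000); // 200,000 msgs/s (~200 MB/s at 1 KB)

        // Tailing-read consumer: end-to-end latency = receive time - timestamp in the payload.
        Thread tailingReader = new Thread(() -> {
            try {
                while (true) {
                    Message<byte[]> msg = consumer.receive();
                    long sentMicros = ByteBuffer.wrap(msg.getValue()).getLong();
                    endToEndLatency.recordValue(Math.max(1, nowMicros() - sentMicros));
                    consumer.acknowledge(msg);
                }
            } catch (Exception ignored) { }
        });
        tailingReader.start();

        for (long i = 0; i < 60L * 200_000; i++) {      // roughly one minute of load
            limiter.acquire();                           // hold the publish rate at 200,000 msgs/s
            byte[] payload = new byte[1024];             // 1 KB message
            long sendMicros = nowMicros();
            ByteBuffer.wrap(payload).putLong(sendMicros);
            producer.sendAsync(payload)
                    .thenRun(() -> publishLatency.recordValue(Math.max(1, nowMicros() - sendMicros)));
        }
        producer.flush();

        System.out.printf("p99 publish latency: %d us%n", publishLatency.getValueAtPercentile(99.0));
        System.out.printf("p99 end-to-end latency: %d us%n", endToEndLatency.getValueAtPercentile(99.0));
    }

    private static long nowMicros() {
        return TimeUnit.MILLISECONDS.toMicros(System.currentTimeMillis());
    }
}
```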

The following is the result for each test.

#1 100 partitions, 1 subscription

We started with 100 partitions and 1 subscription to benchmark the lowest latency Pulsar and Kafka can achieve under all different durability guarantees.

Our test showed Pulsar’s publish and end-to-end latency to be two to five times lower than Kafka’s at all levels of durability. You can see the actual test results in Table 1.

Table 1: Actual publish and end-to-end latency test results for Pulsar and Kafka by durability level

Publish Latency - Sync Local Durability

Figure 10 shows the differences in publish latency between Pulsar and Kafka using two replication durability settings (ack-1 and ack-2, respectively) and sync local durability. Table 2 shows the exact latency numbers for each case. As you can see, Pulsar’s 99th percentile latency is three times lower than Kafka’s under async replication durability (ack-1) and five times lower under sync replication durability (ack-2).

Figure 10: Publish latency on Pulsar and Kafka (with data sync)

Table 2: Actual publish latency test results on Pulsar and Kafka (with data sync)

To gain a better understanding of how latency changes over time, we plotted the 99th percentile publish latency for Pulsar and Kafka using various replication durability settings. As you can see in Figure 11, Pulsar’s latency stayed consistent (~5 ms) while Kafka’s latency was spiky. Stable, consistently low latency is crucial for mission-critical services.

Figure 11: 99th percentile publish latency on Pulsar and Kafka

End-to-End Latency - Sync Local Durability

Figure 12 shows the differences in end-to-end latency between Pulsar and Kafka using two replication durability settings (ack-1 and ack-2, respectively) and sync local durability. Table 3 shows the exact latency numbers for each case. As you can see, Pulsar’s 99th percentile end-to-end latency was three times lower than Kafka’s under async replication durability (ack-1) and five times lower under sync replication durability (ack-2).

Figure 12: End-to-end latency with 1 subscription on Pulsar and Kafka (with data sync)
Table 3: Actual end-to-end latency test results with 1 subscription on Pulsar and Kafka (with data sync)

Publish Latency - Async Local Durability

Figure 13 shows the differences in publish latency between Pulsar and Kafka using two replication durability settings (ack-1 and ack-2, respectively) and async local durability. Table 4 shows the exact latency numbers for each case. As you can see, Kafka performed better in the async replication durability (ack-1) case. But Pulsar’s 99th percentile publish latency stayed consistent (below 5 ms) and increasing the replication durability guarantee (from ack-1 to ack-2) did not impact latency. However, Kafka’s 99th percentile publish latency with sync replication durability (ack-2) was much higher than Pulsar’s.

Figure 13: Publish latency on Pulsar and Kafka (without data sync)
Table 4: Actual publish latency test results on Pulsar and Kafka (without data sync)

To gain a better understanding of how publish latency changes over time, we plotted the 99th percentile publish latency for Pulsar and Kafka under various replication durability settings. As you can see in Figure 14, Pulsar’s latency stayed consistently low (below 5 ms), while Kafka’s was about twice Pulsar’s under sync replication durability.

Figure 14: 99th percentile publish latency on Pulsar and Kafka under various replication durability settings

End-to-End Latency - Async Local Durability

Figure 15 shows the differences in end-to-end latency between Pulsar and Kafka under two replication durability settings (ack-1 and ack-2, respectively) and async local durability. Table 5 shows the exact latency numbers for each case. As you can see, Pulsar performed consistently better than Kafka in all cases. Pulsar’s 99th percentile end-to-end latency stayed consistent (~ 5 ms) and varying the replication durability setting had no impact. Kafka’s 99th percentile end-to-end latency was 1.5 times higher than Pulsar’s for ack-1 and 2 times higher for ack-2.

Figure 15: End-to-end latency with 1 subscription on Pulsar and Kafka (without data sync)
Table 5: Actual end-to-end latency test results with 1 subscription on Pulsar and Kafka (without data sync)

#2 100 partitions, 10 subscriptions

Once we understood how Pulsar and Kafka performed with just one subscription, we wanted to see how varying the number of subscriptions affected publish and end-to-end latency. So, we increased the number of subscriptions from 1 to 10 and assigned 2 consumers to each subscription.
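
A Pulsar subscription plays roughly the role that a consumer group plays in Kafka. The hedged sketch below shows how one of the ten subscriptions, with its two consumers, might be created on each system; the names, service URLs, and the choice of subscription type are illustrative assumptions (receive loops are omitted).

```java
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.ByteArrayDeserializer;
import org.apache.pulsar.client.api.Consumer;
import org.apache.pulsar.client.api.PulsarClient;
import org.apache.pulsar.client.api.SubscriptionType;

public class SubscriptionSketch {
    public static void main(String[] args) throws Exception {
        // Pulsar: two consumers sharing one named subscription ("sub-1" of the ten).
        PulsarClient client = PulsarClient.builder().serviceUrl("pulsar://pulsar:6650").build();
        for (int i = 0; i < 2; i++) {
            Consumer<byte[]> pulsarConsumer = client.newConsumer()
                    .topic("benchmark-topic")
                    .subscriptionName("sub-1")
                    .subscriptionType(SubscriptionType.Failover) // assumption; lets two consumers split the partitions
                    .subscribe();
        }

        // Kafka: the closest equivalent is a consumer group; two consumers join "group-1".
        for (int i = 0; i < 2; i++) {
            Properties props = new Properties();
            props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka:9092");
            props.put(ConsumerConfig.GROUP_ID_CONFIG, "group-1");
            props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, ByteArrayDeserializer.class.getName());
            props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, ByteArrayDeserializer.class.getName());
            KafkaConsumer<byte[], byte[]> kafkaConsumer = new KafkaConsumer<>(props);
            kafkaConsumer.subscribe(List.of("benchmark-topic"));
        }
    }
}
```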

As you can see from the details in Table 6, our test results showed the following:

  • Pulsar’s 99th percentile publish and end-to-end latency stayed between 5 and 10 ms.
  • Kafka’s 99th percentile publish and end-to-end latency were greatly impacted by increasing the number of subscriptions and went up to multiple seconds.
Table 6: Publish and end-to-end latency test results with 10 subscriptions

Publish Latency: Sync Local Durability

Figure 16 shows the differences in publish latency between Pulsar and Kafka under two replication durability settings (ack-1 and ack-2, respectively) and sync local durability. Table 7 shows the exact latency numbers for each case. As you can see, Pulsar’s 99th percentile publish latency was still three times lower than Kafka’s under async replication durability (ack-1). But under sync replication durability (ack-2), Pulsar’s publish latency was 160 times lower than Kafka’s (as compared to 5 times lower with only one subscription).

Figure 16: Publish latency with 10 subscriptions on Pulsar and Kafka (with data sync)
Table 7: Actual publish latency test results with 10 subscriptions on Pulsar and Kafka (with data sync)

End-to-End Latency - Sync Local Durability

Figure 17 shows the differences in end-to-end latency between Pulsar and Kafka under two replication durability settings (ack-1 and ack-2, respectively) and sync local durability. Table 8 shows the exact latency numbers for each case. As you can see, Pulsar’s 99th percentile latency was 20 times lower than Kafka’s under async replication durability (ack-1) and 110 times lower under sync replication durability (ack-2).

Figure 17: End-to-end latency with 10 subscriptions on Pulsar and Kafka (with data sync)
Table 8: Actual end-to-end latency test results with 10 subscriptions on Pulsar and Kafka (with data sync)

Publish Latency - Async Local Durability

Figure 18 shows the differences in publish latency between Pulsar and Kafka under two replication durability settings (ack-1 and ack-2, respectively) and async local durability. Table 9 shows the exact latency numbers for each case. As you can see, Pulsar outperformed Kafka significantly. Pulsar’s average publish latency was ~3 ms and its 99th percentile latency was within 5 ms. Kafka’s performance was satisfactory under async replication durability (ack-1), but significantly worse under sync replication durability (ack-2). Kafka’s 99th percentile publish latency under sync replication durability was 270 times higher than Pulsar’s.

Figure 18: Publish latency with 10 subscriptions on Pulsar and Kafka (without data sync)

Table 9: Actual publish latency test results with 10 subscriptions on Pulsar and Kafka (without data sync)

End-to-End Latency - Async Local Durability

Figure 19 shows the differences in end-to-end latency between Pulsar and Kafka under two replication durability settings (ack-1 and ack-2, respectively) and async local durability. Table 10 shows the exact latency numbers for different cases. As you can see, Pulsar performed consistently better than Kafka in all cases. Pulsar’s end-to-end latency consistently stayed between 4 and 7 ms and varying the replication durability setting had no impact. Kafka’s 99th percentile end-to-end latency was 13 times higher than Pulsar’s for ack-1 and 187 times higher for ack-2.

Figure 19: End-to-end latency with 10 subscriptions on Pulsar and Kafka (without data sync)

Table 10: Actual end-to-end latency test results with 10 subscriptions (without data sync)

#3 100, 5000, 8000, 10000 partitions

Having learned how varying the number of subscriptions affects publish latency in both Pulsar and Kafka, we wanted to vary the number of partitions and observe the effects. So, we increased the number of partitions in increments from 100 to 10000 and looked for changes.
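
The sketch below shows one way topics with the varying partition counts and three replicas might be created for such a test; topic names and service URLs are placeholders, and the Pulsar replica count is set at the broker/namespace level rather than per topic.

```java
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;
import org.apache.pulsar.client.admin.PulsarAdmin;

public class PartitionSweepSetup {
    public static void main(String[] args) throws Exception {
        int[] partitionCounts = {100, 5000, 8000, 10000};

        // Kafka: one topic per partition count, replication factor 3.
        Properties conf = new Properties();
        conf.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka:9092");
        try (Admin admin = Admin.create(conf)) {
            for (int partitions : partitionCounts) {
                admin.createTopics(List.of(new NewTopic("bench-" + partitions, partitions, (short) 3)))
                        .all().get();
            }
        }

        // Pulsar: partitioned topics; the 3x replication comes from the BookKeeper
        // ensemble/write quorum configured on the broker (e.g. managedLedgerDefaultEnsembleSize=3).
        try (PulsarAdmin admin = PulsarAdmin.builder().serviceHttpUrl("http://pulsar:8080").build()) {
            for (int partitions : partitionCounts) {
                admin.topics().createPartitionedTopic("persistent://public/default/bench-" + partitions, partitions);
            }
        }
    }
}
```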

As you can see from the details in Table 11, our test results showed the following:

  • Pulsar’s 99th percentile publish latency remained stable at ~5 ms when the number of partitions increased.
  • Kafka’s 99th percentile publish latency was greatly impacted by incremental increases in the number of partitions and went up to multiple seconds.
  • When the number of partitions exceeded 5000, Kafka’s consumer was unable to keep up with the publish throughput.
Table 11: Actual publish latency test results with varying acks and durability

Ack = 1, Sync local durability

Figure 20 and Figure 21 show the differences in publish and end-to-end latency, respectively, between Pulsar and Kafka when varying the number of partitions under sync local durability and async replication durability (ack = 1).

Figure 20: Publish latency with varying numbers of partitions and 1 ack (with data sync)

Figure 21: End-to-end latency with varying numbers of partitions and 1 ack (with data sync)
Table 12 shows our actual publish latency test results with varying numbers of partitions and one acknowledgement.
Table 13 shows our actual end-to-end latency test results with varying numbers of partitions and one acknowledgement.

Figure 22 shows end-to-end latency with varying numbers of partitions and one acknowledgment on Pulsar. Figure 23 shows end-to-end latency with varying numbers of partitions and one acknowledgement on Kafka.

Figure 22: End-to-end latency with varying numbers of partitions and 1 ack on Pulsar (with data sync)
Figure 23: End-to-end latency with varying numbers of partitions and 1 ack on Kafka (with data sync)

As you can see from the figures and tables above,

  • Pulsar’s 99th percentile publish latency remained stable at ~5 ms. Varying the number of partitions had no effect.
  • Pulsar’s 99th percentile end-to-end latency remained stable at ~6 ms. Varying the number of partitions had no effect.
  • Kafka’s 99th percentile publish latency degraded incrementally as the number of partitions increased and was 5 times higher at 10000 partitions (as compared to 100). This was 10 times higher than Pulsar’s.
  • Kafka’s 99th percentile end-to-end latency degraded incrementally as the number of partitions increased and was 10000 times higher at 10000 partitions (as compared to 100). Kafka’s 99th percentile end-to-end latency at 10000 partitions increased to 180 s and was 280000 times higher than Pulsar’s.

Ack = 2, Sync local durability

Figure 24 shows the differences in publish latency between Pulsar and Kafka when varying the number of partitions under sync local durability and sync replication durability (ack = 2). Table 14 shows the exact latency numbers for each case.

Figure 24: Publish latency with varying numbers of partitions and all/2 ack (with data sync)
Table 14: Actual publish latency test results with varying numbers of partitions and all/2 ack (with data sync)

Figure 25 and Figure 26 show how varying the number of partitions affects end-to-end latency in Pulsar and Kafka, respectively.

Figure 25: End-to-end latency with varying numbers of partitions and 2 acks on Pulsar (with data sync)
Figure 26: End-to-end latency with varying numbers of partitions and 2 acks on Kafka (with data sync)

As you can see from the figures and tables above,

  • Pulsar’s 99th percentile publish latency remained stable at ~10 ms. Increasing the number of partitions had no effect.
  • Kafka’s 99th percentile publish latency degraded incrementally as the number of partitions increased and was 30 times higher at 10000 partitions (as compared to 100). Kafka’s 99th percentile publish latency at 10000 partitions increased to 1.7 s and was 126 times higher than Pulsar’s.
  • Pulsar’s 99th percentile end-to-end latency remained stable at ~10 ms. Increasing the number of partitions had only a slight impact on Pulsar’s 99th percentile end-to-end latency. But even at 10000 partitions, it remained relatively low at ~50 ms.
  • Kafka’s 99th percentile end-to-end latency degraded incrementally as the number of partitions increased. At 10000 partitions, Kafka’s 99th percentile end-to-end latency increased to 200 s and was 14771 times higher than Pulsar’s.

Ack = 1, Async local durability

Figure 27 shows the publish latency difference between Pulsar and Kafka when varying the number of partitions under async local durability and async replication durability (ack = 1). Table 15 shows the exact latency numbers for different cases.

Figure 27: Publish latency with varying numbers of partitions and 1 ack (without data sync)

Table 15: Actual publish latency test results with varying numbers of partitions and 1 ack (without data sync)

Figure 28 and Figure 29 show how varying the number of partitions affected end-to-end latency in Pulsar and Kafka, respectively.

Figure 28: End-to-end latency with varying numbers of partitions and 1 ack on Pulsar (without data sync)
Figure 29: End-to-end latency with varying numbers of partitions and 1 ack on Kafka (without data sync)

As you can see from the above figures and tables,

  • Pulsar’s 99th percentile publish latency remained stable at ~4 to ~5 ms. Increasing the number of partitions had no impact.
  • Kafka’s 99th percentile publish latency degraded incrementally as the number of partitions increased and was 13 times higher at 10000 partitions (as compared to 100). Kafka’s 99th percentile publish latency at 10000 partitions increased to 41 ms and was 8 times higher than Pulsar’s.
  • Pulsar’s 99th percentile end-to-end latency remained stable at ~4 to ~6 ms. Increasing the number of partitions had a slight impact on Pulsar’s 99.9th percentile end-to-end latency but, even at 10000 partitions, it stayed relatively low (within 24 ms).
  • Kafka’s 99th percentile end-to-end latency degraded incrementally as the number of partitions increased. At 10000 partitions, Kafka’s 99th percentile end-to-end latency went up to 180 s and was 34416 times higher than Pulsar’s.

Ack = 2, Async local durability

Figure 30 shows the differences in publish latency between Pulsar and Kafka when varying the number of partitions under async local durability and sync replication durability (ack = 2). Table 16 shows the exact latency numbers for each case.

Figure 30: Publish latency with varying numbers of partitions and all/2 ack (without data sync)
Table 16: Actual publish latency test results with varying numbers of partitions and all/2 ack (without data sync)

Figure 31 and Figure 32 show how varying the number of partitions affected end-to-end latency on Pulsar and Kafka, respectively.

Figure 31: End-to-end latency with varying numbers of partitions and 2 acks on Pulsar (without data sync)
Figure 32: End-to-end latency with varying numbers of partitions with 2 acks on Kafka (without data sync)

As you can see:

  • Pulsar’s 99th percentile publish latency remained stable at ~4 to ~5 ms. Increasing the number of partitions had no impact.
  • Kafka’s 99th percentile publish latency degraded incrementally as the number of partitions increased and was 202 times higher at 10000 partitions (as compared to 100). At 10000 partitions, Kafka’s 99th percentile publish latency increased to 1.7 s and was 278 times higher than Pulsar’s.
  • Pulsar’s 99th percentile end-to-end latency remained stable at ~4 to ~6 ms. Increasing the number of partitions had a slight impact on Pulsar’s 99.9th percentile end-to-end latency, but it remained relatively low (within 28 ms).
  • Kafka’s 99th percentile end-to-end latency degraded incrementally as the number of partitions increased. At 10000 partitions, Kafka’s 99th percentile end-to-end latency increased to 200 s and was 32362 times higher than Pulsar’s.

Catch-up Read Test

The following is the test setup.

The test was designed to determine the maximum throughput Pulsar and Kafka can achieve when processing workloads that contain catch-up reads only. Our test strategy included the following principles and expected guarantees:

  • Each message was replicated three times to ensure fault tolerance.
  • We varied the number of acknowledgements to measure changes in throughput under various replication durability guarantees.
  • We enabled batching on Kafka and Pulsar, batching up to 1 MB of data for a maximum of 10 ms.
  • We benchmarked both systems with 100 partitions.
  • We ran a total of four clients—two producers and two consumers.
  • We used a message size of 1 KB.

At the beginning of the test, the producer started sending messages at a fixed rate of 200 K/s. When 512 GB of data had accumulated in a queue, the consumers began processing. The consumers first read the accumulated data from beginning to end, and then went on to process incoming data as it arrived. The producer continued to send messages at the same rate for the duration of the test.

We evaluated how quickly each system was able to read the 512 GB of backlog data. We compared Kafka and Pulsar under different durability settings.
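
A rough sketch of a catch-up consumer on each system is shown below: it starts reading from the earliest available position, drains the backlog, and then keeps consuming new data. The group, subscription, and topic names are placeholders and the stop condition is simplified.

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.ByteArrayDeserializer;
import org.apache.pulsar.client.api.Consumer;
import org.apache.pulsar.client.api.Message;
import org.apache.pulsar.client.api.MessageId;
import org.apache.pulsar.client.api.PulsarClient;

public class CatchUpReadSketch {
    public static void main(String[] args) throws Exception {
        // Kafka: a new consumer group starting from the earliest offset reads the
        // accumulated backlog first, then continues with incoming data.
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "catch-up-group");
        props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, ByteArrayDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, ByteArrayDeserializer.class.getName());
        try (KafkaConsumer<byte[], byte[]> kafkaConsumer = new KafkaConsumer<>(props)) {
            kafkaConsumer.subscribe(List.of("benchmark-topic"));
            long consumed = 0;
            while (consumed < 512L * 1024 * 1024) { // ~512 GB of 1 KB messages
                ConsumerRecords<byte[], byte[]> records = kafkaConsumer.poll(Duration.ofMillis(100));
                consumed += records.count();
            }
        }

        // Pulsar: rewind a subscription to the earliest message and drain it the same way.
        // (A new subscription with SubscriptionInitialPosition.Earliest achieves the same thing.)
        try (PulsarClient client = PulsarClient.builder().serviceUrl("pulsar://pulsar:6650").build();
             Consumer<byte[]> pulsarConsumer = client.newConsumer()
                     .topic("benchmark-topic").subscriptionName("catch-up-sub").subscribe()) {
            pulsarConsumer.seek(MessageId.earliest);
            long consumed = 0;
            while (consumed < 512L * 1024 * 1024) {
                Message<byte[]> msg = pulsarConsumer.receive();
                pulsarConsumer.acknowledge(msg);
                consumed++;
            }
        }
    }
}
```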

The following is the result for each test.

#1 Async local durability with Pulsar’s journal bypassing feature enabled

In this test, we used equivalent async local durability guarantees on Pulsar and Kafka. We enabled Pulsar’s new journal bypassing feature to match the local durability guarantee provided by Kafka’s default fsync settings.

As you can see from our results shown in Figure 33 below,

  • Pulsar’s maximum throughput reached 3.7 million messages/s (3.5 GB/s) when processing catch-up reads only.
  • Kafka only reached a maximum throughput of 1 million messages/s (1 GB/s).
  • Pulsar processed catch-up reads 75% faster than Kafka.
Figure 33: Catch-up read throughput on Pulsar and Kafka (bypass journaling)

#2 Async local durability with Pulsar’s journal bypassing feature disabled

In this test, we used async local durability guarantees on Pulsar and Kafka, but disabled Pulsar’s journal bypassing feature.

As you can see from our results shown in Figure 34 below,

  • Pulsar’s maximum throughput when processing catch-up reads reached 1.8 million messages/s (1.7 GB/s).
  • Kafka only reached a maximum throughput of 1 million messages/s (1 GB/s).
  • Pulsar processed catch-up reads twice as fast as Kafka.
Figure 34: Catch-up read throughput on Pulsar and Kafka (without data sync)

#3 Sync local durability

In this test, we compared Kafka and Pulsar under equivalent sync local durability guarantees.

As you can see from our results shown in Figure 35 below,

  • Pulsar’s maximum throughput reached 1.8 million messages/s (1.7 GB/s) when processing catch-up reads only.
  • Kafka only reached a maximum throughput of 1 million messages/s (1 GB/s).
  • Pulsar processed catch-up reads twice as fast as Kafka.

Figure 35: Catch-up read throughput on Pulsar and Kafka (with data sync)

Mixed Workload Test

The following is the test setup.

This test was designed to evaluate how catch-up reads affect publish and tailing-reads in mixed workloads. Our test strategy included the following principles and expected guarantees:

  • Each message was replicated three times to ensure fault tolerance.
  • We enabled batching for Kafka and Pulsar, batching up to 1 MB of data for a maximum of 10 ms.
  • We benchmarked both systems with 100 partitions.
  • We compared Kafka and Pulsar under different durability settings.
  • We ran a total of four clients—two producers and two consumers.
  • We used a message size of 1 KB.

At the beginning of the test, both producers started sending data at a fixed rate of 200 K/s and one of the two consumers began processing tailing-reads immediately. When 512 GB of data had accumulated in a queue, the other (catch-up) consumer began reading the accumulated data from beginning to end, and then went on to process incoming data as it arrived. For the duration of the test, both producers continued to publish at the same rate and the tailing-read consumer continued to consume data at the same rate.
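
The two consumer roles in this workload can be sketched with Pulsar's subscription initial positions, as below; on Kafka the equivalent would be two consumer groups, one starting from the latest offset and one from the earliest. Topic and subscription names are placeholders.

```java
import org.apache.pulsar.client.api.Consumer;
import org.apache.pulsar.client.api.PulsarClient;
import org.apache.pulsar.client.api.SubscriptionInitialPosition;

public class MixedWorkloadConsumers {
    public static void main(String[] args) throws Exception {
        PulsarClient client = PulsarClient.builder().serviceUrl("pulsar://pulsar:6650").build();

        // Tailing-read consumer: starts at the latest position and keeps up with the producers.
        Consumer<byte[]> tailing = client.newConsumer()
                .topic("benchmark-topic")
                .subscriptionName("tailing-sub")
                .subscriptionInitialPosition(SubscriptionInitialPosition.Latest)
                .subscribe();

        // Catch-up consumer: created once the backlog has accumulated; starts at the
        // earliest position, drains the backlog, then continues with incoming data.
        Consumer<byte[]> catchUp = client.newConsumer()
                .topic("benchmark-topic")
                .subscriptionName("catch-up-sub")
                .subscriptionInitialPosition(SubscriptionInitialPosition.Earliest)
                .subscribe();

        // Both consumers run receive/acknowledge loops in separate threads while the
        // producers continue publishing at a fixed rate (loops omitted here).
    }
}
```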

The following is the result for each test.

#1 Async local durability with Pulsar’s journal bypassing feature enabled

In this test, we compared Kafka and Pulsar with equivalent async local durability guarantees. We enabled Pulsar’s new journal bypassing feature to match the local durability guarantee provided by Kafka’s default fsync settings.

As you can see in Figure 36 below, catch-up reads caused significant write delays in Kafka, but had little impact on Pulsar. Kafka’s 99th percentile publishing latency increased to 1-3 seconds while Pulsar’s remained steady at several milliseconds to tens of milliseconds.

Figure 36: Effect of catch-up reads on publish latency on Pulsar and Kafka (bypass journaling)

#2 Async local durability with Pulsar’s journal bypassing feature disabled

In this test, we used async local durability guarantees on Pulsar and Kafka, but disabled Pulsar’s journal bypassing feature.

As you can see in Figure 37 below, catch-up reads caused significant write delays on Kafka, but had little impact on Pulsar. Kafka’s 99th percentile publishing latency increased to 2-3 seconds while Pulsar’s remained steady at several milliseconds to tens of milliseconds.

Figure 37: Effect of catch-up reads on publish latency on Pulsar and Kafka (without data sync)

#3 Sync local durability

In this test, we compared Kafka and Pulsar with equivalent sync local durability guarantees.

As you can see in Figure 38 below, catch-up reads caused significant write delays on Kafka, but had little impact on Pulsar. Kafka’s 99th percentile publishing latency increased to ~1.2 to ~1.4 seconds while Pulsar’s remained steady at several milliseconds to tens of milliseconds.

Figure 38: Effect of catch-up reads on publish latency on Pulsar and Kafka (with data sync)

Conclusions

Below is a summary of our findings based on the results of our benchmark.

  • After the configuration and tuning errors were corrected, Pulsar matched the end-to-end latency Kafka had achieved in Confluent’s limited use case.
  • Under equivalent durability guarantees, Pulsar outperformed Kafka in workloads that simulated real-world use cases.
  • Pulsar delivered significantly better latency and better I/O isolation than Kafka in every test, regardless of the durability guarantee settings used or the numbers of subscriptions, partitions, or clients specified.

Penghui Li
Penghui Li is passionate about helping organizations to architect and implement messaging services. Prior to StreamNative, Penghui was a Software Engineer at Zhaopin.com, where he was the leading Pulsar advocate and helped the company adopt and implement the technology. He is an Apache Pulsar Committer and PMC member. Penghui lives in Beijing, China.
Sijie Guo
Sijie’s journey with Apache Pulsar began at Yahoo! where he was part of the team working to develop a global messaging platform for the company. He then went to Twitter, where he led the messaging infrastructure group and co-created DistributedLog and Twitter EventBus. In 2017, he co-founded Streamlio, which was acquired by Splunk, and in 2019 he founded StreamNative. He is one of the original creators of Apache Pulsar and Apache BookKeeper, and remains VP of Apache BookKeeper and PMC Member of Apache Pulsar. Sijie lives in the San Francisco Bay Area of California.
