Pulsar Newbie Guide for Kafka Engineers (Part 3): Ledgers & Bookies

TL;DR
This post dives into how Pulsar stores data using Apache BookKeeper – a departure from Kafka’s file-based storage per broker. Pulsar uses bookies (storage nodes) and ledgers (append-only logs spread across bookies) to durably persist messages. We’ll explain what ledgers are (think of them like distributed log segments) and how Pulsar brokers write to and read from bookies. The result is a decoupled architecture: brokers are stateless serving layers, and bookies handle data persistence and replication. Kafka engineers will learn how Pulsar achieves durability and high availability through this two-layer design.
Pulsar’s Storage Architecture vs Kafka’s
In Apache Kafka, each broker stores the data (messages) for the partitions it owns on its local disk. Replication is broker-to-broker; each partition has leader and follower brokers. In Apache Pulsar, the approach is different: brokers do NOT keep long-term data on their local disk. Instead, Pulsar leverages Apache BookKeeper as a separate storage layer. Brokers in Pulsar act more like stateless proxy/dispatchers, while bookies (BookKeeper servers) provide the durable storage.
Let’s break it down:
- A bookie is a process (and node) in a Pulsar cluster whose job is to store data. It’s part of the BookKeeper ensemble. Think of bookies as analogous to Kafka brokers in terms of holding data, but they don’t talk to clients – they only store and serve data to Pulsar brokers.
- A ledger is BookKeeper’s term for a log, similar to a Kafka log segment but distributed. More formally, “A ledger is an append-only data structure with a single writer that is assigned to multiple BookKeeper storage nodes, or bookies”. Each ledger is replicated to a configurable number of bookies (typically 2 or 3 copies).
- Pulsar topics are composed of one or more ledgers. Over time, as a ledger fills up or a time/size threshold is reached, the broker creates a new ledger for the topic. Earlier ledgers can be deleted once they have been fully consumed and are no longer needed for retention.
Analogy: If you’re a Kafka engineer, imagine each partition’s log broken into segments (Kafka already does that), but with each segment living on nodes other than the broker and replicated independently. That’s roughly how Pulsar uses ledgers. The broker writes a batch of messages to a ledger (which goes to multiple bookies) and, when done, it can move on to a new ledger. The metadata describing which ledgers form a topic is stored in ZooKeeper (or another metadata store). Consumers don’t need to know about ledgers; they just ask the broker for data, and the broker reads from the bookies as needed.
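You can actually see this ledger list yourself through the topic’s internal stats (the same information `pulsar-admin topics stats-internal <topic>` prints). Here is a minimal sketch using the Java admin client; the admin URL and topic name are placeholders for your own environment:

```java
import org.apache.pulsar.client.admin.PulsarAdmin;
import org.apache.pulsar.common.policies.data.PersistentTopicInternalStats;

public class ListLedgers {
    public static void main(String[] args) throws Exception {
        // Admin endpoint of one of your brokers (placeholder URL)
        try (PulsarAdmin admin = PulsarAdmin.builder()
                .serviceUrl("http://localhost:8080")
                .build()) {
            PersistentTopicInternalStats stats =
                    admin.topics().getInternalStats("persistent://public/default/my-topic");

            // Each entry below is one BookKeeper ledger backing this topic, in sequence
            stats.ledgers.forEach(l ->
                    System.out.printf("ledger %d: %d entries, %d bytes%n",
                            l.ledgerId, l.entries, l.size));
        }
    }
}
```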
Why go to this trouble? Decoupling storage from brokers has several benefits:
- Independent Scaling: Need more storage? Add more bookies; need more throughput/connection handling? Add more brokers. In Kafka, adding a broker adds both storage and serving capacity, but you have to rebalance data to use it. Pulsar can instantly use a new bookie for new ledgers without moving any existing data, and new brokers can start serving immediately by coordinating with existing bookies.
- No Single Broker Hotspot for a Topic: In Kafka, one partition’s writes are handled by one broker at a time (the leader). In Pulsar, a topic’s ledgers could be on many bookies, and writes go to multiple bookies in parallel (though orchestrated by the broker). Read throughput can also be spread if needed.
- Seamless Broker Restarts: If a Pulsar broker goes down, another broker can take over serving a topic by reading its ledgers from the bookies – the data never lived exclusively on the failed broker’s disk. In Kafka, if a broker goes down, the partitions it led need a leader election and clients pause while a follower takes over; with unclean leader election, recently written data can even be lost.
Ledgers: How They Work
A ledger is written by a Pulsar broker and replicated by BookKeeper to a set of bookies. By default, Pulsar writes each ledger to 2 or 3 bookies (configurable via ensemble size, write quorum, and ack quorum). For example, a common configuration is an ensemble of 3, write quorum 2, ack quorum 2 – meaning each entry (message) is written to 2 of the 3 bookies and considered committed once both of those writes are acknowledged, which tolerates the failure of one bookie.
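These three knobs can be set per namespace. Below is a sketch using the Java admin client (the namespace and admin URL are placeholders; the same thing can be done from the CLI with `pulsar-admin namespaces set-persistence`):

```java
import org.apache.pulsar.client.admin.PulsarAdmin;
import org.apache.pulsar.common.policies.data.PersistencePolicies;

public class SetPersistence {
    public static void main(String[] args) throws Exception {
        try (PulsarAdmin admin = PulsarAdmin.builder()
                .serviceUrl("http://localhost:8080")   // placeholder admin URL
                .build()) {
            // ensemble = 3 bookies per ledger, write quorum = 2 copies per entry,
            // ack quorum = 2 acks before a write is considered durable,
            // 0.0 = no throttling of mark-delete (cursor) updates
            admin.namespaces().setPersistence("public/default",
                    new PersistencePolicies(3, 2, 2, 0.0));
        }
    }
}
```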
Key properties of ledgers (a minimal BookKeeper client sketch follows the list):
- Only one writer (the broker) appends to a ledger. There’s no contention on writes – similar to how a Kafka partition is written by one leader.
- Once a ledger is closed (either the broker closes it or the broker crashes, triggering a recovery close), it becomes read-only. No more writes happen to that ledger. This is like Kafka log segments being immutable once closed.
- Ledgers have a unique ID, and bookies store them internally under that ID. The mapping of which ledgers belong to a topic is kept in the metadata store (ZooKeeper by default) as an ordered list.
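To make these properties concrete, here is a small sketch against the low-level BookKeeper client itself: one writer creates and appends to a ledger, closes it, and then anyone can open it read-only. You never do this yourself with Pulsar – the broker’s managed ledger does it for you – and the ZooKeeper address and password here are placeholders:

```java
import java.nio.charset.StandardCharsets;
import java.util.Enumeration;
import org.apache.bookkeeper.client.BookKeeper;
import org.apache.bookkeeper.client.LedgerEntry;
import org.apache.bookkeeper.client.LedgerHandle;

public class LedgerLifecycle {
    public static void main(String[] args) throws Exception {
        // Connect to the same metadata service the bookies use (placeholder address)
        BookKeeper bk = new BookKeeper("zk-host:2181");
        byte[] password = "secret".getBytes(StandardCharsets.UTF_8);

        // Single writer: ensemble 3, write quorum 2, ack quorum 2
        LedgerHandle writer = bk.createLedger(3, 2, 2, BookKeeper.DigestType.CRC32, password);
        long ledgerId = writer.getId();
        writer.addEntry("hello".getBytes(StandardCharsets.UTF_8)); // entry 0
        writer.addEntry("world".getBytes(StandardCharsets.UTF_8)); // entry 1
        writer.close(); // the ledger is now sealed and read-only

        // Anyone with the ledger ID (and password) can read it back
        LedgerHandle reader = bk.openLedger(ledgerId, BookKeeper.DigestType.CRC32, password);
        Enumeration<LedgerEntry> entries = reader.readEntries(0, reader.getLastAddConfirmed());
        while (entries.hasMoreElements()) {
            LedgerEntry e = entries.nextElement();
            System.out.println(e.getEntryId() + " -> "
                    + new String(e.getEntry(), StandardCharsets.UTF_8));
        }
        reader.close();
        bk.close();
    }
}
```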
When a consumer asks for data, the Pulsar broker handling that topic reads from the ledgers. If the data is recent, it may still be in the broker’s cache (Pulsar brokers cache recent entries in memory for speed). If not, the broker fetches the entries from the bookies. This process is transparent to the client.
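From the client’s point of view none of this is visible – a consumer only ever talks to a broker. A minimal Java consumer for illustration (the service URL, topic, and subscription name are placeholders):

```java
import org.apache.pulsar.client.api.Consumer;
import org.apache.pulsar.client.api.Message;
import org.apache.pulsar.client.api.PulsarClient;

public class SimpleConsumer {
    public static void main(String[] args) throws Exception {
        // The client only knows about brokers; bookies are never contacted directly
        try (PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650")  // placeholder broker URL
                .build()) {
            Consumer<byte[]> consumer = client.newConsumer()
                    .topic("persistent://public/default/my-topic")
                    .subscriptionName("my-sub")
                    .subscribe();

            Message<byte[]> msg = consumer.receive();
            // The broker served this entry from its cache or fetched it from bookies
            System.out.println(new String(msg.getData()));
            consumer.acknowledge(msg);
            consumer.close();
        }
    }
}
```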
Importantly, cursor (subscription) positions are also stored in BookKeeper, in dedicated cursor ledgers managed alongside the topic’s data ledgers. This means acknowledged positions are durable: if a broker or client crashes, the subscription doesn’t lose track of where it was.
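The same internal stats used above also expose these durable cursor positions, each one a ledgerId:entryId pair. A small sketch (topic and admin URL are placeholders):

```java
import org.apache.pulsar.client.admin.PulsarAdmin;
import org.apache.pulsar.common.policies.data.PersistentTopicInternalStats;

public class ShowCursors {
    public static void main(String[] args) throws Exception {
        try (PulsarAdmin admin = PulsarAdmin.builder()
                .serviceUrl("http://localhost:8080")  // placeholder admin URL
                .build()) {
            PersistentTopicInternalStats stats =
                    admin.topics().getInternalStats("persistent://public/default/my-topic");

            // One cursor per subscription; markDeletePosition is the last fully acked position
            stats.cursors.forEach((subscription, cursor) ->
                    System.out.println(subscription + " acked up to " + cursor.markDeletePosition
                            + ", reading at " + cursor.readPosition));
        }
    }
}
```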
Caption: A Pulsar cluster with brokers (serving layer) and bookies (storage layer). Brokers publish entries to BookKeeper ledgers on multiple bookies, ensuring durability and replication. Consumers can connect to any broker to fetch data; the broker will read from bookies as needed. This architecture decouples message storage from the brokers’ lifecycle and load, enabling Pulsar’s horizontal scalability.
A note on scale: BookKeeper is designed to handle many ledgers concurrently – thousands or more – spread across bookies. By giving each bookie multiple disks (one for the write-ahead journal, others for ledger storage), it can sustain heavy concurrent writes and reads without the two interfering too much. This is why Pulsar can achieve very high throughput across many topics: writes aren’t bottlenecked on a single disk or a single node.
Lifecycle of a Topic’s Data
Consider a Pulsar topic (non-partitioned for simplicity). When messages start coming in:
- First Ledger: The broker creates a new ledger for the topic (say ledger ID 101). BookKeeper assigns it an ensemble of bookies (e.g., bookies A, B, C with a write quorum of 2). The broker writes each message to two of those bookies, and once both acknowledge it, the message is considered persisted. This continues for every message until a rollover threshold is reached.
- Ledger Rollover: Pulsar rolls over to a new ledger when certain conditions are met – e.g., a maximum number of entries, a time limit, or a broker restart. Say after 1 million messages or a few minutes, ledger 101 is closed and ledger 102 is created, possibly on a different set of bookies (e.g., A, C, D).
- Subsequent Ledgers: This continues, so a topic accumulates a sequence of ledgers. If a broker crashes in the middle of writing ledger 102, BookKeeper recovers and seals that ledger, and the broker that takes over opens ledger 103 to continue from the last confirmed position.
- Deletion of Ledgers: If topic data expires due to retention or all messages in a ledger have been acknowledged by all subscriptions, Pulsar can delete that ledger to free storage. Essentially, older ledgers get trimmed off the “log” once they are not needed. This is similar in effect to Kafka’s log retention deletion, but happens at a ledger granularity.
Ledgers thus allow Pulsar to have an infinite log without one huge file: old ledgers can be removed and new ledgers added. Also, if using tiered storage (another feature), older ledgers can even be offloaded to cold storage (like S3) seamlessly, since they are self-contained units of data.
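Both of these behaviors are policy-driven per namespace. Here is a sketch of setting a retention window and a tiered-storage offload threshold with the Java admin client; the namespace and sizes are just illustrative, and offload additionally requires an offload driver (such as S3) to be configured on the brokers:

```java
import org.apache.pulsar.client.admin.PulsarAdmin;
import org.apache.pulsar.common.policies.data.RetentionPolicies;

public class RetentionAndOffload {
    public static void main(String[] args) throws Exception {
        try (PulsarAdmin admin = PulsarAdmin.builder()
                .serviceUrl("http://localhost:8080")  // placeholder admin URL
                .build()) {
            // Keep acknowledged data for up to 7 days, capped at 50 GB per topic;
            // ledgers beyond that become eligible for deletion.
            admin.namespaces().setRetention("public/default",
                    new RetentionPolicies(7 * 24 * 60 /* minutes */, 50 * 1024 /* MB */));

            // Once a topic's data on bookies exceeds ~10 GB, offload the oldest
            // closed ledgers to the configured tiered storage (e.g., S3).
            admin.namespaces().setOffloadThreshold("public/default", 10L * 1024 * 1024 * 1024);
        }
    }
}
```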
BookKeeper vs Kafka Storage: Summing up Differences
- Disaggregated vs Integrated: Pulsar+BookKeeper is a two-layer system (compute/serve vs storage), whereas Kafka is one-layer (brokers do both). This means Pulsar can scale storage independently and recover faster from broker failure since data lives separately.
- Fine-grained Replication: Every message in Pulsar is written to multiple bookies up front (acknowledged once the ack quorum is met). In Kafka, the leader writes to its own log and followers then replicate from it, so there can be replication lag. Pulsar’s BookKeeper replication is effectively a set of parallel writes – which shrinks the window of data loss if a node dies (as long as the ack quorum was met, the data is safe).
- Throughput and Ordering: In Kafka, a single partition is served by one broker, and order is per partition. In Pulsar, order is likewise maintained per topic (or partition), but the data can be served from multiple bookies through one or more brokers. Pulsar preserves ordering by having the broker orchestrate reads and writes of a topic’s ledgers in sequence: the broker knows the ledger sequence and the entry IDs within each ledger, so you won’t see out-of-order messages (see the message-ID sketch after this list).
- Managed Ledger Abstraction: Pulsar introduces the concept of a “managed ledger” – this is an abstraction that represents the log for a topic, composed of one or more ledgers internally. The managed ledger handles creating new ledgers, closing them, and keeping track of cursors (subscription positions) in those ledgers. Kafka doesn’t expose anything like this because it doesn’t need to – the log is just the file on disk. But Pulsar’s managed ledger is a powerful component that handles a lot of complexity for you (e.g., it knows when a ledger can be deleted because all subscriptions have passed it).
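One place the ledger structure usefully leaks through is the message ID: a Pulsar MessageId is essentially (ledger ID, entry ID, partition), so the ledger sequence plus the entry index within each ledger fully determines ordering. A sketch using the ID returned by a producer; the broker URL and topic are placeholders, and MessageIdImpl is an internal class used here only to show the structure:

```java
import org.apache.pulsar.client.api.MessageId;
import org.apache.pulsar.client.api.Producer;
import org.apache.pulsar.client.api.PulsarClient;
import org.apache.pulsar.client.impl.MessageIdImpl;

public class InspectMessageId {
    public static void main(String[] args) throws Exception {
        try (PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650")  // placeholder broker URL
                .build()) {
            Producer<byte[]> producer = client.newProducer()
                    .topic("persistent://public/default/my-topic")
                    .create();

            MessageId id = producer.send("hello".getBytes());

            // For a non-partitioned topic this is a MessageIdImpl; the ID encodes
            // exactly where the entry landed in BookKeeper.
            MessageIdImpl impl = (MessageIdImpl) id;
            System.out.println("ledger=" + impl.getLedgerId() + ", entry=" + impl.getEntryId());

            producer.close();
        }
    }
}
```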
Operational Notes for Kafka Folks
- Bookie failures: If a bookie dies, any ledgers that had entries on that bookie still have at least one other copy (if replication was set to >=2). BookKeeper will auto-replicate the missing fragments to another bookie to re-establish the desired redundancy. This is somewhat analogous to Kafka’s leader election and replication catch-up, but at a fragment level. As an operator, replacing a bookie doesn’t require moving whole topics manually; BookKeeper handles re-replication of just the missing data fragments.
- Adding bookies: New bookies will start receiving ledger fragments for new ledgers immediately (Pulsar will typically not move existing ledger data onto them, which is fine as new data balances out). This is much less work than Kafka reassigning lots of partitions to a new broker.
- Monitoring: Monitor bookie disk usage and bookie journals. If a bookie becomes slow or falls behind, it can bottleneck writes. Pulsar brokers expose metrics and stats for managed ledgers and BookKeeper client performance. Tools like pulsar-admin brokers healthcheck and BookKeeper’s own metrics (e.g., ledger count, journal latency) are valuable.
By now, hopefully the once-mysterious terms “bookies” and “ledgers” have concrete meaning. Pulsar’s durability and streaming prowess come from BookKeeper under the hood: it gives Pulsar per-message replication and storage in a highly scalable way.
In the next part, we will shift gears to the consumption side of Pulsar – Subscriptions & Consumers. This is where Pulsar’s flexibility (multiple subscription modes) shows how it can act like Kafka or a traditional messaging system depending on what you need.
Key Takeaways
- Pulsar offloads data storage to Apache BookKeeper. Brokers are stateless serving nodes, while bookies store message data. This decoupling is a fundamental difference from Kafka’s monolithic broker storage.
- Ledgers are BookKeeper’s unit of storage – think of them as distributed log segments. A Pulsar topic’s data consists of a sequence of ledgers, each replicated to multiple bookies for durability. This provides built-in multi-copy storage and automatic failover at the storage layer.
- Pulsar achieves durability by writing messages to multiple bookies (e.g., 2-3 copies) before acknowledging the producer. This is analogous to Kafka’s acks=all with min.insync.replicas, but in Pulsar every write goes to a quorum of bookies in parallel.
- Brokers do not hold persistent data, so recovery from failures is fast – any broker can serve a topic by reading its ledgers from the bookies. In Kafka, a broker failure means its partitions need a new leader and possibly data catch-up; in Pulsar, the data never lived on the failed broker, so another broker simply picks up where it left off.
- Operating Pulsar involves managing both brokers and bookies. The complexity of BookKeeper is mostly hidden from the end user, but understanding it helps in tuning (e.g., ledger sizes, number of bookie nodes). For a Kafka engineer, the concept of separating serving and storage may be new, but it’s the backbone of Pulsar’s scalability and reliability.