Building a Modern Streaming Data Pipeline with Apache Flink, Iceberg and Paimon
Abdul Rehman Zafar

TL;DR

Traditional batch ETL pipelines are being outpaced by the demand for real-time data processing. This session introduced a modern streaming data pipeline built on Apache Flink, Apache Iceberg, and Apache Paimon, showing how integrating these technologies gives businesses real-time, scalable, and cost-efficient data processing. The session highlighted the main benefits of this approach, including improved latency, cost efficiency, and data freshness.

Opening

In today’s data-driven world, businesses can no longer afford to rely on yesterday’s data. The pressing need for real-time analytics is driving the evolution from traditional batch ETL pipelines to advanced streaming architectures. As Abdul Rehman Zafar pointed out, modern enterprises require up-to-date insights to make informed decisions promptly. This shift is fueled by technologies like Apache Flink, Iceberg, and Paimon, which together enable a seamless integration of event streams and transactional data for real-time processing.

What You'll Learn (Key Takeaways)

  • Leveraging Kafka and MySQL for Real-Time Data Ingestion – Learn how to use Kafka for high-throughput, low-latency data ingestion and MySQL for effective lookup operations in a streaming pipeline.
  • Apache Flink’s Role in Streaming Pipelines – Discover how Flink’s support for both streaming and batch processing, along with its fault tolerance and exactly-once guarantees, makes it a cornerstone for modern pipelines.
  • Iceberg vs. Paimon – Understand the key differences between Apache Iceberg and Apache Paimon, particularly how Paimon’s native support for streaming workloads fills the gaps left by Iceberg’s batch-oriented design.
  • Real-World Applications – Explore how companies are utilizing streaming data lake architectures for enhanced reporting, machine learning, and operational analytics, offering practical insights into implementation.
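The ingestion pattern in the first two takeaways can be sketched in Flink SQL: a Kafka-backed source table for high-throughput events, and a JDBC-backed MySQL table used for lookup enrichment. All table names, fields, and connection settings below are illustrative assumptions, not details from the session:

```sql
-- Hypothetical Kafka source: an unbounded stream of order events.
CREATE TABLE orders (
  order_id   STRING,
  product_id STRING,
  amount     DECIMAL(10, 2),
  proc_time  AS PROCTIME()  -- processing-time attribute, required for lookup joins
) WITH (
  'connector' = 'kafka',
  'topic' = 'orders',
  'properties.bootstrap.servers' = 'kafka:9092',
  'scan.startup.mode' = 'latest-offset',
  'format' = 'json'
);

-- Hypothetical MySQL dimension table queried on demand for lookups.
CREATE TABLE products (
  product_id STRING,
  category   STRING,
  PRIMARY KEY (product_id) NOT ENFORCED
) WITH (
  'connector' = 'jdbc',
  'url' = 'jdbc:mysql://mysql:3306/shop',
  'table-name' = 'products'
);

-- Enrich each streaming event with the current MySQL row
-- via a processing-time lookup join.
SELECT o.order_id, o.amount, p.category
FROM orders AS o
JOIN products FOR SYSTEM_TIME AS OF o.proc_time AS p
  ON o.product_id = p.product_id;
```

The lookup join queries MySQL per key at processing time, which keeps the dimension data fresh without materializing the whole table into the stream.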

Q&A Highlights

Q: How do you compare Paimon and Iceberg?
A: Paimon is optimized for both batch and streaming workloads and natively supports streaming reads and writes. Iceberg, by contrast, was designed primarily for batch processing and approximates streaming with micro-batches.
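Paimon's native streaming support means the same table can be written continuously and consumed either as a snapshot (batch) or as an unbounded changelog (streaming). A minimal Flink SQL sketch, with the catalog name, warehouse path, and table schema assumed purely for illustration:

```sql
-- Register a Paimon catalog backed by a hypothetical warehouse path.
CREATE CATALOG paimon_cat WITH (
  'type' = 'paimon',
  'warehouse' = 'file:///tmp/paimon'
);
USE CATALOG paimon_cat;

-- A primary-keyed Paimon table: upserts merge by key.
CREATE TABLE enriched_orders (
  order_id STRING,
  category STRING,
  amount   DECIMAL(10, 2),
  PRIMARY KEY (order_id) NOT ENFORCED
);

-- A batch engine can scan this table as a point-in-time snapshot,
-- while a streaming Flink job can subscribe to ongoing changes:
SET 'execution.runtime-mode' = 'streaming';
SELECT * FROM enriched_orders /*+ OPTIONS('scan.mode' = 'latest') */;
```

In streaming mode the query stays running and emits new changes as they land, which is the capability the answer above contrasts with Iceberg's micro-batch approach.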

Q: How can Fluss be used with Paimon in the pipeline?
A: Fluss operates on top of Paimon, allowing for optimizations like auto-tuning and deduplication, making it suitable to replace Kafka entirely as both the source and sink in the pipeline.

Q: What additional features does Fluss offer over Paimon?
A: Fluss provides streaming compaction, auto-tuning, and deduplication, offering an automated optimization layer over Paimon’s storage capabilities.

Q: How does Paimon compare to commercial streaming databases like Timeplus?
A: Unlike Timeplus, which is a proprietary streaming database, Paimon is open source and supports both batch and streaming workloads, providing a more flexible and cost-effective solution.

Abdul Rehman Zafar
Senior Solutions Architect, Ververica

Abdul is a Senior Solutions Architect at Ververica with expertise in real-time streaming analytics. He is a strategic technical advisor at Ververica, helping customers solve complex data engineering challenges. Before joining Ververica, where he specialises in cloud computing and streaming analytics, he worked at Amazon Web Services as a Solutions Architect. At AWS, he helped startups and enterprises on their journey to the cloud and big data by building petabyte-scale data pipelines. He has over 15 years of diverse experience in roles ranging from startups to enterprises, solving data and distributed-systems challenges.
