Video · May 29, 2025 · 20 min

Building a Modern Streaming Data Pipeline with Apache Flink, Iceberg and Paimon


Session Overview

Learn how to build a modern streaming data pipeline with Apache Flink, Apache Iceberg, and Apache Paimon for real-time, scalable data processing.

TL;DR

Traditional batch ETL pipelines are being outpaced by the demand for real-time data processing. This session introduced a modern streaming data pipeline built on Apache Flink, Iceberg, and Paimon. By integrating these technologies, businesses can build real-time, scalable, and cost-efficient data processing systems. The session highlighted the main benefits of this approach: lower latency, better cost efficiency, and fresher data.

Opening

In today’s data-driven world, businesses can no longer afford to rely on yesterday’s data. The pressing need for real-time analytics is driving the evolution from traditional batch ETL pipelines to advanced streaming architectures. As Abdul Rehman Zafar pointed out, modern enterprises require up-to-date insights to make informed decisions promptly. This shift is fueled by technologies like Apache Flink, Iceberg, and Paimon, which together enable a seamless integration of event streams and transactional data for real-time processing.

What You'll Learn (Key Takeaways)

  • Leveraging Kafka and MySQL for Real-Time Data Ingestion – Learn how to use Kafka for high-throughput, low-latency data ingestion and MySQL for effective lookup operations in a streaming pipeline.
  • Apache Flink’s Role in Streaming Pipelines – Discover how Flink’s support for both streaming and batch processing, along with its fault tolerance and exactly-once guarantees, makes it a cornerstone for modern pipelines.
  • Iceberg vs. Paimon – Understand the key differences between Apache Iceberg and Apache Paimon, particularly how Paimon’s native support for streaming workloads fills the gaps left by Iceberg’s batch-oriented design.
  • Real-World Applications – Explore how companies are utilizing streaming data lake architectures for enhanced reporting, machine learning, and operational analytics, offering practical insights into implementation.
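To make the ingestion pattern above concrete, the Flink SQL below sketches a pipeline that reads events from Kafka, enriches them with a lookup join against MySQL, and writes the result to a Paimon table. This is an illustrative sketch, not code from the session: all table names, topics, columns, and connection options are assumptions.

```sql
-- Hypothetical Kafka source of order events (topic, fields, and options are assumptions)
CREATE TABLE orders (
  order_id BIGINT,
  customer_id BIGINT,
  amount DECIMAL(10, 2),
  order_time TIMESTAMP(3),
  proc_time AS PROCTIME()  -- processing-time attribute required for the lookup join
) WITH (
  'connector' = 'kafka',
  'topic' = 'orders',
  'properties.bootstrap.servers' = 'kafka:9092',
  'format' = 'json',
  'scan.startup.mode' = 'earliest-offset'
);

-- MySQL dimension table used for lookup enrichment
CREATE TABLE customers (
  customer_id BIGINT,
  customer_name STRING,
  region STRING,
  PRIMARY KEY (customer_id) NOT ENFORCED
) WITH (
  'connector' = 'jdbc',
  'url' = 'jdbc:mysql://mysql:3306/crm',
  'table-name' = 'customers'
);

-- Paimon sink table (assumes a Paimon catalog is the session's current catalog)
CREATE TABLE enriched_orders (
  order_id BIGINT,
  customer_name STRING,
  region STRING,
  amount DECIMAL(10, 2),
  order_time TIMESTAMP(3),
  PRIMARY KEY (order_id) NOT ENFORCED
);

-- Continuous enrichment: lookup join against MySQL at processing time
INSERT INTO enriched_orders
SELECT o.order_id, c.customer_name, c.region, o.amount, o.order_time
FROM orders AS o
LEFT JOIN customers FOR SYSTEM_TIME AS OF o.proc_time AS c
  ON o.customer_id = c.customer_id;
```

The `FOR SYSTEM_TIME AS OF` clause is Flink's lookup-join syntax: each Kafka event queries MySQL for the matching dimension row at processing time, which keeps the pipeline fully streaming while still using MySQL for reference data.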

Q&A Highlights

Q: How do you compare Paimon and Iceberg? A: Paimon is optimized for both batch and streaming workloads and offers native support for streaming data. Iceberg lacks this, since its design is batch-oriented and it approximates streaming with micro-batches.
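This difference is visible directly in Flink SQL: a Paimon table can be consumed as an unbounded changelog stream that keeps emitting changes as they are committed. The snippet below is a hedged sketch; the table name and scan option value are illustrative assumptions.

```sql
-- Run the query as an unbounded streaming job
SET 'execution.runtime-mode' = 'streaming';

-- Streaming read of a hypothetical Paimon table: continuously
-- consumes changes committed after the job starts.
SELECT * FROM orders
  /*+ OPTIONS('scan.mode' = 'latest') */;
```

With Iceberg, the analogous Flink read is an incremental scan over periodically committed snapshots, which is why its streaming behavior is effectively micro-batch.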

Q: How can Fluss be used with Paimon in the pipeline? A: Fluss operates on top of Paimon, adding optimizations such as auto-tuning and deduplication, and can replace Kafka entirely as both the source and the sink in the pipeline.

Q: What additional features does Fluss offer over Paimon? A: Fluss provides streaming compaction, auto-tuning, and deduplication, acting as an automated optimization layer over Paimon's storage capabilities.

Q: How does Paimon compare to commercial streaming databases like Timeplus? A: Unlike Timeplus, which is a proprietary streaming database, Paimon is open source and supports both batch and streaming workloads, making it a more flexible and cost-effective option.

About Speaker

Abdul Rehman Zafar

Abdul is a Senior Solutions Architect at Ververica with expertise in real-time streaming analytics. He is a strategic technical advisor at Ververica, helping customers solve complex data engineering c...