Video | May 29, 2025 | 30 min

Unlock Next-Gen Stateful Streaming in Apache Spark with transformWithState and Streaming Platforms

Session Overview

Master real-time data processing with Apache Spark's transformWithState. Build scalable, low-latency streaming apps for fraud detection & analytics.

TL;DR

The session addresses the challenge of building flexible, scalable stateful streaming into real-time data processing. The main solution presented is transformWithState, a new Apache Spark operator that simplifies stateful processing by removing earlier limitations and that integrates with streaming platforms such as Apache Pulsar and Kafka. Its key benefit is the ability to build sophisticated, low-latency streaming applications for practical use cases such as real-time fraud detection and session-based analytics.

Opening

Imagine a bustling ghost kitchen, where multiple restaurants share a space to fulfill food orders. These kitchens generate a wealth of real-time data, from order intake to food preparation and delivery logistics. The challenge lies in processing this data efficiently to ensure timely deliveries and enhance customer satisfaction. Jay Palaniappan, a Senior Solutions Architect at Databricks, used this scenario to introduce Apache Spark’s transformWithState, a new operator designed to tackle the complexities of stateful streaming and revolutionize how real-time data is managed.

What You'll Learn (Key Takeaways)

  • Simplified Stateful Streaming – transformWithState allows developers to manage complex stateful operations in real-time streaming applications without the previous limitations, such as single state variables or lack of schema evolution.
  • Practical Implementation Insights – Learn how to define custom stateful classes and handle input rows to effectively manage and trigger transformations based on specific business logic.
  • Real-World Applications – Discover how transformWithState is applied in scenarios like real-time fraud detection, session-based analytics, and gaming, showcasing its versatility and scalability in production environments.
  • Enhanced Performance and Reliability – With built-in features like RocksDB for state management and efficient changelog storage, transformWithState offers robust performance improvements over previous methods.
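
The pattern these takeaways describe (a custom stateful class that holds several state variables per key and reacts to incoming rows) can be sketched in plain Python. This is a simulation of the idea only, not Spark's API and not code from the session: the names FraudProcessor, CardState, handle_input_rows, and run_micro_batches are hypothetical, and the fraud rule (the sum of the last few amounts exceeding a threshold) is an assumption chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Iterator, List, Tuple


@dataclass
class CardState:
    """Per-key state made of several variables (all names are hypothetical)."""
    txn_count: int = 0
    total_amount: float = 0.0
    recent_amounts: List[float] = field(default_factory=list)


class FraudProcessor:
    """Hypothetical stateful class; it mimics the pattern, not Spark's API."""

    def __init__(self, amount_threshold: float, window: int = 3) -> None:
        self.amount_threshold = amount_threshold
        self.window = window
        # In Spark the state store is managed by the engine; here it is a dict.
        self.states: Dict[str, CardState] = {}

    def handle_input_rows(
        self, key: str, rows: Iterable[float]
    ) -> Iterator[Tuple[str, int, float]]:
        """Update the state for one key and yield an alert when the rule fires."""
        state = self.states.setdefault(key, CardState())
        for amount in rows:
            state.txn_count += 1
            state.total_amount += amount
            # Keep only the most recent `window` amounts.
            state.recent_amounts = (state.recent_amounts + [amount])[-self.window:]
            recent_sum = sum(state.recent_amounts)
            if recent_sum > self.amount_threshold:
                yield (key, state.txn_count, recent_sum)


def run_micro_batches(
    processor: FraudProcessor,
    batches: Iterable[List[Tuple[str, float]]],
) -> List[Tuple[str, int, float]]:
    """Group each micro-batch by key and feed the rows to the processor."""
    alerts: List[Tuple[str, int, float]] = []
    for batch in batches:
        grouped: Dict[str, List[float]] = {}
        for key, amount in batch:
            grouped.setdefault(key, []).append(amount)
        for key, rows in grouped.items():
            alerts.extend(processor.handle_input_rows(key, rows))
    return alerts


if __name__ == "__main__":
    processor = FraudProcessor(amount_threshold=1000.0, window=3)
    batches = [
        [("card-1", 400.0), ("card-2", 50.0)],
        [("card-1", 500.0)],
        [("card-1", 300.0), ("card-2", 20.0)],
    ]
    for alert in run_micro_batches(processor, batches):
        print(alert)
```

State persists across micro-batches, so the alert for card-1 fires only in the third batch, once its three most recent amounts together exceed the threshold. In a real Spark job the engine would own the state store and checkpointing; the dictionary here only stands in for that.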

Q&A Highlights

Q: How does the performance of transformWithState in Spark Structured Streaming compare to Flink?
A: Internal benchmarks show positive results favoring transformWithState, although specific data is not yet publicly released.

Q: Does transformWithState work with both streaming and batch data?
A: transformWithState is designed exclusively for streaming. However, batch data can be processed as a stream if read appropriately, such as through a Delta table.

This session provided actionable insights for data streaming practitioners, demonstrating how Apache Spark's transformWithState can enhance stateful streaming capabilities and drive real-time data processing innovation.

About Speaker

Jay Palaniappan

I bring 25+ years of IT expertise, with more than a decade focused on designing and managing Data and AI solutions on the Cloud. As a Solutions Architect at Databricks, I support Digital Native busine...