Introducing Managed Flink In StreamNative Cloud
As enterprises face increasing demands for real-time insights, they are under growing pressure to adopt stream processing solutions such as Apache Flink® to power real-time data and AI use cases.
In particular, enterprises leveraging a Bring Your Own Cloud (BYOC) model for stream processing encounter several challenges. These include network latency and performance bottlenecks when streaming and processing data across clouds or regions, as well as complex network configurations involving VPC peering, private endpoints, and secure connections to ensure seamless data flow. Additionally, data transfer and egress costs can rise significantly, especially when large volumes of streaming data are transmitted between clouds and processing environments.
Overcoming these challenges requires optimized connectivity, secure communication, and efficient cost management to maintain smooth, secure, and cost-effective data streaming and processing. One effective approach is to bring processing closer to data by deploying stream processing solutions within the same network, minimizing cross-cloud data movement and reducing latency and networking costs.
In this blog, you'll discover how two of the most powerful engines in the streaming ecosystem—StreamNative's URSA for Data Streaming and Ververica's VERA for Stream Processing—are coming together to deliver an integrated, out-of-the-box solution for building stream processing applications. As part of this offering, StreamNative is bringing processing closer to data by deploying Ververica's VERA engine alongside URSA within the same Virtual Private Cloud (VPC) in StreamNative Cloud, delivering a fully managed Flink service. This strategic integration minimizes data movement across networks, reduces latency, and optimizes costs, providing a seamless experience for real-time data ingestion and processing. This combination is set to transform how enterprises manage real-time data, empowering them with efficient and scalable streaming capabilities.
StreamNative's URSA engine is a truly cloud-native data streaming platform, designed to be much easier to run while reducing TCO by 60% and increasing throughput by 2.5 times. URSA empowers organizations to effortlessly manage high-throughput, low-latency data streams across a wide range of use cases on Apache Kafka and Apache Pulsar, delivering exceptional performance and cost efficiency.
Ververica's VERA engine offers up to 2x the performance of Apache Flink through advanced features like SQL optimization, GeminiStateBackend, and Autopilot tuning. It also includes key capabilities such as Advanced CDC, Dynamic CEP, built-in Flink ML, and Apache Paimon connectors for enhanced stream processing.
When VERA is paired with URSA, users get a native, integrated solution that takes care of both ends of the real-time data pipeline—ingesting and processing data in real time with minimal effort and maximum efficiency.
Introducing StreamNative’s Fully Managed Service for Apache Flink
An outcome of this collaboration is StreamNative's Fully Managed Service for Apache Flink, powered by Ververica's VERA engine. This managed service is tailored for enterprises transitioning from batch to real-time stream processing. With this service, businesses can focus on building stream processing applications without worrying about infrastructure management or operational overhead. Enterprises can now build, deploy, and scale stream processing applications quickly, benefiting from the combined expertise of both StreamNative and Ververica in the streaming data space.
Flink Capabilities in StreamNative Cloud
StreamNative Cloud empowers users to create and deploy stream processing applications as Flink Deployments within a StreamNative Workspace. While Java-based deployments are currently supported, future updates will enable users to leverage Python and Flink SQL to interact with backend systems seamlessly.
A StreamNative Workspace serves as a unified environment for organizing and managing Flink, Kafka, and Pulsar compute resources, facilitating the efficient execution of stream processing applications. Users can configure Flink deployments to access and process data from topics across one or multiple Kafka/Pulsar clusters within the workspace.
Users have the ability to perform full lifecycle management (create, update, and delete) of their Flink deployments, ensuring operational flexibility and control.
Stateful Flink Deployments
StreamNative Cloud also supports stateful Flink deployments, utilizing cloud-based storage for state management. Currently, Google Cloud Storage (GCS) is supported, with plans to expand support to AWS S3 and Azure Blob Storage in the future. Flink automatically manages the state configuration, eliminating the need for users to handle backend storage settings. This fully managed approach simplifies the deployment of stateful stream processing applications, ensuring reliable state management with minimal user involvement.
Value Proposition - StreamNative Cloud Managed Flink
The fully managed Apache Flink service within StreamNative Cloud delivers exceptional value, combining VERA-powered performance with the cost efficiencies of deploying Flink in your own VPC.
- VERA-Powered Performance: Enhanced processing speed and reliability, powered by Ververica's VERA engine.
- BYOC Flexibility: Deploy Flink in your own VPC for increased security, optimized performance, and cost savings.
- Unified Streaming Integration: Seamlessly integrate Flink with Pulsar and Kafka for a comprehensive streaming solution.
- Connector Flexibility: Utilize Kafka Connect or Pulsar IO connectors to expand integration capabilities.
- Lakehouse Compatibility: Support for integrating Flink workloads with Lakehouse architecture, driving advanced analytics.
Among the various differentiators outlined, I would like to highlight the BYOC flexibility related to networking configuration. When enterprises integrate a third-party Flink service with StreamNative Cloud, they often encounter complex networking setups and incur additional egress costs as data transfers between Flink and StreamNative clusters. However, with StreamNative’s native Flink service, users benefit from out-of-the-box Flink functionality within the same VPC as the StreamNative Cloud cluster. This eliminates the need for complex networking configurations and ensures that no egress costs are incurred when data flows between StreamNative clusters and Flink deployments, resulting in a seamless, cost-efficient deployment experience.
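To make the egress point concrete, here is a rough back-of-the-envelope sketch. The workload size and the per-GB rate are purely hypothetical assumptions chosen for illustration, not published prices from any cloud provider or from StreamNative. The same-VPC case is modeled as a zero rate, following the description above.

```python
def monthly_egress_cost(events_per_second, avg_event_bytes, usd_per_gb):
    """Estimate monthly transfer cost in USD.

    Assumes a 30-day month and 1 GB = 10**9 bytes.
    """
    seconds_per_month = 30 * 24 * 60 * 60
    gb_per_month = events_per_second * avg_event_bytes * seconds_per_month / 1e9
    return gb_per_month * usd_per_gb


# Hypothetical workload: 50,000 events/s at 1,000 bytes each.
# The 0.02 USD/GB rate is an assumed figure, not a real price quote.
cross_network = monthly_egress_cost(50_000, 1_000, usd_per_gb=0.02)

# Flink in the same VPC as the cluster: modeled here as no egress charge.
same_vpc = monthly_egress_cost(50_000, 1_000, usd_per_gb=0.0)

print(f"cross-network: ~${cross_network:,.0f}/month, same VPC: ${same_vpc:,.0f}/month")
```

Under these assumed numbers the cross-network path moves roughly 129,600 GB per month, so even a small per-GB rate adds up quickly at streaming volumes.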
Use Case Deep Dive - E-Commerce Real-Time Analytics Using StreamNative Cloud
To gain a deeper understanding of the stream processing capabilities in StreamNative Cloud through the managed Apache Flink service, let's explore a specific use case. This will demonstrate how developers can leverage the combined power of URSA and VERA to deploy a data platform that is both faster and more cost-efficient than a self-managed solution built on Kafka and Flink.
In this scenario, we’ll explore how an e-commerce platform leverages StreamNative’s URSA (for Kafka and Pulsar-based data streaming) and StreamNative’s fully managed Flink service (powered by Ververica’s VERA) to build a comprehensive real-time analytics solution. We’ll highlight where each solution fits into the architecture and the unique value propositions they offer.
Let's go left to right, as shown in the figure above.
1. Data Ingestion from E-Commerce Platform
Source
The e-commerce platform generates a variety of real-time events:
- User actions: Product views, clicks, and searches.
- Transactions: Purchases, cart additions, and payment events.
- Inventory updates: Stock changes due to orders, restocking, or promotions.
These events must be captured and processed in real time to power analytics, product recommendations, and operational alerts.
2. Ingesting Data into URSA (Kafka/Pulsar Cluster) via Kafka Connect or Pulsar IO
StreamNative’s URSA:
StreamNative’s URSA powers the unified Kafka/Pulsar cluster, serving as the backbone for real-time messaging and data streaming. URSA supports both Kafka and Pulsar protocols, providing flexibility and high-throughput messaging.
Data Ingestion with Connectors:
- Kafka Connect JDBC Connector: This connector pulls transactional data (e.g., purchases) from the e-commerce platform’s relational database and streams it into URSA’s Kafka-compatible topics (e.g., the transactions topic).
- Pulsar IO Debezium Connector: This Pulsar IO connector captures change data (CDC) from the inventory database, streaming real-time stock updates into Pulsar topics (e.g., the inventory-updates topic).
Data Flow into URSA:
- Kafka-compatible transactions topic: Ingests purchase and payment data.
- Pulsar-based inventory-updates topic: Streams real-time changes in stock levels.
- User-actions topic: Ingests user behavior events, such as product clicks and views.
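As a minimal sketch of the three streams above, the events could be modeled as simple records and mapped to their topics. The topic names come from this post; every field name and the `topic_for` helper are hypothetical, invented here for illustration rather than taken from any StreamNative schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UserAction:
    user_id: str
    product_id: str
    action: str  # e.g. "view", "click", or "search"
    timestamp_ms: int


@dataclass(frozen=True)
class Transaction:
    order_id: str
    product_id: str
    amount: float
    timestamp_ms: int


@dataclass(frozen=True)
class InventoryUpdate:
    product_id: str
    stock_level: int
    timestamp_ms: int


# Topic names as described in the data flow; the mapping itself is illustrative.
TOPIC_BY_EVENT_TYPE = {
    UserAction: "user-actions",
    Transaction: "transactions",
    InventoryUpdate: "inventory-updates",
}


def topic_for(event):
    """Return the topic an event would be published to."""
    try:
        return TOPIC_BY_EVENT_TYPE[type(event)]
    except KeyError:
        raise ValueError(f"no topic configured for {type(event).__name__}") from None
```

In the architecture described here the connectors do this routing for you; the sketch only shows which kind of event lands in which topic.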
Value of URSA:
- Unified Streaming Platform: URSA supports both Kafka and Pulsar protocols, simplifying the data ingestion layer and ensuring seamless integration with external data sources.
- High-Throughput Messaging: URSA enables scalable, low-latency messaging, capable of handling millions of events per second with cost savings.
- Flexible Data Ingestion: URSA allows connectors like Kafka Connect and Pulsar IO to ingest data from multiple sources, reducing the complexity of custom pipelines.
3. Processing in Fully Managed Flink Service in StreamNative Cloud (Powered by Ververica’s VERA)
After data is ingested into URSA’s Kafka/Pulsar cluster, the next step is real-time processing, handled by the fully managed Flink service in StreamNative Cloud (powered by Ververica’s VERA engine). This service delivers advanced stream processing capabilities, ideal for large-scale real-time data analytics.
Fully Managed Flink in StreamNative Cloud (Powered by VERA):
The fully managed Flink service within StreamNative Cloud integrates directly with URSA to provide stateful stream processing and complex event processing, all powered by VERA.
Flink Job Setup
- Flink Sources: The Flink service subscribes to the Kafka/Pulsar topics in URSA:
- User-actions stream: Tracks user behavior in real time.
- Transactions stream: Monitors sales and payment events.
- Inventory-updates stream: Manages real-time stock tracking.
Flink Processing (Powered by VERA):
- User Behavior Aggregation: Flink (powered by VERA) processes user behavior data in windowed operations, generating insights like "Top 10 most viewed products" or identifying users with high purchase intent in real time.
- Real-Time Sales Monitoring: VERA processes the transactions stream to deliver real-time revenue insights, track top-selling products, and calculate sales velocity.
- Inventory Analytics: For inventory management, VERA processes the inventory-updates stream to monitor stock levels and trigger alerts when stock falls below critical thresholds.
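The "Top 10 most viewed products" aggregation can be pictured as a count per product within fixed (tumbling) time windows. The following is a plain-Python sketch of that logic only. It is not Flink or VERA API code, and the function name, event shape, and 60-second window are assumptions made for illustration.

```python
from collections import Counter, defaultdict


def top_viewed_per_window(events, window_seconds=60, n=10):
    """Count product views per tumbling window and keep the top n.

    events: iterable of (timestamp_seconds, product_id) pairs.
    Returns {window_start: [(product_id, views), ...]} ordered by window start.
    """
    counts = defaultdict(Counter)
    for timestamp, product_id in events:
        # Align each event to the start of the window that contains it.
        window_start = timestamp - (timestamp % window_seconds)
        counts[window_start][product_id] += 1
    return {start: c.most_common(n) for start, c in sorted(counts.items())}


views = [(0, "shoes"), (10, "hat"), (20, "shoes"), (65, "scarf")]
print(top_viewed_per_window(views, window_seconds=60, n=10))
```

A real streaming job would emit each window's result as the window closes instead of collecting everything first, but the grouping and ranking steps are the same idea.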
Key Flink Features in VERA:
- Stateful Processing: VERA handles stateful computations, such as maintaining running totals for user sessions or sales, which are critical for real-time analytics in e-commerce.
- Event Time Processing: VERA enables accurate processing of out-of-order events using event-time semantics and watermarks, ensuring real-time insights are timely and accurate.
- Exactly-Once Semantics: VERA guarantees exactly-once processing, ensuring data integrity, especially for financial transactions and inventory updates.
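To illustrate how event-time semantics and watermarks tolerate out-of-order events, here is a simplified plain-Python model. It assumes a bounded out-of-orderness strategy, where the watermark trails the highest timestamp seen by a fixed delay, and a window closes once the watermark reaches its end. The class and its names are hypothetical and do not reflect Flink's or VERA's actual API or internals.

```python
from collections import defaultdict


class EventTimeWindowCounter:
    """Counts events per tumbling event-time window using a watermark."""

    def __init__(self, window_seconds, max_out_of_orderness):
        self.window_seconds = window_seconds
        self.max_out_of_orderness = max_out_of_orderness
        self.max_timestamp = float("-inf")
        self.open_windows = defaultdict(int)  # window_start -> event count
        self.dropped_late = 0

    @property
    def watermark(self):
        # Assumption: no event older than this is still expected to arrive.
        return self.max_timestamp - self.max_out_of_orderness

    def process(self, timestamp):
        """Ingest one event; return [(window_start, count)] for windows it closes."""
        window_start = timestamp - (timestamp % self.window_seconds)
        if window_start + self.window_seconds <= self.watermark:
            # The window for this event has already been emitted: too late.
            self.dropped_late += 1
            return []
        self.open_windows[window_start] += 1
        self.max_timestamp = max(self.max_timestamp, timestamp)
        closed = [
            (start, count)
            for start, count in sorted(self.open_windows.items())
            if start + self.window_seconds <= self.watermark
        ]
        for start, _ in closed:
            del self.open_windows[start]
        return closed


counter = EventTimeWindowCounter(window_seconds=10, max_out_of_orderness=5)
for ts in [1, 4, 12, 8, 16, 3]:
    for window_start, count in counter.process(ts):
        print(f"window starting at {window_start}: {count} events")
print(f"dropped late events: {counter.dropped_late}")
```

In this run the event at timestamp 8 arrives after the event at 12 but is still counted, because the watermark has not yet passed the end of its window. The event at timestamp 3 arrives after that window has closed and is counted as late.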
Value of the Managed Flink Service (Powered by VERA):
- Advanced Stream Processing: The fully managed service provides real-time stream processing for critical e-commerce workflows, powered by VERA’s high-performance, stateful stream processing engine.
- Enterprise-Grade Reliability: With VERA, the Flink service in StreamNative Cloud ensures fault tolerance, scalability, and continuous availability for mission-critical applications.
- Unified Batch and Stream: VERA can handle both real-time streaming and historical batch data, simplifying the data pipeline and allowing continuous insights.
4. Sinking Data into Destination with StreamNative Cloud’s Lakehouse Storage Support and Connectors
After VERA processes and enriches the data, it is stored in one or more destinations, enabling both real-time insights and long-term analysis.
Real-Time Analytics Database (OLAP):
- Aggregated metrics (e.g., top-selling products and revenue trends) are written to a real-time OLAP database (e.g., ClickHouse or Druid) for instant querying and business dashboards.
StreamNative Cloud’s Lakehouse Storage with Apache Iceberg:
- Lakehouse Offloading: StreamNative’s Lakehouse Storage support allows VERA to offload processed data to a Lakehouse architecture in Apache Iceberg format, which enables:
- Unified Real-Time and Historical Data: Apache Iceberg integrates real-time and batch data for a comprehensive view.
- Optimized Data Storage: Iceberg provides efficient data storage with partitioning, snapshotting, and schema evolution.
- Data Retention for Advanced Analytics: Offloaded data can be queried by engines like Trino or Apache Spark for deeper insights, machine learning, or long-term trend analysis.
Alerts and Notifications:
- Operational Alerts: Flink (VERA) generates real-time alerts (e.g., low stock or transaction spikes) and sends them back to URSA’s Kafka/Pulsar topics for immediate consumption by alert systems (e.g., SMS, email).
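The low-stock alert is a good example of why state matters: without remembering which products are already below the threshold, every subsequent update would fire a duplicate alert. Here is a small plain-Python sketch of that idea. The function name, alert shape, and threshold value are assumptions for illustration, not part of any Flink or StreamNative API.

```python
def low_stock_alerts(updates, threshold):
    """Return one alert each time a product newly drops below the threshold.

    updates: iterable of (product_id, stock_level) pairs in arrival order.
    A product re-arms once its stock recovers to the threshold or above.
    """
    below = set()  # state: products currently below the threshold
    alerts = []
    for product_id, level in updates:
        if level < threshold:
            if product_id not in below:
                below.add(product_id)
                alerts.append({"type": "low_stock", "product_id": product_id, "level": level})
        else:
            below.discard(product_id)
    return alerts


updates = [("A", 10), ("A", 4), ("A", 3), ("A", 12), ("A", 2), ("B", 1)]
for alert in low_stock_alerts(updates, threshold=5):
    print(alert)
```

In the pipeline described here, each alert record would be written back to a topic for downstream alerting systems to consume.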
5. Real-Time Analytics Consumed by End Users
Once the data is processed and stored, it is ready to be consumed by various stakeholders for real-time insights.
Dashboards and Visualizations:
- Real-Time Dashboards: Business teams can access interactive dashboards powered by Superset or Grafana, which visualize key metrics such as:
- Most viewed products and top user actions.
- Real-time sales and revenue trends.
- Inventory levels and low-stock alerts.
Personalized Customer Experiences:
- Dynamic Web Pages: VERA's real-time analytics can be used to update the website dynamically, offering personalized product recommendations and promotions based on real-time user behavior.
Operational Alerts:
- Efficient Operations: Alerts generated by VERA ensure that teams are notified immediately about critical events, such as inventory shortages or sales surges, allowing for swift operational responses.
Summary of URSA and VERA in the E-Commerce Use Case:
- URSA (Kafka/Pulsar): URSA powers the real-time data ingestion and messaging, supporting both Kafka and Pulsar protocols for scalable and flexible data flow. It handles high-throughput streams from user actions, transactions, and inventory updates.
- VERA: VERA powers the fully managed Flink service within StreamNative Cloud, delivering stateful and advanced stream processing capabilities. It ensures real-time insights, exactly-once processing, and seamless integration with both streaming and batch use cases.
Together, StreamNative’s URSA and Ververica’s VERA offer a powerful, end-to-end solution for processing high-volume data streams and delivering real-time analytics. This ensures the e-commerce platform can efficiently scale and respond to evolving business needs.
Conclusion
StreamNative’s integration of URSA for Data Streaming and VERA for Stream Processing offers enterprises a powerful, fully managed stream processing solution within StreamNative Cloud. By bringing processing closer to data—deploying both engines within the same VPC—this solution minimizes data movement, reduces latency, and optimizes costs. The managed Flink service, powered by Ververica’s VERA, enables organizations to transition seamlessly from batch to real-time stream processing, unlocking advanced analytics, enhanced performance, and operational efficiency. With unified support for Kafka and Pulsar, as well as Lakehouse integration, StreamNative Cloud provides enterprises with the tools needed to build scalable, real-time applications that drive business value.