StreamNative Enables Seamless Streaming into Apache Iceberg™ with Snowflake Open Catalog
As Generative AI continues to transform industries, the demand for real-time data is growing exponentially. However, ingesting and managing real-time data at scale remains a costly challenge. StreamNative’s vision is to simplify and optimize real-time data ingestion, delivering a cost-effective solution that puts real-time data within reach of every organization.
StreamNative enables seamless data ingestion into open lakehouse formats like Apache Iceberg and Delta Lake, supporting various catalogs.
StreamNative is excited to help customers ingest topic data into Apache Iceberg™ cost-effectively through a partnership with Snowflake on a native integration with Snowflake Open Catalog, a fully managed service for Apache Polaris™ (incubating), the open-source catalog that provides secure, centralized access to Iceberg tables across REST-compatible query engines.
Challenges of streaming data to a lakehouse architecture
While StreamNative tackles various data ingestion challenges, this blog highlights two key areas.
Elevated costs associated with connector-based pipelines
Connectors that stream data to a lakehouse offer a fast, declarative way to build data pipelines without custom code. However, their reliance on compute resources can significantly drive up costs, depending on the processing capacity the workload requires. Connector-based pipelines also introduce maintenance complexity through operational overhead and dependency management.
Lack of unified governance with an interoperable catalog
Another key challenge with connector-based pipelines is their inability to publish data to centralized catalogs. The resulting fragmentation leads to inconsistent access controls, reduced visibility, and compliance risks, complicating data integrity and security across the enterprise.
Cost-Efficient Data Streaming with Ursa Engine
Ursa’s Leaderless Architecture Offers Cost-Effective, Scalable Data Streaming
By shifting from leader-based architectures to a leaderless design with a lakehouse-native storage approach, Ursa delivers key advantages:
- Elimination of inter-zone network costs, including client and data replication traffic, which are among the largest expenses in leader-based deployments like Kafka and Redpanda.
- Lower storage costs through the use of cloud-native object storage and optimized columnar formats.
- Seamless real-time and batch analytics without the need for costly ETL transformations.
Ensuring governance with a unified, interoperable catalog
StreamNative Ursa’s lakehouse-native storage follows a "stream backed by table" approach, compacting streaming data into Parquet files within Iceberg or Delta Lake. This ensures a single, catalog-governed data copy while preserving streaming metadata for replay. Ursa Managed Table automates data lifecycle management and table registration for seamless discovery.
By compacting topic data into a single copy in Apache Parquet™ format, Ursa Engine ensures consistency, efficiency, and accessibility across teams. For example, Data Engineering & Infrastructure teams benefit from simplified management, lower storage costs, and stronger governance; Data Science & AI teams gain real-time, ETL-free access for AI/ML; and Analytics & BI teams accelerate insights with up-to-date, queryable data.
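To make the "stream backed by table" idea concrete, here is a minimal sketch using the open-source pyiceberg library (not StreamNative’s internal code). It assumes a catalog named open_catalog is already configured (for example in ~/.pyiceberg.yaml) and that Ursa has compacted a topic into a hypothetical table streaming.orders_topic; each compaction commit is an ordinary Iceberg snapshot, which is what makes replay and time travel possible:

```python
from pyiceberg.catalog import load_catalog

# Load a pre-configured catalog and the table backing the topic
# (both names are placeholders for this sketch).
catalog = load_catalog("open_catalog")
table = catalog.load_table("streaming.orders_topic")

# Each compaction commit produces an Iceberg snapshot; listing them shows
# the replay points preserved alongside the Parquet data files.
for snap in table.metadata.snapshots:
    print(snap.snapshot_id, snap.timestamp_ms)

# Time travel: scan the table as of its earliest snapshot.
earliest = table.metadata.snapshots[0]
print(table.scan(snapshot_id=earliest.snapshot_id).to_arrow().num_rows)
```

Because the compacted copy is a standard Iceberg table, any REST-catalog-aware engine can run the same scan without an extra ETL hop.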
StreamNative's integration with Snowflake Open Catalog leverages Ursa’s lakehouse-native storage, enabling seamless data streaming directly into Snowflake’s Open Catalog for easy discovery and consumption.
Chris Child, VP of Product Management at Snowflake, underscores this vision:
"We are thrilled to partner with StreamNative to bring seamless, cost-effective streaming data ingestion into Apache Iceberg through Snowflake Open Catalog. This integration will help customers with interoperability needs make real-time data AI-ready while ensuring governance across their data ecosystem. Together, we’re enabling organizations to apply open standards and unlock new levels of efficiency and value from their streaming data with Snowflake's data and AI platform". - Chris Child, VP of Product Management, Snowflake
Seamless Streaming from StreamNative to Iceberg via Snowflake Open Catalog
StreamNative Cloud serves as a powerful streaming layer for Iceberg tables and Snowflake Open Catalog, enabling real-time data to be universally governed and easily integrated with the Snowflake AI Data Cloud. It allows enterprises to ingest, process, and manage high-velocity data streams across diverse sources while maintaining schema consistency and lineage through Snowflake Open Catalog.

This streamlined integration not only simplifies data management but also accelerates data accessibility for downstream analytics and AI workloads, empowering organizations to unlock actionable insights from fresh, AI-ready data at scale.

The integration of StreamNative Cloud with Snowflake Open Catalog leverages Apache Iceberg libraries to ingest data as Iceberg tables and register them within Snowflake’s Open Catalog. At a high level, the flow works as follows (a brief code sketch follows the list):
- Create and register an Iceberg table – StreamNative Cloud utilizes Apache Iceberg libraries to authenticate with Snowflake’s Open Catalog service and execute REST APIs for table creation and registration.
- Write topic data to the Iceberg table – Topic data is written to Parquet files, with a corresponding Iceberg table created for each topic.
- Generate snapshot – StreamNative Cloud runtime creates a new snapshot. This process occurs with each update to the Iceberg table, capturing all associated data and manifest files. Snapshots enable time-travel queries and support rollback operations.
- Commit snapshot – The snapshot created in the previous step is then committed. Committing a snapshot atomically applies changes to an Iceberg table through the REST Catalog API, ensuring consistency and correctness in a distributed environment.
- Query and Analyze Iceberg Data in Snowflake AI Data Cloud – Users can access and analyze the ingested data with the Snowflake AI Data Cloud and a variety of tools.
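For illustration, the same create-write-commit flow can be sketched with the open-source pyiceberg REST catalog client. This is not StreamNative’s implementation, and the URI, credential, warehouse, and table names below are placeholders you would replace with your Snowflake Open Catalog values:

```python
import pyarrow as pa
from pyiceberg.catalog import load_catalog

# Connect to a Polaris-compatible REST catalog; all values are placeholders.
catalog = load_catalog(
    "open_catalog",
    **{
        "type": "rest",
        "uri": "https://<account>.snowflakecomputing.com/polaris/api/catalog",  # placeholder
        "credential": "<client_id>:<client_secret>",  # OAuth2 client credentials
        "warehouse": "<open_catalog_name>",           # placeholder catalog name
        "scope": "PRINCIPAL_ROLE:ALL",
    },
)

# One Iceberg table per topic: a schema mirroring the topic's records.
schema = pa.schema([
    ("event_id", pa.string()),
    ("payload", pa.string()),
    ("event_ts", pa.timestamp("us")),
])
table = catalog.create_table_if_not_exists("streaming.orders_topic", schema=schema)

# Appending a batch writes Parquet data files, generates a new snapshot,
# and commits it atomically through the REST catalog API.
batch = pa.Table.from_pylist(
    [{"event_id": "e1", "payload": "demo", "event_ts": None}],
    schema=schema,
)
table.append(batch)
```

This mirrors the sequence in the list above: table creation and registration, Parquet writes, snapshot generation, and an atomic REST commit.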
This native integration enables users to effortlessly configure a cluster for streaming data directly into Iceberg with just a few clicks, allowing them to quickly gain insights from their data.
StreamNative's integration with Snowflake Open Catalog provides unified governance, enabling visibility and access controls across streaming and non-streaming data as it moves from ingestion to processing, storage, and consumption. This integration also enables interoperability with a vendor-neutral, open source foundation in Apache Iceberg and Apache Polaris, giving organizations flexibility to read and write with a variety of engines.
Integration Setup
To establish an integration between StreamNative and Snowflake Open Catalog, three key steps must be followed:
- Configuring Snowflake Open Catalog – Begin by setting up Snowflake Open Catalog to enable seamless integration with StreamNative Cloud for data streaming.
- Deploying and Enabling Integration – Create a cluster and activate the Snowflake Open Catalog integration within StreamNative Cloud.
- Connecting to Snowflake AI Data Cloud – Configure Snowflake AI Data Cloud to access and query data published in Snowflake Open Catalog, ensuring seamless interoperability.
Learn more about the configuration process and step-by-step implementation.

Enabling Open Catalog Integration
When creating a cluster in StreamNative Cloud, users can enable a Catalog Integration, configure Snowflake Open Catalog, and deploy the cluster seamlessly.
Set Up a StreamNative Cluster
To create a new instance, enter the instance name, configure the Cloud Connection, select the Ursa Engine, and then specify the Cluster Location.

To create a cluster, provide the cluster name, select the Cloud Environment and Availability Zone, and proceed to configure the Lakehouse Storage settings.
Enable & Configure Open Catalog Integration
There are two options for selecting a storage location: you can either specify your own storage bucket or utilize a pre-created bucket provided by the BYOC environment. In this example, we will use the pre-created bucket.
To configure Snowflake Open Catalog, select Snowflake Open Catalog as the catalog provider and complete the remaining catalog configuration details. Click Deploy to finish catalog configuration.

Click Deploy to complete cluster creation.
Once the cluster is created, create and run a producer to stream data into the cluster, where it is stored as Iceberg tables and published to Snowflake Open Catalog for discovery and analysis. A minimal producer example is sketched below.
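The sketch uses the confluent-kafka Python client over the Kafka protocol; the bootstrap endpoint, credentials, and topic name are placeholders, and the exact authentication settings depend on your StreamNative Cloud configuration:

```python
import json
from confluent_kafka import Producer

# All connection values are placeholders for this sketch.
producer = Producer({
    "bootstrap.servers": "<cluster-host>:9093",  # placeholder broker endpoint
    "security.protocol": "SASL_SSL",
    "sasl.mechanism": "PLAIN",
    "sasl.username": "<user>",                   # see StreamNative auth docs
    "sasl.password": "<password-or-token>",
})

# Produce a batch of JSON records to a hypothetical topic.
for i in range(100):
    record = {"event_id": f"e{i}", "payload": "demo"}
    producer.produce("orders_topic", value=json.dumps(record).encode("utf-8"))

producer.flush()  # block until all records are delivered
```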
Query Data from Snowflake Open Catalog
To query data from Snowflake Open Catalog in the Snowflake AI Data Cloud, users must complete the steps to establish an integration. Once integrated, the data can be queried from Snowflake using SQL, Python, Notebooks, LLM functions, Cortex Analyst, and more, as in the sketch below.
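As one example, the snippet below queries with Python via snowflake-connector-python. It assumes an Iceberg table has already been created in Snowflake over the Open Catalog entry (for example with CREATE ICEBERG TABLE and a catalog integration, per Snowflake’s documentation); the account, credentials, and object names are placeholders:

```python
import snowflake.connector

# Connect to Snowflake; every value here is a placeholder.
conn = snowflake.connector.connect(
    account="<account_identifier>",
    user="<user>",
    password="<password>",
    warehouse="<warehouse>",
    database="<database>",
    schema="<schema>",
)
try:
    # Query the hypothetical Iceberg table backing the topic.
    cur = conn.cursor()
    cur.execute(
        "SELECT event_id, event_ts FROM orders_topic "
        "ORDER BY event_ts DESC LIMIT 10"
    )
    for row in cur:
        print(row)
finally:
    conn.close()
```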

Conclusion
The Public Preview release of Catalog integration within StreamNative Cloud represents a transformative step in connecting real-time data streaming pipelines to Lakehouse Storage, particularly through Snowflake Open Catalog. Built on vendor-neutral open standards like Apache Kafka, Apache Iceberg and Apache Polaris, this integration ensures greater interoperability than alternatives while providing cross-engine data governance, seamless schema evolution, and efficient metadata management. Organizations can ingest data directly into Iceberg tables, enable effortless discovery in Snowflake Open Catalog, and streamline analytics and machine learning workflows, making it easier to extract actionable insights and predictions from streaming data. With unified access controls, organizations gain full control of data movement, transformations, and consumption across many engines in their stack. Explore how this open, standards-based integration can revolutionize your data-driven strategy. Here are a few resources for you to explore:
- Learn more about StreamNative’s Integration With Snowflake Open Catalog
- Documentation for Snowflake Open Catalog Integration: Follow these steps to integrate StreamNative Cloud with Snowflake Open Catalog.
- Try it yourself: Sign up for a trial to explore StreamNative's Ursa Engine and experience the power of real-time data in action.