Data Platforms Optimized for the Cloud – the StreamNative and Snowflake Partnership
While AI is supposed to make our lives easier, that might not be true if you are working behind the scenes to build and deploy AI systems. With so many moving parts involved, it’s important to ensure that your data architecture doesn’t become overly complex and costly. Keeping IT complexity in check is key to gaining the advantages of AI without the challenges that otherwise arise down the road.
At StreamNative we are committed to helping our customers become successful with their data streaming initiatives. As a result, we are excited to announce we have achieved Technology Select Tier partner status from Snowflake, the AI Data Cloud company. This partnership enables joint customers to easily leverage streaming data to deploy real-time analytics with greater performance/scale, faster elasticity, and lower TCO (more on these later).
Two Companies, One Vision
Our vision is to enable customers to get the most value from data. “StreamNative’s commitment to helping Snowflake mobilize the world’s data can be seen through their tremendous and fast growth as a Snowflake partner with us,” said Tarik Dwiek, Head of Technology Alliances, Snowflake. “We look forward to driving deeper value for Snowflake’s AI Data Cloud ecosystem by partnering with StreamNative to allow access to a fast, simplified, and cost-effective real-time analytics architecture through Snowflake’s single, integrated platform.”
Apache Iceberg™ Open-Source Table Format
Both companies recognize the value customers gain from having Iceberg as an option for writing analytical data. This is especially important for businesses with large-scale data sets that are accessed infrequently, where storage cost is a main concern. Writing Iceberg tables into cloud object storage is a cost-effective way to store and analyze these data sets.
With our support for Snowpipe Streaming (more information coming soon in a blog), we are the first Snowflake partner to support Iceberg tables in our joint architecture. Writing to Iceberg tables automatically supports Snowflake Catalog to ensure this capability fits seamlessly in your data architecture. The Iceberg support gives our joint customers advantages around schema evolution, version control, and advanced partition handling while leveraging cost-effective object storage.
Real-Time Plus AI
Real-time analytics has been a popular and important data pattern for many years, and there has been an ongoing push to accelerate data processing pipelines while simplifying the overall architecture. In addition, businesses are extending their traditional real-time analytics architecture to augment their AI deployments, so that systems are always up to date with the latest information. This is especially important in company-specific AI systems that rely on time-sensitive data, often sourced from recent customer interactions. And as always, keeping cloud infrastructure costs under control is a priority, especially when cloud bills are far higher than expected. A heavier focus on cloud FinOps is therefore a necessary part of your data strategy.
If you are using or exploring Snowflake as a component of your real-time analytics and AI architectures, then a data streaming platform is likely part of your strategy. Many would reasonably choose Apache Kafka, but might also realize that what they care about most is the Kafka API and its ecosystem. The different underlying engines that support the Kafka API and ecosystem then become the points of comparison among the popular Kafka-compatible options in the market. StreamNative provides Kafka API and ecosystem compatibility to lower the learning curve for your data streaming initiatives, and also provides the performance, simplicity, and cost-effectiveness to get the ROI you seek. Before we go into more detail, let’s briefly discuss two example joint customers.
Joint Customers
Our success story on InnerSpace provides a great example of how these technologies work together. InnerSpace is a location analytics business that captures insightful data about people’s behavior with the goal of improving indoor experiences. They help customers with operational optimization, in which the “operations” pertain to how people use office space. Many businesses invest in real estate and want to know how efficiently their space is used. InnerSpace gives their customers insights on metrics such as how many people are in the office, how many come in early, how many leave late, and which areas are underused.
InnerSpace chose StreamNative to handle their speed, low latency, scale, and cost challenges while also simplifying their infrastructure to reduce the dependency on a large DevOps team. They ingest raw location data from standard hooks in networking equipment, and then use Pulsar Functions, a function-as-a-service engine in StreamNative Cloud, to process that data before it can be used by analysts. One of the two main computations they run on the raw data anonymizes the MAC addresses of the various devices so that employee privacy is retained. The other runs their location algorithm, which takes the Wi-Fi signal strengths and calculates the locations of the devices at a very accurate and granular level.
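To make the anonymization step above more concrete, here is a minimal sketch of one common way to pseudonymize a MAC address: a keyed hash (HMAC-SHA256). This is an illustration only, not InnerSpace's actual implementation; the function name `anonymize_mac` and the `secret_key` parameter are assumptions introduced for the example.

```python
import hashlib
import hmac


def anonymize_mac(mac: str, secret_key: bytes) -> str:
    """Return a stable pseudonym for a MAC address (hypothetical helper).

    The same MAC and key always produce the same token, so per-device
    analytics still work, while the raw address is never stored.
    """
    # Normalize so "AA-BB-CC-DD-EE-FF" and "aa:bb:cc:dd:ee:ff" map to one token.
    normalized = mac.strip().lower().replace("-", ":")
    # A keyed hash prevents recovering MACs by hashing the (small) address space
    # without knowing the key.
    return hmac.new(secret_key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()
```

A function like this could be called from whatever stream processing step handles the raw events, with the key kept out of the data path (for example, in a secrets manager).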
The integration of StreamNative Cloud and Snowflake then makes it easy to deliver that data into an analytics-ready format. Analysts can then use popular tools to analyze office space usage and make better use of their space.
Another great example is Iterable, the AI-powered customer communication platform that helps brands like Redfin, Priceline, Calm, and Box activate customers with joyful interactions at scale. With Iterable, organizations drive high growth with individualized, harmonized, and dynamic communications that engage customers throughout the entire lifecycle at the right time. Iterable continuously processes a high volume of data in real time, and constantly pushes out messages via various channels to their customers’ customers. With billions of daily events to capture and process, and about a billion messages sent per day, Iterable needs a system that provides high throughput, low latency, and scalability.
They process the raw event data into an intermediary topic, transform it into an analyzable format, and then load the transformed data into Snowflake for downstream processing and analytics. The StreamNative and Snowflake components work together to run the entire data pipeline from end to end.
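The shape of that pipeline (raw events, an intermediate stage, a transformed format, then a bulk load) can be sketched in plain Python. This is only a rough illustration under stated assumptions: in-memory iterables stand in for topics, the event fields (`user`, `channel`, `type`, `ts`) are hypothetical, and nothing here reflects Iterable's actual schema or the StreamNative or Snowflake APIs.

```python
import json
from typing import Iterable, Iterator


def parse_raw(messages: Iterable[bytes]) -> Iterator[dict]:
    """Stage 1: decode raw JSON events, skipping malformed payloads."""
    for payload in messages:
        try:
            event = json.loads(payload)
        except (json.JSONDecodeError, UnicodeDecodeError):
            continue
        if isinstance(event, dict):
            yield event


def to_row(event: dict) -> dict:
    """Stage 2: flatten an event into a table-shaped row (hypothetical fields)."""
    user = event.get("user")
    return {
        "user_id": user.get("id") if isinstance(user, dict) else None,
        "channel": event.get("channel"),
        "event_type": event.get("type"),
        "ts": event.get("ts"),
    }


def batches(rows: Iterable, size: int) -> Iterator[list]:
    """Stage 3: group rows into fixed-size batches for a bulk load."""
    batch = []
    for row in rows:
        batch.append(row)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # flush the final partial batch
        yield batch
```

In a real deployment each stage would read from and write to a topic, and the final batches would be handed to the Snowflake integration rather than kept in memory.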
Considerations for Your Choice of Data Streaming Platform
What are the important considerations for a suitable data streaming platform to go with your Snowflake implementation?
As a start, you need a certified integration with Snowflake, and this partnership addresses that requirement. Second, the right cloud deployment option is critical, whether you need a fully managed service, a self-hosted deployment, or even a bring-your-own-cloud option for those of you who have stringent data security and sovereignty requirements. You need a technology option that supports your deployment needs. Another consideration is whether you can leverage existing expertise. If your developers and tools are focused on the Kafka API, then it makes sense to go with a Kafka API-compatible technology.
Those are the items that you probably have considered, and here are some other issues you should plan for:
- Central platform engineering team. A common organizational model today is having a platform engineering team that is responsible for the foundational technologies, while separate dev teams build out systems for specific use cases. To support such an organizational structure, you need to have a platform that has the multi-tenancy capabilities to either isolate or share specific data sets to more efficiently support a broad audience. StreamNative Cloud provides the multi-tenancy capabilities to support a central platform team while also providing the simplicity of consolidating many use cases into a single cluster.
- Faster elasticity. Elasticity is a basic characteristic among cloud technologies, but the speed and efficiency of elastic deployments can vary greatly. This is because distributed systems often must rebalance data across the many nodes, and this process can be time consuming. So, if you add or remove nodes, the rebalancing work kicks in, and that can be heavyweight and cause disruptions. StreamNative Cloud leverages an architecture that eliminates the costly data rebalancing work, so that you can quickly scale up or down without the housekeeping disruptions.
- Downstream infrastructure costs. You have initial estimates on what your cloud bill will be, but as many businesses are finding, there are costs that are not always obvious, and therefore harder to predict. One source of such costs is the networking costs associated with housekeeping tasks like data rebalancing, as mentioned above. With the no-data-rebalancing architecture of StreamNative, you can not only avoid disruptions when scaling, but you can also reduce networking costs to keep your cloud infrastructure bills under control.
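To get a feel for the rebalancing cost discussed in the points above, here is a back-of-the-envelope sketch. It assumes a system where data is stored on the serving nodes and spread evenly, so that scaling out requires copying data to the new nodes. The function names and the price parameter are hypothetical, and real network pricing varies by provider and topology.

```python
def rebalanced_gb(total_gb: float, old_nodes: int, new_nodes: int) -> float:
    """Minimum data (GB) that must move to restore an even spread after scale-out.

    Assumes data is evenly distributed before and after, and that only the
    new nodes receive data.
    """
    if old_nodes <= 0 or new_nodes < old_nodes:
        raise ValueError("expected 0 < old_nodes <= new_nodes")
    # Each new node ends up holding total_gb / new_nodes, all copied over the network.
    return total_gb * (new_nodes - old_nodes) / new_nodes


def transfer_cost(moved_gb: float, price_per_gb: float) -> float:
    """Network cost of moving the data at an assumed per-GB price."""
    return moved_gb * price_per_gb
```

For example, growing a 4-node cluster holding 10,000 GB to 5 nodes gives `rebalanced_gb(10_000, 4, 5) == 2000.0`, so one fifth of the data crosses the network for a single scaling event. An architecture that does not need to rebalance data when nodes are added avoids this term altogether.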
Learn More
This is just a brief overview of why the StreamNative partnership with Snowflake is beneficial. If real-time analytics and AI are part of your data initiatives, StreamNative and Snowflake make a great combination. There’s so much to explore here. Try StreamNative with a free $200 credit, or contact us and we’d be happy to discuss this technology integration with you in more detail.