Optimize Data Lakes with Amazon S3 Tables & Iceberg

keynote

30 min

Amazon S3 Tables: Architecture, use cases, and integrations

Lee Kear

Anupriti Warade

TL;DR

Amazon S3 Tables address the complexity of managing tabular data at scale through built-in Apache Iceberg support, which enables efficient storage and querying across multiple analytics engines such as Athena and Redshift. Key benefits include improved performance, simplified security, and optimized cost management for data lakes.

Opening

Demand for real-time insights and scalable analytics is reshaping data lakes. As organizations shift toward streaming-first architectures, Amazon S3 Tables offer a way to manage and query large-scale structured datasets with built-in Apache Iceberg support, providing a robust foundation for modern analytics and machine learning workloads. This session covers the architecture, use cases, and integrations that make S3 Tables an essential tool for data practitioners.

What You'll Learn (Key Takeaways)

Streamlined Data Management – Amazon S3 Tables simplify the management of large-scale tabular data with built-in Apache Iceberg support, offering high performance and transactional consistency.
Optimized Streaming Data Ingestion – Explore best practices for streaming data directly into S3 Tables, enabling real-time analytics with low latency and high throughput.
Cost-Effective and Scalable – Benefit from automatic storage optimization and background compaction to reduce costs and enhance query performance.
Seamless Integration – Leverage a wide range of analytical tools and streaming engines, including Amazon Kinesis, Apache Pulsar, and Apache Spark, for versatile data processing and querying.

Q&A Highlights

Q: How does S3 Tables' automatic table maintenance compare with other managed Iceberg offerings?
A: S3 Tables use Iceberg's bin-pack compaction method, which is basic compared to some proprietary algorithms from other providers, but effective for most use cases.
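
To make the idea of bin-pack compaction concrete, here is a minimal sketch of how small data files might be grouped into rewrite batches. It is a toy first-fit-decreasing planner, not the S3 Tables or Iceberg implementation; the names `DataFile` and `plan_bin_pack` and the threshold values are assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataFile:
    path: str
    size_mb: int


def plan_bin_pack(files, target_mb=512, small_file_mb=384):
    """Group small files into rewrite batches of at most target_mb each.

    Illustrative only: thresholds and names are assumptions, not values
    confirmed by the session.
    """
    # Only files below the small-file threshold are worth rewriting.
    candidates = sorted(
        (f for f in files if f.size_mb < small_file_mb),
        key=lambda f: f.size_mb,
        reverse=True,
    )
    bins = []
    for f in candidates:
        # First fit: place the file in the first batch that still has room.
        for batch in bins:
            if sum(x.size_mb for x in batch) + f.size_mb <= target_mb:
                batch.append(f)
                break
        else:
            bins.append([f])
    # Rewriting a batch with a single file would not reduce the file count.
    return [batch for batch in bins if len(batch) > 1]


files = [DataFile(f"part-{i}.parquet", size)
         for i, size in enumerate([600, 300, 200, 100, 50, 10])]
plan = plan_bin_pack(files)
batch_sizes = [[f.size_mb for f in batch] for batch in plan]
```

Each resulting batch would be rewritten as one larger file, which is why streaming workloads that commit many small files benefit from this kind of background maintenance.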

Q: What tools are used for table maintenance and compaction?
A: Table maintenance and compaction run on a combination of Spark and custom-built tooling.

Q: Are there plans to support Iceberg version 3?
A: AWS is currently assessing the changes in Iceberg version 3, with plans to announce timelines soon, although no specific dates are available yet.

Q: How do S3 Tables integrate with AWS Glue Catalog?
A: AWS Glue Catalog now supports the Iceberg REST catalog specification, so S3 Tables can be managed and queried through it.
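
As a rough illustration of what connecting through a REST catalog might look like from a client, the sketch below builds the properties a PyIceberg REST catalog would take. The endpoint URL pattern, the warehouse string format, and the SigV4 signing name are assumptions that should be checked against current AWS documentation; the helper `glue_rest_catalog_properties` and the account and bucket values are hypothetical.

```python
def glue_rest_catalog_properties(region, account_id, table_bucket):
    """Build REST catalog properties for a Glue-fronted S3 table bucket.

    Assumptions (verify before use): the Glue Iceberg REST endpoint lives at
    glue.<region>.amazonaws.com/iceberg, the warehouse is addressed as
    <account-id>:s3tablescatalog/<table-bucket>, and requests are signed
    with SigV4 under the "glue" service name.
    """
    return {
        "type": "rest",
        "uri": f"https://glue.{region}.amazonaws.com/iceberg",
        "warehouse": f"{account_id}:s3tablescatalog/{table_bucket}",
        "rest.sigv4-enabled": "true",
        "rest.signing-name": "glue",
        "rest.signing-region": region,
    }


# Placeholder values, not real resources.
props = glue_rest_catalog_properties(
    "us-east-1", "111122223333", "example-table-bucket"
)

# With PyIceberg installed and AWS credentials configured, the properties
# would typically be passed to the catalog loader like this:
#   from pyiceberg.catalog import load_catalog
#   catalog = load_catalog("s3tables", **props)
```

Keeping the configuration in a plain dictionary makes it easy to reuse the same settings across engines that accept REST catalog properties.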

Q: What are the best practices for optimizing streaming workloads with Iceberg tables?
A: Key practices include smart partitioning, managing snapshots and orphan files, and tuning for concurrent commits to maintain performance and cost efficiency.
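
Snapshot management is the easiest of these practices to show in code. A streaming writer that commits every minute creates a new snapshot each time, so old snapshots need to be expired on a schedule. The sketch below is a simplified model of that retention rule, assuming an age limit plus a minimum number of snapshots to keep (Iceberg exposes similar knobs as the `history.expire.max-snapshot-age-ms` and `history.expire.min-snapshots-to-keep` table properties). The `Snapshot` class and `snapshots_to_expire` function are hypothetical and do not call any Iceberg API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    snapshot_id: int
    committed_at_ms: int


def snapshots_to_expire(snapshots, now_ms, max_age_ms, min_to_keep=1):
    """Return the IDs of snapshots older than max_age_ms, always retaining
    the min_to_keep most recent snapshots regardless of age."""
    newest_first = sorted(
        snapshots, key=lambda s: s.committed_at_ms, reverse=True
    )
    expired = []
    for idx, snap in enumerate(newest_first):
        too_old = now_ms - snap.committed_at_ms > max_age_ms
        if idx >= min_to_keep and too_old:
            expired.append(snap.snapshot_id)
    return sorted(expired)


# Ten commits, one per minute; evaluate retention at the ten-minute mark.
MINUTE_MS = 60_000
history = [Snapshot(i, i * MINUTE_MS) for i in range(1, 11)]
expired_ids = snapshots_to_expire(
    history, now_ms=10 * MINUTE_MS, max_age_ms=3 * MINUTE_MS
)
```

Expiring snapshots is also what makes their data files eligible for cleanup, so retention settings directly affect storage cost for high-frequency streaming commits.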

Lee Kear

Principal Storage Specialist Solutions Architect, Amazon Web Services

Lee Kear has been working in IT since she received her master's degree in Computer Science from the Georgia Institute of Technology in 1999. After working in telecommunications, retail, media and entertainment, and healthcare, she started working at AWS in 2012 as a Systems Engineer on the Amazon S3 team. Lee became the first Storage Specialist Solutions Architect in 2016. She now leads the worldwide S3-aligned Storage Specialist SAs, where she concentrates on enablement and influencing the S3 roadmap. She loves to help customers use S3 in the most efficient, performant, and cost-effective way possible for their use case. Outside of work, she enjoys traveling with her wife.

Anupriti Warade

Senior Product Manager, Amazon Web Services

Anupriti Warade is a Senior Technical Product Manager for Amazon S3. She specializes in helping customers innovate and in building scalable products that drive success in analytics, AI, and machine learning (ML).