Overview
A modern data foundation unifies operational, analytical, and AI workloads on open formats so that any engine can query the data without copying it. AWS's strategy centers on Amazon S3 Tables (managed Apache Iceberg), Amazon SageMaker Lakehouse (unified access across S3, Redshift, and federated sources), AWS Glue for catalog and ETL, and Amazon DataZone for governance. With Iceberg as the open table format, you get ACID transactions, time travel, and schema evolution on low-cost S3 storage.
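To make the three Iceberg properties named above concrete, here is a minimal in-memory sketch. It is a toy model, not the Iceberg or S3 Tables API: `ToyTable`, `Snapshot`, `commit`, and `scan` are hypothetical names invented for illustration. The idea it mirrors is that every commit produces a new immutable snapshot, so readers can pin an older snapshot (time travel), a failed commit leaves no partial state (atomicity), and added columns read as null for older rows (schema evolution).

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Optional, Tuple

Row = Dict[str, object]


@dataclass(frozen=True)
class Snapshot:
    """Immutable view of the table at one commit (hypothetical toy type)."""
    snapshot_id: int
    schema: Tuple[str, ...]
    rows: Tuple[Row, ...]


@dataclass
class ToyTable:
    """Toy stand-in for an Iceberg-style table; not a real library class."""
    snapshots: List[Snapshot] = field(default_factory=list)

    def _current(self) -> Snapshot:
        return self.snapshots[-1] if self.snapshots else Snapshot(0, (), ())

    def commit(self, new_rows: Iterable[Row], add_columns: Iterable[str] = ()) -> int:
        cur = self._current()
        # Schema evolution: new columns are appended, existing ones are kept.
        schema = cur.schema + tuple(c for c in add_columns if c not in cur.schema)
        staged = tuple(dict(r) for r in new_rows)
        # Validate everything before publishing, so a bad write changes nothing.
        for row in staged:
            unknown = set(row) - set(schema)
            if unknown:
                raise ValueError(f"unknown columns: {sorted(unknown)}")
        snap = Snapshot(cur.snapshot_id + 1, schema, cur.rows + staged)
        self.snapshots.append(snap)  # the single "atomic" publish step
        return snap.snapshot_id

    def scan(self, snapshot_id: Optional[int] = None) -> List[Row]:
        # Time travel: read a pinned snapshot instead of the latest one.
        if snapshot_id is None:
            snap = self._current()
        else:
            snap = next(s for s in self.snapshots if s.snapshot_id == snapshot_id)
        # Columns added after a row was written read as None for that row.
        return [{c: r.get(c) for c in snap.schema} for r in snap.rows]


table = ToyTable()
v1 = table.commit([{"id": 1}], add_columns=("id",))
v2 = table.commit([{"id": 2, "region": "apac"}], add_columns=("region",))
print(table.scan(v1))  # the table as of the first commit
print(table.scan())    # the latest snapshot, with the evolved schema
```

Real Iceberg tables implement the same idea with metadata and manifest files on object storage rather than Python lists, and engines expose snapshot selection through their own query syntax.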
Key concepts
- Apache Iceberg — open table format with ACID semantics
- Catalog standardization (AWS Glue Data Catalog, AWS Lake Formation)
- Zero-ETL integrations between operational stores and analytics
- Data quality, lineage, and active metadata
- Fine-grained access control with Lake Formation
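As an illustration of the last concept in the list, fine-grained access control usually combines a column restriction with a row predicate. The sketch below is a toy evaluation of such a rule in plain Python; it is not the Lake Formation API, and `Grant`, `apply_grant`, and the sample `orders` data are hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List

Row = Dict[str, object]


@dataclass(frozen=True)
class Grant:
    """Hypothetical permission: which columns and which rows a principal may see."""
    columns: FrozenSet[str]
    row_filter: Callable[[Row], bool]


def apply_grant(rows: List[Row], grant: Grant) -> List[Row]:
    """Drop rows that fail the filter, then project onto the permitted columns."""
    return [
        {k: v for k, v in row.items() if k in grant.columns}
        for row in rows
        if grant.row_filter(row)
    ]


# Illustrative data only.
orders = [
    {"order_id": 1, "region": "apac", "email": "a@example.com"},
    {"order_id": 2, "region": "emea", "email": "b@example.com"},
]

# An analyst limited to APAC rows, with the email column hidden.
apac_analyst = Grant(
    columns=frozenset({"order_id", "region"}),
    row_filter=lambda r: r["region"] == "apac",
)

visible = apply_grant(orders, apac_analyst)
print(visible)
```

In a governed lakehouse the rule is enforced by the catalog or query engine rather than by application code, so every engine reading the table sees the same restricted view.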
Key AWS services
- Amazon S3 Tables
- AWS Glue
- AWS Lake Formation
- Amazon DataZone
- SageMaker Lakehouse
Sessions on this topic
4 sessions from the Summit covered this topic. Each is a self-contained mini-lesson.
- ANT301 (Advanced)
A practitioner's guide to data for agentic AI
In this session, gain the skills needed to deploy end-to-end agentic AI applications using your most valuable data. This session focuses on data management using processes like Model Context Protocol (MCP) and Retrieval Augmented Generation (RAG), and provides concepts that apply to other methods of customizing agentic AI applications. Discover best practice architectures using AWS database services like Amazon Aurora and OpenSearch Service, along with the analytical, data processing, and streaming experiences found in SageMaker Unified Studio. Learn data lake, governance, and data quality concepts, and see how Amazon Bedrock AgentCore, Bedrock Knowledge Bases, and other features tie solution components together.
- ISV303 (Advanced)
From hours to minutes: SafetyCulture's journey to 90% faster analytics
Discover how SafetyCulture, the global workplace operations platform used by 70,000+ organizations, achieved a 90% reduction in daily data pipeline execution time while processing the same data volumes on Amazon Redshift. Join this session with the SafetyCulture data engineering team to learn how they transformed a complex, slow-running data warehouse into a high-performance, AI-ready analytics platform using modern lakehouse architecture principles.
- DAT401 (Expert)
Real-Time DataLakes with Apache Iceberg, Amazon MSK, and Amazon S3
Learn how to optimize Apache Iceberg data lakes on Amazon S3 for cost-effectiveness while enabling real-time analytics. This session explores S3 Tables deployments, focusing on streaming data from Apache Kafka via Amazon MSK into Iceberg format. Discover practical approaches for real-time table maintenance, metadata optimization for high-velocity writes, and data compaction strategies. Implement cost-effective retention policies using S3 Lifecycle configurations while maintaining sub-minute data freshness. See how MSK's native Iceberg integration eliminates pipeline overhead, reducing latency and operational costs. Gain actionable insights for balancing streaming performance with cost optimization at scale.
- DAT201 (Intermediate)
Scaling Data Analytics: Easygo's Modern Lakehouse Journey on AWS
Discover how Melbourne-based Easygo, powering Stake and Kick.com, transformed their data analytics infrastructure to process over 600,000 daily transactions and tens of millions of streaming events. Learn about their implementation of a modern lakehouse architecture combining Amazon Aurora Zero-ETL integration with Amazon Redshift, Amazon Kinesis with AWS Glue streaming, and Apache Iceberg on Amazon S3. Results include 95% faster queries, 80% fewer ingestion incidents, 9 hours of weekly maintenance savings, and accelerated global expansion. Explore practical strategies for building scalable, secure data foundations delivering near real-time analytics with robust governance across regulated markets.
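The DAT401 session above mentions data compaction for high-velocity writes. Streaming ingestion tends to produce many small data files, and compaction rewrites them into fewer files near a target size. The sketch below shows one simple way to plan such rewrites, a greedy bin-packing pass over file sizes. It is an illustrative assumption, not the algorithm used by S3 Tables or any Iceberg engine; `plan_compaction` and the 512 MB default target are hypothetical choices for the example.

```python
from typing import List


def plan_compaction(file_sizes_mb: List[int], target_mb: int = 512) -> List[List[int]]:
    """Greedily group small files into rewrite batches of at most target_mb.

    Files already at or above the target are left alone, and a batch with a
    single file is skipped because rewriting it would not reduce file count.
    """
    groups: List[List[int]] = []
    current: List[int] = []
    current_total = 0
    for size in sorted(file_sizes_mb):
        if size >= target_mb:
            continue  # already large enough, no rewrite needed
        if current and current_total + size > target_mb:
            groups.append(current)  # batch is full, start a new one
            current, current_total = [], 0
        current.append(size)
        current_total += size
    if current:
        groups.append(current)
    return [g for g in groups if len(g) > 1]


# Illustrative sizes only: several tiny streaming files plus a few larger ones.
plan = plan_compaction([8, 16, 600, 300, 250, 4], target_mb=512)
print(plan)
```

Each inner list is one rewrite job. A production planner would also weigh partition boundaries, delete files, and how recently a file was written, which this sketch ignores.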
Live updates related to this topic
Sourced via Parallel AI Monitor, a continuous web watch on 21 topical streams.
- oracle.com Agent-native data infrastructure
Oracle Unveils AI Database Agentic Innovations for Business Data
- fast.io Agent-native data infrastructure
How to Secure Vector Stores for AI Agents in 2025 | Fastio
- understandingdata.com Agent-native data infrastructure
Event Sourcing for Agents: Log-Based Architecture for ...
Understanding Data published a detailed blueprint for an 'Event Sourcing for Agents' storage pattern, describing a log-based architecture that stores agent state as an append-only sequence of events to enable deterministic replay, time-travel debugging, and audit trails for produ
- cloud.google.com Agent-native data infrastructure
How UKG taps workforce intelligence with the Agentic Data Cloud | Google Cloud Blog
- businesswire.com Agent-native data infrastructure
businesswire.com
Bedrock Data announced that ArgusAI now provides governance and agent-aware access control for AI agents built on Google Vertex AI Search and Dialogflow. The platform implements a 'Data Bill of Materials (DBOM)' to automatically discover and map the data stores accessed by agents
External links matched to this topic via topic relevance. The KB does not endorse third-party content; verify before citing.
Non-obvious insights
From the Playbook: one sharp, contrarian insight per session, the things teams don't think of unprompted.