Overview
Real-time data systems on AWS use Amazon Kinesis (Data Streams, Firehose) and Amazon MSK (managed Apache Kafka) to ingest event streams, then process them with Apache Flink (via Amazon Managed Service for Apache Flink), AWS Lambda, or Amazon EMR. Common patterns include change data capture (CDC), clickstream analytics, IoT telemetry, fraud detection, and real-time personalization. Kinesis Data Streams can also feed Amazon Bedrock for streaming LLM inference.
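A core mechanic behind the ingestion layer above is partition-key routing: records with the same key land on the same shard, which preserves per-key ordering. The sketch below mimics, in simplified form, how Kinesis maps a record's MD5-hashed partition key onto a 128-bit hash key range split across shards; the function name and the even-split assumption are illustrative, not an AWS API.

```python
import hashlib

def shard_for_key(partition_key: str, num_shards: int) -> int:
    """Simplified Kinesis-style routing: MD5-hash the partition key,
    then map the 128-bit hash onto num_shards contiguous intervals."""
    hash_value = int(hashlib.md5(partition_key.encode("utf-8")).hexdigest(), 16)
    shard_size = 2**128 // num_shards
    # min() guards the top edge when 2**128 isn't divisible by num_shards
    return min(hash_value // shard_size, num_shards - 1)

# All records for "device-42" route to one shard, so that device's
# events stay in order even as other shards scale independently.
shard = shard_for_key("device-42", 4)
```

Because routing is a pure function of the key, a hot key concentrates traffic on one shard — the usual reason to add entropy to partition keys when per-key ordering isn't required.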
Key concepts
- Event-driven architectures vs. batch ETL
- Apache Kafka topics, partitions, consumer groups
- Stream processing: stateful operators, windows, exactly-once
- Change Data Capture (CDC) with Debezium / DMS
- Backpressure, scaling, and replay for resilience
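The windowing and replay ideas above can be sketched without any AWS dependency. Below is a minimal tumbling-window event counter in plain Python; the window size and event shape are illustrative assumptions, not a specific Flink or Kinesis API.

```python
from collections import defaultdict

def tumbling_window_counts(events, window_seconds=60):
    """Group (timestamp, key) events into fixed, non-overlapping
    (tumbling) windows and count events per key in each window.

    The result is deterministic over the input: replaying the same
    events yields the same counts, which is the property
    exactly-once pipelines rely on when reprocessing after failure.
    """
    counts = defaultdict(int)  # (window_start, key) -> count
    for ts, key in events:
        window_start = (ts // window_seconds) * window_seconds
        counts[(window_start, key)] += 1
    return dict(counts)

events = [(0, "click"), (30, "click"), (61, "view"), (65, "click")]
result = tumbling_window_counts(events, window_seconds=60)
# t=0 and t=30 fall in window 0; t=61 and t=65 fall in window 60
```

A real stream processor adds what this sketch omits: watermarks for late events, checkpointed operator state, and backpressure-aware sources.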
Key AWS services
- Amazon Kinesis
- Amazon MSK
- Managed Service for Apache Flink
- Amazon EventBridge
Learn more — curated resources
Hand-picked official docs, foundational papers, and the best community guides for going deeper on this topic.
Sessions on this topic
Nine sessions from the Summit covered this topic. Each is a self-contained mini-lesson.
- ANT301 (Advanced)
A practitioner's guide to data for agentic AI
In this session, gain the skills needed to deploy end-to-end agentic AI applications using your most valuable data. This session focuses on data management using techniques like Model Context Protocol (MCP) and Retrieval Augmented Generation (RAG), and provides concepts that apply to other methods of customizing agentic AI applications. Discover best-practice architectures using AWS database services like Amazon Aurora and OpenSearch Service, along with the analytical, data processing, and streaming experiences found in SageMaker Unified Studio. Learn data lake, governance, and data quality concepts, and how Amazon Bedrock AgentCore, Bedrock Knowledge Bases, and other features tie solution components together.
- DAT402 (Expert)
Deep dive into database integrations with AWS Zero-ETL
Learn how AWS zero-ETL integrations eliminate complex data movement pipelines across multiple database engines, enabling data engineers, architects, and DBAs to reduce maintenance overhead while ensuring near real-time data availability for analytics and ML workloads. Examine the underlying architecture for supported zero-ETL integrations between Amazon Aurora, Amazon DynamoDB, and Amazon RDS sources and Amazon Redshift, Amazon SageMaker, and Amazon OpenSearch Service targets. Explore data movement options, tunable settings, and monitoring capabilities for ongoing data replication, all without traditional ETL complexity.
- ARC303 (Advanced)
Unlock GenAI inference anywhere with Amazon EKS Hybrid Nodes
Join this session to explore how Amazon EKS Hybrid Nodes enables GenAI inference anywhere. We'll discuss reference architectures for adding on-prem GPUs to your EKS hybrid cluster and for running real-time data capture and processing at the edge. You'll learn how EKS Hybrid Nodes enables seamless integration between the cloud and your on-prem or edge environments. We'll also walk through a real-world example showcasing how to accelerate GenAI inference at the edge using Amazon EKS Hybrid Nodes with the NVIDIA DGX platform.
- DAT301 (Advanced)
Powering your Agentic AI experience with AWS Streaming and Messaging
Organizations are accelerating innovation with generative AI and agentic AI use cases. This session explores how AWS streaming and messaging services such as Amazon Managed Streaming for Apache Kafka, Kinesis Data Streams, Amazon Managed Service for Apache Flink, and Amazon SQS build intelligent, responsive applications. Discover how streaming supports real-time data ingestion and processing, while messaging ensures reliable coordination between AI agents, orchestrates workflows, and delivers critical information at scale. Learn architectural patterns that show how a unified approach acts on data as fast as it arrives, providing the reliability and scale to support your next generation of AI applications.
- DAT401 (Expert)
Real-Time Data Lakes with Apache Iceberg, Amazon MSK, and Amazon S3
Learn how to optimize Apache Iceberg data lakes on Amazon S3 for cost-effectiveness while enabling real-time analytics. This session explores S3 Tables deployments, focusing on streaming data from Apache Kafka via Amazon MSK into Iceberg format. Discover practical approaches for real-time table maintenance, metadata optimization for high-velocity writes, and data compaction strategies. Implement cost-effective retention policies using S3 Lifecycle configurations while maintaining sub-minute data freshness. See how MSK's native Iceberg integration eliminates pipeline overhead, reducing latency and operational costs. Gain actionable insights for balancing streaming performance with cost optimization at scale.
- DAT201 (Intermediate)
Scaling Data Analytics: Easygo's Modern Lakehouse Journey on AWS
Discover how Melbourne-based Easygo, powering Stake and Kick.com, transformed their data analytics infrastructure to process over 600,000 daily transactions and tens of millions of streaming events. Learn about their implementation of a modern lakehouse architecture combining Amazon Aurora Zero-ETL integration with Amazon Redshift, Amazon Kinesis with AWS Glue streaming, and Apache Iceberg on Amazon S3. Results include 95% faster queries, 80% fewer ingestion incidents, 9 hours weekly maintenance savings, and accelerated global expansion. Explore practical strategies for building scalable, secure data foundations delivering near real-time analytics with robust governance across regulated markets.
- MAE202 (Intermediate)
Seven's AWS Journey: Streaming Premium Content at the Speed of Innovation
Join Tim Sheridan, Director of Product & Technology at Seven West Media, as he shares how Seven is leveraging cloud and AI to maximise the return on their most valuable asset — premium live content. With marquee events like the AFL Grand Final and The Ashes cricket series, the stakes couldn't be higher: massive concurrent audiences, critical advertising revenue, and zero tolerance for failure. Tim shares how they leaned on AI-powered developer and business tools to accelerate delivery, de-risk high-profile events, and maximise the return on its premium content investments. Discover how Seven's team transformed their approach to innovation — using cloud-native architecture and AI to achieve speed to market, audience experience, and advertising revenue.
- WPS301 (Advanced)
AWS for healthcare analytics: accelerating time to insights
In today's data-driven healthcare landscape, organisations must rapidly transform diverse data sources into actionable insights that improve patient outcomes and accelerate operational efficiency. This session showcases how AWS' integrated analytics capabilities can deliver unmatched price-performance for every analytics workload, from data processing and SQL analytics to streaming and business intelligence. Through real-world healthcare examples, learn how AWS' built-in governance and scalability enable organisations to build secure, efficient analytics pipelines that accelerate time-to-insight. Ideal for data practitioners, IT decision-makers, and executives evaluating enterprise analytics platforms to drive their data-driven transformation.
- STP212 (Intermediate)
How Apate AI uses Amazon Bedrock and voice AI to catch scammers
Scams are a global epidemic costing businesses and consumers trillions. Apate AI turns the tables on fraudsters by deploying lifelike conversational AI agents, powered by Amazon Bedrock and speech models on Amazon SageMaker bidirectional streaming, that engage scammers in real time to detect, divert, disrupt, and decode their tactics. In this session, learn how Apate AI converts every scam interaction into actionable intelligence and how to build your own voice AI agents on AWS.
Non-obvious insights
From the Playbook: one sharp, contrarian insight per session — the things teams don't think of unprompted.