Streaming & Real-Time Data

Process millions of events per second as they happen.

9 sessions at the summit · 5 external resources

Overview

Real-time data systems on AWS use Amazon Kinesis (Data Streams, Firehose) and Amazon MSK (managed Apache Kafka) to ingest event streams, then process them with Apache Flink (Amazon Managed Service for Apache Flink), Lambda, or EMR. Common patterns include change data capture (CDC), clickstream analytics, IoT telemetry, fraud detection, and real-time personalization. Kinesis Data Streams can also feed events into Amazon Bedrock (for example via Lambda or Flink consumers) for streaming LLM inference.
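
As a concrete starting point, here is a minimal sketch of both ends of a Kinesis pipeline in Python with boto3: a producer that writes clickstream events, and a Lambda-style consumer that handles a batch of records. The stream name and event fields are illustrative assumptions, not taken from any session.

```python
import base64
import json

import boto3

# Producer side: write one event per call to a Kinesis data stream.
# Assumes a stream named "clickstream" already exists and AWS credentials are configured.
kinesis = boto3.client("kinesis")

def put_click_event(user_id: str, page: str) -> None:
    kinesis.put_record(
        StreamName="clickstream",                 # illustrative stream name
        Data=json.dumps({"user_id": user_id, "page": page}).encode("utf-8"),
        PartitionKey=user_id,                     # keeps a user's events ordered on one shard
    )

# Consumer side: a Lambda handler invoked with a batch of Kinesis records.
def handler(event, context):
    for record in event["Records"]:
        payload = json.loads(base64.b64decode(record["kinesis"]["data"]))
        # ... enrich, score for fraud, or forward to a downstream sink ...
        print(payload)
```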

Key concepts

  1. Event-driven architectures vs. batch ETL
  2. Apache Kafka topics, partitions, consumer groups (see the consumer sketch after this list)
  3. Stream processing: stateful operators, windows, exactly-once
  4. Change Data Capture (CDC) with Debezium / DMS
  5. Backpressure, scaling, and replay for resilience
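
To ground concepts 2 and 3, here is a minimal consumer-group sketch using the kafka-python client with a naive tumbling-window count. The broker address and topic name are placeholders; a production job would run this logic in Managed Service for Apache Flink, which checkpoints the window state for you.

```python
import json
import time
from collections import Counter

from kafka import KafkaConsumer  # pip install kafka-python

# Every process started with the same group_id joins one consumer group,
# and Kafka divides the topic's partitions among them.
consumer = KafkaConsumer(
    "orders",                                           # placeholder topic
    bootstrap_servers="b-1.example.kafka:9092",         # placeholder MSK broker
    group_id="order-analytics",                         # add instances to scale out reads
    value_deserializer=lambda b: json.loads(b.decode("utf-8")),
    enable_auto_commit=False,                           # commit offsets only after processing
)

WINDOW_SECONDS = 60
window_start = time.time()
counts = Counter()                                      # in-memory state; Flink would checkpoint this

for message in consumer:
    counts[message.value["product_id"]] += 1

    # Tumbling window: emit and reset the counts every WINDOW_SECONDS.
    if time.time() - window_start >= WINDOW_SECONDS:
        print({"window_start": int(window_start), "counts": dict(counts)})
        counts.clear()
        window_start = time.time()

    consumer.commit()  # at-least-once; exactly-once needs transactions or an idempotent sink
```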

Key AWS services

  • Amazon Kinesis
  • Amazon MSK
  • Managed Service for Apache Flink
  • Amazon EventBridge

Learn more — curated resources

Hand-picked official docs, foundational papers, and the best community guides for going deeper on this topic.

Sessions on this topic

9 sessions from the Summit covered this topic. Each is a self-contained mini-lesson.

  1. ANT301 (Advanced)

    A practitioner's guide to data for agentic AI

    In this session, gain the skills needed to deploy end-to-end agentic AI applications using your most valuable data. The session focuses on data management with techniques like Model Context Protocol (MCP) and Retrieval Augmented Generation (RAG), and covers concepts that apply to other methods of customizing agentic AI applications. Discover best-practice architectures using AWS database services like Amazon Aurora and Amazon OpenSearch Service, along with the analytical, data processing, and streaming experiences found in SageMaker Unified Studio. Learn data lake, governance, and data quality concepts, and see how Amazon Bedrock AgentCore, Amazon Bedrock Knowledge Bases, and other features tie the solution components together.
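
    The RAG portion of that architecture boils down to a retrieve-then-generate loop. Below is a hedged boto3 sketch, assuming an existing Bedrock knowledge base; the knowledge base ID and model ID are placeholders.

```python
import boto3

agent_runtime = boto3.client("bedrock-agent-runtime")
bedrock_runtime = boto3.client("bedrock-runtime")

def answer(question: str) -> str:
    # 1) Retrieve the most relevant chunks from the knowledge base.
    retrieved = agent_runtime.retrieve(
        knowledgeBaseId="KB123EXAMPLE",                      # placeholder knowledge base ID
        retrievalQuery={"text": question},
        retrievalConfiguration={"vectorSearchConfiguration": {"numberOfResults": 4}},
    )
    context = "\n\n".join(r["content"]["text"] for r in retrieved["retrievalResults"])

    # 2) Ground the model's answer in the retrieved context.
    response = bedrock_runtime.converse(
        modelId="anthropic.claude-3-haiku-20240307-v1:0",    # placeholder model ID
        messages=[{
            "role": "user",
            "content": [{"text": f"Answer using only this context:\n{context}\n\nQuestion: {question}"}],
        }],
    )
    return response["output"]["message"]["content"][0]["text"]
```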

  2. DAT402 (Expert)

    Deep dive into database integrations with AWS Zero-ETL

    Learn how AWS zero-ETL integrations eliminate complex data movement pipelines across multiple database engines, enabling data engineers, architects, and DBAs to reduce maintenance overhead while ensuring near real-time data availability for analytics and ML workloads. Examine the underlying architecture for supported zero-ETL integrations from Amazon Aurora, Amazon DynamoDB, and Amazon RDS sources to Amazon Redshift, Amazon SageMaker, and Amazon OpenSearch Service targets. Explore data movement options, tunable settings, and monitoring capabilities for ongoing data replication, all without traditional ETL complexity.
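
    For reference, a zero-ETL integration is created with a single API call rather than a pipeline. A hedged boto3 sketch, assuming an existing Aurora cluster and Redshift target; the ARNs and names are placeholders, and the exact parameters depend on the source and target pair.

```python
import boto3

rds = boto3.client("rds")

# Create an Aurora -> Amazon Redshift zero-ETL integration (ARNs are placeholders).
integration = rds.create_integration(
    IntegrationName="orders-to-redshift",
    SourceArn="arn:aws:rds:ap-southeast-2:123456789012:cluster:orders-aurora",
    TargetArn="arn:aws:redshift-serverless:ap-southeast-2:123456789012:namespace/analytics-ns",
)

# The integration replicates changes continuously; there is no ETL job to schedule or maintain.
print(integration["IntegrationName"], integration["Status"])
```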

  3. ARC303 (Advanced)

    Unlock GenAI inference anywhere with Amazon EKS Hybrid Nodes

    Join this session to explore how Amazon EKS Hybrid Nodes enables GenAI inference anywhere. We'll discuss reference architectures for adding on-prem GPUs to your EKS hybrid cluster, and for running real-time data capture and processing at the edge. You'll learn how EKS Hybrid Nodes enables seamless integration between the cloud and your on-prem or edge environments. We'll also walk through a real-world example, showcasing how to accelerate GenAI inference at the edge using Amazon EKS Hybrid Nodes with the NVIDIA DGX platform.

  4. DAT301 (Advanced)

    Powering your Agentic AI experience with AWS Streaming and Messaging

    Organizations are accelerating innovation with generative AI and agentic AI use cases. This session explores how AWS streaming and messaging services such as Amazon Managed Streaming for Apache Kafka (Amazon MSK), Amazon Kinesis Data Streams, Amazon Managed Service for Apache Flink, and Amazon SQS help you build intelligent, responsive applications. Discover how streaming supports real-time data ingestion and processing, while messaging ensures reliable coordination between AI agents, orchestrates workflows, and delivers critical information at scale. Learn architectural patterns that highlight how a unified approach acts on data as fast as needed, providing the reliability and scale to support your next generation of AI applications.
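
    A minimal sketch of the split the abstract describes, in Python with boto3: a Kinesis stream for high-volume events and an SQS queue for reliable hand-offs between agents. The stream name, queue URL, and message shapes are illustrative assumptions.

```python
import json

import boto3

kinesis = boto3.client("kinesis")
sqs = boto3.client("sqs")

QUEUE_URL = "https://sqs.ap-southeast-2.amazonaws.com/123456789012/agent-tasks"  # placeholder

# Streaming: continuous telemetry goes onto a stream for real-time processing.
kinesis.put_record(
    StreamName="user-events",                        # placeholder stream name
    Data=json.dumps({"type": "page_view", "user_id": "u-42"}).encode("utf-8"),
    PartitionKey="u-42",
)

# Messaging: a discrete task for another agent goes onto a queue for reliable coordination.
sqs.send_message(
    QueueUrl=QUEUE_URL,
    MessageBody=json.dumps({"task": "summarise_session", "user_id": "u-42"}),
)

# The consuming agent long-polls the queue and deletes a message only after the work succeeds.
messages = sqs.receive_message(QueueUrl=QUEUE_URL, MaxNumberOfMessages=5, WaitTimeSeconds=10)
for msg in messages.get("Messages", []):
    task = json.loads(msg["Body"])
    # ... run the agent step described by `task` ...
    sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=msg["ReceiptHandle"])
```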

  5. DAT401 (Expert)

    Real-Time Data Lakes with Apache Iceberg, Amazon MSK, and Amazon S3

    Learn how to optimize Apache Iceberg data lakes on Amazon S3 for cost-effectiveness while enabling real-time analytics. This session explores S3 Tables deployments, focusing on streaming data from Apache Kafka via Amazon MSK into Iceberg format. Discover practical approaches for real-time table maintenance, metadata optimization for high-velocity writes, and data compaction strategies. Implement cost-effective retention policies using S3 Lifecycle configurations while maintaining sub-minute data freshness. See how MSK's native Iceberg integration eliminates pipeline overhead, reducing latency and operational costs. Gain actionable insights for balancing streaming performance with cost optimization at scale.
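
    One way the Kafka-to-Iceberg path can look in code is a PySpark Structured Streaming job that reads from an MSK topic and appends to an Iceberg table. This is a hedged sketch, not the session's reference implementation: the brokers, topic, checkpoint path, and table name are placeholders, and the Iceberg catalog (for example Glue or S3 Tables) must already be configured on the Spark session.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StringType, StructField, StructType, TimestampType

# Assumes the Kafka and Iceberg packages plus an Iceberg catalog are configured on the session.
spark = SparkSession.builder.appName("msk-to-iceberg").getOrCreate()

schema = StructType([
    StructField("user_id", StringType()),
    StructField("event_type", StringType()),
    StructField("event_time", TimestampType()),
])

events = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "b-1.example.kafka:9092")   # placeholder MSK brokers
    .option("subscribe", "clickstream")                            # placeholder topic
    .option("startingOffsets", "latest")
    .load()
    .select(from_json(col("value").cast("string"), schema).alias("e"))
    .select("e.*")
)

# Each micro-batch is committed as an Iceberg snapshot; compaction keeps small files in check.
query = (
    events.writeStream.format("iceberg")
    .outputMode("append")
    .option("checkpointLocation", "s3://my-bucket/checkpoints/clickstream/")  # placeholder
    .toTable("glue_catalog.analytics.clickstream")                            # placeholder table
)
query.awaitTermination()
```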

  6. DAT201 (Intermediate)

    Scaling Data Analytics: Easygo's Modern Lakehouse Journey on AWS

    Discover how Melbourne-based Easygo, powering Stake and Kick.com, transformed their data analytics infrastructure to process over 600,000 daily transactions and tens of millions of streaming events. Learn about their implementation of a modern lakehouse architecture combining Amazon Aurora Zero-ETL integration with Amazon Redshift, Amazon Kinesis with AWS Glue streaming, and Apache Iceberg on Amazon S3. Results include 95% faster queries, 80% fewer ingestion incidents, 9 hours weekly maintenance savings, and accelerated global expansion. Explore practical strategies for building scalable, secure data foundations delivering near real-time analytics with robust governance across regulated markets.

  7. MAE202 (Intermediate)

    Seven's AWS Journey: Streaming Premium Content at the Speed of Innovation

    Join Tim Sheridan, Director of Product & Technology at Seven West Media, as he shares how Seven is leveraging cloud and AI to maximise the return on their most valuable asset — premium live content. With marquee events like the AFL Grand Final and The Ashes cricket series, the stakes couldn't be higher: massive concurrent audiences, critical advertising revenue, and zero tolerance for failure. Tim explains how they leaned on AI-powered developer and business tools to accelerate delivery, de-risk high-profile events, and maximise the return on their premium content investments. Discover how Seven's team transformed their approach to innovation — using cloud-native architecture and AI to improve speed to market, audience experience, and advertising revenue.

  8. WPS301 (Advanced)

    AWS for healthcare analytics: accelerating time to insights

    In today's data-driven healthcare landscape, organisations must rapidly transform diverse data sources into actionable insights that improve patient outcomes and accelerate operational efficiency. This session showcases how AWS' integrated analytics capabilities can deliver unmatched price-performance for every analytics workload, from data processing and SQL analytics to streaming and business intelligence. Through real-world healthcare examples, learn how AWS' built-in governance and scalability enable organisations to build secure, efficient analytics pipelines that accelerate time-to-insight. Ideal for data practitioners, IT decision-makers, and executives evaluating enterprise analytics platforms to drive their data-driven transformation.

  9. STP212 (Intermediate)

    How Apate AI uses Amazon Bedrock and voice AI to catch scammers

    Scams are a global epidemic costing businesses and consumers trillions. Apate AI turns the tables on fraudsters by deploying lifelike conversational AI agents, powered by Amazon Bedrock and speech models served on Amazon SageMaker with bidirectional streaming, that engage scammers in real time to detect, divert, disrupt, and decode their tactics. In this session, learn how Apate AI converts every scam interaction into actionable intelligence and how to build your own voice AI agents on AWS.
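
    The real-time feel of such an agent depends on streaming tokens out of the model rather than waiting for a full completion. A minimal boto3 sketch of streaming a Bedrock response; the model ID is a placeholder and the telephony/speech plumbing around it is out of scope.

```python
import boto3

bedrock_runtime = boto3.client("bedrock-runtime")

# Stream the model's reply chunk by chunk so a voice agent can start speaking immediately.
response = bedrock_runtime.converse_stream(
    modelId="anthropic.claude-3-haiku-20240307-v1:0",   # placeholder model ID
    messages=[{
        "role": "user",
        "content": [{"text": "Hi, I'm calling about an unpaid toll on your account..."}],
    }],
)

for event in response["stream"]:
    if "contentBlockDelta" in event:
        chunk = event["contentBlockDelta"]["delta"].get("text", "")
        print(chunk, end="", flush=True)   # hand each chunk to the TTS pipeline as it arrives
```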

Non-obvious insights

From the Playbook

One sharp, contrarian insight per session — the things teams don't think of unprompted.

  • RAG retrieval quality is dominated by chunking strategy, not embedding model. Boring but true. Spend a week on chunk size, overlap, and semantic boundaries before you spend a dollar on a fancier embedder (a minimal chunking sketch follows these notes). (ANT301 — A practitioner's guide to data for agentic AI)
  • The hardest part of premium live streaming isn't streaming — it's the *ad insertion* at peak. Ad systems break first under burst load. Test those harder than the video pipeline; that's the actual fragile part. (MAE202 — Seven's AWS Journey: Streaming Premium Content at th…)
  • Most "healthcare analytics" wins come from joining clinical and operational data — currently siloed in most hospitals. Bringing them together unlocks insights neither team has individually. The technical work is medium; the political work is hard. (WPS301 — AWS for healthcare analytics: accelerating time to i…)
  • The intelligence value of engaging scammers (their tactics, scripts, escalation patterns) is *bigger* than the disruption value. The product underneath Apate is intelligence-as-a-service to law enforcement, with disruption as the lead-in. Counter-fraud AI is going to look more like an intelligence operation than a defence tool. (STP212 — How Apate AI uses Amazon Bedrock and voice AI to cat…)
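
To make the chunking point concrete, here is a tiny illustration of the two knobs that matter most, chunk size and overlap, with a preference for breaking on paragraph boundaries. The sizes and sample text are arbitrary starting points, not recommendations.

```python
def chunk_text(text: str, chunk_size: int = 800, overlap: int = 100) -> list[str]:
    """Split text into overlapping chunks, preferring paragraph boundaries."""
    chunks, start = [], 0
    while start < len(text):
        end = min(start + chunk_size, len(text))
        # Prefer a semantic boundary (paragraph break) near the end of the window.
        boundary = text.rfind("\n\n", start, end)
        if boundary > start + chunk_size // 2:
            end = boundary
        chunks.append(text[start:end].strip())
        if end >= len(text):
            break
        start = end - overlap   # overlap keeps context that straddles a boundary retrievable
    return chunks

# Sweep the knobs and inspect retrieval quality before swapping the embedding model.
sample = "Stream processing keeps state per key.\n\nWindows group events by time.\n\n" * 40
for size, ov in [(400, 50), (800, 100), (1200, 200)]:
    print(size, ov, len(chunk_text(sample, size, ov)))
```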