Machine Learning & SageMaker

The end-to-end platform for building, training, and deploying ML models.

17 sessions at the summit · 5 external resources

Overview

Amazon SageMaker AI is the managed service for the full machine-learning lifecycle: data labeling, notebooks, training jobs (including distributed training on thousands of GPUs/Trainium chips), hyperparameter tuning, model registry, deployment, and monitoring. SageMaker Unified Studio brings together SageMaker, Bedrock, Glue, EMR, Redshift, and QuickSight in one workspace so data engineers, data scientists, and analysts collaborate on the same data.

Key concepts

  1. Training: SageMaker Training Jobs, distributed training, spot instances
  2. Fine-tuning and distillation for cost-effective specialization
  3. Inference: real-time, serverless, asynchronous, batch transform
  4. Model Registry, Pipelines, and MLOps automation
  5. Feature Store for reusable features across teams
  6. AWS Trainium and Inferentia for cost-optimized ML
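Of the concepts above, hyperparameter tuning is the easiest to illustrate locally. SageMaker's managed tuning explores a search space of training-job configurations for you; the sketch below shows the underlying idea as a plain-Python random search over a toy objective. The function names, search space, and objective are all illustrative, not SageMaker APIs:

```python
import random

def random_search(objective, space, n_trials=20, seed=0):
    """Sample hyperparameter combinations at random and keep the best."""
    rng = random.Random(seed)
    best_params, best_score = None, float("inf")
    for _ in range(n_trials):
        params = {name: rng.choice(values) for name, values in space.items()}
        score = objective(params)  # lower is better, e.g. validation loss
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score

# Toy objective: pretend validation loss is minimized at lr=0.1, batch_size=64.
def toy_loss(p):
    return abs(p["lr"] - 0.1) + abs(p["batch_size"] - 64) / 64

space = {"lr": [0.001, 0.01, 0.1, 0.3], "batch_size": [16, 32, 64, 128]}
best, score = random_search(toy_loss, space, n_trials=50)
```

Managed tuners add early stopping, Bayesian search, and parallel trial scheduling on top of this basic loop, but the contract is the same: an objective metric, a search space, and a trial budget.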

Key AWS services

  • Amazon SageMaker AI
  • SageMaker Unified Studio
  • AWS Trainium
  • AWS Inferentia
  • SageMaker JumpStart

Learn more — curated resources

Hand-picked official docs, foundational papers, and the best community guides for going deeper on this topic.

Sessions on this topic

17 sessions from the Summit covered this topic. Each is a self-contained mini-lesson.

  1. AIM401 (Expert)

    Beyond API Dependency: Fine-tuning Cost-Effective Models on AWS

    As API costs for general-purpose LLMs rise, relying solely on off-the-shelf models can quickly undermine both cost control and system reliability. In this session, we share how Nearmap moved beyond API dependency by fine-tuning and distilling domain-specific models on AWS to analyze 300 million building permits for roof modifications. We'll discuss our approach to generating and structuring training data, distilling large models into smaller, production-ready alternatives, evaluating trade-offs across model architectures, and making data-driven accuracy-versus-cost decisions before deployment. Attendees will leave with concrete patterns for shipping efficient, specialized models into production.
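The abstract above doesn't include code, but the distillation it describes rests on a standard idea: train the small model to match the large model's temperature-softened output distribution rather than hard labels. A minimal sketch in plain Python, with made-up teacher logits and temperature purely for illustration:

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; higher temperature gives softer targets."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# A teacher model's raw scores for three classes (illustrative values).
teacher_logits = [4.0, 1.0, 0.2]

hard_targets = softmax(teacher_logits, temperature=1.0)
soft_targets = softmax(teacher_logits, temperature=4.0)  # what the student imitates
```

At temperature 1 the teacher's top class dominates; at a higher temperature the distribution flattens, exposing how similar the remaining classes are to each other. That relative-similarity signal is the extra information distillation transfers beyond the hard label.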

  2. ANT301 (Advanced)

    A practitioner's guide to data for agentic AI

    In this session, gain the skills needed to deploy end-to-end agentic AI applications using your most valuable data. The session focuses on data management using techniques like Model Context Protocol (MCP) and Retrieval Augmented Generation (RAG), and provides concepts that apply to other methods of customizing agentic AI applications. Discover best-practice architectures using AWS database services like Amazon Aurora and Amazon OpenSearch Service, along with the analytics, data processing, and streaming experiences found in SageMaker Unified Studio. Learn data lake, governance, and data quality concepts, and how Amazon Bedrock AgentCore, Bedrock Knowledge Bases, and other features tie solution components together.
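A recurring practical detail in the RAG pipelines this session covers is chunking. Below is a minimal sketch of fixed-size sliding-window chunking with overlap, assuming word-based splitting; the function name and defaults are illustrative, and production pipelines usually chunk by tokens and respect sentence or heading boundaries:

```python
def chunk_text(text, chunk_size=200, overlap=50):
    """Split text into overlapping word-based chunks for retrieval indexing."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # final window already reached the end of the text
    return chunks

# 500 words with the defaults -> windows starting at words 0, 150, and 300.
chunks = chunk_text(" ".join(f"w{i}" for i in range(500)))
```

The overlap means a sentence falling on a window boundary still appears whole in at least one chunk, which is typically worth the modest index-size cost.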

  3. MAM307 (Advanced)

    Modernise legacy code using fine-tuned Gen AI models

    Rio Tinto's data science team saw an opportunity to preserve institutional knowledge and improve developer productivity by modernizing a legacy codebase. Rather than attempting a full system overhaul, the team focused first on adding generative AI capabilities to their critical legacy application. By using the proven, open, and trusted data foundation of AWS, the company laid the groundwork for incremental modernization without disrupting core operations. Learn about model fine-tuning against legacy codebases, Amazon Nova, SageMaker JumpStart, and AgentCore in this deep dive with AWS and Rio Tinto.

  4. COP302 (Advanced)

    Applying AI for FinOps and FinOps for AI

    Explore the intersection of AI and FinOps in this advanced session. First, discover how Kiro CLI can simplify AWS cost management by analyzing trends, explaining spend, and recommending optimizations like rightsizing and Savings Plans. Then, dive into FinOps for AI: learn how to track and control generative AI costs across Amazon EC2, Amazon SageMaker, Amazon Bedrock, and more. We'll share architecture patterns, cost-saving strategies, and real-world examples to help you build scalable, production-ready AI solutions while staying on budget. Whether you're optimizing existing workloads or launching new AI initiatives, you'll leave with practical tools to maximize value.

  5. DAT402 (Expert)

    Deep dive into database integrations with AWS Zero-ETL

    Learn how AWS zero-ETL integrations eliminate complex data movement pipelines across multiple database engines, enabling data engineers, architects, and DBAs to reduce maintenance overhead while ensuring near real-time data availability for analytics and ML workloads. Examine the underlying architecture for supported zero-ETL integrations between Amazon Aurora, Amazon DynamoDB, and Amazon RDS sources to Amazon Redshift, Amazon SageMaker, and Amazon OpenSearch Service targets. Explore data movement options, tunable settings, and monitoring capabilities for ongoing data replication, all without traditional ETL complexity.

  6. DEV201 (Intermediate)

    How Flybuys Built AI Governance to Accelerate Adoption at Scale

    Scaling AI successfully isn't just about moving fast — it's about building the right foundations first. In this session, learn how Flybuys focused early on AI governance, steering documents, and engineering standards to enable smooth, secure AI adoption at scale. We'll explore how upfront investment in guardrails, training, and approval processes allowed teams to deploy AI capabilities faster and with confidence. You'll hear how Flybuys is embedding governance and security expectations into engineering workflows using Kiro, including standardised steering patterns, approval pathways, and controlled rollout of AI capabilities such as Powers. Attendees will gain practical insights into how slowing down early can unlock faster, safer AI delivery across the organisation.

  7. DAT303 (Advanced)

    Explore what's new in data and AI governance with SageMaker Catalog

    Join this session to learn about the latest capabilities in Amazon SageMaker Catalog that help organizations govern data and AI more effectively. We will walk through new features that make it easier to discover, govern, and securely share structured and unstructured data, models, business intelligence dashboards, and applications. You'll hear how customers are using these capabilities to improve data discovery and access, streamline compliance, and support AI initiatives.

  8. WPS203 (Intermediate)

    Optimising Outpatient Waitlists with ML at Gold Coast Health

    Deploying ML in high-stakes environments demands enterprise readiness, governance, and continuous monitoring. In this session, you'll learn how Gold Coast Health moved from pilot to production with a predictive model identifying patients unlikely to attend procedures — achieving 33% precision, doubling the 15% manual baseline — while ensuring fairness across cohorts. The session covers real-world ML architecture on Amazon SageMaker Pipelines, production monitoring including data quality, pipeline health, and drift detection, plus navigating AI governance through bias analysis and impact assessment. Whether you're in healthcare, financial services, or any regulated industry, walk away with actionable patterns for deploying responsible ML at scale.
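To make the precision numbers above concrete: precision is the share of flagged patients who truly would not attend. A small worked example with invented counts chosen to match the quoted rates (only the 33% and 15% figures come from the session; the 100-patient scale is hypothetical arithmetic):

```python
def precision(tp, fp):
    """Precision = true positives / everything flagged as positive."""
    return tp / (tp + fp)

# Hypothetical counts: out of 100 flagged patients, 33 truly do not attend
# under the model, versus 15 under manual triage of 100 patients.
model_p = precision(tp=33, fp=67)   # 0.33 -> 67 of 100 flags are false alarms
manual_p = precision(tp=15, fp=85)  # 0.15 manual baseline
lift = model_p / manual_p           # ~2.2x, the "doubling the baseline" claim
```

The false-alarm arithmetic is why the session's point about intervention design matters: a workflow that calls flagged patients tolerates 67% false positives cheaply, whereas one that removes their appointment slots does not.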

  9. FSI207 (Intermediate)

    From enterprise data mesh to AI with Amazon SageMaker Unified Studio

    Financial institutions are unlocking enormous value with AI agents — from personalised customer experiences to better risk decision making. But to deliver on that promise, agents need data they can find, understand, and trust. This session shows how a data mesh architecture on Amazon SageMaker Unified Studio builds that foundation: discoverable data across lines of business, business context that grounds agent responses in real meaning, quality signals that build confidence in every answer, and governed access that keeps you compliant by design. We cover domain ownership, multi-account strategies, data contracts, business glossaries, data quality, and cross-domain governance — and demonstrate how this foundation empowers agentic AI that delivers trusted, accurate results at enterprise scale.

  10. STP213 (Intermediate)

    AI-Powered Farming: How Halter's ML Models Transform Dairy Operations

    New Zealand unicorn agritech startup Halter is revolutionizing dairy farming with AI-powered smart collars that predict critical livestock events. Their machine learning models enable heat detection, calving prediction, pasture optimization, and animal behavior classification, processing data from thousands of GPS-enabled collars across remote farms. By leveraging AWS infrastructure, Halter's engineering team built scalable ML pipelines that help farmers make data-driven decisions, reduce labor costs, and improve animal welfare. Learn how Halter developed production ML models for agriculture, overcame the challenges of training on livestock data, and began their journey toward managed ML services.

  11. STP204 (Intermediate)

    How Heidi Health Fine-Tunes Speech-to-Text Models on AWS

    Join Heidi Health and AWS's Generative AI Innovation Center (GenAIIC) for a behind-the-scenes look at building and deploying custom speech-to-text AI for healthcare. Learn hard-won lessons and a practical blueprint: curating domain-specific training data, fine-tuning open-weight models, validating non-deterministic outputs at scale, and shipping to production with optimized inference. Both teams share how AWS services reduced infrastructure complexity, accelerated iteration cycles, and scaled custom models across diverse real-world use cases — all while maintaining security and cost efficiency. Ideal for ML engineers, data scientists, and technical leaders exploring fine-tuning and production ML on AWS.
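The abstract mentions validating non-deterministic outputs at scale but doesn't name a metric; word error rate (WER) is the standard one for transcription quality. A self-contained sketch, computed as word-level Levenshtein distance normalized by reference length (the medical example sentence is invented):

```python
def word_error_rate(reference, hypothesis):
    """WER: minimum word edits (sub/ins/del) divided by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution / match
    return dp[-1][-1] / len(ref)

# One substitution in five words -> WER of 0.2.
wer = word_error_rate("the patient has atrial fibrillation",
                      "the patient has arterial fibrillation")
```

Note how a single confusable medical term dominates the score of a short utterance, which is why domain-specific fine-tuning data moves this metric more than generic audio does.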

  12. ISV102 (Foundational)

    From documents to voice - building AI products on AWS

    How Affinda leverages Amazon Bedrock (Claude), SageMaker, EKS & CloudFormation to deliver intelligent document processing at enterprise scale, cutting setup time and costs by 90% with 95%+ accuracy. This session will demonstrate how Affinda powers real-world AI product development, from Affinda's Intelligent Document Processing platform to Pathfindr's (acquired by Affinda) custom AI agents. The session will showcase the complete journey of building Honey Insurance's voice agent, Australia's first voice agent in financial services, and how the Affinda-AWS partnership enables rapid AI product development for enterprises.

  13. STP212 (Intermediate)

    How Apate AI uses Amazon Bedrock and voice AI to catch scammers

    Scams are a global epidemic costing businesses and consumers trillions. Apate AI turns the tables on fraudsters by deploying lifelike conversational AI agents, powered by Amazon Bedrock and speech models on Amazon SageMaker bidirectional streaming, that engage scammers in real time to detect, divert, disrupt, and decode their tactics. In this session, learn how Apate AI converts every scam interaction into actionable intelligence and how to build your own voice AI agents on AWS.

  14. STP216 (Intermediate)

    Building AI Agents: From Open-Source Frameworks to Production-Grade

    AI agents are moving from demo to deployment. Startups across ANZ are building production-grade assistants using open-source orchestration frameworks, fine-tuned foundation models, and GPU-accelerated inference on AWS and NVIDIA infrastructure. This panel explores what it actually takes to ship agentic use cases, from choosing the right models and frameworks to managing latency, cost, and reliability at scale. We'll hear from AirTree VC on where the investment thesis is heading, from NVIDIA on how accelerated compute is shaping the agent stack, and from Heidi Health on building and scaling these systems in production. Whether it's vertical agents for healthcare, customer support, or code generation, we'll focus on what's working, what's hype, and where the real startup opportunities lie in the agent ecosystem.

  15. IND101 (Foundational)

    Test, Learn, Iterate: Amazon Connect Success

    Discover how Flybuys achieved rapid contact centre transformation through early Amazon Connect adoption using AI-powered capabilities and a disciplined Test, Learn, Iterate approach. Starting with a focused pilot, they deployed AI-driven features like intelligent routing, real-time sentiment analysis, and automated quality assurance. They progressed through Launch, Activate, and Consume phases, capturing baseline metrics, scaling through peer-led training, and continuously refining AI performance based on weekly feedback loops. The results: reduced AHT, improved CSAT, 100% AI-powered QA coverage, and measurable ROI. This demonstrates that early AI adoption delivers calculated, data-driven transformation.

  16. FSI202 (Intermediate)

    Accelerating Payment Innovation: Spec-Driven Development with AWS Kiro

    Australian Payments Plus (AP+), operator of Australia's critical payment infrastructure including eftpos, BPAY, and NPP and processor of millions of daily transactions, transformed their development practices by adopting Spec-Driven Development using AWS Kiro. AP+ manages the payment rails connecting banks, merchants, and consumers throughout Australia. Through intensive Event-Driven Architecture bootcamps and hands-on training, engineering teams now independently run development workshops every two weeks, accelerating delivery of payment platform innovations while maintaining the highest security and compliance standards required for national financial infrastructure. Learn the practical framework for building development velocity in regulated environments.

  17. MAE204 (Intermediate)

    How Amazon Ads Creative Agent uses AWS to democratize ad creation

    Media advertisers see up to 25% higher engagement when delivering custom creative to relevant audiences, yet producing quality video ads traditionally requires weeks of expensive, specialized work. Discover the inner workings of Amazon Ads' new AI Creative Agent, and how it's transforming the creative process by automating and enhancing the generation of multi-format ads for businesses regardless of their size or creative expertise. Explore how Amazon Bedrock, custom-built ML models, GPUs, and model evaluations are used to orchestrate and generate compelling ad creatives, turning conversational natural language into full video productions with professional voiceovers, while reducing creative development time.


External links matched to this topic via topic relevance. The KB does not endorse third-party content; verify before citing.

Non-obvious insights

From the Playbook

One sharp, contrarian insight per session — the things teams don't think of unprompted.

For genuinely domain-specific tasks, a fine-tuned 7B-class model often *beats* a frontier model on the metric that matters — because it overfits to *your* distribution. That's not a bug; it's the feature you're paying for. (AIM401 — Beyond API Dependency: Fine-tuning Cost-Effective Mo…)
RAG retrieval quality is dominated by chunking strategy, not embedding model. Boring but true. Spend a week on chunk size, overlap, and semantic boundaries before you spend a dollar on a fancier embedder. (ANT301 — A practitioner's guide to data for agentic AI)
33% precision means 67% false positives. Deployment success depends on what you *do* with the prediction — calling patients vs. removing slots vs. double-booking. The model is only as good as the workflow around it. Build the intervention design before chasing higher precision. (WPS203 — Optimising Outpatient Waitlists with ML at Gold Coas…)
Most data mesh failures aren't technical — they're domain teams refusing to own their output. The CDO who can convince domain VPs to accept ownership is worth more than the platform itself. Hire for influence, not just engineering. (FSI207 — From enterprise data mesh to AI with Amazon SageMake…)
The hardest engineering problem in agritech ML isn't the model — it's *connectivity*. Cellular dead zones in rural farms are everywhere. Edge inference + delayed sync is the operating reality. Most cloud-first ML architectures don't survive contact with rural Australia. (STP213 — AI-Powered Farming: How Halter's ML Models Transform…)
The dominant accuracy issue in healthcare STT in Australia isn't medical jargon — it's *accents and code-switching*. Patient cohorts are linguistically diverse; clinicians switch registers. Train accordingly; English-only test sets miss most of the failure cases. (STP204 — How Heidi Health Fine-Tunes Speech-to-Text Models on…)