Data Governance & Privacy

Make data discoverable, trustworthy, and compliant.

7 sessions at the summit, 4 external resources

Overview

Modern data governance balances access and control. Amazon DataZone provides a business-friendly data catalog, AWS Lake Formation enforces fine-grained access on the data lake, AWS Glue Data Catalog is the technical metadata store, and Amazon Macie discovers PII in S3. Active metadata, lineage (via OpenLineage), and data contracts are emerging best practices. For AI specifically, model cards, data sheets, and AI guardrails extend governance to ML/LLM systems.

Key concepts

  1. Data catalog vs. data marketplace vs. data product
  2. Fine-grained access: row, column, cell-level
  3. Lineage and impact analysis
  4. PII discovery and classification
  5. AI governance: model cards, evaluation, watermarking
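Fine-grained access in Lake Formation is granted per principal, per table, down to specific columns. A minimal sketch of building such a grant request (the role ARN, database, table, and column names are hypothetical; the request shape follows the Lake Formation `GrantPermissions` API and would be passed to `boto3.client("lakeformation").grant_permissions(**req)`):

```python
def column_grant_request(principal_arn, database, table, allowed_columns):
    """Build a Lake Formation grant_permissions request that restricts
    SELECT to an explicit allow-list of columns (column-level access)."""
    return {
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        "Resource": {
            "TableWithColumns": {
                "DatabaseName": database,
                "Name": table,
                "ColumnNames": allowed_columns,
            }
        },
        "Permissions": ["SELECT"],
    }

# Hypothetical grant: the analyst role can read order data but the
# allow-list deliberately omits any PII columns.
req = column_grant_request(
    "arn:aws:iam::123456789012:role/AnalystRole",
    "sales",
    "orders",
    ["order_id", "order_date", "amount"],
)
```

Keeping the request as a plain dict makes the column allow-list easy to review and diff before it is ever applied to the account.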

Key AWS services

  • Amazon DataZone
  • AWS Lake Formation
  • AWS Glue Data Catalog
  • Amazon Macie
  • AWS Audit Manager

Learn more — curated resources

Hand-picked official docs, foundational papers, and the best community guides for going deeper on this topic.

Sessions on this topic

7 sessions from the Summit covered this topic. Each is a self-contained mini-lesson.

  1. ANT301 (Advanced)

    A practitioner's guide to data for agentic AI

    In this session, gain the skills needed to deploy end-to-end agentic AI applications using your most valuable data. The session focuses on data management for techniques such as Model Context Protocol (MCP) and Retrieval-Augmented Generation (RAG), and introduces concepts that apply to other methods of customizing agentic AI applications. Discover best-practice architectures using AWS database services like Amazon Aurora and OpenSearch Service, along with the analytics, data processing, and streaming experiences found in SageMaker Unified Studio. Learn data lake, governance, and data quality concepts, and see how Amazon Bedrock AgentCore, Amazon Bedrock Knowledge Bases, and other features tie the solution components together.

  2. ARC301 (Advanced)

    Build an AI-ready data foundation

    An unparalleled level of interest in generative AI and agentic AI is driving organizations to rethink their data strategy. While data foundation constructs such as data pipelines, data architectures, data stores, and data governance need to evolve, business elements like cost-efficiency and effective collaboration across data estates need to stay constant. In this session, we cover how building your data foundation on AWS provides the tools and building blocks to balance both needs and empowers organizations to grow their data strategy for building AI-ready applications.

  3. STP208 (Intermediate)

    NextAI's LegalScout: A Data Foundation for Private Legal AI

    LegalScout helps Australian SME law firms turn generative AI into a competitive advantage by securely leveraging their own client data and confidential matters to work smarter, not harder. Built with Australian lawyers on AWS, using Amazon Bedrock for inference and Amazon S3 Vectors for secure document search, it automates repetitive work, streamlines workflows, and improves drafting, contract review, and research to boost productivity, reduce costs, and lift accuracy while maintaining strict privacy and compliance.

  4. WPS203 (Intermediate)

    Optimising Outpatient Waitlists with ML at Gold Coast Health

    Deploying ML in high-stakes environments demands enterprise readiness, governance, and continuous monitoring. In this session, you'll learn how Gold Coast Health moved from pilot to production with a predictive model identifying patients unlikely to attend procedures — achieving 33% precision, doubling the 15% manual baseline — while ensuring fairness across cohorts. The session covers real-world ML architecture on Amazon SageMaker Pipelines, production monitoring including data quality, pipeline health, and drift detection, plus navigating AI governance through bias analysis and impact assessment. Whether you're in healthcare, financial services, or any regulated industry, walk away with actionable patterns for deploying responsible ML at scale.

  5. FSI207 (Intermediate)

    From enterprise data mesh to AI with Amazon SageMaker Unified Studio

    Financial institutions are unlocking enormous value with AI agents — from personalised customer experiences to better risk decision making. But to deliver on that promise, agents need data they can find, understand, and trust. This session shows how a data mesh architecture on Amazon SageMaker Unified Studio builds that foundation: discoverable data across lines of business, business context that grounds agent responses in real meaning, quality signals that build confidence in every answer, and governed access that keeps you compliant by design. We cover domain ownership, multi-account strategies, data contracts, business glossaries, data quality, and cross-domain governance — and demonstrate how this foundation empowers agentic AI that delivers trusted, accurate results at enterprise scale.

  6. STP209 (Intermediate)

    How Cartesian Turns AI Agents from SaaS Killer to SaaS Moat

    The invasion of agents into the software market is now a fact of life. Agents are changing how we consume software, services, and information. But as with any technological inflection point, there's a redistribution of power underway, and SaaS platforms are struggling to find their centre of gravity in this new world. In this talk we explore how Cartesian helps platforms lean into their strategic assets, like access to customers and privacy, and find their moat in the agentic age by distributing and monetizing third-party agents.

  7. IDE101 (Foundational)

    From principles to practice: Scaling AI responsibly

    Building AI applications that customers trust requires more than technical excellence; it demands a deliberate approach to managing risk across every stage of the AI lifecycle. As organizations scale their AI initiatives, the challenge of balancing innovation speed with responsible AI practices across dimensions like privacy, security, fairness, safety, and explainability becomes increasingly critical. Join our panelists for a 30-minute discussion exploring:

      • Practical approaches to embedding responsible AI principles into AI application development without slowing down innovation
      • Key considerations across privacy, security, fairness, safety, and explainability that organizations should prioritize
      • Lessons learned from building AI applications that earn and maintain customer trust
      • Strategies for navigating the evolving responsible AI landscape and managing risk at scale

    Whether you are a technical leader building AI solutions, a business decision-maker shaping your organization's AI strategy, or a practitioner looking to deepen your understanding of responsible AI, this session will provide actionable insights to help you build AI applications that are not only innovative but also trustworthy.

Live updates related to this topic

Sourced via Parallel AI Monitor — continuous web watch on 21 topical streams.

External links are matched to this topic by relevance. The KB does not endorse third-party content; verify before citing.

Non-obvious insights

From the Playbook

One sharp, contrarian insight per session — the things teams don't think of unprompted.

RAG retrieval quality is dominated by chunking strategy, not embedding model. Boring but true. Spend a week on chunk size, overlap, and semantic boundaries before you spend a dollar on a fancier embedder. — ANT301 — A practitioner's guide to data for agentic AI
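That tuning loop is cheap to start: even a fixed-size chunker with overlap exposes the two parameters the insight says to sweep first. A minimal sketch (the sizes are illustrative defaults, not recommendations):

```python
def chunk_text(text, chunk_size=500, overlap=100):
    """Fixed-size character chunking with overlap: the baseline to sweep
    (chunk_size, overlap) against retrieval quality before swapping
    embedding models."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks = []
    start = 0
    step = chunk_size - overlap  # each chunk repeats the tail of the last
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += step
    return chunks
```

Semantic-boundary chunking (splitting on headings, sentences, or paragraphs) is the usual next step once this baseline is measured.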
Cost-efficiency in data foundations comes from eliminating duplicate ingestion (the same data landing in three lakes), not from cheaper storage. Storage is a rounding error in 2026; egress and re-processing are not. — ARC301 — Build an AI-ready data foundation
33% precision means 67% false positives. Deployment success depends on what you *do* with the prediction — calling patients vs. removing slots vs. double-booking. The model is only as good as the workflow around it. Build the intervention design before chasing higher precision. — WPS203 — Optimising Outpatient Waitlists with ML at Gold Coas…
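The arithmetic behind that trade-off is worth making explicit. A small sketch with illustrative volumes (the weekly count of 100 flagged patients is hypothetical; only the 33% and 15% rates come from the session):

```python
def flagged_outcomes(n_flagged, precision):
    """Of the patients a model flags, how many are true positives
    (genuinely unlikely to attend) vs false positives (would have come)?"""
    tp = round(n_flagged * precision)
    return tp, n_flagged - tp

# Illustrative numbers: flag 100 patients per week.
model_tp, model_fp = flagged_outcomes(100, 0.33)   # 33 true, 67 false
manual_tp, manual_fp = flagged_outcomes(100, 0.15)  # 15 true, 85 false
```

The false-positive count is what the intervention must absorb: a reminder call is cheap to waste on 67 people; a removed slot is not.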
Most data mesh failures aren't technical — they're domain teams refusing to own their output. The CDO who can convince domain VPs to accept ownership is worth more than the platform itself. Hire for influence, not just engineering. — FSI207 — From enterprise data mesh to AI with Amazon SageMake…
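Domain ownership becomes concrete when each team publishes against a data contract, one of the constructs FSI207 covers. A contract can start as nothing more than a producer-side schema check; a minimal sketch (the contract fields and record shape are invented for illustration):

```python
# Hypothetical contract for a domain's published dataset: required
# columns with expected types, plus fields that must never be null.
CONTRACT = {
    "required_columns": {"customer_id": str, "balance": float, "as_of": str},
    "non_null": ["customer_id", "as_of"],
}

def validate_record(record, contract=CONTRACT):
    """Check one record against the contract and return a list of
    violations; an empty list means the record conforms."""
    errors = []
    for col, typ in contract["required_columns"].items():
        if col not in record:
            errors.append(f"missing column: {col}")
        elif record[col] is not None and not isinstance(record[col], typ):
            errors.append(f"wrong type for {col}")
    for col in contract["non_null"]:
        if record.get(col) is None:
            errors.append(f"null not allowed: {col}")
    return errors
```

Running this in the producer's pipeline, before data lands in the catalog, is what turns "ownership" from a slide into an enforced obligation.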
The SaaS moat in the agentic era is *agent governance* — not features. Who decides which agents touch your customer's data, in what order, with what audit trail? That's not a feature you build; it's a position you claim. The first mover in each vertical will own it. — STP209 — How Cartesian Turns AI Agents from SaaS Killer to Sa…
The orgs that deploy responsible AI fastest are the ones that already had strong product safety review processes — they're extending an existing muscle. Orgs without that muscle have to build it first; the schedule is real and underestimated. Plan for 6–12 months of muscle-building if you're starting cold. — IDE101 — From principles to practice: Scaling AI responsibly