Retrieval Augmented Generation (RAG)

Overview

RAG combines a retrieval system (usually a vector database) with a generative model so the model can answer questions about private or up-to-date data it was never trained on. Documents are chunked, embedded into vectors, and stored. At query time, the question is embedded with the same embedding model, the most similar chunks are retrieved, and they are inserted into the model's context alongside the question. AWS offers Amazon Bedrock Knowledge Bases as a managed RAG pipeline; vector store options on AWS include Amazon OpenSearch Serverless, Aurora PostgreSQL (pgvector), Amazon MemoryDB, and Amazon S3 Vectors.
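The pipeline described above (chunk, embed, store, retrieve, insert into context) can be sketched end to end in a few dozen lines. This is a toy under stated assumptions: a sparse term-frequency vector stands in for a real embedding model, a Python list stands in for the vector database, and chunking is naive fixed-size word windows. The helper names (`embed`, `InMemoryIndex`, `build_prompt`) are hypothetical and are not part of any AWS API.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    # Toy stand-in for an embedding model: a sparse term-frequency vector.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse vectors.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def chunk(doc: str, size: int = 40) -> list[str]:
    # Naive fixed-size chunking by word count.
    words = doc.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


class InMemoryIndex:
    """Hypothetical in-memory stand-in for a vector store."""

    def __init__(self) -> None:
        self.items: list[tuple[str, Counter]] = []

    def add(self, doc: str) -> None:
        # Ingest: chunk the document, embed each chunk, store (text, vector) pairs.
        for c in chunk(doc):
            self.items.append((c, embed(c)))

    def search(self, query: str, k: int = 3) -> list[str]:
        # Query time: embed the question the same way and rank chunks by similarity.
        q = embed(query)
        ranked = sorted(self.items, key=lambda item: cosine(q, item[1]), reverse=True)
        return [text for text, _ in ranked[:k]]


def build_prompt(question: str, contexts: list[str]) -> str:
    # Insert the retrieved chunks into the model's context ahead of the question.
    context = "\n".join(f"- {c}" for c in contexts)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


index = InMemoryIndex()
index.add("Aurora PostgreSQL supports the pgvector extension for similarity search.")
index.add("S3 Vectors stores embeddings directly in S3 vector buckets.")
question = "Which database has pgvector?"
top = index.search(question, k=1)
prompt = build_prompt(question, top)
print(prompt)
```

Swapping `embed` for a real embedding model and `InMemoryIndex` for a managed vector store changes the components, not the shape of the flow.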

Key concepts

Embeddings and vector similarity (cosine, dot product)
Chunking strategies — fixed, semantic, hierarchical
Hybrid search — combining keyword (BM25) and vector search
Reranking with cross-encoders for precision
Graph RAG — using knowledge graphs alongside vectors
Evaluation: faithfulness, answer relevance, context recall
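Two of the concepts above can be made concrete in plain Python: the difference between cosine similarity and dot product, and hybrid search. For the hybrid step this sketch uses Reciprocal Rank Fusion (RRF), one common way to merge a BM25 ranking with a vector ranking by rank position rather than by raw score. It is an illustrative choice, not a description of how any particular AWS service fuses results. The constant k=60 is a widely used default, and the document IDs are made up.

```python
import math


def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def cosine(a: list[float], b: list[float]) -> float:
    # Cosine ignores vector length; the raw dot product does not.
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def normalize(v: list[float]) -> list[float]:
    n = math.sqrt(dot(v, v))
    return [x / n for x in v]


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    # Each document scores sum(1 / (k + rank)) over every ranking it appears in.
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


a, b = [3.0, 4.0], [6.0, 8.0]  # same direction, b is twice as long
print(dot(a, b), cosine(a, b))  # 50.0 1.0: dot grows with length, cosine does not
print(dot(normalize(a), normalize(b)))  # after normalisation, dot product equals cosine

bm25_ranking = ["d1", "d2", "d3"]    # keyword (BM25) results, best first
vector_ranking = ["d3", "d1", "d4"]  # vector-search results, best first
fused = reciprocal_rank_fusion([bm25_ranking, vector_ranking])
print(fused)  # ['d1', 'd3', 'd2', 'd4']
```

Fusing by rank sidesteps the fact that BM25 scores and similarity scores live on different, incomparable scales.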

Key AWS services

Bedrock Knowledge Bases
Amazon OpenSearch Serverless
Aurora PostgreSQL (pgvector)
Amazon MemoryDB
Amazon S3 Vectors
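For Bedrock Knowledge Bases, the retrieval half of the pipeline is exposed through the Retrieve operation on boto3's `bedrock-agent-runtime` client (RetrieveAndGenerate also runs the generation step). A minimal sketch, assuming an existing knowledge base: `EXAMPLEKBID` and both helper functions are hypothetical, and whether `overrideSearchType="HYBRID"` is accepted depends on the vector store behind the knowledge base, so check the current Bedrock documentation before relying on it.

```python
from typing import Optional


def build_retrieve_request(kb_id: str, query: str, top_k: int = 5,
                           search_type: Optional[str] = None) -> dict:
    # Builds the keyword arguments for the bedrock-agent-runtime Retrieve call.
    vector_config: dict = {"numberOfResults": top_k}
    if search_type is not None:
        # "HYBRID" or "SEMANTIC"; hybrid search is only supported by some vector stores.
        vector_config["overrideSearchType"] = search_type
    return {
        "knowledgeBaseId": kb_id,
        "retrievalQuery": {"text": query},
        "retrievalConfiguration": {"vectorSearchConfiguration": vector_config},
    }


def retrieve_chunks(kb_id: str, query: str, top_k: int = 5,
                    search_type: Optional[str] = None) -> list:
    # Imported lazily so the request builder above runs without boto3 installed.
    import boto3

    client = boto3.client("bedrock-agent-runtime")
    response = client.retrieve(**build_retrieve_request(kb_id, query, top_k, search_type))
    # Each result carries the chunk text, its source location and a relevance score.
    return [(r["content"]["text"], r.get("score")) for r in response["retrievalResults"]]


# EXAMPLEKBID is a placeholder; retrieve_chunks needs AWS credentials and a real KB.
request = build_retrieve_request("EXAMPLEKBID", "How is tenant data isolated?",
                                 top_k=3, search_type="HYBRID")
print(request)
```

Keeping the request builder separate from the network call makes the retrieval settings easy to unit-test and to vary in an eval suite.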

Sessions on this topic

51 sessions from the Summit covered this topic. Each is a self-contained mini-lesson.

Live updates related to this topic

Sourced via Parallel AI Monitor — continuous web watch on 21 topical streams. Updated 2026-06-29.

External links matched to this topic via topic relevance. The KB does not endorse third-party content; verify before citing.

Non-obvious insights

From the Playbook

One sharp, contrarian insight per session — the things teams don't think of unprompted.

The single highest-leverage practice in agent ops is the offline eval suite. It's tedious to build, but it unlocks everything downstream: model upgrades, prompt iteration, regression testing, vendor swaps. Teams that skip evals end up trapped on a single model and prompt forever. (AIM201, From demo to deployment: solving agentic AI's toughe…)
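A minimal sketch of the retrieval slice of an offline eval suite, assuming a hand-labelled golden set that maps each question to the chunk IDs a good retriever should return. It scores recall@k, a simple retrieval metric (not the LLM-judged faithfulness or context-recall scores that evaluation frameworks compute), and gates a change on not regressing against the current baseline. The retrievers, chunk IDs and tolerance are hypothetical.

```python
from typing import Callable

# Hypothetical retriever interface: (query, k) -> ranked chunk IDs.
Retriever = Callable[[str, int], list[str]]


def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    # Fraction of the labelled relevant chunks that appear in the top k.
    if not relevant:
        return 1.0
    return len(set(retrieved[:k]) & relevant) / len(relevant)


def evaluate(retriever: Retriever, golden: dict[str, set[str]], k: int = 3) -> float:
    # Mean recall@k over a fixed, versioned set of labelled questions.
    scores = [recall_at_k(retriever(q, k), relevant, k) for q, relevant in golden.items()]
    return sum(scores) / len(scores)


def passes_regression(baseline: float, candidate: float, tolerance: float = 0.01) -> bool:
    # Gate a model, prompt or vendor swap: the candidate may not score meaningfully lower.
    return candidate >= baseline - tolerance


golden = {
    "What is the refund window?": {"policy-3"},
    "How long are logs retained?": {"ops-1", "ops-2"},
}


def baseline_retriever(query: str, k: int) -> list[str]:
    return ["policy-3", "faq-9", "faq-2"] if "refund" in query else ["ops-1", "faq-4", "faq-5"]


def candidate_retriever(query: str, k: int) -> list[str]:
    return ["policy-3", "ops-2", "faq-9"] if "refund" in query else ["ops-2", "ops-1", "faq-4"]


base = evaluate(baseline_retriever, golden)
cand = evaluate(candidate_retriever, golden)
print(base, cand, passes_regression(base, cand))
```

Because the golden set is fixed, the same harness can score a new embedder, chunking scheme or model tier and give a like-for-like comparison.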

RAG retrieval quality is dominated by chunking strategy, not by the choice of embedding model. Boring but true. Spend a week on chunk size, overlap, and semantic boundaries before you spend a dollar on a fancier embedder. (ANT301, A practitioners guide to data for agentic AI)
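The three knobs named above (chunk size, overlap and semantic boundaries) can be sketched as two small chunkers to experiment with. Assumptions: size is counted in whitespace-separated words rather than model tokens, and a regex sentence splitter stands in for real semantic-boundary detection; neither function comes from a specific library.

```python
import re


def fixed_chunks(text: str, size: int, overlap: int) -> list[str]:
    # Fixed-size word windows; consecutive chunks share `overlap` words.
    if not 0 <= overlap < size:
        raise ValueError("need 0 <= overlap < size")
    words = text.split()
    if not words:
        return []
    step = size - overlap
    # Stop before a window would contain nothing but overlap from the previous one.
    return [" ".join(words[i:i + size]) for i in range(0, max(len(words) - overlap, 1), step)]


def sentence_chunks(text: str, max_words: int) -> list[str]:
    # Pack whole sentences into chunks of at most max_words without splitting a sentence
    # (a single sentence longer than max_words becomes its own oversized chunk).
    if not text.strip():
        return []
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current: list[str] = []
    count = 0
    for sentence in sentences:
        n = len(sentence.split())
        if current and count + n > max_words:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(sentence)
        count += n
    if current:
        chunks.append(" ".join(current))
    return chunks


print(fixed_chunks("a b c d e f g h i j", size=4, overlap=1))
# ['a b c d', 'd e f g', 'g h i j']
print(sentence_chunks("RAG needs chunks. Size matters. Overlap helps recall. "
                      "Boundaries keep ideas whole.", max_words=5))
# ['RAG needs chunks. Size matters.', 'Overlap helps recall.', 'Boundaries keep ideas whole.']
```

Running both chunkers over the same corpus and scoring each with a retrieval eval is a cheap way to spend that week before touching the embedder.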

"AI-native" is mistakenly framed as a tech change. It's actually a *procurement* change. Your buying decisions need to weight "agent-friendly APIs" and "structured outputs" as first-class criteria. Half the AI roadblocks come from vendors whose APIs aren't built for this. (DAT304, AI-Native by Design: How Deputy Rewired Its Operatin…)

The Code Interpreter sandbox is the safest pattern most teams ignore. It lets you give agents *capability* without giving them *prod access*. Sandbox + result-passing handles 80% of the "agent needs to run code" problem with a fraction of the blast radius. (MAM306, Adding Agentic AI to legacy apps with Amazon Bedrock…)

Modular AgentCore decomposition lets you swap models per stage. Use a cheap model for triage ("is this even worth processing?"), a mid-tier for the bulk, and an expensive model only for ambiguous cases that fail confidence checks. Don't run uniform inference. The cost difference is 10×. (ISV302, Architecting Scalable AI Agents using Amazon Bedrock…)
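The staged routing described above can be sketched as a confidence-gated cascade. Everything model-specific here is a hypothetical placeholder: the tier callables, the `Answer.confidence` score and the 0.8 threshold. In a real system the tiers would be calls to different model endpoints and confidence might come from a verifier model or rule checks. This is plain Python, not the AgentCore API.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Answer:
    text: str
    confidence: float  # hypothetical 0..1 score from the model or a separate checker


Model = Callable[[str], Answer]


def cascade(request: str, triage: Callable[[str], bool],
            tiers: list[tuple[str, Model]],
            threshold: float = 0.8) -> tuple[Optional[Answer], str]:
    if not tiers:
        raise ValueError("need at least one model tier")
    # Stage 1: a cheap check decides whether the request is worth processing at all.
    if not triage(request):
        return None, "triage"
    # Later stages: cheapest tier first; stop at the first answer that clears the bar.
    for name, model in tiers:
        answer = model(request)
        if answer.confidence >= threshold:
            return answer, name
    # Nothing cleared the threshold: return the most expensive tier's answer.
    return answer, name


def is_worth_processing(request: str) -> bool:
    return bool(request.strip())  # stand-in for a small triage model


def mid_tier(request: str) -> Answer:
    return Answer(f"mid: {request}", 0.9 if len(request) < 40 else 0.5)


def top_tier(request: str) -> Answer:
    return Answer(f"top: {request}", 0.95)


tiers = [("mid", mid_tier), ("top", top_tier)]
print(cascade("Summarise ticket 42", is_worth_processing, tiers))
print(cascade("   ", is_worth_processing, tiers))
print(cascade("Reconcile these two conflicting contract clauses, please", is_worth_processing, tiers))
```

Most requests stop at the cheap or mid tier, so the expensive model is only paid for on the cases that actually need it.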

Most teams optimise for retrieval *quality* and forget *tail latency*. A single cold tenant with a giant document set will kill P99 for everyone unless you isolate aggressively. Per-tenant query budgets are a feature, not a limitation. (STP205, How Dovetail powers Multi-Tenant Agents with Vector …)
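Per-tenant query budgets can be sketched as a token bucket keyed by tenant ID, so a tenant with a giant document set exhausts its own allowance rather than everyone's latency headroom. The rate, burst and per-query cost are hypothetical knobs, a token bucket is just one common way to implement a budget (the session may have used something else), and a multi-instance service would keep this state in a shared store rather than in process memory.

```python
import time
from typing import Callable


class TenantBudget:
    """Per-tenant token bucket: `rate` queries/second, bursts up to `burst`."""

    def __init__(self, rate: float, burst: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.burst = burst
        self.clock = clock
        self.buckets: dict[str, tuple[float, float]] = {}  # tenant -> (tokens, last update)

    def allow(self, tenant: str, cost: float = 1.0) -> bool:
        # Refill this tenant's bucket for the time elapsed, capped at the burst size.
        now = self.clock()
        tokens, last = self.buckets.get(tenant, (self.burst, now))
        tokens = min(self.burst, tokens + (now - last) * self.rate)
        allowed = tokens >= cost
        if allowed:
            tokens -= cost  # a higher cost could be charged for queries over huge corpora
        self.buckets[tenant] = (tokens, now)
        return allowed


fake_now = [0.0]
budget = TenantBudget(rate=1.0, burst=2.0, clock=lambda: fake_now[0])
print([budget.allow("big-tenant") for _ in range(3)])  # [True, True, False]
print(budget.allow("small-tenant"))                    # True: separate bucket
fake_now[0] = 1.0
print(budget.allow("big-tenant"))                      # True: one token refilled
```

Injecting the clock keeps the bucket deterministic in tests; in production the default `time.monotonic` is used.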

Sessions on this topic

From demo to deployment: solving agentic AI's toughest challenges

A practitioners guide to data for agentic AI

AI-Native by Design: How Deputy Rewired Its Operating Model on AWS

Adding Agentic AI to legacy apps with Amazon Bedrock AgentCore

Architecting Scalable AI Agents using Amazon Bedrock AgentCore

How Dovetail powers Multi-Tenant Agents with Vector Indexing at Scale

Data Observability Without the Pain - Lessons from a Production System

AWS Security Agent: Proactive AppSec from Design to Deployment

Transforming from SaaS to multi-tenant agentic SaaS

Digital transformation excellence using agentic AI

Postman and the Future of AI-Driven API Development in 2026

Charting the CX Frontier: A Cohesive, AI-Enabled Engagement Platform

How Auto & General leverage observability foundations for AI

How NAB is Conquering Multi-Cloud to Secure the Enterprise

Postman and the Future of AI-Driven API Development in 2026

Unite Teams, Tools, and AI to Drive Transformation at Scale

AI Native Development: Strategies and Impact across Amazon and AWS

AI Native Development: Strategies and Impact across Amazon and AWS

Advanced AI Security: Architecting Defense-in-Depth for AI Workloads

Accelerate Your Cloud Journey with AWS Transform

Rolling to Scale: Roller's Multi-Tenant SaaS platform on AWS

How Flybuys Built AI Governance to Accelerate Adoption at Scale

NextAI's LegalScout: A Data Foundation for Private Legal AI

Advanced AI Security: Architecting Defense-in-Depth for AI Workloads

Serverless Developer Experience: Day in a life of builder

Scaling RAG to Millions of Vectors: The Squiz Story

Secure Multi-tenant SaaS with AWS Lambda: A Tenant Isolation Deep Dive

AI Powered Resilience Lifecycle

Secure Multi-tenant SaaS with AWS Lambda: A Tenant Isolation Deep Dive

Diversity In Tech - AI Literacy Skills - Rapid prototyping with Kiro

MCP on EKS: Xero's AI-Driven Developer Experience

Charting the CX Frontier: A Cohesive, AI-Enabled Engagement Platform

PMY Delivers Realtime Crowd Analytics at the F1 Australian Grand Prix

Structured Approach to AI coding with Spec-Driven Development on Kiro

The AI Challenge You Don't Yet Know About - Software Supply Chain

Architecting for growth and resilience: Cell based design deep dive

Seven's AWS Journey: Streaming Premium Content at the Speed of Innovation

AI-Powered Farming: How Halter's ML Models Transform Dairy Operations

Using Tools and Agents in Generative AI applications

From principles to practice: Scaling AI responsibly

From documents to voice - building AI products on AWS

Power of Possibility: Leading Through Innovation and Connection

Behind the curtain: How Amazons AI innovations are powered by AWS

Transforming software license efficiency - Human-centered AI on AWS

hipages Journey Towards an Agentic Engineering Organisation

Zero-Downtime Migration from Sydney to Auckland (ap-southeast-6)

Test, Learn, Iterate: Amazon Connect Success

How scalable data foundations helped TGE unlock the power of AI

From GRC Platform to AI-Native Risk Intelligence on AWS:Protecht Story

How Canva Scales and Optimizes AI Workloads with Karpenter

How HBF Transformed Claims Processing From Two Weeks to Two Minutes

Live updates related to this topic

[2606.06448] Agent Memory: Characterization and System ...

Building an Agentic Access-Aware RAG System with Amazon FSx ...

Oracle AI Database Agentic AI: Enterprise Features Address Data ...

AI Agent Memory and RAG: The Complete 2026 Guide

How to Build AI Agent Memory in 2026 - Fountain City

Related topics

Security, Identity & Compliance

Agentic AI

Generative AI & Foundation Models

Voice & Conversational AI