As API costs for general-purpose LLMs rise, relying solely on off-the-shelf models can quickly undermine both cost control and system reliability. In this session, we share how Nearmap moved beyond API dependency by fine-tuning and distilling domain-specific models on AWS to analyze 300 million building permits for roof modifications. We'll discuss our approach to generating and structuring training data, distilling large models into smaller, production-ready alternatives, evaluating trade-offs across model architectures, and making data-driven accuracy-versus-cost decisions before deployment. Attendees will leave with concrete patterns for shipping efficient, specialized models into production.
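To make the accuracy-versus-cost decision mentioned above concrete, here is a minimal sketch of one possible selection rule: pick the cheapest candidate that still meets a minimum accuracy on a held-out set, then project its cost over the full corpus. This is not the speakers' method. The model names, accuracy figures, per-document prices, and the 0.93 threshold are all hypothetical placeholders; only the 300 million permit count comes from the abstract.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Candidate:
    """One evaluated model. All values used below are hypothetical placeholders."""
    name: str
    accuracy: float          # fraction correct on a held-out labeled set
    cost_per_1k_docs: float  # USD per 1,000 documents processed


def total_cost(candidate: Candidate, num_docs: int) -> float:
    """Projected cost in USD of running `candidate` over `num_docs` documents."""
    return candidate.cost_per_1k_docs * num_docs / 1_000


def cheapest_meeting_target(
    candidates: Iterable[Candidate], min_accuracy: float
) -> Optional[Candidate]:
    """Return the lowest-cost candidate with accuracy >= min_accuracy, or None."""
    eligible = [c for c in candidates if c.accuracy >= min_accuracy]
    return min(eligible, key=lambda c: c.cost_per_1k_docs, default=None)


# Hypothetical numbers for illustration only; not results from the session.
CANDIDATES = [
    Candidate("general-purpose-api", accuracy=0.95, cost_per_1k_docs=4.00),
    Candidate("fine-tuned-medium", accuracy=0.94, cost_per_1k_docs=0.50),
    Candidate("distilled-small", accuracy=0.91, cost_per_1k_docs=0.10),
]

NUM_PERMITS = 300_000_000  # corpus size stated in the abstract

if __name__ == "__main__":
    for c in CANDIDATES:
        projected = total_cost(c, NUM_PERMITS)
        print(f"{c.name}: accuracy={c.accuracy:.2f}, projected=${projected:,.0f}")
    choice = cheapest_meeting_target(CANDIDATES, min_accuracy=0.93)
    print("selected:", choice.name if choice else "none meet the target")
```

A hard accuracy floor is only one way to frame the trade-off. A team could instead weight the cost of each error type, in which case the selection function would minimize combined inference and error cost rather than filter on a threshold.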
What this session is about
Playbook
Editorial commentary · what to actually do about this on Monday
Independent editorial perspective — not an official AWS or speaker statement. Designed for executives evaluating what to brief their teams on next.
Live updates related to this session
Sourced via Parallel AI Monitor, a continuous web watch on 21 topical streams.
- mem0.ai high confidence Agent benchmarks & evals
State of AI Agent Memory 2026: Benchmarks, Architectures ...
mem0.ai released the 'State of AI Agent Memory 2026', featuring a new benchmark that evaluates 10 different AI agent memory architectures across 21 integrations to determine the most effective patterns for production agent memory.
- openreview.net high confidence Agent benchmarks & evals
BankerToolBench: Evaluating AI Agents in End-to- ...
- threads.com high confidence Agent benchmarks & evals
Scale AI published SWE Atlas Refactoring Leaderboard ...
- arena.ai high confidence Agent benchmarks & evals
Agent Arena: AI Model Agentic Performance Leaderboard
The former LMSYS team launched 'Agent Arena' on 2026-06-04. Agent Arena is a crowdsourced testbed and leaderboard that ranks AI models based on their performance on real-world agentic tasks, including coding, research, and multi-step workflows. The evaluation focuses on signals s
- itecsonline.com high confidence Agent benchmarks & evals
Agentic AI Governance Framework 2026 | Shadow AI Guide | ITECS
External links matched to this session via topic relevance. The KB does not endorse third-party content; verify before citing.