From Copilot to Colleague · Evidence

Reference · generated 2026-06-11

Evidence & metrics

This page is the single source of truth for what this book is built on: corpus counts from stats.json (which mirrors the repo's STATS.md), the claims ledger from evidence.json, and MASH judge rollups. Cite a claim as claims#N and open it in the evidence graph for timestamp anchors.
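As a minimal sketch of the claims#N citation convention, the snippet below pulls claim numbers out of running text. Only the claims#N shape comes from this page; the function name, the regular expression, and the sample sentence are illustrative assumptions, not part of the book's tooling.

```python
import re

# Assumed citation shape: the literal prefix "claims#" followed by an integer.
CLAIM_REF = re.compile(r"\bclaims#(\d+)\b")


def find_claim_refs(text: str) -> list[int]:
    """Return the claim numbers cited in text, in order of appearance."""
    return [int(number) for number in CLAIM_REF.findall(text)]


# Hypothetical sentence, written only to exercise the pattern.
sample = "Scaffolding beats cleverness (claims#3), and evals act as a control system (claims#8)."
print(find_claim_refs(sample))  # [3, 8]
```

Each extracted number can then be looked up in the ledger or opened in the evidence graph.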

Corpus & pipeline metrics

Source corpus

Videos ingested (AI Engineer YT): 757

Synthesis layer

Themes: 10
People profiled: 50
Concepts (internal): 19
Concepts (public-safe): 2
Chapter drafts: 10

Evidence layer

Claims in ledger: 47
Strong support: 39
Moderate support: 8
Tentative support: 0
Source anchors: 177
High-confidence anchors: 177

Manuscript

Total chapters: 10
Drafting: 10
Starter: 0
Outlined: 0

Diagrams

Overview: 4
Chapter openers: 10
Concepts: 18
Inline figures: 38
Maps: 3
Total diagrams: 73

Method

Research passes: 18
Agent programs: 7
Bounded tasks: 7
Trackable artefacts (total): 966

Practitioners cited: 99
Source videos in graph: 100
Ledger claims: 47
Timestamp anchors: 177
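The counts above can be cross-checked with a little arithmetic. The sketch below copies the figures from this page into plain dictionaries; the key names are hypothetical and are not the actual stats.json schema, which this page does not show.

```python
# Figures copied from this page. Key names are assumptions, not the stats.json schema.
evidence = {"claims": 47, "strong": 39, "moderate": 8, "tentative": 0, "anchors": 177}
diagrams = {"overview": 4, "chapter_openers": 10, "concepts": 18, "inline_figures": 38, "maps": 3}
diagram_total = 73


def check_counts() -> list[str]:
    """Return human-readable mismatches; an empty list means the counts agree."""
    problems = []
    bands = evidence["strong"] + evidence["moderate"] + evidence["tentative"]
    if bands != evidence["claims"]:
        problems.append(f"support bands sum to {bands}, ledger has {evidence['claims']}")
    parts = sum(diagrams.values())
    if parts != diagram_total:
        problems.append(f"diagram categories sum to {parts}, total says {diagram_total}")
    return problems


print(check_counts())  # []
# Average number of timestamp anchors per ledger claim.
print(round(evidence["anchors"] / evidence["claims"], 2))  # 3.77
```

The support bands sum to the 47 ledger claims (39 + 8 + 0), and the five diagram categories sum to the stated total of 73.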

Latest MASH judge run


2026-06-11-094038-1cef · 2026-06-11 · partial run

Humanness: 75
Voice: 81
Usefulness: 42
Evidence: 90
Defensibility: 75
Non-redundancy: 72

Claims ledger

Every book claim with support level and anchor count. Filter by chapter or support band; click a row to inspect sources in the graph.

105 of 105 rows shown, covering 47 distinct claims; a claim is listed once for each chapter it appears in.
| Claim | Ch | Statement | Support | Anchors | Speakers |
|---|---|---|---|---|---|
| claims#1 | 01 | The important transition is from suggestion to delegated execution | strong | 3 | 3 |
| claims#2 | 01 | Chat is an insufficient control surface for long-running or high-stakes work | strong | 3 | 3 |
| claims#3 | 01 | Reliability comes less from model cleverness than from surrounding scaffolding | strong | 3 | 3 |
| claims#12 | 01 | Human oversight works best as an architectural layer, not an afterthought | strong | 3 | 3 |
| claims#40 | 02 | Cheap generation raises the value of taste and judgment rather than lowering it | strong | 3 | 3 |
| claims#41 | 02 | Vibe coding is an exploration mode that fails as a production default | strong | 5 | 5 |
| claims#42 | 02 | Problem framing and review become the scarce skills once execution is cheap | strong | 4 | 4 |
| claims#3 | 03 | Reliability comes less from model cleverness than from surrounding scaffolding | strong | 3 | 3 |
| claims#4 | 03 | Harness quality is a major determinant of coding-agent quality | strong | 4 | 4 |
| claims#5 | 03 | Specs are not paperwork; they are executable intent | strong | 2 | 2 |
| claims#6 | 03 | The practical unit of AI coding is the codebase, not the snippet | strong | 3 | 3 |
| claims#7 | 03 | Agent-ready codebases are designed, not discovered | moderate | 3 | 3 |
| claims#14 | 03 | The harness is evolving from a local loop into a staged software factory | strong | 7 | 6 |
| claims#17 | 03 | Harness quality now includes capability packaging, not only repo hygiene | strong | 4 | 4 |
| claims#24 | 03 | Coordination is the unsolved runtime primitive for multi-agent systems | moderate | 4 | 3 |
| claims#43 | 03 | Coding agents expose the gap between standards a team possesses and standards it can operationalize | strong | 2 | 2 |
| claims#44 | 03 | Subagent specialization makes process explicit and encodes team judgment into roles | moderate | 1 | 1 |
| claims#3 | 04 | Reliability comes less from model cleverness than from surrounding scaffolding | strong | 3 | 3 |
| claims#4 | 04 | Harness quality is a major determinant of coding-agent quality | strong | 4 | 4 |
| claims#5 | 04 | Specs are not paperwork; they are executable intent | strong | 2 | 2 |
| claims#6 | 04 | The practical unit of AI coding is the codebase, not the snippet | strong | 3 | 3 |
| claims#8 | 04 | Evals are a control system, not just a test suite | strong | 7 | 5 |
| claims#9 | 04 | Realistic evals must be grounded in natural tasks and operational history | strong | 3 | 3 |
| claims#19 | 04 | Evals are strongest when they are trace-linked and fed by production observability | strong | 6 | 5 |
| claims#37 | 04 | Activity-based metrics misread motion as progress in AI-augmented work | strong | 4 | 4 |
| claims#42 | 04 | Problem framing and review become the scarce skills once execution is cheap | strong | 4 | 4 |
| claims#45 | 04 | The best evals encode judgment mined from operational history, not invented in a clean room | strong | 2 | 2 |
| claims#9 | 05 | Realistic evals must be grounded in natural tasks and operational history | strong | 3 | 3 |
| claims#10 | 05 | Context failure is often a system-assembly problem, not simply a small-context-window problem | strong | 6 | 5 |
| claims#15 | 05 | The context gap increasingly includes capability packaging and progressive disclosure | strong | 4 | 4 |
| claims#17 | 05 | Harness quality now includes capability packaging, not only repo hygiene | strong | 4 | 4 |
| claims#18 | 05 | Context failure is often a capability-exposure problem, not only a retrieval problem | strong | 3 | 3 |
| claims#25 | 05 | Context engineering is a primary engineering discipline, not a prompt trick | strong | 4 | 4 |
| claims#26 | 05 | RAG, memory, and GraphRAG solve different jobs; collapsing them into one bucket misses the architecture | strong | 7 | 7 |
| claims#27 | 05 | Enterprise usefulness scales with working-set quality, not corpus size | strong | 4 | 4 |
| claims#28 | 05 | The next failure frontier is context misassembly, not just hallucination | strong | 4 | 4 |
| claims#1 | 06 | The important transition is from suggestion to delegated execution | strong | 3 | 3 |
| claims#2 | 06 | Chat is an insufficient control surface for long-running or high-stakes work | strong | 3 | 3 |
| claims#3 | 06 | Reliability comes less from model cleverness than from surrounding scaffolding | strong | 3 | 3 |
| claims#4 | 06 | Harness quality is a major determinant of coding-agent quality | strong | 4 | 4 |
| claims#5 | 06 | Specs are not paperwork; they are executable intent | strong | 2 | 2 |
| claims#8 | 06 | Evals are a control system, not just a test suite | strong | 7 | 5 |
| claims#9 | 06 | Realistic evals must be grounded in natural tasks and operational history | strong | 3 | 3 |
| claims#10 | 06 | Context failure is often a system-assembly problem, not simply a small-context-window problem | strong | 6 | 5 |
| claims#11 | 06 | Durable state and workflow semantics are trust features, not backend details | strong | 6 | 5 |
| claims#12 | 06 | Human oversight works best as an architectural layer, not an afterthought | strong | 3 | 3 |
| claims#13 | 06 | High-stakes systems tune agency instead of maximizing it | strong | 4 | 4 |
| claims#14 | 06 | The harness is evolving from a local loop into a staged software factory | strong | 7 | 6 |
| claims#15 | 06 | The context gap increasingly includes capability packaging and progressive disclosure | strong | 4 | 4 |
| claims#16 | 06 | AI-native advantage depends on organizational coherence, not output volume alone | moderate | 3 | 3 |
| claims#17 | 06 | Harness quality now includes capability packaging, not only repo hygiene | strong | 4 | 4 |
| claims#18 | 06 | Context failure is often a capability-exposure problem, not only a retrieval problem | strong | 3 | 3 |
| claims#19 | 06 | Evals are strongest when they are trace-linked and fed by production observability | strong | 6 | 5 |
| claims#24 | 06 | Coordination is the unsolved runtime primitive for multi-agent systems | moderate | 4 | 3 |
| claims#25 | 06 | Context engineering is a primary engineering discipline, not a prompt trick | strong | 4 | 4 |
| claims#26 | 06 | RAG, memory, and GraphRAG solve different jobs; collapsing them into one bucket misses the architecture | strong | 7 | 7 |
| claims#46 | 06 | Once an AI system can act autonomously, bounding its authority becomes the price of deployment | strong | 2 | 2 |
| claims#2 | 07 | Chat is an insufficient control surface for long-running or high-stakes work | strong | 3 | 3 |
| claims#8 | 07 | Evals are a control system, not just a test suite | strong | 7 | 5 |
| claims#11 | 07 | Durable state and workflow semantics are trust features, not backend details | strong | 6 | 5 |
| claims#12 | 07 | Human oversight works best as an architectural layer, not an afterthought | strong | 3 | 3 |
| claims#13 | 07 | High-stakes systems tune agency instead of maximizing it | strong | 4 | 4 |
| claims#28 | 07 | The next failure frontier is context misassembly, not just hallucination | strong | 4 | 4 |
| claims#30 | 07 | Identity is a first-class engineering object for agentic systems | strong | 3 | 3 |
| claims#31 | 07 | Sandbox, least privilege, and auditability are product infrastructure, not security overhead | strong | 5 | 5 |
| claims#32 | 07 | Protocol standardization expands the attack surface if governance lags | strong | 3 | 3 |
| claims#33 | 07 | Enterprise MCP adoption converges on gateways, blessed platforms, and a root of trust | strong | 3 | 3 |
| claims#34 | 07 | Per-tool OAuth flows are a governance and IT visibility problem, not just a UX annoyance | strong | 3 | 3 |
| claims#46 | 07 | Once an AI system can act autonomously, bounding its authority becomes the price of deployment | strong | 2 | 2 |
| claims#50 | 07 | Agent commerce is a new infrastructure layer: agents transact on a human's behalf, shifting the stack from payment rails to delegated intent and verifiable authority | moderate | 4 | 3 |
| claims#11 | 08 | Durable state and workflow semantics are trust features, not backend details | strong | 6 | 5 |
| claims#13 | 08 | High-stakes systems tune agency instead of maximizing it | strong | 4 | 4 |
| claims#20 | 08 | Realtime AI quality is primarily a coordination and latency-engineering problem, not a model-quality problem | strong | 6 | 4 |
| claims#21 | 08 | Voice is best added as a realtime wrapper around a chat agent, not as a rebuild | moderate | 3 | 2 |
| claims#22 | 08 | Half-duplex is the silent architectural ceiling on natural voice conversation | moderate | 2 | 1 |
| claims#23 | 08 | TTS architecture is converging on LLM architecture | moderate | 3 | 2 |
| claims#29 | 08 | Latency masking belongs in the same architectural category as evals, harnesses, and durable runtimes | strong | 4 | 4 |
| claims#1 | 09 | The important transition is from suggestion to delegated execution | strong | 3 | 3 |
| claims#4 | 09 | Harness quality is a major determinant of coding-agent quality | strong | 4 | 4 |
| claims#7 | 09 | Agent-ready codebases are designed, not discovered | moderate | 3 | 3 |
| claims#8 | 09 | Evals are a control system, not just a test suite | strong | 7 | 5 |
| claims#12 | 09 | Human oversight works best as an architectural layer, not an afterthought | strong | 3 | 3 |
| claims#14 | 09 | The harness is evolving from a local loop into a staged software factory | strong | 7 | 6 |
| claims#15 | 09 | The context gap increasingly includes capability packaging and progressive disclosure | strong | 4 | 4 |
| claims#16 | 09 | AI-native advantage depends on organizational coherence, not output volume alone | moderate | 3 | 3 |
| claims#24 | 09 | Coordination is the unsolved runtime primitive for multi-agent systems | moderate | 4 | 3 |
| claims#25 | 09 | Context engineering is a primary engineering discipline, not a prompt trick | strong | 4 | 4 |
| claims#27 | 09 | Enterprise usefulness scales with working-set quality, not corpus size | strong | 4 | 4 |
| claims#33 | 09 | Enterprise MCP adoption converges on gateways, blessed platforms, and a root of trust | strong | 3 | 3 |
| claims#35 | 09 | AI-native advantage is an operating-model redesign, not a procurement decision | strong | 4 | 4 |
| claims#36 | 09 | Broader creation requires tighter review and governance; they rise together or the first becomes a liability | strong | 5 | 5 |
| claims#37 | 09 | Activity-based metrics misread motion as progress in AI-augmented work | strong | 4 | 4 |
| claims#38 | 09 | Review capacity is the throughput limit of an AI-native organization | strong | 3 | 3 |
| claims#39 | 09 | Alignment debt is the AI-native equivalent of technical debt | strong | 3 | 3 |
| claims#40 | 09 | Cheap generation raises the value of taste and judgment rather than lowering it | strong | 3 | 3 |
| claims#1 | 10 | The important transition is from suggestion to delegated execution | strong | 3 | 3 |
| claims#3 | 10 | Reliability comes less from model cleverness than from surrounding scaffolding | strong | 3 | 3 |
| claims#5 | 10 | Specs are not paperwork; they are executable intent | strong | 2 | 2 |
| claims#10 | 10 | Context failure is often a system-assembly problem, not simply a small-context-window problem | strong | 6 | 5 |
| claims#12 | 10 | Human oversight works best as an architectural layer, not an afterthought | strong | 3 | 3 |
| claims#13 | 10 | High-stakes systems tune agency instead of maximizing it | strong | 4 | 4 |
| claims#16 | 10 | AI-native advantage depends on organizational coherence, not output volume alone | moderate | 3 | 3 |
| claims#20 | 10 | Realtime AI quality is primarily a coordination and latency-engineering problem, not a model-quality problem | strong | 6 | 4 |
| claims#23 | 10 | TTS architecture is converging on LLM architecture | moderate | 3 | 2 |
| claims#40 | 10 | Cheap generation raises the value of taste and judgment rather than lowering it | strong | 3 | 3 |
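The chapter and support-band filters described for the ledger can be sketched offline as follows. This is an illustration only: the LedgerRow type and filter_rows function are hypothetical names, the five sample rows are copied from the ledger on this page, and the real evidence.json layout may differ.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LedgerRow:
    """One ledger row: a claim as listed under a single chapter (hypothetical type)."""
    claim: int
    chapter: int
    statement: str
    support: str
    anchors: int
    speakers: int


# A handful of rows copied from the ledger above, not the full 105.
ROWS = [
    LedgerRow(1, 1, "The important transition is from suggestion to delegated execution", "strong", 3, 3),
    LedgerRow(7, 3, "Agent-ready codebases are designed, not discovered", "moderate", 3, 3),
    LedgerRow(14, 3, "The harness is evolving from a local loop into a staged software factory", "strong", 7, 6),
    LedgerRow(24, 3, "Coordination is the unsolved runtime primitive for multi-agent systems", "moderate", 4, 3),
    LedgerRow(8, 4, "Evals are a control system, not just a test suite", "strong", 7, 5),
]


def filter_rows(rows: list[LedgerRow],
                chapter: Optional[int] = None,
                support: Optional[str] = None) -> list[LedgerRow]:
    """Keep rows matching the given chapter and/or support band; None means no filter."""
    return [
        row for row in rows
        if (chapter is None or row.chapter == chapter)
        and (support is None or row.support == support)
    ]


for row in filter_rows(ROWS, chapter=3, support="moderate"):
    print(f"claims#{row.claim} ch{row.chapter:02d} {row.anchors} anchors")
# claims#7 ch03 3 anchors
# claims#24 ch03 4 anchors
```

Leaving both filters unset returns every row, which mirrors the unfiltered view of the table.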

Stats are regenerated by 99_Meta/scripts/build_stats.py; evidence.json is regenerated by 99_Meta/scripts/anchor/build_evidence.py from the Claims Ledger. Last stats build: 2026-06-11.