| claims#1 | 01 | The important transition is from suggestion to delegated execution | strong | 3 | 3 |
| claims#2 | 01 | Chat is an insufficient control surface for long-running or high-stakes work | strong | 3 | 3 |
| claims#3 | 01 | Reliability comes less from model cleverness than from surrounding scaffolding | strong | 3 | 3 |
| claims#12 | 01 | Human oversight works best as an architectural layer, not an afterthought | strong | 3 | 3 |
| claims#40 | 02 | Cheap generation raises the value of taste and judgment rather than lowering it | strong | 3 | 3 |
| claims#41 | 02 | Vibe coding is an exploration mode that fails as a production default | strong | 5 | 5 |
| claims#42 | 02 | Problem framing and review become the scarce skills once execution is cheap | strong | 4 | 4 |
| claims#3 | 03 | Reliability comes less from model cleverness than from surrounding scaffolding | strong | 3 | 3 |
| claims#4 | 03 | Harness quality is a major determinant of coding-agent quality | strong | 4 | 4 |
| claims#5 | 03 | Specs are not paperwork; they are executable intent | strong | 2 | 2 |
| claims#6 | 03 | The practical unit of AI coding is the codebase, not the snippet | strong | 3 | 3 |
| claims#7 | 03 | Agent-ready codebases are designed, not discovered | moderate | 3 | 3 |
| claims#14 | 03 | The harness is evolving from a local loop into a staged software factory | strong | 7 | 6 |
| claims#17 | 03 | Harness quality now includes capability packaging, not only repo hygiene | strong | 4 | 4 |
| claims#24 | 03 | Coordination is the unsolved runtime primitive for multi-agent systems | moderate | 4 | 3 |
| claims#43 | 03 | Coding agents expose the gap between standards a team possesses and standards it can operationalize | strong | 2 | 2 |
| claims#44 | 03 | Subagent specialization makes process explicit and encodes team judgment into roles | moderate | 1 | 1 |
| claims#3 | 04 | Reliability comes less from model cleverness than from surrounding scaffolding | strong | 3 | 3 |
| claims#4 | 04 | Harness quality is a major determinant of coding-agent quality | strong | 4 | 4 |
| claims#5 | 04 | Specs are not paperwork; they are executable intent | strong | 2 | 2 |
| claims#6 | 04 | The practical unit of AI coding is the codebase, not the snippet | strong | 3 | 3 |
| claims#8 | 04 | Evals are a control system, not just a test suite | strong | 7 | 5 |
| claims#9 | 04 | Realistic evals must be grounded in natural tasks and operational history | strong | 3 | 3 |
| claims#19 | 04 | Evals are strongest when they are trace-linked and fed by production observability | strong | 6 | 5 |
| claims#37 | 04 | Activity-based metrics misread motion as progress in AI-augmented work | strong | 4 | 4 |
| claims#42 | 04 | Problem framing and review become the scarce skills once execution is cheap | strong | 4 | 4 |
| claims#45 | 04 | The best evals encode judgment mined from operational history, not invented in a clean room | strong | 2 | 2 |
| claims#9 | 05 | Realistic evals must be grounded in natural tasks and operational history | strong | 3 | 3 |
| claims#10 | 05 | Context failure is often a system-assembly problem, not simply a small-context-window problem | strong | 6 | 5 |
| claims#15 | 05 | The context gap increasingly includes capability packaging and progressive disclosure | strong | 4 | 4 |
| claims#17 | 05 | Harness quality now includes capability packaging, not only repo hygiene | strong | 4 | 4 |
| claims#18 | 05 | Context failure is often a capability-exposure problem, not only a retrieval problem | strong | 3 | 3 |
| claims#25 | 05 | Context engineering is a primary engineering discipline, not a prompt trick | strong | 4 | 4 |
| claims#26 | 05 | RAG, memory, and GraphRAG solve different jobs; collapsing them into one bucket misses the architecture | strong | 7 | 7 |
| claims#27 | 05 | Enterprise usefulness scales with working-set quality, not corpus size | strong | 4 | 4 |
| claims#28 | 05 | The next failure frontier is context misassembly, not just hallucination | strong | 4 | 4 |
| claims#1 | 06 | The important transition is from suggestion to delegated execution | strong | 3 | 3 |
| claims#2 | 06 | Chat is an insufficient control surface for long-running or high-stakes work | strong | 3 | 3 |
| claims#3 | 06 | Reliability comes less from model cleverness than from surrounding scaffolding | strong | 3 | 3 |
| claims#4 | 06 | Harness quality is a major determinant of coding-agent quality | strong | 4 | 4 |
| claims#5 | 06 | Specs are not paperwork; they are executable intent | strong | 2 | 2 |
| claims#8 | 06 | Evals are a control system, not just a test suite | strong | 7 | 5 |
| claims#9 | 06 | Realistic evals must be grounded in natural tasks and operational history | strong | 3 | 3 |
| claims#10 | 06 | Context failure is often a system-assembly problem, not simply a small-context-window problem | strong | 6 | 5 |
| claims#11 | 06 | Durable state and workflow semantics are trust features, not backend details | strong | 6 | 5 |
| claims#12 | 06 | Human oversight works best as an architectural layer, not an afterthought | strong | 3 | 3 |
| claims#13 | 06 | High-stakes systems tune agency instead of maximizing it | strong | 4 | 4 |
| claims#14 | 06 | The harness is evolving from a local loop into a staged software factory | strong | 7 | 6 |
| claims#15 | 06 | The context gap increasingly includes capability packaging and progressive disclosure | strong | 4 | 4 |
| claims#16 | 06 | AI-native advantage depends on organizational coherence, not output volume alone | moderate | 3 | 3 |
| claims#17 | 06 | Harness quality now includes capability packaging, not only repo hygiene | strong | 4 | 4 |
| claims#18 | 06 | Context failure is often a capability-exposure problem, not only a retrieval problem | strong | 3 | 3 |
| claims#19 | 06 | Evals are strongest when they are trace-linked and fed by production observability | strong | 6 | 5 |
| claims#24 | 06 | Coordination is the unsolved runtime primitive for multi-agent systems | moderate | 4 | 3 |
| claims#25 | 06 | Context engineering is a primary engineering discipline, not a prompt trick | strong | 4 | 4 |
| claims#26 | 06 | RAG, memory, and GraphRAG solve different jobs; collapsing them into one bucket misses the architecture | strong | 7 | 7 |
| claims#46 | 06 | Once an AI system can act autonomously, bounding its authority becomes the price of deployment | strong | 2 | 2 |
| claims#2 | 07 | Chat is an insufficient control surface for long-running or high-stakes work | strong | 3 | 3 |
| claims#8 | 07 | Evals are a control system, not just a test suite | strong | 7 | 5 |
| claims#11 | 07 | Durable state and workflow semantics are trust features, not backend details | strong | 6 | 5 |
| claims#12 | 07 | Human oversight works best as an architectural layer, not an afterthought | strong | 3 | 3 |
| claims#13 | 07 | High-stakes systems tune agency instead of maximizing it | strong | 4 | 4 |
| claims#28 | 07 | The next failure frontier is context misassembly, not just hallucination | strong | 4 | 4 |
| claims#30 | 07 | Identity is a first-class engineering object for agentic systems | strong | 3 | 3 |
| claims#31 | 07 | Sandbox, least privilege, and auditability are product infrastructure, not security overhead | strong | 5 | 5 |
| claims#32 | 07 | Protocol standardization expands the attack surface if governance lags | strong | 3 | 3 |
| claims#33 | 07 | Enterprise MCP adoption converges on gateways, blessed platforms, and a root of trust | strong | 3 | 3 |
| claims#34 | 07 | Per-tool OAuth flows are a governance and IT visibility problem, not just a UX annoyance | strong | 3 | 3 |
| claims#46 | 07 | Once an AI system can act autonomously, bounding its authority becomes the price of deployment | strong | 2 | 2 |
| claims#50 | 07 | Agent commerce is a new infrastructure layer: agents transact on a human's behalf, shifting the stack from payment rails to delegated intent and verifiable authority | moderate | 4 | 3 |
| claims#11 | 08 | Durable state and workflow semantics are trust features, not backend details | strong | 6 | 5 |
| claims#13 | 08 | High-stakes systems tune agency instead of maximizing it | strong | 4 | 4 |
| claims#20 | 08 | Realtime AI quality is primarily a coordination and latency-engineering problem, not a model-quality problem | strong | 6 | 4 |
| claims#21 | 08 | Voice is best added as a realtime wrapper around a chat agent, not as a rebuild | moderate | 3 | 2 |
| claims#22 | 08 | Half-duplex is the silent architectural ceiling on natural voice conversation | moderate | 2 | 1 |
| claims#23 | 08 | TTS architecture is converging on LLM architecture | moderate | 3 | 2 |
| claims#29 | 08 | Latency masking belongs in the same architectural category as evals, harnesses, and durable runtimes | strong | 4 | 4 |
| claims#1 | 09 | The important transition is from suggestion to delegated execution | strong | 3 | 3 |
| claims#4 | 09 | Harness quality is a major determinant of coding-agent quality | strong | 4 | 4 |
| claims#7 | 09 | Agent-ready codebases are designed, not discovered | moderate | 3 | 3 |
| claims#8 | 09 | Evals are a control system, not just a test suite | strong | 7 | 5 |
| claims#12 | 09 | Human oversight works best as an architectural layer, not an afterthought | strong | 3 | 3 |
| claims#14 | 09 | The harness is evolving from a local loop into a staged software factory | strong | 7 | 6 |
| claims#15 | 09 | The context gap increasingly includes capability packaging and progressive disclosure | strong | 4 | 4 |
| claims#16 | 09 | AI-native advantage depends on organizational coherence, not output volume alone | moderate | 3 | 3 |
| claims#24 | 09 | Coordination is the unsolved runtime primitive for multi-agent systems | moderate | 4 | 3 |
| claims#25 | 09 | Context engineering is a primary engineering discipline, not a prompt trick | strong | 4 | 4 |
| claims#27 | 09 | Enterprise usefulness scales with working-set quality, not corpus size | strong | 4 | 4 |
| claims#33 | 09 | Enterprise MCP adoption converges on gateways, blessed platforms, and a root of trust | strong | 3 | 3 |
| claims#35 | 09 | AI-native advantage is an operating-model redesign, not a procurement decision | strong | 4 | 4 |
| claims#36 | 09 | Broader creation requires tighter review and governance — they rise together or the first becomes a liability | strong | 5 | 5 |
| claims#37 | 09 | Activity-based metrics misread motion as progress in AI-augmented work | strong | 4 | 4 |
| claims#38 | 09 | Review capacity is the throughput limit of an AI-native organization | strong | 3 | 3 |
| claims#39 | 09 | Alignment debt is the AI-native equivalent of technical debt | strong | 3 | 3 |
| claims#40 | 09 | Cheap generation raises the value of taste and judgment rather than lowering it | strong | 3 | 3 |
| claims#1 | 10 | The important transition is from suggestion to delegated execution | strong | 3 | 3 |
| claims#3 | 10 | Reliability comes less from model cleverness than from surrounding scaffolding | strong | 3 | 3 |
| claims#5 | 10 | Specs are not paperwork; they are executable intent | strong | 2 | 2 |
| claims#10 | 10 | Context failure is often a system-assembly problem, not simply a small-context-window problem | strong | 6 | 5 |
| claims#12 | 10 | Human oversight works best as an architectural layer, not an afterthought | strong | 3 | 3 |
| claims#13 | 10 | High-stakes systems tune agency instead of maximizing it | strong | 4 | 4 |
| claims#16 | 10 | AI-native advantage depends on organizational coherence, not output volume alone | moderate | 3 | 3 |
| claims#20 | 10 | Realtime AI quality is primarily a coordination and latency-engineering problem, not a model-quality problem | strong | 6 | 4 |
| claims#23 | 10 | TTS architecture is converging on LLM architecture | moderate | 3 | 2 |
| claims#40 | 10 | Cheap generation raises the value of taste and judgment rather than lowering it | strong | 3 | 3 |