The AI-Native Team Workshop

A facilitated session for an engineering leadership team to locate itself honestly on the path to an AI-native operating model — and leave with the one structural decision worth making next. Print this page (it formats cleanly) and run it as-is.

00 · Warm-up (10 min)

Go around the room. Each person finishes the sentence: “We use AI, but we are not yet AI-native, because…” Capture the answers where everyone can see them. The patterns in this list are your real agenda.

01 · The five dimensions (10 min)

AI-native is a destination, not a toggle. Progress shows up across five dimensions. The facilitator reads each aloud; the team is only orienting here, not scoring yet.

01 Delegation Depth. How far the organization has moved from inline suggestion to genuinely delegated execution.
02 Eval Maturity. Whether output quality is governed by measurement loops rather than spot-checks and vibes.
03 Context Infrastructure. Whether retrieval, memory, and institutional knowledge are treated as first-class infrastructure.
04 Runtime Governance. Durable runtimes, scoped authority, audit trails, and a human control plane tuned to task stakes.
05 Org Design. Whether the operating model itself (review, alignment, ownership) has been redesigned around cheap execution.

02 · Dimension by dimension (~10 min each)

For each dimension: a quick gut-check (early / developing / advanced), then the three prompts. The goal is not consensus on a number — it is surfacing the one honest gap.

Delegation Depth

1. Name one workflow where we still assume work moves at human speed. What would it take to close that gap?
2. Where do we use AI as inline autocomplete when we could be delegating a whole unit of work?
3. Who outside engineering could start real work with agents if we let them?

Source: Ch 1, The Shift: From Assistant to Delegate

Eval Maturity

1. What do our dashboards actually count: activity (PRs, lines) or outcomes (rework, unreverted ships)?
2. If an agent silently got worse this month, how would we find out, and how long would it take?
3. Which one high-stakes output deserves an automated eval on the path to production first?

Source: Ch 4, Evals Are the Control System
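If prompt 2 feels abstract, a small sketch can anchor the discussion. The one below assumes an eval suite that yields a pass/fail result per case, and compares this period's pass rate to a stored baseline. The function names and the 5-point tolerance are illustrative assumptions, not a recommended threshold.

```python
from statistics import mean


def pass_rate(results):
    """Fraction of eval cases that passed; `results` is a non-empty list of booleans."""
    return mean(1.0 if passed else 0.0 for passed in results)


def has_regressed(baseline, current, tolerance=0.05):
    """True when the current pass rate is more than `tolerance` below the baseline.

    `tolerance` is a hypothetical default; a real team would tune it per eval.
    """
    return pass_rate(baseline) - pass_rate(current) > tolerance


# Made-up example data: 18 of 20 cases passed last month, 15 of 20 this month.
baseline = [True] * 18 + [False] * 2
current = [True] * 15 + [False] * 5

if has_regressed(baseline, current):
    print("Pass rate dropped beyond tolerance; investigate before shipping.")
```

The useful question for the room is less the arithmetic than where such a check would run, and who would see the alert.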

Context Infrastructure

1. What institutional knowledge still lives only in senior people’s heads?
2. Are we improving retrieval quality, or just giving agents more raw data?
3. Could we trace what context produced a given agent output if we had to?

Source: Ch 5, Context Is Infrastructure

Runtime Governance

1. Do our agents run with scoped, time-limited credentials, or with standing human-inherited access?
2. For our riskiest agent action, where exactly does a human re-enter, and is that calibrated to the stakes?
3. What is the lightest governance that would earn us the trust to move this pilot to production?

Source: Ch 6–7, Runtimes & High-Stakes Trust
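To make "scoped, time-limited" in prompt 1 concrete, here is a minimal sketch of a credential that names the actions an agent may take and expires on its own. Everything in it (the `AgentCredential` class, the `issue` helper, the scope strings, the 30-minute lifetime) is a hypothetical illustration, not a reference to any particular identity system.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class AgentCredential:
    """Hypothetical credential: an explicit set of allowed actions plus an expiry."""
    agent_id: str
    scopes: frozenset
    expires_at: datetime

    def allows(self, action, now=None):
        """Permit an action only if it is in scope and the credential has not expired."""
        if now is None:
            now = datetime.now(timezone.utc)
        return now < self.expires_at and action in self.scopes


def issue(agent_id, scopes, ttl_minutes=30, now=None):
    """Issue a credential that expires `ttl_minutes` after `now` (assumed default: 30)."""
    if now is None:
        now = datetime.now(timezone.utc)
    return AgentCredential(agent_id, frozenset(scopes), now + timedelta(minutes=ttl_minutes))


# Fixed timestamps so the example is deterministic.
t0 = datetime(2025, 1, 1, 9, 0, tzinfo=timezone.utc)
cred = issue("triage-bot", {"tickets:read", "tickets:comment"}, now=t0)

in_scope = cred.allows("tickets:read", now=t0 + timedelta(minutes=10))     # in scope, not expired
out_of_scope = cred.allows("tickets:close", now=t0 + timedelta(minutes=10))  # never granted
after_expiry = cred.allows("tickets:read", now=t0 + timedelta(minutes=31))   # past the 30-minute TTL
```

The contrast to draw out is with standing access: here nothing is permitted unless it was listed when the credential was issued, and nothing is permitted after expiry.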

Org Design

1. Is review capacity resourced as the throughput limit of the org, or treated as a QA afterthought?
2. Where do two people’s agent work first meet: in a shared plan, or at the merge?
3. Who owns the judgment layer: deciding which of the many cheap artifacts is worth shipping?

Source: Ch 9, The AI-Native Organization

03 · The one decision (15 min)

Phase transitions are governance events, not capability events — an org moves up a level because someone made a structural decision, not because the models got better. Pick the single weakest dimension and name one structural change you will own this quarter: a path hardened, a convention written down, a review system built, a credential scope designed. Assign it. That is the workshop’s only deliverable.

Want a per-person profile to anchor the discussion? Have everyone take the 10-minute assessment before the session.