From Copilot to Colleague

Chapter 07 · 16 min read

Security, Identity, and High-Stakes Trust

Why delegated authority needs boundaries, audit trails, and real controls.

CHAPTER 07/3,694 words/Drafting

Once AI systems can act, trust stops being only a question of model quality. It becomes a question of bounded authority.

A helpful model can get away with being vague about power. An acting system cannot.

The moment an AI system can call tools, execute code, traverse accounts, or continue working after the user has stopped watching, security moves to the center of the product. The important question is no longer whether the model sounds smart. It is whether the system has a clear identity, narrow permissions, real execution boundaries, and enough evidence left behind that a human institution can understand what happened later.

That is why agent security is not mainly a prompt-safety problem. It is a delegated-authority problem. Traditional software security often focused on discrete requests. Agentic systems stretch the unit of risk across a workflow: interpretation, retrieval, tool use, retries, follow-up behavior, and approval boundaries. Trust has to cover the whole path.

The strongest support for this in the current corpus is not a single definitive study. It is a pattern of practitioner convergence. Security talks, identity talks, and high-stakes workflow talks keep arriving at the same practical lesson from different directions: once systems can act on behalf of users, the hard part is no longer only generating a plausible next step. It is controlling what powers were delegated, under what conditions, and with what record of use.

Identity is the substrate of trust

Figure 07.1/Authority boundary collapse

The most common engineering shortcut for connecting an agent to a real system is to give the agent a standing credential — a long-lived API key, a service-account token, a personal access token "borrowed" from the human operator. The shortcut works, in the sense that the system can act. It also dissolves the question the chapter is trying to take seriously.

A standing credential is not a delegation. It is the agent inheriting the entire authority of whoever's credential it borrowed, for as long as the credential remains valid, across whatever surface that credential touches. There is no scope to revoke. There is no expiry that maps to the actual task. There is no record that an agent was acting, rather than a human acting on the same key. The unit of trust the system has just granted is much larger than the unit of work it was asked to do.

Patrick Riley and Carlos Galan at Auth0 frame this directly. Their work on identity for AI agents starts from the position that "we authorize agents, servers" — making the agent and its capabilities first-class citizens in the identity provider, not invisible passengers on a human's credential. The shift looks technical but it is structural. Once the agent is a real principal in the identity layer, it can have its own scopes, its own lifetimes, its own audit footprint, and its own revocation path.

Jared Hanson at Keycard pushes the same argument from the developer side. The common habit of treating an agent like an API consumer — with a permanent key paired to a permanent identity — does not survive contact with workflow reality. Agents act on behalf of specific users, for specific tasks, with specific data. Hanson's case is that securing agents using OAuth is not just a protocol choice; it is the right shape for the delegation. The agent gets a token that is bounded to a session, a user, and a set of scopes. When the work is done, the authority ends.
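The shape of that delegation can be sketched in a few lines. This is a minimal illustration, not an OAuth implementation: the `DelegatedToken` class, the `issue_token` and `is_allowed` helpers, and the scope strings are all hypothetical names chosen for the example. A real deployment would obtain and validate tokens through an identity provider.

```python
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class DelegatedToken:
    """Hypothetical token shape: bounded to a user, an agent, a session, and scopes."""
    user_id: str
    agent_id: str
    session_id: str
    scopes: frozenset
    expires_at: float  # Unix timestamp after which the authority ends


def issue_token(user_id, agent_id, session_id, scopes, ttl_seconds=300, now=None):
    """Issue a short-lived token; ttl_seconds is an illustrative default."""
    issued_at = time.time() if now is None else now
    return DelegatedToken(user_id, agent_id, session_id,
                          frozenset(scopes), issued_at + ttl_seconds)


def is_allowed(token, required_scope, session_id, now=None):
    """Allow only if the token is unexpired, matches the session, and holds the scope."""
    current = time.time() if now is None else now
    return (
        current < token.expires_at
        and token.session_id == session_id
        and required_scope in token.scopes
    )
```

Every check fails closed: an expired token, a different session, or a scope that was never granted all produce a refusal, which is the property a standing key cannot offer.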

Garrett Galow at WorkOS pushes the framing further. His talk on cross-app access for MCP proposes the identity provider as a trust bridge between clients and servers, so that the credentials flowing through the agent can be obtained without repeated manual consent flows, and so that the IT organization keeps visibility into which third-party systems are reading which data. That is the structural answer to a question every team running agents at scale eventually faces: how do you let an agent move between tools without making each tool feel like an independent integration project?

The shared move across these three talks is to treat agent identity as a first-class engineering object — not a label, not a service-account workaround, but a principal in the identity system with bounded scope, bounded lifetime, and an audit footprint. An agent that is authenticated as a blurry extension of a human is not delegated. It is impersonating.

A concrete failure case

Figure 07.2/Scoped agent identity

Imagine a tax-preparation agent operating inside a professional workflow of the kind described by Joel Hron at Thomson Reuters. The system ingests source documents, extracts fields, maps them into a tax engine, checks validation errors, revisits documents to resolve missing information, and assembles a draft return. Nothing in that flow requires the model to be malicious for trouble to begin. It only has to be slightly wrong in a place where authority and evidence are easy to blur.

Suppose one document is ambiguous, a number is mapped to the wrong field, and the validation engine throws an error. The agent now has several possible powers: search for more supporting material, pull additional client records, reinterpret the rule, override a warning, or proceed with a draft that looks internally coherent. In a weak design, those powers can collapse into one another. The same system that is supposed to summarize evidence also has enough access to fetch more data, make a judgment call, and keep moving. From the outside, the workflow still looks smooth. The danger is not theatrical failure. The danger is an authority boundary disappearing inside a competent-looking trajectory.

In high-stakes work, the risky move is often not one bad answer. It is a system quietly crossing from assistance into authorization.

Least privilege is different for agents than for users

Figure 07.3/Step-up OAuth

Least privilege is the oldest piece of security advice, and on its own it is not enough.

In classic SaaS software, a narrow scope usually constrains a bounded API action. In agentic systems, the same scope can be recombined across many steps. A permission that seems harmless in isolation can become more powerful when paired with retrieval, reasoning, persistence, and retries. So "only give the agent the minimum access it needs" is correct but incomplete. Teams also have to ask: minimum access for which stage of the workflow?
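One way to make "minimum access for which stage?" concrete is to key the permitted scopes on the workflow stage rather than on the agent as a whole. The sketch below borrows the stages of the tax-preparation example from earlier in the chapter. The stage names, scope strings, and the `STAGE_SCOPES` table are assumptions made for illustration, not a description of any real system.

```python
# Hypothetical mapping from workflow stage to the scopes that stage may use.
STAGE_SCOPES = {
    "ingest": {"documents:read"},
    "extract": {"documents:read"},
    "map_fields": {"tax_engine:write_draft"},
    "resolve_errors": {"documents:read", "tax_engine:read_validation"},
}


def scopes_for_stage(stage):
    """Return the scopes granted to a stage; unknown stages get nothing."""
    return frozenset(STAGE_SCOPES.get(stage, ()))


def stage_allows(stage, requested_scope):
    """Deny by default: a scope is usable only in a stage that lists it."""
    return requested_scope in scopes_for_stage(stage)
```

With this table, the ingest stage can read documents but cannot write to the tax engine, even though the same agent holds both powers at different points in the run.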

This is where Hanson's argument compounds with Galow's. A standing credential gives the agent maximum scope for the maximum duration. An OAuth-scoped token with cross-app access gives the agent a narrow scope for a narrow duration, mediated by an identity provider that can see the request and revoke it. The first option lets the agent do anything the human can do, on a schedule the agent chooses. The second option lets the agent do this specific thing, for this specific task, with an audit trail that survives the interaction.

The deeper point is that identity and authority are joint design objects. Granular authority without a principal to attach it to is unenforceable. A principal without scope is just a louder user. The hard work is binding them: producing an agent-shaped identity with agent-shaped scopes that match the agent-shaped workflows the system actually runs.

Sandboxing is product infrastructure

Figure 07.4/Enterprise MCP has one shape

The same principle appears in code-executing environments, with the stakes turned up.

Fouad Matin's security guidance for coding agents at OpenAI keeps returning to four concrete controls: sandboxing, network restriction, privilege boundaries, and human review. Treat them as the default-on baseline, not the hardening you add after an incident — a capable system will sometimes be confused, manipulated, or overly eager, and each of the four bounds a different failure. Sandboxing contains a bad command, network restriction stops exfiltration and unexpected callouts, privilege boundaries cap what the agent can reach even when it is wrong, and a human review gate catches the trajectory the other three let through. Real safety has to live in the runtime and permission system, not only in the model.

Harshil Agrawal at Cloudflare makes the same case from the platform side. His talk on sandboxing AI-generated code argues that the sandbox is not an afterthought added once the model misbehaves; it is the substrate the agent runs on. If the agent is going to execute code, that execution needs to be bounded by default — network restricted, filesystem scoped, time-limited, resource-capped. The bounds are the product. Without them, the system is not running code on behalf of a user. It is granting the model an open shell on infrastructure.
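As a rough illustration of "bounded by default", the sketch below runs a snippet of generated Python in a child process with a wall-clock timeout, CPU and file-size limits, a throwaway working directory, and a stripped environment. It is POSIX-only and uses only the standard library. The `run_bounded` helper and the specific limit values are assumptions for the example. It does not restrict network access or confine filesystem reads, which need OS-level or container-level isolation, so treat it as a picture of the idea rather than a sandbox to deploy.

```python
import os
import resource
import subprocess
import sys
import tempfile


def _apply_limits():
    """Runs in the child just before exec: cap CPU seconds and bytes per written file."""
    resource.setrlimit(resource.RLIMIT_CPU, (2, 2))
    resource.setrlimit(resource.RLIMIT_FSIZE, (1_000_000, 1_000_000))


def run_bounded(code, wall_clock_seconds=5.0):
    """Execute code in a child interpreter with illustrative bounds (hypothetical helper)."""
    # Pass through only what the interpreter may need to start; no secrets leak via env.
    env = {k: os.environ[k] for k in ("PATH", "LD_LIBRARY_PATH") if k in os.environ}
    with tempfile.TemporaryDirectory() as scratch:
        try:
            result = subprocess.run(
                [sys.executable, "-I", "-c", code],  # -I: isolated mode
                cwd=scratch,                         # relative writes land in a scratch dir
                env=env,
                capture_output=True,
                text=True,
                timeout=wall_clock_seconds,          # wall-clock bound
                preexec_fn=_apply_limits,            # resource caps (POSIX only)
            )
        except subprocess.TimeoutExpired:
            return {"status": "timeout", "stdout": "", "returncode": None}
    return {"status": "finished", "stdout": result.stdout,
            "returncode": result.returncode}
```

The design choice worth noting is that the caller cannot opt out of the bounds: every execution path goes through the same limits, which is what "the bounds are the product" means in practice.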

Apple Private Cloud Compute, profiled by Jmo at CONFSEC, sits at the high end of this spectrum. The architecture is built around the principle that even Apple cannot see what runs inside a user's session. The cryptographic boundary is structural. That model is overkill for most product surfaces, but it makes the point in the strongest possible form: when the cost of getting trust wrong is unbounded, the boundary has to be designed-in rather than enforced socially. The right amount of sandbox for an enterprise coding agent is less than Apple's. It is also not zero.

Sandbox, least privilege, and auditability belong in the same category as evals, observability, and durable runtimes: they are product infrastructure, not security overhead. A team that treats them as the security team's problem will discover the boundary by getting it wrong in production. A team that treats them as part of the runtime spec has a chance of getting them right before that.

Standardized protocols expand the attack surface

A common hope around the rise of the Model Context Protocol (MCP) is that standardization will reduce the security problem by giving everyone a known interface to attack and defend. The actual pattern in the corpus is closer to the opposite. Easier interoperability means more tools can be exposed more quickly, which makes the questions of scope, mediation, review, and audit more urgent rather than less. The practical test: if adopting MCP raises the number of capabilities your agents can reach faster than your team can answer "who can call this, with what scope, and where is it logged?", standardization has expanded your attack surface, not shrunk it.
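The practical test can be turned into a simple inventory check: for every exposed tool, confirm that the three questions have recorded answers. The sketch below assumes a hypothetical inventory format, a dict of tool name to metadata, and hypothetical field names (`allowed_callers`, `scope`, `log_sink`). Nothing here comes from a real protocol or product.

```python
# The three questions from the practical test, as hypothetical metadata fields.
REQUIRED_ANSWERS = ("allowed_callers", "scope", "log_sink")


def ungoverned_tools(tools):
    """Return {tool_name: [missing answers]} for tools that fail the test."""
    gaps = {}
    for name, meta in tools.items():
        missing = [key for key in REQUIRED_ANSWERS if not meta.get(key)]
        if missing:
            gaps[name] = missing
    return gaps
```

If the returned mapping grows every time a new server is connected, capability exposure is outrunning governance.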

Tun Shwe at Lenses puts the bottom line plainly: "Your insecure MCP server won't survive production." The talk is essentially a tour of the ways MCP servers ship with assumptions that survive demo conditions and fail under real load. Authentication treated as a configuration option. Tool descriptions trusted as input. Servers exposed publicly because internal routing was the harder problem. None of these are MCP-specific bugs. They are the predictable consequences of pulling a wire protocol into a category (capability exposure) that has not had the security review the wire has.

Karan Sampath at Anthropic, working on enterprise rollouts, names the structural answer from the governance side. "The really important thing for security teams ... is they need to establish a root of trust," he says. The enterprise version of this problem is not just performance, and it is not even just authentication. It is whether the security team can inspect the capability surface, reason about its risks, and produce a defensible record of what was allowed and why. A protocol that makes capability exposure faster does not lower that bar; it raises it.

David Mytton at Arcjet builds the same argument from outside the perimeter. His talk on defending sites from AI bots reads as a parallel chapter to the MCP discussion: as the ability to script automated interaction expands, the cost of identifying and bounding that interaction expands with it. The defender's surface is no smaller. It is larger, and the attacker now has standardized tools. It is tempting to assume the bot-defense problem and the MCP-exposure problem are separate teams' work. Both are the same question, distinguishing authorized automated callers from unauthorized ones, and standardization arms the attacker on both sides of the perimeter at once.

Protocol standardization expands the attack surface if governance lags. It is not an argument against standardization. It is an argument that standardization is a forcing function for governance work that has to happen in parallel, not afterwards.

Enterprise MCP rolls up to gateways and a root of trust

Once enterprise teams take the security problem seriously, the architecture they reach for converges on a recognizable shape, and the convergence is specific enough to use as a checklist. There is a gateway. There is a policy plane. There is a registry of blessed servers. There is a permissions model that knows about identities, tools, and approval requirements. There is an audit log that records what each agent invocation actually did. If you are standing up enterprise MCP and any one of those five is missing, that gap is the likeliest place for the boundary to fail first. The shape looks a lot like the API-gateway and IAM patterns that mature enterprise SaaS settled into a decade earlier, applied to a new category of capability. That means the teams that already own those patterns are the ones to build the agent version, rather than a greenfield security project.
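To show how the five pieces fit together, here is a toy gateway in which the registry, permissions model, approval policy, and audit log each play a distinct role in a single authorization decision. The `Gateway` class, its field names, and the decision strings are hypothetical. A real gateway would sit in front of actual servers and delegate identity to an identity provider.

```python
from dataclasses import dataclass, field


@dataclass
class Gateway:
    """Hypothetical gateway: every tool call passes through authorize()."""
    registry: set        # blessed servers, reviewed before admission
    permissions: dict    # principal -> set of "server/tool" it may call
    needs_approval: set  # policy plane: "server/tool" entries requiring human approval
    audit_log: list = field(default_factory=list)

    def authorize(self, principal, server, tool, approved=False):
        target = f"{server}/{tool}"
        if server not in self.registry:
            decision = "deny:unregistered_server"
        elif target not in self.permissions.get(principal, set()):
            decision = "deny:no_permission"
        elif target in self.needs_approval and not approved:
            decision = "deny:approval_required"
        else:
            decision = "allow"
        # Denials are logged too: the record has to cover the whole path.
        self.audit_log.append(
            {"principal": principal, "target": target, "decision": decision}
        )
        return decision == "allow"
```

Removing any one check from `authorize` reproduces the failure the checklist warns about: an unreviewed server, an unscoped principal, an unapproved sensitive action, or an action with no record.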

Sampath's enterprise work names the architectural conclusion directly. The root of trust is established at the platform, not at the individual tool. Servers are reviewed before they are allowed in the corporate environment. Tools are scoped to the principals that should be able to call them. Audit is built in at the gateway layer rather than tacked onto each integration. This is the enterprise equivalent of moving from "anyone can install any app" to "we have an internal app store with security review." Unglamorous, and exactly what lets the technology be used inside a regulated enterprise at all.

Sam Morrow's GitHub work shows the same shape at production scale. As part of scaling MCP for GitHub's customers, the team filtered tools by PAT (personal access token) scopes, so an agent invoking the GitHub MCP server only sees the tools its token actually authorizes, and used step-up OAuth to request additional privileges only when needed, rather than pre-granting them. The effect is that the agent's effective capability surface is dynamically bounded by the identity it is acting under and the task it is currently performing. Capability follows authority, not the other way around.
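The two mechanisms can be sketched together: filter the visible tool list by the scopes on the caller's token, and compute the missing scopes when the agent needs a tool it cannot yet see. The tool names and scope strings below are invented for illustration and are not GitHub's actual tools or scopes. The `visible_tools` and `step_up_request` helpers are likewise hypothetical.

```python
# Hypothetical tool catalogue: tool name -> scopes the caller's token must hold.
TOOL_REQUIRED_SCOPES = {
    "list_issues": {"repo:read"},
    "create_issue": {"repo:write"},
    "delete_repo": {"repo:admin"},
}


def visible_tools(token_scopes):
    """Expose only the tools whose required scopes the token already holds."""
    held = set(token_scopes)
    return sorted(
        name for name, needed in TOOL_REQUIRED_SCOPES.items() if needed <= held
    )


def step_up_request(tool, token_scopes):
    """Return the extra scopes to request for a tool (empty set if none are needed)."""
    return TOOL_REQUIRED_SCOPES[tool] - set(token_scopes)
```

A read-only token sees only `list_issues`. Reaching `create_issue` means asking for `repo:write` through a flow that leaves a record, rather than holding that scope from the start.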

These two cases together make the architectural argument concrete. Enterprise adoption pushes toward gateways, blessed platforms, and a root of trust. Naming the pattern matters because it changes what teams build first. Instead of "ship one server and add another, and another," the platform shape becomes the first deliverable. Individual servers are then deployed against a structure that knows how to govern them.

Per-tool OAuth flows are a governance problem

A related issue surfaces in everyday agent operation, and it is one of the cases where what looks like a UX annoyance is actually a governance failure.

Most current MCP and agent integrations require the user to authorize each tool separately. The agent wants to talk to email. Consent flow. The agent wants to talk to calendar. Consent flow. The agent wants to talk to the document store. Consent flow. To the operator, this is a paper-cut sequence of OAuth dialogs. To the security team, it is far worse: it is invisible. Each consent grant happens between the user and the third-party tool, mediated by an identity layer the enterprise may or may not control. The IT organization has no central record of what authority the user just delegated, to which tools, under what scopes, with what expiry.

Galow's cross-app access work at WorkOS is one of the cleanest framings of the structural answer. The identity provider becomes the bridge between clients and servers, so that credentials can be obtained without repeated manual consent flows and so that the enterprise has a single place to see — and revoke — what was delegated. That second part is the load-bearing one. A faster consent flow with no visibility is not progress. A consent flow that produces a visible, revocable audit trail at the identity layer is.

Hanson's OAuth argument fits into the same shape. The right pattern for agent credentials is short-lived, scoped tokens issued through a flow the identity provider can audit — not standing keys, not impersonation, not human credentials borrowed for agent work. Morrow's step-up OAuth on GitHub is the same pattern at the tool layer: the agent never holds more authority than it currently needs, and any escalation goes through a flow that produces a record.

Repeated per-tool OAuth flows are not just annoying; they are a governance and IT visibility problem. The fix is structural — push the trust bridge into the identity provider — and the architectural payoff is the same as in the gateway argument: the enterprise can answer the basic auditing questions ("who delegated what, to whom, when, and for how long?") because the system was built to capture them, not because someone reconstructed them after the fact.
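A central record that can answer those auditing questions might look like the ledger below. It is a sketch under stated assumptions: the `Grant` fields map one-to-one onto "who delegated what, to whom, when, and for how long", and the `DelegationLedger` class is a hypothetical in-memory stand-in for what an identity provider would persist.

```python
from dataclasses import dataclass


@dataclass
class Grant:
    """One delegation, shaped after the auditing questions (hypothetical fields)."""
    delegator: str      # who delegated
    agent: str          # to whom
    tool: str           # which third-party tool
    scopes: tuple       # what was delegated
    issued_at: float    # when (Unix timestamp)
    ttl_seconds: float  # for how long
    revoked: bool = False


class DelegationLedger:
    """Single place to see, and revoke, what was delegated."""

    def __init__(self):
        self._grants = []

    def record(self, grant):
        self._grants.append(grant)
        return grant

    def active(self, now):
        """Grants that are neither revoked nor expired at time `now`."""
        return [
            g for g in self._grants
            if not g.revoked and now < g.issued_at + g.ttl_seconds
        ]

    def revoke_all(self, delegator):
        """Revoke every live grant from one user; return how many were revoked."""
        count = 0
        for g in self._grants:
            if g.delegator == delegator and not g.revoked:
                g.revoked = True
                count += 1
        return count
```

Because every grant lands in one ledger, revocation is a single call instead of a hunt through each tool's settings page.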

Audit as part of the trust model

The chapter has so far drawn most of its evidence from enterprise and developer-tools contexts. The corpus contains one public-sector talk that is worth surfacing because it raises the constraints to a level that makes them clearer than the enterprise versions.

Mark Myshatyn at Los Alamos National Lab presented on government agents at one of the AI Engineer events. The talk reads, in places, like the rest of this chapter taken to a higher cost function. When the user is a federal employee, the data is classified, and the action might be regulated under specific statutes, the identity, authority, and audit questions stop being best practices and become legal requirements. The agent has to act under a real principal. The scope has to be documented. The audit trail has to survive review. The boundaries have to be enforceable, not aspirational. The talk is useful not because every enterprise looks like a national lab, but because the national lab makes the constraints visible, and most of them turn out to be the same constraints other regulated industries are starting to discover for themselves.

The observability argument from the previous chapter returns here in a stricter form. Audit trails, trajectory views, approval logs, and replayable histories are not just debugging conveniences. They are part of the trust model. If a system drafts a return, assembles a legal research memo, or performs a sensitive code change, an institution needs to be able to answer basic questions afterward: who authorized this path, what evidence was consulted, what tools were used, what warnings appeared, and where a human judgment entered or failed to enter. Without those answers the institution cannot certify the work. Without certification, the work cannot ship.
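Those questions suggest a minimal event schema: if each step of a run emits a typed event, the answers can be read back mechanically. The event kinds, the `QUESTION_FOR_KIND` table, and the `summarize_trajectory` helper below are assumptions made for this sketch, including the simple rule that flags a run where a warning appeared but no human judgment was recorded.

```python
from collections import defaultdict

# Hypothetical event kinds, one per question an institution asks afterward.
QUESTION_FOR_KIND = {
    "authorization": "who_authorized",
    "evidence": "evidence_consulted",
    "tool_call": "tools_used",
    "warning": "warnings",
    "human_decision": "human_judgments",
}


def summarize_trajectory(events):
    """Group a run's events by the audit question they answer."""
    answers = defaultdict(list)
    for event in events:
        question = QUESTION_FOR_KIND.get(event["kind"])
        if question is not None:
            answers[question].append(event["detail"])
    summary = {q: answers[q] for q in QUESTION_FOR_KIND.values()}
    # Flag runs where a warning appeared but no human judgment was recorded.
    summary["warning_without_human_review"] = (
        bool(summary["warnings"]) and not summary["human_judgments"]
    )
    return summary
```

An empty list under any question is itself a finding: it marks the part of the path the institution cannot certify.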

This is also where the argument has to stay honest. Conference talks are especially good at surfacing strong patterns, field reports, and architecture instincts. They are weaker as proof of universal effectiveness. So the responsible claim here is not that the industry has solved trustworthy delegation. It has not. The more grounded claim is that serious teams keep discovering the same constraints: narrow scopes, explicit identities, mediated tools, sandboxes, review points, and inspectable histories.

A machine colleague is not trustworthy because it sounds careful. It is trustworthy, if at all, because its power has shape.

The next chapter takes the chapter-7 frame and stress-tests it under realtime conditions, where the cost of getting bounded authority wrong becomes audible to the user inside a single conversation. But the move there only works because the chapter behind it has named the shape of power, the substrate of identity, and the architecture of audit. Those are the pieces. The next chapter is what happens when they have to ship under a 200-millisecond clock.

What to do with this

  • Audit your agents for standing credentials. Any long-lived API key, service-account token, or "borrowed" personal access token is the agent inheriting a human's full authority with no scope to revoke and no expiry that maps to the task. Replace them with short-lived, scoped OAuth tokens bound to a session, a user, and a set of scopes — the pattern Hanson at Keycard argues is the right shape for delegation, not just a protocol choice.
  • Make the agent a real principal in your identity provider, not a passenger on a human's credential. Auth0's Riley and Galan frame this as "we authorize agents, servers" — once the agent has its own scopes, lifetime, audit footprint, and revocation path, an agent authenticated as a blurry extension of a human stops being a delegation and starts being impersonation.
  • When scoping agent permissions, ask "minimum access for which stage of the workflow?" — not just "minimum access." A scope that is harmless in isolation can compound across retrieval, reasoning, persistence, and retries, so least privilege for agents has to be bound to the workflow stage, not granted once for the whole run.
  • Make the sandbox the default substrate for any code-executing agent: network restricted, filesystem scoped, time-limited, resource-capped (Agrawal/Cloudflare), plus Matin's four controls — sandboxing, network restriction, privilege boundaries, and human review. Without those bounds you are not running code on behalf of a user; you are granting the model an open shell on your infrastructure.
  • Before standing up enterprise MCP, build the platform shape first, not one server at a time: a gateway, a policy plane, a registry of blessed servers reviewed before they enter the environment, a permissions model over identities and tools, and an audit log at the gateway layer. Sampath's rule is to establish the root of trust at the platform, not the individual tool. Treat it like an internal app store with security review.
  • Filter tools by the caller's token scopes and use step-up OAuth to request extra privilege only when needed, the way Morrow's team did for GitHub's MCP server, so the agent only ever sees the tools its token authorizes and any escalation produces a record. And push per-tool consent through the identity provider as a trust bridge (Galow/WorkOS) so IT keeps one revocable, auditable record of who delegated what, to whom, when, and for how long.

13 claims · 50 source anchors

Evidence — Source Anchors

Chat is an insufficient control surface for long-running or high-stakes work

  • Chat is one-dimensional. It's a very low bandwidth interface,
    #3 — Jacob Lauritzen, Legora · confidence: high
  • we're asking AI systems to now produce output and produce judgments and decisions
    #206 — Joel Hron, Thomson Reuters · confidence: high
  • handle state potentially over long periods of time. There needs to be human interaction for approvals
    #167 — Preeti Somal, Temporal · confidence: high

Evals are a control system, not just a test suite

  • improvement without measurement is limited and imprecise.
    #125 — Ido Pesok, Vercel v0 · confidence: high
  • We still want to build reliable scalable applications and that is still hard
    #184 — Samuel Colvin, Pydantic · confidence: high
  • eval to us it's actually the same problem from a from a systems perspective.
    #628 — Phil Hetzel, Braintrust · confidence: high
  • small CLI tool that we call eval tool
    #689 — Lawrence Jones, incident.io · confidence: high
  • designed to allow agents to leverage our eval suite files.
    #689 — Lawrence Jones, incident.io · confidence: high
  • classic benchmark maxing.
    #746 — Ara Khan, Cline · confidence: high
  • There are right ways to use them. There are wrong ways to use them.
    #746 — Ara Khan, Cline · confidence: high

Durable state and workflow semantics are trust features, not backend details

  • once we get into longer running workflows, that's where it really becomes a problem.
    #99 — Samuel Colvin, Pydantic · confidence: high
  • no one's going to trust your agent.
    #167 — Preeti Somal, Temporal · confidence: high
  • the workflow orchestration layer needs to be deterministic. So it can be rerun um in a in a uh deterministic fashion
    #44 — Peter Wielander, Vercel · confidence: high
  • where I've got some big production CI stack to go and run and deployment takes hours, being able to go and change variables in production or in staging very quickly
    #657 — Samuel Colvin, Pydantic · confidence: high
  • you'll be able to assemble agent teams that can complete tasks orders of magnitude harder than what you can complete with a single agent today.
    #653 — Luke Alvoeiro, Factory · confidence: high
  • minding the gap around observability.
    #680 — Amy Boyd & Nitya Narasimhan, Microsoft · confidence: high

Human oversight works best as an architectural layer, not an afterthought

  • There needs to be human interaction for approvals or other reasons and of course they need to be able to be uh able to run in parallel for efficiency
    #167 — Preeti Somal, Temporal · confidence: high
  • dial these agency dials far up.
    #206 — Joel Hron, Thomson Reuters · confidence: high
  • maintaining a factory would require you to have an overview of the processes you want your coding agents to go through.
    #629 — Eric Zakariasson, Cursor · confidence: high

High-stakes systems tune agency instead of maximizing it

  • a binary thing but as a lever that you can dial
    #206 — Joel Hron, Thomson Reuters · confidence: high
  • agentic workflows we can plan and execute
    #201 — Yogendra Miraje, Factset · confidence: high
  • send it to me for approval.
    #202 — Rita Kozlov, Cloudflare · confidence: high
  • credentials, payments, and checkout require determinism.
    #745 — Steve Kaliski, Stripe · confidence: high

The next failure frontier is context misassembly, not just hallucination

  • there's this third thing, which I think is like really new and no one is doing it yet, which is training things into weights.
    #48 — Jack Morris · confidence: high
  • this is really useful if you're building anything related to some sort of internal deep research sort of API
    #47 — Ivan Leo, Manus AI / Meta Superintelligence · confidence: high
  • you combine it with all your other signals. So now if you look at your ranking function
    #156 — David Karam, Pi Labs · confidence: high
  • it's hybrid search because you have multiple approaches, and then you can either boost them together. You could do reranking, which is becoming more and more popular.
    #172 — Philipp Krenn, Elastic · confidence: high

Identity is a first-class engineering object for agentic systems

  • we actually persist scopes we manage lifetimes of tokens um we do a lot of handling there
    #37 — Patrick Riley & Carlos Galan, Auth0 · confidence: high
  • we go get API keys that are typically long-lived and broadly scoped. We paste them into some configuration files and environment variables
    #150 — Jared Hanson, Keycard / Passport.js · confidence: high
  • if you've used MCP at all extensively, you know that it means consent screens on top of consent screens on top of consent screens.
    #627 — Garrett Galow, WorkOS · confidence: high

Sandbox, least privilege, and auditability are product infrastructure, not security overhead

  • making sure that you're actually providing the correct level of sandboxing, whether it's uh containerization or it's using app level sandboxing,
    #152 — Fouad Matin, OpenAI (Codex, Agent Robustness) · confidence: high
  • We have been sandboxing untrusted code for decades. Your browser does it right now. Every tab run in its own sandbox.
    #31 — Harshil Agrawal, Cloudflare · confidence: high
  • these are what they call enforceable guarantees, not just policies.
    #149 — Jmo, CONFSEC, on Apple Private Cloud Compute · confidence: high
  • we see it as the greatest opportunity and the greatest threat to national security,
    #86 — Mark Myshatyn, Los Alamos National Lab · confidence: high
  • never compromise trust for convenience.
    #744 — Michael Hablich, Google (Chrome DevTools) · confidence: high

Protocol standardization expands the attack surface if governance lags

  • there's no halfway house because you can't do a little bit of production. You're either behind the wall or you're standing out in the open.
    #32 — Tun Shwe, Lenses · confidence: high
  • The really important thing for security teams and enterprises that want to allow this to be decentralized is they need to establish a root of trust.
    #624 — Karan Sampath, Anthropic · confidence: high
  • something like operator just shows up as a Chrome browser and it's much more challenging to understand and detect
    #148 — David Mytton, Arcjet · confidence: high

Enterprise MCP adoption converges on gateways, blessed platforms, and a root of trust

  • we think that the goal for a secure this for any security team is to is to bless one platform.
    #624 — Karan Sampath, Anthropic · confidence: high
  • challenges we've faced building and scaling our remote server, how we've overcome them,
    #625 — Sam Morrow, GitHub · confidence: high
  • if we continue this pattern for hundreds or thousands of agents, we've got a pretty big security problem on our hand.
    #150 — Jared Hanson, Keycard · confidence: high

Per-tool OAuth flows are a governance and IT visibility problem, not just a UX annoyance

  • you have this like lasting access problem that it doesn't have any visibility over
    #627 — Garrett Galow, WorkOS · confidence: high
  • We know how to transition away from static secrets uh, to dynamic access using OAuth.
    #150 — Jared Hanson, Keycard · confidence: high
  • if you log into GitHub MCP with a PAT token that we just immediately filter the tools down by the scopes that the token has.
    #625 — Sam Morrow, GitHub · confidence: high

Once an AI system can act autonomously, bounding its authority becomes the price of deployment

  • we're asking AI systems to now produce output and produce judgments and decisions
    #206 — Joel Hron, Thomson Reuters · confidence: high
  • most primitives the magic happens when you combine these things together
    #138 — Sam Bhagwat, Mastra.ai · confidence: high

Agent commerce is a new infrastructure layer: agents transact on a human's behalf, shifting the stack from payment rails to delegated intent and verifiable authority

  • AI digitizes the participants and their interactions.
    #200 — Adam Behrens, New Generation · confidence: high
  • we go from low-level payment infrastructure to higher level intent infrastructure.
    #200 — Adam Behrens, New Generation · confidence: high
  • help agents interact with the economy starting with e-commerce
    #503 — Justin, Ionic · confidence: high
  • adapt to that new kind of buyer.
    #745 — Steve Kaliski, Stripe · confidence: high