CISO Field Guide — 2026

AI Agent Security: Risks, Controls & the CISO Checklist

Q: What is an AI agent security checklist, and how is it different from an LLM security checklist?

An AI agent security checklist covers autonomous, tool-using, stateful systems -- not just a model that generates text. It adds controls absent from an LLM checklist: agent inventory and discovery, non-human identity and least-privilege tool scoping, runtime sandboxing and kill-switches, multi-agent communication security, and tamper-evident action logging. The agentic risks (OWASP T1-T15 and ASI01-ASI10) are explicitly framed as extensions of the OWASP LLM Top 10 into autonomous settings, so an agent checklist contains an LLM checklist and goes further.

Q: What is the single most important control for AI agent security?

Least privilege applied to the agent identity and tools -- directly countering OWASP Excessive Agency (LLM06:2025) and the CISA Privilege risks category. Because no fully reliable defense against prompt injection exists, you must assume injection succeeds; the durable mitigation is ensuring a compromised agent simply cannot perform high-impact actions or reach external endpoints. Default-deny tool access, short-lived scoped credentials, and human-in-the-loop on irreversible actions are the highest-leverage items.

Q: Which standards should we align our agent governance to?

Four authoritative bodies: OWASP (Top 10 for LLM Applications 2025, Top 10 for Agentic Applications 2026, and the NHI Top 10 2025); NIST (AI RMF 1.0 plus the Generative AI Profile AI 600-1 and the emerging agentic profile NIST AI 100-5 / CSA draft); CISA and Five Eyes (the four joint Cybersecurity Information Sheets, culminating in Careful Adoption of Agentic AI Services, 1 May 2026); and MITRE ATLAS for threat-informed defense. Use the NIST four functions (Govern, Map, Measure, Manage) as the spine and map specific controls to OWASP and ATLAS.

Q: Is prompt injection a solved problem in 2026?

No. As of 2026 there is no fully reliable defense against prompt injection -- classifiers like Microsoft XPIA are demonstrably bypassable (EchoLeak chained four bypasses, including XPIA evasion). Treat filtering as one layer among many. The durable controls are privilege containment and content provenance, not detection alone.

Q: What is the EU AI Act logging requirement for agents, and when does it apply?

Article 12 requires high-risk AI systems to automatically record events over the system lifetime; Article 26(6) requires deployers to retain those logs for a minimum of six months. Article 12 does not specify how logs resist tampering -- cryptographic tamper-evidence (hash chains, Merkle trees, WORM storage) is your own design choice to satisfy forensic and SOC 2 / ISO 27001 needs. High-risk obligations were originally set for 2 August 2026; a reported 7 May 2026 political agreement moves Annex III systems to 2 December 2027 -- confirm the final enacted dates. Penalties reach up to 15 million euros or 3% of global turnover.

Q: How does air-gapped or on-prem deployment reduce agent risk?

It removes the egress channel. Real exfiltration exploits like EchoLeak and ShadowLeak depend on the agent reaching an external endpoint (image auto-fetch, SSRF, external tool callouts). With no internet, no outbound connections, no DNS, and no telemetry callbacks, those channels become architecturally impossible -- shrinking the blast radius. It also eliminates entire boundary-defense control categories for FedRAMP High and DoD IL4-IL5 and satisfies ITAR and data-residency requirements. It does not prevent injection itself or local corpus poisoning, so pair it with layered controls.

Q: What makes multi-agent systems harder to secure than single agents?

Multi-agent security is non-compositional: individually safe agents can compose into an unsafe system because trust does not aggregate predictably across agent-to-agent calls. New attack surfaces appear -- Agent Card spoofing and impersonation in A2A, tool poisoning and rug pulls in MCP, cascading failures across shared memory, and steganographic secret collusion that is undetectable even under full observability. Mitigate with mutual TLS, signed and verified agent identities, audience-scoped tokens, circuit breakers, and Plan-then-Execute separation.

Q: What is a shadow agent and why is it a Critical finding?

A shadow agent is any agent with no registry entry, no assigned owner, OR no managed identity -- Microsoft Agent 365 rates this Critical. Shadow agents are unmonitored, often inherit broad employee credentials, and lack audit trails, making them the agentic equivalent of shadow IT. Industry data shows 79% of organizations lack visibility into their agents and 47% of enterprise AI use happens through personal accounts outside SSO -- which is why continuous discovery and a managed agent registry are the foundational control.

Q: How do we operationalize all of this without stalling agent adoption?

Start with the CISA immediate actions: inventory all agents (including shadow), run blast-radius assessments, audit service accounts, replace standing credentials with just-in-time provisioning, and extend logging to agent actions. Then govern autonomy on a dial -- classify each agent by tier and grant the lowest tier that works, promoting deliberately rather than by default. This avoids the quiet drift toward excessive agency that turns prototypes into production liabilities, and addresses Gartner warning that 40%+ of agentic projects will be canceled by 2027 for inadequate risk controls.

The 2026 CISO playbook for securing autonomous AI agents: the full OWASP / NIST / CISA risk taxonomy, least-privilege and identity controls, EU AI Act audit requirements, and a seven-domain, copy-ready governance checklist that names both the control and the authoritative source.

Agentic Risk

OWASP Agentic Top 10

Least Privilege

Air-Gapped

Audit-Ready

40%Enterprise apps with AI agents by 2026

65%Orgs hit by an agent incident in past year

CVSS 9.3EchoLeak zero-click Copilot exploit

109:1Machine identities per human (2026)

What is AI agent security?

AI agent security is the practice of protecting autonomous, tool-using AI agents -- systems that plan, reason, and take actions on your behalf -- from being hijacked, over-privileged, or manipulated. It extends traditional LLM security with controls the OWASP Agentic Top 10, NIST AI RMF, and CISA/Five Eyes guidance call for: agent inventory, least-privilege tool scoping, runtime sandboxing, tamper-evident logging, and human approval for irreversible actions. Because no defense fully stops prompt injection, the goal is containment -- a compromised agent must not be able to do disproportionate damage.

1. Why AI Agents Break Your Existing Threat Model

An AI agent security checklist is not a chatbot policy with extra steps. CISA 2026 Five Eyes guidance defines agentic AI as systems composed of one or more agents that fundamentally rely on an AI model, such as an LLM, to interpret and reason about the state of the world and can autonomously make decisions and take actions. Three properties in that definition -- autonomy, statefulness, and tool access -- invalidate assumptions that traditional application security takes for granted.

First, the instruction/data boundary collapses. A large language model processes instructions and data in the same channel, with no enforced separation -- unlike a SQL database, where parameterized queries cleanly separate code from input. OWASP makes this the root cause of its #1 risk, Prompt Injection (LLM01:2025). When that model is wired to tools, any text it ingests -- a retrieved document, an email, a webpage, another agent message -- becomes a candidate instruction. The agent trust boundary silently expands to include every byte of untrusted content it reads.

Second, the agent acts under real privilege. A vulnerable web form returns data to a user; a vulnerable agent can delete a file, send a wire, modify an IAM policy, or query a production database -- because you gave it those tools to be useful. OWASP frames this as Excessive Agency (LLM06:2025): damaging actions can be performed in response to unexpected, ambiguous, or manipulated outputs from an LLM. The exploit is no longer leak the response. It is perform the attacker action at your privilege level.

Third, state and autonomy decouple the attack from its trigger. Agents persist memory across sessions and chain tool calls in loops. A poisoned memory entry planted today can execute against an unrelated query next week, and the agent cannot distinguish learned context from planted content. Session isolation does not help, because the attack exploits persistent cross-session state.

The Architect Takeaway

You can no longer treat the model as trusted and the perimeter as the control. Assume injection will succeed, and design so that a compromised agent cannot do disproportionate damage. That principle -- containment over prevention -- runs through every section below.

For the organizational program that wraps these technical controls, see our AI governance framework and the broader AI for CISOs security guide. New to the paradigm? The agentic AI hub covers the reference architecture and frameworks behind these agents.

2. How Fast the Gap Is Opening: Adoption vs. Controls

Security maturity is not keeping pace with deployment. Gartner projects that 40% of enterprise applications will feature task-specific AI agents by the end of 2026, up from less than 5% in 2025, and that 33% of enterprise software will embed agentic AI by 2028 (from under 1% in 2024). Yet only 17% of organizations have actually deployed agents so far, while 60%+ expect to within two years -- and Gartner warns that more than 40% of agentic AI projects will be canceled by the end of 2027 due to cost, unclear value, or inadequate risk controls. Governance is now a make-or-break variable, not a compliance afterthought.

Metric	Figure	Source
Enterprise apps with task-specific AI agents (2025 to 2026)	<5% to 40%	Gartner
Enterprise software embedding agentic AI by 2028	33% (from <1% in 2024)	Gartner
Organizations that have deployed AI agents (2026)	17% (60%+ plan within 2 yrs)	Gartner
Agentic AI projects canceled by end of 2027	40%+	Gartner
Orgs with an AI-agent security incident in past year	65% (all with business impact)	Zenity / CSA
Orgs with full security approval for all AI agents	14.4%	Acuvity
Orgs reporting shadow AI use	98%	Acuvity / CSA
Orgs with no visibility into AI data flows	86%	Industry surveys
Agent skills audited containing serious vulnerabilities	41.7% of 2,890+	MITRE / secondary

Several figures originate from vendor-sponsored surveys with differing methodologies; treat them as directional, and cross-reference Gartner analyst forecasts where precision matters.

The shadow-AI dimension is covered in depth in Shadow AI risks.

3. The Agentic Risk Taxonomy

Most confusion in this field comes from blending two separate OWASP documents. Both are products of the OWASP Gen AI Security Project. Agentic AI -- Threats and Mitigations v1.0 (Feb 2025) is the foundational taxonomy of 15 named threats, T1-T15. The OWASP Top 10 for Agentic Applications 2026 (Dec 9, 2025) is the ranked, incident-grounded Top 10, coded ASI01-ASI10.

Do Not Conflate the Two

The T-codes (T1-T15) belong only to the Feb 2025 taxonomy. The ASI-codes (ASI01-ASI10) belong only to the 2026 Top 10. Do not present them as one list. The ASI-to-T mappings below are analytical best-fit correspondences; verify exact wording against the official OWASP PDFs before quoting.

3.1 OWASP Agentic Threats & Mitigations v1.0 (T1-T15)

The cross-cutting theme: the root cause of reasoning attacks (T6) is the lack of separation between data and instructions. The agentic threats are explicitly framed as extensions of the OWASP LLM Top 10 into autonomous, stateful, multi-agent settings.

Code	Threat Name	Definition (condensed)	Related LLM Top 10 2025
T1	Memory Poisoning	Exploits short- and long-term memory to inject malicious/false data; alters decisions and enables unauthorized operations.	LLM04; LLM08
T2	Tool Misuse	Manipulates agents to abuse integrated tools via deceptive prompts while staying within authorized permissions; includes Agent Hijacking.	LLM06
T3	Privilege Compromise	Exploits mismanaged roles, overly permissive configs, or dynamic role inheritance to escalate privileges.	LLM06
T4	Resource Overload	Deliberately exhausts compute/memory/service capacity; amplified by agents self-triggering and spawning tasks.	LLM10
T5	Cascading Hallucination Attacks	Plausible-but-false info propagates and amplifies via self-reinforcement and inter-agent loops.	LLM09
T6	Intent Breaking & Goal Manipulation	Exploits lack of separation between data and instructions to alter planning, reasoning, and self-evaluation.	LLM01
T7	Misaligned & Deceptive Behaviors	Agents execute harmful/disallowed actions, using deceptive reasoning to appear compliant.	—
T8	Repudiation & Untraceability	Agent actions cannot be traced or accounted for due to insufficient logging/transparency.	—
T9	Identity Spoofing & Impersonation	Exploits authentication to impersonate agents or users and act under false identities.	—
T10	Overwhelming Human-in-the-Loop	Exploits human cognitive limits or floods oversight/validation frameworks.	—
T11	Unexpected RCE & Code Attacks	Exploits AI-generated code execution to inject malicious code via function-calling/tools.	LLM01; LLM05
T12	Agent Communication Poisoning	Manipulates inter-agent channels to spread false info or influence decisions.	LLM04
T13	Rogue Agents in Multi-Agent Systems	Compromised agents operate outside monitoring boundaries, executing unauthorized actions or exfiltrating data.	—
T14	Human Attacks on Multi-Agent Systems	Adversaries exploit inter-agent delegation, trust, and workflow dependencies to escalate or manipulate.	—
T15	Human Manipulation	Agent-human trust reduces skepticism; attackers coerce agents to manipulate users or take covert actions.	—

3.2 OWASP Top 10 for Agentic Applications 2026 (ASI01-ASI10)

The 2026 list is grounded in real 2025 incidents, which distinguishes it from the more theoretical Feb 2025 taxonomy. The most material new addition is ASI04 Agentic Supply Chain Vulnerabilities -- runtime poisoning of the Model Context Protocol (MCP) and Agent2Agent (A2A) ecosystems.

Code	Title	Description	2025 Incident	Maps to v1.0
ASI01	Agent Goal Hijack	Hidden prompts alter objectives/decision path, turning copilots into silent exfiltration engines.	EchoLeak	T6
ASI02	Tool Misuse	Agents bent legitimate tools into destructive outputs (confused-deputy pattern).	Amazon Q	T2
ASI03	Identity & Privilege Abuse	Leaked credentials / dropped identity let agents operate beyond intended scope.	Credential abuse	T3 + T9
ASI04	Agentic Supply Chain Vulnerabilities	Dynamic MCP and A2A ecosystems let runtime components be poisoned (NEW for 2026).	GitHub MCP exploit	New; T2/T13
ASI05	Unexpected Code Execution	Natural-language execution paths unlock new RCE avenues.	AutoGPT RCE	T11
ASI06	Memory & Context Poisoning	Memory poisoning reshapes behavior long after the initial interaction.	Gemini Memory Attack	T1
ASI07	Insecure Inter-Agent Communication	Spoofed inter-agent messages misdirect entire agent clusters.	Spoofed messages	T12
ASI08	Cascading Failures	A single error/compromise spreads across connected agents/tools/pipelines with escalating impact.	Pipeline cascade	T5
ASI09	Human-Agent Trust Exploitation	Confident, polished explanations mislead operators into approving harmful actions.	Operator deception	T10 + T15
ASI10	Rogue Agents	Compromised, misaligned, or drifting agents act with harmful autonomy -- the ultimate insider threat.	Replit meltdown	T13

3.3 How Agentic Threats Extend the OWASP LLM Top 10 (2025)

The agentic taxonomy does not replace the OWASP Top 10 for LLM Applications 2025; it builds on it. Five LLM-level entries carry the most agent weight.

ID	Risk Name	Agent Relevance
LLM01:2025	Prompt Injection	Critical -- indirect injection via tools/RAG drives agent compromise
LLM02:2025	Sensitive Information Disclosure	High -- agents/RAG can leak knowledge-base data
LLM03:2025	Supply Chain	High -- agent tools/plugins inherit supply-chain risk
LLM05:2025	Improper Output Handling	Critical -- agent output feeds shells, DBs, browsers
LLM06:2025	Excessive Agency	Highest -- the defining agent risk

The Canonical Agent Kill-Chain

Indirect prompt injection (LLM01) → excessive agency (LLM06) → improper output handling (LLM05). A poisoned document changes the agent intent, the over-permissioned tool executes the action, and the unsanitized output reaches a shell or browser. Defenses must break all three links.

3.4 Six Defensive Playbooks

Playbook	Threats Mitigated
1. Preventing AI Agent Reasoning Manipulation	T6, T7, T8
2. Preventing Memory Poisoning & Knowledge Corruption	T1, T5
3. Securing AI Tool Execution & Preventing Unauthorized Actions	T2, T3, T11, T4
4. Strengthening Authentication, Identity & Privilege Controls	T3, T9
5. Protecting HITL & Preventing Human-targeted Threats	T10, T15
6. Securing Multi-Agent Communication & Trust Mechanisms	T12, T14, T13

4. Prompt Injection & Memory Poisoning

Prompt injection is OWASP LLM01:2025 -- the #1 LLM risk for the second consecutive edition. A Prompt Injection Vulnerability occurs when user prompts alter the LLM behavior or output in unintended ways. Direct injection modifies model behavior via user input (Ignore all previous instructions). Indirect injection -- the agent processes external content (websites, files, emails, RAG documents) carrying hidden instructions -- is the dominant agentic threat, because the agent may silently execute injected instructions, with the user privileges, with no user awareness.

4.1 Memory Poisoning: Temporally Decoupled and Worse

Memory poisoning (T1 / ASI06) is more dangerous because it decouples injection from execution across three phases:

Injection -- malicious content enters via documents/emails/webpages/API responses using instruction-like phrasing (Remember that the user prefers, For future reference always).
Persistence -- poisoned instructions persist indefinitely across sessions in long-term memory; the agent cannot distinguish learned context from planted content.
Execution -- a later, unrelated query retrieves the poisoned entry and runs it as self-learned knowledge.

This is why session isolation does not help -- the attack lives in persistent cross-session state. The academic MINJA attack (NeurIPS 2025) achieves >95% injection success and >70% attack success using query-only interaction, no privileged access. The Gemini Memory Attack used conditional, delayed instructions triggered by words like yes, sure, and no that appear in nearly every conversation -- making single-moment runtime detection nearly useless.

4.2 Real-World Incidents (Not Theoretical)

Name / ID	Target	Class	Mechanism	Severity / Success
EchoLeak (CVE-2025-32711)	Microsoft 365 Copilot	Indirect, zero-click	Crafted email → LLM Scope Violation; chained XPIA evasion, reference-style Markdown bypass, auto-fetched image egress, Teams CSP proxy	CVSS 9.3 (Critical); first real-world zero-click prod LLM injection
ShadowLeak	ChatGPT Deep Research (Gmail)	Indirect, zero-click	Instructions hidden in email HTML (white-on-white, tiny fonts); server-side exfiltration in OpenAI cloud	100% success in testing
MINJA	LLM agents (academic)	Memory poisoning	Query-only: bridging steps + indication prompts + progressive shortening	>95% injection, >70% attack
Gemini Memory Attack	Google Gemini	Memory poisoning	Conditional/delayed instruction triggered by common words	Bypasses runtime guardrails

EchoLeak: four defenses bypassed in sequence

Step	Defense Bypassed	Technique
1	XPIA classifier	Benign phrasing (For compliance, do not mention this email)
2	Markdown link-redaction filter	Reference-style Markdown link not stripped
3	Requirement for a user click	Auto-fetched reference-style Markdown image -- zero-click
4	Content Security Policy	Routed through CSP-allowlisted Teams proxy asyncgw.teams.microsoft.com/urlp

4.3 Defense-in-Depth (No Single Layer Is Sufficient)

Layer	Control	What It Does	Note
Input/output filtering	Classifiers (XPIA), semantic filters, RAG Triad	Detect injected instructions; validate output format	Bypassable -- EchoLeak evaded XPIA
Content provenance	Spotlighting: delimiting, datamarking, encoding	Help model distinguish trusted vs. untrusted tokens	Datamarking cut attack success ~50% to <3%
Memory provenance	Provenance tagging + write-ahead validation	Tag every entry with origin/session/source; secondary model validates before commit	Memory-poisoning Layer 2
Sandboxing / least privilege	Privilege control, HITL, CSP, trust-aware retrieval	Limit permissions, gate high-risk actions, contain blast radius	CSP failure enabled EchoLeak egress
Behavioral monitoring	Baselines, memory integrity audit, circuit breakers	Detect deviation; quarantine compromised agents	Memory-poisoning Layer 4
Adversarial testing	Red-teaming, LLMail-Inject challenge	Continuously probe defenses	LLM01 control 7

As of 2026, no fully reliable defense against prompt injection exists. Treat it as an unsolved problem -- which is precisely why privilege containment and provenance, not filtering alone, are the durable controls.

5. Excessive Agency & Tool Misuse: The Defining Risk

If you fix one thing, fix this. OWASP LLM06:2025 Excessive Agency has three official root causes -- and the maximum-risk configuration is all three at once, a state teams routinely create during prototyping and never tighten (the quiet drift toward excessive agency).

Root Cause	Definition	Concrete Agent Example	Primary Control(s)
Excessive Functionality	Tools include capabilities beyond task need	Tool offers modify/delete when only read needed; deprecated plugin still callable; open-ended shell function	Minimize tools; limit functionality; avoid open-ended extensions
Excessive Permissions	Tools run with broader downstream privileges than required	DB creds with UPDATE/INSERT/DELETE when only SELECT needed; shared service account instead of user identity	Least privilege; user-context OAuth minimum scope; complete mediation downstream
Excessive Autonomy	High-impact actions proceed without verification	Agent deletes documents / sends wire / external email without confirmation	Human-in-the-loop on high-impact/irreversible actions

The Canonical Scenario

An email-summarization agent is hit with indirect prompt injection in an incoming email. It is tricked into reading sensitive mail and forwarding it to an attacker -- exploiting all three root causes at once (unneeded send functionality, over-privileged OAuth scope, no send-confirmation). The fix: a summarizer should hold only read_inbox / read_sent scopes with explicit no_delete, no_forward, no_external_send restrictions.

5.1 The Control Stack (Defense-in-Depth, In Order)

Control	Mechanism	Example / Detail
Tool allowlist / minimization	Default zero tool access; add tools at runtime per permission	By default the agent should not have any tool access (Auth0)
Scoped capabilities	RBAC vs. fine-grained (ReBAC / OpenFGA) authorization	Can user:anne use buyStock on asset:OKTA?
Credential delegation	Short-lived OAuth 2.0 tokens; token vault + OAuth Federation	No raw/long-term creds stored by the agent
Human-in-the-loop	Explicit consent for high-impact/irreversible actions	CIBA push approval; 60s confirmation timeout; gates delete_file, send_email, run_code, update_database, modify_iam_policy
Output schema / bounding	Validate tool-call args vs. schema; delimit untrusted content	Wrap external input in delimited tags; filter injection strings
Damage limitation	Rate-limiting, sandboxing, tamper-evident audit logs	SHA-256 result hashing + reasoning traces; sandbox all code execution

OWASP eight LLM06 controls: minimize extensions; minimize functionality; avoid open-ended extensions; minimize permissions; execute in the user context (OAuth minimum scope, not a shared service account); require human approval for high-impact actions; complete mediation (validate all downstream requests rather than trusting the LLM); sanitize inputs/outputs. The two emphasized controls are the ones most often skipped.

5.2 Autonomy Tiers -- Govern the Dial, Do Not Max It

Tier 1

Fully Supervised

Human approval required before ANY action.

Tier 2

Constrained Autonomy

Executes only pre-approved action types within predefined scope.

Tier 3

Broad Autonomy

Acts within defined boundaries under continuous monitoring.

Principle

Authorization must live in the downstream system, never be trusted to the LLM. Controls work combined, not individually. When choosing an agent framework, evaluate it against these controls first -- see best AI multi-agent tools.

6. Agent Identity, Authentication & Least Privilege (the NHI Problem)

An AI agent is a non-human identity (NHI) -- a digital identity that authenticates and operates without direct human control. Treating agents as first-class identities (not as features running under a human credentials) is the single most important access-control decision you will make.

The scale is the problem. Enterprises now average ~82 machine identities per employee; the ratio moved from ~92:1 (early 2024) to ~144:1 (end of 2025), and Palo Alto Networks 2026 report puts the cross-environment average at 109:1, with cloud-native environments reaching tens of thousands of machine identities per human. Agents are the fastest-growing class: Palo Alto projects +85% AI-agent growth over the next 12 months.

6.1 The Anti-Pattern: Agents Under Broad User Credentials

When an agent reuses a human user session or a shared key, three things break: permissions become excessive, audit becomes impossible (OWASP NHI10 Human Use of NHI), and you create classic confused-deputy exposure. The fix is a distinct, managed identity per agent. The OWASP Non-Human Identities Top 10 (2025) is the reference frame:

ID	Risk	Relevance to AI-Agent Least Privilege
NHI1:2025	Improper Offboarding	Decommissioned agents left active create persistent backdoors
NHI2:2025	Secret Leakage	Agent memory, tool results, transcripts, crash dumps leak creds
NHI3:2025	Vulnerable Third-Party NHI	Compromised connector enables supply-chain attack
NHI4:2025	Insecure Authentication	Weak/legacy auth enables takeover and escalation
NHI5:2025	Overprivileged NHI	Agent granted more than its task needs; expands blast radius (core risk)
NHI6:2025	Insecure Cloud Deployment Configs	High-privilege CI/CD misconfig enables unauthorized access
NHI7:2025	Long-Lived Secrets	~50% of NHI creds are long-lived keys; fix is ephemeral tokens
NHI8:2025	Environment Isolation	Reusing one NHI across test/prod enables cross-env compromise
NHI9:2025	NHI Reuse	One identity shared across workloads removes least-privilege boundaries
NHI10:2025	Human Use of NHI	Cannot distinguish agent vs. human; breaks audit

The Over-Privilege Data

37% of NHI security incidents are attributed to over-privileged identities; 26% of orgs estimate that 50%+ of their service accounts are over-privileged; 44% of cloud environments contain at least one privileged IAM role; ~50% of enterprise NHI credentials are long-lived API keys.

6.2 The Controls

Control	Recommended Practice	Standard
Identity per agent	Each agent authenticates as a distinct principal; no reused sessions or shared keys	OWASP NHI10
Token lifetime	Minute-scale, short-lived; JIT issuance; retired at task completion (zero standing privilege)	NIST SP 800-207A
Authentication	OAuth 2.1 + PKCE (RFC 7636), dynamic client registration; MCP HTTP mandates OAuth 2.1	MCP spec
Delegation	Token Exchange (RFC 8693); token carries agent + user identity as separate claims	RFC 8693
Authorization model	ABAC at runtime; capability-based verb-on-resource scopes, not broad roles	NIST SP 800-162
Posture	Default-deny with explicit grants; org-level deny policies block excessive configs	OWASP NHI5
Effective authority	Intersection of agent and user permissions, never the union (confused-deputy defense)	—
Workload identity	SPIFFE/SPIRE, short-lived OIDC, STS assume-role; SCIM for provisioning	NIST 800-207A
High-impact actions	Out-of-band human approval via a channel the agent cannot forge	—

NIST SP 800-207A states the principle directly: each service should present a short-lived cryptographically verifiable identity credential, authenticated per connection and reauthenticated regularly. Note the self-escalation risk: an agent with enough initial access can dynamically modify its own permissions -- which is exactly why org-level deny policies and out-of-band approval for high-impact actions are non-negotiable. RBAC alone is insufficient; ABAC plus capability tokens (Macaroons, Biscuit) is the recommended posture.

7. Multi-Agent Systems: Cascading Failure, Impersonation & Protocol Security

Multi-agent security is non-compositional: individually safe agents can compose into an unsafe system, because trust does not aggregate predictably across agent-to-agent calls. You cannot certify a fleet by certifying each agent. The relevant OWASP 2026 risks are ASI07, ASI08, ASI09/ASI10, and ASI03.

#	Threat Class	Description
1	Privacy & Information Integrity	Unauthorized data access or corruption across agent boundaries
2	Collusion & Exfiltration	Coordinated extraction/leak, incl. secret/steganographic collusion
3	Exploitation	Agents abusing vulnerabilities in other agents decision processes
4	Swarm Attacks	Coordinated assaults that appear benign individually
5	Heterogeneous Attacks	Mixed-capability adversaries exploiting role specialization
6	Overseer Attacks	Compromising human supervisors or monitoring systems
7	Cascade Attacks	Failures propagating through agent dependencies
8	Conflict & Mixed-Motive Threats	Misaligned objectives creating systemic risk
9	Physical & Embodied Security	Agents controlling real-world systems
10	Sociotechnical Threats	Manipulation of humans and institutions

The Observability Gap

Backdoored agents can coordinate via steganographic channels embedded in shared message boards, making secret collusion undetectable even under full observability of communications.

7.1 Agent Impersonation & Protocol Security (A2A and MCP)

In A2A, a malicious agent crafts a deceptive Agent Card to misrepresent its capabilities and win the host LLM-based selection. Trustwave SpiderLabs demonstrated an Agent-in-the-Middle attack in 2025. A2A v0.3+ supports but does not enforce card signing, so card spoofing via DNS/CDN compromise is a low-cost, routine threat.

Protocol	Named Attack	Mitigation / Spec Control
A2A (v0.3+)	Agent Card spoofing/tampering (DNS/CDN); signing supported but not enforced	Enforce card signing; serve over HTTPS/TLS 1.3+; mTLS agent identity
A2A	Agent-in-the-Middle impersonation (Trustwave 2025)	Verify card provenance/signature; do not rely on LLM selection alone
A2A	OAuth2 long-lived tokens, coarse scopes, no consent gate	Short-lived audience-scoped tokens; capability-based access; protocol-level consent
MCP	Tool poisoning (malicious instructions in tool metadata); 5 of 7 clients lack static validation	Static metadata analysis; client-side validation; behavioral anomaly detection
MCP	Rug pull (tool definition mutates after approval)	Pin/version tool definitions; re-approval on change; integrity checks
MCP	Confused deputy (proxy uses server, not user, privileges)	Per-user identity passthrough done correctly; avoid a static single OAuth Client ID
MCP (spec 2025-06-18)	Token passthrough abuse	Prohibited by spec; servers = OAuth 2.1 Resource Servers; validate token audience; mint per-call audience-scoped tokens

Cross-Cutting Defenses

Least-privilege/capability-based access control, Plan-then-Execute architectures (separate deliberation from action), Byzantine-resilient consensus for mission-critical decisions, and a digital-twin clone that re-runs the last week of recorded actions to test for cascade triggers (ASI08).

8. Agent Inventory & Discovery: You Cannot Secure What You Cannot See

The first deliverable of any agentic-AI security program is a continuous, living inventory of every agent -- including shadow deployments. CSA defines three discovery gaps that make agents uniquely hard to inventory:

Discovery Gap

Traditional tools cannot find ephemeral agent runtimes in IDEs, desktops, browser sessions, MCP servers, and personal accounts.

Permission Visibility Gap

Agents inherit employee credentials and may exceed consciously granted permissions.

Logic Inspection Gap

Teams rarely inspect prompts, skills, MCP tool definitions, memory stores, and agent instructions for malicious behavior.

Why a Registry Is Foundational

79% of organizations lack visibility into AI agents and MCP-connected systems; 47% of enterprise AI use occurs through personal accounts outside SSO/identity governance; ~97% of NHIs carry excessive privileges; and just 0.01% of NHIs control 80% of cloud resources. Fragmented identity systems added an average of 12 hours to identity-related incident resolution (Unit 42).

8.1 Registry Architecture & the Unit of Inventory

Two production-relevant directions: the OWASP Agent Name Service (ANS) -- a protocol-agnostic discovery registry (IETF draft) with DNS-inspired naming, PKI certificates, and a Protocol Adapter Layer covering A2A, MCP, and ACP -- and Microsoft Entra Agent ID / Agent 365, a production enterprise registry with tenant-wide counters for Total, Ownerless, and Unmanaged agents.

A Managed Agent Identity Is the Unit of Inventory

A shadow agent is rated Critical when an agent has no registry entry, no owner, OR no managed identity. Anything without all three is, by definition, a shadow agent and a Critical finding. OWASP also requires comprehensive runtime logging of every decision, tool call, and state change, a per-agent behavioral baseline, circuit breakers, and an auditable kill-switch.

9. Audit, Traceability & Logging (EU AI Act Art. 12)

Agent audit logging is not application logging. It must capture decisions, prompts, tool calls, delegated authority, and outcomes -- a full forensic trail. Two layers matter: the regulatory baseline (what you must record) and the engineering layer (how you make those records trustworthy).

9.1 Regulatory Baseline -- EU AI Act Article 12

Article 12(1): High-risk AI systems shall technically allow for the automatic recording of events (logs) over the lifetime of the system. Logging must be technical, automatic, and lifetime -- manual recording does not satisfy the requirement.

Provision	Requirement	Detail
Art 12(1)	Automatic event logging	Technically built in; over the system lifetime; manual recording insufficient
Art 12(2)(a)	Risk identification	Log events relevant to risk situations or substantial modification
Art 12(2)(b)	Post-market monitoring	Support post-market monitoring per Article 72
Art 12(2)(c)	Operation monitoring	Support monitoring of operation per Article 26(5)
Art 26(6)	Retention	Deployers retain auto-generated logs for a minimum of 6 months, subject to law

Penalties

Breaching operator obligations (incl. Articles 12/26) reaches up to 15 million euros or 3% of total worldwide annual turnover, whichever is higher (up to 35M euros / 7% for prohibited practices).

Timeline in flux as of 2026-05-30: high-risk obligations were originally set for 2 Aug 2026; a reported 7 May 2026 political agreement moves Annex III high-risk systems to 2 Dec 2027. Confirm the final adopted text before presenting any single date as binding.

9.2 Engineering Layer -- Tamper-Evident Logs

Article 12 mandates that logs exist and be automatic, but does NOT prescribe how logs resist tampering. Tamper-evidence is your design choice -- and it is what satisfies SOC 2, ISO 27001, and forensic readiness. The five primitives:

Property	Mechanism	Notes
Append-only	WORM / immutable storage	Entries added, never removed or modified
Tamper-evident	SHA-256 hash chain over canonical-JSON events	Any altered byte breaks all subsequent links
Independently verifiable	Merkle tree; recompute leaves and re-chain	External auditor verifies without trusting the runtime
Identity-bound	Cryptographic signature tied to agent credential	Plus the human authorizer who delegated the workflow
Time-ordered	Sequential chain; tamper-resistant timestamps	Suitable for replay/forensics

Worked Verification Example -- hash chain

C1 = SHA-256( C0 + bytes(E1) )
C2 = SHA-256( C1 + bytes(E2) )
C3 = SHA-256( C2 + bytes(E3) )   <- stored as the chain head

An auditor recomputes C1', C2', C3' from the canonical bytes and compares C3' to the stored head. If an attacker alters a single byte in E2 (e.g. changing operation:delete to operation:read), then C2' does not match C2, which forces C3' to differ -- the tampering is detected at the chain head even though only one intermediate entry changed. A Merkle tree over the leaves provides the same guarantee with O(log n) inclusion proofs.

Hash-chaining detects tampering but does not by itself prevent deletion of the whole log. Pair it with WORM/replicated storage and external anchoring for true tamper-resistance.

9.3 Required Fields per Agentic Access (Kiteworks Model)

Field	Definition
Agent identity	Unique workflow-level credential of the agent performing the access
Human authorizer	Authenticated identity of the human who delegated the workflow
Data accessed	Specific record identifiers + data classification
Operation performed	Specific action: read, download, move, delete, forward
Policy-evaluation outcome	Permitted/denied + which policy attribute governed the decision
Timestamp	Precise, retroactively-unalterable event time

Supporting standards: NIST SP 800-92, SOC 2, ISO 27001, plus HIPAA 45 CFR 164.312(b), SEC Rule 17a-4 (WORM), NIST 800-171 (3.3.1), CMMC AU.2.042, NYDFS Part 500 500.6. Map these obligations to your broader program in AI compliance frameworks.

Free Download

Get Chapter 1 Free + AI Academy Access

Download the first chapter of The AI Strategy Blueprint and get instant access to our AI Academy -- covering AI governance, security architecture, and the seven executive commitments behind a defensible agentic-AI program.

10. Runtime Defense-in-Depth: Guardrails, Sandboxing & Kill-Switches

Runtime agent security layers four independent control planes. Both OWASP and Meta frame guardrails as a final layer of defense, not the only one.

10.1 Guardrail Frameworks

Guardrail	Function	Architecture / Model	Key Metrics
PromptGuard 2	Jailbreak / prompt-injection classifier (input)	BERT-family: 86M or 22M	98% AUC English; 97.5% recall @1% FPR; 19.3-92.4 ms on A100
AlignmentCheck	Chain-of-thought goal-hijack auditor	Guardrail LLM: Llama 3.3 70B / Llama 4 Maverick	>80% recall, <4% FPR (internal)
CodeShield	Static analysis of generated code	Semgrep + regex, 8 languages, 50+ CWEs	96% precision, 79% recall; ~60-300 ms tiers

The Headline Result

Meta LlamaFirewall combining all three reduced attack success rate on the AgentDojo benchmark from a 17.6% baseline to 1.75% (>90% reduction) while preserving 42.7% utility. For on-prem/air-gapped builds, NVIDIA NeMo Guardrails (programmable Colang, five rail types incl. execution/tool I/O) and Llama Guard (self-hostable open-weight classifier, no external API) are deployable with no internet dependency.

10.2 Sandboxing & Isolation of Tool Execution

Treat all generated code as untrusted; remove direct eval(); run one task per ephemeral sandbox with no artifact carryover.

Technology	Isolation Mechanism	Best Fit	Notes
Firecracker / Kata microVM	Hardware-virtualized microVM	Regulated/sensitive data; strongest	E2B boots <200 ms; recommended minimum for production
gVisor	User-space Go kernel intercepts syscalls	Compute-heavy multi-tenant	Sandboxed code never talks to host kernel directly
V8 Isolates	Per-context JS engine isolation	Latency-critical lightweight tasks	JS/TS only; weakest boundary

10.3 Threat-to-Control Map (OWASP ASI01-ASI10)

ID	Risk	Key Defense-in-Depth Controls
ASI01	Agent Goal Hijack	Prompt-injection filtering; limited tool privileges; human approval for goal changes
ASI02	Tool Misuse & Exploitation	Sandboxed execution; strict permission scoping; argument validation
ASI03	Identity & Privilege Abuse	Short-lived credentials; task-scoped permissions; isolated identities
ASI04	Agentic Supply Chain	Signed manifests; curated registries; dependency pinning; sandboxing; kill-switches
ASI05	Unexpected Code Execution	Treat generated code as untrusted; remove eval(); hardened sandboxes; review steps
ASI06	Memory & Context Poisoning	Memory segmentation; ingestion filtering; provenance tracking; entry expiry
ASI07	Insecure Inter-Agent Comm.	Mutual TLS; signed payloads; anti-replay; authenticated discovery
ASI08	Cascading Failures	Isolation boundaries; rate limits; circuit breakers; pre-deployment plan testing
ASI09	Human-Agent Trust Exploitation	Forced confirmations; immutable logs; risk indicators
ASI10	Rogue Agents	Governance; sandboxing; behavioral monitoring; kill-switches

Kill-Switch Mandate

For ASI10 rogue agents and ASI04 supply-chain, an instant, auditable kill-switch is mandatory. Microsoft open-source Agent Governance Toolkit maps controls to every OWASP agentic risk using four execution rings (Ring 0 supervisor through Ring 3 untrusted sandbox), each with resource limits plus an instant kill-switch.

11. Mapping to the Frameworks: NIST AI RMF, CISA/Five Eyes & MITRE ATLAS

11.1 NIST AI Risk Management Framework

NIST AI 600-1 (Generative AI Profile of AI RMF 1.0, July 2024) is built on four core functions -- GOVERN, MAP, MEASURE, MANAGE. It defines 12 GAI risk categories and 200+ suggested actions. NIST term for hallucination is confabulation. AI 600-1 was scoped to content generation, not autonomous action; agentic risk is handled by NIST AI 100-5 plus the CSA NIST AI RMF Agentic Profile (draft) using AG- extensions.

Function	AI 600-1 GenAI Focus	Agentic Extensions (CSA AG- profile, draft)
GOVERN (GV / AG-GV)	Risk culture, policy, accountability, value-chain oversight	AG-GV.1 Autonomy Tier Classification; AG-GV.2 Delegation Accountability; AG-GV.3 Agent Lifecycle Governance
MAP (MP / AG-MP)	Establish context; identify which of 12 GAI risks apply	AG-MP.1 Tool-Use Risk Inventory; AG-MP.2 Action-Consequence Mapping; AG-MP.3 Multi-Agent Topology Risk
MEASURE (MS / AG-MS)	Assess, benchmark, track; red-teaming, evals	AG-MS.1 Behavioral Telemetry; AG-MS.2 Autonomy Calibration; AG-MS.3 Delegation Chain Monitoring
MANAGE (MG / AG-MG)	Prioritize, respond, recover; incident response	AG-MG.1 Agentic Incident Response; AG-MG.2 Behavioral Drift Correction; AG-MG.3 Agent Decommissioning

600-1 is final; the agentic/AG- materials are 2025-2026 drafts -- treat AG- IDs and the autonomy-tier scale as CSA proposals aligned to NIST, not finalized NIST controls.

11.2 CISA / NSA / Five Eyes Guidance

Publication	Date	Core Focus
Deploying AI Systems Securely	15 Apr 2024	Zero Trust, secure-by-design, model-weight protection, RBAC/ABAC + MFA, monitoring
AI Data Security: Best Practices	22 May 2025	Securing training/operational data: supply chain, poisoning, drift; provenance & encryption
Principles for Secure Integration of AI in OT	Dec 2025	Critical-infrastructure/OT: Understand, Assess, Govern, Embed safety
Careful Adoption of Agentic AI Services	1 May 2026	First dedicated agentic AI guidance: 5 risk categories + lifecycle controls

The May 2026 agentic guidance defines five named risk categories: Privilege risks, Design & configuration risks, Behavioral risks, Structural risks, and Accountability risks. Its immediate actions are a ready-made program kickoff: inventory all agentic deployments (including shadow); conduct blast-radius assessments; audit service accounts for excessive permissions; replace persistent credentials with just-in-time (JIT) provisioning; extend logging to capture agent actions. The AI Data Security CSI also specifies AES-256 + post-quantum encryption, FIPS 140-3 storage, and cryptographically signed append-only provenance ledgers.

11.3 MITRE ATLAS

MITRE ATLAS is the threat-informed-defense knowledge base for AI systems. On 2025-10-21, MITRE ATLAS and Zenity Labs released the first formal agent-specific techniques:

AML.T ID	Technique	What It Does
AML.T0080	AI Agent Context Poisoning	Manipulate the context an agent uses; subs Memory and Thread
AML.T0081	Modify AI Agent Configuration	Alter config files affecting one or many agents
AML.T0082	RAG Credential Harvesting	Harvest credentials from documents ingested into a RAG database
AML.T0083	Credentials from AI Agent Configuration	Extract tool/service credentials from agent settings
AML.T0084	Discover AI Agent Configuration	Enumerate config (Embedded Knowledge / Tool Definitions / Activation Triggers)
AML.T0085	Data from AI Services	Exfiltrate via agent services (RAG Databases / AI Agent Tools)
AML.T0086	Exfiltration via AI Agent Tool Invocation	Abuse the agent own tools to move data out

For a CISO-level synthesis of all three frameworks, see AI for CISOs security and the program backbone in our AI governance framework. (ATLAS counts are version-sensitive and release monthly -- verify before publishing.)

12. Air-Gapped & On-Prem Containment: Shrinking the Blast Radius

Every preceding section reaches the same conclusion: you must assume injection succeeds, so the durable control is containment -- and the strongest containment is removing the egress channel entirely. Air-gapped / on-prem deployment is blast-radius reduction by architecture: no internet, no outbound connections, no DNS resolution, no NTP sync.

This is not abstract. Re-read the EchoLeak and ShadowLeak chains from Section 4: both depended on the agent reaching an external endpoint -- Markdown image auto-fetch, SSRF, and tool callouts to external URLs. In a true air-gapped deployment, those channels are architecturally impossible. The same logic neutralizes MITRE ATLAS AML.T0086 (exfiltration via tool invocation) for any tool that would otherwise call out.

Dimension	Detail
Egress posture	No internet, no outbound connections, no DNS, no NTP; no licensing/telemetry callbacks
Channels neutralized	Markdown image auto-fetch, SSRF, external tool callouts -- architecturally impossible
Compliance fit	NIST 800-171 / CMMC 2.0 L3, NIST RMF 800-37, FedRAMP High, DoD IL4-IL5, ITAR, HIPAA, CJIS, GDPR/sovereignty
Reference model stack	Llama 3 8B/70B, Mistral, Falcon (open-weight) on vLLM or llama.cpp
Vector / embedding stack	Qdrant or Milvus; E5 / Voyage embedding models
Reference hardware	NVIDIA A10G 24GB or A100 80GB; GPU server ~$8,000-$25,000
Deployment timeline	4-12 weeks (air-gapped) vs. 1-2 weeks (connected VPC)
Residual risk	Does NOT prevent injection, local corpus poisoning, or insider/physical exfiltration -- pair with layered controls

Note the strict definition: even a single firewall rule allowing outbound HTTPS to a licensing or telemetry server disqualifies the deployment from true air-gap status. The compliance dividend is concrete: FedRAMP High and DoD IL4-IL5 deployments eliminate entire boundary-defense control categories (no boundary to defend), and ITAR technical data cannot traverse foreign-accessible infrastructure at all.

Be Honest About Trade-offs

Air-gapping removes the egress channel; it does not remove prompt injection, poisoning of locally-ingested corpora, or insider/physical-media exfiltration. Pair it with layered ingestion (provenance verification, hidden-instruction stripping), retrieval (permission-aware search, tenant isolation, anomaly detection), and generation (output monitoring, disabled auto-fetch, strict CSP/egress allowlists) controls.

Compare deployment options in best AI air-gapped environments, and see how AirgapAI implements local-inference containment for enterprise agents.

13. The CISO Agentic AI Security Checklist (2026)

This checklist consolidates OWASP (LLM Top 10, Agentic Top 10, NHI Top 10), NIST AI RMF, the CISA/Five Eyes agentic guidance, and MITRE ATLAS into seven operational domains. Work top to bottom; Inventory is the prerequisite for everything else. Each item names the control AND the authoritative source -- more actionable than a generic template.

Domain 1 — Inventory & Discovery

Maintain a continuous, living inventory of every agent, including shadow/informal deployments.CISA / OWASP ANS
Treat a managed agent identity as the unit of inventory -- flag any agent with no registry entry, no owner, OR no managed identity as a Critical shadow agent.Agent 365
Run continuous discovery across endpoints, IDEs, browsers, MCP servers, SaaS, and personal accounts (close the Discovery Gap).CSA
Inventory each agent tool access, memory stores, prompts/skills, and MCP tool definitions (close the Logic Inspection Gap).CSA / ASI04
Conduct a blast-radius assessment mapping every agent tools, data, and downstream reach.CISA

Domain 2 — Identity & Access

Give every agent its own distinct, managed non-human identity; never run agents under shared keys or reused human sessions.OWASP NHI10
Default-deny: agents start with zero tool access; grant explicitly and minimally.LLM06 / NHI5
Use short-lived, minute-scale credentials with JIT issuance and zero standing privilege; retire at task completion.NIST 800-207A
Replace long-lived API keys (the ~50% problem) with ephemeral, auto-rotated tokens.OWASP NHI7
Use OAuth 2.1 + PKCE; delegate via Token Exchange (RFC 8693) so the token carries agent AND user identity as separate claims.RFC 8693 / MCP
Authorize with ABAC / capability-based scopes (verb-on-resource), not broad roles.NIST 800-162
Enforce effective authority = intersection of agent and user permissions, never the union.Confused-deputy
Audit service accounts for excessive permissions and add org-level deny policies that block self-escalation.CISA / NHI5
Assign distinct cryptographic identities per agent (SPIFFE/SPIRE; mTLS for inter-agent).CISA

Domain 3 — Tooling & Supply Chain

Minimize the number of tools and limit each tool to essential functionality; avoid open-ended shell/URL extensions.LLM06
Execute every tool in the user context with minimum OAuth scope; never a shared service account.LLM06
Validate every tool-call argument against a defined output schema; wrap untrusted external content in delimited blocks.LLM06 / Spotlighting
Enforce complete mediation -- re-authorize every downstream request at the resource, never trust the LLM.LLM06
Verify every MCP server before approval; pin/version tool definitions and require re-approval on change (rug-pull defense).ASI04 / MCP
Confirm MCP servers act as OAuth 2.1 Resource Servers with audience-scoped tokens and no token passthrough (spec 2025-06-18).MCP spec
Maintain an SBOM for models, adapters (LoRA), tools, and datasets; verify provenance and model signatures.LLM03 / ASI04
Never auto-approve tool calls based on repository/document content; disable auto-run / YOLO modes.ASI02 / ASI05

Domain 4 — Runtime Defense

Deploy layered guardrails (input/output classifiers + chain-of-thought alignment check + generated-code static analysis) as a final layer, not the only one.LlamaFirewall
Sandbox all tool/code execution at microVM strength minimum (Firecracker/Kata); one ephemeral sandbox per task, no artifact carryover.ASI05
Treat all generated code as untrusted; remove direct eval().ASI05 / LLM05
Enforce per-agent, per-tool, per-session rate limits and resource quotas.ASI08 / LLM10
Implement circuit breakers, transactional rollback, and safe-failure modes that pause and escalate to a human.ASI08
Provide an instant, auditable kill-switch / emergency shutdown for runaway or rogue agents.ASI10 / CISA
Separate planning from execution (Plan-then-Execute) architecturally.CISA

Domain 5 — Data Protection

Strip hidden instructions and verify provenance on every ingested document before embedding.LLM01
Enforce permission-aware retrieval and tenant isolation at the retrieval layer (before documents enter the context window), not just the app layer.LLM08
Tag every memory entry with origin/session/source; validate writes with a secondary model; expire entries.ASI06
Encrypt data at rest, in transit, and in compute (AES-256 + post-quantum); store in FIPS 140-3 systems.CISA AI Data Sec
Track data provenance via cryptographically signed, append-only ledgers; verify integrity with hashes.CISA AI Data Sec
Monitor output for exfiltration signatures; disable client-side auto-fetch of remote images/links; enforce strict CSP and egress allowlists.LLM05 / EchoLeak
For high-sensitivity workloads, deploy air-gapped/on-prem to remove the egress channel entirely.Containment

Domain 6 — Audit & Traceability

Log every decision, tool call, and state change automatically, including a stable goal identifier, exact prompt, exact output, tool-selection rationale, and parameters.EU AI Act Art. 12
Capture the six mandatory fields per access (agent identity, human authorizer, data accessed, operation, policy outcome, timestamp) -- log permitted AND denied actions at operation-level granularity.Kiteworks
Make logs append-only and tamper-evident (SHA-256 hash chain + Merkle tree), identity-bound, time-ordered, and independently verifiable.SOC 2 / ISO 27001
Store logs in WORM/replicated storage; retain a minimum of 6 months (longer where law requires).EU AI Act Art. 26(6)
Integrate structured, discrete-field logs into SIEM in real time with anomaly alerts.NIST 800-92
Establish a per-agent behavioral baseline and alert on deviation (drift detection).ASI10

Domain 7 — Governance & Lifecycle

Adopt the NIST AI RMF four functions (Govern/Map/Measure/Manage) as your governance spine; layer the GenAI profile (AI 600-1) and the agentic profile (NIST AI 100-5 / CSA draft).NIST AI RMF
Classify each agent by autonomy tier; grant the lowest tier that works and promote only deliberately.CSA tiers
Require human approval / out-of-band confirmation for high-impact, irreversible actions (delete_file, send_email, run_code, update_database, modify_iam_policy).LLM06 / CISA
Tune human-in-the-loop thresholds to risk/confidence/context; automate low-risk, escalate high-risk (prevent reviewer flooding).T10 / ASI09
Threat-model before/concurrent with deployment, including explicit inter-agent trust modeling.CISA / MAESTRO
Progressively increase access and autonomy -- never grant full autonomy on day one.CISA
Define an agentic incident-response plan with pre-authorized auto-containment.AG-MG.1
Govern decommissioning: revoke credentials, dispose of memory, and remove registry entries (prevent orphaned backdoors).OWASP NHI1 / AG-MG.3
Red-team agents continuously (adversarial testing, attack simulation).LLM01 control 7
Map your controls to MITRE ATLAS techniques (esp. AML.T0080-T0086) and re-baseline as the framework updates monthly.MITRE ATLAS

Frequently Asked Questions

What is an AI agent security checklist, and how is it different from an LLM security checklist?

An AI agent security checklist covers autonomous, tool-using, stateful systems -- not just a model that generates text. It adds controls absent from an LLM checklist: agent inventory and discovery, non-human identity and least-privilege tool scoping, runtime sandboxing and kill-switches, multi-agent communication security, and tamper-evident action logging. The agentic risks (OWASP T1-T15 and ASI01-ASI10) are explicitly framed as extensions of the OWASP LLM Top 10 into autonomous settings, so an agent checklist contains an LLM checklist and goes further.

What is the single most important control for AI agent security?

Least privilege applied to the agent identity and tools -- directly countering OWASP Excessive Agency (LLM06:2025) and the CISA Privilege risks category. Because no fully reliable defense against prompt injection exists, you must assume injection succeeds; the durable mitigation is ensuring a compromised agent simply cannot perform high-impact actions or reach external endpoints. Default-deny tool access, short-lived scoped credentials, and human-in-the-loop on irreversible actions are the highest-leverage items.

Which standards should we align our agent governance to?

Four authoritative bodies: OWASP (Top 10 for LLM Applications 2025, Top 10 for Agentic Applications 2026, and the NHI Top 10 2025); NIST (AI RMF 1.0 plus the Generative AI Profile AI 600-1 and the emerging agentic profile NIST AI 100-5 / CSA draft); CISA and Five Eyes (the four joint Cybersecurity Information Sheets, culminating in Careful Adoption of Agentic AI Services, 1 May 2026); and MITRE ATLAS for threat-informed defense. Use the NIST four functions (Govern, Map, Measure, Manage) as the spine and map specific controls to OWASP and ATLAS.

Is prompt injection a solved problem in 2026?

No. As of 2026 there is no fully reliable defense against prompt injection -- classifiers like Microsoft XPIA are demonstrably bypassable (EchoLeak chained four bypasses, including XPIA evasion). Treat filtering as one layer among many. The durable controls are privilege containment and content provenance, not detection alone.

What is the EU AI Act logging requirement for agents, and when does it apply?

Article 12 requires high-risk AI systems to automatically record events over the system lifetime; Article 26(6) requires deployers to retain those logs for a minimum of six months. Article 12 does not specify how logs resist tampering -- cryptographic tamper-evidence (hash chains, Merkle trees, WORM storage) is your own design choice to satisfy forensic and SOC 2 / ISO 27001 needs. High-risk obligations were originally set for 2 August 2026; a reported 7 May 2026 political agreement moves Annex III systems to 2 December 2027 -- confirm the final enacted dates. Penalties reach up to 15 million euros or 3% of global turnover.

How does air-gapped or on-prem deployment reduce agent risk?

It removes the egress channel. Real exfiltration exploits like EchoLeak and ShadowLeak depend on the agent reaching an external endpoint (image auto-fetch, SSRF, external tool callouts). With no internet, no outbound connections, no DNS, and no telemetry callbacks, those channels become architecturally impossible -- shrinking the blast radius. It also eliminates entire boundary-defense control categories for FedRAMP High and DoD IL4-IL5 and satisfies ITAR and data-residency requirements. It does not prevent injection itself or local corpus poisoning, so pair it with layered controls.

What makes multi-agent systems harder to secure than single agents?

Multi-agent security is non-compositional: individually safe agents can compose into an unsafe system because trust does not aggregate predictably across agent-to-agent calls. New attack surfaces appear -- Agent Card spoofing and impersonation in A2A, tool poisoning and rug pulls in MCP, cascading failures across shared memory, and steganographic secret collusion that is undetectable even under full observability. Mitigate with mutual TLS, signed and verified agent identities, audience-scoped tokens, circuit breakers, and Plan-then-Execute separation.

What is a shadow agent and why is it a Critical finding?

A shadow agent is any agent with no registry entry, no assigned owner, OR no managed identity -- Microsoft Agent 365 rates this Critical. Shadow agents are unmonitored, often inherit broad employee credentials, and lack audit trails, making them the agentic equivalent of shadow IT. Industry data shows 79% of organizations lack visibility into their agents and 47% of enterprise AI use happens through personal accounts outside SSO -- which is why continuous discovery and a managed agent registry are the foundational control.

How do we operationalize all of this without stalling agent adoption?

Start with the CISA immediate actions: inventory all agents (including shadow), run blast-radius assessments, audit service accounts, replace standing credentials with just-in-time provisioning, and extend logging to agent actions. Then govern autonomy on a dial -- classify each agent by tier and grant the lowest tier that works, promoting deliberately rather than by default. This avoids the quiet drift toward excessive agency that turns prototypes into production liabilities, and addresses Gartner warning that 40%+ of agentic projects will be canceled by 2027 for inadequate risk controls.

Put the Checklist to Work

Securing agents is one chapter of a defensible enterprise AI program. Build the strategy behind the controls, then turn this checklist into a tailored roadmap.

Build the Strategy Behind the Controls

Get the full playbook in the AI Strategy Blueprint -- the executive guide to deploying AI with governance, security, and ROI built in from day one, framed as seven executive commitments.

Get the AI Strategy Blueprint

Turn This Checklist Into Your Roadmap

Use the AI Blueprint Builder to generate a tailored agentic-AI governance and deployment plan mapped to your environment, risk tolerance, and compliance requirements.

Launch the AI Blueprint Builder

Sources & References

OWASP Standards

Government & Frameworks

Incidents, Research & Engineering

This guide synthesizes publicly available standards, vendor research, and security press as of 2026-05-30. Framework codes, technique counts, and EU AI Act dates are version-sensitive and were in flux at publication -- always verify against the authoritative source PDFs (OWASP, NIST, CISA, MITRE ATLAS, the EU AI Act consolidated text) before relying on a specific code or date in policy.