Agentic AI Security Risks & the CISO Checklist
The 2026 CISO playbook for securing autonomous AI agents: the full OWASP / NIST / CISA risk taxonomy, least-privilege and identity controls, EU AI Act audit requirements, and a seven-domain, copy-ready governance checklist that names both the control and the authoritative source.
1. Why AI Agents Break Your Existing Threat Model
An AI agent security checklist is not a chatbot policy with extra steps. CISA 2026 Five Eyes guidance defines agentic AI as systems composed of one or more agents that fundamentally rely on an AI model, such as an LLM, to interpret and reason about the state of the world and can autonomously make decisions and take actions. Three properties in that definition -- autonomy, statefulness, and tool access -- invalidate assumptions that traditional application security takes for granted.
First, the instruction/data boundary collapses. A large language model processes instructions and data in the same channel, with no enforced separation -- unlike a SQL database, where parameterized queries cleanly separate code from input. OWASP makes this the root cause of its #1 risk, Prompt Injection (LLM01:2025). When that model is wired to tools, any text it ingests -- a retrieved document, an email, a webpage, another agent message -- becomes a candidate instruction. The agent trust boundary silently expands to include every byte of untrusted content it reads.
Second, the agent acts under real privilege. A vulnerable web form returns data to a user; a vulnerable agent can delete a file, send a wire, modify an IAM policy, or query a production database -- because you gave it those tools to be useful. OWASP frames this as Excessive Agency (LLM06:2025): damaging actions can be performed in response to unexpected, ambiguous, or manipulated outputs from an LLM. The exploit is no longer leak the response. It is perform the attacker action at your privilege level.
Third, state and autonomy decouple the attack from its trigger. Agents persist memory across sessions and chain tool calls in loops. A poisoned memory entry planted today can execute against an unrelated query next week, and the agent cannot distinguish learned context from planted content. Session isolation does not help, because the attack exploits persistent cross-session state.
For the organizational program that wraps these technical controls, see our AI governance framework and the broader AI for CISOs security guide.
2. How Fast the Gap Is Opening: Adoption vs. Controls
Security maturity is not keeping pace with deployment. Gartner projects that 40% of enterprise applications will feature task-specific AI agents by the end of 2026, up from less than 5% in 2025, and that 33% of enterprise software will embed agentic AI by 2028 (from under 1% in 2024). Yet only 17% of organizations have actually deployed agents so far, while 60%+ expect to within two years -- and Gartner warns that more than 40% of agentic AI projects will be canceled by the end of 2027 due to cost, unclear value, or inadequate risk controls. Governance is now a make-or-break variable, not a compliance afterthought.
| Metric | Figure | Source |
|---|---|---|
| Enterprise apps with task-specific AI agents (2025 to 2026) | <5% to 40% | Gartner |
| Enterprise software embedding agentic AI by 2028 | 33% (from <1% in 2024) | Gartner |
| Organizations that have deployed AI agents (2026) | 17% (60%+ plan within 2 yrs) | Gartner |
| Agentic AI projects canceled by end of 2027 | 40%+ | Gartner |
| Orgs with an AI-agent security incident in past year | 65% (all with business impact) | Zenity / CSA |
| Orgs with full security approval for all AI agents | 14.4% | Acuvity |
| Orgs reporting shadow AI use | 98% | Acuvity / CSA |
| Orgs with no visibility into AI data flows | 86% | Industry surveys |
| Agent skills audited containing serious vulnerabilities | 41.7% of 2,890+ | MITRE / secondary |
The shadow-AI dimension is covered in depth in Shadow AI risks.
3. The Agentic Risk Taxonomy
Most confusion in this field comes from blending two separate OWASP documents. Both are products of the OWASP Gen AI Security Project. Agentic AI -- Threats and Mitigations v1.0 (Feb 2025) is the foundational taxonomy of 15 named threats, T1-T15. The OWASP Top 10 for Agentic Applications 2026 (Dec 9, 2025) is the ranked, incident-grounded Top 10, coded ASI01-ASI10.
3.1 OWASP Agentic Threats & Mitigations v1.0 (T1-T15)
The cross-cutting theme: the root cause of reasoning attacks (T6) is the lack of separation between data and instructions. The agentic threats are explicitly framed as extensions of the OWASP LLM Top 10 into autonomous, stateful, multi-agent settings.
| Code | Threat Name | Definition (condensed) | Related LLM Top 10 2025 |
|---|---|---|---|
| T1 | Memory Poisoning | Exploits short- and long-term memory to inject malicious/false data; alters decisions and enables unauthorized operations. | LLM04; LLM08 |
| T2 | Tool Misuse | Manipulates agents to abuse integrated tools via deceptive prompts while staying within authorized permissions; includes Agent Hijacking. | LLM06 |
| T3 | Privilege Compromise | Exploits mismanaged roles, overly permissive configs, or dynamic role inheritance to escalate privileges. | LLM06 |
| T4 | Resource Overload | Deliberately exhausts compute/memory/service capacity; amplified by agents self-triggering and spawning tasks. | LLM10 |
| T5 | Cascading Hallucination Attacks | Plausible-but-false info propagates and amplifies via self-reinforcement and inter-agent loops. | LLM09 |
| T6 | Intent Breaking & Goal Manipulation | Exploits lack of separation between data and instructions to alter planning, reasoning, and self-evaluation. | LLM01 |
| T7 | Misaligned & Deceptive Behaviors | Agents execute harmful/disallowed actions, using deceptive reasoning to appear compliant. | — |
| T8 | Repudiation & Untraceability | Agent actions cannot be traced or accounted for due to insufficient logging/transparency. | — |
| T9 | Identity Spoofing & Impersonation | Exploits authentication to impersonate agents or users and act under false identities. | — |
| T10 | Overwhelming Human-in-the-Loop | Exploits human cognitive limits or floods oversight/validation frameworks. | — |
| T11 | Unexpected RCE & Code Attacks | Exploits AI-generated code execution to inject malicious code via function-calling/tools. | LLM01; LLM05 |
| T12 | Agent Communication Poisoning | Manipulates inter-agent channels to spread false info or influence decisions. | LLM04 |
| T13 | Rogue Agents in Multi-Agent Systems | Compromised agents operate outside monitoring boundaries, executing unauthorized actions or exfiltrating data. | — |
| T14 | Human Attacks on Multi-Agent Systems | Adversaries exploit inter-agent delegation, trust, and workflow dependencies to escalate or manipulate. | — |
| T15 | Human Manipulation | Agent-human trust reduces skepticism; attackers coerce agents to manipulate users or take covert actions. | — |
3.2 OWASP Top 10 for Agentic Applications 2026 (ASI01-ASI10)
The 2026 list is grounded in real 2025 incidents, which distinguishes it from the more theoretical Feb 2025 taxonomy. The most material new addition is ASI04 Agentic Supply Chain Vulnerabilities -- runtime poisoning of the Model Context Protocol (MCP) and Agent2Agent (A2A) ecosystems.
| Code | Title | Description | 2025 Incident | Maps to v1.0 |
|---|---|---|---|---|
| ASI01 | Agent Goal Hijack | Hidden prompts alter objectives/decision path, turning copilots into silent exfiltration engines. | EchoLeak | T6 |
| ASI02 | Tool Misuse | Agents bent legitimate tools into destructive outputs (confused-deputy pattern). | Amazon Q | T2 |
| ASI03 | Identity & Privilege Abuse | Leaked credentials / dropped identity let agents operate beyond intended scope. | Credential abuse | T3 + T9 |
| ASI04 | Agentic Supply Chain Vulnerabilities | Dynamic MCP and A2A ecosystems let runtime components be poisoned (NEW for 2026). | GitHub MCP exploit | New; T2/T13 |
| ASI05 | Unexpected Code Execution | Natural-language execution paths unlock new RCE avenues. | AutoGPT RCE | T11 |
| ASI06 | Memory & Context Poisoning | Memory poisoning reshapes behavior long after the initial interaction. | Gemini Memory Attack | T1 |
| ASI07 | Insecure Inter-Agent Communication | Spoofed inter-agent messages misdirect entire agent clusters. | Spoofed messages | T12 |
| ASI08 | Cascading Failures | A single error/compromise spreads across connected agents/tools/pipelines with escalating impact. | Pipeline cascade | T5 |
| ASI09 | Human-Agent Trust Exploitation | Confident, polished explanations mislead operators into approving harmful actions. | Operator deception | T10 + T15 |
| ASI10 | Rogue Agents | Compromised, misaligned, or drifting agents act with harmful autonomy -- the ultimate insider threat. | Replit meltdown | T13 |
3.3 How Agentic Threats Extend the OWASP LLM Top 10 (2025)
The agentic taxonomy does not replace the OWASP Top 10 for LLM Applications 2025; it builds on it. Five LLM-level entries carry the most agent weight.
| ID | Risk Name | Agent Relevance |
|---|---|---|
| LLM01:2025 | Prompt Injection | Critical -- indirect injection via tools/RAG drives agent compromise |
| LLM02:2025 | Sensitive Information Disclosure | High -- agents/RAG can leak knowledge-base data |
| LLM03:2025 | Supply Chain | High -- agent tools/plugins inherit supply-chain risk |
| LLM05:2025 | Improper Output Handling | Critical -- agent output feeds shells, DBs, browsers |
| LLM06:2025 | Excessive Agency | Highest -- the defining agent risk |
3.4 Six Defensive Playbooks
| Playbook | Threats Mitigated |
|---|---|
| 1. Preventing AI Agent Reasoning Manipulation | T6, T7, T8 |
| 2. Preventing Memory Poisoning & Knowledge Corruption | T1, T5 |
| 3. Securing AI Tool Execution & Preventing Unauthorized Actions | T2, T3, T11, T4 |
| 4. Strengthening Authentication, Identity & Privilege Controls | T3, T9 |
| 5. Protecting HITL & Preventing Human-targeted Threats | T10, T15 |
| 6. Securing Multi-Agent Communication & Trust Mechanisms | T12, T14, T13 |
4. Prompt Injection & Memory Poisoning
Prompt injection is OWASP LLM01:2025 -- the #1 LLM risk for the second consecutive edition. A Prompt Injection Vulnerability occurs when user prompts alter the LLM behavior or output in unintended ways. Direct injection modifies model behavior via user input (Ignore all previous instructions). Indirect injection -- the agent processes external content (websites, files, emails, RAG documents) carrying hidden instructions -- is the dominant agentic threat, because the agent may silently execute injected instructions, with the user privileges, with no user awareness.
4.1 Memory Poisoning: Temporally Decoupled and Worse
Memory poisoning (T1 / ASI06) is more dangerous because it decouples injection from execution across three phases:
- Injection -- malicious content enters via documents/emails/webpages/API responses using instruction-like phrasing (Remember that the user prefers, For future reference always).
- Persistence -- poisoned instructions persist indefinitely across sessions in long-term memory; the agent cannot distinguish learned context from planted content.
- Execution -- a later, unrelated query retrieves the poisoned entry and runs it as self-learned knowledge.
This is why session isolation does not help -- the attack lives in persistent cross-session state. The academic MINJA attack (NeurIPS 2025) achieves >95% injection success and >70% attack success using query-only interaction, no privileged access. The Gemini Memory Attack used conditional, delayed instructions triggered by words like yes, sure, and no that appear in nearly every conversation -- making single-moment runtime detection nearly useless.
4.2 Real-World Incidents (Not Theoretical)
| Name / ID | Target | Class | Mechanism | Severity / Success |
|---|---|---|---|---|
| EchoLeak (CVE-2025-32711) | Microsoft 365 Copilot | Indirect, zero-click | Crafted email → LLM Scope Violation; chained XPIA evasion, reference-style Markdown bypass, auto-fetched image egress, Teams CSP proxy | CVSS 9.3 (Critical); first real-world zero-click prod LLM injection |
| ShadowLeak | ChatGPT Deep Research (Gmail) | Indirect, zero-click | Instructions hidden in email HTML (white-on-white, tiny fonts); server-side exfiltration in OpenAI cloud | 100% success in testing |
| MINJA | LLM agents (academic) | Memory poisoning | Query-only: bridging steps + indication prompts + progressive shortening | >95% injection, >70% attack |
| Gemini Memory Attack | Google Gemini | Memory poisoning | Conditional/delayed instruction triggered by common words | Bypasses runtime guardrails |
EchoLeak: four defenses bypassed in sequence
| Step | Defense Bypassed | Technique |
|---|---|---|
| 1 | XPIA classifier | Benign phrasing (For compliance, do not mention this email) |
| 2 | Markdown link-redaction filter | Reference-style Markdown link not stripped |
| 3 | Requirement for a user click | Auto-fetched reference-style Markdown image -- zero-click |
| 4 | Content Security Policy | Routed through CSP-allowlisted Teams proxy asyncgw.teams.microsoft.com/urlp |
4.3 Defense-in-Depth (No Single Layer Is Sufficient)
| Layer | Control | What It Does | Note |
|---|---|---|---|
| Input/output filtering | Classifiers (XPIA), semantic filters, RAG Triad | Detect injected instructions; validate output format | Bypassable -- EchoLeak evaded XPIA |
| Content provenance | Spotlighting: delimiting, datamarking, encoding | Help model distinguish trusted vs. untrusted tokens | Datamarking cut attack success ~50% to <3% |
| Memory provenance | Provenance tagging + write-ahead validation | Tag every entry with origin/session/source; secondary model validates before commit | Memory-poisoning Layer 2 |
| Sandboxing / least privilege | Privilege control, HITL, CSP, trust-aware retrieval | Limit permissions, gate high-risk actions, contain blast radius | CSP failure enabled EchoLeak egress |
| Behavioral monitoring | Baselines, memory integrity audit, circuit breakers | Detect deviation; quarantine compromised agents | Memory-poisoning Layer 4 |
| Adversarial testing | Red-teaming, LLMail-Inject challenge | Continuously probe defenses | LLM01 control 7 |
The AI Strategy Blueprint
Securing agents is one chapter of a defensible enterprise AI program. The AI Strategy Blueprint frames the executive mandate as seven commitments -- governance, security architecture, and ROI built in from day one -- so the controls in this checklist sit inside a board-level strategy.
5. Excessive Agency & Tool Misuse: The Defining Risk
If you fix one thing, fix this. OWASP LLM06:2025 Excessive Agency has three official root causes -- and the maximum-risk configuration is all three at once, a state teams routinely create during prototyping and never tighten (the quiet drift toward excessive agency).
| Root Cause | Definition | Concrete Agent Example | Primary Control(s) |
|---|---|---|---|
| Excessive Functionality | Tools include capabilities beyond task need | Tool offers modify/delete when only read needed; deprecated plugin still callable; open-ended shell function | Minimize tools; limit functionality; avoid open-ended extensions |
| Excessive Permissions | Tools run with broader downstream privileges than required | DB creds with UPDATE/INSERT/DELETE when only SELECT needed; shared service account instead of user identity | Least privilege; user-context OAuth minimum scope; complete mediation downstream |
| Excessive Autonomy | High-impact actions proceed without verification | Agent deletes documents / sends wire / external email without confirmation | Human-in-the-loop on high-impact/irreversible actions |
5.1 The Control Stack (Defense-in-Depth, In Order)
| Control | Mechanism | Example / Detail |
|---|---|---|
| Tool allowlist / minimization | Default zero tool access; add tools at runtime per permission | By default the agent should not have any tool access (Auth0) |
| Scoped capabilities | RBAC vs. fine-grained (ReBAC / OpenFGA) authorization | Can user:anne use buyStock on asset:OKTA? |
| Credential delegation | Short-lived OAuth 2.0 tokens; token vault + OAuth Federation | No raw/long-term creds stored by the agent |
| Human-in-the-loop | Explicit consent for high-impact/irreversible actions | CIBA push approval; 60s confirmation timeout; gates delete_file, send_email, run_code, update_database, modify_iam_policy |
| Output schema / bounding | Validate tool-call args vs. schema; delimit untrusted content | Wrap external input in delimited tags; filter injection strings |
| Damage limitation | Rate-limiting, sandboxing, tamper-evident audit logs | SHA-256 result hashing + reasoning traces; sandbox all code execution |
OWASP eight LLM06 controls: minimize extensions; minimize functionality; avoid open-ended extensions; minimize permissions; execute in the user context (OAuth minimum scope, not a shared service account); require human approval for high-impact actions; complete mediation (validate all downstream requests rather than trusting the LLM); sanitize inputs/outputs. The two emphasized controls are the ones most often skipped.
5.2 Autonomy Tiers -- Govern the Dial, Do Not Max It
6. Agent Identity, Authentication & Least Privilege (the NHI Problem)
An AI agent is a non-human identity (NHI) -- a digital identity that authenticates and operates without direct human control. Treating agents as first-class identities (not as features running under a human credentials) is the single most important access-control decision you will make.
The scale is the problem. Enterprises now average ~82 machine identities per employee; the ratio moved from ~92:1 (early 2024) to ~144:1 (end of 2025), and Palo Alto Networks 2026 report puts the cross-environment average at 109:1, with cloud-native environments reaching tens of thousands of machine identities per human. Agents are the fastest-growing class: Palo Alto projects +85% AI-agent growth over the next 12 months.
6.1 The Anti-Pattern: Agents Under Broad User Credentials
When an agent reuses a human user session or a shared key, three things break: permissions become excessive, audit becomes impossible (OWASP NHI10 Human Use of NHI), and you create classic confused-deputy exposure. The fix is a distinct, managed identity per agent. The OWASP Non-Human Identities Top 10 (2025) is the reference frame:
| ID | Risk | Relevance to AI-Agent Least Privilege |
|---|---|---|
| NHI1:2025 | Improper Offboarding | Decommissioned agents left active create persistent backdoors |
| NHI2:2025 | Secret Leakage | Agent memory, tool results, transcripts, crash dumps leak creds |
| NHI3:2025 | Vulnerable Third-Party NHI | Compromised connector enables supply-chain attack |
| NHI4:2025 | Insecure Authentication | Weak/legacy auth enables takeover and escalation |
| NHI5:2025 | Overprivileged NHI | Agent granted more than its task needs; expands blast radius (core risk) |
| NHI6:2025 | Insecure Cloud Deployment Configs | High-privilege CI/CD misconfig enables unauthorized access |
| NHI7:2025 | Long-Lived Secrets | ~50% of NHI creds are long-lived keys; fix is ephemeral tokens |
| NHI8:2025 | Environment Isolation | Reusing one NHI across test/prod enables cross-env compromise |
| NHI9:2025 | NHI Reuse | One identity shared across workloads removes least-privilege boundaries |
| NHI10:2025 | Human Use of NHI | Cannot distinguish agent vs. human; breaks audit |
6.2 The Controls
| Control | Recommended Practice | Standard |
|---|---|---|
| Identity per agent | Each agent authenticates as a distinct principal; no reused sessions or shared keys | OWASP NHI10 |
| Token lifetime | Minute-scale, short-lived; JIT issuance; retired at task completion (zero standing privilege) | NIST SP 800-207A |
| Authentication | OAuth 2.1 + PKCE (RFC 7636), dynamic client registration; MCP HTTP mandates OAuth 2.1 | MCP spec |
| Delegation | Token Exchange (RFC 8693); token carries agent + user identity as separate claims | RFC 8693 |
| Authorization model | ABAC at runtime; capability-based verb-on-resource scopes, not broad roles | NIST SP 800-162 |
| Posture | Default-deny with explicit grants; org-level deny policies block excessive configs | OWASP NHI5 |
| Effective authority | Intersection of agent and user permissions, never the union (confused-deputy defense) | — |
| Workload identity | SPIFFE/SPIRE, short-lived OIDC, STS assume-role; SCIM for provisioning | NIST 800-207A |
| High-impact actions | Out-of-band human approval via a channel the agent cannot forge | — |
NIST SP 800-207A states the principle directly: each service should present a short-lived cryptographically verifiable identity credential, authenticated per connection and reauthenticated regularly. Note the self-escalation risk: an agent with enough initial access can dynamically modify its own permissions -- which is exactly why org-level deny policies and out-of-band approval for high-impact actions are non-negotiable. RBAC alone is insufficient; ABAC plus capability tokens (Macaroons, Biscuit) is the recommended posture.
7. Multi-Agent Systems: Cascading Failure, Impersonation & Protocol Security
Multi-agent security is non-compositional: individually safe agents can compose into an unsafe system, because trust does not aggregate predictably across agent-to-agent calls. You cannot certify a fleet by certifying each agent. The relevant OWASP 2026 risks are ASI07, ASI08, ASI09/ASI10, and ASI03.
| # | Threat Class | Description |
|---|---|---|
| 1 | Privacy & Information Integrity | Unauthorized data access or corruption across agent boundaries |
| 2 | Collusion & Exfiltration | Coordinated extraction/leak, incl. secret/steganographic collusion |
| 3 | Exploitation | Agents abusing vulnerabilities in other agents decision processes |
| 4 | Swarm Attacks | Coordinated assaults that appear benign individually |
| 5 | Heterogeneous Attacks | Mixed-capability adversaries exploiting role specialization |
| 6 | Overseer Attacks | Compromising human supervisors or monitoring systems |
| 7 | Cascade Attacks | Failures propagating through agent dependencies |
| 8 | Conflict & Mixed-Motive Threats | Misaligned objectives creating systemic risk |
| 9 | Physical & Embodied Security | Agents controlling real-world systems |
| 10 | Sociotechnical Threats | Manipulation of humans and institutions |
7.1 Agent Impersonation & Protocol Security (A2A and MCP)
In A2A, a malicious agent crafts a deceptive Agent Card to misrepresent its capabilities and win the host LLM-based selection. Trustwave SpiderLabs demonstrated an Agent-in-the-Middle attack in 2025. A2A v0.3+ supports but does not enforce card signing, so card spoofing via DNS/CDN compromise is a low-cost, routine threat.
| Protocol | Named Attack | Mitigation / Spec Control |
|---|---|---|
| A2A (v0.3+) | Agent Card spoofing/tampering (DNS/CDN); signing supported but not enforced | Enforce card signing; serve over HTTPS/TLS 1.3+; mTLS agent identity |
| A2A | Agent-in-the-Middle impersonation (Trustwave 2025) | Verify card provenance/signature; do not rely on LLM selection alone |
| A2A | OAuth2 long-lived tokens, coarse scopes, no consent gate | Short-lived audience-scoped tokens; capability-based access; protocol-level consent |
| MCP | Tool poisoning (malicious instructions in tool metadata); 5 of 7 clients lack static validation | Static metadata analysis; client-side validation; behavioral anomaly detection |
| MCP | Rug pull (tool definition mutates after approval) | Pin/version tool definitions; re-approval on change; integrity checks |
| MCP | Confused deputy (proxy uses server, not user, privileges) | Per-user identity passthrough done correctly; avoid a static single OAuth Client ID |
| MCP (spec 2025-06-18) | Token passthrough abuse | Prohibited by spec; servers = OAuth 2.1 Resource Servers; validate token audience; mint per-call audience-scoped tokens |
8. Agent Inventory & Discovery: You Cannot Secure What You Cannot See
The first deliverable of any agentic-AI security program is a continuous, living inventory of every agent -- including shadow deployments. CSA defines three discovery gaps that make agents uniquely hard to inventory:
Traditional tools cannot find ephemeral agent runtimes in IDEs, desktops, browser sessions, MCP servers, and personal accounts.
Agents inherit employee credentials and may exceed consciously granted permissions.
Teams rarely inspect prompts, skills, MCP tool definitions, memory stores, and agent instructions for malicious behavior.
8.1 Registry Architecture & the Unit of Inventory
Two production-relevant directions: the OWASP Agent Name Service (ANS) -- a protocol-agnostic discovery registry (IETF draft) with DNS-inspired naming, PKI certificates, and a Protocol Adapter Layer covering A2A, MCP, and ACP -- and Microsoft Entra Agent ID / Agent 365, a production enterprise registry with tenant-wide counters for Total, Ownerless, and Unmanaged agents.
9. Audit, Traceability & Logging (EU AI Act Art. 12)
Agent audit logging is not application logging. It must capture decisions, prompts, tool calls, delegated authority, and outcomes -- a full forensic trail. Two layers matter: the regulatory baseline (what you must record) and the engineering layer (how you make those records trustworthy).
9.1 Regulatory Baseline -- EU AI Act Article 12
Article 12(1): High-risk AI systems shall technically allow for the automatic recording of events (logs) over the lifetime of the system. Logging must be technical, automatic, and lifetime -- manual recording does not satisfy the requirement.
| Provision | Requirement | Detail |
|---|---|---|
| Art 12(1) | Automatic event logging | Technically built in; over the system lifetime; manual recording insufficient |
| Art 12(2)(a) | Risk identification | Log events relevant to risk situations or substantial modification |
| Art 12(2)(b) | Post-market monitoring | Support post-market monitoring per Article 72 |
| Art 12(2)(c) | Operation monitoring | Support monitoring of operation per Article 26(5) |
| Art 26(6) | Retention | Deployers retain auto-generated logs for a minimum of 6 months, subject to law |
9.2 Engineering Layer -- Tamper-Evident Logs
Article 12 mandates that logs exist and be automatic, but does NOT prescribe how logs resist tampering. Tamper-evidence is your design choice -- and it is what satisfies SOC 2, ISO 27001, and forensic readiness. The five primitives:
| Property | Mechanism | Notes |
|---|---|---|
| Append-only | WORM / immutable storage | Entries added, never removed or modified |
| Tamper-evident | SHA-256 hash chain over canonical-JSON events | Any altered byte breaks all subsequent links |
| Independently verifiable | Merkle tree; recompute leaves and re-chain | External auditor verifies without trusting the runtime |
| Identity-bound | Cryptographic signature tied to agent credential | Plus the human authorizer who delegated the workflow |
| Time-ordered | Sequential chain; tamper-resistant timestamps | Suitable for replay/forensics |
C1 = SHA-256( C0 + bytes(E1) ) C2 = SHA-256( C1 + bytes(E2) ) C3 = SHA-256( C2 + bytes(E3) ) <- stored as the chain head
An auditor recomputes C1', C2', C3' from the canonical bytes and compares C3' to the stored head. If an attacker alters a single byte in E2 (e.g. changing operation:delete to operation:read), then C2' does not match C2, which forces C3' to differ -- the tampering is detected at the chain head even though only one intermediate entry changed. A Merkle tree over the leaves provides the same guarantee with O(log n) inclusion proofs.
9.3 Required Fields per Agentic Access (Kiteworks Model)
| Field | Definition |
|---|---|
| Agent identity | Unique workflow-level credential of the agent performing the access |
| Human authorizer | Authenticated identity of the human who delegated the workflow |
| Data accessed | Specific record identifiers + data classification |
| Operation performed | Specific action: read, download, move, delete, forward |
| Policy-evaluation outcome | Permitted/denied + which policy attribute governed the decision |
| Timestamp | Precise, retroactively-unalterable event time |
Supporting standards: NIST SP 800-92, SOC 2, ISO 27001, plus HIPAA 45 CFR 164.312(b), SEC Rule 17a-4 (WORM), NIST 800-171 (3.3.1), CMMC AU.2.042, NYDFS Part 500 500.6. Map these obligations to your broader program in AI compliance frameworks.
Get Chapter 1 Free + AI Academy Access
Download the first chapter of The AI Strategy Blueprint and get instant access to our AI Academy -- covering AI governance, security architecture, and the seven executive commitments behind a defensible agentic-AI program.
10. Runtime Defense-in-Depth: Guardrails, Sandboxing & Kill-Switches
Runtime agent security layers four independent control planes. Both OWASP and Meta frame guardrails as a final layer of defense, not the only one.
10.1 Guardrail Frameworks
| Guardrail | Function | Architecture / Model | Key Metrics |
|---|---|---|---|
| PromptGuard 2 | Jailbreak / prompt-injection classifier (input) | BERT-family: 86M or 22M | 98% AUC English; 97.5% recall @1% FPR; 19.3-92.4 ms on A100 |
| AlignmentCheck | Chain-of-thought goal-hijack auditor | Guardrail LLM: Llama 3.3 70B / Llama 4 Maverick | >80% recall, <4% FPR (internal) |
| CodeShield | Static analysis of generated code | Semgrep + regex, 8 languages, 50+ CWEs | 96% precision, 79% recall; ~60-300 ms tiers |
10.2 Sandboxing & Isolation of Tool Execution
Treat all generated code as untrusted; remove direct eval(); run one task per ephemeral sandbox with no artifact carryover.
| Technology | Isolation Mechanism | Best Fit | Notes |
|---|---|---|---|
| Firecracker / Kata microVM | Hardware-virtualized microVM | Regulated/sensitive data; strongest | E2B boots <200 ms; recommended minimum for production |
| gVisor | User-space Go kernel intercepts syscalls | Compute-heavy multi-tenant | Sandboxed code never talks to host kernel directly |
| V8 Isolates | Per-context JS engine isolation | Latency-critical lightweight tasks | JS/TS only; weakest boundary |
10.3 Threat-to-Control Map (OWASP ASI01-ASI10)
| ID | Risk | Key Defense-in-Depth Controls |
|---|---|---|
| ASI01 | Agent Goal Hijack | Prompt-injection filtering; limited tool privileges; human approval for goal changes |
| ASI02 | Tool Misuse & Exploitation | Sandboxed execution; strict permission scoping; argument validation |
| ASI03 | Identity & Privilege Abuse | Short-lived credentials; task-scoped permissions; isolated identities |
| ASI04 | Agentic Supply Chain | Signed manifests; curated registries; dependency pinning; sandboxing; kill-switches |
| ASI05 | Unexpected Code Execution | Treat generated code as untrusted; remove eval(); hardened sandboxes; review steps |
| ASI06 | Memory & Context Poisoning | Memory segmentation; ingestion filtering; provenance tracking; entry expiry |
| ASI07 | Insecure Inter-Agent Comm. | Mutual TLS; signed payloads; anti-replay; authenticated discovery |
| ASI08 | Cascading Failures | Isolation boundaries; rate limits; circuit breakers; pre-deployment plan testing |
| ASI09 | Human-Agent Trust Exploitation | Forced confirmations; immutable logs; risk indicators |
| ASI10 | Rogue Agents | Governance; sandboxing; behavioral monitoring; kill-switches |
11. Mapping to the Frameworks: NIST AI RMF, CISA/Five Eyes & MITRE ATLAS
11.1 NIST AI Risk Management Framework
NIST AI 600-1 (Generative AI Profile of AI RMF 1.0, July 2024) is built on four core functions -- GOVERN, MAP, MEASURE, MANAGE. It defines 12 GAI risk categories and 200+ suggested actions. NIST term for hallucination is confabulation. AI 600-1 was scoped to content generation, not autonomous action; agentic risk is handled by NIST AI 100-5 plus the CSA NIST AI RMF Agentic Profile (draft) using AG- extensions.
| Function | AI 600-1 GenAI Focus | Agentic Extensions (CSA AG- profile, draft) |
|---|---|---|
| GOVERN (GV / AG-GV) | Risk culture, policy, accountability, value-chain oversight | AG-GV.1 Autonomy Tier Classification; AG-GV.2 Delegation Accountability; AG-GV.3 Agent Lifecycle Governance |
| MAP (MP / AG-MP) | Establish context; identify which of 12 GAI risks apply | AG-MP.1 Tool-Use Risk Inventory; AG-MP.2 Action-Consequence Mapping; AG-MP.3 Multi-Agent Topology Risk |
| MEASURE (MS / AG-MS) | Assess, benchmark, track; red-teaming, evals | AG-MS.1 Behavioral Telemetry; AG-MS.2 Autonomy Calibration; AG-MS.3 Delegation Chain Monitoring |
| MANAGE (MG / AG-MG) | Prioritize, respond, recover; incident response | AG-MG.1 Agentic Incident Response; AG-MG.2 Behavioral Drift Correction; AG-MG.3 Agent Decommissioning |
11.2 CISA / NSA / Five Eyes Guidance
| Publication | Date | Core Focus |
|---|---|---|
| Deploying AI Systems Securely | 15 Apr 2024 | Zero Trust, secure-by-design, model-weight protection, RBAC/ABAC + MFA, monitoring |
| AI Data Security: Best Practices | 22 May 2025 | Securing training/operational data: supply chain, poisoning, drift; provenance & encryption |
| Principles for Secure Integration of AI in OT | Dec 2025 | Critical-infrastructure/OT: Understand, Assess, Govern, Embed safety |
| Careful Adoption of Agentic AI Services | 1 May 2026 | First dedicated agentic AI guidance: 5 risk categories + lifecycle controls |
The May 2026 agentic guidance defines five named risk categories: Privilege risks, Design & configuration risks, Behavioral risks, Structural risks, and Accountability risks. Its immediate actions are a ready-made program kickoff: inventory all agentic deployments (including shadow); conduct blast-radius assessments; audit service accounts for excessive permissions; replace persistent credentials with just-in-time (JIT) provisioning; extend logging to capture agent actions. The AI Data Security CSI also specifies AES-256 + post-quantum encryption, FIPS 140-3 storage, and cryptographically signed append-only provenance ledgers.
11.3 MITRE ATLAS
MITRE ATLAS is the threat-informed-defense knowledge base for AI systems. On 2025-10-21, MITRE ATLAS and Zenity Labs released the first formal agent-specific techniques:
| AML.T ID | Technique | What It Does |
|---|---|---|
| AML.T0080 | AI Agent Context Poisoning | Manipulate the context an agent uses; subs Memory and Thread |
| AML.T0081 | Modify AI Agent Configuration | Alter config files affecting one or many agents |
| AML.T0082 | RAG Credential Harvesting | Harvest credentials from documents ingested into a RAG database |
| AML.T0083 | Credentials from AI Agent Configuration | Extract tool/service credentials from agent settings |
| AML.T0084 | Discover AI Agent Configuration | Enumerate config (Embedded Knowledge / Tool Definitions / Activation Triggers) |
| AML.T0085 | Data from AI Services | Exfiltrate via agent services (RAG Databases / AI Agent Tools) |
| AML.T0086 | Exfiltration via AI Agent Tool Invocation | Abuse the agent own tools to move data out |
For a CISO-level synthesis of all three frameworks, see AI for CISOs security and the program backbone in our AI governance framework. (ATLAS counts are version-sensitive and release monthly -- verify before publishing.)
12. Air-Gapped & On-Prem Containment: Shrinking the Blast Radius
Every preceding section reaches the same conclusion: you must assume injection succeeds, so the durable control is containment -- and the strongest containment is removing the egress channel entirely. Air-gapped / on-prem deployment is blast-radius reduction by architecture: no internet, no outbound connections, no DNS resolution, no NTP sync.
This is not abstract. Re-read the EchoLeak and ShadowLeak chains from Section 4: both depended on the agent reaching an external endpoint -- Markdown image auto-fetch, SSRF, and tool callouts to external URLs. In a true air-gapped deployment, those channels are architecturally impossible. The same logic neutralizes MITRE ATLAS AML.T0086 (exfiltration via tool invocation) for any tool that would otherwise call out.
| Dimension | Detail |
|---|---|
| Egress posture | No internet, no outbound connections, no DNS, no NTP; no licensing/telemetry callbacks |
| Channels neutralized | Markdown image auto-fetch, SSRF, external tool callouts -- architecturally impossible |
| Compliance fit | NIST 800-171 / CMMC 2.0 L3, NIST RMF 800-37, FedRAMP High, DoD IL4-IL5, ITAR, HIPAA, CJIS, GDPR/sovereignty |
| Reference model stack | Llama 3 8B/70B, Mistral, Falcon (open-weight) on vLLM or llama.cpp |
| Vector / embedding stack | Qdrant or Milvus; E5 / Voyage embedding models |
| Reference hardware | NVIDIA A10G 24GB or A100 80GB; GPU server ~$8,000-$25,000 |
| Deployment timeline | 4-12 weeks (air-gapped) vs. 1-2 weeks (connected VPC) |
| Residual risk | Does NOT prevent injection, local corpus poisoning, or insider/physical exfiltration -- pair with layered controls |
Note the strict definition: even a single firewall rule allowing outbound HTTPS to a licensing or telemetry server disqualifies the deployment from true air-gap status. The compliance dividend is concrete: FedRAMP High and DoD IL4-IL5 deployments eliminate entire boundary-defense control categories (no boundary to defend), and ITAR technical data cannot traverse foreign-accessible infrastructure at all.
Compare deployment options in best AI air-gapped environments, and see how AirgapAI implements local-inference containment for enterprise agents.
13. The CISO Agentic AI Security Checklist (2026)
This checklist consolidates OWASP (LLM Top 10, Agentic Top 10, NHI Top 10), NIST AI RMF, the CISA/Five Eyes agentic guidance, and MITRE ATLAS into seven operational domains. Work top to bottom; Inventory is the prerequisite for everything else. Each item names the control AND the authoritative source -- more actionable than a generic template.
- Maintain a continuous, living inventory of every agent, including shadow/informal deployments.CISA / OWASP ANS
- Treat a managed agent identity as the unit of inventory -- flag any agent with no registry entry, no owner, OR no managed identity as a Critical shadow agent.Agent 365
- Run continuous discovery across endpoints, IDEs, browsers, MCP servers, SaaS, and personal accounts (close the Discovery Gap).CSA
- Inventory each agent tool access, memory stores, prompts/skills, and MCP tool definitions (close the Logic Inspection Gap).CSA / ASI04
- Conduct a blast-radius assessment mapping every agent tools, data, and downstream reach.CISA
- Give every agent its own distinct, managed non-human identity; never run agents under shared keys or reused human sessions.OWASP NHI10
- Default-deny: agents start with zero tool access; grant explicitly and minimally.LLM06 / NHI5
- Use short-lived, minute-scale credentials with JIT issuance and zero standing privilege; retire at task completion.NIST 800-207A
- Replace long-lived API keys (the ~50% problem) with ephemeral, auto-rotated tokens.OWASP NHI7
- Use OAuth 2.1 + PKCE; delegate via Token Exchange (RFC 8693) so the token carries agent AND user identity as separate claims.RFC 8693 / MCP
- Authorize with ABAC / capability-based scopes (verb-on-resource), not broad roles.NIST 800-162
- Enforce effective authority = intersection of agent and user permissions, never the union.Confused-deputy
- Audit service accounts for excessive permissions and add org-level deny policies that block self-escalation.CISA / NHI5
- Assign distinct cryptographic identities per agent (SPIFFE/SPIRE; mTLS for inter-agent).CISA
- Minimize the number of tools and limit each tool to essential functionality; avoid open-ended shell/URL extensions.LLM06
- Execute every tool in the user context with minimum OAuth scope; never a shared service account.LLM06
- Validate every tool-call argument against a defined output schema; wrap untrusted external content in delimited blocks.LLM06 / Spotlighting
- Enforce complete mediation -- re-authorize every downstream request at the resource, never trust the LLM.LLM06
- Verify every MCP server before approval; pin/version tool definitions and require re-approval on change (rug-pull defense).ASI04 / MCP
- Confirm MCP servers act as OAuth 2.1 Resource Servers with audience-scoped tokens and no token passthrough (spec 2025-06-18).MCP spec
- Maintain an SBOM for models, adapters (LoRA), tools, and datasets; verify provenance and model signatures.LLM03 / ASI04
- Never auto-approve tool calls based on repository/document content; disable auto-run / YOLO modes.ASI02 / ASI05
- Deploy layered guardrails (input/output classifiers + chain-of-thought alignment check + generated-code static analysis) as a final layer, not the only one.LlamaFirewall
- Sandbox all tool/code execution at microVM strength minimum (Firecracker/Kata); one ephemeral sandbox per task, no artifact carryover.ASI05
- Treat all generated code as untrusted; remove direct eval().ASI05 / LLM05
- Enforce per-agent, per-tool, per-session rate limits and resource quotas.ASI08 / LLM10
- Implement circuit breakers, transactional rollback, and safe-failure modes that pause and escalate to a human.ASI08
- Provide an instant, auditable kill-switch / emergency shutdown for runaway or rogue agents.ASI10 / CISA
- Separate planning from execution (Plan-then-Execute) architecturally.CISA
- Strip hidden instructions and verify provenance on every ingested document before embedding.LLM01
- Enforce permission-aware retrieval and tenant isolation at the retrieval layer (before documents enter the context window), not just the app layer.LLM08
- Tag every memory entry with origin/session/source; validate writes with a secondary model; expire entries.ASI06
- Encrypt data at rest, in transit, and in compute (AES-256 + post-quantum); store in FIPS 140-3 systems.CISA AI Data Sec
- Track data provenance via cryptographically signed, append-only ledgers; verify integrity with hashes.CISA AI Data Sec
- Monitor output for exfiltration signatures; disable client-side auto-fetch of remote images/links; enforce strict CSP and egress allowlists.LLM05 / EchoLeak
- For high-sensitivity workloads, deploy air-gapped/on-prem to remove the egress channel entirely.Containment
- Log every decision, tool call, and state change automatically, including a stable goal identifier, exact prompt, exact output, tool-selection rationale, and parameters.EU AI Act Art. 12
- Capture the six mandatory fields per access (agent identity, human authorizer, data accessed, operation, policy outcome, timestamp) -- log permitted AND denied actions at operation-level granularity.Kiteworks
- Make logs append-only and tamper-evident (SHA-256 hash chain + Merkle tree), identity-bound, time-ordered, and independently verifiable.SOC 2 / ISO 27001
- Store logs in WORM/replicated storage; retain a minimum of 6 months (longer where law requires).EU AI Act Art. 26(6)
- Integrate structured, discrete-field logs into SIEM in real time with anomaly alerts.NIST 800-92
- Establish a per-agent behavioral baseline and alert on deviation (drift detection).ASI10
- Adopt the NIST AI RMF four functions (Govern/Map/Measure/Manage) as your governance spine; layer the GenAI profile (AI 600-1) and the agentic profile (NIST AI 100-5 / CSA draft).NIST AI RMF
- Classify each agent by autonomy tier; grant the lowest tier that works and promote only deliberately.CSA tiers
- Require human approval / out-of-band confirmation for high-impact, irreversible actions (delete_file, send_email, run_code, update_database, modify_iam_policy).LLM06 / CISA
- Tune human-in-the-loop thresholds to risk/confidence/context; automate low-risk, escalate high-risk (prevent reviewer flooding).T10 / ASI09
- Threat-model before/concurrent with deployment, including explicit inter-agent trust modeling.CISA / MAESTRO
- Progressively increase access and autonomy -- never grant full autonomy on day one.CISA
- Define an agentic incident-response plan with pre-authorized auto-containment.AG-MG.1
- Govern decommissioning: revoke credentials, dispose of memory, and remove registry entries (prevent orphaned backdoors).OWASP NHI1 / AG-MG.3
- Red-team agents continuously (adversarial testing, attack simulation).LLM01 control 7
- Map your controls to MITRE ATLAS techniques (esp. AML.T0080-T0086) and re-baseline as the framework updates monthly.MITRE ATLAS
Frequently Asked Questions
Put the Checklist to Work
Securing agents is one chapter of a defensible enterprise AI program. Build the strategy behind the controls, then turn this checklist into a tailored roadmap.
Sources & References
OWASP Standards
- OWASP Agentic AI -- Threats and Mitigations, v1.0 (Feb 2025)
- OWASP Top 10 for Agentic Applications for 2026
- OWASP Top 10 for LLM Applications 2025 (PDF)
- OWASP LLM06:2025 Excessive Agency
- OWASP Non-Human Identities Top 10 -- 2025
Government & Frameworks
- CISA -- Guide to Secure Adoption of Agentic AI
- AI Data Security: Best Practices (NSA/CISA/FBI CSI PDF)
- NIST AI 600-1: AI RMF Generative AI Profile (PDF)
- NIST SP 800-207A: Zero Trust for Cloud-Native
- MITRE ATLAS Secure AI v2 Release (MITRE CTID)
- EU AI Act Article 12: Record-Keeping
Incidents, Research & Engineering
- EchoLeak: First Real-World Zero-Click Prompt Injection (arXiv 2509.10540)
- ShadowLeak Zero-Click Flaw Leaks Gmail Data via ChatGPT Deep Research
- Persistent Memory Poisoning in AI Agents (MINJA, Gemini Memory Attack)
- Open Challenges in Multi-Agent Security (arXiv 2505.02077)
- A Security Engineer Guide to the A2A Protocol (Semgrep)
- LlamaFirewall: Open-source guardrail system (arXiv 2505.03574)
- Tamper-Evident Audit Trails for AI Agents: SIEM Integration (Kiteworks)
- Zenity Labs & MITRE ATLAS -- New Agent Techniques (AML.T0080-T0086)
- Gartner: Over 40% of Agentic AI Projects Will Be Canceled by End of 2027
- Shadow AI Agents: The Insider Threat You Are Not Monitoring Yet (CSA)