How do I secure tool and function calling in AI agents?

Question

Accepted Answer

Securing tool and function calling in AI agents requires robust validation, authorization, and continuous monitoring to prevent misuse and ensure actions align with intended goals. This is crucial because agents can autonomously execute multi-step task chains, access various tools, and interact with external LLMs, creating potential data exfiltration paths, lateral movement vectors, or privilege escalation opportunities.

To secure tool and function calling:

Implement tool-call validation gates: schema validation, allowlisted tools and actions, and parameter constraints. Validating the schema on every call is among the cheapest and most effective checks, because agents under prompt injection often produce malformed tool calls, and refusing to proceed on a schema violation interrupts many attacks. This addresses tool misuse and unsafe tool calls, a risk closely related to "Excessive Agency" in the OWASP Top 10 for LLM Applications.
Enforce strong authorization for agents. Observe agent behavior (which tools each agent calls, what data it accesses, and which agents it delegates to) and use those observations to recommend policies. For custom-built agents, use an SDK that wraps every tool call and agent-to-agent handoff and sends the context to an authorization judge at each step. To close the confused deputy gap when calls cross trust boundaries, issue short-lived, cryptographically signed transaction tokens that carry the user's verified identity, the agent chain, and the session's originally declared intent.
Perform intent re-verification before any consequential action: re-derive whether the action falls within the originally attested intent instead of relying on the agent's current reasoning, which may have been corrupted. This helps mitigate "Goal misalignment cascades".
Maintain a trusted registry of tools and validate tool schemas and descriptions to ensure that every tool an agent can invoke is trustworthy and that the agent understands what the tool actually does. Track agent-tool relationships across the entire business to gain visibility into which agents can use which tools and how these relationships evolve.
Sandbox code execution with isolation technologies such as gVisor or Firecracker: restrict host filesystem access, set kernel-level resource limits, and allow outbound network access only through a broker. This helps prevent "Container escape from sandboxed code execution".
Design agent tools with reversibility in mind: soft-delete defaults, transactional staging, or two-phase commit for high-stakes actions make it possible to roll an action back when necessary. This supports the human oversight and override goals of the NIST AI RMF, whose four core functions are Govern, Map, Measure, and Manage.
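
The validation-gate and intent re-verification points above can be sketched as one check that runs before any tool call executes. This is a minimal illustration using only the Python standard library, not a reference implementation. Every name in it is an assumption made for the example: `ALLOWED_TOOLS`, `ToolSpec`, `ParamSpec`, `validate_tool_call`, the two sample tools, and the `example.com` recipient rule. Intent re-verification is reduced here to a set of consequential tools (`attested_tools`) that is assumed to have been derived from the session's originally attested intent, outside the agent's own reasoning.

```python
from dataclasses import dataclass
from typing import Any, Callable


class ToolCallRejected(Exception):
    """Raised when a proposed tool call fails any gate check."""


@dataclass(frozen=True)
class ParamSpec:
    type_: type
    required: bool = True
    # Extra per-parameter constraint; accepts everything by default.
    check: Callable[[Any], bool] = lambda value: True


@dataclass(frozen=True)
class ToolSpec:
    params: dict[str, ParamSpec]
    consequential: bool = False  # True means intent must be re-verified


# Hypothetical allowlist: tools not listed here can never be called.
ALLOWED_TOOLS: dict[str, ToolSpec] = {
    "search_docs": ToolSpec({
        "query": ParamSpec(str, check=lambda q: 0 < len(q) <= 200),
        "limit": ParamSpec(int, required=False, check=lambda n: 1 <= n <= 20),
    }),
    "send_email": ToolSpec({
        "to": ParamSpec(str, check=lambda a: a.endswith("@example.com")),
        "body": ParamSpec(str, check=lambda b: len(b) <= 5000),
    }, consequential=True),
}


def validate_tool_call(call: Any, attested_tools: frozenset[str]) -> None:
    """Return None if the call passes every check, else raise ToolCallRejected."""
    # 1. Envelope shape: exactly the keys "tool" and "args".
    if not isinstance(call, dict) or set(call) != {"tool", "args"}:
        raise ToolCallRejected("malformed call envelope")
    name, args = call["tool"], call["args"]

    # 2. Allowlist.
    if not isinstance(name, str) or name not in ALLOWED_TOOLS:
        raise ToolCallRejected(f"tool not allowlisted: {name!r}")
    if not isinstance(args, dict):
        raise ToolCallRejected("args must be an object")
    spec = ALLOWED_TOOLS[name]

    # 3. Schema: no unknown parameters, required ones present, exact types.
    unknown = set(args) - set(spec.params)
    if unknown:
        raise ToolCallRejected(f"unexpected parameters: {sorted(map(str, unknown))}")
    for pname, pspec in spec.params.items():
        if pname not in args:
            if pspec.required:
                raise ToolCallRejected(f"missing parameter: {pname}")
            continue
        value = args[pname]
        # Exact type match, so a bool is not accepted where an int is expected.
        if type(value) is not pspec.type_:
            raise ToolCallRejected(f"wrong type for {pname}")
        # 4. Parameter constraints.
        if not pspec.check(value):
            raise ToolCallRejected(f"constraint failed for {pname}")

    # 5. Consequential tools must fall inside the originally attested intent.
    if spec.consequential and name not in attested_tools:
        raise ToolCallRejected(f"{name} is outside the attested intent")
```

A caller would run `validate_tool_call(call, attested_tools)` on every model-proposed call and stop the agent loop on `ToolCallRejected` rather than asking the model to retry with the same context. In a real deployment the hand-written specs would more likely be replaced by a schema language such as JSON Schema, and the intent check by a policy decision made outside the agent.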
