How much autonomy should an AI agent have and when should actions require approval?

Question

Accepted Answer

An AI agent's autonomy should be multi-dimensional: each capability gets its own autonomy setting, ranging from fully autonomous to never permitted. Whether an action requires approval should depend on factors such as reversibility, blast radius, cost, external visibility, and compliance sensitivity.
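As a rough illustration of per-capability autonomy, the sketch below assigns each capability its own level and escalates to pre-approval whenever any of the approval factors is present. The level names, capability names, boolean traits, and the deny-by-default rule for unknown capabilities are all assumptions made for this example, not part of any standard.

```python
from dataclasses import astuple, dataclass
from enum import IntEnum


class Autonomy(IntEnum):
    """Hypothetical autonomy levels, ordered from least to most restrictive."""
    AUTONOMOUS = 0  # proceed, log only
    NOTIFY = 1      # proceed, then notify a human
    APPROVAL = 2    # wait for human approval before acting
    NEVER = 3       # never permitted


@dataclass(frozen=True)
class ActionTraits:
    """The approval factors from the answer, reduced to booleans for illustration."""
    irreversible: bool = False
    wide_blast_radius: bool = False
    costly: bool = False
    externally_visible: bool = False
    compliance_sensitive: bool = False


# Illustrative per-capability settings; the capability names are made up.
POLICY = {
    "read_docs": Autonomy.AUTONOMOUS,
    "send_email": Autonomy.NOTIFY,
    "issue_refund": Autonomy.APPROVAL,
    "drop_database": Autonomy.NEVER,
}


def required_level(capability: str, traits: ActionTraits) -> Autonomy:
    """Return the autonomy level that applies to one proposed action."""
    # Unknown capabilities are denied by default (an assumption of this sketch).
    base = POLICY.get(capability, Autonomy.NEVER)
    # Any risk factor raises the level to at least APPROVAL; it never relaxes it.
    if any(astuple(traits)):
        return max(base, Autonomy.APPROVAL)
    return base
```

A real policy would likely weigh the factors rather than treat them as simple flags, but the shape is the same: the capability sets the baseline, and the traits of the specific action can only tighten it.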

Concrete controls for managing AI agent autonomy and approval requirements include:

Tiered Authorization: Route each action by risk tier. Low-risk actions proceed with logging only, medium-risk actions undergo anomaly checks, high-risk actions require synchronous approval, and catastrophic-risk actions require multi-party approval. This supports the Govern function of the NIST AI RMF and limits the damage a successful prompt injection (LLM01 in the OWASP Top 10 for LLM Applications) can cause, because oversight scales with an action's impact.
Human Oversight & Override: Design intervention points into the architecture for high-stakes irreversible actions, ambiguous edge cases, regulatory requirements for human review, and situations where the agent's confidence is low. These include pre-action approval gates (e.g., financial transactions above a threshold, external communications, irreversible operations) and post-action review queues that sample completed actions. This supports the Govern and Map functions of the NIST AI RMF by establishing clear human intervention points.
Autonomy Policy and Re-attestation: Define explicit, machine-readable autonomy policies with clear precedence rules, so that the applicable autonomy level is never ambiguous. Periodically re-attest autonomy levels with explicit sign-off, and track boundary changes to mitigate autonomy creep. This supports the Govern and Manage functions of the NIST AI RMF by ensuring policies are clear and regularly reviewed.
Budget-Based Autonomy: Have the agent halt and require human review once it exceeds a configured budget of tokens, dollars, calls, or affected records, regardless of the risk level of the individual action. This provides a backstop against uncontrolled resource consumption and supports the Manage function of the NIST AI RMF.
Intent Re-verification: Before any consequential action, the system should re-derive whether the action falls within the declared intent, working from the originally attested intent rather than the agent's current reasoning. This helps mitigate goal misalignment cascades and is a control for LLM01: Prompt Injection.
Action Rollback: Design agent tools with reversibility in mind, using soft-delete defaults, transactional staging, and two-phase commit for high-stakes actions, so that actions can be undone if something goes wrong. This corresponds to the Protect and Recover functions of the NIST Cybersecurity Framework (CSF); within the NIST AI RMF, whose functions are Govern, Map, Measure, and Manage, it fits most closely under Manage.
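To make two of the controls above concrete, here is a minimal sketch that combines Tiered Authorization with the Budget-Based Autonomy backstop. The tier-to-control mapping follows the list above; the class names, control strings, and the choice to track only dollars and calls are assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Risk(Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"
    CATASTROPHIC = "catastrophic"


# Control applied at each tier, as described under Tiered Authorization.
CONTROL_BY_TIER = {
    Risk.LOW: "log_only",
    Risk.MEDIUM: "anomaly_check",
    Risk.HIGH: "synchronous_approval",
    Risk.CATASTROPHIC: "multi_party_approval",
}


@dataclass
class Budget:
    """Running totals against configured limits (limit values are placeholders)."""
    max_dollars: float
    max_calls: int
    spent_dollars: float = 0.0
    calls: int = 0

    def charge(self, dollars: float) -> None:
        self.spent_dollars += dollars
        self.calls += 1

    def exceeded(self) -> bool:
        return self.spent_dollars > self.max_dollars or self.calls > self.max_calls


def control_for(risk: Risk, cost_dollars: float, budget: Budget) -> str:
    """Return the control an action must pass before it is allowed to run."""
    budget.charge(cost_dollars)
    # The budget check comes first: an exhausted budget halts even low-risk actions.
    if budget.exceeded():
        return "halt_for_human_review"
    return CONTROL_BY_TIER[risk]
```

For example, with `Budget(max_dollars=10.0, max_calls=3)`, the first three actions are routed by their tier, and a fourth action is halted for review even if it is low risk. Checking the budget before the tier is what makes it a backstop that applies regardless of risk level.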
