What is OWASP LLM10 unbounded consumption and how do I cap tokens, cost, and denial-of-wallet on an LLM API?
OWASP LLM10 Unbounded Consumption covers resource exhaustion and denial-of-wallet attacks, as well as model extraction and theft through unbounded querying. To cap tokens and cost and to prevent denial-of-wallet on an LLM API, implement the following controls:
- Rate Limits and Quotas: Implement rate limiting (e.g., a sliding window with a configurable requests-per-minute limit) and quotas on API requests to prevent excessive usage.
- Token and Spend Caps: Define explicit token limits for API calls, such as max_tokens, and enforce them by capping the maximum allowed tokens. Implement a cost accounting system that tracks token usage and multiplies it by per-million pricing tables to attribute spend. This system should accumulate per-session totals and warn or gate the user when spending crosses a predefined threshold. This directly addresses the OWASP LLM10 Unbounded Consumption risk and the NIST AI RMF GOVERN function by managing financial impact.
- Context Management with Thresholds: Utilize context management strategies that detect approaching context window limits early and apply progressively more expensive remedies. Define token thresholds that trigger proactive actions (e.g., auto-compacting conversation history at 70% of the effective window) and block new requests entirely at higher thresholds (e.g., 98%). This helps prevent prompt_too_long errors and manages token consumption.
- Circuit Breakers for Compaction Failures: Implement circuit breakers that stop attempts to compact context after a certain number of consecutive failures (e.g., 3 failures) to avoid burning API budget on persistently failing operations.
- Session-Scoped Cost Tracking: Ensure that costs are tracked per session, and that resuming a different session does not bleed costs over from a previous one. This prevents accidental inflation of budgets across sessions.
- Budget Gates and Atomic Refusal: Implement a hard budget gate that raises an exception (e.g., BudgetExceeded) before an LLM call would push daily spend past a defined budget cap. This ensures atomic refusal of calls that would exceed the budget.
- Audit Logging: Every sampling call should be audit-logged with details such as the model used, token count, and server name. This provides visibility into usage and helps in identifying potential abuse.
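The rate-limit control in the list above can be sketched as a sliding-window limiter. This is a minimal, single-process illustration, not a production design: the class name SlidingWindowLimiter, the per-key bookkeeping, and the injectable clock are assumptions made for the example, and a real deployment would normally keep this state in a shared store.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Hypothetical in-memory sliding-window limiter (illustration only)."""

    def __init__(self, max_requests, window_seconds=60.0, clock=time.monotonic):
        self.max_requests = max_requests      # e.g. the configured RPM
        self.window_seconds = window_seconds  # 60 s gives requests per minute
        self._clock = clock                   # injectable so tests can fake time
        self._timestamps = {}                 # caller key -> deque of request times

    def allow(self, key):
        """Return True and record the request if the caller is under the limit."""
        now = self._clock()
        window = self._timestamps.setdefault(key, deque())
        # Drop timestamps that have aged out of the window.
        while window and now - window[0] >= self.window_seconds:
            window.popleft()
        if len(window) >= self.max_requests:
            return False
        window.append(now)
        return True
```

A caller would check `limiter.allow(api_key)` before forwarding a request to the model and return an HTTP 429 (or equivalent) when it comes back False.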
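The token cap, cost accounting, and budget gate described above might fit together roughly as follows. Treat this as a sketch under stated assumptions: the prices, the model name, the 1024-token ceiling, and the names BudgetGate, reserve, and settle are all hypothetical; only the BudgetExceeded exception name comes from the text. The gate reserves the worst-case cost under a lock before the call, which is one way to make the refusal atomic when several requests run concurrently.

```python
import threading

# Hypothetical per-million-token prices in USD; substitute your provider's rates.
PRICES_PER_MILLION = {"example-model": {"input": 3.00, "output": 15.00}}
HARD_MAX_OUTPUT_TOKENS = 1024  # assumed server-side ceiling for max_tokens


class BudgetExceeded(Exception):
    """Raised before a call that would push spend past the cap."""


def clamp_max_tokens(requested):
    """Never let a caller request more output tokens than the hard ceiling."""
    return max(1, min(requested, HARD_MAX_OUTPUT_TOKENS))


def estimate_cost(model, input_tokens, output_tokens):
    """Token counts multiplied by per-million prices."""
    price = PRICES_PER_MILLION[model]
    return (input_tokens * price["input"] + output_tokens * price["output"]) / 1_000_000


class BudgetGate:
    """Tracks spend for one scope (a session or a day) and refuses over-cap calls."""

    def __init__(self, cap_usd, warn_fraction=0.8):
        self.cap_usd = cap_usd
        self.warn_fraction = warn_fraction
        self.spent_usd = 0.0
        self._lock = threading.Lock()

    def reserve(self, model, input_tokens, max_tokens):
        """Reserve the worst-case cost, or raise before any tokens are spent."""
        worst_case = estimate_cost(model, input_tokens, max_tokens)
        with self._lock:
            if self.spent_usd + worst_case > self.cap_usd:
                raise BudgetExceeded(
                    f"call could cost {worst_case:.4f} USD; "
                    f"{self.cap_usd - self.spent_usd:.4f} USD remaining"
                )
            self.spent_usd += worst_case
        return worst_case

    def settle(self, reserved, model, input_tokens, output_tokens):
        """Replace the reservation with the actual cost; True means warn the user."""
        actual = estimate_cost(model, input_tokens, output_tokens)
        with self._lock:
            self.spent_usd += actual - reserved
            return self.spent_usd >= self.warn_fraction * self.cap_usd
```

Creating one BudgetGate per session keeps totals from bleeding across sessions; a daily cap would use a second gate keyed by date.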
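The context thresholds and the compaction circuit breaker from the list can be illustrated in a few lines. The 70% and 98% thresholds and the limit of 3 consecutive failures are the example values given above; the names Action, decide, and CompactionBreaker, and the compact_fn callback, are hypothetical and stand in for whatever the host application provides.

```python
from enum import Enum


class Action(Enum):
    PROCEED = "proceed"
    COMPACT = "compact"
    BLOCK = "block"


COMPACT_AT = 0.70  # auto-compact once 70% of the effective window is used
BLOCK_AT = 0.98    # refuse new requests at 98%


def decide(used_tokens, effective_window):
    """Map current context usage to the cheapest sufficient remedy."""
    ratio = used_tokens / effective_window
    if ratio >= BLOCK_AT:
        return Action.BLOCK
    if ratio >= COMPACT_AT:
        return Action.COMPACT
    return Action.PROCEED


class CompactionBreaker:
    """Stops calling a failing compaction routine after repeated failures."""

    def __init__(self, max_failures=3):
        self.max_failures = max_failures
        self.consecutive_failures = 0

    @property
    def is_open(self):
        return self.consecutive_failures >= self.max_failures

    def run(self, compact_fn, history):
        """Try to compact; return the original history if skipped or failed."""
        if self.is_open:
            return history  # breaker open: do not spend more budget on retries
        try:
            compacted = compact_fn(history)
        except Exception:
            self.consecutive_failures += 1
            return history
        self.consecutive_failures = 0  # any success resets the count
        return compacted
```

This sketch never closes the breaker again once it opens; whether to reset it after a cooldown or on a new session is a design choice the text leaves open.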
Grounded in
- Chapter 6: Context Management at Scale (Claude Code vs. Hermes Agent)
- Chapter 13: MCP Integration — Connecting Agents to the World (Claude Code vs. Hermes Agent)
- Chapter 1: Hermes Agent: Cost & Token-Usage Accounting (Claude Code vs. Hermes Agent)
- owasp_llm_top10
This AI-generated answer is for guidance only — not a certification, audit, or penetration test. Grounded in the NIST AI RMF, OWASP LLM Top 10, and ISO/IEC 42001 control text; verify applicability to your environment.