What is system prompt leakage and how do I prevent it?
System prompt leakage (OWASP LLM07) occurs when secrets, access rules, or sensitive logic embedded in the system prompt are exposed. The underlying mistake is treating prompt confidentiality as a security control. An adversary who extracts the prompt learns the system's internal workings and any sensitive information the prompt contains.
To prevent system prompt leakage, implement the following controls:
- Keep secrets, credentials, and authorization logic out of prompts. Enforce those controls in code or infrastructure instead.
- Design plugins and tools to be safe even if the prompt is fully known. Exposure of the system prompt alone should never be enough to enable a damaging action.
- Use a hierarchical context architecture in which a sealed top layer holds system prompts and policies and is never compacted or summarized. This keeps adversarial inputs from causing policy instructions to be dropped or pushed out of the context window.
- Maintain a provenance graph for every context element so that any token can be traced to its source, and enforce segregation through structural delimiters and role-based channels. This makes the trust level of each context segment explicit.
- Apply zero trust to the runtime window by conditioning the model on segment provenance, ensuring that instructions in low-trust segments are treated as data, not directives. This prevents content from untrusted sources from being interpreted as instructions without explicit authorization.
- Implement output filtering and content classification on outgoing data to prevent agents from including context content in tool calls or external responses, which could leak internal information.
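Two of the controls above, authorization enforced in code and output filtering, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation: the permission table, the `ToolCall` type, the function names, and the eight-word overlap threshold are all hypothetical choices made for the example. The point is that the prompt contains nothing an attacker could use, and any response that repeats a long run of prompt text is withheld before it leaves the system.

```python
from dataclasses import dataclass

# Hypothetical permission table. It lives in code or config, never in the prompt.
ROLE_PERMISSIONS = {
    "viewer": {"read_ticket"},
    "admin": {"read_ticket", "delete_ticket"},
}


@dataclass(frozen=True)
class ToolCall:
    name: str
    user_role: str  # assumed to come from the authenticated session, not model output


def authorize(call: ToolCall) -> bool:
    """Allow a tool call only if the caller's role grants that tool."""
    return call.name in ROLE_PERMISSIONS.get(call.user_role, set())


def leaks_system_prompt(output: str, system_prompt: str, window: int = 8) -> bool:
    """Flag output that repeats any run of `window` consecutive prompt words.

    Matching is case-insensitive and ignores whitespace differences.
    Prompts shorter than `window` words are never flagged by this check.
    """
    prompt_words = system_prompt.lower().split()
    normalized = " " + " ".join(output.lower().split()) + " "
    for i in range(len(prompt_words) - window + 1):
        chunk = " " + " ".join(prompt_words[i:i + window]) + " "
        if chunk in normalized:
            return True
    return False


def release(output: str, system_prompt: str) -> str:
    """Return the output, or a placeholder if it echoes the system prompt."""
    if leaks_system_prompt(output, system_prompt):
        return "[response withheld: possible system prompt disclosure]"
    return output
```

A verbatim-overlap check like this only catches direct echoes. Paraphrased or encoded leaks would need a content classifier on top, which is why the authorization check in code remains the primary control.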
Grounded in
- OWASP LLM Top 10
- Designing Agentic AI Systems with the ORCHIDEAS Framework
- LAAF: Logic-Layer Automated Attack Framework - A Systematic Red-Teaming Methodology for LPCI Vulnerabilities in Agentic Large Language Model Systems
- Unpacking the GPT-5.5 System Card
- Chapter 6: Context Management at Scale (Claude Code vs. Hermes Agent)
- Claude Code Harness Pattern 8: Memory Systems and State Persistence
This AI-generated answer is for guidance only — not a certification, audit, or penetration test. Grounded in the NIST AI RMF, OWASP LLM Top 10, and ISO/IEC 42001 control text; verify applicability to your environment.