prompt-guard (AI-Research-SKILLs) — agentic threat model
The prompt-guard skill carries low overall agentic risk because it acts only as a narrow, defensive classifier. However, a vulnerability in its code execution environment, or an input that bypasses its detection logic, could undermine the security of the downstream LLM it protects.
OWASP AIVSS score rationale
| Agentic risk factor | Score |
| --- | --- |
| Autonomy of Action | 0.20 |
| Goal-Driven Planning | 0.10 |
| Self-Modification | 0.10 |
| Dynamic Tool Use | 0.30 |
| Persistent Memory | 0.10 |
| Contextual Awareness | 0.30 |
| Dynamic Identity | 0.10 |
| Multi-Agent Interactions | 0.10 |
| Non-Determinism | 0.20 |
| Opacity & Reflexivity | 0.20 |
Scored with the canonical OWASP AIVSS formula (see the AIVSS calculator reference). The agentic risk factors are estimated from the agent’s described capabilities.
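To make the factor values easier to compare, the short sketch below tabulates them and reports their sum, their mean, and the highest-scoring factors. It is only descriptive arithmetic over the numbers in the table above. It does not reproduce the OWASP AIVSS formula, which the listing does not spell out, and the names `FACTORS` and `summarize` are illustrative.

```python
# Agentic risk factor values copied from the table above.
FACTORS = {
    "Autonomy of Action": 0.20,
    "Goal-Driven Planning": 0.10,
    "Self-Modification": 0.10,
    "Dynamic Tool Use": 0.30,
    "Persistent Memory": 0.10,
    "Contextual Awareness": 0.30,
    "Dynamic Identity": 0.10,
    "Multi-Agent Interactions": 0.10,
    "Non-Determinism": 0.20,
    "Opacity & Reflexivity": 0.20,
}


def summarize(factors):
    """Return (sum, mean, names of the highest-scoring factors).

    Plain descriptive statistics, NOT the AIVSS scoring formula.
    """
    total = round(sum(factors.values()), 2)
    mean = round(total / len(factors), 3)
    peak = max(factors.values())
    top = sorted(name for name, value in factors.items() if value == peak)
    return total, mean, top


total, mean, top = summarize(FACTORS)
print(f"sum={total:.2f} mean={mean:.3f} highest={top}")
```

The two highest factors, Dynamic Tool Use and Contextual Awareness, line up with the threats the layer analysis stresses: code execution inside the host framework and the processing of untrusted input text.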
MAESTRO 7-layer threat model
Per-layer threats for this agent. Layers tagged “Not certain from the listing” carry general, caveated commentary because the public description did not pin down that layer.
Layer 1 (Foundation Models): Integrates Meta's Prompt Guard classifier model. Primary threats include adversarial examples crafted to bypass the classifier and model evasion techniques that exploit its decision boundaries.
Layer 2 (Data Operations): Not certain from the listing. No explicit data operations, training pipelines, or vector stores are mentioned, though the skill processes input text streams dynamically.
Layer 3 (Agent Frameworks): The skill writes and runs classifier code. Threats include insecure tool integration and code execution vulnerabilities within the agent framework hosting this skill.
Layer 4 (Deployment and Infrastructure): Not certain from the listing. The hosting environment, the sandboxing of classifier code execution, and network isolation are not specified.
Layer 5 (Evaluation and Observability): Acts directly as an input guardrail. Threats include blind spots in detection capabilities, failure to log blocked injection attempts, and lack of drift monitoring for new jailbreak techniques.
Layer 6 (Security and Compliance): Not certain from the listing. No details are provided regarding access controls, authentication, or compliance frameworks governing the deployment of this skill.
Layer 7 (Agent Ecosystem): Not certain from the listing. Although the skill is designed to be reusable, there is no explicit mention of multi-agent orchestration or ecosystem-level trust boundaries.
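Two of the threats named above, evasion of the classifier and failure to log blocked injection attempts, can be illustrated with a small guardrail wrapper. The sketch below is a minimal example under stated assumptions, not the skill's actual code. `gate`, `normalize`, `Verdict`, and the 0.5 threshold are hypothetical, and `toy_score` is a stand-in keyword check rather than the Prompt Guard model. In a real deployment the scorer would be replaced by a call into the classifier.

```python
import logging
import unicodedata
from dataclasses import dataclass
from typing import Callable

logger = logging.getLogger("prompt_guard_gate")  # hypothetical logger name


@dataclass
class Verdict:
    allowed: bool
    score: float


def normalize(text: str) -> str:
    """Fold Unicode compatibility forms and drop invisible format characters.

    This narrows one evasion route (fullwidth letters, zero-width spaces).
    It is not a complete defense against adversarial inputs.
    """
    folded = unicodedata.normalize("NFKC", text)
    return "".join(ch for ch in folded if unicodedata.category(ch) != "Cf")


def gate(text: str, score_fn: Callable[[str], float],
         threshold: float = 0.5) -> Verdict:
    """Score normalized input and log every blocked attempt.

    The threshold is an assumed value and should be tuned per deployment.
    """
    cleaned = normalize(text)
    score = score_fn(cleaned)
    allowed = score < threshold
    if not allowed:
        # Log metadata only, so the log itself does not store attack payloads.
        logger.warning("blocked input: score=%.2f length=%d",
                       score, len(cleaned))
    return Verdict(allowed=allowed, score=score)


def toy_score(text: str) -> float:
    """Stand-in scorer, NOT Prompt Guard: flags one hard-coded phrase."""
    return 0.9 if "ignore previous instructions" in text.lower() else 0.1
```

Calling `gate("Ignore\u200b previous instructions", toy_score)` returns a blocked verdict because the zero-width space is stripped before scoring, while a benign question passes through. Counting the logged warnings over time is one simple input to the drift monitoring that Layer 5 calls for.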
MAESTRO is the 7-layer agentic threat-modeling framework (Cloud Security Alliance / Ken Huang).