prompt-guard (AI-Research-SKILLs) — agentic threat model

6.2AIVSS 6.2 · Medium

The prompt-guard skill presents low overall agentic risk due to its narrow, defensive focus as a classifier, but vulnerabilities in its code execution environment or bypasses of its detection logic could undermine downstream LLM security.

OWASP AIVSS score rationale

AIVSS = (CVSS_Base + AARS) × Mitigation_Factor, where AARS = (10 − CVSS_Base) × (Factor_Sum / 10) × ThM

CVSS base 6.3AARS uplift 0.63Factor sum 1.7/10Threat ×1.0Mitigation ×0.9

Autonomy of Action		0.20
Goal-Driven Planning		0.10
Self-Modification		0.10
Dynamic Tool Use		0.30
Persistent Memory		0.10
Contextual Awareness		0.30
Dynamic Identity		0.10
Multi-Agent Interactions		0.10
Non-Determinism		0.20
Opacity & Reflexivity		0.20

Scored with the canonical OWASP AIVSS formula (AIVSS calculator reference); agentic risk factors estimated from the agent’s described capabilities.

MAESTRO 7-layer threat model

Per-layer threats for this agent. Layers tagged “not certain from listing” are general, caveated commentary where the public description didn’t pin that layer.

L1 · Foundation Models✓ mapped

Integrates Meta's Prompt Guard classifier model. Primary threats include adversarial examples designed to bypass the classifier, and model evasion techniques that exploit the classifier's decision boundaries.

L2 · Data Operations⚠ not certain from listing

Not certain from the listing — No explicit data operations, training pipelines, or vector stores are mentioned, though it processes input text streams dynamically.

L3 · Agent Frameworks✓ mapped

The skill writes and runs classifier code. Threats include insecure tool integration or code execution vulnerabilities within the agent framework hosting this skill.

L4 · Deployment & Infrastructure⚠ not certain from listing

Not certain from the listing — The hosting environment, sandboxing of the classifier code execution, and network isolation are not specified.

L5 · Evaluation & Observability✓ mapped

Acts directly as an input guardrail. Threats include blind spots in detection capabilities, failure to log blocked injection attempts, and lack of drift monitoring for new jailbreak techniques.

L6 · Security & Compliance (cross-cutting)⚠ not certain from listing

Not certain from the listing — No details are provided regarding access controls, authentication, or compliance frameworks governing the deployment of this skill.

L7 · Agent Ecosystem⚠ not certain from listing

Not certain from the listing — While designed as a reusable skill, there is no explicit mention of multi-agent orchestration or ecosystem-level trust boundaries.

MAESTRO — the 7-layer agentic threat-modeling framework (Cloud Security Alliance / Ken Huang).