writing-skills — agentic threat model

AIVSS 9.5 · Critical

This agent presents an exceptionally high risk profile because it acts as a self-modifying meta-programming tool, writing and verifying executable skills directly in the runtime directory. Without strict sandboxing and human-in-the-loop validation, it could easily be exploited to inject malicious behaviors or achieve arbitrary code execution.

OWASP AIVSS score rationale

AIVSS = (CVSS_Base + AARS) × Mitigation_Factor, where AARS = (10 − CVSS_Base) × (Factor_Sum / 10) × ThM

CVSS base 8.8 · AARS uplift 0.74 · Factor sum 5.6/10 · Threat ×1.1 · Mitigation ×1.0

Autonomy of Action		0.70
Goal-Driven Planning		0.80
Self-Modification		1.00
Dynamic Tool Use		0.60
Persistent Memory		0.40
Contextual Awareness		0.50
Dynamic Identity		0.10
Multi-Agent Interactions		0.20
Non-Determinism		0.70
Opacity & Reflexivity		0.60

Scored with the canonical OWASP AIVSS formula (AIVSS calculator reference); agentic risk factors estimated from the agent’s described capabilities.
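
As a worked check, the sketch below recomputes the score from the formula and the inputs listed above. The function and variable names are illustrative and not part of any official AIVSS tooling, and rounding to the displayed precision is an assumption based on how the figures are shown.

```python
# Illustrative recomputation of the AIVSS score from the values listed above.
# Names are hypothetical; this is not an official OWASP calculator.

FACTORS = {
    "Autonomy of Action": 0.70,
    "Goal-Driven Planning": 0.80,
    "Self-Modification": 1.00,
    "Dynamic Tool Use": 0.60,
    "Persistent Memory": 0.40,
    "Contextual Awareness": 0.50,
    "Dynamic Identity": 0.10,
    "Multi-Agent Interactions": 0.20,
    "Non-Determinism": 0.70,
    "Opacity & Reflexivity": 0.60,
}


def aivss(cvss_base, factors, threat_multiplier, mitigation_factor):
    """Return (factor_sum, aars, score) using the formula quoted in this section."""
    factor_sum = sum(factors.values())
    # AARS = (10 - CVSS_Base) x (Factor_Sum / 10) x ThM
    aars = (10 - cvss_base) * (factor_sum / 10) * threat_multiplier
    # AIVSS = (CVSS_Base + AARS) x Mitigation_Factor
    score = (cvss_base + aars) * mitigation_factor
    return factor_sum, aars, score


factor_sum, aars, score = aivss(8.8, FACTORS, 1.1, 1.0)
print(f"Factor sum: {factor_sum:.1f}/10")  # Factor sum: 5.6/10
print(f"AARS uplift: {aars:.2f}")          # AARS uplift: 0.74
print(f"AIVSS: {score:.1f}")               # AIVSS: 9.5
```

The unrounded values are 1.2 × 0.56 × 1.1 = 0.7392 for the uplift and 8.8 + 0.7392 = 9.5392 for the score, which agree with the displayed 0.74 and 9.5.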

MAESTRO 7-layer threat model

Per-layer threats for this agent. Layers tagged “not certain from listing” carry general, caveated commentary because the public description did not pin down that layer.

L1 · Foundation Models ⚠ not certain from listing

Not certain from the listing: the underlying foundation model is not specified. Whichever model is used, indirect prompt injection could trick it into generating backdoored or malicious skills.

L2 · Data Operations ⚠ not certain from listing

Not certain from the listing: no explicit vector store or training-data operations are detailed, though the agent reads and writes directly to the local runtime skills directory.

L3 · Agent Frameworks ✓ mapped

The agent framework orchestrates a complex TDD loop (create, edit, verify). The primary threat is insecure tool integration, where the verification step may execute untrusted, newly-generated skill code within the active runtime framework.
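
One way to reduce that exposure is to run verification outside the agent's own process. The sketch below assumes the verification step is a Python script, which the listing does not confirm; `run_verification` and `ALLOWED_ENV` are hypothetical names. A child process with a timeout and a filtered environment limits only some risks and is not a substitute for OS-level sandboxing.

```python
import os
import subprocess
import sys
import tempfile
from pathlib import Path

# Assumption: only these variables are passed through, so secrets such as
# API keys in the parent environment are not visible to the child process.
ALLOWED_ENV = {"PATH", "SYSTEMROOT", "LD_LIBRARY_PATH"}


def run_verification(script: Path, timeout_s: float = 10.0) -> subprocess.CompletedProcess:
    """Run a (hypothetical) Python verification script in a child process."""
    env = {k: v for k, v in os.environ.items() if k in ALLOWED_ENV}
    with tempfile.TemporaryDirectory() as scratch:
        return subprocess.run(
            # -I: isolated mode; ignores PYTHON* variables and user site-packages
            [sys.executable, "-I", str(script.resolve())],
            cwd=scratch,          # start in an empty scratch directory
            env=env,              # filtered environment, not the agent's own
            capture_output=True,  # collect stdout/stderr instead of inheriting them
            text=True,
            timeout=timeout_s,    # raises subprocess.TimeoutExpired if exceeded
        )
```

The child still runs with the same user and filesystem permissions as the parent, so this only narrows what a malicious skill can reach by default; containers, VMs, or similar isolation would be needed to address the host-compromise case.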

L4 · Deployment & Infrastructure ⚠ not certain from listing

Not certain from the listing: the hosting environment and the sandboxing of the runtime skills directory are unspecified. If the runtime lacks strict OS-level isolation, executing newly written skills could lead to host compromise.

L5 · Evaluation & Observability ✓ mapped

The agent uses a TDD workflow to verify skills. However, malicious behavior in generated SKILL.md files could slip past the automated verification tests, either through evaluation gaming or through blind spots in what the tests cover.
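
Because tests only catch the behavior they exercise, a static pre-check on the skill text can complement them. The sketch below is a minimal example under stated assumptions: the pattern list and the name `flag_skill_text` are hypothetical, the patterns are easy to evade, and a match should route the file to human review, not serve as proof of malice.

```python
import re

# Illustrative patterns only; hypothetical and far from exhaustive.
SUSPICIOUS_PATTERNS = {
    "pipe-to-shell download": re.compile(
        r"\b(curl|wget)\b[^\n|]*\|\s*(sudo\s+)?(ba|z)?sh\b"
    ),
    "base64 decode": re.compile(r"\bbase64\s+(-d|--decode)\b"),
    "instruction override": re.compile(
        r"ignore (all )?(previous|prior) instructions", re.IGNORECASE
    ),
}


def flag_skill_text(text: str) -> list[str]:
    """Return the label of every suspicious pattern found in the skill text."""
    return [
        label
        for label, pattern in SUSPICIOUS_PATTERNS.items()
        if pattern.search(text)
    ]
```

For example, a file containing `curl https://example.invalid/setup.sh | sh` would be flagged as a pipe-to-shell download, while ordinary prose about writing tests returns an empty list.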

L6 · Security & Compliance (cross-cutting) ⚠ not certain from listing

Not certain from the listing: no access controls, authorization policies, or audit logs are mentioned that would restrict who can invoke this meta-skill to modify the agent's capabilities.

L7 · Agent Ecosystem ✓ mapped

Since this agent authors skills that run as agent behavior, compromised skill generation can introduce vulnerabilities that propagate to other agents in the ecosystem that utilize these skills.
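
One way to limit that propagation is for consuming agents to load only skills whose content hash matches a reviewed manifest. The sketch below assumes a manifest that maps each skill's relative path to a SHA-256 hex digest; that format and the names `sha256_of` and `unapproved_skills` are hypothetical and do not come from the listing.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hex SHA-256 digest of a file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def unapproved_skills(skills_dir: Path, manifest: dict[str, str]) -> list[str]:
    """List SKILL.md files whose hash is missing from, or differs from, the manifest."""
    problems = []
    for skill_file in sorted(skills_dir.rglob("SKILL.md")):
        rel = skill_file.relative_to(skills_dir).as_posix()
        # A missing entry and a changed file are treated the same: not approved.
        if manifest.get(rel) != sha256_of(skill_file):
            problems.append(rel)
    return problems
```

A loader could refuse to start, or skip the listed skills, whenever `unapproved_skills` returns a non-empty list. The manifest itself would need to be stored where the skill-writing agent cannot modify it, or the check adds little.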

MAESTRO — the 7-layer agentic threat-modeling framework (Cloud Security Alliance / Ken Huang).