Wan2.2 S2V AI (S2VAI Speech to Video): agentic threat model

AIVSS 6.1 · Medium

Wan2.2 S2V AI is a generative speech-to-video model with minimal agentic capabilities. Its direct operational risk is low, but its potential for misuse in deepfake generation and misinformation campaigns is high because the listing describes no built-in guardrails.

OWASP AIVSS score rationale

AIVSS = (CVSS_Base + AARS) × Mitigation_Factor, where AARS = (10 − CVSS_Base) × (Factor_Sum / 10) × ThM and ThM is the threat multiplier

CVSS base 5.3 · AARS uplift 0.8 · Factor sum 1.7/10 · Threat ×1.0 · Mitigation ×1.0

Autonomy of Action		0.10
Goal-Driven Planning		0.10
Self-Modification		0.00
Dynamic Tool Use		0.00
Persistent Memory		0.00
Contextual Awareness		0.20
Dynamic Identity		0.00
Multi-Agent Interactions		0.00
Non-Determinism		0.60
Opacity & Reflexivity		0.70

Scored with the canonical OWASP AIVSS formula (AIVSS calculator reference); agentic risk factors estimated from the agent’s described capabilities.
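The arithmetic behind the headline score can be reproduced with a short sketch. It uses only the formula and the factor values stated in this section. The function and variable names (`aars`, `aivss`, `FACTORS`) are illustrative assumptions and not part of any official AIVSS tooling.

```python
# Agentic risk factor scores exactly as listed in this threat model.
FACTORS = {
    "Autonomy of Action": 0.10,
    "Goal-Driven Planning": 0.10,
    "Self-Modification": 0.00,
    "Dynamic Tool Use": 0.00,
    "Persistent Memory": 0.00,
    "Contextual Awareness": 0.20,
    "Dynamic Identity": 0.00,
    "Multi-Agent Interactions": 0.00,
    "Non-Determinism": 0.60,
    "Opacity & Reflexivity": 0.70,
}


def aars(cvss_base: float, factor_sum: float, threat_multiplier: float = 1.0) -> float:
    """AARS = (10 - CVSS_Base) * (Factor_Sum / 10) * ThM, as stated above."""
    return (10 - cvss_base) * (factor_sum / 10) * threat_multiplier


def aivss(cvss_base: float, factor_sum: float,
          threat_multiplier: float = 1.0, mitigation_factor: float = 1.0) -> float:
    """AIVSS = (CVSS_Base + AARS) * Mitigation_Factor, as stated above."""
    return (cvss_base + aars(cvss_base, factor_sum, threat_multiplier)) * mitigation_factor


factor_sum = sum(FACTORS.values())
uplift = aars(5.3, factor_sum)   # CVSS base 5.3, Threat x1.0
score = aivss(5.3, factor_sum)   # Mitigation x1.0

print(f"Factor sum:  {factor_sum:.1f}")  # 1.7
print(f"AARS uplift: {uplift:.1f}")      # 0.8
print(f"AIVSS:       {score:.1f}")       # 6.1
```

Because the two highest factors (Non-Determinism and Opacity & Reflexivity) supply 1.3 of the 1.7 factor sum, nearly all of the 0.8 uplift over the CVSS base comes from the model's generative behavior, not from agentic capabilities.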

MAESTRO 7-layer threat model

Per-layer threats for this agent. Layers tagged “not certain from listing” carry general, caveated commentary because the public description did not pin down that layer.

L1 · Foundation Models · ✓ mapped

The core of the system is the Wan2.2 speech-to-video foundation model. Primary threats include adversarial audio inputs designed to bypass safety alignment, model manipulation, and the generation of highly convincing deepfakes or misaligned outputs.

L2 · Data Operations · ⚠ not certain from listing

Not certain from the listing: the training data pipeline, audio/video dataset curation, and vector storage are not specified, so data poisoning and copyright or provenance risks cannot be ruled out.

L3 · Agent Frameworks · ⚠ not certain from listing

Not certain from the listing — there is no evidence of an agentic orchestration framework, planning loops, or tool-calling capabilities in this speech-to-video model.

L4 · Deployment & Infrastructure · ⚠ not certain from listing

Not certain from the listing — deployment infrastructure (local vs. cloud hosting, GPU sandboxing, API exposure) is not detailed, though open-source models are often self-hosted.

L5 · Evaluation & Observability · ⚠ not certain from listing

Not certain from the listing: it describes no built-in guardrails, content moderation filters, or output monitoring tools that would prevent toxic or deepfake video generation.

L6 · Security & Compliance (cross-cutting) · ⚠ not certain from listing

Not certain from the listing — compliance with deepfake regulations (e.g., EU AI Act watermarking requirements), user authentication, and access controls are not specified.

L7 · Agent Ecosystem · ⚠ not certain from listing

Not certain from the listing — there is no mention of multi-agent orchestration, marketplace integrations, or external ecosystem dependencies.
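The per-layer status can also be captured in a small data structure, which makes it easy to see which layers still need evidence from the vendor or from a hands-on review. This is only a sketch: the `Layer` class and the `unmapped` helper are hypothetical names introduced for illustration, and the statuses are copied from the layer tags in this threat model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layer:
    number: int
    name: str
    mapped: bool  # True = "mapped"; False = "not certain from listing"


# Status of each MAESTRO layer as tagged in this threat model.
LAYERS = [
    Layer(1, "Foundation Models", True),
    Layer(2, "Data Operations", False),
    Layer(3, "Agent Frameworks", False),
    Layer(4, "Deployment & Infrastructure", False),
    Layer(5, "Evaluation & Observability", False),
    Layer(6, "Security & Compliance (cross-cutting)", False),
    Layer(7, "Agent Ecosystem", False),
]


def unmapped(layers: list[Layer]) -> list[str]:
    """Return the labels of layers the public listing did not pin down."""
    return [f"L{layer.number}" for layer in layers if not layer.mapped]


print(unmapped(LAYERS))  # ['L2', 'L3', 'L4', 'L5', 'L6', 'L7']
```

Only one of the seven layers is mapped from the listing, so the per-layer commentary for L2 through L7 is best read as a checklist of open questions, not as confirmed findings.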

MAESTRO — the 7-layer agentic threat-modeling framework (Cloud Security Alliance / Ken Huang).