Kokoro AI — agentic threat model

AIVSS 4.7 · Medium

Kokoro AI is a low-risk, specialized text-to-speech utility with minimal agentic capabilities. Its primary risks are theft of model or voicepack intellectual property and the generation of unauthorized or deepfaked audio.

OWASP AIVSS score rationale

AIVSS = (CVSS_Base + AARS) × Mitigation_Factor, where AARS = (10 − CVSS_Base) × (Factor_Sum / 10) × ThM

CVSS base 4.3 · AARS uplift 0.36 · Factor sum 0.7/10 · Threat ×0.9 · Mitigation ×1.0

Autonomy of Action		0.10
Goal-Driven Planning		0.00
Self-Modification		0.00
Dynamic Tool Use		0.00
Persistent Memory		0.00
Contextual Awareness		0.10
Dynamic Identity		0.00
Multi-Agent Interactions		0.00
Non-Determinism		0.30
Opacity & Reflexivity		0.20

Scored with the canonical OWASP AIVSS formula (AIVSS calculator reference); agentic risk factors estimated from the agent’s described capabilities.
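
As a check on the arithmetic, the score can be reproduced from the values listed above. This is a minimal sketch: the function and variable names are illustrative and are not part of any official AIVSS tooling. The inputs are copied from the rationale.

```python
# Inputs copied from the score rationale; names are illustrative only.
CVSS_BASE = 4.3
THREAT_MULTIPLIER = 0.9   # "ThM" in the formula
MITIGATION_FACTOR = 1.0

FACTORS = {
    "Autonomy of Action": 0.10,
    "Goal-Driven Planning": 0.00,
    "Self-Modification": 0.00,
    "Dynamic Tool Use": 0.00,
    "Persistent Memory": 0.00,
    "Contextual Awareness": 0.10,
    "Dynamic Identity": 0.00,
    "Multi-Agent Interactions": 0.00,
    "Non-Determinism": 0.30,
    "Opacity & Reflexivity": 0.20,
}


def aars(cvss_base: float, factor_sum: float, thm: float) -> float:
    """AARS = (10 - CVSS_Base) * (Factor_Sum / 10) * ThM."""
    return (10 - cvss_base) * (factor_sum / 10) * thm


def aivss(cvss_base: float, factor_sum: float, thm: float, mitigation: float) -> float:
    """AIVSS = (CVSS_Base + AARS) * Mitigation_Factor."""
    return (cvss_base + aars(cvss_base, factor_sum, thm)) * mitigation


factor_sum = sum(FACTORS.values())
uplift = aars(CVSS_BASE, factor_sum, THREAT_MULTIPLIER)
score = aivss(CVSS_BASE, factor_sum, THREAT_MULTIPLIER, MITIGATION_FACTOR)

print(f"Factor sum:  {factor_sum:.1f}")   # 0.7
print(f"AARS uplift: {uplift:.2f}")       # 0.36
print(f"AIVSS:       {score:.1f}")        # 4.7
```

The uplift is small because the ten agentic factors sum to only 0.7 out of 10, so the final score stays close to the CVSS base.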

MAESTRO 7-layer threat model

Per-layer threats for this agent. Layers tagged “not certain from listing” are general, caveated commentary where the public description didn’t pin that layer.

L1 · Foundation Models · ✓ mapped

The core of the agent is an 82M-parameter TTS model. Primary threats include theft of this lightweight proprietary model, adversarial text inputs crafted to exploit the synthesis engine, and unauthorized voice cloning through the custom voicepack feature.

L2 · Data Operations · ⚠ not certain from listing

Not certain from the listing — details regarding the training datasets for the 6 supported languages and the storage of custom voicepacks are omitted. Potential threats include data exfiltration of proprietary voicepacks and training data poisoning.

L3 · Agent Frameworks · ⚠ not certain from listing

Not certain from the listing — Kokoro AI functions as a direct utility rather than a complex agentic framework. Orchestration threats are minimal, though input validation vulnerabilities could exist in the text processing pipeline.

L4 · Deployment & Infrastructure · ⚠ not certain from listing

Not certain from the listing — hosting infrastructure for the streaming and instant audio generation is not detailed. Standard cloud hosting threats apply, including API abuse, denial of service (DoS) on the 5000-character streaming endpoint, and insecure API keys.
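
To make the API-abuse and DoS concerns concrete, here is a minimal sketch of two generic mitigations: an input check that enforces the 5000-character limit mentioned in the listing, and a per-client token-bucket rate limiter. Nothing here reflects how Kokoro AI is actually implemented. `validate_text`, `RequestRejected`, and `TokenBucket` are hypothetical names, and the rate and capacity values are placeholders.

```python
import time

MAX_CHARS = 5000  # limit taken from the listing; how it is enforced is an assumption


class RequestRejected(ValueError):
    """Raised when a synthesis request fails validation (hypothetical type)."""


def validate_text(text: str) -> str:
    """Drop non-printable characters, trim, and enforce the length limit."""
    if not isinstance(text, str):
        raise RequestRejected("text must be a string")
    # Keep printable characters plus newlines and tabs; drop control characters.
    cleaned = "".join(ch for ch in text if ch.isprintable() or ch in "\n\t").strip()
    if not cleaned:
        raise RequestRejected("text is empty")
    if len(cleaned) > MAX_CHARS:
        raise RequestRejected(f"text exceeds {MAX_CHARS} characters")
    return cleaned


class TokenBucket:
    """Simple token bucket: refills `rate` tokens per second up to `capacity`."""

    def __init__(self, rate: float, capacity: float, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.clock = clock
        self.last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        """Return True and spend `cost` tokens if the request may proceed."""
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

In a real deployment, one bucket would typically be kept per API key, and `cost` could scale with the number of characters requested so that long inputs consume more of the quota.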

L5 · Evaluation & Observability · ⚠ not certain from listing

Not certain from the listing — there is no mention of output guardrails to prevent the generation of abusive, hateful, or deepfaked audio content, nor any logging/observability mechanisms for tracking generation requests.
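
For illustration, a request audit log of the kind the listing does not mention could look like the sketch below. It is an assumption-laden example, not a description of Kokoro AI: the field names, the `tts.audit` logger name, and the choice to store a SHA-256 digest instead of the raw text are all hypothetical.

```python
import hashlib
import json
import logging
import time

logger = logging.getLogger("tts.audit")  # hypothetical logger name


def audit_record(user_id: str, text: str, voice: str, now=None) -> dict:
    """Build a structured record for one generation request.

    Only a digest and the length of the text are kept, so the log can be
    used to trace abuse without retaining the content itself.
    """
    return {
        "ts": time.time() if now is None else now,
        "user_id": user_id,
        "voice": voice,
        "chars": len(text),
        "text_sha256": hashlib.sha256(text.encode("utf-8")).hexdigest(),
    }


def log_generation(user_id: str, text: str, voice: str) -> dict:
    """Emit the audit record as one JSON line and return it."""
    record = audit_record(user_id, text, voice)
    logger.info(json.dumps(record, sort_keys=True))
    return record
```

Storing a digest lets an operator match a reported audio clip's source text against the log while limiting how much user content is retained.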

L6 · Security & Compliance (cross-cutting) · ⚠ not certain from listing

Not certain from the listing — compliance alignments (e.g., GDPR, EU AI Act regarding synthetic voice disclosure) and access controls for custom voicepacks are not specified.

L7 · Agent Ecosystem · ⚠ not certain from listing

Not certain from the listing — the agent operates as a standalone vertical tool with no described multi-agent interactions or marketplace integrations.

MAESTRO — the 7-layer agentic threat-modeling framework (Cloud Security Alliance / Ken Huang).