Mistral OCR — agentic threat model

5.9AIVSS 5.9 · Medium

Mistral OCR is a low-autonomy document processing utility with minimal agentic risk, primarily exposed to data privacy risks and indirect prompt injection via adversarial text or images embedded within processed documents.

OWASP AIVSS score rationale

AIVSS = (CVSS_Base + AARS) × Mitigation_Factor, where AARS = (10 − CVSS_Base) × (Factor_Sum / 10) × ThM

CVSS base 5.5AARS uplift 0.4Factor sum 0.9/10Threat ×1.0Mitigation ×1.0

Autonomy of Action		0.10
Goal-Driven Planning		0.10
Self-Modification		0.00
Dynamic Tool Use		0.10
Persistent Memory		0.00
Contextual Awareness		0.20
Dynamic Identity		0.00
Multi-Agent Interactions		0.00
Non-Determinism		0.20
Opacity & Reflexivity		0.20

Scored with the canonical OWASP AIVSS formula (AIVSS calculator reference); agentic risk factors estimated from the agent’s described capabilities.

MAESTRO 7-layer threat model

Per-layer threats for this agent. Layers tagged “not certain from listing” are general, caveated commentary where the public description didn’t pin that layer.

L1 · Foundation Models✓ mapped

Utilizes Mistral's vision-language or specialized OCR models. Primary threats include adversarial document inputs (e.g., text-based prompt injection hidden in PDFs) and model hallucinations when parsing complex equations or tables.

L2 · Data Operations⚠ not certain from listing

Not certain from the listing — details regarding document ingestion, temporary storage, caching, or vector database integration are not provided. If documents are cached or stored, they represent a high-value target for data exfiltration.

L3 · Agent Frameworks⚠ not certain from listing

Not certain from the listing — the orchestration framework is unspecified. The primary risk at this layer involves insecure integration of document parsing libraries (e.g., PDF parsers vulnerable to buffer overflows or denial of service).

L4 · Deployment & Infrastructure⚠ not certain from listing

Not certain from the listing — deployment could be via Mistral's hosted API or self-hosted since it is tagged as Open Source. Self-hosted deployments face standard container/host compromise risks, while API usage risks API key exposure.

L5 · Evaluation & Observability⚠ not certain from listing

Not certain from the listing — there is no mention of built-in guardrails, input validation for malicious files, or observability logging to detect anomalous extraction requests.

L6 · Security & Compliance (cross-cutting)⚠ not certain from listing

Not certain from the listing — no compliance certifications (such as SOC2, HIPAA, or GDPR compliance for document processing) are mentioned, which is critical given that OCR tools frequently handle sensitive PII and financial data.

L7 · Agent Ecosystem⚠ not certain from listing

Not certain from the listing — the tool is described as a vertical utility and does not appear to participate in multi-agent ecosystems or marketplace integrations.

MAESTRO — the 7-layer agentic threat-modeling framework (Cloud Security Alliance / Ken Huang).