Crab — agentic threat model

AIVSS 8.4 · High

Crab is a framework for building agent environments and benchmarking agents. Its risk profile is high because it executes agent actions across Docker containers, VMs, and physical machines, which could lead to host compromise if those environments are not strictly sandboxed.

OWASP AIVSS score rationale

AIVSS = (CVSS_Base + AARS) × Mitigation_Factor, where AARS = (10 − CVSS_Base) × (Factor_Sum / 10) × ThM

CVSS base 8.4 · AARS uplift 0.45 · Factor sum 2.7/10 · Threat ×1.05 · Mitigation ×0.95

Autonomy of Action		0.30
Goal-Driven Planning		0.20
Self-Modification		0.10
Dynamic Tool Use		0.60
Persistent Memory		0.20
Contextual Awareness		0.40
Dynamic Identity		0.10
Multi-Agent Interactions		0.30
Non-Determinism		0.30
Opacity & Reflexivity		0.20

Scored with the canonical OWASP AIVSS formula (AIVSS calculator reference); agentic risk factors estimated from the agent’s described capabilities.
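As a worked check of the arithmetic, the sketch below plugs the listed values into the formula stated above. The function and variable names are illustrative assumptions and are not part of any OWASP tooling.

```python
# Values copied from the score breakdown in this listing.
CVSS_BASE = 8.4
THREAT_MULTIPLIER = 1.05  # ThM in the formula
MITIGATION_FACTOR = 0.95

# Agentic risk factors, as listed in the table.
FACTORS = {
    "Autonomy of Action": 0.30,
    "Goal-Driven Planning": 0.20,
    "Self-Modification": 0.10,
    "Dynamic Tool Use": 0.60,
    "Persistent Memory": 0.20,
    "Contextual Awareness": 0.40,
    "Dynamic Identity": 0.10,
    "Multi-Agent Interactions": 0.30,
    "Non-Determinism": 0.30,
    "Opacity & Reflexivity": 0.20,
}


def aivss(cvss_base, factors, thm, mitigation):
    """Return (factor_sum, aars, score) using the formula quoted in this listing."""
    factor_sum = sum(factors.values())
    # AARS = (10 - CVSS_Base) * (Factor_Sum / 10) * ThM
    aars = (10 - cvss_base) * (factor_sum / 10) * thm
    # AIVSS = (CVSS_Base + AARS) * Mitigation_Factor
    score = (cvss_base + aars) * mitigation
    return factor_sum, aars, score


factor_sum, aars, score = aivss(
    CVSS_BASE, FACTORS, THREAT_MULTIPLIER, MITIGATION_FACTOR
)
print(f"Factor sum: {factor_sum:.1f}/10")  # Factor sum: 2.7/10
print(f"AARS uplift: {aars:.2f}")          # AARS uplift: 0.45
print(f"AIVSS: {score:.1f}")               # AIVSS: 8.4
```

The unrounded values are AARS = 1.6 × 0.27 × 1.05 = 0.4536 and AIVSS = 8.8536 × 0.95 ≈ 8.41, which round to the 0.45 and 8.4 shown in the breakdown.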

MAESTRO 7-layer threat model

Per-layer threats for this agent. Layers tagged “not certain from listing” are general, caveated commentary where the public description didn’t pin that layer.

L1 · Foundation Models ⚠ not certain from listing

Not certain from the listing: Crab is model-agnostic and focuses on environment and benchmarking orchestration rather than hosting specific foundation models, so L1 security depends on whichever LLMs are integrated.

L2 · Data Operations ⚠ not certain from listing

Not certain from the listing: Crab does not explicitly detail data ingestion or vector database integrations, though it does manage environment state and benchmarking data, both of which could be subject to manipulation.

L3 · Agent Frameworks ✓ mapped

Crab provides a Python-centric interface for defining agent environments and actions. Vulnerabilities in this layer could allow insecure tool integration or arbitrary code execution when agent actions run.
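One common way to narrow the arbitrary-code-execution risk described above is to dispatch only actions that appear on an explicit allowlist. The sketch below is a generic illustration, not Crab's actual API: `ProposedAction`, `ALLOWED_ACTIONS`, `dispatch`, and the two handlers are all hypothetical names.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass(frozen=True)
class ProposedAction:
    """An action requested by the agent (hypothetical structure)."""
    name: str
    args: Dict[str, Any]


def click(x: int, y: int) -> str:
    # Stand-in handler; a real environment would perform the click.
    return f"click at ({x}, {y})"


def type_text(text: str) -> str:
    # Stand-in handler; a real environment would send keystrokes.
    return f"typed {len(text)} characters"


# Only handlers registered here can ever be invoked by the agent.
ALLOWED_ACTIONS: Dict[str, Callable[..., str]] = {
    "click": click,
    "type_text": type_text,
}


def dispatch(action: ProposedAction) -> str:
    """Run an action only if its name is on the allowlist."""
    handler = ALLOWED_ACTIONS.get(action.name)
    if handler is None:
        raise PermissionError(f"action {action.name!r} is not on the allowlist")
    return handler(**action.args)


print(dispatch(ProposedAction("click", {"x": 10, "y": 20})))
```

An unlisted name such as `run_shell` raises `PermissionError` instead of reaching any handler. Argument validation inside each handler would still be needed and is out of scope for this sketch.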

L4 · Deployment & Infrastructure ✓ mapped

Supports deployment across Docker, VMs, and physical machines. This multi-environment capability introduces risks of container escape, privilege escalation, and unauthorized host access if environments are not properly sandboxed.
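For the Docker case, the sandboxing mentioned above might look roughly like the following. This is a minimal sketch that only assembles a `docker run` command line from standard Docker CLI flags; the helper name, image name, and resource limits are assumptions for illustration, and nothing here reflects how Crab itself launches containers.

```python
from typing import List


def hardened_docker_argv(image: str, command: List[str]) -> List[str]:
    """Build a `docker run` argument list with restrictive defaults."""
    return [
        "docker", "run", "--rm",
        "--network", "none",                    # no network access
        "--read-only",                          # read-only root filesystem
        "--cap-drop", "ALL",                    # drop all Linux capabilities
        "--security-opt", "no-new-privileges",  # block setuid-style escalation
        "--pids-limit", "128",                  # cap the number of processes
        "--memory", "512m",                     # cap memory use
        "--user", "1000:1000",                  # run as a non-root UID:GID
        image,
        *command,
    ]


# Hypothetical image and entry point, used only to show the resulting command.
argv = hardened_docker_argv("example/agent-env:latest", ["python", "run_task.py"])
print(" ".join(argv))
```

The list can be passed to `subprocess.run(argv, check=True)` on a host with Docker installed. These flags reduce the impact of a compromised container but do not rule out container escape, and they do not cover the VM or physical-machine targets, which need their own isolation controls.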

L5 · Evaluation & Observability ✓ mapped

Features a novel benchmarking suite and a fine-grained Graph Evaluator. Security risks include evaluation gaming or manipulation of benchmarking metrics to hide malicious agent behavior.
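One generic way to make after-the-fact manipulation of recorded metrics detectable is a hash-chained result log, where each entry commits to the one before it. The sketch below assumes a simple in-memory list and hypothetical record fields; it is not part of Crab or its Graph Evaluator.

```python
import hashlib
import json
from typing import Any, Dict, List

GENESIS = "0" * 64  # placeholder digest that precedes the first entry


def _digest(prev: str, record: Dict[str, Any]) -> str:
    # Canonical JSON so the same record always hashes to the same value.
    payload = json.dumps(record, sort_keys=True)
    return hashlib.sha256((prev + payload).encode("utf-8")).hexdigest()


def append_record(log: List[Dict[str, Any]], record: Dict[str, Any]) -> None:
    """Append a result whose digest covers the previous entry's digest."""
    prev = log[-1]["digest"] if log else GENESIS
    log.append({"record": record, "prev": prev, "digest": _digest(prev, record)})


def verify(log: List[Dict[str, Any]]) -> bool:
    """Recompute the chain; any edited or reordered entry breaks it."""
    prev = GENESIS
    for entry in log:
        if entry["prev"] != prev or entry["digest"] != _digest(prev, entry["record"]):
            return False
        prev = entry["digest"]
    return True


log: List[Dict[str, Any]] = []
append_record(log, {"task": "task-1", "score": 0.8})  # hypothetical results
append_record(log, {"task": "task-2", "score": 0.3})
print(verify(log))  # True

log[1]["record"]["score"] = 1.0  # simulate tampering with a stored metric
print(verify(log))  # False
```

This only detects edits made after a result is logged. It does not prevent an agent from gaming the evaluation itself, and the latest digest would need to be stored somewhere the agent cannot write.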

L6 · Security & Compliance (cross-cutting) ⚠ not certain from listing

Not certain from the listing: the description does not mention built-in authentication, authorization, or compliance frameworks, which suggests that security controls must be implemented externally.

L7 · Agent Ecosystem ⚠ not certain from listing

Not certain from the listing: although Crab is developed by CAMEL-AI (a multi-agent framework group), it focuses on environment benchmarking and does not explicitly detail multi-agent marketplace or interaction security.

MAESTRO — the 7-layer agentic threat-modeling framework (Cloud Security Alliance / Ken Huang).