Surf.new — agentic threat model

AIVSS 9.4 · Critical

Surf.new presents a high agentic risk profile due to its integration of browser-use and computer-use frameworks, which allow LLMs to interact directly with the web. The lack of explicit sandboxing or safety guardrails in the playground listing increases the potential for indirect prompt injection and unauthorized web actions.

OWASP AIVSS score rationale

AIVSS = (CVSS_Base + AARS) × Mitigation_Factor, where AARS = (10 − CVSS_Base) × (Factor_Sum / 10) × ThM

CVSS base 8.4 · AARS uplift 1.0 · Factor sum 5.7/10 · Threat ×1.1 · Mitigation ×1.0

Autonomy of Action		0.80
Goal-Driven Planning		0.80
Self-Modification		0.20
Dynamic Tool Use		0.90
Persistent Memory		0.30
Contextual Awareness		0.80
Dynamic Identity		0.30
Multi-Agent Interactions		0.20
Non-Determinism		0.80
Opacity & Reflexivity		0.60

Scored with the canonical OWASP AIVSS formula; the agentic risk factors are estimated from the agent’s described capabilities.
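As a check on the arithmetic, the formula above can be evaluated directly with the values shown on this page. The following is a minimal Python sketch, not the official OWASP AIVSS calculator: the function name `aivss_score` and the dictionary layout are illustrative assumptions, and rounding to one decimal place is assumed to match the displayed figures.

```python
# Agentic risk factors exactly as listed in the factor table on this page.
FACTORS = {
    "Autonomy of Action": 0.80,
    "Goal-Driven Planning": 0.80,
    "Self-Modification": 0.20,
    "Dynamic Tool Use": 0.90,
    "Persistent Memory": 0.30,
    "Contextual Awareness": 0.80,
    "Dynamic Identity": 0.30,
    "Multi-Agent Interactions": 0.20,
    "Non-Determinism": 0.80,
    "Opacity & Reflexivity": 0.60,
}


def aivss_score(cvss_base, factors, threat_multiplier, mitigation_factor):
    """Hypothetical helper applying the formula quoted in this section.

    AARS  = (10 - CVSS_Base) * (Factor_Sum / 10) * ThM
    AIVSS = (CVSS_Base + AARS) * Mitigation_Factor
    """
    factor_sum = sum(factors.values())
    aars = (10 - cvss_base) * (factor_sum / 10) * threat_multiplier
    score = (cvss_base + aars) * mitigation_factor
    return factor_sum, aars, score


# Inputs taken from the summary line: CVSS base 8.4, Threat x1.1, Mitigation x1.0.
factor_sum, aars, score = aivss_score(8.4, FACTORS, 1.1, 1.0)

print(f"Factor sum:  {factor_sum:.1f}/10")
print(f"AARS uplift: {aars:.1f}")
print(f"AIVSS:       {score:.1f}")
```

With these inputs the unrounded AARS uplift is about 1.003 and the unrounded score about 9.403, which agrees with the displayed 1.0 and 9.4. Because the mitigation factor is 1.0, no mitigations are credited in the final score.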

MAESTRO 7-layer threat model

Per-layer threats for this agent. Layers tagged “not certain from listing” are general, caveated commentary where the public description didn’t pin that layer.

L1 · Foundation Models · ✓ mapped

Supports advanced models like Claude 3.7, DeepSeek R1, and OpenAI models. The primary threat is indirect prompt injection, where malicious content on browsed webpages hijacks the model's instructions to perform unauthorized actions.

L2 · Data Operations · ⚠ not certain from listing

Not certain from the listing — The agent ingests dynamic web data during browsing sessions, but there is no mention of persistent vector stores, RAG databases, or data lineage controls.

L3 · Agent Frameworks · ✓ mapped

Integrates Browser-use, Claude Computer-use, and LangChain. The orchestration of browser automation tools poses severe risks of tool misuse, where the agent may be manipulated into executing unintended clicks, form submissions, or navigation.

L4 · Deployment & Infrastructure · ⚠ not certain from listing

Not certain from the listing — While Ollama allows local execution, the hosting, sandboxing, and isolation mechanisms for the browser automation environment (especially Claude Computer-use) are not detailed.

L5 · Evaluation & Observability · ⚠ not certain from listing

Not certain from the listing — No built-in logging, monitoring, or guardrail frameworks are specified to detect or prevent anomalous agent behavior during web sessions.

L6 · Security & Compliance (cross-cutting) · ⚠ not certain from listing

Not certain from the listing — There is no mention of access control, session isolation, user authentication, or compliance policies governing the playground's use.

L7 · Agent Ecosystem · ⚠ not certain from listing

Not certain from the listing — The playground focuses on single-agent web browsing and framework experimentation; multi-agent coordination or ecosystem-level interactions are not described.

MAESTRO — the 7-layer agentic threat-modeling framework (Cloud Security Alliance / Ken Huang).

These scores are auto-generated from public information (the agent's own listing, docs, and repository) using the canonical OWASP AIVSS formula and the MAESTRO framework. They are an estimate for guidance, not a penetration test, audit, or certification.