OpenAGI — agentic threat model
OpenAGI presents a high-risk profile due to its self-improving reinforcement learning feedback loops and multi-model orchestration, which can lead to unpredictable emergent behaviors and reward-hacking vulnerabilities if deployed without strict sandboxing.
OWASP AIVSS score rationale
| Agentic risk factor | Score |
| --- | --- |
| Autonomy of Action | 0.70 |
| Goal-Driven Planning | 0.80 |
| Self-Modification | 0.80 |
| Dynamic Tool Use | 0.70 |
| Persistent Memory | 0.50 |
| Contextual Awareness | 0.70 |
| Dynamic Identity | 0.20 |
| Multi-Agent Interactions | 0.60 |
| Non-Determinism | 0.80 |
| Opacity & Reflexivity | 0.70 |
Scored with the canonical OWASP AIVSS formula (see the AIVSS calculator reference); the agentic risk factors are estimated from the agent’s described capabilities.
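As a reading aid for the factor table above, the following minimal sketch totals and averages the ten estimates and lists the factors that sit at or above a cutoff. It is not the OWASP AIVSS formula; the `summarize` helper and the 0.70 cutoff are assumptions made only for illustration.

```python
from statistics import mean

# Agentic risk factor estimates copied from the table above.
FACTORS = {
    "Autonomy of Action": 0.70,
    "Goal-Driven Planning": 0.80,
    "Self-Modification": 0.80,
    "Dynamic Tool Use": 0.70,
    "Persistent Memory": 0.50,
    "Contextual Awareness": 0.70,
    "Dynamic Identity": 0.20,
    "Multi-Agent Interactions": 0.60,
    "Non-Determinism": 0.80,
    "Opacity & Reflexivity": 0.70,
}


def summarize(factors, threshold=0.70):
    """Summarize factor estimates (hypothetical helper, not the AIVSS formula).

    Returns the rounded total, the rounded mean, and the names of factors
    scoring at or above `threshold`, highest first.
    """
    elevated = sorted(
        (name for name, score in factors.items() if score >= threshold),
        key=lambda name: -factors[name],
    )
    return {
        "total": round(sum(factors.values()), 2),
        "mean": round(mean(factors.values()), 2),
        "elevated": elevated,
    }


if __name__ == "__main__":
    summary = summarize(FACTORS)
    print(summary["total"], summary["mean"])
    for name in summary["elevated"]:
        print(f"{FACTORS[name]:.2f}  {name}")
```

Under this assumed cutoff, seven of the ten factors count as elevated, led by Goal-Driven Planning, Self-Modification, and Non-Determinism.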
MAESTRO 7-layer threat model
Per-layer threats for this agent. Layers tagged “Not certain from the listing” carry general, caveated commentary because the public description did not pin down that layer.
Layer 1, Foundation Models: Integrates LLMs with domain-specific expert models. Threats include adversarial examples, model reprogramming, and misaligned outputs, all amplified by the self-improving reinforcement learning feedback loop.
Layer 2, Data Operations: Not certain from the listing. The directory listing mentions benchmark and open-ended tasks but does not detail the underlying data architecture, vector stores, or training data pipelines, so exposure to data poisoning or lineage gaps cannot be ruled out.
Layer 3, Agent Frameworks: Orchestrates LLMs and expert models for complex, multi-step tasks. Vulnerabilities include insecure tool and expert model integration, planning bypasses, and framework-level orchestration flaws.
Layer 4, Deployment and Infrastructure: Not certain from the listing. As an open-source R&D platform, OpenAGI leaves deployment infrastructure (sandboxing, hosting, secrets management) to the user, which risks container compromise or privilege escalation if it is run unsafely.
Layer 5, Evaluation and Observability: Uses benchmark tasks and reinforcement learning feedback. Threats include evaluation gaming, reward hacking, and feedback loop poisoning, where malicious inputs skew the self-improvement mechanism.
Layer 6, Security and Compliance: Not certain from the listing. The public directory listing does not mention built-in access controls, authentication, or compliance frameworks (such as NIST or ISO).
Layer 7, Agent Ecosystem: Integrates LLMs with expert models in an extensible, modular architecture. Threats include cascading failures across expert models and trust abuse between the orchestrator and domain-specific models.
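One way to blunt the reward hacking and feedback loop poisoning threats listed for the layer that uses reinforcement learning feedback is to sanitize reward signals before they reach the learner. The sketch below is a generic illustration, not OpenAGI's actual mechanism: the `filter_rewards` name, the [0, 1] reward range, and the deviation cutoff of 3 are all assumptions. It clips each reward to the expected range, then drops values that lie far from the batch median, measured in units of the median absolute deviation.

```python
from statistics import median


def filter_rewards(rewards, low=0.0, high=1.0, max_deviation=3.0):
    """Clip rewards to [low, high], then drop median-absolute-deviation outliers.

    Hypothetical guard for an RL feedback loop; range and cutoff are assumed.
    """
    # Step 1: bound every reward so a single input cannot dominate the update.
    clipped = [min(max(r, low), high) for r in rewards]

    # Too few samples to estimate spread reliably; return the clipped values.
    if len(clipped) < 3:
        return clipped

    # Step 2: measure each reward's distance from the batch median.
    center = median(clipped)
    mad = median([abs(r - center) for r in clipped])

    # A zero MAD means most values are identical; skip outlier removal.
    if mad == 0:
        return clipped

    return [r for r in clipped if abs(r - center) / mad <= max_deviation]


if __name__ == "__main__":
    batch = [0.5, 0.52, 0.48, 0.51, 0.49, 50.0]  # last value is suspicious
    print(filter_rewards(batch))
```

A filter like this only narrows the attack surface. A poisoned batch in which most rewards are manipulated would shift the median itself, so it complements sandboxing and human review rather than replacing them.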
MAESTRO — the 7-layer agentic threat-modeling framework (Cloud Security Alliance / Ken Huang).