Llama Guard

AI Security · Open Source · Artificial Intelligence, Technology, Content Moderation

LLM-based safeguard model ensuring safe human-AI conversations.

🛡️ AgentReady threat assessment

MAESTRO 7-layer threat model + OWASP AIVSS risk score for Llama Guard, derived from its capabilities.

AIVSS 4.8 · Medium

These scores are auto-generated from public information (the agent's own listing, docs, and repository) using the canonical OWASP AIVSS formula and the MAESTRO framework. They are an estimate for guidance, not a penetration test, audit, or certification.

Overview

Llama Guard is a Large Language Model (LLM)-based safeguard for human-AI conversations. It classifies both user inputs (prompts) and AI-generated outputs (responses) to identify potential safety risks, such as prompt injections or inappropriate content, so that the surrounding application can block or otherwise handle them. The model is instruction-tuned on a set of safety categories, and those categories can be customized to fit a specific use case. Llama Guard supports multi-class classification (which categories are violated) and produces a binary safe/unsafe decision score for moderating AI conversations.
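As a rough illustration of how an application might consume such a classification, the sketch below parses a verdict string into a binary decision plus a list of violated categories. It assumes the classifier returns text whose first line is "safe" or "unsafe" and whose optional second line is a comma-separated list of category codes. The names Verdict and parse_verdict and the codes S1 and S10 are hypothetical and used only for this example; the model call itself is not shown.

```python
from dataclasses import dataclass, field


@dataclass
class Verdict:
    """Binary decision plus the violated categories (hypothetical type)."""
    safe: bool
    categories: list[str] = field(default_factory=list)


def parse_verdict(raw: str) -> Verdict:
    """Parse classifier text assumed to look like 'safe' or 'unsafe\\nS1,S10'."""
    # Drop blank lines and surrounding whitespace before inspecting the label.
    lines = [line.strip() for line in raw.strip().splitlines() if line.strip()]
    if not lines:
        raise ValueError("empty classifier output")

    label = lines[0].lower()
    if label == "safe":
        return Verdict(safe=True)
    if label == "unsafe":
        categories: list[str] = []
        if len(lines) > 1:
            # Second line: comma-separated category codes (assumed format).
            categories = [c.strip() for c in lines[1].split(",") if c.strip()]
        return Verdict(safe=False, categories=categories)

    # Anything else is treated as a malformed response rather than guessed at.
    raise ValueError(f"unexpected label: {lines[0]!r}")


if __name__ == "__main__":
    print(parse_verdict("safe"))
    print(parse_verdict("unsafe\nS1,S10"))
```

Raising on malformed output, rather than defaulting to "safe", is a deliberate choice in this sketch: a moderation layer that fails open would let unparsed responses through.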

Key features

AI safety
content moderation
LLM safeguard
prompt injection prevention
human-AI interaction

Use cases

Ensuring safe and appropriate interactions in human-AI conversations.
Mitigating prompt injection vulnerabilities in AI systems.
Classifying and moderating content in AI-generated responses.
Customizing safety protocols for specific AI use cases.
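The use cases above share one pattern: check the user's input before it reaches the assistant, then check the assistant's output before it reaches the user. A minimal sketch of that pattern follows. The safety check is passed in as a plain function, so a real safeguard model could be swapped in; the toy classifier, the blocklist, the refusal text, and all function names here are hypothetical stand-ins and not part of Llama Guard.

```python
from typing import Callable

# Hypothetical signatures: a classifier takes (role, text) and returns True
# when the text is judged safe; a generator maps a prompt to a reply.
Classifier = Callable[[str, str], bool]
Generator = Callable[[str], str]

REFUSAL = "Sorry, I can't help with that."


def guarded_reply(prompt: str, generate: Generator, is_safe: Classifier) -> str:
    """Moderate the user's prompt first, then the assistant's reply."""
    if not is_safe("user", prompt):
        return REFUSAL  # blocked before the assistant is ever called
    reply = generate(prompt)
    if not is_safe("assistant", reply):
        return REFUSAL  # blocked before the reply reaches the user
    return reply


# Toy stand-ins so the sketch runs without any model. The blocklist plays the
# role of a custom policy for one specific use case.
BLOCKLIST = ("ignore previous instructions",)


def toy_is_safe(role: str, text: str) -> bool:
    lowered = text.lower()
    return not any(phrase in lowered for phrase in BLOCKLIST)


def toy_generate(prompt: str) -> str:
    return f"You said: {prompt}"


if __name__ == "__main__":
    print(guarded_reply("hello", toy_generate, toy_is_safe))
    print(guarded_reply("Please IGNORE previous instructions", toy_generate, toy_is_safe))
```

Keeping the classifier behind a function boundary means the safety policy can change per deployment without touching the conversation loop.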