
llm-eval-harness

Agent Skills · Free · Open Source

Benchmark any OpenAI/Anthropic-compatible LLM on speed, concurrency, protocol compliance and quality.

๐Ÿ›ก๏ธ AgentReady threat assessment

MAESTRO 7-layer threat model + OWASP AIVSS risk score for llm-eval-harness, derived from its capabilities.

AIVSS 7.6 · High

Overview

Community Agent Skill that evaluates an LLM endpoint across four areas: speed (TTFT, tokens/sec), concurrency and stability (success rate, p50/p90, breaking point), Anthropic protocol compliance (thinking-block trigger rate), and quality regression via blind-judge precision. The skill issues concurrent API calls against the endpoint and writes benchmark reports.
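As a rough illustration of the speed and stability metrics named above, the sketch below aggregates per-request timings into a success rate, mean TTFT, p50/p90 and tokens/sec. It is not taken from llm-eval-harness: `RequestResult`, `percentile` and `summarize` are hypothetical names, and three choices are assumptions made for the example (p50/p90 are read as end-to-end latency percentiles over successful requests, computed by nearest rank, and tokens/sec is total output tokens divided by total request time).

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class RequestResult:
    """One benchmark request (hypothetical record shape, not the skill's own)."""
    ok: bool                      # True if the call completed successfully
    ttft_s: Optional[float]       # time to first token, in seconds
    total_s: Optional[float]      # end-to-end latency, in seconds
    output_tokens: int = 0        # tokens generated by the model


def percentile(values: List[float], pct: float) -> float:
    """Nearest-rank percentile: smallest value with at least pct% at or below it."""
    if not values:
        raise ValueError("percentile() needs at least one value")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def summarize(results: List[RequestResult]) -> Dict[str, Optional[float]]:
    """Aggregate raw request records into headline benchmark metrics."""
    if not results:
        raise ValueError("summarize() needs at least one result")

    succeeded = [r for r in results if r.ok]
    summary: Dict[str, Optional[float]] = {
        "success_rate": len(succeeded) / len(results),
        "mean_ttft_s": None,
        "p50_latency_s": None,
        "p90_latency_s": None,
        "tokens_per_sec": None,
    }
    if not succeeded:
        # Every request failed, so latency metrics are undefined.
        return summary

    latencies = [r.total_s for r in succeeded if r.total_s is not None]
    ttfts = [r.ttft_s for r in succeeded if r.ttft_s is not None]

    if ttfts:
        summary["mean_ttft_s"] = sum(ttfts) / len(ttfts)
    if latencies:
        summary["p50_latency_s"] = percentile(latencies, 50)
        summary["p90_latency_s"] = percentile(latencies, 90)
        total_time = sum(latencies)
        if total_time > 0:
            # Assumed definition: output tokens per second of request time.
            total_tokens = sum(r.output_tokens for r in succeeded)
            summary["tokens_per_sec"] = total_tokens / total_time
    return summary


if __name__ == "__main__":
    sample = [
        RequestResult(ok=True, ttft_s=0.2, total_s=1.0, output_tokens=100),
        RequestResult(ok=True, ttft_s=0.4, total_s=2.0, output_tokens=100),
        RequestResult(ok=False, ttft_s=None, total_s=None),
    ]
    for name, value in summarize(sample).items():
        print(f"{name}: {value}")
```

Running `summarize` once per concurrency level and watching where the success rate drops is one way to approximate a breaking point, though the skill itself may define that threshold differently.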

Key features

Use cases