transformer-lens (AI-Research-SKILLs)
Mechanistic-interpretability skill for using TransformerLens to probe LLM internals.
๐ก๏ธ AgentReady threat assessment
MAESTRO 7-layer threat model + OWASP AIVSS risk score for transformer-lens (AI-Research-SKILLs), derived from its capabilities.
AIVSS 9.2 ยท Critical
View MAESTRO 7-layer threat model โOverview
Part of Orchestra-Research's AI research skills library โ a skill for mechanistic interpretability work with the TransformerLens library (activation caching, hooks, circuit analysis). Surface: injects library-specific guidance and writes/runs Python interpretability code.
Key features
- TransformerLens hooks and activation caching
- Circuit-analysis workflows
- One of a large AI-research skill library
Use cases
- Probe an LLM's internal activations
- Run a mechanistic-interpretability experiment