dec-bench-run
Official
Build, run, and review DEC Bench scenarios.
Author: 514-labs
Version: 1.0.0
Installs: 0
System Documentation
What problem does it solve?
Build and run DEC Bench scenarios, inspect results, and open the audit UI to streamline agent evaluation workflows.
Core Features & Use Cases
- Build and run DEC Bench scenarios from the CLI, including single scenarios and full matrices across agents, harnesses, and domains.
- Inspect results, gate scores, and artifacts, and open the audit UI for detailed traces and metadata.
- Reproduce experiments reliably and validate scenarios across multiple setups to ensure consistent evaluation.
Quick Start
Ask me to build and run a DEC Bench scenario, and I will execute it and present the results.
Dependency Matrix
Required Modules
None required
Components
Standard package
Claude Code Installation
Recommended: let Claude install the skill automatically. Copy the text below and paste it into Claude Code.
Please help me install this Skill: Name: dec-bench-run Download link: https://github.com/514-labs/agent-evals/archive/main.zip#dec-bench-run Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
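If you would rather install by hand, the steps in the prompt above (download the archive, extract it, place the skill under .claude/skills/) can be sketched in Python. This is a minimal sketch, not part of the skill itself. The helper name install_skill_from_zip is hypothetical, and the code assumes that the archive contains a folder named dec-bench-run with a SKILL.md file inside it. Claude Code looks for that file at .claude/skills/<name>/SKILL.md.

```python
import shutil
import zipfile
from pathlib import Path, PurePosixPath


def install_skill_from_zip(zip_path: str, skill_name: str,
                           skills_root: str = ".claude/skills") -> Path:
    """Hypothetical helper: copy the `skill_name` folder out of a repo archive.

    Assumption: somewhere inside the archive there is a directory named
    `skill_name` that directly contains a SKILL.md file.
    """
    dest = Path(skills_root) / skill_name
    with zipfile.ZipFile(zip_path) as archive:
        names = archive.namelist()

        # Locate the archive prefix of the skill folder,
        # e.g. "agent-evals-main/<...>/dec-bench-run/".
        prefix = None
        for name in names:
            parts = PurePosixPath(name).parts
            if len(parts) >= 2 and parts[-1] == "SKILL.md" and parts[-2] == skill_name:
                prefix = name[: -len("SKILL.md")]
                break
        if prefix is None:
            raise FileNotFoundError(f"no {skill_name}/SKILL.md found in {zip_path}")

        for name in names:
            # Skip files outside the skill folder and bare directory entries.
            if not name.startswith(prefix) or name.endswith("/"):
                continue
            relative = PurePosixPath(name[len(prefix):])
            if ".." in relative.parts:
                raise ValueError(f"refusing to extract unsafe path: {name}")
            target = dest.joinpath(*relative.parts)
            target.parent.mkdir(parents=True, exist_ok=True)
            with archive.open(name) as src, open(target, "wb") as out:
                shutil.copyfileobj(src, out)
    return dest
```

To fetch the archive first, urllib.request.urlretrieve("https://github.com/514-labs/agent-evals/archive/main.zip", "agent-evals-main.zip") downloads it. The #dec-bench-run fragment in the link above is never sent to the server. Calling install_skill_from_zip("agent-evals-main.zip", "dec-bench-run") then copies only the skill folder into .claude/skills/dec-bench-run, relative to the current project. For a personal install, pass str(Path.home() / ".claude" / "skills") as skills_root instead.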