dec-bench-run

Official

Build, run, and review DEC Bench scenarios.

Author: 514-labs
Version: 1.0.0
Installs: 0

System Documentation

What problem does it solve?

Build and run DEC Bench scenarios, inspect results, and open the audit UI to streamline agent evaluation workflows.

Core Features & Use Cases

  • Build and run DEC Bench scenarios from the CLI, including single scenarios and full matrices across agents, harnesses, and domains.
  • Inspect results, gate scores, and artifacts, and open the audit UI for detailed traces and metadata.
  • Re-run the same scenario across multiple setups to reproduce experiments and confirm that evaluation results stay consistent.
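
To make the idea of a "full matrix" concrete, the sketch below enumerates every combination of agent, harness, and domain as a Cartesian product. It is only an illustration: the axis values and the `expand_matrix` helper are hypothetical, and nothing here reflects the actual DEC Bench CLI or its configuration format.

```python
from itertools import product

# Hypothetical axis values, for illustration only; real names come from DEC Bench.
AGENTS = ["agent-a", "agent-b"]
HARNESSES = ["harness-x", "harness-y"]
DOMAINS = ["domain-1"]


def expand_matrix(agents, harnesses, domains):
    """Return one dict per (agent, harness, domain) combination."""
    return [
        {"agent": agent, "harness": harness, "domain": domain}
        for agent, harness, domain in product(agents, harnesses, domains)
    ]


matrix = expand_matrix(AGENTS, HARNESSES, DOMAINS)
for run in matrix:
    print(run)
```

The matrix size is the product of the axis lengths, so adding a single agent or harness multiplies the number of runs accordingly.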

Quick Start

Ask me to build and run a DEC Bench scenario, and I will execute it and present the results.

Dependency Matrix

Required Modules

None required

Components

Standard package

💻 Claude Code Installation

Recommended: Let Claude install the skill automatically. Copy the text below and paste it into Claude Code.

Please help me install this Skill:
Name: dec-bench-run
Download link: https://github.com/514-labs/agent-evals/archive/main.zip#dec-bench-run

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
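
For a manual install, the download, extract, and copy steps above could be scripted roughly as follows. This is a minimal sketch under stated assumptions: it assumes the archive contains a directory named after the skill somewhere in its tree (the exact path inside the repository is not given here), and `install_skill_from_zip` is a hypothetical helper, not part of any official tooling.

```python
import io
import zipfile
from pathlib import Path, PurePosixPath


def install_skill_from_zip(zip_bytes: bytes, skill_name: str, skills_dir: Path) -> Path:
    """Copy the first directory named `skill_name` found in the archive into skills_dir.

    Assumption: the skill lives in a directory named exactly `skill_name`
    somewhere inside the archive.
    """
    target_root = skills_dir / skill_name
    prefix = None  # archive path (as a tuple of parts) of the skill directory
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for info in archive.infolist():
            parts = PurePosixPath(info.filename).parts
            if prefix is None and skill_name in parts:
                prefix = parts[: parts.index(skill_name) + 1]
            if prefix is None or parts[: len(prefix)] != prefix:
                continue  # entry is outside the skill directory
            relative = parts[len(prefix):]
            if info.is_dir() or not relative or ".." in relative:
                continue  # skip directory entries and unsafe paths
            destination = target_root.joinpath(*relative)
            destination.parent.mkdir(parents=True, exist_ok=True)
            destination.write_bytes(archive.read(info))
    if prefix is None:
        raise FileNotFoundError(f"no directory named {skill_name!r} in archive")
    return target_root
```

The archive bytes could be fetched with `urllib.request.urlopen(url).read()` using the download link above, and `skills_dir` would be `Path(".claude/skills")` for a project-level install.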
