benchmark-triage

Community

Turn benchmark chaos into clear next edits.

Authorjapurcell
Version1.0.0
Installs0

System Documentation

What problem does it solve?

Triages benchmark results to produce a concise, actionable triage report that highlights failure clusters, flaky patterns, and clear paths for improvement.

Core Features & Use Cases

  • Automated analysis of benchmark directories containing benchmark.json and related eval artifacts to summarize results.
  • Representative deep dives: selects evaluative cases with repeated failures or high variance and documents root causes with cited evidence.
  • Actionable guidance: outputs a structured report with recommended next edits and priorities to guide skill iteration.

Quick Start

Provide the path to benchmark.json or to an iteration directory containing it and the skill will generate a short triage report.

Dependency Matrix

Required Modules

None required

Components

Standard package

💻 Claude Code Installation

Recommended: Let Claude install automatically. Simply copy and paste the text below to Claude Code.

Please help me install this Skill:
Name: benchmark-triage
Download link: https://github.com/japurcell/skills/archive/main.zip#benchmark-triage

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
View Source Repository

Agent Skills Search Helper

Install a tiny helper to your Agent, search and equip skill from 471,000+ vetted skills library on demand.