benchmark-triage
Community
Turn benchmark chaos into clear next edits.
Author: japurcell
Version: 1.0.0
Installs: 0
System Documentation
What problem does it solve?
Analyzes benchmark results and produces a concise, actionable triage report that highlights failure clusters, flaky patterns, and clear paths for improvement.
Core Features & Use Cases
- Automated analysis: reads benchmark directories containing benchmark.json and related eval artifacts and summarizes the results.
- Representative deep dives: selects eval cases with repeated failures or high variance and documents root causes with cited evidence.
- Actionable guidance: outputs a structured report with recommended next edits and priorities to guide skill iteration.
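The selection step described above (repeated failures versus high variance) can be illustrated with a minimal sketch. It is not the skill's actual implementation: the record keys `case_id` and `passed` are hypothetical, since the schema of benchmark.json is not documented here, and the labels `stable`, `repeated_failure`, and `flaky` are names chosen for illustration.

```python
from collections import defaultdict


def triage_runs(runs):
    """Group run records by case and label each case.

    `runs` is a list of dicts with the hypothetical keys "case_id" (str)
    and "passed" (bool); the real benchmark.json schema may differ.
    """
    outcomes = defaultdict(list)
    for run in runs:
        outcomes[run["case_id"]].append(bool(run["passed"]))

    report = {}
    for case_id, results in outcomes.items():
        pass_rate = sum(results) / len(results)
        if pass_rate == 1.0:
            label = "stable"            # passed every run
        elif pass_rate == 0.0:
            label = "repeated_failure"  # failed every run
        else:
            label = "flaky"             # mixed outcomes across runs
        report[case_id] = {
            "runs": len(results),
            "pass_rate": pass_rate,
            "label": label,
        }
    return report


def deep_dive_candidates(report):
    """Return the ids of non-stable cases, lowest pass rate first."""
    flagged = [
        (info["pass_rate"], case_id)
        for case_id, info in report.items()
        if info["label"] != "stable"
    ]
    return [case_id for _, case_id in sorted(flagged)]
```

Cases that fail every run and cases with mixed outcomes usually call for different fixes, so the sketch keeps them as separate labels rather than folding both into a single "failed" bucket.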
Quick Start
Provide the path to benchmark.json, or to an iteration directory that contains it, and the skill generates a short triage report.
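The two accepted inputs (the file itself, or a directory holding it) could be normalized along the following lines. This is a sketch under assumptions, not the skill's own code: the function names `resolve_benchmark_path` and `load_benchmark` are hypothetical, and the sketch assumes benchmark.json sits directly inside the iteration directory.

```python
import json
from pathlib import Path


def resolve_benchmark_path(path):
    """Accept benchmark.json itself or a directory that contains it."""
    candidate = Path(path)
    if candidate.is_dir():
        # Assumption: the file is a direct child of the iteration directory.
        candidate = candidate / "benchmark.json"
    if not candidate.is_file():
        raise FileNotFoundError(f"no benchmark.json found at {candidate}")
    return candidate


def load_benchmark(path):
    """Parse the resolved file as JSON without assuming its schema."""
    with resolve_benchmark_path(path).open(encoding="utf-8") as handle:
        return json.load(handle)
```

Failing early with `FileNotFoundError` keeps a mistyped path from turning into an empty report.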
Dependency Matrix
Required Modules
None required
Components
Standard package
Claude Code Installation
Recommended: let Claude install it automatically. Copy and paste the text below into Claude Code.
Please help me install this Skill:
Name: benchmark-triage
Download link: https://github.com/japurcell/skills/archive/main.zip#benchmark-triage
Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
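For a manual install, the extraction step might look like the sketch below. It assumes the archive has already been downloaded to a local file and that it contains a folder literally named benchmark-triage at some depth; the actual archive layout is not documented here. The helper name `extract_skill` is hypothetical.

```python
import zipfile
from pathlib import Path, PurePosixPath


def extract_skill(archive_path, skill_name, skills_dir):
    """Copy files under a `skill_name/` folder in the zip into skills_dir/skill_name.

    Assumes the archive holds a directory literally named `skill_name`
    at some depth; the real layout of the archive may differ.
    """
    target_root = Path(skills_dir) / skill_name
    extracted = []
    with zipfile.ZipFile(archive_path) as archive:
        for info in archive.infolist():
            parts = PurePosixPath(info.filename).parts
            if info.is_dir() or skill_name not in parts:
                continue
            # Keep only the path below the skill folder.
            relative = parts[parts.index(skill_name) + 1:]
            if not relative or ".." in relative:
                continue  # skip odd entries and path-traversal attempts
            destination = target_root.joinpath(*relative)
            destination.parent.mkdir(parents=True, exist_ok=True)
            destination.write_bytes(archive.read(info))
            extracted.append(destination)
    return extracted
```

A call such as `extract_skill("main.zip", "benchmark-triage", ".claude/skills")` would then place the skill's files under `.claude/skills/benchmark-triage/`, where `main.zip` stands for wherever the download was saved.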