instruction-eval
Community
Prevent skill changes from breaking Claude.
Software Engineering #regression testing #skill routing #anthropic api #instruction evaluation #claude posture #git ab testing #ci reporting
Author: david-driscoll
Version: 1.0.0
Installs: 0
System Documentation
What problem does it solve?
This Skill helps you detect whether updates to skills or CLAUDE.md improve or degrade Claude’s operational behavior, preventing silent regressions before they reach users.
Core Features & Use Cases
- Spot-Check Validation: Run interactive probes using the test prompts in the skill’s references to quickly judge routing and constraint compliance.
- Automated Regression Testing: Execute the evaluation runner to systematically score probes and produce CI-friendly results.
- Actionable PASS/PARTIAL/FAIL Interpretation: Translate scoring outcomes into concrete next steps, including when manual review is required.
- Git-Aware A/B Checks: Compare behavior across git refs to pinpoint whether a change set caused improvements or regressions.
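The verdict interpretation and A/B comparison described in the list above can be sketched roughly as follows. This is a minimal illustration, not the skill's actual implementation: only the PASS/PARTIAL/FAIL labels come from the description, while the probe names, the dict-of-verdicts layout, the helpers `compare_verdicts` and `needs_manual_review`, and the idea that PARTIAL is what triggers manual review are all assumptions.

```python
# Hypothetical sketch: compare probe verdicts collected at two git refs.
# Only the PASS/PARTIAL/FAIL labels come from the skill description;
# the data layout and helper names are assumptions for illustration.

RANK = {"FAIL": 0, "PARTIAL": 1, "PASS": 2}


def compare_verdicts(baseline, candidate):
    """Classify each probe present in both runs as regressed, improved, or unchanged."""
    result = {"regressed": [], "improved": [], "unchanged": []}
    for probe in sorted(baseline.keys() & candidate.keys()):
        before, after = RANK[baseline[probe]], RANK[candidate[probe]]
        if after < before:
            result["regressed"].append(probe)
        elif after > before:
            result["improved"].append(probe)
        else:
            result["unchanged"].append(probe)
    return result


def needs_manual_review(verdicts):
    """Return probes scored PARTIAL (assumed here to be the manual-review case)."""
    return sorted(p for p, v in verdicts.items() if v == "PARTIAL")


# Made-up example data standing in for evaluation runs on two refs.
main_run = {"routing-basic": "PASS", "constraint-check": "PASS", "posture": "PARTIAL"}
branch_run = {"routing-basic": "PASS", "constraint-check": "FAIL", "posture": "PASS"}

diff = compare_verdicts(main_run, branch_run)
print(diff["regressed"])              # ['constraint-check']
print(diff["improved"])               # ['posture']
print(needs_manual_review(main_run))  # ['posture']
```

Ranking the verdicts numerically keeps the comparison to a single less-than check per probe, so a drop from PASS to PARTIAL is flagged the same way as a drop to FAIL.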
Quick Start
Run the following command to execute all probes and generate a machine-readable CI report:
python .claude/skills/instruction-eval/scripts/run-eval.py --json
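A CI job could turn the machine-readable report into a pass/fail gate along the lines below. The report layout used here (a top-level `probes` list with `name` and `verdict` fields) is purely an assumption for illustration; the listing does not document the actual shape of the `--json` output, so adjust the field names to match what the runner really emits.

```python
import json

# Hypothetical report text: the real --json output of run-eval.py may differ.
SAMPLE_REPORT = (
    '{"probes": ['
    '{"name": "routing-basic", "verdict": "PASS"}, '
    '{"name": "constraint-check", "verdict": "FAIL"}'
    "]}"
)


def exit_code_for(report_text):
    """Return 0 if no probe failed, 1 otherwise (assumed report layout)."""
    report = json.loads(report_text)
    failed = [p["name"] for p in report["probes"] if p["verdict"] == "FAIL"]
    for name in failed:
        print(f"FAIL: {name}")
    return 1 if failed else 0


print(exit_code_for(SAMPLE_REPORT))
```

In a pipeline, the same function could be given the captured output of the runner and its result passed to `sys.exit`, assuming the runner writes the report to standard output.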
Dependency Matrix
Required Modules
anthropic, pyyaml
Components
scripts, references
💻 Claude Code Installation
Recommended: Let Claude install automatically. Simply copy and paste the text below to Claude Code.
Please help me install this Skill:
Name: instruction-eval
Download link: https://github.com/david-driscoll/stargate-command-cluster/archive/main.zip#instruction-eval
Please download this .zip file, extract it, and install it in the .claude/skills/ directory.