instruction-eval

Community

Prevent skill changes from breaking Claude.

Author: david-driscoll
Version: 1.0.0
Installs: 0

System Documentation

What problem does it solve?

This Skill helps you detect whether updates to skills or CLAUDE.md improve or degrade Claude’s operational behavior, preventing silent regressions before they reach users.

Core Features & Use Cases

  • Spot-Check Validation: Run interactive probes using the test prompts in the skill’s references to quickly judge routing and constraint compliance.
  • Automated Regression Testing: Execute the evaluation runner to systematically score probes and produce CI-friendly results.
  • Actionable PASS/PARTIAL/FAIL Interpretation: Translate scoring outcomes into concrete next steps, including when manual review is required.
  • Git-Aware A/B Checks: Compare behavior across git refs to pinpoint whether a change set caused improvements or regressions.
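
As a rough illustration of the PASS/PARTIAL/FAIL interpretation and the A/B comparison listed above, the sketch below ranks the three outcomes and diffs two sets of probe results. The probe names, the `compare_runs` and `needs_manual_review` helpers, and the dict-of-statuses shape are all assumptions made for illustration; the listing does not describe the runner's actual data model or scoring rules.

```python
# Hypothetical sketch: not the skill's actual implementation.
from typing import Dict, List

# Assumed ordering of outcomes: FAIL < PARTIAL < PASS.
RANK = {"FAIL": 0, "PARTIAL": 1, "PASS": 2}


def compare_runs(before: Dict[str, str], after: Dict[str, str]) -> Dict[str, List[str]]:
    """Classify each probe present in both runs as regressed, improved, or unchanged."""
    result: Dict[str, List[str]] = {"regressed": [], "improved": [], "unchanged": []}
    for probe in sorted(before.keys() & after.keys()):
        delta = RANK[after[probe]] - RANK[before[probe]]
        if delta < 0:
            result["regressed"].append(probe)
        elif delta > 0:
            result["improved"].append(probe)
        else:
            result["unchanged"].append(probe)
    return result


def needs_manual_review(statuses: Dict[str, str]) -> List[str]:
    """Assumption: PARTIAL outcomes are the ones that call for manual review."""
    return sorted(p for p, s in statuses.items() if s == "PARTIAL")


# Made-up probe names and results, standing in for two git refs.
baseline = {"routing-basic": "PASS", "constraint-no-push": "PASS", "routing-edge": "PARTIAL"}
candidate = {"routing-basic": "PASS", "constraint-no-push": "FAIL", "routing-edge": "PASS"}
report = compare_runs(baseline, candidate)
```

Here `report["regressed"]` would list the probes whose outcome got worse between the two refs, which is the signal a reviewer would typically act on first.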

Quick Start

Run python .claude/skills/instruction-eval/scripts/run-eval.py --json to execute all probes and generate a machine-readable CI report.
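
In a CI job, the command above might be wrapped as in the following sketch. It assumes the runner prints a single JSON document to stdout and reports failure through a nonzero exit code; neither detail is confirmed by this listing, and `build_command`, `parse_report`, and `run_eval` are hypothetical helper names.

```python
import json
import subprocess
import sys
from typing import Any, List, Tuple

# Path taken from the Quick Start command.
SCRIPT = ".claude/skills/instruction-eval/scripts/run-eval.py"


def build_command(json_output: bool = True) -> List[str]:
    """Build the argument list for the evaluation runner."""
    cmd = [sys.executable, SCRIPT]
    if json_output:
        cmd.append("--json")
    return cmd


def parse_report(stdout: str) -> Any:
    """Assumption: stdout holds one JSON document; raises ValueError otherwise."""
    return json.loads(stdout)


def run_eval() -> Tuple[int, Any]:
    """Run the evaluation and return (exit code, parsed report or None)."""
    proc = subprocess.run(build_command(), capture_output=True, text=True)
    report = parse_report(proc.stdout) if proc.stdout.strip() else None
    return proc.returncode, report
```

Calling `run_eval()` from the repository root would then give a CI step both the exit code and the parsed report to act on.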

Dependency Matrix

Required Modules

anthropic, pyyaml

Components

scripts, references

💻 Claude Code Installation

Recommended: Let Claude install it automatically. Copy the text below and paste it into Claude Code.

Please help me install this Skill:
Name: instruction-eval
Download link: https://github.com/david-driscoll/stargate-command-cluster/archive/main.zip#instruction-eval

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
