eval-harness
Community
Formal eval framework for Claude Code sessions
Author: flag3
Version: 1.0.0
Installs: 0
System Documentation
What problem does it solve?
This Skill provides a structured framework to define, execute, and observe evaluations for Claude Code sessions, enabling repeatable, traceable improvements through eval-driven development.
Core Features & Use Cases
- Capability Evals: formal tests that verify new features behave as intended.
- Regression Evals: checks that existing functionality remains intact after changes.
- Grader Options: supports Code-Based, Model-Based, and Human graders for flexible assessment.
- Workflow & Storage: a repeatable workflow with defined storage for eval definitions and histories.
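The features above can be illustrated with a minimal sketch of a code-based grader, an eval history, and a regression check. This is not the Skill's actual implementation or storage format, which the listing does not describe. Every name here (EvalCase, EvalHistory, run_evals, regressions) and the two sample cases are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical names throughout: none of these come from the eval-harness Skill.

@dataclass
class EvalCase:
    name: str
    kind: str                      # "capability" or "regression"
    grader: Callable[[str], bool]  # code-based grader: output text -> pass/fail


@dataclass
class EvalHistory:
    # One dict per run, mapping eval name -> pass/fail.
    runs: List[Dict[str, bool]] = field(default_factory=list)


def run_evals(cases: List[EvalCase], outputs: Dict[str, str],
              history: EvalHistory) -> Dict[str, bool]:
    """Grade each case's output and append the results to the history."""
    results = {case.name: case.grader(outputs[case.name]) for case in cases}
    history.runs.append(results)
    return results


def regressions(history: EvalHistory) -> List[str]:
    """Names of evals that passed in the previous run but fail in the latest."""
    if len(history.runs) < 2:
        return []
    prev, last = history.runs[-2], history.runs[-1]
    return sorted(name for name, ok in last.items()
                  if prev.get(name) and not ok)


# Usage: two runs over the same cases; the second run breaks one of them.
cases = [
    EvalCase("adds-two-numbers", "capability", lambda out: out.strip() == "4"),
    EvalCase("greets-user", "regression", lambda out: "hello" in out.lower()),
]
history = EvalHistory()
run_evals(cases, {"adds-two-numbers": "4\n", "greets-user": "Hello!"}, history)
run_evals(cases, {"adds-two-numbers": "4", "greets-user": "Goodbye"}, history)
print(regressions(history))  # ['greets-user']
```

A model-based or human grader could fit the same shape by swapping the grader callable for one that asks a model or a person for the verdict. That is a design assumption of this sketch, not something the listing states.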
Quick Start
To define a new evaluation workflow for a Claude Code feature, run: /eval define <feature-name>
Dependency Matrix
Required Modules
None required
Components
Standard package
Claude Code Installation
Recommended: Let Claude install it automatically. Copy and paste the text below into Claude Code.
Please help me install this Skill:
Name: eval-harness
Download link: https://github.com/flag3/dotfiles/archive/main.zip#eval-harness
Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
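For readers who prefer to do the extraction step by hand, the following is a rough sketch of copying a skill folder out of an already-downloaded archive. It assumes the archive contains a folder named eval-harness somewhere inside it. The listing does not say where that folder sits, so the sketch searches for it by name. The function name install_skill_from_zip is hypothetical.

```python
import zipfile
from pathlib import Path, PurePosixPath


def install_skill_from_zip(zip_path, skill_name, skills_dir):
    """Copy every file under a folder named `skill_name` in the archive
    into skills_dir/skill_name. Returns the list of files written.

    Assumption: the skill lives in a folder whose name equals `skill_name`;
    its location inside the archive is not known in advance.
    """
    target_root = Path(skills_dir) / skill_name
    written = []
    with zipfile.ZipFile(zip_path) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            parts = PurePosixPath(info.filename).parts
            if skill_name not in parts[:-1]:
                continue  # file is not inside a folder named skill_name
            # Keep only the path below the first folder named skill_name.
            relative = parts[parts.index(skill_name) + 1:]
            if ".." in relative:
                continue  # skip entries that try to escape the target folder
            destination = target_root.joinpath(*relative)
            destination.parent.mkdir(parents=True, exist_ok=True)
            destination.write_bytes(archive.read(info))
            written.append(destination)
    return written
```

With the archive saved locally as main.zip (a hypothetical filename), a call such as install_skill_from_zip("main.zip", "eval-harness", ".claude/skills") would place the files under .claude/skills/eval-harness. An empty return value means no folder with that name was found in the archive.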