Name: eval-harness
Author: flag3

System Documentation

What problem does it solve?

This Skill provides a structured framework to define, execute, and observe evaluations for Claude Code sessions, enabling repeatable, traceable improvements through eval-driven development.

Core Features & Use Cases

Capability Evals: formal tests that verify new features behave as intended.
Regression Evals: checks that existing functionality remains intact after changes.
Grader Options: supports Code-Based, Model-Based, and Human graders for flexible assessment.
Workflow & Storage: a repeatable workflow with defined storage for eval definitions and histories.
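
To make the difference between capability and regression evals concrete, here is a minimal sketch using a code-based grader. It is illustrative only and is not the Skill's actual implementation: the `Eval` and `EvalHistory` classes, the `run_evals` function, and the two example checks are hypothetical names chosen for this example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Eval:
    """Hypothetical eval definition: a name, a kind, and a code-based grader."""
    name: str
    kind: str  # "capability" or "regression"
    grader: Callable[[str], bool]  # takes the session output, returns pass/fail


@dataclass
class EvalHistory:
    """Hypothetical in-memory store standing in for persisted eval history."""
    records: List[dict] = field(default_factory=list)

    def log(self, eval_name: str, kind: str, passed: bool) -> None:
        self.records.append({"eval": eval_name, "kind": kind, "passed": passed})


def run_evals(evals: List[Eval], output: str, history: EvalHistory) -> Dict[str, bool]:
    """Run every grader against one output and record each result."""
    results = {}
    for ev in evals:
        passed = ev.grader(output)
        history.log(ev.name, ev.kind, passed)
        results[ev.name] = passed
    return results


evals = [
    # Capability eval: does the new feature emit a JSON object?
    Eval("returns-json", "capability", lambda out: out.strip().startswith("{")),
    # Regression eval: does existing behavior still avoid crashing?
    Eval("no-traceback", "regression", lambda out: "Traceback" not in out),
]

history = EvalHistory()
results = run_evals(evals, '{"status": "ok"}', history)
print(results)
```

A model-based or human grader could presumably fit the same shape by swapping the `grader` callable for one that asks a model or a reviewer for the verdict.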

Quick Start

To define a new evaluation workflow for a Claude Code feature, run: /eval define <feature-name>

Please help me install this Skill:
Name: eval-harness
Download link: https://github.com/flag3/dotfiles/archive/main.zip#eval-harness
Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
