Name: eval-harness-kit
Author: aufrank

System Documentation

What problem does it solve?

This Skill automates building and running evaluation suites for agent workflows, so results are reproducible and capability gains or regressions can be tracked across runs.

Core Features & Use Cases

Manifest-driven Evals: Define tasks, inputs, and grading criteria in a JSON manifest.
Deterministic & LLM Grading: Supports exact match, regex, JSON comparison, and optional LLM rubrics.
Use Case: You've built a new document-summarization agent. Use this Skill to create an evaluation suite of varied documents paired with the summary qualities you expect, then run the agent against it to measure performance and catch regressions over time.
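The skill's actual manifest schema is not documented on this page, so the field names below (tasks, id, input, grader, type, expected, pattern) are hypothetical. As a minimal sketch under that assumption, the three deterministic grading modes listed above (exact match, regex, JSON comparison) could be dispatched from a JSON manifest like this:

```python
import json
import re
from typing import Callable, Dict

# Hypothetical manifest: field names are illustrative assumptions,
# not the schema used by eval-harness-kit.
EXAMPLE_MANIFEST = r"""
{
  "tasks": [
    {"id": "capital", "input": "What is the capital of France?",
     "grader": {"type": "exact", "expected": "Paris"}},
    {"id": "date", "input": "Give today's date in ISO 8601 format.",
     "grader": {"type": "regex", "pattern": "\\d{4}-\\d{2}-\\d{2}"}},
    {"id": "extract", "input": "Return the person's name and age as JSON.",
     "grader": {"type": "json", "expected": {"name": "Ada", "age": 36}}}
  ]
}
"""


def grade_exact(output: str, spec: dict) -> bool:
    # Exact string match after trimming surrounding whitespace.
    return output.strip() == spec["expected"]


def grade_regex(output: str, spec: dict) -> bool:
    # re.fullmatch requires the entire trimmed output to match the pattern.
    return re.fullmatch(spec["pattern"], output.strip()) is not None


def grade_json(output: str, spec: dict) -> bool:
    # Compare parsed structures, so key order and whitespace do not matter.
    try:
        return json.loads(output) == spec["expected"]
    except json.JSONDecodeError:
        return False  # unparseable output fails the task


GRADERS: Dict[str, Callable[[str, dict], bool]] = {
    "exact": grade_exact,
    "regex": grade_regex,
    "json": grade_json,
}


def grade(output: str, spec: dict) -> bool:
    # Dispatch on the grader type declared in the (hypothetical) manifest.
    return GRADERS[spec["type"]](output, spec)


if __name__ == "__main__":
    manifest = json.loads(EXAMPLE_MANIFEST)
    # Canned outputs standing in for real agent responses.
    outputs = {
        "capital": "Paris\n",
        "date": "2024-05-01",
        "extract": '{"age": 36, "name": "Ada"}',
    }
    for task in manifest["tasks"]:
        passed = grade(outputs[task["id"]], task["grader"])
        print(f"{task['id']}: {'PASS' if passed else 'FAIL'}")
```

Comparing parsed JSON rather than raw strings keeps formatting differences in the agent's output from affecting the grade. Optional LLM rubric grading is left out here because it depends on a model client.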

Quick Start

Run the example manifest using python <CODEX_HOME>/skills/eval-harness-kit/scripts/run_eval.py --manifest <path> --run-id <id>.
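What run_eval.py does internally is not described here. The following is a hypothetical sketch of a runner that accepts the same two flags, saves one result file per run id, and flags regressions between two runs; the task fields, the results layout, the exact-match stand-in grader, and the default echo agent are all assumptions for illustration:

```python
import argparse
import json
from pathlib import Path
from typing import Callable, Dict, List, Optional

# Hypothetical runner: only the --manifest and --run-id flags come from the
# Quick Start; everything else is an illustrative assumption.


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Run an eval manifest (sketch).")
    parser.add_argument("--manifest", required=True, type=Path)
    parser.add_argument("--run-id", required=True)  # available as args.run_id
    return parser.parse_args(argv)


def exact_grade(output: str, spec: dict) -> bool:
    # Stand-in grader; a fuller harness would dispatch on the grader type.
    return output.strip() == spec["expected"]


def run_suite(manifest: dict, agent: Callable[[str], str]) -> Dict[str, bool]:
    # Run the agent on every task and record pass/fail keyed by task id.
    return {
        t["id"]: exact_grade(agent(t["input"]), t["grader"])
        for t in manifest["tasks"]
    }


def save_run(results: Dict[str, bool], run_id: str, out_dir: Path) -> Path:
    # Persist each run under its id so later runs can be compared with it.
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / f"{run_id}.json"
    path.write_text(json.dumps(results, indent=2, sort_keys=True))
    return path


def regressions(baseline: Dict[str, bool], current: Dict[str, bool]) -> List[str]:
    # A regression: a task that passed in the baseline but fails (or is missing) now.
    return sorted(t for t, ok in baseline.items() if ok and not current.get(t, False))


def main(argv: Optional[List[str]] = None,
         agent: Callable[[str], str] = lambda prompt: prompt,
         out_dir: Path = Path("eval_runs")) -> Path:
    # The default agent only echoes its input; replace it with a real agent call.
    args = parse_args(argv)
    manifest = json.loads(args.manifest.read_text())
    results = run_suite(manifest, agent)
    print(f"{args.run_id}: {sum(results.values())}/{len(results)} tasks passed")
    return save_run(results, args.run_id, out_dir)
```

Keying saved results by run id makes regression tracking a dictionary comparison: load a baseline run and a newer run, and regressions(baseline, current) lists the task ids that flipped from pass to fail.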

Claude Code Installation

To install with Claude Code, give it the following request:

Please help me install this Skill.
Name: eval-harness-kit
Download link: https://github.com/aufrank/agent-skills/archive/main.zip#eval-harness-kit
Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
