Name: evaluation-harness
Author: patricio0312rev

System Documentation

What problem does it solve?

Builds repeatable evaluation systems for LLM applications, enabling standardized benchmarking, scoring, and regression analysis.

Core Features & Use Cases

Golden dataset format, scoring rubrics, and regression reports for end-to-end model evaluation.
Continuous evaluation and CI integration to track performance across commits.
Flexible test runner and report generation to compare model outputs against expected results.
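
As a rough illustration of how a golden dataset, a scoring rubric, and a test runner fit together, here is a minimal sketch. It does not show the Skill's actual file format or API: the JSONL schema (`id`, `prompt`, `expected`), the `GoldenCase` class, and the `load_golden`, `score_exact`, and `run_suite` helpers are all hypothetical names chosen for this example, and `fake_model` stands in for a real LLM call.

```python
import json
from dataclasses import dataclass


@dataclass
class GoldenCase:
    # Hypothetical schema; the Skill's real dataset format may differ.
    case_id: str
    prompt: str
    expected: str


def load_golden(jsonl_text: str) -> list[GoldenCase]:
    """Parse one JSON object per non-empty line into GoldenCase records."""
    cases = []
    for line in jsonl_text.splitlines():
        if line.strip():
            row = json.loads(line)
            cases.append(GoldenCase(row["id"], row["prompt"], row["expected"]))
    return cases


def score_exact(expected: str, actual: str) -> float:
    """Simplest rubric: 1.0 on a case-insensitive exact match, else 0.0."""
    return 1.0 if expected.strip().lower() == actual.strip().lower() else 0.0


def run_suite(cases: list[GoldenCase], model) -> dict[str, float]:
    """Call the model on every prompt and return one score per case id."""
    return {c.case_id: score_exact(c.expected, model(c.prompt)) for c in cases}


# Two illustrative golden cases, one per line (JSONL).
GOLDEN = """
{"id": "capital-fr", "prompt": "Capital of France?", "expected": "Paris"}
{"id": "sum-2-2", "prompt": "What is 2 + 2?", "expected": "4"}
"""


def fake_model(prompt: str) -> str:
    # Stand-in for a real LLM call so the sketch runs offline.
    return "paris" if "France" in prompt else "5"


scores = run_suite(load_golden(GOLDEN), fake_model)
print(scores)
```

Exact match is only the simplest possible rubric. A real harness would likely swap `score_exact` for something graded (keyword coverage, a judge model, or a numeric tolerance) while keeping the same per-case score shape.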

Quick Start

Run the evaluation harness on your model to load a dataset and generate a regression report.
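
The regression report step could look something like the sketch below, which compares per-case scores from a baseline run against the current run and lists the cases that got worse. The `regression_report` function, its `tolerance` parameter, and the shape of the returned dictionary are assumptions made for illustration, not the Skill's documented output.

```python
def regression_report(baseline: dict, current: dict, tolerance: float = 0.0) -> dict:
    """Compare two {case_id: score} runs and flag cases whose score dropped."""
    # Only cases present in both runs are compared in this sketch.
    shared = sorted(set(baseline) & set(current))
    regressions = [
        (case_id, baseline[case_id], current[case_id])
        for case_id in shared
        if baseline[case_id] - current[case_id] > tolerance
    ]

    def mean(run: dict) -> float:
        return sum(run[c] for c in shared) / len(shared) if shared else 0.0

    return {
        "baseline_mean": mean(baseline),
        "current_mean": mean(current),
        "regressions": regressions,
        "passed": not regressions,
    }


# Hypothetical scores from two commits.
baseline_scores = {"a": 1.0, "b": 1.0, "c": 0.0}
current_scores = {"a": 1.0, "b": 0.0, "c": 1.0}

report = regression_report(baseline_scores, current_scores)
for case_id, old, new in report["regressions"]:
    print(f"REGRESSION {case_id}: {old:.2f} -> {new:.2f}")
```

Note that the mean score is unchanged in this example even though case `b` regressed, which is why the sketch reports per-case drops rather than only an aggregate. In CI, a job could fail the build when `report["passed"]` is false.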

Please help me install this Skill:
Name: evaluation-harness
Download link: https://github.com/patricio0312rev/skillset/archive/main.zip#evaluation-harness
Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
