Name: skill-eval-loop
Availability: InStock
Author: fyodoriv

System Documentation

What problem does it solve?

Benchmark an agent skill against realistic prompts, score the outputs with an explicit rubric, save run history, and optionally hand off to rewrite loops for another pass. Use when you want to measure or improve a skill. Don't use for creating a new skill from scratch (use skill-creator) or for one-off skill edits without benchmarks (use skill-rewriter).

Core Features & Use Cases

Phase-based evaluation workflow, from setup through hand-off to rewrite loops
Scores results against an explicit rubric and stores run history so results stay comparable across rewrites
Benchmarks different target skills and compares the delta between runs
Integrates with rewrite loops for iterative improvement
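Rubric scoring is easiest to compare across rewrites when each output reduces to a single number. The sketch below assumes a hypothetical rubric of weighted pass/fail criteria and scores an output as the weighted fraction of criteria it satisfies. The `Criterion` type, `score_output` helper, and criterion names are illustrative only, not the skill's actual rubric format.

```python
from dataclasses import dataclass


@dataclass
class Criterion:
    # Hypothetical rubric entry: a named check and its relative weight.
    name: str
    weight: float


def score_output(criteria, verdicts):
    """Return the weighted fraction (0.0 to 1.0) of criteria an output passes.

    `verdicts` maps criterion name -> bool, as judged by a human or a grader.
    Criteria missing from `verdicts` count as failed.
    """
    total = sum(c.weight for c in criteria)
    earned = sum(c.weight for c in criteria if verdicts.get(c.name, False))
    return earned / total if total else 0.0


# Example rubric (assumed for illustration).
rubric = [
    Criterion("follows_format", 1.0),
    Criterion("correct_answer", 2.0),
    Criterion("no_hallucinated_apis", 1.0),
]

print(score_output(rubric, {"follows_format": True, "correct_answer": True}))  # 0.75
```

Treating a missing verdict as a failure keeps scores conservative, so an incomplete grading pass cannot inflate a run.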

Quick Start

Run a benchmark on a target skill with your prompts and rubric, save the results to runs.jsonl, then review the delta between runs.
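A JSON Lines file stores one JSON object per line, which makes appending a run and reading back history straightforward. This is a minimal sketch assuming a hypothetical record schema (`timestamp`, `skill`, per-prompt `scores`, and a `mean`). The skill's real runs.jsonl fields may differ, and `append_run` and `latest_delta` are illustrative helpers, not part of the skill.

```python
import json
import os
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def append_run(path, skill, scores):
    """Append one run record (hypothetical schema) as a single line of JSON."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "skill": skill,
        "scores": scores,  # prompt id -> rubric score in [0, 1]
        "mean": sum(scores.values()) / len(scores),
    }
    with Path(path).open("a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
    return record


def latest_delta(path, skill):
    """Mean-score change between the two most recent runs of `skill`, or None."""
    p = Path(path)
    if not p.exists():
        return None
    runs = []
    with p.open(encoding="utf-8") as f:
        for line in f:
            if line.strip():
                rec = json.loads(line)
                if rec["skill"] == skill:
                    runs.append(rec)
    if len(runs) < 2:
        return None
    return runs[-1]["mean"] - runs[-2]["mean"]


# Usage: two runs of the same skill, before and after a rewrite.
demo_path = os.path.join(tempfile.mkdtemp(), "runs.jsonl")
append_run(demo_path, "my-skill", {"p1": 0.5, "p2": 0.75})
append_run(demo_path, "my-skill", {"p1": 0.75, "p2": 1.0})
print(latest_delta(demo_path, "my-skill"))  # 0.25
```

Because records are only ever appended, earlier runs stay intact and any two runs can be compared later.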

Please help me install this Skill: Name: skill-eval-loop Download link: https://github.com/fyodoriv/agentbrew/archive/main.zip#skill-eval-loop Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
