skill-eval-loop

Community

Benchmark AI skills with repeatable evals

Author: fyodoriv
Version: 1.0.0
Installs: 0

System Documentation

What problem does it solve?

Benchmark an agent skill against realistic prompts, score the outputs with an explicit rubric, save run history, and optionally hand off to rewrite loops for another pass. Use it when you want to measure or improve a skill. Don't use it to create a new skill from scratch (use skill-creator) or to make one-off skill edits without benchmarks (use skill-rewriter).

Core Features & Use Cases

  • Phase-based evaluation workflow from setup to hand-off to rewrite loops
  • Score results with a rubric and store run history for comparability across rewrites
  • Benchmark different target skills and compare the score delta between runs
  • Supports integration with rewrite loops for iterative improvement
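
To make the rubric-scoring idea above concrete, here is a minimal Python sketch. It is not the skill's actual implementation: the `Criterion` type, the `score_output` function, the criteria, and the sample outputs are all hypothetical, and the listing does not specify the rubric format the skill really uses. It assumes a rubric is a list of weighted pass/fail checks and that a score is the fraction of weight earned.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Criterion:
    """One rubric line: a name, a weight, and a pass/fail check (hypothetical shape)."""
    name: str
    weight: float
    check: Callable[[str], bool]


def score_output(output: str, rubric: list[Criterion]) -> float:
    """Return the fraction of total rubric weight that the output earns (0.0 to 1.0)."""
    total = sum(c.weight for c in rubric)
    earned = sum(c.weight for c in rubric if c.check(output))
    return earned / total if total else 0.0


# Illustrative rubric; the criteria are assumptions, not taken from the skill.
rubric = [
    Criterion("mentions_tests", 2.0, lambda o: "test" in o.lower()),
    Criterion("under_200_chars", 1.0, lambda o: len(o) < 200),
    Criterion("no_todo", 1.0, lambda o: "TODO" not in o),
]

# Illustrative outputs keyed by a made-up prompt id.
outputs = {
    "prompt-1": "Added unit tests for the parser.",
    "prompt-2": "TODO: write this later.",
}

scores = {pid: score_output(text, rubric) for pid, text in outputs.items()}
print(scores)
```

Keeping each criterion explicit and weighted means the same rubric can be reapplied unchanged after a rewrite, which is what makes scores comparable across runs.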

Quick Start

Run a benchmark on a target skill using your prompts and rubric, then save results to runs.jsonl and review the delta.
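
The save-and-compare step can be sketched as follows. This is an assumption-laden illustration, not the skill's code: only the `runs.jsonl` file name comes from the text above, while the record fields (`skill`, `score`) and the helpers `append_run` and `latest_delta` are hypothetical. The sketch appends one JSON object per line and reports the difference between the two most recent scores for a skill.

```python
import json
import tempfile
from pathlib import Path
from typing import Optional


def append_run(path: Path, skill: str, score: float) -> None:
    """Append one run record as a single JSON line (hypothetical record shape)."""
    record = {"skill": skill, "score": score}
    with path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")


def latest_delta(path: Path, skill: str) -> Optional[float]:
    """Return latest score minus previous score for a skill, or None with fewer than two runs."""
    scores = []
    with path.open(encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue  # tolerate blank lines in the history file
            record = json.loads(line)
            if record.get("skill") == skill:
                scores.append(record["score"])
    if len(scores) < 2:
        return None
    return scores[-1] - scores[-2]


# Demo in a temporary directory so no real run history is touched.
with tempfile.TemporaryDirectory() as tmp:
    history = Path(tmp) / "runs.jsonl"
    append_run(history, "my-skill", 0.50)  # baseline run (made-up score)
    append_run(history, "my-skill", 0.75)  # run after a rewrite (made-up score)
    delta = latest_delta(history, "my-skill")
    single = latest_delta(history, "other-skill")

print(delta)
```

An append-only JSON Lines file keeps earlier runs intact, so a delta can be recomputed later against any earlier baseline rather than only the most recent one.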

Dependency Matrix

Required Modules

None required

Components

Standard package

💻 Claude Code Installation

Recommended: Let Claude install it automatically. Copy and paste the text below into Claude Code.

Please help me install this Skill:
Name: skill-eval-loop
Download link: https://github.com/fyodoriv/agentbrew/archive/main.zip#skill-eval-loop

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.