skill-eval-loop
Community
Benchmark AI skills with repeatable eval
Author: fyodoriv
Version: 1.0.0
Installs: 0
System Documentation
What problem does it solve?
Benchmark an agent skill against realistic prompts, score the outputs with an explicit rubric, save run history, and optionally hand off to rewrite loops for another pass. Use when you want to measure or improve a skill. Don't use for creating a new skill from scratch (use skill-creator) or for one-off skill edits without benchmarks (use skill-rewriter).
Core Features & Use Cases
- Phase-based evaluation workflow, from setup through hand-off to rewrite loops
- Score results with a rubric and store run history for comparability across rewrites
- Benchmark different target skills and compare the score delta between runs
- Supports integration with rewrite loops for iterative improvement
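As a rough illustration of what "score results with a rubric" could look like, here is a minimal Python sketch. It is not taken from the skill itself: the `Criterion` class, the `score_output` function, the weights, and the example checks are all hypothetical, and the skill's actual rubric format may differ.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Criterion:
    """One rubric line (hypothetical shape): a name, a weight, and a pass/fail check."""
    name: str
    weight: float
    passes: Callable[[str], bool]


def score_output(output: str, rubric: List[Criterion]) -> float:
    """Return the weighted fraction of rubric criteria that the output satisfies (0.0 to 1.0)."""
    total = sum(c.weight for c in rubric)
    if total <= 0:
        raise ValueError("rubric weights must sum to a positive number")
    earned = sum(c.weight for c in rubric if c.passes(output))
    return earned / total


# Hypothetical rubric, for illustration only.
EXAMPLE_RUBRIC = [
    Criterion("mentions_tests", 2.0, lambda text: "test" in text.lower()),
    Criterion("under_200_chars", 1.0, lambda text: len(text) < 200),
    Criterion("has_numbered_step", 1.0, lambda text: "1." in text),
]

if __name__ == "__main__":
    # Passes the first two criteria (weights 2 + 1 of 4 total).
    print(score_output("Run the tests first.", EXAMPLE_RUBRIC))
```

Keeping each criterion as an explicit, named check makes the score explainable: a change between two runs can be traced to the specific criteria that flipped.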
Quick Start
Run a benchmark on a target skill using your prompts and rubric, then save results to runs.jsonl and review the delta.
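The "save results to runs.jsonl and review the delta" step might be sketched as follows. Only the file name `runs.jsonl` comes from the description above; the record fields (`skill`, `score`, `timestamp`) and the helper names `append_run` and `latest_delta` are assumptions made for this example, not the skill's documented schema.

```python
import json
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


def append_run(path: Path, skill: str, score: float) -> None:
    """Append one benchmark result as a single JSON line (assumed record shape)."""
    record = {
        "skill": skill,
        "score": score,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }
    with path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")


def latest_delta(path: Path, skill: str) -> Optional[float]:
    """Return the latest score minus the previous one for a skill, or None if fewer than two runs."""
    if not path.exists():
        return None
    scores = []
    with path.open(encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines in the history file
            record = json.loads(line)
            if record.get("skill") == skill:
                scores.append(record["score"])
    if len(scores) < 2:
        return None
    return scores[-1] - scores[-2]


if __name__ == "__main__":
    history = Path("runs.jsonl")
    append_run(history, "my-target-skill", 0.5)   # hypothetical baseline run
    append_run(history, "my-target-skill", 0.75)  # hypothetical run after a rewrite
    print(latest_delta(history, "my-target-skill"))
```

An append-only JSON Lines file keeps every run, so scores stay comparable across rewrites without overwriting earlier results.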
Dependency Matrix
Required Modules
None required
Components
Standard package
Claude Code Installation
Recommended: Let Claude install automatically. Simply copy and paste the text below to Claude Code.
Please help me install this Skill:
Name: skill-eval-loop
Download link: https://github.com/fyodoriv/agentbrew/archive/main.zip#skill-eval-loop
Please download this .zip file, extract it, and install it in the .claude/skills/ directory.