eval-pipeline
Community
End-to-end eval pipeline for AI agents.
Author: ronniegeraghty
Version: 1.0.0
Installs: 0
System Documentation
What problem does it solve?
The eval pipeline provides a structured, end-to-end workflow for generating, grading, reviewing, and reporting on AI-agent prompts. This makes evaluations consistent across runs and speeds up debugging of both prompts and agents.
Core Features & Use Cases
- End-to-end orchestration of generation, multi-model grading, reviewer feedback, and report generation for AI prompts.
- Provides detailed action timelines and workspace isolation to ensure reproducibility and auditability.
- Use Case: Run a full evaluation for a given prompt to identify strengths and weaknesses across generation quality, adherence to guidelines, and safety considerations.
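This page does not document the skill's internals. As a rough illustration only, and assuming a simple sequential design, the four stages, an action timeline, and a per-run isolated workspace could be wired together as below. `run_pipeline`, `ActionEvent`, `RunRecord`, and the stub stages are hypothetical names, not the skill's actual API.

```python
import tempfile
import time
from dataclasses import dataclass, field
from pathlib import Path
from typing import Any, Callable


@dataclass
class ActionEvent:
    """One entry in a run's action timeline (hypothetical structure)."""
    stage: str
    started: float   # time.monotonic() when the stage began
    finished: float  # time.monotonic() when the stage returned


@dataclass
class RunRecord:
    prompt: str
    timeline: list[ActionEvent] = field(default_factory=list)
    outputs: dict[str, Any] = field(default_factory=dict)


# A stage receives the prompt, its isolated workspace, and earlier stage outputs.
Stage = Callable[[str, Path, dict[str, Any]], Any]


def run_pipeline(prompt: str, stages: list[tuple[str, Stage]]) -> RunRecord:
    record = RunRecord(prompt=prompt)
    # A fresh temporary directory per run keeps files from different runs apart;
    # it is deleted automatically when the run finishes.
    with tempfile.TemporaryDirectory(prefix="eval-run-") as tmp:
        workspace = Path(tmp)
        for name, stage in stages:
            started = time.monotonic()
            record.outputs[name] = stage(prompt, workspace, record.outputs)
            record.timeline.append(ActionEvent(name, started, time.monotonic()))
    return record


# Hypothetical stub stages; a real pipeline would call a model or agent here.
def generate(prompt, workspace, outputs):
    path = workspace / "solution.py"
    path.write_text(f"# TODO: solve {prompt!r}\n")
    return path.name


def grade(prompt, workspace, outputs):
    code = (workspace / outputs["generate"]).read_text()
    # Placeholder check: each hypothetical grader only confirms the file is non-empty.
    return {grader: bool(code.strip()) for grader in ("grader-a", "grader-b")}


def review(prompt, workspace, outputs):
    return [f"{g}: ok" if passed else f"{g}: empty output"
            for g, passed in outputs["grade"].items()]


def report(prompt, workspace, outputs):
    return {"prompt": prompt, "grades": outputs["grade"], "feedback": outputs["review"]}


STAGES = [("generate", generate), ("grade", grade), ("review", review), ("report", report)]
```

Recording start and finish times per stage is what makes a run auditable after the fact, and creating the workspace per run (rather than reusing one directory) is one straightforward way to keep results reproducible.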
Quick Start
Run an evaluation pipeline that generates code, grades it using multiple graders, and compiles a final report for a selected prompt.
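How the skill combines scores from multiple graders is not described here. One plausible approach, sketched below under that assumption, groups grades by criterion, reports the mean, and flags criteria where graders disagree by more than a threshold. `Grade`, `compile_report`, and the 0.3 default threshold are hypothetical, and scores are assumed to be normalized to [0, 1].

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Grade:
    grader: str     # e.g. a model name (hypothetical)
    criterion: str  # e.g. "correctness", "guidelines", "safety"
    score: float    # assumed normalized to [0, 1]


def compile_report(prompt_id: str, grades: list[Grade], disagreement: float = 0.3) -> str:
    # Group every grade under its criterion.
    by_criterion: dict[str, list[float]] = {}
    for g in grades:
        by_criterion.setdefault(g.criterion, []).append(g.score)

    lines = [f"Report for {prompt_id}"]
    for criterion in sorted(by_criterion):
        scores = by_criterion[criterion]
        spread = max(scores) - min(scores)
        line = f"{criterion}: mean={mean(scores):.2f} spread={spread:.2f} graders={len(scores)}"
        # A large spread suggests the graders read the output differently.
        if spread > disagreement:
            line += " (graders disagree)"
        lines.append(line)
    return "\n".join(lines)
```

Reporting the spread alongside the mean keeps grader disagreement visible instead of averaging it away, which is usually where prompt or rubric problems show up first.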
Dependency Matrix
Required Modules: None required
Components: Standard package
Claude Code Installation
Recommended: let Claude install the skill automatically. Copy and paste the text below into Claude Code.
Please help me install this Skill: Name: eval-pipeline Download link: https://github.com/ronniegeraghty/hyoka/archive/main.zip#eval-pipeline Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
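To install manually instead, the same steps (download the archive, extract the skill, place it under `.claude/skills/`) can be scripted. This is a minimal sketch that assumes the skill lives in a directory named `eval-pipeline` somewhere inside the GitHub archive; the exact repository layout is not stated on this page. `install_skill_from_zip` and `download` are hypothetical helpers.

```python
import io
import shutil
import tempfile
import urllib.request
import zipfile
from pathlib import Path

ARCHIVE_URL = "https://github.com/ronniegeraghty/hyoka/archive/main.zip"


def download(url: str = ARCHIVE_URL) -> bytes:
    with urllib.request.urlopen(url) as resp:
        return resp.read()


def install_skill_from_zip(zip_bytes: bytes, skill_name: str, skills_dir: Path) -> Path:
    """Copy the shallowest '<skill_name>/' directory in the archive into skills_dir."""
    skills_dir.mkdir(parents=True, exist_ok=True)
    target = skills_dir / skill_name
    marker = f"/{skill_name}/"
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive, \
            tempfile.TemporaryDirectory(dir=skills_dir) as staging:
        members = archive.namelist()
        # Assumption: the skill folder sits below the archive's top-level directory.
        prefixes = sorted({n[: n.index(marker) + len(marker)] for n in members if marker in n},
                          key=len)
        if not prefixes:
            raise FileNotFoundError(f"no '{skill_name}/' directory in archive")
        prefix = prefixes[0]
        staged = Path(staging) / skill_name
        staged.mkdir()
        for name in members:
            if not name.startswith(prefix) or name.endswith("/"):
                continue  # skip other files and bare directory entries
            relative = Path(name[len(prefix):])
            if ".." in relative.parts:
                raise ValueError(f"unsafe path in archive: {name}")
            dest = staged / relative
            dest.parent.mkdir(parents=True, exist_ok=True)
            with archive.open(name) as src, open(dest, "wb") as out:
                shutil.copyfileobj(src, out)
        # Swap the staged copy into place only after extraction succeeded.
        if target.exists():
            shutil.rmtree(target)
        shutil.move(str(staged), str(target))
    return target
```

A typical call would be `install_skill_from_zip(download(), "eval-pipeline", Path(".claude/skills"))`. Extracting into a staging directory first means a failed or partial download never leaves a half-installed skill behind.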