eval-pipeline

Community

End-to-end eval pipeline for AI agents.

Author: ronniegeraghty
Version: 1.0.0
Installs: 0

System Documentation

What problem does it solve?

The eval pipeline gives AI-agent prompts a structured, end-to-end workflow: generate output, grade it, review it, and report the results. Running every prompt through the same stages keeps evaluations consistent and speeds up debugging of both prompts and agents.

Core Features & Use Cases

  • Orchestrates generation, multi-model grading, reviewer feedback, and report generation for AI prompts in a single run.
  • Records a detailed action timeline and isolates each run's workspace, so results are reproducible and auditable.
  • Use Case: Run a full evaluation for a given prompt to identify strengths and weaknesses across generation quality, adherence to guidelines, and safety considerations.
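
The stage ordering, action timeline, and workspace isolation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the skill's actual implementation: `RunRecord`, `run_pipeline`, and the four placeholder stage functions are hypothetical names, and a temporary directory stands in for whatever isolation mechanism the skill really uses.

```python
import tempfile
import time
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class RunRecord:
    """Everything one pipeline run produced (hypothetical structure)."""
    prompt: str
    artifacts: dict = field(default_factory=dict)
    timeline: list = field(default_factory=list)  # (stage name, seconds elapsed)


def run_pipeline(prompt, stages):
    """Run each stage in order inside a throwaway workspace, logging a timeline."""
    record = RunRecord(prompt=prompt)
    # Assumption: a fresh temporary directory per run stands in for workspace isolation.
    with tempfile.TemporaryDirectory(prefix="eval-") as tmp:
        workspace = Path(tmp)
        for name, stage in stages:
            started = time.perf_counter()
            record.artifacts[name] = stage(record, workspace)
            record.timeline.append((name, time.perf_counter() - started))
    return record


# Placeholder stages; a real pipeline would call a model, graders, and reviewers.
def generate(record, workspace):
    out = workspace / "answer.txt"
    out.write_text(f"response to: {record.prompt}")
    return out.read_text()


def grade(record, workspace):
    return {"length_ok": len(record.artifacts["generate"]) > 0}


def review(record, workspace):
    return "no reviewer comments"


def report(record, workspace):
    return f"{record.prompt}: {record.artifacts['grade']}"


STAGES = [("generate", generate), ("grade", grade), ("review", review), ("report", report)]

if __name__ == "__main__":
    result = run_pipeline("Write a hello-world script", STAGES)
    for stage_name, seconds in result.timeline:
        print(f"{stage_name}: {seconds:.4f}s")
    print(result.artifacts["report"])
```

Keeping each stage as a plain function that receives the shared record and the workspace path makes it easy to replay one stage in isolation when debugging a prompt.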

Quick Start

Run an evaluation pipeline that generates code, grades it using multiple graders, and compiles a final report for a selected prompt.
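
As a rough picture of the "multiple graders, one report" step, the sketch below scores a piece of generated code with several independent graders and folds the scores into a text report. The grader names, the 0-to-1 score scale, and the report layout are all assumptions made for illustration. The listing only says multiple graders are used, and the skill's real graders are presumably model-based rather than the simple string checks shown here.

```python
from statistics import mean


def grade_with_all(output, graders):
    """Return each grader's score in [0, 1], keyed by grader name."""
    return {name: grader(output) for name, grader in graders.items()}


def compile_report(prompt, scores):
    """Format per-grader scores plus their mean as a plain-text report."""
    lines = [f"Prompt: {prompt}"]
    for name, score in sorted(scores.items()):
        lines.append(f"  {name}: {score:.2f}")
    lines.append(f"  mean: {mean(scores.values()):.2f}")
    return "\n".join(lines)


# Hypothetical rule-based graders standing in for model-based ones.
GRADERS = {
    "has_function": lambda code: 1.0 if "def " in code else 0.0,
    "has_docstring": lambda code: 1.0 if '"""' in code else 0.0,
}

if __name__ == "__main__":
    generated = "def greet():\n    return 'hello'\n"
    print(compile_report("Write a greeting function", grade_with_all(generated, GRADERS)))
```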

Dependency Matrix

Required Modules

None required

Components

Standard package

💻 Claude Code Installation

Recommended: Let Claude install it automatically. Copy the text below and paste it into Claude Code.

Please help me install this Skill:
Name: eval-pipeline
Download link: https://github.com/ronniegeraghty/hyoka/archive/main.zip#eval-pipeline

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
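
For a manual install, the extraction step might look roughly like the sketch below. It assumes the archive has already been downloaded and that the skill lives in a folder named eval-pipeline somewhere inside it; the listing does not state the archive's internal layout, so the folder lookup is a guess, and `install_skill` is a hypothetical helper rather than part of any tool.

```python
import zipfile
from pathlib import Path, PurePosixPath


def install_skill(archive, skill_name, skills_dir):
    """Copy files found under any folder named skill_name into skills_dir/skill_name."""
    target_root = Path(skills_dir) / skill_name
    with zipfile.ZipFile(archive) as zf:
        for info in zf.infolist():
            parts = PurePosixPath(info.filename).parts
            # Skip directory entries and files that are not inside the skill folder.
            if info.is_dir() or skill_name not in parts[:-1]:
                continue
            # Keep only the path below the skill folder (assumed layout).
            relative = parts[parts.index(skill_name) + 1:]
            if ".." in relative:
                continue  # refuse entries that try to escape the target folder
            destination = target_root.joinpath(*relative)
            destination.parent.mkdir(parents=True, exist_ok=True)
            destination.write_bytes(zf.read(info))
    return target_root
```

With the archive saved locally as main.zip, calling `install_skill("main.zip", "eval-pipeline", ".claude/skills")` would place the skill's files under .claude/skills/eval-pipeline.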
