prompt-evaluation-and-tuning

Community

Systematically evaluate and tune prompts.

Author: shichiyou
Version: 1.0.0
Installs: 0

System Documentation

What problem does it solve?

Brings rigor to prompt design with a framework that validates prompts in two ways: empirical runs against blank-slate subagents, and automated regression tests in CI. Both reduce author bias and surface hidden errors in instructions.

Core Features & Use Cases

  • Empirical Subagent Validation: Run prompts against blank-slate subagents (agents with no prior context) and measure how often each requirement is met, so that revisions are driven by data.
  • Declarative CI Regression: Set up a promptfoo pipeline that automatically evaluates prompt behavior against a matrix of tests.
  • Reusable Guidance: Apply the same process to AGENTS.md, CLAUDE.md, and wiki procedures for consistent prompt quality across teams.
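To make "measure requirement achievement" concrete, here is a minimal sketch of one way to score subagent outputs. It is not the skill's own script: the `achievement_rates` function, the requirement names, and the sample outputs are all hypothetical, and it assumes each requirement can be expressed as a yes/no check on the output text.

```python
from typing import Callable, Dict, List

# Assumption: a requirement is a named predicate over one subagent output.
Requirement = Callable[[str], bool]


def achievement_rates(
    outputs: List[str], requirements: Dict[str, Requirement]
) -> Dict[str, float]:
    """Return, per requirement, the fraction of outputs that satisfy it."""
    if not outputs:
        raise ValueError("need at least one subagent output")
    return {
        name: sum(1 for out in outputs if check(out)) / len(outputs)
        for name, check in requirements.items()
    }


# Hypothetical outputs collected from three blank-slate subagent runs.
outputs = [
    "SUMMARY: tests pass",
    "summary: tests pass",
    "SUMMARY: two failures in the parser module",
]

# Hypothetical requirements that the prompt under test is supposed to enforce.
requirements = {
    "starts_with_summary": lambda out: out.startswith("SUMMARY:"),
    "under_40_chars": lambda out: len(out) < 40,
}

rates = achievement_rates(outputs, requirements)
for name, rate in rates.items():
    print(f"{name}: {rate:.0%}")
```

A requirement with a low rate points at the instruction that needs rewording; rerunning the same check after an edit shows whether the change helped.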

Quick Start

Run the empirical validation workflow by configuring scenarios, dispatching blank-slate subagents, and reviewing results using the promptfoo CI pipeline.
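The three steps above (configure scenarios, dispatch subagents, review results) can be sketched roughly as follows. This is an illustration under stated assumptions, not the skill's implementation: `Scenario`, `run_workflow`, `regressions`, and `fake_dispatch` are hypothetical names, and the dispatcher is a local stub standing in for a real subagent call. In the actual workflow the review step is handled by the promptfoo CI pipeline, which this sketch does not invoke.

```python
from typing import Callable, Dict, List, NamedTuple


class Scenario(NamedTuple):
    name: str          # label used in the results
    task: str          # input handed to the subagent
    must_contain: str  # simple pass criterion (assumption: substring match)


def run_workflow(
    prompt: str,
    scenarios: List[Scenario],
    dispatch: Callable[[str, str], str],
) -> Dict[str, bool]:
    """Dispatch each scenario and record whether its criterion was met."""
    return {s.name: s.must_contain in dispatch(prompt, s.task) for s in scenarios}


def regressions(baseline: Dict[str, bool], current: Dict[str, bool]) -> List[str]:
    """Scenarios that passed under the baseline prompt but fail now."""
    return sorted(
        name for name, passed in baseline.items()
        if passed and not current.get(name, False)
    )


def fake_dispatch(prompt: str, task: str) -> str:
    # Stub standing in for a blank-slate subagent: it only "obeys" one instruction.
    return f"{task} DONE" if "finish with DONE" in prompt else task


scenarios = [
    Scenario("rename", "rename variable x to count", "DONE"),
    Scenario("format", "format the file", "DONE"),
]

baseline = run_workflow("Always finish with DONE.", scenarios, fake_dispatch)
current = run_workflow("Be brief.", scenarios, fake_dispatch)
failed = regressions(baseline, current)
print("regressed scenarios:", failed)
```

Passing the dispatcher in as a parameter keeps the scoring logic testable without any model calls; a real run would swap the stub for whatever mechanism launches the subagents.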

Dependency Matrix

Required Modules

None required

Components

scripts

💻 Claude Code Installation

Recommended: let Claude install it automatically. Copy the text below and paste it into Claude Code.

Please help me install this Skill:
Name: prompt-evaluation-and-tuning
Download link: https://github.com/shichiyou/hermes-agent-001/archive/main.zip#prompt-evaluation-and-tuning

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
