prompt-evaluation-and-tuning
Community
Systematically evaluate and tune prompts.
Software Engineering
Tags: #subagents #ci-cd #prompt-engineering #promptfoo #regression-testing #prompt-evaluation
Author: shichiyou
Version: 1.0.0
System Documentation
What problem does it solve?
Brings rigor to prompt design with a framework for validating prompts against blank-slate subagents and for automated CI regression testing, reducing author bias and hidden errors in instructions.
Core Features & Use Cases
- Empirical Subagent Validation: run prompts against blank-slate subagents and measure how often each requirement is met, so that improvements are driven by data.
- Declarative CI Regression: set up a promptfoo pipeline that automatically evaluates prompt behavior against a matrix of test cases.
- Reusable Guidance: apply the same approach to AGENTS.md, CLAUDE.md, and Wiki procedures for consistent prompt quality across teams.
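The empirical validation idea can be sketched as a small scoring loop. This is a minimal illustration, not the skill's own script: it assumes a hypothetical `run_subagent(prompt, scenario)` callable that starts a fresh agent with no prior context and returns its output, and it expresses each requirement as a named predicate over that output. The achievement rate is the fraction of scenarios in which a requirement holds.

```python
from dataclasses import dataclass
from typing import Callable


# A requirement is a named predicate over a subagent's final output (illustrative).
@dataclass(frozen=True)
class Requirement:
    name: str
    check: Callable[[str], bool]


def achievement_rates(prompt, scenarios, requirements, run_subagent):
    """Return the fraction of scenarios in which each requirement was met.

    run_subagent is a hypothetical callable: it should start a blank-slate
    agent, give it only `prompt` and one scenario, and return the output.
    """
    if not scenarios:
        raise ValueError("need at least one scenario")
    hits = {r.name: 0 for r in requirements}
    for scenario in scenarios:
        output = run_subagent(prompt, scenario)  # no state shared between runs
        for r in requirements:
            if r.check(output):
                hits[r.name] += 1
    return {name: count / len(scenarios) for name, count in hits.items()}


# Stub standing in for a real subagent dispatch, for demonstration only.
def fake_subagent(prompt: str, scenario: str) -> str:
    return f"SUMMARY: {scenario}" if "summary" in prompt else scenario


requirements = [
    Requirement("has_summary_prefix", lambda out: out.startswith("SUMMARY:")),
    Requirement("mentions_tests", lambda out: "test" in out.lower()),
]
scenarios = ["add unit tests", "rename a variable"]

rates = achievement_rates("Always begin with a summary.", scenarios, requirements, fake_subagent)
print(rates)  # {'has_summary_prefix': 1.0, 'mentions_tests': 0.5}
```

Reporting a rate per requirement, rather than a single pass/fail, shows which instruction in the prompt is being ignored, which is what makes the tuning step data-driven.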
Quick Start
Run the empirical validation workflow: configure test scenarios, dispatch blank-slate subagents against them, and review the results in the promptfoo CI pipeline.
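For the promptfoo step, requirement checks can be expressed as a custom Python assertion. The sketch below assumes promptfoo's `python` assertion type, which loads a file and calls `get_assert(output, context)`; a returned dict with `pass`, `score`, and `reason` is reported as a graded result. The file name `assert_requirements.py` and the required section headings are hypothetical examples, not part of this skill.

```python
# Hypothetical file: assert_requirements.py
# Assumes promptfoo's `python` assertion type, which calls get_assert(output, context).
from typing import Any

# Illustrative requirements; replace with the sections your prompt must produce.
REQUIRED_PHRASES = ("## Summary", "## Risks")


def get_assert(output: str, context: Any) -> dict:
    """Grade one model output by how many required phrases it contains."""
    missing = [p for p in REQUIRED_PHRASES if p not in output]
    score = 1 - len(missing) / len(REQUIRED_PHRASES)
    return {
        "pass": not missing,
        "score": score,
        "reason": "all sections present" if not missing else f"missing: {', '.join(missing)}",
    }
```

In a promptfoo config, an assertion like this is typically referenced from a test's `assert` list with `type: python` and `value: file://assert_requirements.py`; check the current promptfoo documentation for the exact form. Running the same assertion across every prompt and test-case combination is what turns it into a regression matrix.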
Dependency Matrix
Required Modules
None required
Components
scripts
💻 Claude Code Installation
Recommended: let Claude install it automatically. Copy and paste the text below into Claude Code.
Please help me install this Skill: Name: prompt-evaluation-and-tuning Download link: https://github.com/shichiyou/hermes-agent-001/archive/main.zip#prompt-evaluation-and-tuning Please download this .zip file, extract it, and install it in the .claude/skills/ directory.