exp-pilot-eval


Turn pilot results into clear go/no-go decisions.

Author: duany049
Version: 1.0.0
Installs: 0

System Documentation

What problem does it solve?

This Skill prevents wasted effort by evaluating noisy pilot experiment results against predefined success criteria and then updating the corresponding research idea page with an actionable verdict.

Core Features & Use Cases

  • Pilot verdict evaluation: Reads pilot metrics and compares them to the pilot spec’s success criteria to classify outcomes as pass, fail, or inconclusive.
  • Idea wiki updates: Updates wiki/ideas/{slug}.md fields including pilot_result, failure_reason (with mandatory [pilot] prefix on failure), and status when appropriate.
  • Persistent reporting & logging: Writes a durable verdict report to experiments/pilot/{slug}/report.md and appends an audit entry to wiki/log.md, while printing PILOT_VERDICT_REPORT to the terminal.
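The verdict step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Skill's actual implementation: the `Criterion` structure, the `margin` noise band, the rule that any failing criterion fails the whole pilot, and both function names are hypothetical. Only the three verdict labels and the mandatory `[pilot]` prefix on failure_reason come from the description above.

```python
from dataclasses import dataclass


@dataclass
class Criterion:
    """Hypothetical success criterion; the real pilot spec format may differ."""
    metric: str
    threshold: float
    higher_is_better: bool = True
    margin: float = 0.0  # assumed noise band around the threshold


def classify(metrics: dict, criteria: list) -> str:
    """Return 'pass', 'fail', or 'inconclusive' for a set of pilot metrics."""
    if not criteria:
        return "inconclusive"  # nothing to judge against
    outcomes = []
    for c in criteria:
        value = metrics.get(c.metric)
        if value is None:
            outcomes.append("inconclusive")  # metric was never reported
            continue
        # Positive delta means the metric is on the good side of the threshold.
        delta = value - c.threshold if c.higher_is_better else c.threshold - value
        if delta >= c.margin:
            outcomes.append("pass")
        elif delta <= -c.margin:
            outcomes.append("fail")
        else:
            outcomes.append("inconclusive")  # inside the noise band
    # Assumed aggregation: one clear failure fails the pilot,
    # otherwise any unresolved criterion keeps it inconclusive.
    if "fail" in outcomes:
        return "fail"
    if "inconclusive" in outcomes:
        return "inconclusive"
    return "pass"


def format_failure_reason(reason: str) -> str:
    """Ensure a failure reason carries the mandatory [pilot] prefix."""
    prefix = "[pilot]"
    reason = reason.strip()
    return reason if reason.startswith(prefix) else f"{prefix} {reason}"


if __name__ == "__main__":
    criteria = [Criterion("accuracy", threshold=0.75, margin=0.02)]
    print(classify({"accuracy": 0.76}, criteria))
    print(format_failure_reason("accuracy below threshold"))
```

Treating results inside the noise band as inconclusive, rather than forcing a pass or fail, is one way to avoid acting on a noisy pilot; a real spec might use confidence intervals or repeated runs instead.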

Quick Start

Run exp-pilot-eval with the slug of the idea whose pilot just finished. Add --auto if you want it to proceed without pausing for confirmation.

Dependency Matrix

Required Modules

None required

Components

Standard package

💻 Claude Code Installation

Recommended: Let Claude install automatically. Copy and paste the text below into Claude Code.

Please help me install this Skill:
Name: exp-pilot-eval
Download link: https://github.com/duany049/multi-skill-orchestration/archive/main.zip#exp-pilot-eval

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
