Name: exp-pilot-eval
Author: duany049

System Documentation

What problem does it solve?

This Skill prevents wasted effort by evaluating noisy pilot experiment results against predefined success criteria and then updating the corresponding research idea page with an actionable verdict.

Core Features & Use Cases

Pilot verdict evaluation: Reads pilot metrics and compares them to the pilot spec’s success criteria to classify outcomes as pass, fail, or inconclusive.
Idea wiki updates: Updates wiki/ideas/{slug}.md fields including pilot_result, failure_reason (with mandatory [pilot] prefix on failure), and status when appropriate.
Persistent reporting & logging: Writes a durable verdict report to experiments/pilot/{slug}/report.md and appends an audit entry to wiki/log.md, while printing PILOT_VERDICT_REPORT to the terminal.
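
The verdict step above can be sketched as follows. This is a minimal illustration, not the Skill's actual implementation: the source does not say how "inconclusive" is decided, so the sketch assumes repeated measurements per metric and treats a threshold that falls inside a standard-error band around the mean as inconclusive. The function names classify_pilot and failure_reason, and the noise_sigmas parameter, are hypothetical.

```python
from statistics import mean, stdev


def classify_pilot(samples, threshold, higher_is_better=True, noise_sigmas=1.0):
    """Classify one metric as 'pass', 'fail', or 'inconclusive'.

    Assumption (not from the Skill's docs): `samples` holds repeated
    measurements of the metric, and the noise band is
    mean +/- noise_sigmas * standard error.
    """
    if len(samples) < 2:
        # A single run gives no estimate of noise.
        return "inconclusive"

    center = mean(samples)
    half_width = noise_sigmas * stdev(samples) / len(samples) ** 0.5
    low, high = center - half_width, center + half_width

    if low > threshold:
        clearly_above = True
    elif high < threshold:
        clearly_above = False
    else:
        # The threshold lies inside the noise band.
        return "inconclusive"

    return "pass" if clearly_above == higher_is_better else "fail"


def failure_reason(verdict, detail):
    """Build the failure_reason field; the [pilot] prefix is mandatory on failure."""
    return f"[pilot] {detail}" if verdict == "fail" else None
```

Requiring the whole band to clear the threshold is one conservative way to avoid promoting or killing an idea on a noisy pilot; a real spec may define its criteria differently.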

Quick Start

Run exp-pilot-eval with the slug of the idea whose pilot just finished. Add --auto to let it proceed without pausing for confirmation.

Please help me install this Skill:
Name: exp-pilot-eval
Download link: https://github.com/duany049/multi-skill-orchestration/archive/main.zip#exp-pilot-eval
Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
