exp-eval

Community

Judge experiments and update ideas automatically

Author: duany049
Version: 1.0.0
Installs: 0

System Documentation

What problem does it solve?

It turns completed experiment results into an evidence-based verdict that updates the linked idea’s lifecycle status and records why it succeeded or failed.
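As a rough illustration of how a single verdict might drive the idea's lifecycle, here is a minimal sketch. The field names status, failure_reason and date_resolved come from the feature list below. The verdict labels, the status values ("testing", "validated", "invalidated") and the apply_verdict helper are hypothetical and not taken from the skill itself.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum
from typing import Optional


class Verdict(Enum):
    # Hypothetical verdict labels; the skill's real vocabulary may differ.
    SUPPORTED = "supported"
    REFUTED = "refuted"
    INCONCLUSIVE = "inconclusive"


@dataclass
class IdeaPage:
    # Field names follow the feature list; the status values are illustrative only.
    slug: str
    status: str = "testing"
    failure_reason: Optional[str] = None
    date_resolved: Optional[date] = None


def apply_verdict(idea: IdeaPage, verdict: Verdict, rationale: str, today: date) -> IdeaPage:
    """Update the idea's lifecycle status from one final verdict (hypothetical helper)."""
    if verdict is Verdict.SUPPORTED:
        idea.status = "validated"
        idea.date_resolved = today
    elif verdict is Verdict.REFUTED:
        idea.status = "invalidated"
        idea.failure_reason = rationale  # record why the idea failed
        idea.date_resolved = today
    # INCONCLUSIVE: leave the idea open and unresolved.
    return idea
```

For example, apply_verdict(IdeaPage("idea-x"), Verdict.REFUTED, "Metric did not improve", date(2024, 1, 1)) leaves the idea with status "invalidated" and the rationale stored in failure_reason.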

Core Features & Use Cases

  • Cross-model adjudication: An independent Review LLM weighs whether the results support or refute the idea, then Claude synthesizes a final, conservative verdict.
  • Automated wiki updates: Writes changes to the linked idea page (status, failure_reason, date_resolved) and fills the experiment page’s “Idea updates” section.
  • Graph evidence tracking: Adds a “supports” or “invalidates” edge (experiment → idea) and rebuilds the derived graph artifacts (context brief and open questions).
  • Safety constraints: Refuses to run unless the experiment is marked completed and its linked_idea field is set.
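The guard, the conservative synthesis and the edge selection described above can be sketched as follows. The precondition checks mirror the listed constraints (status "completed", linked_idea present), and the edge labels "supports" and "invalidates" come from the feature list. The synthesis rule is an assumption: a decisive verdict survives only when both models agree, and any disagreement falls back to inconclusive. All function and class names here are hypothetical.

```python
from enum import Enum
from typing import Optional, Tuple


class Verdict(Enum):
    SUPPORTED = "supported"
    REFUTED = "refuted"
    INCONCLUSIVE = "inconclusive"


class PreconditionError(Exception):
    """Raised when an experiment is not eligible for evaluation."""


def check_preconditions(experiment: dict) -> str:
    # Refuse to run unless the experiment is completed and linked to an idea.
    if experiment.get("status") != "completed":
        raise PreconditionError("experiment is not marked completed")
    linked = experiment.get("linked_idea")
    if not linked:
        raise PreconditionError("experiment has no linked_idea")
    return linked


def synthesize(review_verdict: Verdict, claude_verdict: Verdict) -> Verdict:
    # Assumed conservative rule: keep a verdict only if both models agree on it;
    # any disagreement collapses to INCONCLUSIVE.
    if review_verdict is claude_verdict:
        return review_verdict
    return Verdict.INCONCLUSIVE


def evidence_edge(experiment_slug: str, idea_slug: str,
                  verdict: Verdict) -> Optional[Tuple[str, str, str]]:
    # Edge runs experiment -> idea, typed "supports" or "invalidates".
    if verdict is Verdict.SUPPORTED:
        return (experiment_slug, "supports", idea_slug)
    if verdict is Verdict.REFUTED:
        return (experiment_slug, "invalidates", idea_slug)
    return None  # inconclusive results add no evidence edge
```

Treating disagreement as inconclusive errs toward leaving an idea open rather than resolving it on split evidence, which is one plausible reading of "conservative".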

Quick Start

To evaluate the completed experiment 'exp-slug-123' and update its linked idea with a single verdict, run exp-eval with 'exp-slug-123' as the argument (per the argument hint) and add --auto to apply the wiki changes without pausing for confirmation.
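If the invocation arrives as a single argument string, splitting it into the experiment slug and the --auto flag might look like the sketch below. How the skill actually receives and parses its arguments is an assumption, and parse_exp_eval_args is a hypothetical name.

```python
import shlex
from typing import Tuple


def parse_exp_eval_args(raw: str) -> Tuple[str, bool]:
    """Split a raw argument string into (experiment slug, auto flag)."""
    tokens = shlex.split(raw)
    auto = "--auto" in tokens  # apply wiki changes without pausing
    positional = [t for t in tokens if not t.startswith("--")]
    if len(positional) != 1:
        raise ValueError("expected exactly one experiment slug")
    return positional[0], auto
```

With the Quick Start input, parse_exp_eval_args("exp-slug-123 --auto") yields ("exp-slug-123", True).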

Dependency Matrix

Required Modules

None required

Components

references

💻 Claude Code Installation

Recommended: Let Claude install it automatically. Simply copy and paste the text below into Claude Code.

Please help me install this Skill:
Name: exp-eval
Download link: https://github.com/duany049/multi-skill-orchestration/archive/main.zip#exp-eval

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
