exp-eval
Community
Judge experiments and update ideas automatically
Education & Research
#research orchestration #status transitions #evidence aggregation #experiment evaluation #llm verdict #wiki updates #graph edges
Author: duany049
Version: 1.0.0
Installs: 0
System Documentation
What problem does it solve?
It turns completed experiment results into an evidence-based verdict that updates the linked idea’s lifecycle status and records why it succeeded or failed.
Core Features & Use Cases
- Cross-model adjudication: Uses an impartial Review LLM to evaluate support vs refutation, then Claude synthesizes a final conservative outcome.
- Automated wiki updates: Writes changes to the linked idea page (status, failure_reason, date_resolved) and fills the experiment page’s “Idea updates” section.
- Graph evidence tracking: Adds `supports` or `invalidates` edges (experiment → idea) and rebuilds derived graph artifacts (context brief and open questions).
- Best-practice safety constraints: Refuses to run unless the experiment is marked `completed` and requires `linked_idea` to be present.
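The features above can be pictured with a minimal sketch. It is not the skill's actual implementation: the `Experiment` class, the functions `synthesize` and `plan_updates`, the verdict labels (`supported`, `refuted`, `inconclusive`), and the idea status names (`validated`, `failed`) are all hypothetical names chosen for illustration. Only the `completed` status, the `linked_idea` requirement, the `supports`/`invalidates` edge types, and the `status`, `failure_reason`, and `date_resolved` fields come from the description. The "conservative" rule shown here (keep a verdict only when both reviewers agree) is one plausible reading, stated as an assumption.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Hypothetical verdict -> edge type / idea status mappings (assumed names).
EDGE_FOR_VERDICT = {"supported": "supports", "refuted": "invalidates"}
STATUS_FOR_VERDICT = {"supported": "validated", "refuted": "failed"}


@dataclass
class Experiment:
    slug: str
    status: str
    linked_idea: Optional[str] = None


def check_preconditions(exp: Experiment) -> None:
    """Refuse to proceed unless the experiment is completed and linked."""
    if exp.status != "completed":
        raise ValueError(f"{exp.slug}: status is {exp.status!r}, not 'completed'")
    if not exp.linked_idea:
        raise ValueError(f"{exp.slug}: linked_idea is missing")


def synthesize(review_verdict: str, own_verdict: str) -> str:
    """Assumed conservative rule: keep a verdict only if both reviewers agree."""
    if review_verdict == own_verdict and review_verdict in EDGE_FOR_VERDICT:
        return review_verdict
    return "inconclusive"


def plan_updates(exp: Experiment, verdict: str, reason: str, resolved_on: date) -> dict:
    """Return the idea-page fields and graph edge implied by a verdict."""
    check_preconditions(exp)
    if verdict not in EDGE_FOR_VERDICT:
        # Inconclusive: change nothing on the idea page and add no edge.
        return {"idea_updates": {}, "edge": None}
    idea_updates = {
        "status": STATUS_FOR_VERDICT[verdict],
        "date_resolved": resolved_on.isoformat(),
    }
    if verdict == "refuted":
        idea_updates["failure_reason"] = reason
    # Edge direction is experiment -> idea, as in the description.
    edge = (exp.slug, EDGE_FOR_VERDICT[verdict], exp.linked_idea)
    return {"idea_updates": idea_updates, "edge": edge}
```

Under these assumptions, a disagreement between the two reviewers yields `inconclusive`, which leaves the idea page untouched and adds no edge, so only verdicts both reviewers share change an idea's lifecycle status.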
Quick Start
Use exp-eval to evaluate the completed experiment 'exp-slug-123' and update the linked idea with a single verdict: run exp-eval with 'exp-slug-123' as its argument, and include --auto to apply wiki changes without pausing.
Dependency Matrix
Required Modules
None required
Components
references
💻 Claude Code Installation
Recommended: Let Claude install automatically. Simply copy and paste the text below to Claude Code.
Please help me install this Skill:
Name: exp-eval
Download link: https://github.com/duany049/multi-skill-orchestration/archive/main.zip#exp-eval
Please download this .zip file, extract it, and install it in the .claude/skills/ directory.