exp-eval

Author: duany049

Community

Judge experiments and update ideas automatically

Category: Education & Research
Tags: research orchestration, status transitions, evidence aggregation, experiment evaluation, llm verdict, wiki updates, graph edges

Version: 1.0.0

Installs: 0

System Documentation

What problem does it solve?

It turns completed experiment results into an evidence-based verdict that updates the linked idea’s lifecycle status and records why it succeeded or failed.

Core Features & Use Cases

Cross-model adjudication: An impartial Review LLM weighs the evidence for support against the evidence for refutation; Claude then synthesizes a final, conservative outcome.
Automated wiki updates: Writes changes to the linked idea page (status, failure_reason, date_resolved) and fills in the experiment page’s “Idea updates” section.
Graph evidence tracking: Adds a supports or invalidates edge (experiment → idea) and rebuilds the derived graph artifacts (context brief and open questions).
Best-practice safety constraints: Refuses to run unless the experiment is marked completed and linked_idea is present.
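
The features above can be pictured with a minimal Python sketch. It is illustrative only and is not the skill's actual implementation: the Experiment class, the function names, the "inconclusive" outcome, the "validated" and "failed" status values, and the rule that both verdicts must agree are all assumptions made for this example. Only the field names status, failure_reason, date_resolved, and linked_idea, and the edge types supports and invalidates, come from the description above.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Experiment:
    # Hypothetical in-memory stand-in for an experiment wiki page.
    slug: str
    status: str
    linked_idea: Optional[str] = None


def check_preconditions(exp: Experiment) -> None:
    """Refuse to evaluate unless the experiment is completed and linked."""
    if exp.status != "completed":
        raise ValueError(f"{exp.slug}: experiment is not marked completed")
    if not exp.linked_idea:
        raise ValueError(f"{exp.slug}: linked_idea is missing")


def synthesize(review_verdict: str, own_verdict: str) -> str:
    """Assumed conservative rule: commit only when both verdicts agree."""
    if review_verdict == own_verdict and review_verdict in ("supports", "invalidates"):
        return review_verdict
    return "inconclusive"  # hypothetical fallback outcome


def plan_updates(exp: Experiment, verdict: str, reason: str, today: date) -> dict:
    """Return the idea-page fields and graph edge implied by a verdict."""
    check_preconditions(exp)
    if verdict == "inconclusive":
        # Nothing is resolved, so no fields change and no edge is added.
        return {"idea_fields": {}, "edge": None}
    fields = {
        # "validated" and "failed" are placeholder status names.
        "status": "validated" if verdict == "supports" else "failed",
        "date_resolved": today.isoformat(),
    }
    if verdict == "invalidates":
        fields["failure_reason"] = reason
    # Edge direction is experiment -> idea, typed by the verdict.
    edge = (exp.slug, verdict, exp.linked_idea)
    return {"idea_fields": fields, "edge": edge}
```

Returning a plan rather than writing to the wiki directly keeps the sketch free of side effects, which loosely mirrors the idea of reviewing changes before they are applied.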

Quick Start

Use exp-eval to evaluate the completed experiment 'exp-slug-123' and update the linked idea with a single verdict. Pass 'exp-slug-123' as the skill's argument, and include --auto to apply the wiki changes without pausing.
