os-eval-runner

Name: os-eval-runner
Author: richfrem

Community

Automates safe, autonomous skill evaluation loops.

Software Engineering #git #python #evaluation #skill-improvement #autonomous-loop #os-eval-runner

Version: 1.0.0

Installs: 0

System Documentation

What problem does it solve?

os-eval-runner is a stateless evaluation engine that scores and gates skill-improvement iterations using headless Python evaluation scripts. Use it when the user says "evaluate this skill", "run autoresearch loop on", "optimize this skill", or "run the eval loop", or when another agent proposes a change to an existing skill and needs empirical validation before the change is applied. It supports two modes: an autonomous loop mode for iterative improvement, and a single-shot QA mode for validating one specific proposed change. It requires Python 3.8+ and a git repository.

Core Features & Use Cases

Pure metric producer: eval_runner.py reads a target and evals.json and emits objective scores without side effects.
Loop gate: evaluate.py enforces baseline, keeps or reverts changes, and appends results to the per-target ledger.
Scaffold templates: init_autoresearch.py deploys standard program/evals/results templates into your experiment.
Autonomous optimization: runs iterative mutations on a single-mutation target with KEEP/DISCARD decisions, or validates a specific change (QA mode).
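
The split between a pure metric producer and a loop gate can be illustrated with a minimal sketch. This is not the code of eval_runner.py or evaluate.py. The eval-case schema (a "must_contain" key), the function names score_target and gate, the in-memory ledger, and the strict "greater than baseline" rule are all assumptions made for illustration; the real tool keeps or reverts changes in a git repository and writes to a per-target ledger.

```python
import json
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Verdict:
    score: float
    baseline: float
    decision: str  # "KEEP" or "DISCARD"


def score_target(text: str, evals: List[Dict[str, str]]) -> float:
    """Pure metric: fraction of eval cases whose phrase appears in text.

    Hypothetical schema: each case is {"must_contain": "<phrase>"}.
    The function has no side effects; it only returns a number.
    """
    if not evals:
        return 0.0
    passed = sum(1 for case in evals if case["must_contain"] in text)
    return passed / len(evals)


def gate(candidate: str, baseline_score: float,
         evals: List[Dict[str, str]], ledger: List[str]) -> Verdict:
    """Loop gate: score the candidate, compare to baseline, log the result.

    Assumption: a change is kept only if it strictly beats the baseline.
    The ledger here is a plain list of JSON lines standing in for a file.
    """
    score = score_target(candidate, evals)
    decision = "KEEP" if score > baseline_score else "DISCARD"
    ledger.append(json.dumps(
        {"score": score, "baseline": baseline_score, "decision": decision}
    ))
    return Verdict(score, baseline_score, decision)


# Usage: baseline the current text once, then gate one proposed change.
evals = [{"must_contain": "Python 3.8"}, {"must_contain": "git"}]
ledger: List[str] = []
baseline = score_target("Requires git.", evals)
v = gate("Requires Python 3.8+ and git.", baseline, evals, ledger)
print(v.decision, ledger[-1])
```

Keeping the scorer free of side effects means the same function can serve both modes: the autonomous loop calls the gate once per mutation, while QA mode calls it once for the single proposed change.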

Quick Start

Install and initialize an autoresearch experiment, then baseline the target with evaluate.py and start the loop.
