eval-creation-workflow

Name: eval-creation-workflow
Author: MRiabov

Community

Seed and verify Problemologist eval datasets.

Data & Analytics #validation #dataset #llm-evaluation #eval-seed #seed-workflow #handover-contracts

Version: 1.0.0

Installs: 0

System Documentation

What problem does it solve?

Creates or repairs Problemologist eval seeds. It adds role-based dataset rows together with seeded workspace artifacts that match the target stage and carry exact deterministic fields, then verifies them with minimal-scope runs of dataset/evals/run_evals.py. Use it when asked to add benchmark, engineer, or reviewer evals, or when a role-based eval dataset looks structurally invalid.

Core Features & Use Cases

Seeds role-based dataset rows and stage-aligned workspace artifacts with exact deterministic fields to enable reliable evaluation, auditing, and reproducibility.
Supports adding benchmark, engineer, and reviewer evals, and helps fix structurally invalid eval datasets through guided references and deterministic checks.
Validates seeds with minimal-scope runs via dataset/evals/run_evals.py to quickly detect schema or contract mismatches before full evaluation.
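
A structural check of the kind described above might look like the following minimal sketch. The field names (id, role, task), the role identifiers, and the rules themselves are hypothetical stand-ins chosen for illustration; the actual Problemologist dataset schema is not shown in this document.

```python
from typing import Any

# Hypothetical: role names taken from the prose above; real identifiers may differ.
ALLOWED_ROLES = {"benchmark", "engineer", "reviewer"}
# Hypothetical required fields; the real dataset schema is not documented here.
REQUIRED_FIELDS = ("id", "role", "task")


def validate_row(row: dict[str, Any]) -> list[str]:
    """Return a sorted list of problems found in one dataset row (empty if valid)."""
    problems = []
    for field in REQUIRED_FIELDS:
        if field not in row:
            problems.append(f"missing field: {field}")
        elif not isinstance(row[field], str) or not row[field].strip():
            problems.append(f"empty or non-string field: {field}")
    role = row.get("role")
    # Only report an unknown role when the field itself is a usable string.
    if isinstance(role, str) and role.strip() and role not in ALLOWED_ROLES:
        problems.append(f"unknown role: {role}")
    return sorted(problems)
```

Returning a sorted list rather than raising on the first problem keeps the output stable between runs, which fits the emphasis on deterministic checks.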

Quick Start

Seed a new eval for a target role by populating role-based dataset rows and seeded workspace artifacts with exact deterministic fields, then verify the seed with a minimal-scope run of dataset/evals/run_evals.py.
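
The verification step could be wrapped as in the sketch below. Only the script path comes from this document. The script's command-line arguments are not documented here, so the caller supplies them unchanged, and treating a nonzero exit code as a failed check is an assumption.

```python
import subprocess
import sys
from pathlib import Path

# Script path as named in this document.
RUN_EVALS = Path("dataset/evals/run_evals.py")


def build_command(extra_args: list[str]) -> list[str]:
    """Build the argv for a run; extra_args is whatever the script actually accepts."""
    return [sys.executable, str(RUN_EVALS), *extra_args]


def run_minimal(extra_args: list[str]) -> int:
    """Run the script and return its exit code.

    Assumption: the script signals failure through a nonzero exit code.
    """
    completed = subprocess.run(build_command(extra_args), check=False)
    return completed.returncode
```

Keeping the argument list opaque avoids guessing at flags; consult the script itself for the options that narrow a run to a single seed.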
