dw-skill-eval-verify

Community

Verify evaluation quality and regression health.

Author: xurik
Version: 1.0.0
Installs: 0

System Documentation

What problem does it solve?

It helps teams confirm that their skill evaluation workflows produce reliable, comparable results. It validates quality metrics, runs regression checks after changes, and reports which findings need attention.

Core Features & Use Cases

  • Automated quality checks across evaluation dimensions (distinguishability, difficulty distribution, judge consistency, calibration).
  • Baseline regression workflow to compare current scores against the latest baseline and generate regression reports.
  • Guided, automated reporting and artifact generation to support audits and release processes.
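The page does not say how each quality dimension is measured. As a rough illustration only, the Python sketch below assumes scores are stored as `scores[case_id][candidate]` in the range 0 to 1 and uses simple stand-in definitions: distinguishability is the share of cases where candidates receive different scores, difficulty is an easy/medium/hard split by mean score, judge consistency is 1 minus the mean gap between two judge runs on the same item, and calibration error is the mean gap between judge scores and reference (for example, human) scores. All function names, thresholds, and formulas here are hypothetical, not the skill's own.

```python
from statistics import mean

# Hypothetical layout: scores[case_id][candidate] -> score in [0, 1].
# Metric definitions and thresholds are illustrative assumptions only.


def distinguishability(scores: dict[str, dict[str, float]]) -> float:
    """Fraction of cases where the candidates did not all get the same score."""
    differing = sum(1 for per_cand in scores.values() if len(set(per_cand.values())) > 1)
    return differing / len(scores)


def difficulty_distribution(scores: dict[str, dict[str, float]],
                            easy: float = 0.8, hard: float = 0.3) -> dict[str, float]:
    """Share of easy / medium / hard cases, bucketed by mean score across candidates."""
    buckets = {"easy": 0, "medium": 0, "hard": 0}
    for per_cand in scores.values():
        m = mean(per_cand.values())
        if m >= easy:
            buckets["easy"] += 1
        elif m <= hard:
            buckets["hard"] += 1
        else:
            buckets["medium"] += 1
    return {k: v / len(scores) for k, v in buckets.items()}


def judge_consistency(repeat_runs: list[tuple[float, float]]) -> float:
    """1 minus the mean absolute difference between two judge runs on the same item."""
    return 1 - mean(abs(a - b) for a, b in repeat_runs)


def calibration_error(pairs: list[tuple[float, float]]) -> float:
    """Mean absolute gap between judge scores and reference scores."""
    return mean(abs(judge - ref) for judge, ref in pairs)


def quality_report(scores, repeat_runs, calib_pairs, *,
                   min_distinguish=0.5, min_consistency=0.9, max_calib=0.1) -> dict:
    """Bundle all checks and a single pass/fail flag (thresholds are hypothetical)."""
    dist = distinguishability(scores)
    cons = judge_consistency(repeat_runs)
    calib = calibration_error(calib_pairs)
    return {
        "distinguishability": dist,
        "difficulty": difficulty_distribution(scores),
        "judge_consistency": cons,
        "calibration_error": calib,
        "passed": dist >= min_distinguish and cons >= min_consistency and calib <= max_calib,
    }
```

Each check returns a plain number so it can be logged, compared across runs, or fed into the regression step; the single `passed` flag is only a convenience for gating a release.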

Quick Start

Run a verification pass with --quality to validate fresh evaluations, or --regression to compare against the latest baseline.
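The regression mode can be pictured as a per-key comparison against the newest baseline. This is a minimal sketch, not the skill's implementation: it assumes each baseline is a dict with an ISO-8601 `created_at` timestamp and a `scores` mapping, and it flags any key whose score dropped by more than a hypothetical `tolerance` of 0.02. The names `latest_baseline`, `compare_to_baseline`, and `regression_report` are illustrative.

```python
from dataclasses import dataclass


@dataclass
class Regression:
    key: str
    baseline: float
    current: float

    @property
    def delta(self) -> float:
        return self.current - self.baseline


def latest_baseline(baselines: list[dict]) -> dict:
    """Pick the baseline with the newest 'created_at' (assumed ISO-8601 strings)."""
    return max(baselines, key=lambda b: b["created_at"])


def compare_to_baseline(current: dict[str, float], baseline: dict[str, float],
                        tolerance: float = 0.02):
    """Return (regressions, keys missing from the current run)."""
    regressions, missing = [], []
    for key, base_score in baseline.items():
        if key not in current:
            missing.append(key)
        elif base_score - current[key] > tolerance:
            regressions.append(Regression(key, base_score, current[key]))
    return regressions, missing


def regression_report(current: dict[str, float], baselines: list[dict],
                      tolerance: float = 0.02) -> str:
    """Render a short text report, largest drops first."""
    base = latest_baseline(baselines)
    regressions, missing = compare_to_baseline(current, base["scores"], tolerance)
    lines = [f"Baseline: {base['created_at']}", f"Regressions: {len(regressions)}"]
    for r in sorted(regressions, key=lambda r: r.delta):
        lines.append(f"  {r.key}: {r.baseline:.2f} -> {r.current:.2f} ({r.delta:+.2f})")
    if missing:
        lines.append("Missing from current run: " + ", ".join(sorted(missing)))
    return "\n".join(lines)
```

Comparing ISO-8601 strings directly only orders them correctly when every timestamp uses the same format and time zone; otherwise parse them with `datetime.fromisoformat` first. Keys absent from the current run are listed separately rather than counted as regressions, so a shrinking test set stays visible.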

Dependency Matrix

Required Modules

None required

Components

Standard package

💻 Claude Code Installation

Recommended: let Claude install it automatically. Copy the text below and paste it into Claude Code.

Please help me install this Skill:
Name: dw-skill-eval-verify
Download link: https://github.com/xurik/dataworks-skill-evaluator/archive/main.zip#dw-skill-eval-verify

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
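To install by hand instead, the skill folder can be copied straight out of the downloaded archive. The sketch below assumes, based only on the `#dw-skill-eval-verify` fragment in the download link, that the repository archive contains a directory named `dw-skill-eval-verify`; the helper name `install_skill_from_zip` is hypothetical. It copies that directory's files into `.claude/skills/dw-skill-eval-verify/`.

```python
import shutil
import zipfile
from pathlib import Path, PurePosixPath


def install_skill_from_zip(zip_path, skill_name, skills_dir=".claude/skills"):
    """Copy the directory named `skill_name` out of a repository archive.

    Files end up in <skills_dir>/<skill_name>/; returns that path.
    """
    dest = Path(skills_dir) / skill_name
    with zipfile.ZipFile(zip_path) as zf:
        # Collect every archive path prefix that ends in a directory called skill_name.
        prefixes = set()
        for name in zf.namelist():
            parts = PurePosixPath(name).parts
            if skill_name in parts[:-1]:
                prefixes.add(parts[: parts.index(skill_name) + 1])
        if not prefixes:
            raise FileNotFoundError(f"no {skill_name!r} directory in {zip_path}")
        # Prefer the shallowest match, e.g. repo-main/dw-skill-eval-verify.
        prefix = min(prefixes, key=len)

        for info in zf.infolist():
            parts = PurePosixPath(info.filename).parts
            if info.is_dir() or parts[: len(prefix)] != prefix or len(parts) <= len(prefix):
                continue
            if ".." in parts:  # refuse entries that would escape the target folder
                continue
            target = dest.joinpath(*parts[len(prefix):])
            target.parent.mkdir(parents=True, exist_ok=True)
            with zf.open(info) as src, open(target, "wb") as out:
                shutil.copyfileobj(src, out)
    return dest
```

For example, download `main.zip` from the link above with a browser or `urllib.request.urlretrieve`, then call `install_skill_from_zip("main.zip", "dw-skill-eval-verify")` from your project root.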