dw-skill-eval-score

Community

End-to-end three-level scoring for skill evaluation.

Author: xurik
Version: 1.0.0
Installs: 0

System Documentation

What problem does it solve?

This Skill scores transcripts end to end to evaluate how well a DataWorks Skill performs. It combines three levels (rule-based checks, LLM judgments, and statistical aggregation) and writes the results to a transparent scores.yaml.

Core Features & Use Cases

  • Rule-based evaluation of transcript API calls, order, and parameters to ensure correct usage and safe operations.
  • LLM-based judging with redacted model information, multiple judges, and calibration checks to produce robust scores.
  • Layered statistics and calibration checks that summarize run scores per case and per model, enabling auditable quality control.
  • Use Case: Assess a new DataWorks Skill by running the evaluation pipeline to generate a complete scores.yaml for comparison.
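The three levels listed above can be illustrated with a minimal Python sketch. It is not the Skill's actual implementation: the function names, the API call names (`create_node`, `submit_node`, `deploy_node`), the `max_spread` threshold, and the shape of the result dictionaries are all assumptions made for illustration, and the real scores.yaml schema is not documented here.

```python
from statistics import mean, pstdev


def rule_score(calls, required_order):
    """Level 1 (rules): fraction of required API calls found in the expected order.

    `calls` is the list of API call names seen in a transcript (hypothetical names).
    """
    pos = 0
    matched = 0
    for name in required_order:
        try:
            idx = calls.index(name, pos)  # search only after the last match
        except ValueError:
            continue  # call is missing or out of order
        matched += 1
        pos = idx + 1
    return matched / len(required_order) if required_order else 1.0


def judge_score(scores, max_spread=1.0):
    """Level 2 (LLM judges): average several judge scores and flag disagreement.

    `max_spread` is an assumed calibration threshold on the standard deviation.
    """
    spread = pstdev(scores)
    return {"mean": mean(scores), "spread": spread, "calibrated": spread <= max_spread}


def aggregate(runs):
    """Level 3 (statistics): summarize (case, score) pairs per case."""
    by_case = {}
    for case, score in runs:
        by_case.setdefault(case, []).append(score)
    return {case: {"n": len(v), "mean": mean(v)} for case, v in by_case.items()}
```

A run that calls the APIs out of order would lose part of its rule score, while judges whose scores diverge by more than the threshold would be flagged as uncalibrated rather than silently averaged.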

Quick Start

Provide a skill folder that contains a SKILL.md file, then run the evaluation workflow to generate scores.yaml.

Dependency Matrix

Required Modules

None required

Components

Standard package

💻 Claude Code Installation

Recommended: Let Claude install it automatically. Copy the text below and paste it into Claude Code.

Please help me install this Skill:
Name: dw-skill-eval-score
Download link: https://github.com/xurik/dataworks-skill-evaluator/archive/main.zip#dw-skill-eval-score

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
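For a manual install, the extraction step might look like the Python sketch below. It assumes the archive has already been downloaded to a local path and that the skill lives in a folder named `dw-skill-eval-score` somewhere inside the zip; the actual archive layout is not documented here, and `extract_skill` is a hypothetical helper, not part of any tool.

```python
import zipfile
from pathlib import Path, PurePosixPath


def extract_skill(archive_path, skill_name, skills_root):
    """Copy files found under a `<skill_name>/` folder in the zip
    into `<skills_root>/<skill_name>/` and return the written paths.

    Assumption: the skill is a directory named `skill_name` inside the archive.
    """
    target = Path(skills_root) / skill_name
    written = []
    with zipfile.ZipFile(archive_path) as zf:
        for info in zf.infolist():
            parts = PurePosixPath(info.filename).parts
            # Keep only files that sit below a directory called skill_name.
            if info.is_dir() or skill_name not in parts[:-1]:
                continue
            rel = parts[parts.index(skill_name) + 1:]
            if ".." in rel:
                continue  # refuse paths that try to escape the target folder
            out = target.joinpath(*rel)
            out.parent.mkdir(parents=True, exist_ok=True)
            out.write_bytes(zf.read(info))
            written.append(out)
    return written
```

Calling `extract_skill("main.zip", "dw-skill-eval-score", ".claude/skills")` would then place the skill's files, including SKILL.md if present, under `.claude/skills/dw-skill-eval-score/`.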
