Name: dw-skill-eval-score
Author: xurik

System Documentation

What problem does it solve?

This Skill automates end-to-end scoring of transcripts to evaluate DataWorks Skill performance, combining rule-based checks, LLM judgments, and statistical aggregation to produce a transparent scores.yaml.

Core Features & Use Cases

Rule-based evaluation of transcript API calls, order, and parameters to ensure correct usage and safe operations.
LLM-based judging with redacted model information, multiple judges, and calibration checks to produce robust scores.
Layered statistics and calibration checks that summarize run scores per case and per model, enabling auditable quality control.
Use Case: Assess a new DataWorks Skill by running the evaluation pipeline to generate a complete scores.yaml for comparison.
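The rule-based and statistical layers described above can be illustrated with a minimal sketch. It is not the Skill's actual implementation: the function names, the record fields (model, case, judge_scores), and the API call names are hypothetical placeholders, and the real scores.yaml schema may differ. The sketch checks that required API calls appear in the expected order, then averages judge scores per run, per case, and per model.

```python
from collections import defaultdict
from statistics import mean


def calls_in_order(actual_calls, required_order):
    """Return True if required_order occurs as a subsequence of actual_calls.

    Both arguments are lists of API call names taken from a transcript.
    The names used in the example below are placeholders, not a real API list.
    """
    remaining = iter(actual_calls)
    # `name in remaining` advances the iterator, so order is enforced.
    return all(name in remaining for name in required_order)


def aggregate(runs):
    """Layered averaging: judges -> run, runs -> case, cases -> model.

    Each run is a dict with hypothetical keys:
    "model", "case", and "judge_scores" (one score per LLM judge).
    """
    per_case = defaultdict(list)
    for run in runs:
        run_score = mean(run["judge_scores"])  # average over judges
        per_case[(run["model"], run["case"])].append(run_score)
    case_means = {key: mean(vals) for key, vals in per_case.items()}

    per_model = defaultdict(list)
    for (model, _case), value in case_means.items():
        per_model[model].append(value)
    model_means = {model: mean(vals) for model, vals in per_model.items()}
    return case_means, model_means


if __name__ == "__main__":
    transcript_calls = ["ListProjects", "CreateNode", "DeployNode"]
    print(calls_in_order(transcript_calls, ["CreateNode", "DeployNode"]))

    runs = [
        {"model": "m1", "case": "c1", "judge_scores": [4, 2]},
        {"model": "m1", "case": "c1", "judge_scores": [5, 5]},
        {"model": "m1", "case": "c2", "judge_scores": [2, 2]},
    ]
    print(aggregate(runs))
```

Averaging in layers keeps each case equally weighted within a model, so a case with many runs does not dominate the per-model score.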

Quick Start

Place a skill folder with SKILL.md and run the evaluation workflow to generate scores.yaml.

Please help me install this Skill:
Name: dw-skill-eval-score
Download link: https://github.com/xurik/dataworks-skill-evaluator/archive/main.zip#dw-skill-eval-score
Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
