dw-skill-eval-score
Community
End-to-end three-level scoring for skill evaluation.
Data & Analytics
#calibration #claude-code #transcripts #skill-evaluation #llm-judge #evaluation-pipeline #scores.yaml
Author: xurik
Version: 1.0.0
Installs: 0
System Documentation
What problem does it solve?
This Skill automates end-to-end scoring of transcripts to evaluate DataWorks Skill performance, combining rule-based checks, LLM judgments, and statistical aggregation to produce a transparent scores.yaml.
Core Features & Use Cases
- Rule-based evaluation of transcript API calls, order, and parameters to ensure correct usage and safe operations.
- LLM-based judging with redacted model information, multiple judges, and calibration checks to produce robust scores.
- Layered statistics and calibration checks that summarize run scores per case and per model, enabling auditable quality control.
- Use Case: Assess a new DataWorks Skill by running the evaluation pipeline to generate a complete scores.yaml for comparison.
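The three levels described above (rule checks, LLM judging, layered statistics) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the Skill's source: the transcript layout, the API names `create_node` and `submit_node`, the `max_spread` threshold, and the use of the median across judges are all hypothetical.

```python
from collections import defaultdict
from statistics import mean, median


def rule_score(calls, required_order, required_params):
    """Level 1 (rule-based): fraction of rule checks that pass.

    `calls` is a hypothetical transcript format: an ordered list of
    {"api": str, "params": dict} entries.
    """
    names = [c["api"] for c in calls]
    checks = []
    # Order check: required_order must appear as a subsequence of the calls.
    remaining = iter(names)
    checks.append(all(api in remaining for api in required_order))
    # Parameter checks: every call to a listed API carries the required keys.
    for api, params in required_params.items():
        matching = [c for c in calls if c["api"] == api]
        ok = bool(matching) and all(
            set(params) <= set(c.get("params", {})) for c in matching
        )
        checks.append(ok)
    return sum(checks) / len(checks)


def judge_score(scores, max_spread=0.3):
    """Level 2 (LLM judges): median of several judge scores plus a
    simple agreement flag (assumed calibration check)."""
    spread = max(scores) - min(scores)
    return median(scores), spread <= max_spread


def aggregate(runs):
    """Level 3 (statistics): mean per (model, case), then mean per model."""
    per_case = defaultdict(list)
    for run in runs:
        per_case[(run["model"], run["case"])].append(run["score"])
    case_means = {key: mean(vals) for key, vals in per_case.items()}
    per_model = defaultdict(list)
    for (model, _case), value in case_means.items():
        per_model[model].append(value)
    model_means = {model: mean(vals) for model, vals in per_model.items()}
    return case_means, model_means


# Hypothetical example data.
transcript = [
    {"api": "create_node", "params": {"name": "demo"}},
    {"api": "submit_node", "params": {"node_id": 1}},
]
level1 = rule_score(
    transcript,
    required_order=["create_node", "submit_node"],
    required_params={"create_node": ["name"]},
)
level2, judges_agree = judge_score([0.8, 0.6, 0.7])
case_means, model_means = aggregate([
    {"model": "A", "case": "c1", "score": 1.0},
    {"model": "A", "case": "c1", "score": 0.5},
    {"model": "A", "case": "c2", "score": 0.25},
])
```

Averaging per case before averaging per model keeps a case with many runs from dominating the model-level figure; whether the Skill weights runs this way is not stated in the listing.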
Quick Start
Provide a skill folder that contains SKILL.md, then run the evaluation workflow to generate scores.yaml.
Dependency Matrix
Required Modules
None required
Components
Standard package
💻 Claude Code Installation
Recommended: Let Claude install it automatically. Copy the text below and paste it into Claude Code.
Please help me install this Skill:
Name: dw-skill-eval-score
Download link: https://github.com/xurik/dataworks-skill-evaluator/archive/main.zip#dw-skill-eval-score
Please download this .zip file, extract it, and install it in the .claude/skills/ directory.