meta-eval-judge
Community
Rigorous, evidence-based scoring of agent outputs
System Documentation
What problem does it solve?
This Skill automatically and rigorously evaluates a single agent or subagent output against a provided YAML rubric or a gold-reference ExpectedOutput, producing an evidence-backed numeric score, a validated calculation trace, and actionable suggestions for prompt engineering.
Core Features & Use Cases
- Rubric-driven scoring: Parse atomic YAML rubrics, evaluate each criterion with direct evidence excerpts, sum the achieved points and the available positive points, and compute a mathematically verified final score.
- Gold-reference comparison: When no Judge rubric is provided, compare ActualOutput to ExpectedOutput across semantic consistency, content completeness, and format/experience, with weighted aggregation.
- Structured, machine-parseable reports: Emit a standardized Markdown evaluation report including total score, per-dimension judgments, full mathematical verification, advantages, shortcomings with root-cause tags, and specific prompt-level improvement suggestions.
- RunLog-aware diagnostics: Optionally ingest run logs to assess tool-calling patterns, retries, and efficiency, and reflect those findings in relevant dimensions.
- Use cases: CI test harness for agent development, subagent evaluation within meta-plan orchestration, and iterative prompt-engineering feedback loops.
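The two scoring modes above can be illustrated with a minimal Python sketch. It is not the Skill's actual implementation: the `Criterion` fields, the "achieved divided by positive points" formula, the clamping to 0..100, and the example weights are all assumptions made for illustration, since the listing does not show the rubric schema or the real weights.

```python
from dataclasses import dataclass


@dataclass
class Criterion:
    # Hypothetical field names; the real YAML rubric schema is not shown here.
    name: str
    points: float   # positive = reward, negative = penalty
    met: bool       # whether the criterion was judged as triggered
    evidence: str   # excerpt from the output supporting the judgment


def score_rubric(criteria):
    """Return (score, trace) for judged criteria, assuming score = achieved / positive."""
    achieved = sum(c.points for c in criteria if c.met)
    positive = sum(c.points for c in criteria if c.points > 0)
    if positive <= 0:
        raise ValueError("rubric needs at least one positive-point criterion")
    # Assumption: clamp so penalties cannot push the score outside 0..100.
    score = max(0.0, min(100.0, 100.0 * achieved / positive))
    trace = f"clamp({achieved:g} / {positive:g} * 100, 0, 100) = {score:.1f}"
    return score, trace


def weighted_score(dimension_scores, weights):
    """Weighted mean of per-dimension scores; weights need not sum to 1."""
    if set(dimension_scores) != set(weights):
        raise ValueError("dimensions and weights must match")
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(dimension_scores[d] * weights[d] for d in weights) / total_weight


# Rubric mode: two rewards met, one missed, one penalty triggered.
criteria = [
    Criterion("cites sources", 5, True, "'According to the spec, ...'"),
    Criterion("covers edge cases", 3, False, "no mention of empty input"),
    Criterion("valid Markdown", 2, True, "headings and lists render"),
    Criterion("hallucinated API", -2, True, "'client.magic_call()'"),
]
rubric_score, rubric_trace = score_rubric(criteria)

# Gold-reference mode: illustrative weights only.
gold_score = weighted_score(
    {"semantic_consistency": 80, "content_completeness": 60, "format_experience": 100},
    {"semantic_consistency": 5, "content_completeness": 3, "format_experience": 2},
)
```

Keeping the trace as a string next to the numeric score mirrors the idea of a calculation that a reader can re-check by hand.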
Quick Start
Evaluate a single test case by supplying the TestCaseFile path, the CaseIndex, and the ActualOutputFile. The skill reads these inputs, runs the chosen evaluation mode, and writes the structured report case_N_eval_result.md to the specified output directory.
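As a rough illustration of the inputs and the output location, the Python sketch below bundles the three Quick Start parameters and derives the report path. The `EvalRequest` class and `result_path` helper are hypothetical names, and the sketch assumes that the `N` in case_N_eval_result.md is the CaseIndex, which the listing implies but does not state.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass
class EvalRequest:
    # Hypothetical container for the Quick Start inputs.
    test_case_file: Path
    case_index: int
    actual_output_file: Path
    output_dir: Path


def result_path(request):
    """Build the report path, assuming N in case_N_eval_result.md is the CaseIndex."""
    return request.output_dir / f"case_{request.case_index}_eval_result.md"


request = EvalRequest(
    test_case_file=Path("tests/cases.yaml"),        # illustrative path
    case_index=3,
    actual_output_file=Path("runs/case_3_output.md"),  # illustrative path
    output_dir=Path("eval_results"),
)
report = result_path(request)
```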
Dependency Matrix
Required Modules
None required
Components
Standard package
💻 Claude Code Installation
Recommended: Let Claude install it automatically. Copy the text below and paste it into Claude Code.
Please help me install this Skill: Name: meta-eval-judge Download link: https://github.com/slowman2084/meta-agent/archive/main.zip#meta-eval-judge Please download this .zip file, extract it, and install it in the .claude/skills/ directory.