diyu-eval-harness
Community
Governance-focused AI behavior evaluation harness.
Software Engineering
#automation #evaluation #governance #scorecard #ai-assessment #capability-eval #regression-eval
Author: andyan77
Version: 1.0.0
Installs: 0
System Documentation
What problem does it solve?
This Skill provides a structured framework for evaluating and scoring AI behavior along three axes (governance, capability, and regression) for eight registered objects, so that decisions rest on objective, traceable results.
Core Features & Use Cases
- Governance evaluation: verify presence and integrity of governance artifacts and ensure alignment with upstream references.
- Capability evaluation: assess declared capabilities against actual behavior for active objects.
- Regression evaluation: compare current results against a baseline to detect degradation or improvement.
- Scorecard generation: compute a 6-dimension, 10-point scorecard and emit both human-readable reports and machine-readable artifacts.
- Evidence and auditing: produce Markdown reports and YAML scorecards suitable for governance reviews and archives.
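The scorecard and regression features above can be illustrated with a minimal sketch. It is not the Skill's actual implementation: the dimension names (`dim_1` to `dim_6`), the reading of "10-point" as a 0 to 10 score per dimension, the use of a plain mean as the overall score, and the `tolerance` threshold are all assumptions made for illustration.

```python
from statistics import mean

# Hypothetical dimension names: the source only says there are six.
DIMENSIONS = ("dim_1", "dim_2", "dim_3", "dim_4", "dim_5", "dim_6")
MAX_POINTS = 10.0  # assumed: each dimension is scored from 0 to 10


def overall_score(scores):
    """Average the six per-dimension scores after validating them."""
    missing = [d for d in DIMENSIONS if d not in scores]
    if missing:
        raise ValueError(f"missing dimensions: {missing}")
    for dim in DIMENSIONS:
        if not 0.0 <= scores[dim] <= MAX_POINTS:
            raise ValueError(f"{dim} is outside 0..{MAX_POINTS}")
    return round(mean(scores[d] for d in DIMENSIONS), 2)


def compare_to_baseline(current, baseline, tolerance=0.5):
    """Label each dimension as degraded, improved, or unchanged."""
    verdicts = {}
    for dim in DIMENSIONS:
        delta = current[dim] - baseline[dim]
        if delta < -tolerance:
            verdicts[dim] = "degraded"
        elif delta > tolerance:
            verdicts[dim] = "improved"
        else:
            verdicts[dim] = "unchanged"
    return verdicts


# Made-up example data, not real evaluation results.
baseline = dict.fromkeys(DIMENSIONS, 8.0)
current = {**baseline, "dim_2": 6.5, "dim_5": 9.0}
print(overall_score(current))
print(compare_to_baseline(current, baseline))
```

A tolerance band keeps small score fluctuations from being reported as regressions; the right width would depend on how noisy the underlying checks are.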
Quick Start
Invoke the evaluation harness to run governance, capability, and regression checks across all registered objects and generate the 6-dimension scorecard outputs.
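As a rough picture of what such a run covers, the sketch below loops three checks over a registry of eight objects and renders the outcome as a Markdown table. Everything named here is hypothetical: the object names, the `CHECKS` mapping, the always-passing placeholder checks, and the `run_all` and `to_markdown` helpers are illustrative assumptions, not the Skill's interface.

```python
# Hypothetical registry: the source says eight objects are registered
# but does not name them.
REGISTERED_OBJECTS = [f"object-{i}" for i in range(1, 9)]

# Placeholder checks that always pass; real checks would inspect
# governance artifacts, declared capabilities, and baseline results.
CHECKS = {
    "governance": lambda name: True,
    "capability": lambda name: True,
    "regression": lambda name: True,
}


def run_all(objects, checks):
    """Run every check against every object and collect pass/fail flags."""
    return {
        obj: {label: bool(check(obj)) for label, check in checks.items()}
        for obj in objects
    }


def to_markdown(results, check_names):
    """Render the collected results as a Markdown table."""
    header = "| Object | " + " | ".join(check_names) + " |"
    divider = "|" + " --- |" * (len(check_names) + 1)
    rows = [
        "| " + obj + " | "
        + " | ".join("pass" if flags[c] else "fail" for c in check_names)
        + " |"
        for obj, flags in results.items()
    ]
    return "\n".join([header, divider, *rows])


results = run_all(REGISTERED_OBJECTS, CHECKS)
print(to_markdown(results, list(CHECKS)))
```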
Dependency Matrix
Required Modules
None required
Components
Standard package
💻 Claude Code Installation
Recommended: let Claude install it automatically. Copy the text below and paste it into Claude Code.
Please help me install this Skill:
Name: diyu-eval-harness
Download link: https://github.com/andyan77/diyu-agent/archive/main.zip#diyu-eval-harness
Please download this .zip file, extract it, and install it in the .claude/skills/ directory.