skill-benchmark

Community

Quantify skill impact with controlled benchmarks.

Author: pkuppens
Version: 1.0.0
Installs: 0

System Documentation

What problem does it solve?

Evaluates the real-world impact of a skill by comparing agent outputs with and without the skill activated. The comparison provides evidence for validating skill improvements, reviewing PRs, and running regression tests.

Core Features & Use Cases

  • Benchmark design: defines reproducible tasks and standardized prompts for consistent evaluation.
  • Baseline vs. skill comparison: captures outputs for the same task with and without the skill to measure the delta in quality and coverage.
  • Reporting and scoring: produces a scored, interpretable report showing where the skill adds value.
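The scoring and reporting idea can be sketched as a weighted rubric applied to both outputs. This is only an illustration: the criterion names, weights, and the 0 to 10 scale below are assumptions, not the skill's actual scoring framework, which this page does not spell out.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical rubric (criterion -> weight); the skill's real criteria may differ.
RUBRIC = {"correctness": 0.5, "coverage": 0.3, "clarity": 0.2}


@dataclass
class ScoredOutput:
    label: str                # "baseline" or "skill"
    scores: dict[str, float]  # criterion -> score on an assumed 0-10 scale


def weighted_total(output: ScoredOutput, rubric: dict[str, float] = RUBRIC) -> float:
    """Combine per-criterion scores into one weighted total."""
    return sum(weight * output.scores[name] for name, weight in rubric.items())


def delta_report(baseline: ScoredOutput, skill: ScoredOutput,
                 rubric: dict[str, float] = RUBRIC) -> str:
    """Render a plain-text table showing where the skill adds value."""
    rows = [f"{'criterion':<12} {'base':>6} {'skill':>6} {'delta':>6}"]
    for name in rubric:
        base, with_skill = baseline.scores[name], skill.scores[name]
        rows.append(f"{name:<12} {base:>6.1f} {with_skill:>6.1f} {with_skill - base:>+6.1f}")
    base_total = weighted_total(baseline, rubric)
    skill_total = weighted_total(skill, rubric)
    rows.append(f"{'total':<12} {base_total:>6.1f} {skill_total:>6.1f} "
                f"{skill_total - base_total:>+6.1f}")
    return "\n".join(rows)


if __name__ == "__main__":
    # Illustrative scores only; not measured results.
    baseline = ScoredOutput("baseline", {"correctness": 6, "coverage": 5, "clarity": 7})
    skill = ScoredOutput("skill", {"correctness": 8, "coverage": 8, "clarity": 7})
    print(delta_report(baseline, skill))
```

Keeping the per-criterion rows alongside the total makes it visible when a skill improves one dimension while leaving another unchanged.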

Quick Start

Select a representative benchmark task, run baseline output without the skill, run with the skill activated, and compare results using the predefined scoring framework.
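A minimal sketch of that loop, assuming the agent can be wrapped in a callable that takes a prompt and a flag for whether the skill is active. The `Agent` type, `stub_agent`, and the keyword-coverage scorer are hypothetical stand-ins for illustration; they are not part of the skill or of Claude Code.

```python
from typing import Callable

# Hypothetical agent interface: (prompt, skill_enabled) -> output text.
Agent = Callable[[str, bool], str]
Scorer = Callable[[str], float]


def run_benchmark(prompt: str, agent: Agent, score: Scorer) -> dict:
    """Run the same prompt with and without the skill and score both outputs."""
    baseline_output = agent(prompt, False)  # skill not activated
    skill_output = agent(prompt, True)      # skill activated
    baseline_score = score(baseline_output)
    skill_score = score(skill_output)
    return {
        "baseline": baseline_score,
        "skill": skill_score,
        "delta": skill_score - baseline_score,
    }


def keyword_coverage(required: list) -> Scorer:
    """Toy scorer: fraction of required keywords that appear in the output."""
    def score(text: str) -> float:
        lowered = text.lower()
        hits = sum(1 for keyword in required if keyword.lower() in lowered)
        return hits / len(required)
    return score


def stub_agent(prompt: str, skill_enabled: bool) -> str:
    """Placeholder for a real agent call; returns canned text for the demo."""
    if skill_enabled:
        return "Plan: write tests, add logging, handle errors."
    return "Plan: write tests."


if __name__ == "__main__":
    scorer = keyword_coverage(["tests", "logging", "errors"])
    print(run_benchmark("Describe a release checklist.", stub_agent, scorer))
```

Using one fixed prompt and one scorer for both runs keeps the comparison controlled, so any delta can be attributed to the skill rather than to a change in the task.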

Dependency Matrix

Required Modules

None required

Components

Standard package

💻 Claude Code Installation

Recommended: let Claude install the skill automatically. Copy the text below and paste it into Claude Code.

Please help me install this Skill:
Name: skill-benchmark
Download link: https://github.com/pkuppens/pkuppens/archive/main.zip#skill-benchmark

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
