pop-benchmark-runner
Community
Run side-by-side benchmark trials for PopKit
Author: jrc1883
Version: 1.0.0
Installs: 0
System Documentation
What problem does it solve?
The Benchmark Runner skill automates the end-to-end process of running side-by-side benchmarks that compare PopKit-enabled Claude Code against a baseline, delivering objective measurements and reports.
Core Features & Use Cases
- Orchestrates paired trials (WITH PopKit vs BASELINE) across separate workspaces to enable real-time comparison.
- Collects detailed recordings, runs statistical analysis (t-tests, Cohen's d, confidence intervals), and generates comprehensive reports.
- Produces markdown and HTML reports to share insights with stakeholders, CI pipelines, or team dashboards.
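The statistics named above (a t-test, Cohen's d, and a confidence interval) can be illustrated with a minimal sketch. This is not the skill's own code: the function name `paired_summary`, its parameters, and the sample numbers are hypothetical, and the sketch assumes each WITH PopKit trial is paired with one BASELINE trial on the same task. The confidence interval uses a normal approximation because the Python standard library has no Student's t quantile function, so it will be somewhat too narrow for small samples.

```python
import math
import statistics


def paired_summary(with_popkit, baseline, confidence=0.95):
    """Summarize paired trial measurements (hypothetical helper, not PopKit's API).

    Returns the mean difference, the paired t-statistic, Cohen's d for
    paired samples (d_z), and an approximate confidence interval.
    """
    if len(with_popkit) != len(baseline):
        raise ValueError("paired trials need equal-length samples")
    if len(with_popkit) < 2:
        raise ValueError("need at least two paired trials")

    # Per-pair differences: negative means the PopKit run scored lower.
    diffs = [w - b for w, b in zip(with_popkit, baseline)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    if sd_diff == 0:
        raise ValueError("all differences are identical; t and d are undefined")

    std_err = sd_diff / math.sqrt(n)
    t_stat = mean_diff / std_err   # paired t-statistic, n - 1 degrees of freedom
    cohens_d = mean_diff / sd_diff  # d_z: effect size for paired designs

    # Normal approximation to the interval (assumption: no t quantiles in stdlib).
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    ci = (mean_diff - z * std_err, mean_diff + z * std_err)

    return {
        "n": n,
        "mean_diff": mean_diff,
        "t_stat": t_stat,
        "cohens_d": cohens_d,
        "ci": ci,
    }


if __name__ == "__main__":
    # Made-up durations in minutes, for illustration only.
    summary = paired_summary([10, 12, 11, 13], [12, 15, 13, 16])
    for key, value in summary.items():
        print(f"{key}: {value}")
```

For a p-value or an exact small-sample interval, a statistics library with a Student's t distribution would be the more appropriate tool; the sketch only shows how the three quantities relate to the per-pair differences.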
Quick Start
Run a benchmark using the command pattern /popkit-ops:benchmark run <task-id> to compare PopKit-enabled vs baseline Claude Code.
Dependency Matrix
Required Modules
pyyaml
Components
scripts
💻 Claude Code Installation
Recommended: Let Claude install it automatically. Copy the text below and paste it into Claude Code.
Please help me install this Skill:
Name: pop-benchmark-runner
Download link: https://github.com/jrc1883/popkit-ai/archive/main.zip#pop-benchmark-runner
Please download this .zip file, extract it, and install it in the .claude/skills/ directory.