pop-benchmark-runner

Community

Run side-by-side benchmark trials for PopKit

Author: jrc1883
Version: 1.0.0
Installs: 0

System Documentation

What problem does it solve?

The Benchmark Runner skill automates the end-to-end process of running side-by-side benchmarks that compare PopKit-enabled Claude Code against a baseline, delivering objective measurements and reports.

Core Features & Use Cases

  • Orchestrates paired trials (WITH PopKit vs BASELINE) across separate workspaces to enable real-time comparison.
  • Collects detailed recordings, runs statistical analysis (t-tests, Cohen's d, confidence intervals), and generates comprehensive reports.
  • Produces markdown and HTML reports to share insights with stakeholders, CI pipelines, or team dashboards.
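
The statistics named above can be illustrated with a minimal sketch. This is not the skill's own analysis code. The helper name `compare_trials`, its return keys, and the sample values are all hypothetical. It assumes each trial yields one numeric metric (for example, a duration), uses Welch's t statistic for two independent samples, computes Cohen's d from the pooled standard deviation, and approximates the confidence interval with a normal quantile instead of the t distribution. That approximation is rough for small samples.

```python
import math
import statistics
from statistics import NormalDist


def compare_trials(with_popkit, baseline, confidence=0.95):
    """Summarize one metric across two groups of trials (hypothetical helper).

    Each group needs at least two values and non-zero variance.
    """
    n1, n2 = len(with_popkit), len(baseline)
    m1, m2 = statistics.mean(with_popkit), statistics.mean(baseline)
    v1, v2 = statistics.variance(with_popkit), statistics.variance(baseline)

    # Welch's t statistic: mean difference over its standard error.
    diff = m1 - m2
    se = math.sqrt(v1 / n1 + v2 / n2)
    t_stat = diff / se

    # Cohen's d: mean difference in units of the pooled standard deviation.
    pooled_sd = math.sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2))
    cohens_d = diff / pooled_sd

    # Assumption: normal approximation for the interval (z, not t, quantile).
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    ci = (diff - z * se, diff + z * se)

    return {"mean_diff": diff, "t": t_stat, "cohens_d": cohens_d, "ci": ci}


# Illustrative, made-up durations in seconds; not real benchmark data.
result = compare_trials([10, 12, 11, 13], [14, 15, 13, 16])
print(result)
```

A negative `mean_diff` here means the first group scored lower on the metric. Whether lower is better depends on what the metric measures.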

Quick Start

Run a benchmark using the command pattern /popkit-ops:benchmark run <task-id> to compare PopKit-enabled vs baseline Claude Code.

Dependency Matrix

Required Modules

pyyaml

Components

scripts

💻 Claude Code Installation

Recommended: Let Claude install it automatically. Copy the text below and paste it into Claude Code.

Please help me install this Skill:
Name: pop-benchmark-runner
Download link: https://github.com/jrc1883/popkit-ai/archive/main.zip#pop-benchmark-runner

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.