Name: llm-benchmark
Author: andreasronge

System Documentation

What problem does it solve?

This skill helps you design rigorous, statistically powered benchmarks for evaluating LLM prompts and configurations, so that experiments yield actionable, unbiased conclusions.

Core Features & Use Cases

Rigorous experimental design templates for policy vs. mechanism benchmarks
Data leakage prevention guidance that keeps prompts domain-blind and test data separate
Per-turn metrics and statistical analysis guidance for interpreting results across models and tasks
Use cases include prompt ablation studies, A/B tests of prompts, and the design of held-out test suites
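
As an illustration of the kind of statistical analysis an A/B test of two prompts calls for, here is a minimal Python sketch using only the standard library. It is not taken from the skill itself: the function names `wilson_interval` and `two_proportion_p_value` are hypothetical, and the choice of a Wilson score interval and a pooled two-proportion z-test is an assumption about what a reasonable analysis might look like.

```python
import math
from statistics import NormalDist


def wilson_interval(passes: int, trials: int, confidence: float = 0.95) -> tuple[float, float]:
    """Wilson score interval for an observed pass rate (hypothetical helper)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    # Two-sided critical value of the standard normal distribution.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    p = passes / trials
    denom = 1 + z * z / trials
    centre = (p + z * z / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


def two_proportion_p_value(passes_a: int, trials_a: int, passes_b: int, trials_b: int) -> float:
    """Two-sided p-value of a pooled two-proportion z-test (hypothetical helper)."""
    pooled = (passes_a + passes_b) / (trials_a + trials_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / trials_a + 1 / trials_b))
    if se == 0:
        # Both prompts passed (or failed) every trial: no evidence of a difference.
        return 1.0
    z = (passes_a / trials_a - passes_b / trials_b) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


if __name__ == "__main__":
    # Made-up counts, for illustration only.
    print(wilson_interval(42, 60))
    print(two_proportion_p_value(42, 60, 51, 60))
```

Reporting an interval alongside each pass rate makes it harder to over-read a difference that is within noise; the z-test is an approximation and is less reliable when pass counts are very small.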

Quick Start

Run a baseline benchmark on your prompts and report observed pass rates, per-turn metrics, and recommended sample sizes.
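
To make "recommended sample sizes" concrete, the following is a rough Python sketch of a standard power calculation for comparing two pass rates. It is an assumption about how such a recommendation could be derived, not the skill's own method; the function name `samples_per_arm` and its default significance level and power are hypothetical choices.

```python
import math
from statistics import NormalDist


def samples_per_arm(p_baseline: float, p_variant: float,
                    alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate runs needed per prompt to detect p_baseline vs p_variant.

    Uses the normal-approximation formula for a two-sided two-proportion
    z-test with equal group sizes (hypothetical helper).
    """
    if not (0 < p_baseline < 1 and 0 < p_variant < 1):
        raise ValueError("pass rates must be strictly between 0 and 1")
    if p_baseline == p_variant:
        raise ValueError("pass rates must differ to define an effect size")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p_baseline + p_variant) / 2
    numerator = (
        z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * math.sqrt(p_baseline * (1 - p_baseline) + p_variant * (1 - p_variant))
    ) ** 2
    return math.ceil(numerator / (p_baseline - p_variant) ** 2)


if __name__ == "__main__":
    # Illustrative pass rates only: a 70% baseline versus a hoped-for 80%.
    print(samples_per_arm(0.70, 0.80))
```

Smaller expected differences between prompts require sharply more runs, so it helps to fix the minimum difference worth detecting before collecting data.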

Please help me install this Skill:
Name: llm-benchmark
Download link: https://github.com/andreasronge/ptc_runner/archive/main.zip#llm-benchmark
Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
