Name: coding-benchmark-runner
Availability: InStock
Author: crycriM

System Documentation

What problem does it solve?

The Coding Benchmark Runner enables rapid evaluation of local model performance on coding tasks by running a standardized 15-problem Python benchmark against models served through the llama.cpp router.

Core Features & Use Cases

Automates end-to-end benchmarking of coding tasks across local models.
Provides per-problem timing, token usage, and scoring data for objective comparison.
Use Case: A team wants to compare multiple local models to select the best candidate for code generation, then iterate on improvements.

Quick Start

Run the benchmark by executing the Python harness against your local model served via the llama.cpp router using the prompts.jsonl file

Please help me install this Skill: Name: coding-benchmark-runner Download link: https://github.com/crycriM/hermes-skills/archive/main.zip#coding-benchmark-runner Please download this .zip file, extract it, and install it in the .claude/skills/ directory.

coding-benchmark-runner

System Documentation

What problem does it solve?

Core Features & Use Cases

Quick Start

Dependency Matrix

Required Modules

Components

💻 Claude Code Installation

Agent Skills Search Helper