coding-benchmark-runner

Community

Benchmark local coding models in 15 problems.

AuthorcrycriM
Version1.0.0
Installs0

System Documentation

What problem does it solve?

The Coding Benchmark Runner enables rapid evaluation of local model performance on coding tasks by running a standardized 15-problem Python benchmark against models served through the llama.cpp router.

Core Features & Use Cases

  • Automates end-to-end benchmarking of coding tasks across local models.
  • Provides per-problem timing, token usage, and scoring data for objective comparison.
  • Use Case: A team wants to compare multiple local models to select the best candidate for code generation, then iterate on improvements.

Quick Start

Run the benchmark by executing the Python harness against your local model served via the llama.cpp router using the prompts.jsonl file

Dependency Matrix

Required Modules

None required

Components

Standard package

💻 Claude Code Installation

Recommended: Let Claude install automatically. Simply copy and paste the text below to Claude Code.

Please help me install this Skill:
Name: coding-benchmark-runner
Download link: https://github.com/crycriM/hermes-skills/archive/main.zip#coding-benchmark-runner

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
View Source Repository

Agent Skills Search Helper

Install a tiny helper to your Agent, search and equip skill from 471,000+ vetted skills library on demand.