coding-benchmark-runner
CommunityBenchmark local coding models in 15 problems.
AuthorcrycriM
Version1.0.0
Installs0
System Documentation
What problem does it solve?
The Coding Benchmark Runner enables rapid evaluation of local model performance on coding tasks by running a standardized 15-problem Python benchmark against models served through the llama.cpp router.
Core Features & Use Cases
- Automates end-to-end benchmarking of coding tasks across local models.
- Provides per-problem timing, token usage, and scoring data for objective comparison.
- Use Case: A team wants to compare multiple local models to select the best candidate for code generation, then iterate on improvements.
Quick Start
Run the benchmark by executing the Python harness against your local model served via the llama.cpp router using the prompts.jsonl file
Dependency Matrix
Required Modules
None requiredComponents
Standard package💻 Claude Code Installation
Recommended: Let Claude install automatically. Simply copy and paste the text below to Claude Code.
Please help me install this Skill: Name: coding-benchmark-runner Download link: https://github.com/crycriM/hermes-skills/archive/main.zip#coding-benchmark-runner Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
Agent Skills Search Helper
Install a tiny helper to your Agent, search and equip skill from 471,000+ vetted skills library on demand.