pinchbench
Official
Benchmark real-world AI coding agents.
Author: pinchbench
Version: 1.0.0
Installs: 0
System Documentation
What problem does it solve?
PinchBench benchmarks how well AI models perform as the brains of OpenClaw agents by executing real-world tasks and surfacing results on a public leaderboard.
Core Features & Use Cases
- Real-world, end-to-end task execution across productivity, research, writing, coding, analysis, and memory
- Flexible scoring models: automated, llm_judge, and hybrid with per-task rubrics
- Leaderboard submission and model comparison to drive improvements
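The three scoring modes listed above could be combined roughly as follows. This is a minimal sketch, not PinchBench's actual scoring code: the 0-to-1 score range, the `TaskScore` and `combine` names, and the `judge_weight` blend are all assumptions made for illustration; only the mode names `automated`, `llm_judge`, and `hybrid` come from the description above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TaskScore:
    """Scores for one task, each assumed to lie in [0, 1] (hypothetical layout)."""
    automated: Optional[float] = None   # result of programmatic checks
    llm_judge: Optional[float] = None   # result of a rubric-based LLM grade


def combine(score: TaskScore, mode: str, judge_weight: float = 0.5) -> float:
    """Return a single task score for the given scoring mode.

    The weighted average used for "hybrid" is an assumption; the real
    benchmark may blend the two signals differently.
    """
    if mode == "automated":
        if score.automated is None:
            raise ValueError("automated score missing")
        return score.automated
    if mode == "llm_judge":
        if score.llm_judge is None:
            raise ValueError("llm_judge score missing")
        return score.llm_judge
    if mode == "hybrid":
        if score.automated is None or score.llm_judge is None:
            raise ValueError("hybrid mode needs both scores")
        return (1 - judge_weight) * score.automated + judge_weight * score.llm_judge
    raise ValueError(f"unknown scoring mode: {mode}")
```

A per-task rubric would then only need to declare which mode applies and, for hybrid tasks, how much weight the judge receives.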
Quick Start
Run uv run benchmark.py --model <provider/model> to start benchmarking an OpenClaw agent.
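To benchmark several models in a row, the command above can be assembled programmatically. The sketch below only builds and prints the command lines; the model identifiers are placeholders, and the `build_command` helper is a hypothetical name, not part of PinchBench. Only the `uv run benchmark.py --model <provider/model>` form is taken from the Quick Start.

```python
import shlex
from typing import List


def build_command(model: str) -> List[str]:
    """Build the Quick Start command for one model in provider/model form."""
    if "/" not in model:
        raise ValueError("expected a model in the form provider/model")
    return ["uv", "run", "benchmark.py", "--model", model]


if __name__ == "__main__":
    # Placeholder identifiers; substitute real provider/model names.
    for model in ["example-provider/model-a", "example-provider/model-b"]:
        print(shlex.join(build_command(model)))
```

Each list can be passed to `subprocess.run(cmd, check=True)` from the directory that contains `benchmark.py` to actually start a run.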
Dependency Matrix
Required Modules
pyyaml, fabric, paramiko
Components
scripts, assets
💻 Claude Code Installation
Recommended: let Claude install it automatically. Copy the text below and paste it into Claude Code.
Please help me install this Skill:
Name: pinchbench
Download link: https://github.com/pinchbench/skill/archive/main.zip#pinchbench
Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
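For a manual install, the same download-and-extract steps could be scripted along these lines. This is a sketch under stated assumptions: it treats the repository root as the skill folder and installs it as `.claude/skills/pinchbench`, which the page does not confirm, and the `extract_skill` and `install` helpers are hypothetical names. It relies only on the Python standard library.

```python
import shutil
import tempfile
import zipfile
from pathlib import Path
from urllib.parse import urldefrag
from urllib.request import urlretrieve

ARCHIVE_URL = "https://github.com/pinchbench/skill/archive/main.zip#pinchbench"


def extract_skill(zip_path: Path, skills_dir: Path, name: str) -> Path:
    """Extract zip_path into skills_dir/name, replacing any existing copy."""
    target = skills_dir / name
    with tempfile.TemporaryDirectory() as tmp:
        with zipfile.ZipFile(zip_path) as zf:
            zf.extractall(tmp)
        entries = list(Path(tmp).iterdir())
        # Archives that wrap everything in one top-level folder are unwrapped.
        root = entries[0] if len(entries) == 1 and entries[0].is_dir() else Path(tmp)
        if target.exists():
            shutil.rmtree(target)
        shutil.copytree(root, target)
    return target


def install(skills_dir: Path = Path(".claude/skills")) -> Path:
    """Download the archive and install it (assumed layout, see lead-in)."""
    url, _fragment = urldefrag(ARCHIVE_URL)  # drop the "#pinchbench" fragment
    with tempfile.TemporaryDirectory() as tmp:
        zip_path = Path(tmp) / "skill.zip"
        urlretrieve(url, zip_path)
        return extract_skill(zip_path, skills_dir, "pinchbench")
```

Calling `install()` from a project root would place the files under `.claude/skills/pinchbench`; check the extracted layout before relying on it.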