pinchbench

Official

Benchmark real-world AI coding agents.

Author: pinchbench
Version: 1.0.0

System Documentation

What problem does it solve?

PinchBench measures how well AI models perform as the brains of OpenClaw agents. The agent executes real-world tasks, and the results are scored and published on a public leaderboard.

Core Features & Use Cases

  • Real-world, end-to-end task execution across productivity, research, writing, coding, analysis, and memory
  • Flexible scoring modes (automated, llm_judge, and hybrid), each configured with per-task rubrics
  • Leaderboard submission and model comparison to drive improvements
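The three scoring modes above can be illustrated with a small sketch. It assumes each task yields an automated check score and an LLM-judge rubric score, both normalized to 0.0-1.0, and that "hybrid" is a weighted blend of the two. The TaskResult type, the score function, and the 50/50 default weight are hypothetical and are not taken from PinchBench's code.

```python
from dataclasses import dataclass


@dataclass
class TaskResult:
    """Hypothetical per-task result; field names are illustrative only."""
    automated: float  # fraction of automated checks passed, 0.0-1.0
    llm_judge: float  # LLM-judge rubric score, normalized to 0.0-1.0


def score(result: TaskResult, mode: str, judge_weight: float = 0.5) -> float:
    """Return a 0.0-1.0 score for one task under the given scoring mode.

    ASSUMPTION: "hybrid" is a weighted blend, and the 0.5 default weight
    is illustrative, not PinchBench's actual rule.
    """
    if not 0.0 <= judge_weight <= 1.0:
        raise ValueError("judge_weight must be between 0 and 1")
    if mode == "automated":
        return result.automated
    if mode == "llm_judge":
        return result.llm_judge
    if mode == "hybrid":
        # Blend the automated and judge signals by judge_weight.
        return (1 - judge_weight) * result.automated + judge_weight * result.llm_judge
    raise ValueError(f"unknown scoring mode: {mode!r}")
```

For example, score(TaskResult(automated=1.0, llm_judge=0.5), "hybrid") returns 0.75. A weighted blend is only one plausible way to combine the two signals; the per-task rubrics determine how PinchBench actually grades each task.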

Quick Start

To start benchmarking an OpenClaw agent, run:

  uv run benchmark.py --model <provider/model>

replacing <provider/model> with the identifier of the model under test.
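When comparing several models, the Quick Start command can be scripted. The sketch below assumes benchmark.py is in the current directory and that uv is on the PATH. --model is the only flag taken from the documentation above; the helper names and the provider/model format check are hypothetical. By default it performs a dry run that only prints the commands.

```python
import re
import shlex
import subprocess

# Hypothetical sanity check: a non-empty provider, a slash, then a model name.
MODEL_ID = re.compile(r"^[^/\s]+/\S+$")


def benchmark_command(model: str) -> list[str]:
    """Build the Quick Start command for one model (only --model is documented)."""
    if not MODEL_ID.match(model):
        raise ValueError(f"expected <provider/model>, got {model!r}")
    return ["uv", "run", "benchmark.py", "--model", model]


def run_benchmarks(models: list[str], dry_run: bool = True) -> list[str]:
    """Print (and, unless dry_run is True, execute) one benchmark run per model."""
    printed = []
    for model in models:
        cmd = benchmark_command(model)
        line = shlex.join(cmd)  # shell-safe rendering for logging
        print(line)
        printed.append(line)
        if not dry_run:
            # check=True raises CalledProcessError if a run exits non-zero.
            subprocess.run(cmd, check=True)
    return printed
```

Calling run_benchmarks(["example-provider/example-model"], dry_run=False) would execute the benchmark for that placeholder ID; substitute real provider/model identifiers.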

Dependency Matrix

Required Modules

  • pyyaml
  • fabric
  • paramiko
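Before running the benchmark, it can help to confirm that the required modules are importable. Note that the pyyaml package is imported under the name yaml. This is a minimal sketch using only the standard library; the missing_modules helper is hypothetical.

```python
import importlib.util

# Package name -> import name. PyYAML is imported as "yaml";
# fabric and paramiko are imported under their own names.
REQUIRED_MODULES = {"pyyaml": "yaml", "fabric": "fabric", "paramiko": "paramiko"}


def missing_modules(required: dict[str, str] = REQUIRED_MODULES) -> list[str]:
    """Return the package names whose import name cannot be found."""
    return [pkg for pkg, mod in required.items()
            if importlib.util.find_spec(mod) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All required modules are importable.")
```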

Components

  • scripts
  • assets

💻 Claude Code Installation

Recommended: let Claude install the skill automatically. Copy the text below and paste it into Claude Code.

Please help me install this Skill:
Name: pinchbench
Download link: https://github.com/pinchbench/skill/archive/main.zip#pinchbench

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
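For a manual installation, the steps in the prompt above (download, extract, install into .claude/skills/) can be sketched with the standard library. The sketch assumes the archive contains a single top-level folder, which GitHub branch archives typically do, and that the skill should live at .claude/skills/pinchbench relative to the current directory. The function names and the target folder name are assumptions.

```python
import shutil
import tempfile
import urllib.request
import zipfile
from pathlib import Path
from urllib.parse import urldefrag

SKILL_URL = "https://github.com/pinchbench/skill/archive/main.zip#pinchbench"


def install_skill_zip(zip_path: Path, skills_dir: Path, name: str) -> Path:
    """Extract a skill archive and move its contents to skills_dir/name.

    ASSUMPTION: the archive holds exactly one top-level directory.
    """
    target = skills_dir / name
    if target.exists():
        raise FileExistsError(f"{target} already exists")
    with tempfile.TemporaryDirectory() as tmp:
        with zipfile.ZipFile(zip_path) as zf:
            zf.extractall(tmp)
        roots = list(Path(tmp).iterdir())
        if len(roots) != 1 or not roots[0].is_dir():
            raise ValueError("expected exactly one top-level directory in the archive")
        skills_dir.mkdir(parents=True, exist_ok=True)
        shutil.move(str(roots[0]), str(target))
    return target


def download_and_install(url: str = SKILL_URL,
                         skills_dir: Path = Path(".claude/skills"),
                         name: str = "pinchbench") -> Path:
    """Download the archive (dropping the '#pinchbench' fragment) and install it."""
    with tempfile.TemporaryDirectory() as tmp:
        zip_path = Path(tmp) / "skill.zip"
        with urllib.request.urlopen(urldefrag(url).url) as resp, open(zip_path, "wb") as out:
            shutil.copyfileobj(resp, out)
        return install_skill_zip(zip_path, skills_dir, name)
```

Splitting extraction from downloading keeps install_skill_zip testable offline against any local archive.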
