Name: ai-agent-bench
Author: reidemeister94

System Documentation

What problem does it solve?

This Skill enables quantitative benchmarking of AI agents (Claude Code, Codex, OpenCode) on real coding tasks within the current repository, so you can compare performance, accuracy, and behavior.

Core Features & Use Cases

Orchestrated agent trials: isolated worktrees, baseline checks, agent execution, post-checks, and artifact collection.
Rich telemetry: transcripts, diffs, timings, and resulting metrics for cross-agent comparison.
Real-world scenarios: refactoring tasks, performance experiments, and agent evaluation across diverse codebases.
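The trial sequence listed above (baseline check, agent execution, post-check, with timings and output kept as artifacts) can be sketched roughly as follows. This is a minimal illustration, not the Skill's actual implementation: `StepResult`, `TrialResult`, `run_step`, `run_trial`, `worktree_command`, and `demo` are hypothetical names, and the "agent" in the demo is a stand-in Python one-liner rather than a real Claude Code, Codex, or OpenCode invocation. The only real external command referenced is `git worktree add --detach <path> <commit-ish>`, which is built but not executed here.

```python
import subprocess
import sys
import tempfile
import time
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class StepResult:
    name: str          # "baseline", "agent", or "post_check"
    returncode: int    # exit code of the command
    seconds: float     # wall-clock duration
    output: str        # captured stdout + stderr (a crude transcript)


@dataclass
class TrialResult:
    agent: str
    steps: list = field(default_factory=list)

    @property
    def passed(self) -> bool:
        # Assumption: a trial passes when the post-check exits with 0.
        return any(s.name == "post_check" and s.returncode == 0 for s in self.steps)


def worktree_command(path: Path, ref: str = "HEAD") -> list:
    # Builds (does not run) the git command for an isolated checkout.
    return ["git", "worktree", "add", "--detach", str(path), ref]


def run_step(name: str, cmd: list, cwd: Path) -> StepResult:
    # Run one command in the trial directory and record its timing and output.
    start = time.perf_counter()
    proc = subprocess.run(cmd, cwd=cwd, capture_output=True, text=True)
    elapsed = time.perf_counter() - start
    return StepResult(name, proc.returncode, elapsed, proc.stdout + proc.stderr)


def run_trial(agent: str, workdir: Path, check_cmd: list, agent_cmd: list) -> TrialResult:
    # Same check before and after the agent, so the two results are comparable.
    result = TrialResult(agent)
    result.steps.append(run_step("baseline", check_cmd, workdir))
    result.steps.append(run_step("agent", agent_cmd, workdir))
    result.steps.append(run_step("post_check", check_cmd, workdir))
    return result


def demo() -> TrialResult:
    # Hypothetical task: the check succeeds once done.txt exists.
    check = [sys.executable, "-c",
             "import pathlib, sys; sys.exit(0 if pathlib.Path('done.txt').exists() else 1)"]
    # Stand-in for an agent: a one-liner that creates the file.
    fake_agent = [sys.executable, "-c", "open('done.txt', 'w').write('ok')"]
    with tempfile.TemporaryDirectory() as tmp:
        return run_trial("stand-in-agent", Path(tmp), check, fake_agent)


if __name__ == "__main__":
    trial = demo()
    for step in trial.steps:
        print(f"{step.name}: rc={step.returncode} time={step.seconds:.2f}s")
    print("passed:", trial.passed)
```

In a real run, the temporary directory would presumably be replaced by a worktree created with the command from `worktree_command`, and one `TrialResult` per agent would be collected so the timings and check outcomes can be compared side by side.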

Quick Start

Run the ai-agent-bench workflow on a repository with a task prompt to benchmark and compare agents.

Please help me install this Skill:
Name: ai-agent-bench
Download link: https://github.com/reidemeister94/development-skills/archive/main.zip#ai-agent-bench
Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
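If you prefer to do the extraction step by hand, the sketch below shows one way to copy a single skill folder out of an already downloaded archive into a skills directory. It is an illustration under assumptions: `extract_skill` is a hypothetical helper, and the code assumes the archive contains a directory named exactly `ai-agent-bench` somewhere in its tree. The real layout of the archive is not described here, so check it before relying on this. Downloading is left out so the sketch runs offline.

```python
import zipfile
from pathlib import Path, PurePosixPath


def extract_skill(zip_path, skill_name, skills_dir):
    """Copy the folder called `skill_name` from the archive into `skills_dir`.

    Hypothetical helper: assumes the skill lives in a directory named
    `skill_name` at some depth inside the archive.
    """
    target = Path(skills_dir) / skill_name
    extracted = 0
    with zipfile.ZipFile(zip_path) as archive:
        for info in archive.infolist():
            parts = PurePosixPath(info.filename).parts
            # Skip directory entries and files outside the skill folder.
            if info.is_dir() or skill_name not in parts[:-1]:
                continue
            # Keep only the path below the skill folder.
            rel = parts[parts.index(skill_name) + 1:]
            if ".." in rel:
                continue  # ignore entries that try to escape the target
            out = target.joinpath(*rel)
            out.parent.mkdir(parents=True, exist_ok=True)
            out.write_bytes(archive.read(info))
            extracted += 1
    if extracted == 0:
        raise FileNotFoundError(f"no '{skill_name}' folder found in {zip_path}")
    return target


if __name__ == "__main__":
    # Assumes main.zip was already downloaded into the current directory.
    installed = extract_skill("main.zip", "ai-agent-bench", Path(".claude") / "skills")
    print("installed to", installed)
```

The helper strips everything above the skill folder, so the files end up directly under `.claude/skills/ai-agent-bench/` regardless of how deeply the folder is nested in the archive.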
