gaia-playbook
Official
GAIA benchmark playbook for failure-mode debugging.
Data & Analytics
#benchmark #tool-design #gaia #failure-modes #trajectory-analysis #score-improvement #harnessx
Author: Darwin-Agent
Version: 1.0.0
Installs: 0
System Documentation
What problem does it solve?
Provides guidance for the GAIA benchmark: diagnosing gaps in agent trajectories, mapping failure modes to actionable interventions, and streamlining tool-spec authoring to improve scores.
Core Features & Use Cases
- Maps GAIA failure modes (A-H) to concrete tooling patterns and harness configurations.
- Provides templates and best practices for drafting TOOL_SPEC.md and coordinating with reference materials.
- Supports teams in planning, experimentation, and score improvement across multi-hop GAIA tasks.
Quick Start
Identify your GAIA trajectory gaps, then draft a TOOL_SPEC.md that addresses the most critical missing capability.
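One way to decide which gap is "most critical" is to tally failure-mode labels across failed runs. The sketch below assumes each failed trajectory has already been tagged by hand with one of the playbook's A-H labels. The record shape and the function name `rank_failure_modes` are hypothetical illustrations, not part of the skill itself.

```python
from collections import Counter

# Assumption: failures are recorded as (task_id, label) pairs, label in "A".."H".
VALID_MODES = set("ABCDEFGH")

def rank_failure_modes(labeled_failures):
    """Return (label, count) pairs, most frequent first, ties broken by label."""
    counts = Counter()
    for task_id, mode in labeled_failures:
        if mode not in VALID_MODES:
            raise ValueError(f"unknown failure mode {mode!r} for task {task_id}")
        counts[mode] += 1
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))

# Illustrative data only, not real GAIA results.
example = [("t1", "C"), ("t2", "A"), ("t3", "C"),
           ("t4", "F"), ("t5", "C"), ("t6", "A")]
ranking = rank_failure_modes(example)
top_mode = ranking[0][0]  # the label to target first in TOOL_SPEC.md
```

Raw frequency is only one possible criterion. A team might instead weight each label by how cheaply a new tool could address it.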
Dependency Matrix
Required Modules
None required
Components
Standard package
Claude Code Installation
Recommended: let Claude install the skill automatically by pasting the text below into Claude Code.
Please help me install this Skill:
Name: gaia-playbook
Download link: https://github.com/Darwin-Agent/HarnessX/archive/main.zip#gaia-playbook
Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
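For a manual install, the extraction step might look like the sketch below. It assumes the archive has already been downloaded and that the skill lives in a folder named `gaia-playbook` somewhere inside it. The actual layout of the repository is not stated here, so that is a guess, and `extract_skill` is a hypothetical helper name.

```python
import zipfile
from pathlib import Path, PurePosixPath

def extract_skill(zip_path, skill_name, skills_dir):
    """Copy every file under a folder named `skill_name` into skills_dir/skill_name."""
    target_root = Path(skills_dir) / skill_name
    written = []
    with zipfile.ZipFile(zip_path) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            parts = PurePosixPath(info.filename).parts
            # Only keep files that sit below a directory named skill_name.
            if skill_name not in parts[:-1]:
                continue
            relative = parts[parts.index(skill_name) + 1:]
            if ".." in relative:
                continue  # skip entries that try to escape the target folder
            out_path = target_root.joinpath(*relative)
            out_path.parent.mkdir(parents=True, exist_ok=True)
            out_path.write_bytes(archive.read(info))
            written.append(out_path)
    return written

# Usage (assumed local file name):
# extract_skill("main.zip", "gaia-playbook", ".claude/skills")
```

Only the skill's own folder is copied, so the rest of the repository does not end up in `.claude/skills/`.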