research-agent-benchmark

Community

Benchmark autonomous research subagents locally.

Author: crycriM
Version: 1.0.0
Installs: 0

System Documentation

What problem does it solve?

Enable systematic evaluation of local LLMs as autonomous research subagents, producing reproducible references, experiment results, and a composite score for fair model comparison.

Core Features & Use Cases

  • End-to-end benchmark orchestration: reference generation, contender runs, and scoring on a structured academic topic.
  • Structured deliverables: reference responses and metadata suitable for comparison and replication.
  • Use Case: researchers compare how autonomously different models conduct research and how well they synthesize findings on scholarly topics.
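The description does not say how the composite score is computed. The following is a minimal sketch that assumes each criterion is scored in [0, 1] and that criteria are combined by a weighted mean. The criteria names (`autonomy`, `synthesis`, borrowed from the use case above), the weights, and the helpers `composite_score` and `rank_models` are all hypothetical and not part of the skill.

```python
# Hypothetical criteria and weights: the skill's actual scoring rubric is not documented here.
DEFAULT_WEIGHTS: dict[str, float] = {"autonomy": 0.5, "synthesis": 0.5}


def composite_score(scores: dict[str, float], weights: dict[str, float] = DEFAULT_WEIGHTS) -> float:
    """Weighted mean of per-criterion scores, each assumed to lie in [0, 1]."""
    missing = set(weights) - set(scores)
    if missing:
        raise ValueError(f"missing criteria: {sorted(missing)}")
    for name in weights:
        if not 0.0 <= scores[name] <= 1.0:
            raise ValueError(f"score for {name!r} outside [0, 1]: {scores[name]}")
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    # Dividing by the total weight means the weights need not sum to 1.
    return sum(weights[name] * scores[name] for name in weights) / total_weight


def rank_models(
    results: dict[str, dict[str, float]],
    weights: dict[str, float] = DEFAULT_WEIGHTS,
) -> list[tuple[str, float]]:
    """Return (model, composite score) pairs, best model first."""
    ranked = [(model, composite_score(scores, weights)) for model, scores in results.items()]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)
```

Applying the same weights to every contender keeps the comparison fair. Missing criteria raise an error instead of being silently treated as zero, so an incomplete run cannot quietly lower a model's rank.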

Quick Start

Prompt the system to initiate Phase 1 reference generation for a chosen topic and review the generated reference and metadata.

Dependency Matrix

Required Modules

None required

Components

Standard package

💻 Claude Code Installation

Recommended: let Claude install it automatically. Copy and paste the text below into Claude Code.

Please help me install this Skill:
Name: research-agent-benchmark
Download link: https://github.com/crycriM/hermes-skills/archive/main.zip#research-agent-benchmark

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
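A manual installation does the same thing as the prompt: fetch the repository archive, extract the skill's folder, and copy it into `.claude/skills/`. The sketch below assumes that the archive contains a directory named `research-agent-benchmark`, which the `#research-agent-benchmark` fragment in the download link suggests but does not confirm. `install_skill` is a hypothetical helper and is not part of the skill.

```python
import shutil
import tempfile
import zipfile
from pathlib import Path


def install_skill(archive: Path, skill_name: str, skills_dir: Path) -> Path:
    """Copy the folder named `skill_name` from a repository archive into `skills_dir`.

    Assumption: the skill lives in a directory called `skill_name` somewhere inside
    the archive (for example hermes-skills-main/research-agent-benchmark/).
    """
    with tempfile.TemporaryDirectory() as tmp:
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(tmp)
        # Collect every extracted directory whose name matches the skill.
        matches = [p for p in Path(tmp).rglob(skill_name) if p.is_dir()]
        if not matches:
            raise FileNotFoundError(f"{skill_name!r} not found in {archive}")
        source = min(matches, key=lambda p: len(p.parts))  # prefer the shallowest match
        target = skills_dir / skill_name
        if target.exists():
            shutil.rmtree(target)  # replace any previously installed copy
        skills_dir.mkdir(parents=True, exist_ok=True)
        shutil.copytree(source, target)
    return target
```

For example, after saving `https://github.com/crycriM/hermes-skills/archive/main.zip` as `main.zip`, you could call `install_skill(Path("main.zip"), "research-agent-benchmark", Path(".claude/skills"))` to place the skill in `.claude/skills/research-agent-benchmark`. The URL fragment is never sent to the server, so it can be omitted when downloading.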