research-agent-benchmark
Community
Benchmark autonomous research subagents locally.
Education & Research
#tooling #llm-evaluation #literature-search #citation-validation #research-benchmark #autonomous-subagents
Author: crycriM
Version: 1.0.0
Installs: 0
System Documentation
What problem does it solve?
It enables systematic evaluation of local LLMs as autonomous research subagents, producing reproducible references, experiment results, and a composite score for fair model comparison.
Core Features & Use Cases
- End-to-end benchmark orchestration: reference generation, contender runs, and scoring on a structured academic topic.
- Structured deliverables: reference responses and metadata suitable for comparison and replication.
- Use Case: researchers compare model autonomy and synthesis quality across models on scholarly topics.
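The listing mentions a composite score but does not say how it is computed. As a rough illustration only, the sketch below assumes three hypothetical sub-metrics (`citation_validity`, `coverage`, `synthesis`), each normalized to [0, 1], and combines them with assumed weights. The metric names, weights, and the `ContenderResult` type are not taken from the skill itself.

```python
from dataclasses import dataclass


@dataclass
class ContenderResult:
    """Hypothetical per-model metrics, each normalized to [0, 1]."""
    model: str
    citation_validity: float  # share of cited references that check out
    coverage: float           # overlap with the reference response
    synthesis: float          # judged quality of the written synthesis


# Assumed weights for illustration only; they sum to 1.0.
WEIGHTS = {"citation_validity": 0.4, "coverage": 0.3, "synthesis": 0.3}


def composite_score(result: ContenderResult) -> float:
    """Weighted average of the sub-metrics, scaled to 0-100."""
    total = sum(getattr(result, name) * weight for name, weight in WEIGHTS.items())
    return round(100 * total, 1)


def rank(results: list[ContenderResult]) -> list[ContenderResult]:
    """Order contenders from highest to lowest composite score."""
    return sorted(results, key=composite_score, reverse=True)


if __name__ == "__main__":
    contenders = [
        ContenderResult("model-a", 0.5, 0.5, 0.5),
        ContenderResult("model-b", 1.0, 1.0, 1.0),
    ]
    for r in rank(contenders):
        print(r.model, composite_score(r))
```

Keeping the weights in a single mapping makes it easy to apply the same scoring to every contender, which is what makes the comparison between models fair.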
Quick Start
Prompt the system to initiate Phase 1 reference generation for a chosen topic and review the generated reference and metadata.
Dependency Matrix
Required Modules
None required
Components
Standard package
Claude Code Installation
Recommended: let Claude install the Skill automatically. Copy the text below and paste it into Claude Code.
Please help me install this Skill:
Name: research-agent-benchmark
Download link: https://github.com/crycriM/hermes-skills/archive/main.zip#research-agent-benchmark
Please download this .zip file, extract it, and install it in the .claude/skills/ directory.