mat-bench-client
Community
Run full mat-agent-bench evaluations end to end
Tags: Software Engineering, CLI, REST API, agent evaluation, materials science, mat-bench-client, benchmark scoring, multipart submission
Author: ruoyuwang1995nya
Version: 1.0.0
Installs: 0
System Documentation
What problem does it solve?
This Skill automates the setup, execution, submission, and scoring loop for evaluating AI agents on materials science benchmark questions hosted by a mat-bench server.
Core Features & Use Cases
- One-time authentication & session management: Creates and persists a server session using a provided API token so you can run multiple questions without reconfiguring.
- Question discovery and prompt retrieval: Lists questions by capability/domain and fetches a specific question to obtain the full prompt and required data filenames.
- Data download, result submission, and polling: Downloads the question’s data files, uploads produced outputs for grading, and polls until checkpoints and weighted scores are available.
- Use cases: Run structured materials-science tasks (structure retrieval/construction, input generation, workflow orchestration, batch processing, diagnosis, scientific analysis, execution contract checks, and safety/refusal) and produce auditable, scored runs for agents.
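The polling and scoring step can be sketched in Python as follows. This is a minimal illustration, not the Skill's actual implementation: the status payload shape (a "state" field and a "checkpoints" list with "weight" and "passed" keys), the weight-normalized scoring rule, and the names poll_until_scored and weighted_score are all assumptions made for the example. The status fetcher is passed in as a callable, so the loop can be exercised without a running mat-bench server.

```python
import time
from typing import Callable, Dict, List


def weighted_score(checkpoints: List[Dict]) -> float:
    """Return the weight-normalized fraction of passed checkpoints.

    ASSUMPTION: each checkpoint is a dict with a numeric "weight" and a
    boolean "passed". The real server may score differently.
    """
    total = sum(c["weight"] for c in checkpoints)
    if total == 0:
        return 0.0
    earned = sum(c["weight"] for c in checkpoints if c["passed"])
    return earned / total


def poll_until_scored(
    fetch_status: Callable[[], Dict],
    interval_s: float = 2.0,
    timeout_s: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Dict:
    """Call fetch_status repeatedly until it reports a scored submission.

    ASSUMPTION: the payload carries state == "scored" once grading is done.
    Raises TimeoutError if that does not happen within timeout_s seconds.
    """
    deadline = clock() + timeout_s
    while True:
        status = fetch_status()
        if status.get("state") == "scored":
            return status
        if clock() >= deadline:
            raise TimeoutError("submission was not scored before the timeout")
        sleep(interval_s)


if __name__ == "__main__":
    # Hypothetical responses standing in for successive server replies.
    fake_replies = iter([
        {"state": "pending"},
        {"state": "scored", "checkpoints": [
            {"weight": 2.0, "passed": True},
            {"weight": 1.0, "passed": False},
            {"weight": 1.0, "passed": True},
        ]},
    ])
    result = poll_until_scored(lambda: next(fake_replies), interval_s=0.0)
    print(weighted_score(result["checkpoints"]))
```

Injecting the fetcher, sleep function, and clock keeps the loop independent of any particular HTTP library and makes the timeout behavior easy to test.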
Quick Start
Ask mat-bench-client to fetch a question prompt and data by running mat-bench-client question SR_db_001_20260411v2.
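For scripted runs, the same command can be assembled and invoked from Python. The sketch below assumes mat-bench-client is available as an executable on PATH; the helper names question_command and run_question are hypothetical and are not part of the Skill. Only the question subcommand and the example question ID come from the Quick Start line above.

```python
import shutil
import subprocess
from typing import List, Optional

CLI_NAME = "mat-bench-client"  # ASSUMPTION: installed as an executable on PATH


def question_command(question_id: str) -> List[str]:
    """Build the argv list for fetching one question's prompt and data."""
    if not question_id or any(ch.isspace() for ch in question_id):
        raise ValueError("question_id must be a non-empty token without whitespace")
    return [CLI_NAME, "question", question_id]


def run_question(question_id: str) -> Optional[subprocess.CompletedProcess]:
    """Run the command if the CLI is installed; return None otherwise."""
    argv = question_command(question_id)
    if shutil.which(argv[0]) is None:
        return None
    # check=False leaves it to the caller to inspect returncode and stderr.
    return subprocess.run(argv, capture_output=True, text=True, check=False)


if __name__ == "__main__":
    print(question_command("SR_db_001_20260411v2"))
```

Passing the arguments as a list rather than a shell string avoids quoting problems if question IDs ever contain shell-sensitive characters.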
Dependency Matrix
Required Modules
pyyaml
Components
Standard package
💻 Claude Code Installation
Recommended: let Claude install the Skill automatically by copying and pasting the text below into Claude Code.
Please help me install this Skill:
Name: mat-bench-client
Download link: https://github.com/ruoyuwang1995nya/mat_agent_bench/archive/main.zip#mat-bench-client
Please download this .zip file, extract it, and install it in the .claude/skills/ directory.