mat-bench-client

Community

Run full mat-agent-bench evaluations end to end

Author: ruoyuwang1995nya
Version: 1.0.0
Installs: 0

System Documentation

What problem does it solve?

This Skill automates the setup, execution, submission, and scoring loop for evaluating AI agents on materials science benchmark questions hosted by a mat-bench server.

Core Features & Use Cases

  • One-time authentication & session management: Creates and persists a server session using a provided API token so you can run multiple questions without reconfiguring.
  • Question discovery and prompt retrieval: Lists questions by capability/domain and fetches a specific question to obtain the full prompt and required data filenames.
  • Data download, result submission, and polling: Downloads the question’s data files, uploads produced outputs for grading, and polls until checkpoints and weighted scores are available.
  • Use cases: Run structured materials-science tasks (structure retrieval/construction, input generation, workflow orchestration, batch processing, diagnosis, scientific analysis, execution contract checks, and safety/refusal) and produce auditable, scored runs for agents.
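
The polling and scoring step above can be sketched roughly as follows. This is only an illustrative sketch, not the Skill's actual implementation: the `fetch_status` callable, the `"state"` and `"checkpoints"` keys, the `"scored"` value, and the weighting formula (passed weight divided by total weight) are all assumptions made for the example, since the listing does not document the server's response format.

```python
import time
from typing import Callable, Dict, List, Optional


def poll_until_scored(
    fetch_status: Callable[[], Dict],
    interval_s: float = 2.0,
    max_attempts: int = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[Dict]:
    """Call fetch_status until grading is reported finished or attempts run out.

    fetch_status is a hypothetical stand-in for whatever request the client
    makes to the mat-bench server; the "state" key is an assumed field name.
    """
    for _ in range(max_attempts):
        status = fetch_status()
        if status.get("state") == "scored":
            return status
        sleep(interval_s)
    return None  # grading did not finish within the allowed attempts


def weighted_score(checkpoints: List[Dict]) -> float:
    """Fraction of total checkpoint weight that passed (assumed formula)."""
    total = sum(c["weight"] for c in checkpoints)
    if total == 0:
        return 0.0
    earned = sum(c["weight"] for c in checkpoints if c["passed"])
    return earned / total
```

Passing the status function and the sleep function in as arguments keeps the loop independent of any particular HTTP library and makes it easy to exercise without a live server.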

Quick Start

To fetch the prompt and data for a question, ask mat-bench-client to run: mat-bench-client question SR_db_001_20260411v2

Dependency Matrix

Required Modules

pyyaml

Components

Standard package

💻 Claude Code Installation

Recommended: let Claude install the Skill automatically. Copy the text below and paste it into Claude Code.

Please help me install this Skill:
Name: mat-bench-client
Download link: https://github.com/ruoyuwang1995nya/mat_agent_bench/archive/main.zip#mat-bench-client

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.