mat-bench-client

Name: mat-bench-client
Availability: InStock
Author: ruoyuwang1995nya

Community

Run full mat-agent-bench evaluations end to end

Software Engineering #CLI #REST API #agent evaluation #materials science #mat-bench-client #benchmark scoring #multipart submission

Version: 1.0.0

Installs: 0

System Documentation

What problem does it solve?

This Skill automates the setup, execution, submission, and scoring loop for evaluating AI agents on materials science benchmark questions hosted by a mat-bench server.

Core Features & Use Cases

One-time authentication & session management: Creates and persists a server session using a provided API token so you can run multiple questions without reconfiguring.
Question discovery and prompt retrieval: Lists questions by capability/domain and fetches a specific question to obtain the full prompt and required data filenames.
Data download, result submission, and polling: Downloads the question’s data files, uploads produced outputs for grading, and polls until checkpoints and weighted scores are available.
Use cases: Run structured materials-science tasks (structure retrieval/construction, input generation, workflow orchestration, batch processing, diagnosis, scientific analysis, execution contract checks, and safety/refusal) and produce auditable, scored runs for agents.
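
The polling step described above can be sketched roughly as follows. This is a minimal illustration, not the Skill's actual implementation: the function name poll_until_scored, the "state" field, and the "scored" value are all hypothetical, and the callable that fetches the submission status is supplied by the caller because the server's endpoints are not documented here.

```python
import time
from typing import Any, Callable, Dict


def poll_until_scored(
    fetch_status: Callable[[], Dict[str, Any]],
    interval_s: float = 2.0,
    timeout_s: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Dict[str, Any]:
    """Call fetch_status repeatedly until it reports a finished grading run.

    Assumption: the status payload is a dict with a "state" key that becomes
    "scored" once checkpoints and weighted scores are available. The real
    payload shape may differ.
    """
    deadline = clock() + timeout_s
    while True:
        status = fetch_status()
        if status.get("state") == "scored":
            return status
        if clock() >= deadline:
            raise TimeoutError("grading did not finish before the timeout")
        # Wait before asking again so the server is not hammered.
        sleep(interval_s)
```

Passing sleep and clock as parameters keeps the loop easy to exercise without a live server: a caller can hand in a stub fetch_status that returns canned payloads and a no-op sleep.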

Quick Start

Ask mat-bench-client to fetch a question prompt and data by running mat-bench-client question SR_db_001_20260411v2.
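
If the command needs to be driven from a script rather than typed by hand, a small wrapper along these lines may help. It is a sketch under assumptions: the helper names build_question_command and run_question are hypothetical, only the question subcommand shown above is used, and it is assumed (not confirmed by this page) that the CLI is on PATH and prints the prompt to standard output.

```python
import shutil
import subprocess
from typing import List


def build_question_command(question_id: str) -> List[str]:
    """Return the argv list for fetching one question by its ID."""
    if not question_id or any(ch.isspace() for ch in question_id):
        raise ValueError("question_id must be a non-empty token without spaces")
    return ["mat-bench-client", "question", question_id]


def run_question(question_id: str) -> str:
    """Run the command and return whatever it writes to standard output.

    Assumption: the CLI is installed and emits the prompt on stdout.
    """
    argv = build_question_command(question_id)
    if shutil.which(argv[0]) is None:
        raise FileNotFoundError(f"{argv[0]} was not found on PATH")
    completed = subprocess.run(argv, capture_output=True, text=True, check=True)
    return completed.stdout
```

Building the argument list separately from running it avoids shell quoting issues and lets the command be inspected or logged before execution.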
