setup-benchmark-inputs

Official

Prepare MoE benchmark workspace.

Author: mlc-ai
Version: 1.0.0
Installs: 0

System Documentation

What problem does it solve?

Sets up the minimal artifacts needed to benchmark, profile, or regression-test a MoE model in PithTrain: a tokenized DCLM corpus shard and an HF checkpoint converted to DCP format. The process is idempotent and safe to re-run: each run ensures the outputs exist at their workspace paths.

Core Features & Use Cases

  • Idempotent setup of benchmark artifacts: corpus shard tokenization and HF checkpoint conversion.
  • Supports multiple models (e.g., deepseek-v2-lite and qwen3-30b-a3b), with model-specific outputs under the workspace.
  • Integrates with a local Python environment (.venv) and shell scripts to orchestrate the fetch, tokenize, import, and convert steps for reproducible benchmarking.
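
The idempotent behavior described above can be illustrated with a minimal sketch. It is not the skill's actual implementation: `ensure_artifact`, `write_placeholder`, and the `datasets/shard.bin` path are hypothetical names chosen for illustration, and the real artifacts may be directories rather than single files. The idea is simply to skip work when the output already exists and to write through a temporary file so an interrupted run does not leave a partial artifact behind.

```python
import os
import tempfile
from pathlib import Path
from typing import Callable


def ensure_artifact(output: Path, build: Callable[[Path], None]) -> bool:
    """Create `output` with `build` unless it already exists.

    Returns True if the artifact was built, False if it was reused.
    (Hypothetical helper, not part of PithTrain.)
    """
    if output.exists():
        return False  # prepared by an earlier run, nothing to do
    output.parent.mkdir(parents=True, exist_ok=True)
    # Build into a temporary sibling file, then rename it into place, so a
    # failed run never leaves a half-written file at the final path.
    fd, tmp_name = tempfile.mkstemp(dir=output.parent, prefix=output.name + ".tmp")
    os.close(fd)
    tmp = Path(tmp_name)
    try:
        build(tmp)
        os.replace(tmp, output)
    except BaseException:
        tmp.unlink(missing_ok=True)
        raise
    return True


def write_placeholder(path: Path) -> None:
    # Stand-in for a real step such as tokenization or checkpoint conversion.
    path.write_text("placeholder shard\n")


workspace = Path(tempfile.mkdtemp())             # throwaway workspace for the demo
shard = workspace / "datasets" / "shard.bin"     # hypothetical output path
first = ensure_artifact(shard, write_placeholder)   # builds the artifact
second = ensure_artifact(shard, write_placeholder)  # reuses the existing one
print(first, second)
```

Running the same call twice builds the artifact once and reuses it afterwards, which is what makes a setup of this kind safe to re-run.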

Quick Start

Run the setup script to generate the benchmark workspace for your chosen model.

Dependency Matrix

Required Modules

huggingface_hub
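
Before running the setup, it may help to confirm that the required module is importable in the active environment. The snippet below is a small sketch using only the standard library; `missing_modules` is a hypothetical helper name, and the check says nothing about which version of the module is installed.

```python
from importlib.util import find_spec


def missing_modules(names):
    """Return the top-level module names that cannot be found for import."""
    return [name for name in names if find_spec(name) is None]


REQUIRED = ["huggingface_hub"]  # taken from the dependency matrix above

missing = missing_modules(REQUIRED)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All required modules are importable.")
```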

Components

scripts

💻 Claude Code Installation

Recommended: Let Claude install automatically. Simply copy and paste the text below to Claude Code.

Please help me install this Skill:
Name: setup-benchmark-inputs
Download link: https://github.com/mlc-ai/pith-train/archive/main.zip#setup-benchmark-inputs

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
