perf-moe-dispatcher-selection

Official

Optimize MoE dispatchers for your hardware and model scale.

Author: NVIDIA-NeMo
Version: 1.0.0
Installs: 0

System Documentation

What problem does it solve?

This Skill helps developers choose an MoE token dispatcher (such as alltoall, DeepEP, or HybridEP) suited to their hardware platform, model size, and expert-parallel (EP) degree, with the goal of improving throughput and efficiency.

Core Features & Use Cases

  • Dispatcher Recommendation: Guides users in selecting the best dispatcher configuration for H100, B200, GB200, or GB300 systems.
  • Performance Tuning Advice: Provides insights on tuning SM counts and routing modes for specific models and hardware.
  • Use Case: A researcher working on large-scale MoE models on GB200 systems can determine whether to use DeepEP or HybridEP for optimal throughput and memory utilization.
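Once a dispatcher has been chosen, it still has to be written down as configuration. The following is a minimal sketch of that step, not the Skill's own logic. `DispatcherChoice` and `to_config_overrides` are hypothetical names, and the field names `moe_token_dispatcher_type`, `moe_flex_dispatcher_backend`, and `expert_model_parallel_size` are assumed to match the Megatron-Core MoE configuration in your installed version; verify them before use. The sketch deliberately makes no recommendation about which dispatcher suits which hardware.

```python
from dataclasses import dataclass

# Dispatcher names mentioned in this Skill's description.
KNOWN_DISPATCHERS = ("alltoall", "deepep", "hybridep")


@dataclass(frozen=True)
class DispatcherChoice:
    """Hypothetical container for a dispatcher decision."""
    dispatcher: str   # one of KNOWN_DISPATCHERS (case-insensitive)
    ep_degree: int    # expert-parallel degree


def to_config_overrides(choice: DispatcherChoice) -> dict:
    """Translate a choice into config overrides.

    ASSUMPTION: the key names below follow Megatron-Core's MoE
    configuration; they are not confirmed by this listing.
    """
    name = choice.dispatcher.lower()
    if name not in KNOWN_DISPATCHERS:
        raise ValueError(f"unknown dispatcher: {choice.dispatcher!r}")
    if choice.ep_degree < 1:
        raise ValueError("ep_degree must be a positive integer")

    overrides = {"expert_model_parallel_size": choice.ep_degree}
    if name == "alltoall":
        overrides["moe_token_dispatcher_type"] = "alltoall"
    else:
        # ASSUMPTION: DeepEP and HybridEP are selected as backends
        # of a "flex" dispatcher type.
        overrides["moe_token_dispatcher_type"] = "flex"
        overrides["moe_flex_dispatcher_backend"] = name
    return overrides


if __name__ == "__main__":
    print(to_config_overrides(DispatcherChoice("HybridEP", 64)))
```

Keeping the translation in one small function means the recommendation itself (which the Skill provides) stays separate from how it is applied to a training configuration.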

Quick Start

Ask the AI which MoE dispatcher setting gives the best performance for a 685B model running on a 256×GB200 system.
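For repeated queries across different setups, the question can be generated from its three inputs. This is an illustrative sketch only: `build_dispatcher_question` is a hypothetical helper, and the platform list is simply the set of systems named in this listing.

```python
# Platforms named in this Skill's feature list.
SUPPORTED_PLATFORMS = ("H100", "B200", "GB200", "GB300")


def build_dispatcher_question(model_size: str, gpu_count: int, gpu_type: str) -> str:
    """Format a Quick Start style question for the AI assistant."""
    if gpu_type not in SUPPORTED_PLATFORMS:
        raise ValueError(f"platform not covered by this Skill: {gpu_type!r}")
    if gpu_count < 1:
        raise ValueError("gpu_count must be a positive integer")
    return (
        f"Which MoE dispatcher setting is best for a {model_size} model "
        f"running on a {gpu_count}×{gpu_type} system?"
    )


if __name__ == "__main__":
    print(build_dispatcher_question("685B", 256, "GB200"))
```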

Dependency Matrix

Required Modules

None required

Components

references

💻 Claude Code Installation

Recommended: let Claude install the Skill automatically. Copy the text below and paste it into Claude Code.

Please help me install this Skill:
Name: perf-moe-dispatcher-selection
Download link: https://github.com/NVIDIA-NeMo/Megatron-Bridge/archive/main.zip#perf-moe-dispatcher-selection

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
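If you prefer to install manually, the extraction step can be sketched as follows. This assumes the archive has already been downloaded and that the Skill lives in a directory named after the Skill somewhere inside it; the actual layout of the repository archive is not stated here, so treat `install_skill_from_zip` as a hypothetical helper rather than an official installer.

```python
import zipfile
from pathlib import Path, PurePosixPath


def install_skill_from_zip(zip_path, skill_name, skills_dir):
    """Copy the first directory named `skill_name` found in the archive
    into `skills_dir/skill_name`. Returns the list of files written.

    ASSUMPTION: the Skill is stored in a directory whose name equals
    `skill_name`; the real archive layout may differ.
    """
    target_root = Path(skills_dir) / skill_name
    written = []
    prefix = None  # archive path (as parts) of the Skill directory
    with zipfile.ZipFile(zip_path) as archive:
        for info in archive.infolist():
            parts = PurePosixPath(info.filename).parts
            if prefix is None and skill_name in parts:
                prefix = parts[: parts.index(skill_name) + 1]
            if prefix is None or parts[: len(prefix)] != prefix:
                continue  # entry is outside the Skill directory
            relative = parts[len(prefix):]
            if info.is_dir() or not relative or ".." in relative:
                continue  # skip directories and unsafe paths
            destination = target_root.joinpath(*relative)
            destination.parent.mkdir(parents=True, exist_ok=True)
            destination.write_bytes(archive.read(info))
            written.append(destination)
    if not written:
        raise FileNotFoundError(f"no files found for skill {skill_name!r}")
    return written
```

For example, `install_skill_from_zip("main.zip", "perf-moe-dispatcher-selection", ".claude/skills")` would place the Skill's files under `.claude/skills/perf-moe-dispatcher-selection/`, provided the layout assumption holds.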