perf-moe-dispatcher-selection
Official
Optimize MoE dispatchers for your hardware and model scale.
Software Engineering
#performance optimization #model scaling #distributed training #dispatchers #moe #hardware tuning
Author: NVIDIA-NeMo
Version: 1.0.0
Installs: 0
System Documentation
What problem does it solve?
This Skill helps developers choose an MoE token dispatcher (such as alltoall, DeepEP, or HybridEP) suited to their hardware platform, model size, and expert-parallel (EP) degree, with the goal of improving performance and efficiency.
Core Features & Use Cases
- Dispatcher Recommendation: Guides users in selecting the best dispatcher configuration for H100, B200, GB200, or GB300 systems.
- Performance Tuning Advice: Provides insights on tuning SM counts and routing modes for specific models and hardware.
- Use Case: A researcher running large-scale MoE models on GB200 systems can use it to decide whether DeepEP or HybridEP gives better throughput and memory utilization.
Quick Start
Ask the AI which MoE dispatcher setting performs best for a 685B model running on a 256×GB200 system.
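The Skill bases its recommendation on hardware platform, model size, and EP degree, so it can help to state all three in the question. The snippet below is a minimal sketch of composing such a question; `DispatcherQuery` and `build_prompt` are hypothetical helper names invented for illustration and are not part of the Skill or of Megatron-Bridge.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DispatcherQuery:
    """Hypothetical container for the inputs the Skill's advice depends on."""
    hardware: str                    # platform name, e.g. "H100", "B200", "GB200", "GB300"
    num_gpus: int                    # total GPUs in the system
    model_params_b: int              # model size in billions of parameters
    ep_degree: Optional[int] = None  # expert-parallel degree, if already decided


def build_prompt(q: DispatcherQuery) -> str:
    """Format a question for the AI that names every known input."""
    prompt = (
        "Which MoE token dispatcher (alltoall, DeepEP, or HybridEP) is best "
        f"for a {q.model_params_b}B model running on a "
        f"{q.num_gpus}×{q.hardware} system"
    )
    if q.ep_degree is not None:
        # Only mention the EP degree when the caller has actually chosen one.
        prompt += f" with EP degree {q.ep_degree}"
    return prompt + "?"


if __name__ == "__main__":
    print(build_prompt(DispatcherQuery("GB200", 256, 685)))
```

Leaving `ep_degree` unset produces a question equivalent to the Quick Start example; setting it narrows the request to one parallelism layout.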
Dependency Matrix
Required Modules
None required
Components
references
💻 Claude Code Installation
Recommended: Let Claude install it automatically. Copy the text below and paste it into Claude Code.
Please help me install this Skill: Name: perf-moe-dispatcher-selection Download link: https://github.com/NVIDIA-NeMo/Megatron-Bridge/archive/main.zip#perf-moe-dispatcher-selection Please download this .zip file, extract it, and install it in the .claude/skills/ directory.