perf-moe-long-context
Official
Optimize long-context MoE training for large-scale models.
Author: NVIDIA-NeMo
Version: 1.0.0
Installs: 0
System Documentation
What problem does it solve?
This Skill provides guidance for training MoE models efficiently at long context lengths, addressing GPU memory limits and throughput bottlenecks.
Core Features & Use Cases
- Long-Sequence Training Optimization: Strategies for scaling MoE models to sequences of up to 256K tokens, including context parallelism (CP) sizing and activation recomputation methods.
- Configuration Guidance: Specific recommendations on token dispatcher choices, precision settings, and CUDA graph policies for large models.
- Use Case: Developers training long-context MoE models can apply these practices to improve throughput and resource utilization, for example when training on 128K-token sequences.
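CP sizing comes down to simple arithmetic: each context-parallel rank holds roughly seq_len / cp_size tokens. The sketch below, which is not taken from the Skill, picks the smallest power-of-two CP size that keeps per-rank tokens under a budget. It assumes a Megatron-style load-balanced split in which each sequence is cut into 2 * cp_size chunks (so seq_len must be divisible by 2 * cp_size); the 16,384-token budget is a hypothetical placeholder, not a recommended value.

```python
def per_rank_tokens(seq_len: int, cp_size: int) -> int:
    """Tokens held by each context-parallel rank under an even split."""
    # Assumption: load-balanced CP cuts each sequence into 2 * cp_size chunks
    # and gives every rank two of them, so divisibility by 2 * cp_size is required.
    if seq_len % (2 * cp_size) != 0:
        raise ValueError(f"seq_len={seq_len} not divisible by 2*cp_size={2 * cp_size}")
    return seq_len // cp_size


def smallest_cp_size(seq_len: int, max_tokens_per_rank: int, max_cp: int = 64) -> int:
    """Smallest power-of-two CP size whose per-rank tokens fit the budget."""
    cp = 1
    while cp <= max_cp:
        if seq_len % (2 * cp) == 0 and per_rank_tokens(seq_len, cp) <= max_tokens_per_rank:
            return cp
        cp *= 2
    raise ValueError("no CP size up to max_cp satisfies the budget")


if __name__ == "__main__":
    # Hypothetical per-rank budget; real limits depend on model size and GPU memory.
    budget = 16_384
    for seq_len in (32_768, 131_072, 262_144):
        cp = smallest_cp_size(seq_len, budget)
        print(f"seq_len={seq_len:>7}: cp_size={cp}, tokens/rank={per_rank_tokens(seq_len, cp)}")
```

In practice the budget would come from profiling activation memory; doubling CP halves per-rank tokens but adds communication, which is why the search stops at the smallest size that fits.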
Quick Start
Follow the recommended configuration patterns and scaling rules in the Skill's documentation to set up your long-context MoE training environment.
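As an illustration only, the knobs named above (CP size, recompute, dispatcher, precision, CUDA graphs) can be collected into one config object with basic sanity checks. This is a hypothetical sketch, not the Megatron-Bridge API: the field names loosely echo Megatron-Core options such as `context_parallel_size`, `recompute_granularity`, and `moe_token_dispatcher_type`, and the allowed values listed are assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical config container; NOT the Megatron-Bridge API.
# Field names loosely mirror Megatron-Core options; allowed values are assumptions.
ALLOWED_RECOMPUTE = (None, "selective", "full")
ALLOWED_DISPATCHERS = ("allgather", "alltoall", "flex")


@dataclass
class LongContextMoEConfig:
    seq_length: int
    context_parallel_size: int = 1
    recompute_granularity: Optional[str] = None  # activation recomputation mode
    moe_token_dispatcher_type: str = "alltoall"  # how tokens are routed to experts
    precision: str = "bf16"
    use_cuda_graphs: bool = False

    def validate(self) -> List[str]:
        """Return a list of human-readable problems; empty means no issues found."""
        problems = []
        # Assumes a load-balanced CP split into 2 * CP chunks per sequence.
        if self.seq_length % (2 * self.context_parallel_size) != 0:
            problems.append("seq_length must be divisible by 2 * context_parallel_size")
        if self.recompute_granularity not in ALLOWED_RECOMPUTE:
            problems.append(f"unknown recompute_granularity: {self.recompute_granularity}")
        if self.moe_token_dispatcher_type not in ALLOWED_DISPATCHERS:
            problems.append(f"unknown dispatcher: {self.moe_token_dispatcher_type}")
        return problems


if __name__ == "__main__":
    cfg = LongContextMoEConfig(seq_length=131_072, context_parallel_size=8,
                               recompute_granularity="full")
    print(cfg.validate())  # an empty list means the checks passed
```

Validating before launch catches shape mismatches (such as a sequence length that does not split evenly across CP ranks) early, rather than after the job has been scheduled.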
Dependency Matrix
Required Modules
None required
Components
references
💻 Claude Code Installation
Recommended: Let Claude install automatically. Simply copy and paste the text below to Claude Code.
Please help me install this Skill: Name: perf-moe-long-context Download link: https://github.com/NVIDIA-NeMo/Megatron-Bridge/archive/main.zip#perf-moe-long-context Please download this .zip file, extract it, and install it in the .claude/skills/ directory.