nemo-mbridge-perf-moe-long-context
Community
Guidance for long-context MoE model training.
Author: sayalinvidia
Version: 1.0.0
Installs: 0
System Documentation
What problem does it solve?
Provides practical guidance for training Mixture-of-Experts (MoE) models on long-context sequences. It covers context-parallel (CP) sizing, selective activation recompute, and token-dispatcher choices that help avoid out-of-memory (OOM) errors and throughput degradation.
Core Features & Use Cases
- Long-context MoE training guidance covering CP sizing rules, selective recompute recommendations (up_proj, norm, moe, mlp), and CUDA graph considerations.
- Representative config templates and best practices for DeepSeek-V3 (DSV3) and Qwen3-Next long-context experiments, to help plan and launch runs efficiently.
- Pitfalls and safety considerations to help engineers avoid memory and performance issues when scaling to 128K+ context lengths.
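The selective-recompute options in the list above can be captured as a small, validated plan. This is a hypothetical helper, not a Megatron Bridge API: the class name, fields, and module strings are assumptions based on the names this page lists (up_proj, norm, moe, mlp), and the `seq_len % (2 * cp)` check assumes Megatron-style CP load balancing, which splits each sequence into 2 × CP chunks.

```python
from dataclasses import dataclass

# Module names as listed on this page; the exact option strings accepted by a
# given Megatron Bridge release may differ (assumption).
SELECTIVE_RECOMPUTE_MODULES = {"up_proj", "norm", "moe", "mlp"}


@dataclass
class LongContextRecomputePlan:  # hypothetical helper, not a library API
    seq_len: int
    cp: int
    recompute_modules: list[str]

    def validate(self) -> None:
        """Raise ValueError if the plan uses unknown modules or an uneven CP split."""
        unknown = set(self.recompute_modules) - SELECTIVE_RECOMPUTE_MODULES
        if unknown:
            raise ValueError(f"unknown recompute modules: {sorted(unknown)}")
        if self.cp < 1:
            raise ValueError("cp must be >= 1")
        # Assumption: CP load balancing splits each sequence into 2 * cp chunks,
        # so the sequence length must divide evenly into them.
        if self.seq_len % (2 * self.cp) != 0:
            raise ValueError("seq_len must be divisible by 2 * cp")

    def tokens_per_cp_rank(self) -> int:
        """Number of sequence tokens each CP rank holds."""
        return self.seq_len // self.cp


if __name__ == "__main__":
    plan = LongContextRecomputePlan(
        seq_len=131_072, cp=32, recompute_modules=["up_proj", "norm", "moe", "mlp"]
    )
    plan.validate()
    print(plan.tokens_per_cp_rank())
```

Validating the plan before launch turns a misspelled module name or an uneven sequence split into an immediate error rather than a failure partway through job startup.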
Quick Start
Size context parallelism with the heuristic CP ≈ seq_len / 4096, preserve the data-parallel (DP) size, and enable selective recompute for long-context MoE workloads.
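A minimal sketch of the CP sizing heuristic, assuming a Megatron-style layout where the dense-layer world size is TP × PP × CP × DP (expert parallelism is ignored here). The helper names `suggest_cp_size` and `gpus_needed`, and the TP/PP/DP values in the example, are hypothetical.

```python
import math

# Heuristic from this page: roughly one CP rank per 4096 tokens of sequence.
TOKENS_PER_CP_RANK = 4096


def suggest_cp_size(seq_len: int, tokens_per_rank: int = TOKENS_PER_CP_RANK) -> int:
    """Return a CP size following CP ~= seq_len / tokens_per_rank (at least 1)."""
    if seq_len <= 0:
        raise ValueError("seq_len must be positive")
    return max(1, math.ceil(seq_len / tokens_per_rank))


def gpus_needed(tp: int, pp: int, cp: int, dp: int) -> int:
    """GPU count, assuming world_size = TP * PP * CP * DP."""
    return tp * pp * cp * dp


if __name__ == "__main__":
    # Hypothetical layout: TP=1, PP=4, and DP held fixed at 8.
    for seq_len in (32_768, 65_536, 131_072):
        cp = suggest_cp_size(seq_len)
        print(seq_len, cp, gpus_needed(tp=1, pp=4, cp=cp, dp=8))
```

Because DP is held fixed, doubling the sequence length doubles CP and therefore the GPU count. Keeping the GPU count fixed instead would force DP to shrink as CP grows, which changes the global batch per step.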
Dependency Matrix
Required Modules
None required
Components
Standard package
Claude Code Installation
Recommended: let Claude install the skill automatically. Copy the text below and paste it into Claude Code.
Please help me install this Skill: Name: nemo-mbridge-perf-moe-long-context Download link: https://github.com/sayalinvidia/sayali-skills-test/archive/main.zip#nemo-mbridge-perf-moe-long-context Please download this .zip file, extract it, and install it in the .claude/skills/ directory.