perf-moe-long-context

Official

Optimize long-context MoE training for large-scale models.

Author: NVIDIA-NeMo
Version: 1.0.0
Installs: 0

System Documentation

What problem does it solve?

This Skill provides guidance for training Mixture-of-Experts (MoE) models efficiently at long context lengths, where memory constraints and throughput become the main challenges.

Core Features & Use Cases

  • Long-Sequence Training Optimization: Offers strategies for scaling MoE models to sequences of up to 256K tokens, including context-parallel (CP) sizing and activation recomputation.
  • Configuration Guidance: Details specific token dispatcher choices, precision settings, and CUDA graph policies for large models.
  • Use Case: Developers training long-context MoE models can apply these practices to improve throughput and resource utilization, for example when training on 128K-token sequences in NLP tasks.
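
As a rough illustration of what CP sizing involves, the sketch below assumes that context parallelism splits each sequence evenly across CP ranks, so each rank holds seq_len / cp_size tokens. The function names, the candidate CP sizes, and the 16K per-rank token budget are hypothetical placeholders chosen for the example; they are not values taken from this Skill, and a real budget depends on the model, precision, and recompute settings.

```python
def tokens_per_cp_rank(seq_len: int, cp_size: int) -> int:
    """Tokens held by each context-parallel rank under an even split."""
    if cp_size < 1:
        raise ValueError("cp_size must be >= 1")
    if seq_len % cp_size != 0:
        raise ValueError("seq_len must be divisible by cp_size")
    return seq_len // cp_size


def smallest_cp_size(seq_len: int, max_tokens_per_rank: int,
                     candidates=(1, 2, 4, 8, 16, 32, 64)) -> int:
    """Smallest candidate CP size whose per-rank share fits the budget.

    max_tokens_per_rank is a hypothetical budget supplied by the caller,
    not a number recommended by the Skill.
    """
    for cp in candidates:
        if seq_len % cp == 0 and seq_len // cp <= max_tokens_per_rank:
            return cp
    raise ValueError("no candidate CP size fits the budget")


if __name__ == "__main__":
    budget = 16 * 1024  # assumed per-rank token budget, for illustration only
    for seq_len in (128 * 1024, 256 * 1024):
        cp = smallest_cp_size(seq_len, budget)
        print(f"seq_len={seq_len} cp_size={cp} "
              f"tokens_per_rank={tokens_per_cp_rank(seq_len, cp)}")
```

Under these assumptions, doubling the sequence length from 128K to 256K doubles the CP size needed to keep the per-rank token count constant. The Skill's own scaling rules should take precedence over this back-of-the-envelope estimate.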

Quick Start

Refer to the recommended configuration patterns and scaling rules in the documentation to set up your long-context MoE training environment effectively.

Dependency Matrix

Required Modules

None required

Components

references

💻 Claude Code Installation

Recommended: Let Claude install it automatically. Copy the text below and paste it into Claude Code.

Please help me install this Skill:
Name: perf-moe-long-context
Download link: https://github.com/NVIDIA-NeMo/Megatron-Bridge/archive/main.zip#perf-moe-long-context

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
