Name: perf-moe-long-context
Author: NVIDIA-NeMo

System Documentation

What problem does it solve?

This Skill provides guidance for training Mixture-of-Experts (MoE) models efficiently at very long context lengths, where memory limits and throughput become the main constraints.

Core Features & Use Cases

Long-Sequence Training Optimization: Offers strategies for scaling MoE models to sequences of up to 256K tokens, including context-parallel (CP) sizing and activation recompute methods.
Configuration Guidance: Details dispatcher choices, precision settings, and CUDA graph policies for large models.
Use Case: Developers training long-context MoE models can apply these practices to improve throughput and resource utilization, for example when training on 128K-token sequences in NLP tasks.
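
As a rough illustration of what CP sizing involves, the sketch below computes how many tokens each context-parallel rank holds and picks the smallest power-of-two CP size that keeps that shard under a budget. The helper names and the 16,384-token budget are assumptions made for this example; they are not taken from the Skill or from Megatron-Bridge, and a real budget depends on model size, precision, and the recompute settings in use.

```python
def per_rank_tokens(seq_len: int, cp_size: int) -> int:
    """Tokens held by each context-parallel rank for one sequence."""
    if cp_size <= 0 or seq_len <= 0:
        raise ValueError("seq_len and cp_size must be positive")
    if seq_len % cp_size != 0:
        raise ValueError("seq_len must be divisible by cp_size")
    return seq_len // cp_size


def smallest_cp_size(seq_len: int, max_tokens_per_rank: int) -> int:
    """Smallest power-of-two CP size whose per-rank shard fits the budget.

    max_tokens_per_rank is a hypothetical, user-chosen memory budget.
    """
    if seq_len <= 0 or max_tokens_per_rank <= 0:
        raise ValueError("seq_len and max_tokens_per_rank must be positive")
    cp_size = 1
    # Ceiling division: the largest shard any rank would have to hold.
    while -(-seq_len // cp_size) > max_tokens_per_rank:
        cp_size *= 2
    return cp_size


if __name__ == "__main__":
    budget = 16_384  # assumed per-rank token budget, for illustration only
    for seq_len in (131_072, 262_144):  # 128K and 256K tokens
        cp = smallest_cp_size(seq_len, budget)
        print(seq_len, cp, per_rank_tokens(seq_len, cp))
```

Under this assumed budget, doubling the sequence length from 128K to 256K tokens doubles the CP size needed to keep the per-rank shard the same size, which is the basic trade-off a CP sizing rule has to manage.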

Quick Start

Start from the recommended configuration patterns and scaling rules in the Skill's documentation when setting up a long-context MoE training run.

To install this Skill (perf-moe-long-context), download the .zip archive from https://github.com/NVIDIA-NeMo/Megatron-Bridge/archive/main.zip#perf-moe-long-context, extract it, and install the Skill in the .claude/skills/ directory.
