perf-parallelism-strategies
Official
Optimize model parallelism for best performance.
Software Engineering #model training #distributed training #parallelism #tensor parallel #pipeline parallel #Megatron
Author: NVIDIA-NeMo
Version: 1.0.0
Installs: 0
System Documentation
What problem does it solve?
This Skill helps users understand how to effectively choose and combine various parallelism strategies in Megatron Bridge to improve training efficiency and scalability.
Core Features & Use Cases
- Parallelism Strategy Selection: Guides users in selecting appropriate data, tensor, pipeline, expert, and sequence parallelism based on model size, hardware topology, and sequence length.
- Profiling and Sizing Advice: Provides heuristic guidelines and minimum GPU counts for different parallelism configurations, helping users plan resources efficiently.
- Technical Resource: Explains the technical constraints, performance considerations, and implementation details necessary for advanced parallelism setup.
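The sizing arithmetic behind these guidelines can be sketched in a few lines. The helper names below (`min_gpus`, `data_parallel_size`) are hypothetical and are not part of Megatron Bridge. The sketch assumes the common Megatron-style layout in which one model replica spans tensor × pipeline parallel ranks and the remaining GPUs form data-parallel replicas; expert and other parallelism dimensions are deliberately left out.

```python
def min_gpus(tp: int, pp: int) -> int:
    """GPUs needed to hold one model replica (hypothetical helper).

    Assumes a replica occupies tp * pp ranks and ignores other dimensions.
    """
    if tp < 1 or pp < 1:
        raise ValueError("parallel sizes must be positive integers")
    return tp * pp


def data_parallel_size(world_size: int, tp: int, pp: int) -> int:
    """Number of data-parallel replicas that fit in world_size GPUs."""
    replica = min_gpus(tp, pp)
    if world_size % replica != 0:
        raise ValueError(
            f"world_size={world_size} is not divisible by tp*pp={replica}"
        )
    return world_size // replica


if __name__ == "__main__":
    # Example: 64 GPUs with tensor parallel 4 and pipeline parallel 2.
    print(min_gpus(4, 2))
    print(data_parallel_size(64, 4, 2))
```

A configuration whose world size is not a multiple of the replica size is rejected early, which is usually cheaper than discovering the mismatch at job launch.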
Quick Start
In the model provider, set tensor_model_parallel_size, pipeline_model_parallel_size, and sequence_parallel to match your hardware topology and model size.
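As a rough illustration, the three settings named above can be sanity-checked before a job is launched. The `ParallelSettings` class and `check_settings` function are hypothetical stand-ins, not the Megatron Bridge model provider API; only the three field names come from this page. The checks encode common Megatron-style rules of thumb and are assumptions: sequence parallelism is tied to the tensor-parallel group, tensor parallelism is usually kept within one node, and layers are split evenly across pipeline stages.

```python
from dataclasses import dataclass


@dataclass
class ParallelSettings:
    """Hypothetical container; field names mirror the settings on this page."""
    tensor_model_parallel_size: int = 1
    pipeline_model_parallel_size: int = 1
    sequence_parallel: bool = False


def check_settings(cfg: ParallelSettings, gpus_per_node: int, num_layers: int) -> list[str]:
    """Return a list of human-readable issues (empty list means no issue found)."""
    issues = []
    tp = cfg.tensor_model_parallel_size
    pp = cfg.pipeline_model_parallel_size
    # Assumption: sequence parallelism shards activations across the TP group,
    # so it has no effect when tensor parallelism is disabled.
    if cfg.sequence_parallel and tp == 1:
        issues.append("sequence_parallel has no effect with tensor_model_parallel_size=1")
    # Heuristic: keep tensor-parallel traffic on the fast intra-node interconnect.
    if tp > gpus_per_node:
        issues.append("tensor_model_parallel_size exceeds GPUs per node")
    # Assumption: layers are divided evenly across pipeline stages.
    if num_layers % pp != 0:
        issues.append("num_layers is not divisible by pipeline_model_parallel_size")
    return issues


if __name__ == "__main__":
    cfg = ParallelSettings(
        tensor_model_parallel_size=8,
        pipeline_model_parallel_size=2,
        sequence_parallel=True,
    )
    print(check_settings(cfg, gpus_per_node=8, num_layers=32))
```

Returning a list of issues rather than raising on the first one makes it easier to report every problem in a configuration at once.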
Dependency Matrix
Required Modules
None required
Components
Standard package
Claude Code Installation
Recommended: Let Claude install it automatically. Copy the text below and paste it into Claude Code.
Please help me install this Skill:
Name: perf-parallelism-strategies
Download link: https://github.com/NVIDIA-NeMo/Megatron-Bridge/archive/main.zip#perf-parallelism-strategies
Please download this .zip file, extract it, and install it in the .claude/skills/ directory.