nemo-mbridge-perf-memory-tuning
Community
Reduce peak GPU memory for Megatron Bridge.
Software Engineering #memory-management #oom #gpu-memory #megatron-bridge #expandable_segments #activation_recompute
Author: sayalinvidia
Version: 1.0.0
Installs: 0
System Documentation
What problem does it solve?
GPU training with Megatron Bridge often hits out-of-memory (OOM) errors caused by memory fragmentation and suboptimal allocator behavior. The skill addresses this with expandable memory segments and with guidelines for tuning parallelism, activation recompute, and CPU offloading to reduce peak memory without sacrificing throughput.
Core Features & Use Cases
- Expandable segments to reduce memory fragmentation and prevent OOMs with zero throughput cost.
- Memory planning guidance: a theoretical memory estimator, with activation recompute as a secondary option.
- Compatibility constraints: CPU offloading is blocked when pipeline model parallel size > 1; CUDA graph considerations.
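The CPU offloading constraint in the list above can be checked before launch. The following is a minimal sketch in Python, not the skill's or Megatron Bridge's own validation: the `MemoryPlan` class, its field names, and `check_memory_plan` are hypothetical names introduced only for illustration, and the sketch encodes only the one rule stated in this listing.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MemoryPlan:
    # Hypothetical field names chosen for this sketch; they are not
    # confirmed Megatron Bridge configuration keys.
    pipeline_model_parallel_size: int = 1
    cpu_offloading: bool = False


def check_memory_plan(plan: MemoryPlan) -> List[str]:
    """Return human-readable problems found in the plan (empty if none)."""
    problems = []
    if plan.pipeline_model_parallel_size < 1:
        problems.append("pipeline_model_parallel_size must be at least 1")
    # Rule from the listing: CPU offloading is blocked when the
    # pipeline model parallel size is greater than 1.
    if plan.cpu_offloading and plan.pipeline_model_parallel_size > 1:
        problems.append(
            "CPU offloading cannot be combined with "
            "pipeline_model_parallel_size > 1"
        )
    return problems


if __name__ == "__main__":
    plan = MemoryPlan(pipeline_model_parallel_size=2, cpu_offloading=True)
    for problem in check_memory_plan(plan):
        print("config problem:", problem)
```

Running such a check before the job starts surfaces an incompatible combination early, rather than after GPUs have already been allocated.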
Quick Start
Before launching training, set PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to enable memory fragmentation mitigation and validate that the OOM does not occur.
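When the training job is started from a Python launcher rather than a shell, the same setting can be applied programmatically. This is a minimal sketch: `with_expandable_segments` is a hypothetical helper name, and the sketch assumes the variable is set before PyTorch initializes its CUDA allocator (safest is before `torch` is imported). It keeps any other allocator options that are already present in the variable.

```python
import os


def with_expandable_segments(existing: str) -> str:
    """Return an allocator config string that enables expandable segments.

    Other comma-separated options already in the string are preserved;
    any previous expandable_segments entry is replaced.
    """
    options = [opt.strip() for opt in existing.split(",") if opt.strip()]
    kept = [
        opt for opt in options
        if opt.split(":", 1)[0].strip() != "expandable_segments"
    ]
    kept.append("expandable_segments:True")
    return ",".join(kept)


# Apply the setting before any CUDA memory is allocated in this process.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = with_expandable_segments(
    os.environ.get("PYTORCH_CUDA_ALLOC_CONF", "")
)
```

After the launcher sets the variable, rerun the configuration that previously failed to confirm that the OOM no longer occurs.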
Dependency Matrix
Required Modules
None required
Components
Standard package
Claude Code Installation
Recommended: Let Claude install automatically. Simply copy and paste the text below to Claude Code.
Please help me install this Skill: Name: nemo-mbridge-perf-memory-tuning Download link: https://github.com/sayalinvidia/sayali-skills-test/archive/main.zip#nemo-mbridge-perf-memory-tuning Please download this .zip file, extract it, and install it in the .claude/skills/ directory.