nemo-mbridge-perf-memory-tuning

Community

Reduce peak GPU memory for Megatron Bridge.

Author: sayalinvidia
Version: 1.0.0
Installs: 0

System Documentation

What problem does it solve?

GPU training with Megatron Bridge often runs out of memory (OOM) because of memory fragmentation and suboptimal allocator behavior. The skill covers fixes such as enabling expandable memory segments, along with guidelines for tuning parallelism, activation recompute, and CPU offloading to reduce peak memory without sacrificing throughput.

Core Features & Use Cases

  • Expandable segments: reduce memory fragmentation and prevent OOMs with zero throughput cost.
  • Memory planning guidance: a theoretical memory estimator, with activation recompute as a secondary option.
  • Compatibility constraints: CPU offloading is blocked when the pipeline model parallel size is greater than 1; CUDA graphs bring their own considerations.
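
The CPU offloading constraint above can be checked before a job is launched. The following is a minimal sketch, not Megatron Bridge's own validation: the `MemoryPlan` class, its field names, and `check_cpu_offloading` are hypothetical names introduced here for illustration only.

```python
from dataclasses import dataclass


@dataclass
class MemoryPlan:
    """Hypothetical settings container (not a Megatron Bridge class)."""
    pipeline_model_parallel_size: int = 1
    cpu_offloading: bool = False


def check_cpu_offloading(plan: MemoryPlan) -> list[str]:
    """Return a list of human-readable problems; empty means no conflict."""
    problems = []
    # Constraint from the listing: CPU offloading is blocked when the
    # pipeline model parallel size is greater than 1.
    if plan.cpu_offloading and plan.pipeline_model_parallel_size > 1:
        problems.append(
            "CPU offloading requires pipeline model parallel size == 1, "
            f"got {plan.pipeline_model_parallel_size}"
        )
    return problems


if __name__ == "__main__":
    for problem in check_cpu_offloading(MemoryPlan(2, True)):
        print("config error:", problem)
```

Returning a list instead of raising lets a launcher script collect every conflict before aborting, which may be convenient if further checks (for example around CUDA graphs) are added later.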

Quick Start

Before launching training, set the environment variable PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to mitigate allocator fragmentation, then rerun the job and confirm that the OOM no longer occurs.
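
When the variable cannot be exported in the launch shell, it can be set from Python instead. This is a minimal sketch that assumes it runs at the very top of the entry script, before `torch` is imported or any CUDA memory is allocated. The helper name `with_expandable_segments` is hypothetical; only the variable name and the `expandable_segments:True` value come from the Quick Start above.

```python
import os


def with_expandable_segments(current: str) -> str:
    """Return an allocator config string that includes expandable segments.

    Existing options are kept. If the string already mentions
    expandable_segments (with any value), it is returned unchanged so an
    explicit user choice is not overridden.
    """
    if "expandable_segments" in current:
        return current
    if not current:
        return "expandable_segments:True"
    # PYTORCH_CUDA_ALLOC_CONF is a comma-separated list of option:value pairs.
    return f"{current},expandable_segments:True"


# Apply the setting to this process. Import torch only after this point.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = with_expandable_segments(
    os.environ.get("PYTORCH_CUDA_ALLOC_CONF", "")
)
```

Appending to the existing value, instead of overwriting it, preserves any other allocator options that were already configured for the job.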

Dependency Matrix

Required Modules

None required

Components

Standard package

💻 Claude Code Installation

Recommended: let Claude install the skill automatically. Copy the text below and paste it into Claude Code.

Please help me install this Skill:
Name: nemo-mbridge-perf-memory-tuning
Download link: https://github.com/sayalinvidia/sayali-skills-test/archive/main.zip#nemo-mbridge-perf-memory-tuning

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.