Name: nemo-mbridge-perf-memory-tuning
Author: sayalinvidia

System Documentation

What problem does it solve?

GPU training with Megatron Bridge often runs out of memory (OOM) because of memory fragmentation and suboptimal allocator behavior. The skill introduces fixes such as expandable memory segments, along with guidelines for tuning parallelism, activation recompute, and CPU offloading, to reduce peak memory without sacrificing throughput.

Core Features & Use Cases

Expandable segments to reduce memory fragmentation and prevent OOMs with zero throughput cost.
Memory planning guidance: a theoretical memory estimator as the first step, with activation recompute as a secondary option.
Compatibility constraints: CPU offloading is blocked when the pipeline model parallel size is greater than 1; CUDA graphs bring their own considerations.
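
The CPU offloading constraint above can be checked before a job is launched. The following is a minimal sketch, not Megatron Bridge's own validation: the ParallelPlan class, its field names, and the check_cpu_offloading helper are hypothetical names introduced for illustration, and the real configuration fields may differ.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ParallelPlan:
    """Hypothetical settings container; not a Megatron Bridge class."""
    pipeline_model_parallel_size: int = 1
    cpu_offloading: bool = False


def check_cpu_offloading(plan: ParallelPlan) -> List[str]:
    """Return a list of human-readable problems; empty means the plan passes."""
    problems = []
    # Constraint stated in the listing: CPU offloading is blocked when
    # the pipeline model parallel size is greater than 1.
    if plan.cpu_offloading and plan.pipeline_model_parallel_size > 1:
        problems.append(
            "CPU offloading requires pipeline model parallel size 1, "
            f"got {plan.pipeline_model_parallel_size}"
        )
    return problems


ok_plan = ParallelPlan(pipeline_model_parallel_size=1, cpu_offloading=True)
bad_plan = ParallelPlan(pipeline_model_parallel_size=4, cpu_offloading=True)
ok_problems = check_cpu_offloading(ok_plan)
bad_problems = check_cpu_offloading(bad_plan)
```

Returning a list of problems rather than raising on the first one lets a launcher report every conflict in one pass.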

Quick Start

Before launching training, set PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to mitigate memory fragmentation, then rerun the job to confirm that the OOM no longer occurs.
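
If the variable is set from a Python launcher rather than the shell, it needs to be in place before PyTorch initializes its CUDA allocator, so the safest spot is before torch is imported. The sketch below assumes a launcher script under your control; the with_expandable_segments helper is a hypothetical name, and it keeps any other allocator options that are already set instead of overwriting them.

```python
import os
from typing import Optional


def with_expandable_segments(current: Optional[str]) -> str:
    """Return an allocator config string that enables expandable segments.

    PYTORCH_CUDA_ALLOC_CONF is a comma-separated list of option:value pairs.
    Any existing expandable_segments entry is replaced; other options are kept.
    """
    options = [opt.strip() for opt in (current or "").split(",") if opt.strip()]
    options = [opt for opt in options if not opt.startswith("expandable_segments:")]
    options.append("expandable_segments:True")
    return ",".join(options)


# Do this before importing torch so the allocator sees the setting.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = with_expandable_segments(
    os.environ.get("PYTORCH_CUDA_ALLOC_CONF")
)
```

Exporting the variable in the shell or job script before the training command has the same effect and avoids any import-order concerns.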

Please help me install this Skill: Name: nemo-mbridge-perf-memory-tuning Download link: https://github.com/sayalinvidia/sayali-skills-test/archive/main.zip#nemo-mbridge-perf-memory-tuning Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
