Name: perf-memory-tuning
Author: NVIDIA-NeMo

System Documentation

What problem does it solve?

This Skill addresses GPU out-of-memory (OOM) failures during model training that are caused by memory fragmentation and inefficient use of GPU memory.

Core Features & Use Cases

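Memory fragmentation mitigation by setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True, which lets PyTorch's CUDA caching allocator grow existing memory segments as allocation sizes change.
Parallelism adjustments, such as increasing tensor parallelism (TP) or pipeline parallelism (PP), so that each GPU holds a smaller share of the model and fits within GPU memory.
Activation recomputation, which discards intermediate activations during the forward pass and recomputes them in the backward pass, trading extra compute for lower peak memory.
Use Case: When training large language models on H100 GPUs, applying these techniques can often resolve OOM errors and keep training stable without hardware upgrades.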
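
The three techniques above can be compared with back-of-the-envelope arithmetic. The sketch below is a rough per-GPU estimator written for illustration; it is not part of the Skill, and every name in it (ModelShape, fitting_configs, and so on) is hypothetical. It assumes about 16 bytes of model state per parameter (mixed-precision weights, gradients and Adam state, no optimizer sharding) split evenly across TP × PP GPUs; about 34 bytes of stored activations per element of each layer's s·b·h input, sharded by TP with sequence parallelism and with attention scores not stored; and full recomputation that keeps only each layer's 16-bit input. Treat the coefficients as assumptions to replace with measured numbers.

```python
from dataclasses import dataclass
from itertools import product

GIB = 1024 ** 3


@dataclass
class ModelShape:
    """Hypothetical transformer shape used for the estimate."""
    num_params: float  # total parameters
    num_layers: int    # L
    hidden: int        # h
    seq_len: int       # s
    micro_batch: int   # b


def model_state_gib(m: ModelShape, tp: int, pp: int, bytes_per_param: float = 16.0) -> float:
    # Assumption: weights, grads and Adam state are split evenly over tp * pp GPUs.
    return m.num_params * bytes_per_param / (tp * pp) / GIB


def activation_gib(m: ModelShape, tp: int, recompute: bool, coeff: float = 34.0) -> float:
    sbh = m.seq_len * m.micro_batch * m.hidden
    if recompute:
        # Full recompute: keep only each layer's 16-bit input (2 bytes per element).
        per_layer = 2 * sbh / tp
    else:
        # Assumption: ~coeff bytes per s*b*h element, sharded by TP with sequence parallelism.
        per_layer = coeff * sbh / tp
    # Under a 1F1B pipeline schedule the first stage holds about pp micro-batches of
    # L / pp layers, i.e. roughly L layers' worth, so pp does not divide this term.
    return m.num_layers * per_layer / GIB


def peak_gib(m: ModelShape, tp: int, pp: int, recompute: bool) -> float:
    return model_state_gib(m, tp, pp) + activation_gib(m, tp, recompute)


def fitting_configs(m: ModelShape, budget_gib: float, tps=(1, 2, 4, 8), pps=(1, 2, 4)):
    """(tp, pp, recompute, GiB) options under budget; smallest TP*PP first, recompute last."""
    fits = []
    for tp, pp, recompute in product(tps, pps, (False, True)):
        gib = peak_gib(m, tp, pp, recompute)
        if gib <= budget_gib:
            fits.append((tp, pp, recompute, round(gib, 1)))
    # Stable sort: fewer model-parallel GPUs first, and no-recompute before recompute.
    return sorted(fits, key=lambda c: (c[0] * c[1], c[2]))


if __name__ == "__main__":
    shape = ModelShape(num_params=7e9, num_layers=32, hidden=4096, seq_len=4096, micro_batch=1)
    for cfg in fitting_configs(shape, budget_gib=70.0)[:4]:
        print(cfg)
```

With the hypothetical 7B-parameter shape and a 70 GiB budget (chosen to leave headroom on an 80 GB H100), TP=1/PP=1 never fits because model state alone is about 104 GiB, while TP=2 or PP=2 does; recomputation then shrinks the activation term from 17 GiB to 1 GiB at TP=1.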

Quick Start

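Set the environment variable PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True before launching your training session. This reduces memory fragmentation and can prevent the OOM errors it causes; OOM errors that occur because the model simply does not fit still call for the parallelism or activation recompute adjustments described above.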
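
The allocator setting is typically picked up when PyTorch's CUDA caching allocator first initializes, so the variable should be in place before any CUDA work: exporting it in the launch shell or setting it before importing torch both satisfy that. The value is a comma-separated list of option:value pairs, so if you already pass other allocator options, overwriting the variable would drop them. The helper below is a hypothetical sketch, not part of the Skill, that merges expandable_segments:True into whatever value is already present.

```python
import os
from typing import MutableMapping

KEY = "PYTORCH_CUDA_ALLOC_CONF"


def enable_expandable_segments(env: MutableMapping[str, str] = os.environ) -> str:
    """Merge expandable_segments:True into PYTORCH_CUDA_ALLOC_CONF, keeping other options."""
    # The value is a comma-separated list of option:value pairs.
    opts = [o.strip() for o in env.get(KEY, "").split(",") if o.strip()]
    # Drop any existing expandable_segments entry so the new setting wins.
    opts = [o for o in opts if o.split(":", 1)[0].strip() != "expandable_segments"]
    opts.append("expandable_segments:True")
    env[KEY] = ",".join(opts)
    return env[KEY]
```

Call enable_expandable_segments() at the very top of the training entry point, before `import torch` or any other code that might touch CUDA.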

Please help me install this Skill: Name: perf-memory-tuning Download link: https://github.com/NVIDIA-NeMo/Megatron-Bridge/archive/main.zip#perf-memory-tuning Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
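
If you prefer to perform the extract-and-install step by hand, the sketch below copies the skill folder out of the downloaded archive. It assumes, without confirmation from this page, that the archive contains a directory named exactly perf-memory-tuning somewhere inside it (the #perf-memory-tuning fragment in the link hints at this but does not give the path) and that the skill belongs at .claude/skills/perf-memory-tuning. The function name install_skill_from_zip is hypothetical.

```python
import shutil
import zipfile
from pathlib import Path, PurePosixPath


def install_skill_from_zip(zip_source, skill_name: str, skills_dir: str = ".claude/skills") -> Path:
    """Copy the shallowest directory named `skill_name` in the archive to skills_dir/skill_name."""
    dest_root = Path(skills_dir) / skill_name
    with zipfile.ZipFile(zip_source) as zf:
        # Collect every archive directory path that ends in a component called skill_name.
        prefixes = []
        for name in zf.namelist():
            parts = PurePosixPath(name).parts
            dir_parts = parts if name.endswith("/") else parts[:-1]
            if skill_name in dir_parts:
                prefixes.append(dir_parts[: dir_parts.index(skill_name) + 1])
        if not prefixes:
            raise FileNotFoundError(f"no directory named {skill_name!r} in the archive")
        prefix = min(prefixes, key=len)  # shallowest match wins

        for info in zf.infolist():
            parts = PurePosixPath(info.filename).parts
            rel = parts[len(prefix):]
            # Skip directories, files outside the skill folder, and unsafe '..' paths.
            if info.is_dir() or parts[: len(prefix)] != prefix or not rel or ".." in rel:
                continue
            target = dest_root.joinpath(*rel)
            target.parent.mkdir(parents=True, exist_ok=True)
            with zf.open(info) as src, open(target, "wb") as dst:
                shutil.copyfileobj(src, dst)
    return dest_root
```

For example, after saving the linked archive as main.zip, install_skill_from_zip("main.zip", "perf-memory-tuning") would place the folder's contents under .claude/skills/perf-memory-tuning.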
