perf-cpu-offloading

Official

Optimize GPU memory and performance with CPU offloading strategies.

Author: NVIDIA-NeMo
Version: 1.0.0
Installs: 0

System Documentation

What problem does it solve?

This Skill helps users reduce GPU memory usage and improve training stability (fewer out-of-memory failures) by offloading activations and optimizer states to CPU memory.

Core Features & Use Cases

  • Memory Management: Offload transformer layer activations and optimizer states to CPU to prevent OOM errors during large model training.
  • Performance Tuning: Adjust offloading fractions and related settings to balance memory savings against training throughput.
  • Use Case: An engineer training a 30B model on limited GPU memory can offload 50% of optimizer states to enable larger batch sizes without upgrading hardware.
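
To make the 30B use case concrete, here is a rough back-of-the-envelope estimate in Python. It is a sketch under stated assumptions, not a measurement: it assumes mixed-precision Adam with 12 bytes of optimizer state per parameter (fp32 master weights plus two fp32 moments), and it counts the total across all GPUs before any sharding. The function name is hypothetical and not part of the skill.

```python
def gpu_optimizer_bytes(num_params: int,
                        offload_fraction: float,
                        bytes_per_param: int = 12) -> int:
    """Estimate optimizer-state bytes left on GPU after offloading.

    bytes_per_param=12 is an ASSUMPTION: fp32 master weights (4)
    plus Adam first and second moments (4 + 4). Adjust it for your
    optimizer and precision settings.
    """
    if not 0.0 <= offload_fraction <= 1.0:
        raise ValueError("offload_fraction must be between 0 and 1")
    total = num_params * bytes_per_param
    # The offloaded share lives in CPU memory; the rest stays on GPU.
    return int(total * (1.0 - offload_fraction))


if __name__ == "__main__":
    params = 30_000_000_000  # 30B parameters, as in the use case
    for fraction in (0.0, 0.5, 1.0):
        gb = gpu_optimizer_bytes(params, fraction) / 1e9
        print(f"offload {fraction:.0%}: {gb:.0f} GB of optimizer state on GPU")
```

Under these assumptions, a 50% offload fraction moves roughly 180 GB of optimizer state off the GPUs in aggregate. Actual savings depend on the optimizer, precision, and how states are sharded across devices.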

Quick Start

Use the perf-cpu-offloading skill to configure optimizer offloading with a 50% offload fraction, and to enable activation offloading for small models that fit without pipeline parallelism (PP=1).
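
The settings named in the Quick Start can be pictured as a small validated config object. The sketch below is illustrative only: the class and field names are hypothetical and are not the skill's or Megatron-Bridge's actual API, and the PP=1 check reflects one reading of the Quick Start (that activation offloading is intended for runs without pipeline parallelism). Consult the skill's references for the real option names.

```python
from dataclasses import dataclass


@dataclass
class OffloadSettings:
    """Hypothetical container for the Quick Start settings.

    Field names are illustrative ASSUMPTIONS, not a real library API.
    """
    optimizer_offload_fraction: float = 0.5  # share of optimizer state kept on CPU
    activation_offloading: bool = False      # offload transformer layer activations
    pipeline_parallel_size: int = 1          # PP in the Quick Start

    def validate(self) -> None:
        if not 0.0 <= self.optimizer_offload_fraction <= 1.0:
            raise ValueError("optimizer_offload_fraction must be in [0, 1]")
        # ASSUMPTION: activation offloading is only used with PP=1,
        # following the Quick Start wording.
        if self.activation_offloading and self.pipeline_parallel_size != 1:
            raise ValueError("activation offloading assumed to require PP=1")


if __name__ == "__main__":
    settings = OffloadSettings(optimizer_offload_fraction=0.5,
                               activation_offloading=True,
                               pipeline_parallel_size=1)
    settings.validate()  # raises ValueError on an inconsistent combination
    print(settings)
```

Validating the combination up front surfaces a mismatch (for example, activation offloading with PP=2) before a long training job is launched.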

Dependency Matrix

Required Modules

None required

Components

scripts, references

💻 Claude Code Installation

Recommended: Let Claude install automatically. Simply copy and paste the text below to Claude Code.

Please help me install this Skill:
Name: perf-cpu-offloading
Download link: https://github.com/NVIDIA-NeMo/Megatron-Bridge/archive/main.zip#perf-cpu-offloading

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
