perf-cpu-offloading
Official
Optimize GPU memory and performance with CPU offloading strategies.
Software Engineering
#deep learning #performance tuning #gpu #memory optimization #cpu offloading #optimizer states
Author: NVIDIA-NeMo
Version: 1.0.0
Installs: 0
System Documentation
What problem does it solve?
This Skill helps users reduce GPU memory usage and avoid out-of-memory (OOM) failures during training by offloading activations and optimizer states to CPU memory.
Core Features & Use Cases
- Memory Management: Offload transformer layer activations and optimizer states to CPU to prevent OOM errors during large model training.
- Performance Tuning: Adjust offloading fractions and related settings to trade memory savings against training throughput, since offloaded data has to move between CPU and GPU.
- Use Case: An engineer training a 30B model on limited GPU memory can offload 50% of optimizer states to enable larger batch sizes without upgrading hardware.
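To get a feel for the numbers in the use case above, here is a rough back-of-the-envelope sketch. It assumes mixed-precision Adam with 12 bytes of optimizer state per parameter (fp32 master weight, momentum, and variance) and optimizer state sharded evenly over 8 GPUs. Neither figure comes from the skill itself, and the function names are hypothetical.

```python
# Rough estimate of GPU memory freed by offloading optimizer state to CPU.
# Assumption (not from the skill): mixed-precision Adam keeps 12 bytes per
# parameter: fp32 master weight + fp32 momentum + fp32 variance.
BYTES_PER_PARAM_OPTIMIZER = 12


def optimizer_state_gb(num_params: float,
                       bytes_per_param: int = BYTES_PER_PARAM_OPTIMIZER) -> float:
    """Total optimizer state in GB (1 GB = 1e9 bytes), summed over all GPUs."""
    return num_params * bytes_per_param / 1e9


def freed_gb_per_gpu(num_params: float, offload_fraction: float,
                     num_gpus: int) -> float:
    """GPU memory freed per device, assuming state is sharded evenly."""
    if not 0.0 <= offload_fraction <= 1.0:
        raise ValueError("offload_fraction must be in [0, 1]")
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    return optimizer_state_gb(num_params) * offload_fraction / num_gpus


total = optimizer_state_gb(30e9)
freed = freed_gb_per_gpu(30e9, offload_fraction=0.5, num_gpus=8)
print(f"total optimizer state: {total:.0f} GB")         # 360 GB
print(f"freed per GPU at 50% offload: {freed:.1f} GB")  # 22.5 GB
```

Under these assumptions, a 50% offload frees roughly 22.5 GB per GPU, which is the headroom that could go toward a larger batch size. Real savings depend on the optimizer, precision, and sharding scheme in use.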
Quick Start
Use the perf-cpu-offloading skill to configure optimizer offloading with a 50% fraction, and to enable activation offloading for small models that fit with a pipeline-parallel size of 1 (PP=1).
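The Quick Start settings can be pictured as a small validated config object. The sketch below is illustrative only: `OffloadSettings` and its field names are hypothetical and are not the skill's or Megatron-Bridge's actual API. It treats PP=1 as a precondition for activation offloading, following the wording above.

```python
from dataclasses import dataclass


@dataclass
class OffloadSettings:
    """Hypothetical settings container; field names are illustrative only."""
    optimizer_offload_fraction: float = 0.0  # share of optimizer state kept on CPU
    activation_offload: bool = False         # offload layer activations to CPU
    pipeline_parallel_size: int = 1          # PP degree of the training job

    def validate(self) -> None:
        """Reject combinations the Quick Start text does not cover."""
        if not 0.0 <= self.optimizer_offload_fraction <= 1.0:
            raise ValueError("optimizer_offload_fraction must be in [0, 1]")
        if self.pipeline_parallel_size < 1:
            raise ValueError("pipeline_parallel_size must be at least 1")
        # Assumption taken from the Quick Start wording: activation
        # offloading is only enabled when the model fits with PP=1.
        if self.activation_offload and self.pipeline_parallel_size != 1:
            raise ValueError("activation offloading assumes PP=1 in this sketch")


# Settings matching the Quick Start: 50% optimizer offload, activations on, PP=1.
quick_start = OffloadSettings(
    optimizer_offload_fraction=0.5,
    activation_offload=True,
    pipeline_parallel_size=1,
)
quick_start.validate()
print(quick_start)
```

Validating up front keeps an unsupported combination from surfacing later as a failure deep inside a training run.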
Dependency Matrix
Required Modules: None required
Components: scripts, references
💻 Claude Code Installation
Recommended: Let Claude install automatically. Simply copy and paste the text below to Claude Code.
Please help me install this Skill: Name: perf-cpu-offloading Download link: https://github.com/NVIDIA-NeMo/Megatron-Bridge/archive/main.zip#perf-cpu-offloading Please download this .zip file, extract it, and install it in the .claude/skills/ directory.