nemo-mbridge-perf-cpu-offloading


Reduce GPU memory pressure with CPU offloading.

Author: sayalinvidia
Version: 1.0.0
Installs: 0

System Documentation

What problem does it solve?

This skill helps Megatron Bridge users reduce GPU memory pressure by moving activations and optimizer states from GPU to CPU memory. That frees room to train larger models or longer sequences. It also helps diagnose out-of-memory (OOM) errors and crashes caused by a faulty offload configuration.

Core Features & Use Cases

  • Layer-level activation offloading (cpu_offloading) moves activations to CPU to reduce GPU memory usage; suited to small-to-medium models that run without pipeline parallelism (PP=1).
  • Optimizer CPU offloading (optimizer_cpu_offload) using HybridDeviceOptimizer to place optimizer states on CPU; supports configurable offload fractions for memory-speed tradeoffs.
  • Tuning options (optimizer_offload_fraction, overlap_cpu_optimizer_d2h_h2d) to trade memory savings against training speed.
  • Compatibility guidance and caveats: activation offloading requires PP=1 and cannot be combined with recompute or CUDA graphs; optimizer offloading has no PP restriction but may incur CPU overhead.
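
The compatibility rules above can be sketched as a small pre-flight check. This is a minimal illustration, not the Megatron Bridge API: the OffloadSettings class and the fields pipeline_parallel_size, recompute_enabled, and cuda_graphs_enabled are hypothetical names chosen for this example, and the [0, 1] range for optimizer_offload_fraction is an assumption. Only the four offload option names come from the list above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class OffloadSettings:
    """Hypothetical container for illustration; NOT a Megatron Bridge class."""
    cpu_offloading: bool = False                  # activation offloading
    optimizer_cpu_offload: bool = False           # optimizer states on CPU
    optimizer_offload_fraction: float = 0.0       # assumed to lie in [0, 1]
    overlap_cpu_optimizer_d2h_h2d: bool = False   # overlap device/host copies
    pipeline_parallel_size: int = 1               # assumed name for PP
    recompute_enabled: bool = False               # assumed name
    cuda_graphs_enabled: bool = False             # assumed name


def check_offload_settings(s: OffloadSettings) -> List[str]:
    """Return human-readable problems; an empty list means no conflict found."""
    problems: List[str] = []
    if s.cpu_offloading:
        # Activation offloading carries the restrictions listed above.
        if s.pipeline_parallel_size != 1:
            problems.append("activation offloading requires PP=1")
        if s.recompute_enabled:
            problems.append("activation offloading cannot be combined with recompute")
        if s.cuda_graphs_enabled:
            problems.append("activation offloading cannot be combined with CUDA graphs")
    # Optimizer offloading has no PP restriction; only the fraction is checked.
    if not 0.0 <= s.optimizer_offload_fraction <= 1.0:
        problems.append("optimizer_offload_fraction must be in [0, 1]")
    return problems
```

Running such a check before launching a job may surface a conflicting combination early, rather than as a crash or OOM partway through startup.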

Quick Start

Enable optimizer CPU offload (optimizer_cpu_offload) and, optionally, activation offload (cpu_offloading) in the Megatron Bridge configuration to begin reducing GPU memory usage.
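
When picking a value for optimizer_offload_fraction, a back-of-the-envelope estimate can help. The sketch below rests on two assumptions that are not stated in this skill: optimizer state costs about 12 bytes per parameter (an fp32 master copy plus two Adam moments, typical for mixed-precision Adam), and the fraction splits that state linearly between CPU and GPU. The function name and the dp_shards parameter are hypothetical.

```python
def optimizer_state_gib(num_params, offload_fraction, bytes_per_param=12, dp_shards=1):
    """Estimate (gpu_gib, cpu_gib) of optimizer state per rank.

    Assumptions (not from Megatron Bridge): 12 bytes/param for mixed-precision
    Adam, a linear split by offload_fraction, and even sharding across
    dp_shards data-parallel ranks.
    """
    if not 0.0 <= offload_fraction <= 1.0:
        raise ValueError("offload_fraction must be in [0, 1]")
    if dp_shards < 1:
        raise ValueError("dp_shards must be >= 1")
    total_bytes = num_params * bytes_per_param / dp_shards
    cpu_bytes = total_bytes * offload_fraction
    gib = 1024 ** 3
    return (total_bytes - cpu_bytes) / gib, cpu_bytes / gib


# Example: 2**30 parameters, half of the optimizer state offloaded.
gpu_gib, cpu_gib = optimizer_state_gib(1024 ** 3, 0.5)
print(f"GPU: {gpu_gib:.1f} GiB, CPU: {cpu_gib:.1f} GiB")
```

Real savings depend on the optimizer, precision settings, and sharding in use, so the numbers are a starting point for experiments rather than a prediction.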

Dependency Matrix

Required Modules

None required

Components

Standard package

💻 Claude Code Installation

Recommended: Let Claude install automatically. Simply copy and paste the text below to Claude Code.

Please help me install this Skill:
Name: nemo-mbridge-perf-cpu-offloading
Download link: https://github.com/sayalinvidia/sayali-skills-test/archive/main.zip#nemo-mbridge-perf-cpu-offloading

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.