nemo-mbridge-perf-activation-recompute
Community
Reduce GPU memory usage with selective recompute.
Software Engineering #memory-optimization #cuda-graphs #megatron-bridge #activation-recompute #selective-recompute #full-recompute #llama3-70b
Author: sayalinvidia
Version: 1.0.0
Installs: 0
System Documentation
What problem does it solve?
In Megatron Bridge, activation recompute spends extra compute to save GPU memory: intermediate activations are discarded during the forward pass and recomputed during the backward pass, which makes training possible under tighter memory budgets.
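The trade-off described above can be illustrated with a toy sketch. It is plain Python, not Megatron Bridge code: the scalar `tanh` layers, the `segment` parameter, and both function names are assumptions made up for illustration. The baseline keeps every layer input for the backward pass, while the recompute variant keeps only one checkpoint per segment and reruns the forward pass for that segment when its gradients are needed.

```python
import math


def layer(x, w):
    """Toy layer: a scalar tanh unit standing in for a transformer block."""
    return math.tanh(w * x)


def layer_grad(x, w):
    """Derivative of tanh(w * x) with respect to x."""
    y = math.tanh(w * x)
    return w * (1.0 - y * y)


def backward_stored(x0, weights):
    """Baseline: save every layer input during forward, reuse it in backward."""
    saved, x = [], x0
    for w in weights:
        saved.append(x)
        x = layer(x, w)
    grad = 1.0
    for x_in, w in zip(reversed(saved), reversed(weights)):
        grad *= layer_grad(x_in, w)
    # (gradient, peak number of saved values, forward layer evaluations)
    return grad, len(saved), len(weights)


def backward_recompute(x0, weights, segment):
    """Recompute: save one checkpoint per segment, rerun forward per segment."""
    checkpoints, x, forward_calls = [], x0, 0
    for i, w in enumerate(weights):
        if i % segment == 0:
            checkpoints.append(x)  # only segment inputs are kept
        x = layer(x, w)
        forward_calls += 1

    grad, peak = 1.0, len(checkpoints)
    for c in reversed(range(len(checkpoints))):
        seg_weights = weights[c * segment:(c + 1) * segment]
        inputs, x = [], checkpoints[c]
        for w in seg_weights:  # rerun the forward pass for this segment only
            inputs.append(x)
            x = layer(x, w)
            forward_calls += 1
        peak = max(peak, len(checkpoints) + len(inputs))
        for x_in, w in zip(reversed(inputs), reversed(seg_weights)):
            grad *= layer_grad(x_in, w)
    return grad, peak, forward_calls
```

With 16 layers and a segment of 4, this sketch holds at most 8 values instead of 16 while running 32 layer evaluations instead of 16, and both functions return the same gradient. Real savings depend on the model and on which activations are recomputed.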
Core Features & Use Cases
- Supports selective recompute to save memory by recomputing specific submodules (e.g., core_attn, layernorm) with moderate compute cost.
- Supports full-layer recompute to maximize memory savings when memory pressure is extreme, with guidance on when to apply.
- Provides compatibility guidance for Transformer Engine (TE)-scoped CUDA graphs and related constraints when tuning memory for large models.
Quick Start
Configure selective recompute first (e.g., core_attn; optionally add layernorm). If memory pressure persists, escalate to full recompute by setting recompute_num_layers and recompute_method.
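The escalation order above can be sketched as a small ladder of settings. This is a minimal sketch, not the Megatron Bridge API: `RecomputeSettings` is a hypothetical stand-in for the recompute-related fields of a model config. Only `recompute_num_layers`, `recompute_method`, `core_attn`, and `layernorm` come from the text above; the `recompute_granularity` and `recompute_modules` field names and the `"uniform"`/`"block"` values are assumptions to check against the installed version.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class RecomputeSettings:
    """Hypothetical container mirroring recompute-related config fields."""
    recompute_granularity: Optional[str] = None  # assumed: "selective" or "full"
    recompute_modules: List[str] = field(default_factory=list)  # assumed name
    recompute_method: Optional[str] = None  # assumed: "uniform" or "block"
    recompute_num_layers: Optional[int] = None


def selective(modules: List[str]) -> RecomputeSettings:
    """Selective recompute: recompute only the named submodules."""
    if not modules:
        raise ValueError("selective recompute needs at least one module")
    return RecomputeSettings(
        recompute_granularity="selective",
        recompute_modules=list(modules),
    )


def full(num_layers: int, method: str = "uniform") -> RecomputeSettings:
    """Full-layer recompute: needs both a method and a layer count."""
    if num_layers < 1:
        raise ValueError("recompute_num_layers must be at least 1")
    if method not in ("uniform", "block"):
        raise ValueError("unknown recompute_method: " + method)
    return RecomputeSettings(
        recompute_granularity="full",
        recompute_method=method,
        recompute_num_layers=num_layers,
    )


def escalation_ladder(full_num_layers: int = 1) -> List[RecomputeSettings]:
    """Cheapest option first, most memory-saving option last."""
    return [
        selective(["core_attn"]),
        selective(["core_attn", "layernorm"]),
        full(full_num_layers),
    ]


def first_that_fits(
    ladder: List[RecomputeSettings],
    fits: Callable[[RecomputeSettings], bool],
) -> Optional[RecomputeSettings]:
    """Return the first setting the caller's memory probe accepts, else None."""
    for settings in ladder:
        if fits(settings):
            return settings
    return None
```

The `fits` callback is where a real memory check would go, for example a short trial run that reports whether the job hit an out-of-memory error. Keeping the ladder ordered from cheapest to most aggressive means full recompute is used only when the selective options are not enough.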
Dependency Matrix
Required Modules
None required
Components
Standard package
💻 Claude Code Installation
Recommended: Let Claude install automatically. Simply copy and paste the text below to Claude Code.
Please help me install this Skill:
Name: nemo-mbridge-perf-activation-recompute
Download link: https://github.com/sayalinvidia/sayali-skills-test/archive/main.zip#nemo-mbridge-perf-activation-recompute
Please download this .zip file, extract it, and install it in the .claude/skills/ directory.