gpu-training-acceleration
Community
Maximize PyTorch GPU throughput and reduce GPU memory usage.
System Documentation
What problem does it solve?
This skill speeds up PyTorch training and reduces memory usage on CUDA GPUs through actionable, config-gated acceleration patterns, mixed precision guidance, fused optimizer kernels, Triton kernel fusion, and monitoring practices. It helps diagnose low GPU utilization, avoid out-of-memory failures, and safely adopt new compiler or kernel features without destabilizing training runs.
Core Features & Use Cases
- Config-gated acceleration: Patterns to gate features like torch.compile, fused optimizers, and gradient checkpointing behind configuration so they can be toggled and logged.
- Precision and kernel guidance: Advice for TF32, bf16/fp16 mixed precision, cuDNN autotuning, and when to prefer fused kernels or Triton implementations.
- Failure handling and telemetry: Safe defaults, try/except fallbacks for torch.compile, and recommendations to always log acceleration state for reproducibility and debugging.
- Memory and dataflow patterns: Strategies for latent-space training, contiguous memory enforcement, empty_cache placement, and NVCC build flags for custom CUDA extensions.
- Use Case: Speed up transformer or generative model training on multi-GPU clusters by enabling TF32, using bf16 mixed precision where safe, compiling stable decoder submodules, and switching to fused AdamW for optimizer speedups.
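The config-gating and fallback patterns listed above can be sketched roughly as follows. This is a minimal illustration, not the skill's own code: `AccelConfig`, `maybe_compile`, and `accel_state` are hypothetical names, and the field names are assumptions. The only real library call is `torch.compile`, which is imported lazily so that the gating logic can be exercised without a GPU.

```python
import logging
from dataclasses import dataclass, asdict

log = logging.getLogger("accel")


@dataclass
class AccelConfig:
    """Hypothetical config object; field names are illustrative."""
    tf32: bool = True
    amp_dtype: str = "bf16"        # "bf16", "fp16", or "none"
    compile: bool = False          # off by default: opt in per run
    fused_optimizer: bool = True
    grad_checkpointing: bool = False


def maybe_compile(module, cfg, compile_fn=None):
    """Return (module, compiled). Falls back to the eager module on failure."""
    if not cfg.compile:
        return module, False
    if compile_fn is None:
        import torch  # imported lazily so the gating logic works without torch
        compile_fn = torch.compile
    try:
        return compile_fn(module), True
    except Exception as exc:
        # torch.compile is lazy: many failures only appear on the first
        # forward pass, which this wrap-time guard does not cover.
        log.warning("torch.compile failed, using eager mode: %s", exc)
        return module, False


def accel_state(cfg, compiled):
    """Flatten the effective acceleration state for logging with the run."""
    state = asdict(cfg)
    state["compile_active"] = compiled
    return state
```

A training script might call `maybe_compile(model.decoder, cfg)` once at startup and write `accel_state(cfg, compiled)` to its run log, so that each checkpoint can be traced back to the accelerations that were actually active. Here `model.decoder` stands in for whichever stable-shape submodule is being compiled.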
Quick Start
Run a short fastrun with TF32 enabled, mixed precision set to bf16, fused optimizer turned on, and compile only stable-shape decoder submodules to validate throughput and memory before full-scale training.
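A short validation run along these lines might look like the sketch below. It is an assumption-laden example rather than the skill's actual script: `pick_settings` and `short_run` are hypothetical helpers, the toy model stands in for a real network, and the `torch.compile` step is left out. The PyTorch calls used (`torch.backends.cuda.matmul.allow_tf32`, `torch.backends.cudnn.allow_tf32`, `torch.autocast`, `torch.optim.AdamW(..., fused=...)`, `torch.cuda.max_memory_allocated`) are standard APIs.

```python
import time


def pick_settings(cuda_available: bool, bf16_supported: bool) -> dict:
    """Decide which accelerations a short validation run should turn on."""
    return {
        "device": "cuda" if cuda_available else "cpu",
        "tf32": cuda_available,
        "amp_dtype": "bf16" if (cuda_available and bf16_supported) else "none",
        "fused_optimizer": cuda_available,
    }


def short_run(steps: int = 20) -> dict:
    """Time a few training steps and report throughput and peak memory."""
    import torch  # imported here so pick_settings stays usable without torch
    from torch import nn

    cuda = torch.cuda.is_available()
    s = pick_settings(cuda, cuda and torch.cuda.is_bf16_supported())
    if s["tf32"]:
        torch.backends.cuda.matmul.allow_tf32 = True
        torch.backends.cudnn.allow_tf32 = True

    # Toy model standing in for the real network (an assumption).
    model = nn.Sequential(
        nn.Linear(256, 1024), nn.GELU(), nn.Linear(1024, 256)
    ).to(s["device"])
    opt = torch.optim.AdamW(
        model.parameters(), lr=1e-3, fused=s["fused_optimizer"]
    )
    x = torch.randn(64, 256, device=s["device"])

    if cuda:
        torch.cuda.reset_peak_memory_stats()
    start = time.perf_counter()
    for _ in range(steps):
        with torch.autocast(
            device_type=s["device"],
            dtype=torch.bfloat16,
            enabled=s["amp_dtype"] == "bf16",
        ):
            loss = model(x).pow(2).mean()
        opt.zero_grad(set_to_none=True)
        loss.backward()
        opt.step()
    if cuda:
        torch.cuda.synchronize()  # wait for queued kernels before timing
    elapsed = time.perf_counter() - start

    peak_mb = torch.cuda.max_memory_allocated() / 2**20 if cuda else 0.0
    return {**s, "steps_per_sec": steps / elapsed, "peak_mem_mb": peak_mb}
```

Comparing the returned `steps_per_sec` and `peak_mem_mb` with and without each setting gives a quick read on whether an acceleration is worth carrying into the full run.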
Dependency Matrix
Required Modules
None required
💻 Claude Code Installation
Recommended: Let Claude install it automatically. Copy and paste the text below into Claude Code.
Please help me install this Skill: Name: gpu-training-acceleration Download link: https://github.com/dongzhuoyao/tao-research-skills/archive/main.zip#gpu-training-acceleration Please download this .zip file, extract it, and install it in the .claude/skills/ directory.