perf-torch-sync-free
Official
Make PyTorch code truly async by removing syncs.
Author: NVIDIA
Version: 1.0.0
Installs: 0
System Documentation
What problem does it solve?
Identifies and eliminates host-device synchronizations in PyTorch code. It detects sync points (.item(), .cpu(), boolean indexing, torch.tensor on CUDA), classifies each one as a false or a true dependency, and provides sync-free alternatives.
Core Features & Use Cases
- Detects and classifies CPU-GPU synchronization points in PyTorch workloads.
- Provides actionable, sync-free alternatives to common patterns like .item(), .cpu(), and tensor transfers.
- Guides integration into existing codebases with a step-by-step workflow and verification.
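As a rough illustration of the kind of rewrite described above, the sketch below contrasts a syncing pattern (boolean indexing followed by .item()) with a fixed-shape alternative built on torch.where. The function names and the sample tensor are hypothetical and are not taken from the skill itself; the code falls back to the CPU when no GPU is present, where the two versions behave identically but no synchronization is involved.

```python
import torch

# Hypothetical setup: use the GPU when available, otherwise run on the CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"
x = torch.tensor([1.0, -2.0, 3.0, -4.0]).to(device)


def positive_sum_syncing(x):
    # x[x > 0] has a data-dependent shape, so on CUDA the host has to wait
    # for the mask before it can size the result. .item() then copies the
    # scalar back to the host, which is a second synchronization.
    positives = x[x > 0]
    return positives.sum().item()


def positive_sum_sync_free(x):
    # torch.where keeps the output shape fixed, and the result stays on the
    # device as a 0-dim tensor, so the host can keep queuing work.
    return torch.where(x > 0, x, torch.zeros_like(x)).sum()


host_value = positive_sum_syncing(x)       # Python float
device_value = positive_sum_sync_free(x)   # 0-dim tensor on `device`
```

The sync-free version only helps if the caller keeps consuming the result as a tensor. If a Python number is genuinely needed at that point (for example, to log it), the dependency is a true one and the transfer can at best be deferred or batched.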
Quick Start
Run through a PyTorch workload to detect and remove host-device syncs, then confirm that the syncs are gone with sync-debug mode and measure the performance gain with profiling tools.
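One possible way to do the sync-debug check is PyTorch's torch.cuda.set_sync_debug_mode, which can warn whenever a synchronizing CUDA operation is called. The helper below is a minimal sketch under that assumption: count_sync_warnings is a hypothetical name, it counts every warning raised while the function runs (not only sync warnings), and it returns None on machines without CUDA.

```python
import warnings

import torch


def count_sync_warnings(fn, *args):
    """Run fn under sync-debug "warn" mode and count the warnings it raises.

    Returns None when CUDA is not available, since the debug mode only
    applies to CUDA operations.
    """
    if not torch.cuda.is_available():
        return None
    previous_mode = torch.cuda.get_sync_debug_mode()
    torch.cuda.set_sync_debug_mode("warn")
    try:
        with warnings.catch_warnings(record=True) as caught:
            warnings.simplefilter("always")
            fn(*args)
    finally:
        # Restore whatever mode was active before the check.
        torch.cuda.set_sync_debug_mode(previous_mode)
    return len(caught)


# Example: .item() on a CUDA tensor is expected to trigger a warning.
result = count_sync_warnings(
    lambda: torch.ones(4, device="cuda").sum().item()
)
```

Running the same helper before and after a rewrite gives a quick signal of whether a sync point was removed. A profiler trace is still needed to confirm that the change actually improves throughput.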
Dependency Matrix
Required Modules
None required
Components
scripts, references
💻 Claude Code Installation
Recommended: Let Claude install it automatically. Copy the text below and paste it into Claude Code.
Please help me install this Skill: Name: perf-torch-sync-free Download link: https://github.com/NVIDIA/skills/archive/main.zip#perf-torch-sync-free Please download this .zip file, extract it, and install it in the .claude/skills/ directory.