ml-profile
Community
Pinpoint ML bottlenecks and optimize GPUs.
Education & Research
#memory debugging #dataloader #tensorboard #gpu utilization #pytorch lightning #ml profiling #pytorch profiler
Author: nishide-dev
Version: 1.0.0
Installs: 0
System Documentation
What problem does it solve?
This Skill helps you identify why training is slow or unstable by profiling where time and memory are going, then guiding targeted performance fixes.
Core Features & Use Cases
- Training performance profiling: Run PyTorch Lightning profilers (simple, advanced, and PyTorch profiler) for operator-level insight and trace visualization in TensorBoard.
- Data loading diagnostics: Measure DataLoader throughput and find an optimal num_workers to reduce input bottlenecks.
- GPU utilization and memory checks: Monitor GPU utilization and GPU memory usage to detect compute starvation, inefficient data pipelines, and OOM risks.
- Use case: When a training run shows low GPU utilization and long iteration gaps, profile the DataLoader and training steps, then adjust worker counts, preprocessing, and batch/memory strategies based on the results.
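The data loading diagnostic above can be sketched as a small sweep over candidate worker counts. This is a minimal illustration, not the Skill's own script: the helper names (measure_throughput, sweep_num_workers, best_num_workers) and the make_loader factory are hypothetical, and the code times any iterable so it can be tried without a dataset.

```python
import time
from typing import Callable, Dict, Iterable, Sequence


def measure_throughput(loader: Iterable, max_batches: int = 50) -> float:
    """Return batches per second over at most `max_batches` batches."""
    start = time.perf_counter()
    n = 0
    for _ in loader:
        n += 1
        if n >= max_batches:
            break
    elapsed = time.perf_counter() - start
    if n == 0:
        return 0.0  # empty loader: nothing was measured
    return n / elapsed if elapsed > 0 else float("inf")


def sweep_num_workers(
    make_loader: Callable[[int], Iterable],
    candidates: Sequence[int] = (0, 2, 4, 8),
    max_batches: int = 50,
) -> Dict[int, float]:
    """Build one loader per candidate worker count and time each of them."""
    return {w: measure_throughput(make_loader(w), max_batches) for w in candidates}


def best_num_workers(results: Dict[int, float]) -> int:
    """Pick the highest throughput; ties go to the smaller worker count."""
    return min(results, key=lambda w: (-results[w], w))


# Hypothetical usage with PyTorch (assumes `dataset` is already defined):
#   from torch.utils.data import DataLoader
#   results = sweep_num_workers(
#       lambda w: DataLoader(dataset, batch_size=64, num_workers=w)
#   )
#   print(results, "->", best_num_workers(results))
```

The timing includes worker start-up cost, so a small max_batches tends to penalize higher worker counts; measuring more batches gives a steadier comparison.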
Quick Start
Use ml-profile when you notice low GPU utilization or OOM symptoms: ask for a short profiling run, then interpret the resulting traces in TensorBoard.
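A short profiling run of this kind might be configured as below. This is a sketch under assumptions: the helper short_profiling_run_kwargs is hypothetical and not part of the Skill, and it only builds keyword arguments for a PyTorch Lightning Trainer, which accepts "simple", "advanced", or "pytorch" as profiler names.

```python
from typing import Any, Dict

# Profiler names that the Lightning Trainer accepts as strings.
LIGHTNING_PROFILERS = ("simple", "advanced", "pytorch")


def short_profiling_run_kwargs(profiler: str = "simple", max_steps: int = 100) -> Dict[str, Any]:
    """Build Trainer keyword arguments for a brief, profiling-only run."""
    if profiler not in LIGHTNING_PROFILERS:
        raise ValueError(f"profiler must be one of {LIGHTNING_PROFILERS}, got {profiler!r}")
    if max_steps <= 0:
        raise ValueError("max_steps must be positive")
    return {
        "profiler": profiler,           # which Lightning profiler to attach
        "max_steps": max_steps,         # stop early; enough steps to see a pattern
        "limit_val_batches": 0,         # skip validation so the report covers training
        "enable_checkpointing": False,  # avoid checkpoint I/O skewing the timings
    }


# Hypothetical usage (assumes `model` and `datamodule` are already defined):
#   import pytorch_lightning as pl
#   trainer = pl.Trainer(**short_profiling_run_kwargs("pytorch", max_steps=50))
#   trainer.fit(model, datamodule=datamodule)
```

The simple profiler is usually enough to tell whether time goes to data loading or to the training step; the PyTorch profiler adds operator-level detail at higher overhead, so a smaller max_steps suits it.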
Dependency Matrix
Required Modules
matplotlib, hydra-core, pytorch-lightning, torch
Components
scripts
💻 Claude Code Installation
Recommended: Let Claude install automatically. Simply copy and paste the text below to Claude Code.
Please help me install this Skill:
Name: ml-profile
Download link: https://github.com/nishide-dev/claude-code-ml-research/archive/main.zip#ml-profile
Please download this .zip file, extract it, and install it in the .claude/skills/ directory.