Name: perf-workload-profiling
Availability: InStock
Author: NVIDIA

System Documentation

What problem does it solve?

Code instrumentation for timing GPU workloads, enabling precise measurements of per-iteration latency, throughput, and data loading for training loops, as well as standalone kernel timing with warmup and per-iteration statistics; NVTX labeling for profiler timelines is also supported.

Core Features & Use Cases

Manual timing for training loops to report per-iteration latency and throughput, with data-load timing.
CUDA event-based timing for standalone kernels or ops, with warmup and per-iteration statistics.
NVTX annotations to label profiler timelines for clearer visualization.

Quick Start

Provide a timing harness for your training loop or kernel to collect per-iteration latency, throughput, and data-load metrics with NVTX labeling.

Please help me install this Skill: Name: perf-workload-profiling Download link: https://github.com/NVIDIA/skills/archive/main.zip#perf-workload-profiling Please download this .zip file, extract it, and install it in the .claude/skills/ directory.

perf-workload-profiling

System Documentation

What problem does it solve?

Core Features & Use Cases

Quick Start

Dependency Matrix

Required Modules

Components

💻 Claude Code Installation

Agent Skills Search Helper