kernel-profile
Community
Diagnose kernel bottlenecks with NCU evidence.
Software Engineering
#triton, #gpu profiling, #cutlass, #nsight compute, #bottleneck diagnosis, #kernel correctness, #cuda-cpp
Author: fmh66
Version: 1.0.0
Installs: 0
System Documentation
What problem does it solve?
kernel-profile helps you determine whether a CUDA, CUTLASS, CuTe DSL, or Triton kernel is memory-bound, compute-bound, latency-bound, occupancy-limited, or mixed. It validates correctness first, then collects Nsight Compute (NCU) metrics with nsight-python.
Core Features & Use Cases
- Environment readiness & backend availability checks: verifies CUDA/PyTorch/GPU selection plus readiness of ncu and nsight-python, and confirms at least one backend is usable (cuda-cpp, cute-dsl, cutlass, or triton).
- Correctness validation before profiling: runs the provided kernel implementation against a Python reference (with configurable atol/rtol) and writes correctness.md.
- Deterministic NCU profiling workflow: collects kernel runtime timing (CUDA events) plus NCU metric summaries and detailed per-metric results, then classifies bottlenecks using the generated ncu_summary.md evidence.
- Use cases: diagnosing slow GPU kernels, choosing the next optimization direction (e.g., coalescing/misalignment vs compute pipeline utilization vs synchronization vs divergence), and comparing occupancy/memory stalls across kernel variants under fixed dimensions and settings.
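The correctness-before-profiling step can be pictured with a minimal sketch. It assumes the kernel output and the Python reference have already been flattened to plain sequences of floats, and it applies the same elementwise rule as `numpy.allclose` (`|a - e| <= atol + rtol * |e|`). The names `check_close`, `CorrectnessResult`, and `to_markdown`, the default tolerances, and the report layout are all hypothetical; the actual correctness.md written by the skill may look different.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class CorrectnessResult:
    passed: bool
    max_abs_err: float
    mismatches: int


def check_close(actual: Sequence[float], expected: Sequence[float],
                atol: float = 1e-5, rtol: float = 1e-4) -> CorrectnessResult:
    """Compare kernel output to a reference with |a - e| <= atol + rtol * |e|."""
    if len(actual) != len(expected):
        raise ValueError("kernel output and reference differ in length")
    max_abs_err = 0.0
    mismatches = 0
    for a, e in zip(actual, expected):
        err = abs(a - e)
        max_abs_err = max(max_abs_err, err)
        # "not (err <= tol)" also counts NaN as a mismatch.
        if not err <= atol + rtol * abs(e):
            mismatches += 1
    return CorrectnessResult(mismatches == 0, max_abs_err, mismatches)


def to_markdown(result: CorrectnessResult, atol: float, rtol: float) -> str:
    # Hypothetical report layout, for illustration only.
    status = "PASS" if result.passed else "FAIL"
    return (
        "# Correctness\n\n"
        f"- status: {status}\n"
        f"- atol: {atol}, rtol: {rtol}\n"
        f"- max abs error: {result.max_abs_err:.3e}\n"
        f"- mismatched elements: {result.mismatches}\n"
    )
```

With real tensors, `torch.allclose` or `numpy.allclose` would normally replace the hand-written loop. The point of gating on this check is that profiling numbers for a kernel that computes the wrong result are not worth interpreting.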
Quick Start
Use the kernel-profile skill to validate correctness and generate NCU profiling reports for your kernel implementation against a Python reference, then read ncu_summary.md to decide the bottleneck category.
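As a rough illustration of how summary metrics might map to a bottleneck category, the sketch below takes three percentages of the kind NCU reports (compute throughput, memory throughput, achieved occupancy) and returns one of the five labels named above. The function name, argument names, and the 60% and 50% thresholds are assumptions chosen for illustration. They are not the rules the skill applies, and real diagnosis should also weigh the stall reasons and per-metric details in ncu_summary.md.

```python
def classify_bottleneck(compute_pct: float, memory_pct: float,
                        occupancy_pct: float,
                        high: float = 60.0,
                        low_occupancy: float = 50.0) -> str:
    """Map three utilization percentages to a coarse bottleneck label.

    All thresholds are illustrative assumptions, not values from the skill.
    """
    compute_high = compute_pct >= high
    memory_high = memory_pct >= high
    if compute_high and memory_high:
        return "mixed"
    if memory_high:
        return "memory-bound"
    if compute_high:
        return "compute-bound"
    # Neither pipeline is busy: blame occupancy if it is low,
    # otherwise treat the kernel as latency-bound.
    if occupancy_pct < low_occupancy:
        return "occupancy-limited"
    return "latency-bound"


if __name__ == "__main__":
    print(classify_bottleneck(compute_pct=25.0, memory_pct=90.0, occupancy_pct=70.0))
```

A simple ordered rule set like this keeps the decision easy to audit against the evidence file, at the cost of ignoring finer signals such as divergence or synchronization stalls.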
Dependency Matrix
Required Modules
numpy, torch, triton, nsight-python, nvidia-cutlass-dsl
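Before running the skill, it can help to confirm that these modules and the `ncu` command-line tool are visible from the current Python environment. The sketch below uses only the standard library. The mapping from package name to import name is an assumption (in particular `nsight` for nsight-python and `cutlass` for nvidia-cutlass-dsl), and `check_environment` is a hypothetical helper, not part of the skill's scripts.

```python
import importlib.util
import shutil

# Package name -> assumed import name. The last two import names are assumptions.
REQUIRED = {
    "numpy": "numpy",
    "torch": "torch",
    "triton": "triton",
    "nsight-python": "nsight",
    "nvidia-cutlass-dsl": "cutlass",
}


def check_environment(modules=REQUIRED, tools=("ncu",)):
    """Return a dict mapping each package or tool name to True if it was found."""
    report = {}
    for package, import_name in modules.items():
        # find_spec returns None when a top-level module is not installed.
        report[package] = importlib.util.find_spec(import_name) is not None
    for tool in tools:
        # shutil.which returns None when the executable is not on PATH.
        report[tool] = shutil.which(tool) is not None
    return report


if __name__ == "__main__":
    for name, found in check_environment().items():
        print(f"{name}: {'ok' if found else 'MISSING'}")
```

This only checks that the packages can be located. It does not verify GPU availability, driver compatibility, or that `ncu` has permission to read performance counters.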
Components
scripts, references
💻 Claude Code Installation
Recommended: let Claude install it automatically. Copy and paste the text below into Claude Code.
Please help me install this Skill:
Name: kernel-profile
Download link: https://github.com/fmh66/kernel-opt-agent/archive/main.zip#kernel-profile
Please download this .zip file, extract it, and install it in the .claude/skills/ directory.