kernel-profile

Community

Diagnose kernel bottlenecks with NCU evidence.

Author: fmh66
Version: 1.0.0
Installs: 0

System Documentation

What problem does it solve?

kernel-profile helps you determine whether a CUDA, CUTLASS, CuTe DSL, or Triton kernel is memory-bound, compute-bound, latency-bound, occupancy-limited, or mixed. It validates correctness first, then collects Nsight Compute (NCU) metrics with nsight-python.
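To make the five categories concrete, here is a minimal sketch of how a classification over NCU percentages could look. The three metric names are standard Nsight Compute metrics, but the thresholds (`high`, `low_occupancy`), the function name `classify_bottleneck`, and the decision order are illustrative assumptions, not the rules the skill itself applies.

```python
from typing import Mapping

# Standard NCU metric names (values are percentages of peak).
SM_PCT = "sm__throughput.avg.pct_of_peak_sustained_elapsed"
DRAM_PCT = "dram__throughput.avg.pct_of_peak_sustained_elapsed"
OCC_PCT = "sm__warps_active.avg.pct_of_peak_sustained_active"


def classify_bottleneck(
    metrics: Mapping[str, float],
    high: float = 60.0,           # assumed cutoff for "heavily used"
    low_occupancy: float = 25.0,  # assumed cutoff for "too few active warps"
) -> str:
    """Map three NCU percentages to a coarse bottleneck label (hypothetical rules)."""
    sm = metrics[SM_PCT]
    dram = metrics[DRAM_PCT]
    occupancy = metrics[OCC_PCT]

    if sm >= high and dram >= high:
        return "mixed"
    if dram >= high:
        return "memory-bound"
    if sm >= high:
        return "compute-bound"
    # Neither unit is saturated: check whether too few warps are resident.
    if occupancy < low_occupancy:
        return "occupancy-limited"
    # Enough warps, but nothing is saturated: stalls dominate.
    return "latency-bound"


if __name__ == "__main__":
    sample = {SM_PCT: 22.0, DRAM_PCT: 87.5, OCC_PCT: 71.0}
    print(classify_bottleneck(sample))
```

A real diagnosis would also look at cache-level throughput and warp stall reasons, since DRAM throughput alone does not show a kernel limited by L1 or L2 bandwidth.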

Core Features & Use Cases

  • Environment readiness & backend availability checks: verifies CUDA/PyTorch/GPU selection plus readiness of ncu and nsight-python, and confirms at least one backend is usable (cuda-cpp, cute-dsl, cutlass, or triton).
  • Correctness validation before profiling: runs the provided kernel implementation against a Python reference (with configurable atol/rtol) and writes correctness.md.
  • Deterministic NCU profiling workflow: collects kernel runtime timing (CUDA events) plus NCU metric summaries and detailed per-metric results, then classifies bottlenecks using the generated ncu_summary.md evidence.
  • Use cases: diagnosing slow GPU kernels, choosing the next optimization direction (e.g., coalescing/misalignment vs compute pipeline utilization vs synchronization vs divergence), and comparing occupancy/memory stalls across kernel variants under fixed dimensions and settings.
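The correctness step above compares kernel output with a reference under `atol`/`rtol`. The sketch below shows the usual combined-tolerance rule, `|actual - expected| <= atol + rtol * |expected|`, in plain Python so it runs without a GPU. The function names and the report layout are hypothetical; the skill's own script and the real contents of correctness.md may differ, and a real run would more likely compare tensors with `torch.allclose`.

```python
from typing import Sequence


def allclose(
    actual: Sequence[float],
    expected: Sequence[float],
    atol: float = 1e-6,
    rtol: float = 1e-5,
) -> bool:
    """Element-wise check: |a - e| <= atol + rtol * |e| for every pair."""
    if len(actual) != len(expected):
        return False
    # A NaN in either input makes the comparison False, so it counts as a failure.
    return all(abs(a - e) <= atol + rtol * abs(e) for a, e in zip(actual, expected))


def correctness_report(
    name: str,
    actual: Sequence[float],
    expected: Sequence[float],
    atol: float = 1e-6,
    rtol: float = 1e-5,
) -> str:
    """Build a small text report (hypothetical layout, not the skill's format)."""
    max_abs_err = max((abs(a - e) for a, e in zip(actual, expected)), default=0.0)
    status = "PASS" if allclose(actual, expected, atol, rtol) else "FAIL"
    return "\n".join(
        [
            f"kernel: {name}",
            f"status: {status}",
            f"atol: {atol}, rtol: {rtol}",
            f"max abs error: {max_abs_err:.3e}",
        ]
    )


if __name__ == "__main__":
    reference = [1.0, 2.0, 3.0]
    kernel_out = [1.0, 2.000001, 3.0]
    print(correctness_report("vector_add", kernel_out, reference))
```

Profiling only after this check passes keeps the NCU numbers meaningful: a kernel that computes the wrong result can look fast for the wrong reasons.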

Quick Start

Use the kernel-profile skill to validate your kernel implementation against a Python reference and generate NCU profiling reports, then read ncu_summary.md to decide the bottleneck category.

Dependency Matrix

Required Modules

numpy, torch, triton, nsight-python, nvidia-cutlass-dsl
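Before running the skill, it can help to confirm that these modules and the `ncu` executable are visible. The following is a minimal sketch using only the standard library; it does not reproduce the skill's own readiness script. The import names `nsight` (for nsight-python) and `cutlass` (for nvidia-cutlass-dsl) are assumptions about how those packages are imported.

```python
import importlib.util
import shutil

# Assumed mapping from package name to importable module name.
IMPORT_NAMES = {
    "numpy": "numpy",
    "torch": "torch",
    "triton": "triton",
    "nsight-python": "nsight",        # assumption
    "nvidia-cutlass-dsl": "cutlass",  # assumption
}


def module_available(import_name: str) -> bool:
    """Return True if the module can be located, without importing it."""
    try:
        return importlib.util.find_spec(import_name) is not None
    except (ImportError, ValueError):
        return False


def check_environment() -> dict:
    """Report availability of each required module and of the ncu CLI."""
    status = {pkg: module_available(mod) for pkg, mod in IMPORT_NAMES.items()}
    status["ncu"] = shutil.which("ncu") is not None
    return status


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name:20s} {'ok' if ok else 'MISSING'}")
```

This only checks that the packages can be found; it says nothing about whether a GPU is present or whether the installed versions are compatible with each other.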

Components

scripts, references

💻 Claude Code Installation

Recommended: let Claude install it automatically. Copy the text below and paste it into Claude Code.

Please help me install this Skill:
Name: kernel-profile
Download link: https://github.com/fmh66/kernel-opt-agent/archive/main.zip#kernel-profile

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
