cuda-roofline-strategy

Community

Pick the next CUDA optimization with confidence.

Author: Romaosir
Version: 1.0.0
Installs: 0

System Documentation

What problem does it solve?

This Skill keeps you from wasting GPU-kernel optimization experiments on a technique class chosen without a principled diagnosis of what currently limits performance.

Core Features & Use Cases

  • Classifies roofline bottlenecks (compute-bound, bandwidth-bound, occupancy-limited, latency-bound, balanced) from NVIDIA Nsight Compute (NCU) metrics such as SM throughput, DRAM/SOL throughput, achieved occupancy, and the top warp stall reason.
  • Selects a technique tier using iteration phase (early, mid, late, plateau) so recommendations scale with “how desperate” the run is and whether heavy rewrites are warranted.
  • Supports per-workload refinement: re-profiles slow outlier workloads and adapts dispatch when their regimes differ, so that averages do not mislead.
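
The three features above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Skill's actual logic: the `KernelProfile` fields, every threshold, the tier names, and the outlier factor are hypothetical placeholders, and the top warp stall reason is left out of the decision for brevity.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class KernelProfile:
    """Hypothetical container for values read by hand from an NCU report."""
    sm_throughput_pct: float       # SM throughput, percent of peak
    dram_throughput_pct: float     # DRAM throughput, percent of peak
    achieved_occupancy_pct: float  # achieved occupancy, percent


# Assumed thresholds for illustration only; the Skill's real cutoffs may differ.
NEAR_PEAK = 60.0
LOW_OCCUPANCY = 40.0
BALANCED_GAP = 10.0


def classify_bottleneck(p: KernelProfile) -> str:
    """Map a profile to one of the five bottleneck classes."""
    compute_high = p.sm_throughput_pct >= NEAR_PEAK
    memory_high = p.dram_throughput_pct >= NEAR_PEAK
    gap = abs(p.sm_throughput_pct - p.dram_throughput_pct)
    if compute_high and memory_high and gap <= BALANCED_GAP:
        return "balanced"
    if compute_high and p.sm_throughput_pct > p.dram_throughput_pct:
        return "compute-bound"
    if memory_high:
        return "bandwidth-bound"
    # Neither pipeline is near peak: blame occupancy first, then latency.
    if p.achieved_occupancy_pct < LOW_OCCUPANCY:
        return "occupancy-limited"
    return "latency-bound"


# Hypothetical tier names; only the four phase names come from the text above.
TIER_BY_PHASE = {
    "early": "low-risk tuning",
    "mid": "targeted restructuring",
    "late": "heavy rewrite",
    "plateau": "heavy rewrite or algorithm change",
}


def select_tier(phase: str) -> str:
    """Pick a technique tier that escalates with the iteration phase."""
    return TIER_BY_PHASE[phase]


def slow_outliers(times_ms: dict[str, float], factor: float = 2.0) -> list[str]:
    """Return workloads slower than `factor` times the median runtime."""
    cutoff = factor * median(times_ms.values())
    return sorted(name for name, t in times_ms.items() if t > cutoff)
```

A caller would classify the aggregate profile first, then re-profile each workload returned by `slow_outliers` and classify it separately, since an outlier may sit in a different regime than the average suggests.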

Quick Start

Use this Skill when you have a fresh NCU profile and need to decide which category of CUDA kernel optimization to try next, based on the kernel's current roofline position and your iteration phase.

Dependency Matrix

Required Modules

None required

Components

references

💻 Claude Code Installation

Recommended: Let Claude install automatically. Copy and paste the text below into Claude Code.

Please help me install this Skill:
Name: cuda-roofline-strategy
Download link: https://github.com/Romaosir/IF_Romao_kernel_optimize/archive/main.zip#cuda-roofline-strategy

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.