Choose CUDA Launch Configuration
Official
Tune CUDA launch settings for maximum occupancy.
Category: Software Engineering
Tags: #cuda #occupancy #launch-configuration #block-size #gpu-kernels #grid-size #register-pressure
Author: tensormux
Version: 1.0.0
Installs: 0
System Documentation
What problem does it solve?
This Skill guides AI agents through selecting CUDA thread block and grid dimensions that maximize occupancy within register and shared memory limits, while accounting for tail effects and the co-residency limit that applies to cooperatively launched grids.
Core Features & Use Cases
- Occupancy analysis: estimate theoretical active warps per SM for candidate block sizes and compare them with the achieved occupancy reported by a profiler.
- Resource-aware tuning: account for per-thread register usage and both static and dynamic shared memory when planning launches.
- Use Case: tune a kernel with heavy shared memory and register usage by choosing block sizes that maximize occupancy and minimize tail effects, while keeping the grid within cooperative launch limits.
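The occupancy analysis described above can be approximated on the host before any profiling. The following is a minimal sketch, not the Skill's own implementation: `SMLimits` and `theoretical_occupancy` are hypothetical names, the default per-SM limits are illustrative assumptions rather than the values for a specific GPU, and the model ignores register and shared memory allocation granularity. In a real CUDA program, the runtime call `cudaOccupancyMaxActiveBlocksPerMultiprocessor` reports the resident block count for a compiled kernel and should be preferred.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SMLimits:
    """Per-SM resource limits (illustrative assumptions, not a specific GPU)."""
    max_threads: int = 2048        # resident threads per SM
    max_blocks: int = 32           # resident blocks per SM
    registers: int = 65536         # 32-bit registers per SM
    shared_mem_bytes: int = 98304  # shared memory per SM (96 KiB assumed)
    warp_size: int = 32


def theoretical_occupancy(block_size, regs_per_thread, smem_per_block,
                          sm=SMLimits()):
    """Return (resident_blocks_per_sm, occupancy) under a simplified model.

    Simplification: allocation granularity for registers and shared
    memory is ignored, so real hardware may fit fewer blocks.
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")

    # Blocks are scheduled in whole warps, so round the block up to warps.
    warps_per_block = -(-block_size // sm.warp_size)
    max_warps = sm.max_threads // sm.warp_size

    # Each resource imposes its own cap on resident blocks per SM.
    by_threads = max_warps // warps_per_block
    regs_per_block = regs_per_thread * warps_per_block * sm.warp_size
    by_regs = sm.registers // regs_per_block if regs_per_block else sm.max_blocks
    by_smem = (sm.shared_mem_bytes // smem_per_block
               if smem_per_block else sm.max_blocks)

    # The tightest cap decides how many blocks are resident.
    blocks = min(sm.max_blocks, by_threads, by_regs, by_smem)
    return blocks, blocks * warps_per_block / max_warps


if __name__ == "__main__":
    # Compare candidate block sizes for a kernel assumed to use
    # 64 registers per thread and 16 KiB of shared memory per block.
    for bs in (64, 128, 256, 512, 1024):
        blocks, occ = theoretical_occupancy(bs, 64, 16384)
        print(f"block={bs:4d} blocks/SM={blocks} occupancy={occ:.2f}")
```

Taking the minimum over the per-resource caps makes the limiting factor explicit: if `by_regs` is the smallest value, reducing register pressure is the change most likely to raise occupancy.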
Quick Start
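Start by measuring the kernel's per-thread register count and shared memory usage (for example, with nvcc --resource-usage or a profiler such as Nsight Compute), then select a block size and grid size that maximize occupancy within the hardware limits.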
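Once a block size is chosen, the grid size determines how many "waves" of blocks the GPU executes and how much of the last wave is wasted. The sketch below illustrates that arithmetic under stated assumptions: `grid_and_tail` and `max_cooperative_grid` are hypothetical helper names, and `blocks_per_sm` is assumed to come from an occupancy query such as `cudaOccupancyMaxActiveBlocksPerMultiprocessor`. The SM count of 80 in the usage example is an arbitrary illustration.

```python
def grid_and_tail(n_elements, block_size, blocks_per_sm, num_sms):
    """Return (grid_size, waves, slot_utilization) for a 1-D launch.

    Assumes one thread per element. slot_utilization is the fraction of
    block slots, summed over all waves, that actually hold a block; a
    value well below 1.0 indicates a noticeable tail effect.
    """
    grid = -(-n_elements // block_size)        # ceil division
    blocks_per_wave = blocks_per_sm * num_sms  # blocks resident at once
    waves = -(-grid // blocks_per_wave)        # last wave may be partial
    utilization = grid / (waves * blocks_per_wave)
    return grid, waves, utilization


def max_cooperative_grid(blocks_per_sm, num_sms):
    """Largest grid that can be fully co-resident on the device.

    A cooperative launch requires every block to be resident at the
    same time, so the grid must not exceed this value.
    """
    return blocks_per_sm * num_sms


if __name__ == "__main__":
    # Assumed: 1,000,000 elements, 256 threads per block,
    # 8 resident blocks per SM, 80 SMs.
    grid, waves, util = grid_and_tail(1_000_000, 256, 8, 80)
    print(f"grid={grid} waves={waves} utilization={util:.3f}")
    print("cooperative grid limit:", max_cooperative_grid(8, 80))
```

One common way to avoid the tail entirely is to launch exactly `blocks_per_sm * num_sms` blocks and have each thread process several elements in a grid-stride loop; the same bound is the upper limit for a cooperative launch.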
Dependency Matrix
Required Modules
None required
Components
Standard package
Claude Code Installation
Recommended: Let Claude install automatically. Simply copy and paste the text below to Claude Code.
Please help me install this Skill:
Name: Choose CUDA Launch Configuration
Download link: https://github.com/tensormux/kernel-skills/archive/main.zip#choose-cuda-launch-configuration
Please download this .zip file, extract it, and install it in the .claude/skills/ directory.