kernel-tileir-optimization
Official
Tune Triton kernels with TileIR on Blackwell GPUs
System Documentation
What problem does it solve?
This skill optimizes existing Triton kernels for NVIDIA's TileIR backend on Blackwell GPUs. It automatically generates and applies TileIR-specific autotune configurations that cover TMA descriptor usage, 2CTA configurations, and occupancy tuning. It coordinates the end-to-end workflow from kernel classification through optimization, validation, and benchmarking, which reduces manual tuning effort and checks that the kernel still works on TileIR.
Core Features & Use Cases
- TileIR-aware autotuning for dot-related kernels (GEMM, attention) using TMA descriptors and 2CTA strategies.
- Norm-like and reduction kernel support with high-occupancy configurations and varied num_warps.
- Element-wise kernel optimization with extended block sizes and occupancy ranges, plus validation against the PTX baseline.
- End-to-end workflow: classify kernel, apply TileIR optimizations, validate, and benchmark TileIR vs PTX backends on Blackwell hardware.
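The classification and config-generation steps listed above can be sketched in plain Python. This is only an illustration under stated assumptions: the names `classify_kernel`, `candidate_configs`, `CandidateConfig`, and `SEARCH_SPACE` are hypothetical, the text-matching heuristic is a stand-in for whatever rules the skill actually applies, and the block sizes, warp counts, and occupancy values are placeholder numbers rather than the skill's real search space.

```python
import re
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class CandidateConfig:
    """One hypothetical autotune candidate (not the skill's real schema)."""
    block_sizes: tuple  # e.g. (("BLOCK_M", 128), ("BLOCK_N", 128))
    num_warps: int
    num_ctas: int       # 2 models a 2CTA configuration
    occupancy: int


def classify_kernel(source: str) -> str:
    """Classify a Triton kernel by scanning its source text (assumed heuristic)."""
    if re.search(r"\btl\.dot\s*\(", source):
        return "dot"          # GEMM, attention
    if re.search(r"\btl\.(sum|max|min)\s*\(", source):
        return "reduction"    # norm-like and reduction kernels
    return "elementwise"


# Illustrative search space per kernel class; every number is an assumption.
SEARCH_SPACE = {
    "dot": {
        "blocks": {"BLOCK_M": (128, 256), "BLOCK_N": (128, 256), "BLOCK_K": (64,)},
        "num_warps": (4,), "num_ctas": (1, 2), "occupancy": (1, 2),
    },
    "reduction": {
        "blocks": {"BLOCK_SIZE": (1024, 2048)},
        "num_warps": (4, 8, 16), "num_ctas": (1,), "occupancy": (2, 4, 8),
    },
    "elementwise": {
        "blocks": {"BLOCK_SIZE": (1024, 2048, 4096)},
        "num_warps": (4,), "num_ctas": (1,), "occupancy": (1, 2, 4, 8),
    },
}


def candidate_configs(kind: str) -> list:
    """Enumerate the cross product of the search space for one kernel class."""
    space = SEARCH_SPACE[kind]
    names = list(space["blocks"])
    configs = []
    for sizes in product(*(space["blocks"][n] for n in names)):
        for warps, ctas, occ in product(
            space["num_warps"], space["num_ctas"], space["occupancy"]
        ):
            configs.append(CandidateConfig(tuple(zip(names, sizes)), warps, ctas, occ))
    return configs
```

In a real kernel, each candidate would be translated into an autotune entry on the kernel itself. The sketch stops short of that step because the source does not describe how TileIR-specific options are passed.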
Quick Start
Run the classification and TileIR optimization workflow on a sample Triton kernel to generate optimized configs.
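Once configs are generated, the workflow validates the result against the PTX baseline and compares timings. A minimal sketch of that last step follows, assuming each backend is wrapped in a zero-argument callable that returns a flat sequence of floats. The helper names (`outputs_match`, `median_runtime`, `compare_backends`) and the tolerances are hypothetical. Selecting the TileIR or PTX backend is left to the caller, since the source does not describe how that is done.

```python
import math
import time
from typing import Callable, Sequence


def outputs_match(reference: Sequence[float], candidate: Sequence[float],
                  rel_tol: float = 1e-3, abs_tol: float = 1e-5) -> bool:
    """Element-wise comparison with tolerances (tolerance values are assumptions)."""
    return len(reference) == len(candidate) and all(
        math.isclose(r, c, rel_tol=rel_tol, abs_tol=abs_tol)
        for r, c in zip(reference, candidate)
    )


def median_runtime(fn: Callable[[], object], repeats: int = 5) -> float:
    """Median wall-clock time of fn in seconds.

    For real GPU kernels the callable must synchronize the device before
    returning; otherwise only launch overhead is measured.
    """
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    samples.sort()
    return samples[len(samples) // 2]


def compare_backends(run_ptx: Callable[[], Sequence[float]],
                     run_tileir: Callable[[], Sequence[float]],
                     repeats: int = 5) -> dict:
    """Validate the TileIR output against the PTX baseline, then time both."""
    if not outputs_match(run_ptx(), run_tileir()):
        raise ValueError("TileIR output diverges from the PTX baseline")
    ptx_s = median_runtime(run_ptx, repeats)
    tileir_s = median_runtime(run_tileir, repeats)
    speedup = ptx_s / tileir_s if tileir_s > 0 else float("inf")
    return {"ptx_s": ptx_s, "tileir_s": tileir_s, "speedup": speedup}
```

Validation runs before timing so that a faster but incorrect configuration is rejected rather than reported as a speedup.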
Dependency Matrix
Required Modules
None required
Components
Standard package
Claude Code Installation
Recommended: let Claude install it automatically. Copy the text below and paste it into Claude Code.
Please help me install this Skill:
Name: kernel-tileir-optimization
Download link: https://github.com/NVIDIA/skills/archive/main.zip#kernel-tileir-optimization
Please download this .zip file, extract it, and install it in the .claude/skills/ directory.