kernel-tileir-optimization

Official

Tune Triton kernels with TileIR on Blackwell GPUs

Author: NVIDIA
Version: 1.0.0

System Documentation

What problem does it solve?

Optimizes existing Triton kernels for NVIDIA's TileIR backend on Blackwell GPUs by automatically generating and applying TileIR-specific autotune configurations. These configurations target higher performance through TMA (Tensor Memory Accelerator) descriptors, 2CTA configurations, and occupancy tuning. The skill coordinates the end-to-end workflow, from kernel classification through optimization and benchmarking, which reduces manual tuning effort and checks that kernels remain TileIR-compatible.
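The workflow starts by classifying the kernel. The skill's actual classification rules are not described here, so the following is only a minimal sketch under an assumed heuristic: scan the kernel source for tl.dot calls (dot-based kernels), then for Triton reduction calls such as tl.sum or tl.max (norm-like and reduction kernels), and treat everything else as element-wise. The function name classify_kernel and the category strings are hypothetical.

```python
import re

# Hypothetical category labels mirroring the three kernel families the skill targets.
DOT = "dot"
NORM_OR_REDUCTION = "norm_or_reduction"
ELEMENTWISE = "elementwise"

# Assumed heuristic patterns, not the skill's real rules.
_DOT_RE = re.compile(r"\btl\.dot\s*\(")
# Reductions only: tl.maximum / tl.minimum are element-wise and do not match,
# because "max" / "min" must be followed directly by "(" (optionally after spaces).
_REDUCE_RE = re.compile(r"\btl\.(sum|max|min|reduce|argmax|argmin)\s*\(")


def classify_kernel(source: str) -> str:
    """Classify Triton kernel source text by the tl.* operations it calls."""
    if _DOT_RE.search(source):
        return DOT
    if _REDUCE_RE.search(source):
        return NORM_OR_REDUCTION
    return ELEMENTWISE


if __name__ == "__main__":
    softmax_src = "m = tl.max(x, axis=0)\ny = tl.exp(x - m)"
    print(classify_kernel(softmax_src))
```

Checking for tl.dot first means an attention kernel, which also contains reductions, is still routed to the dot-kernel tuning path.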

Core Features & Use Cases

  • TileIR-aware autotuning for dot-based kernels (GEMM, attention) using TMA descriptors and 2CTA strategies.
  • Support for normalization-style and reduction kernels, with high-occupancy configurations and a range of num_warps values.
  • Element-wise kernel optimization with extended block sizes and occupancy ranges, plus result validation against the PTX backend as a baseline.
  • End-to-end workflow: classify the kernel, apply TileIR optimizations, validate the results, and benchmark the TileIR backend against PTX on Blackwell hardware.
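The per-category strategies above can be pictured as separate autotune search spaces. This is a hypothetical sketch: the function candidate_configs, the key names (BLOCK_M, BLOCK_SIZE, num_ctas, use_tma, occupancy, and so on), and the value ranges are illustrative placeholders, not the configurations the skill generates or confirmed TileIR option names.

```python
from itertools import product


def candidate_configs(kind: str) -> list[dict]:
    """Build a hypothetical TileIR autotune search space for one kernel class.

    All keys and value ranges are illustrative placeholders.
    """
    configs = []
    if kind == "dot":
        # GEMM / attention: tile shapes x 1CTA vs 2CTA x TMA descriptors on/off.
        for bm, bn, bk, ctas, tma in product(
            (128, 256), (128, 256), (64, 128), (1, 2), (True, False)
        ):
            configs.append(
                {"BLOCK_M": bm, "BLOCK_N": bn, "BLOCK_K": bk,
                 "num_ctas": ctas, "use_tma": tma}
            )
    elif kind == "norm_or_reduction":
        # Norm-like / reduction: sweep num_warps, favor higher occupancy values.
        for warps, occ in product((2, 4, 8, 16), (2, 4)):
            configs.append({"num_warps": warps, "occupancy": occ})
    elif kind == "elementwise":
        # Element-wise: extended block sizes and a wider occupancy range.
        for bs, occ in product((256, 512, 1024, 2048, 4096, 8192), (1, 2, 4, 8)):
            configs.append({"BLOCK_SIZE": bs, "occupancy": occ})
    else:
        raise ValueError(f"unknown kernel kind: {kind!r}")
    return configs


if __name__ == "__main__":
    for kind in ("dot", "norm_or_reduction", "elementwise"):
        print(kind, len(candidate_configs(kind)))
```

In a real kernel, entries like these would still need to be translated into whatever configuration objects the TileIR-enabled Triton build accepts; that mapping is not shown here.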

Quick Start

Run the classification and TileIR optimization workflow on a sample Triton kernel to generate optimized configs.
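Validation and benchmarking compare each TileIR configuration against the PTX backend. Assuming outputs and timings have already been collected from both backends as plain Python lists, a minimal sketch of the comparison could look like this. The names matches_baseline and summarize and the default tolerances are hypothetical; the tolerance test follows the same |a - b| <= atol + rtol * |b| rule that torch.allclose uses, with the PTX output as the reference.

```python
from statistics import median


def matches_baseline(tileir_out, ptx_out, rtol=1e-3, atol=1e-5):
    """Return True if every TileIR value is within tolerance of the PTX value.

    Hypothetical helper; the PTX output is treated as the reference (b).
    NaNs never compare as close, so a NaN in either output fails validation.
    """
    if len(tileir_out) != len(ptx_out):
        return False
    return all(abs(a - b) <= atol + rtol * abs(b) for a, b in zip(tileir_out, ptx_out))


def summarize(ptx_times_ms, tileir_times_ms):
    """Summarize repeated timings; speedup > 1.0 means TileIR was faster than PTX."""
    ptx_ms = median(ptx_times_ms)      # median is less sensitive to outlier runs
    tileir_ms = median(tileir_times_ms)
    return {"ptx_ms": ptx_ms, "tileir_ms": tileir_ms, "speedup": ptx_ms / tileir_ms}


if __name__ == "__main__":
    ok = matches_baseline([1.0, 2.0], [1.0, 2.0005])
    print("valid:", ok)
    print(summarize([2.0, 2.2, 2.1], [1.0, 1.1, 1.05]))
```

Only configurations that pass validation should be considered when picking the fastest one, so a fast but numerically wrong configuration is never selected.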

Dependency Matrix

Required Modules

None required

Components

Standard package

💻 Claude Code Installation

Recommended: let Claude install it automatically. Copy the text below and paste it into Claude Code.

Please help me install this Skill:
Name: kernel-tileir-optimization
Download link: https://github.com/NVIDIA/skills/archive/main.zip#kernel-tileir-optimization

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.