kernel-triton-writing

Official

Guided writing of high-performance Triton kernels

Author: NVIDIA
Version: 1.0.0

System Documentation

What problem does it solve?

Writing Triton kernels is complex and benefits from a structured workflow for designing, implementing, verifying, and benchmarking them. This skill provides one: every kernel conforms to a fixed contract (kernel_fn, reference_fn, get_inputs) and draws on a library of patterns (elementwise, reductions, tiled GEMM, and Flash Attention). The workflow defines a repeatable process, from request routing through design, implementation, and verification to optional benchmarking, so that each kernel is checked for correctness and measured for performance.
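To illustrate the simplest pattern in that library, the sketch below models in plain Python how a 1-D elementwise Triton kernel divides its work. It is an emulation, not Triton code, and the names `cdiv`, `emulate_elementwise_launch`, and the `block_size` default are assumptions for illustration. Each outer-loop iteration plays the role of one program instance (`tl.program_id(0)`), computes its block of offsets (as `tl.arange` would), and applies a mask so the last, partially filled block never reads or writes out of bounds.

```python
from typing import Callable, List


def cdiv(a: int, b: int) -> int:
    # Ceiling division; plays the role of triton.cdiv when sizing the launch grid.
    return (a + b - 1) // b


def emulate_elementwise_launch(
    op: Callable[[float, float], float],
    x: List[float],
    y: List[float],
    block_size: int = 4,
) -> List[float]:
    """Hypothetical pure-Python model of a 1-D Triton elementwise launch.

    Program `pid` owns offsets pid*block_size + [0, block_size); the mask
    (offset < n) guards the ragged final block, mirroring a masked
    tl.load / tl.store in a real kernel.
    """
    n = len(x)
    out = [0.0] * n
    grid = cdiv(n, block_size)
    for pid in range(grid):  # on a GPU these programs run in parallel
        offsets = [pid * block_size + i for i in range(block_size)]
        mask = [off < n for off in offsets]
        for off, valid in zip(offsets, mask):
            if valid:  # masked-off lanes neither load nor store
                out[off] = op(x[off], y[off])
    return out


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [10.0, 20.0, 30.0, 40.0, 50.0]
# n = 5 with block_size = 4 -> 2 programs; the second has 3 masked-off lanes.
print(emulate_elementwise_launch(lambda a, b: a + b, x, y, block_size=4))
# [11.0, 22.0, 33.0, 44.0, 55.0]
```

In real Triton code the inner loop disappears: `tl.load(ptr + offsets, mask=mask)` and `tl.store(...)` operate on the whole block at once, and the grid size is typically computed with `triton.cdiv(n, BLOCK_SIZE)`.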

Core Features & Use Cases

  • Phase-driven kernel development: route ambiguous requests, analyze operators, design and implement kernels, verify correctness, and benchmark performance.
  • Pattern coverage: supports elementwise operations, reductions (softmax, LayerNorm, RMSNorm), tiled matmul/GEMM, Flash Attention, and fusion patterns, as documented in the skill's references.
  • Fixed-contract exports: a standardized kernel_fn wrapper plus reference_fn and get_inputs, so every kernel plugs into the same verification and benchmarking harness.

Quick Start

Describe the kernel you need, and the skill generates a ready-to-run Triton kernel that follows the fixed-contract workflow.

Dependency Matrix

Required Modules

None required

Components

scripts, references

💻 Claude Code Installation

Recommended: let Claude install the skill automatically. Copy the text below and paste it into Claude Code.

Please help me install this Skill:
Name: kernel-triton-writing
Download link: https://github.com/NVIDIA/skills/archive/main.zip#kernel-triton-writing

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
