kernel-triton-writing
Official
Guided Triton kernel writing for high performance
System Documentation
What problem does it solve?
Writing Triton kernels is complex. This skill provides a structured workflow for designing, implementing, verifying, and benchmarking kernels against a fixed contract (kernel_fn, reference_fn, get_inputs), drawing on a library of patterns (elementwise, reductions, tiled GEMM, and Flash Attention). The workflow is a repeatable process that runs from request routing through design, implementation, and verification to optional benchmarking, so that each kernel is checked for correctness and, when needed, measured for performance.
Core Features & Use Cases
- Phase-driven kernel development: route ambiguous requests, analyze operators, design and implement kernels, verify correctness, and benchmark performance.
- Pattern coverage: supports elementwise kernels, reductions (softmax, LayerNorm, RMSNorm), tiled matmul/GEMM, Flash Attention, and fusion patterns, as documented in the skill's references.
- Fixed-contract exports: standardized kernel_fn wrapper, reference_fn, and get_inputs for compatibility with the verification and benchmarking harness.
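The fixed contract above can be pictured with a small stand-in. The sketch below is not Triton code and is not taken from the skill itself: it emulates the block-wise, masked structure of an elementwise kernel in plain Python so it runs without a GPU. The function signatures, the `BLOCK_SIZE` value, the `_add_block` helper, and the `verify` helper are all assumptions made for illustration; the skill's actual harness may look different.

```python
import math
import random

# Hypothetical tile width. Real Triton kernels usually pick a power of two such as 1024.
BLOCK_SIZE = 4


def _add_block(x, y, out, pid, n):
    """Emulate one kernel 'program': handle a single BLOCK_SIZE-wide tile."""
    start = pid * BLOCK_SIZE
    for offset in range(start, start + BLOCK_SIZE):
        if offset < n:  # mask: skip out-of-range lanes in the ragged final tile
            out[offset] = x[offset] + y[offset]


def kernel_fn(x, y):
    """Assumed contract entry point: launch one 'program' per tile."""
    n = len(x)
    out = [0.0] * n
    grid = math.ceil(n / BLOCK_SIZE)  # ceil-division, as in a 1D launch grid
    for pid in range(grid):
        _add_block(x, y, out, pid, n)
    return out


def reference_fn(x, y):
    """Assumed contract reference: the simplest obviously-correct version."""
    return [a + b for a, b in zip(x, y)]


def get_inputs(n=10, seed=0):
    """Assumed contract input factory: deterministic inputs for repeatable checks."""
    rng = random.Random(seed)
    x = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    y = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    return x, y


def verify(kernel, reference, inputs, rel_tol=1e-6, abs_tol=1e-9):
    """Hypothetical verification step: compare kernel output to the reference."""
    got = kernel(*inputs)
    want = reference(*inputs)
    return len(got) == len(want) and all(
        math.isclose(g, w, rel_tol=rel_tol, abs_tol=abs_tol)
        for g, w in zip(got, want)
    )
```

Calling `verify(kernel_fn, reference_fn, get_inputs())` exercises all three exports together. The default input length of 10 is deliberately not a multiple of `BLOCK_SIZE`, so the masked final tile is covered by the check.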
Quick Start
Describe your kernel requirement and the system will generate a ready-to-run Triton kernel following the fixed-contract workflow.
Dependency Matrix
Required Modules
None required
Components
💻 Claude Code Installation
Recommended: Let Claude install it automatically. Copy the text below and paste it into Claude Code.
Please help me install this Skill:
Name: kernel-triton-writing
Download link: https://github.com/NVIDIA/skills/archive/main.zip#kernel-triton-writing
Please download this .zip file, extract it, and install it in the .claude/skills/ directory.