cute-dsl-kernel
Official
Build high-performance CuTe DSL GPU kernels
System Documentation
What problem does it solve?
Developers need a reliable, reference-backed workflow to write, port, debug, and integrate CuTe DSL GPU kernels that meet correctness and performance expectations across multiple NVIDIA GPU generations. This Skill centralizes the CuTe DSL API snapshots, architecture-specific optimization guidance, profiling advice, and validation requirements so engineers do not guess APIs or optimization strategies from memory.
Core Features & Use Cases
- API Reference Bundles: Includes curated CuTe DSL API snapshots and runtime helpers for rapid lookup while coding.
- Architecture Guidance: Per-generation optimization notes (sm89, sm90, sm100, sm103, sm120) and TMEM/TMA/WGMMA guidance for interpreting nsys and ncu results.
- Implementation Workflow: Step-by-step workflow for designing, implementing, integrating, testing, and validating kernels, including unit-test and PyTorch-baseline comparisons.
- Profiling & Debugging: Recommended profiling order and actionable checks (nsys → ncu) with shape-reduction and synchronization diagnostics for pipelined or shared-memory kernels.
- Rewrite & Integration Rules: Guidance for rewrites that preserve behavior first, plus artifact layout, launcher expectations, and pairing with operator migration tasks.
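The shape-reduction diagnostic mentioned under Profiling & Debugging can be sketched roughly as follows. This is a minimal pure-Python illustration, not code from the Skill: `reduce_failing_shape` and the `passes` callback are hypothetical names, and the toy failure condition stands in for a real kernel-versus-baseline comparison.

```python
from typing import Callable, Tuple

Shape = Tuple[int, int, int]  # (M, N, K) of a GEMM-like problem


def reduce_failing_shape(
    shape: Shape,
    passes: Callable[[Shape], bool],
    min_dim: int = 1,
) -> Shape:
    """Greedily halve each dimension while the check keeps failing.

    `passes` is a hypothetical callback that runs the kernel on the
    given shape and returns True when the output matches the baseline.
    """
    if passes(shape):
        raise ValueError("starting shape does not fail; nothing to reduce")

    current = shape
    shrunk = True
    while shrunk:
        shrunk = False
        for axis in range(3):
            dims = list(current)
            if dims[axis] <= min_dim:
                continue
            dims[axis] = max(min_dim, dims[axis] // 2)
            candidate = (dims[0], dims[1], dims[2])
            # Keep the smaller shape only if the failure still reproduces.
            if not passes(candidate):
                current = candidate
                shrunk = True
    return current


def toy_passes(shape: Shape) -> bool:
    # Assumption for illustration: the bug appears only when N exceeds 16,
    # as a tail-handling error past one tile width might.
    _, n, _ = shape
    return n <= 16


smallest = reduce_failing_shape((256, 256, 256), toy_passes)
print(smallest)
```

A small failing shape is usually cheaper to inspect under a profiler or debugger than the full-size problem, which is the point of reducing it before looking at synchronization or shared-memory behavior.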
Quick Start
Implement a tiled GEMM CuTe DSL kernel targeting SM100. Consult the bundled cute.md and sm100-optimization-guide.md for tiling and TMA usage, then compile the kernel and validate its numerics and performance against a PyTorch baseline.
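To make the tiling and validation steps concrete, here is a minimal sketch in plain Python. It deliberately uses no CuTe DSL or PyTorch calls: `tiled_gemm`, `naive_gemm`, and `max_abs_error` are hypothetical helpers, the naive loop stands in for the PyTorch baseline, and the tile sizes are arbitrary. It shows only the loop structure (output tiles, then a K-loop accumulating partial products) and a tolerance-based numeric check.

```python
import random
from typing import List

Matrix = List[List[float]]


def naive_gemm(a: Matrix, b: Matrix) -> Matrix:
    """Reference result; stands in for the PyTorch baseline."""
    m, k, n = len(a), len(b), len(b[0])
    return [
        [sum(a[i][p] * b[p][j] for p in range(k)) for j in range(n)]
        for i in range(m)
    ]


def tiled_gemm(a: Matrix, b: Matrix,
               tile_m: int = 4, tile_n: int = 4, tile_k: int = 4) -> Matrix:
    """Compute C = A @ B one (tile_m x tile_n) output tile at a time."""
    m, k, n = len(a), len(a[0]), len(b[0])
    c = [[0.0] * n for _ in range(m)]
    for m0 in range(0, m, tile_m):
        for n0 in range(0, n, tile_n):
            # K-loop: each step adds one slab of partial products.
            for k0 in range(0, k, tile_k):
                # min(...) handles edge tiles when a dimension is not
                # a multiple of the tile size.
                for i in range(m0, min(m0 + tile_m, m)):
                    for j in range(n0, min(n0 + tile_n, n)):
                        acc = c[i][j]
                        for p in range(k0, min(k0 + tile_k, k)):
                            acc += a[i][p] * b[p][j]
                        c[i][j] = acc
    return c


def max_abs_error(x: Matrix, y: Matrix) -> float:
    return max(abs(u - v) for rx, ry in zip(x, y) for u, v in zip(rx, ry))


random.seed(0)
# Shapes chosen so that no dimension is a multiple of the tile size.
A = [[random.uniform(-1, 1) for _ in range(5)] for _ in range(7)]
B = [[random.uniform(-1, 1) for _ in range(9)] for _ in range(5)]

err = max_abs_error(tiled_gemm(A, B), naive_gemm(A, B))
assert err < 1e-9, f"numeric mismatch: {err}"
```

In a real workflow the tolerance would depend on the data type and accumulation order of the GPU kernel, so the `1e-9` bound here applies only to this double-precision toy.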
Dependency Matrix
Required Modules
None required
Components
Standard package
💻 Claude Code Installation
Recommended: Let Claude install it automatically. Copy the text below and paste it into Claude Code.
Please help me install this Skill:
Name: cute-dsl-kernel
Download link: https://github.com/vipshop/cache-dit/archive/main.zip#cute-dsl-kernel
Please download this .zip file, extract it, and install it in the .claude/skills/ directory.