cute-dsl-kernel

Official

Build high-performance CuTe DSL GPU kernels

Author: vipshop
Version: 1.0.0
Installs: 0

System Documentation

What problem does it solve?

Developers need a reliable, reference-backed workflow to write, port, debug, and integrate CuTe DSL GPU kernels that meet correctness and performance expectations across multiple NVIDIA GPU generations. This Skill centralizes CuTe DSL API snapshots, architecture-specific optimization guidance, profiling advice, and validation requirements, so engineers do not have to guess APIs or optimization strategies from memory.

Core Features & Use Cases

  • API Reference Bundles: Includes curated CuTe DSL API snapshots and runtime helpers for rapid lookup while coding.
  • Architecture Guidance: Per-generation optimization notes (sm89, sm90, sm100, sm103, sm120), plus TMEM/TMA/WGMMA guidance for interpreting Nsight Systems (nsys) and Nsight Compute (ncu) results.
  • Implementation Workflow: Step-by-step workflow for designing, implementing, integrating, testing, and validating kernels, including unit-test and PyTorch-baseline comparisons.
  • Profiling & Debugging: Recommended profiling order (nsys first, then ncu) with actionable checks, including shape-reduction and synchronization diagnostics for pipelined or shared-memory kernels.
  • Rewrite & Integration Rules: Guidance for rewrites that preserve behavior first, artifact layout, launcher expectations, and pairing with operator migration tasks.

Quick Start

Implement a tiled GEMM CuTe DSL kernel targeting SM100. Consult the bundled cute.md and sm100-optimization-guide.md for tiling and TMA usage, then compile the kernel and validate its numerics and performance against a PyTorch baseline.
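
The validation step can be rehearsed on the CPU before any GPU code exists. The sketch below is not CuTe DSL and does not use the Skill's bundled references. It is a pure-Python illustration, under the assumption that "validate numerics against a baseline" means comparing a tiled implementation with a straightforward reference under a tolerance. All function names and tile sizes here are hypothetical.

```python
import random


def matmul_reference(a, b):
    """Naive baseline: C[i][j] = sum over p of A[i][p] * B[p][j]."""
    m, k, n = len(a), len(b), len(b[0])
    return [[sum(a[i][p] * b[p][j] for p in range(k)) for j in range(n)]
            for i in range(m)]


def matmul_tiled(a, b, tile_m=4, tile_n=4, tile_k=4):
    """Tiled GEMM: walk (M, N, K) in blocks and accumulate partial products.

    The min() bounds handle shapes that are not multiples of the tile size,
    which is the edge case a real kernel must also get right.
    """
    m, k, n = len(a), len(b), len(b[0])
    c = [[0.0] * n for _ in range(m)]
    for i0 in range(0, m, tile_m):
        for j0 in range(0, n, tile_n):
            for k0 in range(0, k, tile_k):
                for i in range(i0, min(i0 + tile_m, m)):
                    for j in range(j0, min(j0 + tile_n, n)):
                        acc = c[i][j]
                        for p in range(k0, min(k0 + tile_k, k)):
                            acc += a[i][p] * b[p][j]
                        c[i][j] = acc
    return c


def max_abs_error(x, y):
    """Largest elementwise absolute difference between two matrices."""
    return max(abs(u - v) for rx, ry in zip(x, y) for u, v in zip(rx, ry))


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


# Deliberately awkward shape (10 x 7 times 7 x 9) to exercise partial tiles.
rng = random.Random(0)
a = random_matrix(10, 7, rng)
b = random_matrix(7, 9, rng)
err = max_abs_error(matmul_tiled(a, b), matmul_reference(a, b))
assert err < 1e-9, f"tiled result diverges from baseline: {err}"
```

For a real kernel, the same pattern would apply with the PyTorch result as the baseline and a tolerance chosen for the data type in use; small, non-multiple-of-tile shapes are a cheap way to surface boundary bugs early.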

Dependency Matrix

Required Modules

None required

Components

Standard package

💻 Claude Code Installation

Recommended: Let Claude install it automatically. Copy the text below and paste it into Claude Code.

Please help me install this Skill:
Name: cute-dsl-kernel
Download link: https://github.com/vipshop/cache-dit/archive/main.zip#cute-dsl-kernel

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
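
For a manual install, the extraction step can be sketched in Python. This is a minimal illustration, not an official installer: the function name is hypothetical, and the location of the skill folder inside the archive is not stated on this page, so it is left as a parameter (`skill_subdir`) that you must fill in after inspecting the downloaded zip.

```python
import io
import zipfile
from pathlib import Path, PurePosixPath


def install_skill_from_zip(zip_bytes, skill_subdir, skills_root):
    """Copy every file under `skill_subdir` in the archive to
    `skills_root/<last component of skill_subdir>` and return the paths written.

    `skill_subdir` is an assumption supplied by the caller; check the archive
    listing to find where the skill actually lives.
    """
    prefix = PurePosixPath(skill_subdir)
    target = Path(skills_root) / prefix.name
    written = []
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for info in archive.infolist():
            member = PurePosixPath(info.filename)
            # Skip directory entries and anything outside the skill folder.
            if info.is_dir() or prefix not in member.parents:
                continue
            relative = member.relative_to(prefix)
            # Refuse entries that try to climb out of the target directory.
            if ".." in relative.parts:
                continue
            out_path = target.joinpath(*relative.parts)
            out_path.parent.mkdir(parents=True, exist_ok=True)
            out_path.write_bytes(archive.read(info))
            written.append(out_path)
    return written
```

A call might look like `install_skill_from_zip(Path("main.zip").read_bytes(), "<path-inside-archive>/cute-dsl-kernel", ".claude/skills")`, where the placeholder path has to be replaced with the real folder found in the archive.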