kernel-cute-writing
Official
Author high-performance CuTe DSL GPU kernels
Author: NVIDIA
Version: 1.0.0
Installs: 0
System Documentation
What problem does it solve?
Write and implement GPU kernels using the NVIDIA CuTe DSL (the CUTLASS 4.x Python API). This skill is NOT for Triton, CUDA C++, or conceptual explanations: trigger it only when the user wants to write or implement a kernel, not when they ask questions about CuTe DSL concepts or layouts. CuTe DSL code uses the cute.jit and cute.kernel decorators and cutlass.cute imports. The skill covers element-wise kernels, GEMM patterns, reductions, and the memory hierarchy.
Core Features & Use Cases
- CuTe DSL kernel authoring: create element-wise and GEMM-style kernels with Python decorators.
- Workflow guidance: follow established patterns for element-wise operations, reductions, and memory layouts.
- Framework integration: build host wrappers and connect with Torch/JAX via DLPack for testing and deployment.
Quick Start
Install the CuTe DSL, implement a kernel with @cute.kernel, and run a host wrapper to verify.
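The kernel, host wrapper, and verification steps can be modeled on the CPU before any GPU code is written. The sketch below is a plain-Python stand-in and does not call the CuTe DSL at all. Every name in it (`add_kernel`, `add_host`, `THREADS_PER_BLOCK`) is hypothetical. In real CuTe DSL code the per-thread function would carry the cute.kernel decorator and the wrapper the cute.jit decorator, and the nested loops would be replaced by an actual GPU launch.

```python
# CPU-only model of the kernel / host-wrapper / verify workflow.
# Nothing here calls the CuTe DSL; all names are hypothetical.

THREADS_PER_BLOCK = 4  # tiny on purpose so the example is easy to trace


def add_kernel(block_idx, thread_idx, a, b, c, n_cols):
    """Work done by ONE simulated thread: add a single element."""
    linear = block_idx * THREADS_PER_BLOCK + thread_idx  # global thread id
    total = len(a) * n_cols
    if linear >= total:  # guard threads that fall past the last element
        return
    row, col = divmod(linear, n_cols)  # row-major coordinate of this thread
    c[row][col] = a[row][col] + b[row][col]


def add_host(a, b):
    """Host wrapper: size the grid, 'launch' every thread, return the result."""
    n_rows, n_cols = len(a), len(a[0])
    c = [[0] * n_cols for _ in range(n_rows)]
    total = n_rows * n_cols
    # Ceiling division so a partial final block is still launched.
    num_blocks = (total + THREADS_PER_BLOCK - 1) // THREADS_PER_BLOCK
    for block_idx in range(num_blocks):
        for thread_idx in range(THREADS_PER_BLOCK):
            add_kernel(block_idx, thread_idx, a, b, c, n_cols)
    return c


# Verify against a straightforward reference, as the host-side test would.
a = [[1, 2, 3], [4, 5, 6]]
b = [[10, 20, 30], [40, 50, 60]]
result = add_host(a, b)
reference = [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]
assert result == reference
```

The 2x3 input needs six threads but two blocks of four are launched, so the bounds guard in `add_kernel` is what keeps the two extra threads from writing out of range. The same guard is typically needed in a real kernel whenever the problem size is not a multiple of the block size.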
Dependency Matrix
Required Modules
None required
Components
scripts
💻 Claude Code Installation
Recommended: Let Claude install automatically. Simply copy and paste the text below to Claude Code.
Please help me install this Skill:
Name: kernel-cute-writing
Download link: https://github.com/NVIDIA/skills/archive/main.zip#kernel-cute-writing
Please download this .zip file, extract it, and install it in the .claude/skills/ directory.