kernel-cute-writing

Official

Author high-performance CuTe DSL GPU kernels

Author: NVIDIA
Version: 1.0.0
Installs: 0

System Documentation

What problem does it solve?

This skill writes and implements GPU kernels with the NVIDIA CuTe DSL (the CUTLASS 4.x Python API). It is not intended for Triton, CUDA C++, or conceptual explanations: it should trigger only when the user wants to write or implement a kernel, not when they ask questions about CuTe DSL concepts or layouts. CuTe DSL code uses the cute.jit and cute.kernel decorators and imports from cutlass.cute. The skill covers element-wise kernels, GEMM patterns, reductions, and the memory hierarchy.

Core Features & Use Cases

  • CuTe DSL kernel authoring: create element-wise and GEMM-style kernels with Python decorators.
  • Workflow guidance: follow established patterns for element-wise operations, reductions, and memory layouts.
  • Framework integration: build host wrappers and connect with Torch/JAX via DLPack for testing and deployment.

Quick Start

Install the CuTe DSL, implement a kernel with @cute.kernel, and launch it from a @cute.jit host wrapper to verify the result.
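The steps above can be sketched as a naive element-wise add, modeled on the introductory pattern in the CUTLASS CuTe DSL examples rather than on this skill's own scripts. It assumes the DSL is installed (published on PyPI as nvidia-cutlass-dsl at the time of writing) and that a CUDA GPU is available. The names `blocks_needed`, `add_kernel`, `add_host`, and `run_add` are hypothetical, and the 256-thread block size is an arbitrary choice.

```python
try:
    import cutlass.cute as cute
    from cutlass.cute.runtime import from_dlpack
except ImportError:  # lets the host-side helper load without the GPU stack
    cute = None


def blocks_needed(m, n, threads_per_block=256):
    """Blocks for one thread per element (hypothetical helper, exact division only)."""
    total = m * n
    if total % threads_per_block != 0:
        raise ValueError("m * n must be a multiple of threads_per_block")
    return total // threads_per_block


if cute is not None:

    @cute.kernel
    def add_kernel(gA: cute.Tensor, gB: cute.Tensor, gC: cute.Tensor):
        # Flatten the (block, thread) coordinates into one global index.
        tidx, _, _ = cute.arch.thread_idx()
        bidx, _, _ = cute.arch.block_idx()
        bdim, _, _ = cute.arch.block_dim()
        idx = bidx * bdim + tidx

        # Map the global index back to a (row, column) coordinate.
        m, n = gA.shape
        mi = idx // n
        ni = idx % n
        gC[mi, ni] = gA[mi, ni] + gB[mi, ni]

    @cute.jit
    def add_host(mA: cute.Tensor, mB: cute.Tensor, mC: cute.Tensor):
        # Host wrapper: one thread per element, no bounds check in the kernel.
        threads_per_block = 256
        m, n = mA.shape
        add_kernel(mA, mB, mC).launch(
            grid=((m * n) // threads_per_block, 1, 1),
            block=(threads_per_block, 1, 1),
        )


def run_add(a, b, c):
    """Launch on 2-D CUDA tensors of identical shape (for example torch tensors)."""
    blocks_needed(*a.shape)  # fail early on shapes the naive kernel cannot cover
    a_, b_, c_ = from_dlpack(a), from_dlpack(b), from_dlpack(c)
    compiled = cute.compile(add_host, a_, b_, c_)
    compiled(a_, b_, c_)
```

With PyTorch, calling `run_add(a, b, c)` on three CUDA tensors and then `torch.testing.assert_close(c, a + b)` would serve as the verification step. The kernel is deliberately unoptimized: it performs no vectorized loads or tiling, which are the patterns the skill is meant to help with.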

Dependency Matrix

Required Modules

None required

Components

scripts

💻 Claude Code Installation

Recommended: let Claude install it automatically by copying the text below and pasting it into Claude Code.

Please help me install this Skill:
Name: kernel-cute-writing
Download link: https://github.com/NVIDIA/skills/archive/main.zip#kernel-cute-writing

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
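For a manual install, the extract-and-copy step in the prompt above might look like the following sketch. It assumes the archive has already been downloaded from the link above and that the skill lives in a directory named kernel-cute-writing somewhere inside it; the listing does not state the exact path within the repository, so the function searches for that directory name. `install_skill_from_zip` is a hypothetical helper, not part of any official tooling.

```python
import zipfile
from pathlib import Path, PurePosixPath


def install_skill_from_zip(zip_path, skill_name, skills_dir):
    """Copy every archive entry under a directory named skill_name into skills_dir/skill_name."""
    dest_root = (Path(skills_dir) / skill_name).resolve()
    written = []
    with zipfile.ZipFile(zip_path) as archive:
        for info in archive.infolist():
            parts = PurePosixPath(info.filename).parts
            if info.is_dir() or skill_name not in parts:
                continue
            # Keep only the path below the skill directory.
            relative = parts[parts.index(skill_name) + 1:]
            if not relative:
                continue
            target = dest_root.joinpath(*relative).resolve()
            # Refuse entries that would escape the destination directory.
            if dest_root not in target.parents:
                raise ValueError(f"unsafe path in archive: {info.filename}")
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_bytes(archive.read(info))
            written.append(target)
    if not written:
        raise FileNotFoundError(f"no '{skill_name}' directory found in {zip_path}")
    return dest_root
```

A call such as `install_skill_from_zip("main.zip", "kernel-cute-writing", ".claude/skills")` would then place the skill's files under .claude/skills/kernel-cute-writing.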
