cutlass-skill
Community
Write, debug, and optimize CUTLASS GPU kernels.
System Documentation
What problem does it solve?
Write, debug, and optimize CUTLASS and CuTeDSL GPU kernels using local source code, examples, and header references.
Use this Skill when the user mentions CUTLASS, CuTe, CuTeDSL, cute::Layout, cute::Tensor, TiledMMA, TiledCopy, CollectiveMainloop, CollectiveEpilogue, GEMM kernel, grouped GEMM, sparse GEMM, flash attention CUTLASS, Blackwell GEMM, Hopper GEMM, FP8 GEMM, FP4 GEMM, blockwise scaling, MoE GEMM, StreamK, warp specialization CUTLASS, TMA CUTLASS, epilogue fusion, EVT (Epilogue Visitor Tree), pycute, Layout algebra, Swizzle pattern, GemmUniversal, KernelSchedule, EpilogueSchedule, CUTLASS collective builder, or CUTLASS pipeline, or when the user asks about writing high-performance CUDA kernels with CUTLASS/CuTe templates. Also use it when the user wants to understand the CUTLASS source code structure, compile CUTLASS examples, or debug CUTLASS template errors.
This Skill focuses on practical guidance, repository navigation, and build and debug workflows for CUTLASS and CuTeDSL projects.
Core Features & Use Cases
- Local source code exploration and quick-start templates for CUTLASS and CuTeDSL
- Build, run, and debug CUTLASS examples and CuTeDSL kernels
- Understand repository structure, headers, and common templates to diagnose issues
Quick Start
Set CUTLASS_REPO to your local path, update or clone the CUTLASS sources, then build and debug a CuTeDSL kernel.
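The first two Quick Start steps (reading CUTLASS_REPO, then cloning or updating the sources) could be sketched as below. This is a minimal sketch, not the Skill's own tooling: the helper names `resolve_repo`, `sync_command`, and `sync` are hypothetical, the upstream URL is assumed to be the public NVIDIA/cutlass repository on GitHub, and `git` is assumed to be on the PATH. The build and debug steps are left out because the listing does not spell them out.

```python
import os
import subprocess
from pathlib import Path
from typing import Mapping

# Assumption: the public upstream repository; swap in a fork or mirror if needed.
CUTLASS_URL = "https://github.com/NVIDIA/cutlass.git"


def resolve_repo(env: Mapping[str, str] = os.environ) -> Path:
    """Return the checkout path named by CUTLASS_REPO, or fail loudly if unset."""
    raw = env.get("CUTLASS_REPO")
    if not raw:
        raise RuntimeError("CUTLASS_REPO is not set; point it at your local CUTLASS path")
    return Path(raw).expanduser()


def sync_command(repo: Path) -> list[str]:
    """Build the git command that updates an existing checkout or clones a new one."""
    if (repo / ".git").exists():
        # Existing checkout: fast-forward only, so local edits are never merged silently.
        return ["git", "-C", str(repo), "pull", "--ff-only"]
    return ["git", "clone", CUTLASS_URL, str(repo)]


def sync(repo: Path) -> None:
    """Run the clone-or-update command (hypothetical helper; needs git and network access)."""
    subprocess.run(sync_command(repo), check=True)
```

Calling `sync(resolve_repo())` performs the clone or update. Keeping `sync_command` separate from `sync` lets the command be inspected or logged before anything touches the network.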
Dependency Matrix
Required Modules
None required
Components
Standard package
Claude Code Installation
Recommended: let Claude install the Skill automatically. Copy the text below and paste it into Claude Code.
Please help me install this Skill:
Name: cutlass-skill
Download link: https://github.com/m0at/claudemd/archive/main.zip#cutlass-skill
Please download this .zip file, extract it, and install it in the .claude/skills/ directory.