cutlass-cpp-kernel

Official

Author, debug, and tune CUTLASS/CuTe kernels

Author: vipshop
Version: 1.0.0
Installs: 0

System Documentation

What problem does it solve?

Provides a focused workflow to read, implement, debug, and optimize CUTLASS and CuTe C++ kernels so engineers can achieve correct, high-performance GEMM, pipeline, and epilogue implementations across modern NVIDIA architectures.

Core Features & Use Cases

  • Source navigation: Maps key CUTLASS and CuTe header locations and example kernels for fast discovery and reuse.
  • Template and tiling reasoning: Analyzes tile shapes, MMA/copy atoms, schedules, stage counts, and epilogue choices to guide kernel configuration.
  • Architecture-aware tuning: Bundles smXX optimization guides and Nsight interpretation advice for Hopper, Blackwell, and desktop Blackwell variants.
  • Debugging & validation: Provides a stepwise checklist for synchronization, schedule errors, numerical parity, unit tests, and PyTorch baseline comparisons.
  • Rewrite guidance: Preserves operator contracts and validates accuracy and performance when porting between handwritten C++, CUTLASS, and CuTe DSL workflows.
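
The tile-shape and stage-count reasoning mentioned above can be illustrated with a small back-of-the-envelope calculation. The sketch below is plain C++ and does not use CUTLASS itself. `TileShape`, `stage_bytes`, and `max_stages` are hypothetical names introduced here for illustration. It assumes each pipeline stage buffers one A tile (M x K) and one B tile (N x K) in shared memory, and it ignores padding, barriers, and alignment. CUTLASS's own builders may compute stage counts differently.

```cpp
#include <cstddef>

// Hypothetical helper type for illustration; not a CUTLASS or CuTe type.
struct TileShape {
    std::size_t m, n, k;  // threadblock tile extents, in elements
};

// Shared-memory bytes needed by ONE pipeline stage, assuming the stage
// holds an (m x k) tile of A and an (n x k) tile of B with no padding.
constexpr std::size_t stage_bytes(TileShape t,
                                  std::size_t a_elem_bytes,
                                  std::size_t b_elem_bytes) {
    return t.m * t.k * a_elem_bytes + t.n * t.k * b_elem_bytes;
}

// Largest number of stages that fits in smem_budget bytes after setting
// aside reserved_bytes (for example, epilogue staging). Returns 0 when
// nothing fits. smem_budget is an input; the per-block limit depends on
// the target architecture and must be looked up, not assumed.
constexpr std::size_t max_stages(TileShape t,
                                 std::size_t a_elem_bytes,
                                 std::size_t b_elem_bytes,
                                 std::size_t smem_budget,
                                 std::size_t reserved_bytes) {
    const std::size_t per_stage = stage_bytes(t, a_elem_bytes, b_elem_bytes);
    if (per_stage == 0 || reserved_bytes >= smem_budget) {
        return 0;
    }
    return (smem_budget - reserved_bytes) / per_stage;
}

// Compile-time check: a 128x128x64 tile with 2-byte elements needs
// 128*64*2 + 128*64*2 = 32768 bytes per stage.
static_assert(stage_bytes(TileShape{128, 128, 64}, 2, 2) == 32768,
              "unexpected per-stage footprint");
```

For example, with an assumed budget of 227 KiB per block and nothing reserved, `max_stages(TileShape{128, 128, 64}, 2, 2, 227 * 1024, 0)` evaluates to 7. Reserving 32768 bytes for an epilogue buffer drops it to 6. This kind of estimate only bounds the search; the actual stage count should still be confirmed by profiling.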

Quick Start

Open the workspace CUTLASS checkout at /workspace/dev/vipshop/cutlass, then ask the agent to analyze or optimize a specific CUTLASS/CuTe kernel. Name the target file, the target GPU architecture, the data types, and the correctness or performance goal.

Dependency Matrix

Required Modules

None required

Components

Standard package

💻 Claude Code Installation

Recommended: Let Claude install automatically. Copy the text below and paste it into Claude Code.

Please help me install this Skill:
Name: cutlass-cpp-kernel
Download link: https://github.com/vipshop/cache-dit/archive/main.zip#cutlass-cpp-kernel

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.