cutlass-cpp-kernel
Author, debug, and tune CUTLASS/CuTe kernels
Author: vipshop
Version: 1.0.0
System Documentation
What problem does it solve?
Provides a focused workflow to read, implement, debug, and optimize CUTLASS and CuTe C++ kernels so engineers can achieve correct, high-performance GEMM, pipeline, and epilogue implementations across modern NVIDIA architectures.
Core Features & Use Cases
- Source navigation: Maps key CUTLASS and CuTe header locations and example kernels for fast discovery and reuse.
- Template and tiling reasoning: Analyzes tile shapes, MMA/copy atoms, schedules, stage counts, and epilogue choices to guide kernel configuration.
- Architecture-aware tuning: Bundled smXX optimization guides and Nsight interpretation advice for Hopper, Blackwell, and desktop Blackwell variants.
- Debugging & validation: Stepwise checklist for synchronization, schedule errors, numerical parity, unit tests, and PyTorch baseline comparisons.
- Rewrite guidance: Preserves operator contracts and validates accuracy and performance when porting between handwritten C++, CUTLASS, and CuTe DSL workflows.
Quick Start
Open the workspace CUTLASS checkout at /workspace/dev/vipshop/cutlass, then ask Claude to analyze or optimize a specific CUTLASS/CuTe kernel, naming the target file, target GPU architecture, data types, and the desired correctness or performance goal.
Dependency Matrix
Required Modules: None required
Components: Standard package
Claude Code Installation
Recommended: Let Claude install the skill automatically. Copy and paste the text below into Claude Code.
Please help me install this Skill: Name: cutlass-cpp-kernel Download link: https://github.com/vipshop/cache-dit/archive/main.zip#cutlass-cpp-kernel Please download this .zip file, extract it, and install it in the .claude/skills/ directory.