cutlass
Community
High-performance GPU kernels for linear algebra and AI workloads.
Category: Software Engineering
Tags: gpu, high performance, cuda, matrix multiplication, tensor cores, accelerated computing
Author: jstzwj
Version: 1.0.0
Installs: 0
System Documentation
What problem does it solve?
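This Skill provides access to NVIDIA's CUTLASS (CUDA Templates for Linear Algebra Subroutines) library, a collection of CUDA C++ templates and a Python interface for building optimized GPU kernels for matrix multiplication (GEMM), convolution, and related tensor operations that are critical in AI and scientific computing.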
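In CUTLASS terms, a GEMM computes D = alpha * (A × B) + beta * C, and each operand reaches the kernel as a pointer plus a leading dimension (the stride between consecutive rows of a row-major matrix, or columns of a column-major one). The pure-Python sketch below is a hypothetical CPU reference, not part of CUTLASS (`reference_gemm`, `offset`, and `check_ld` are illustrative names), that can help check a kernel's output on small problems. It assumes C and D are row-major and that D is packed with leading dimension n.

```python
from typing import List, Sequence


def offset(row: int, col: int, ld: int, row_major: bool) -> int:
    """Flat index of element (row, col) in a buffer with leading dimension ld."""
    return row * ld + col if row_major else row + col * ld


def check_ld(rows: int, cols: int, ld: int, row_major: bool, name: str) -> None:
    """A leading dimension must span a full row (row-major) or column (column-major)."""
    minimum = cols if row_major else rows
    if ld < minimum:
        raise ValueError(f"{name}: leading dimension {ld} < {minimum}")


def reference_gemm(
    m: int, n: int, k: int,
    alpha: float,
    a: Sequence[float], lda: int, a_row_major: bool,
    b: Sequence[float], ldb: int, b_row_major: bool,
    beta: float,
    c: Sequence[float], ldc: int,
) -> List[float]:
    """Compute D = alpha * A @ B + beta * C (A is m x k, B is k x n, C and D are m x n)."""
    check_ld(m, k, lda, a_row_major, "A")
    check_ld(k, n, ldb, b_row_major, "B")
    check_ld(m, n, ldc, True, "C")
    d = [0.0] * (m * n)
    for i in range(m):
        for j in range(n):
            acc = 0.0
            for p in range(k):
                acc += a[offset(i, p, lda, a_row_major)] * b[offset(p, j, ldb, b_row_major)]
            d[i * n + j] = alpha * acc + beta * c[offset(i, j, ldc, True)]
    return d


if __name__ == "__main__":
    # A: 2x3 row-major, padded to lda = 4; B: 3x2 column-major with ldb = 3.
    A = [1, 2, 3, 0,
         4, 5, 6, 0]
    B = [7, 9, 11,   # column 0
         8, 10, 12]  # column 1
    C = [1, 1, 1, 1]
    print(reference_gemm(2, 2, 3, 2.0, A, 4, True, B, 3, False, 1.0, C, 2))
    # [117.0, 129.0, 279.0, 309.0]
```

The CUTLASS 2.x C++ examples pass device-level GEMM arguments in a similar shape, roughly `{ {M, N, K}, {A, lda}, {B, ldb}, {C, ldc}, {D, ldd}, {alpha, beta} }`; check the exact form against the CUTLASS version you build with.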
Core Features & Use Cases
- Accelerated GEMM and Tensor Operations: Implements high-throughput matrix multiplications leveraging Tensor Cores across architectures like Volta, Turing, Ampere, Hopper, and Blackwell.
- Architecture-Aware Optimization: Tailors kernel configuration (tile shapes, pipeline depth, and architecture-specific instructions such as Hopper's TMA and warpgroup MMA) to each GPU architecture, maximizing throughput and resource utilization.
- Versatile Data Type Support: Includes FP64, FP32, FP16, BF16, TF32, FP8, INT8, INT4, and complex types, catering to diverse workload precision requirements.
- Example Use Case: Speed up large language model training by running its mixed-precision matrix multiplications through CUTLASS kernels.
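Mixed precision here usually means low-precision inputs (such as FP16 or BF16) combined with a wider accumulator: CUTLASS's GEMM templates take the accumulator element type as a parameter separate from the input types, and FP16 Tensor Core GEMMs commonly accumulate in FP32. The stdlib-only sketch below emulates those numerics on the CPU and does not call CUTLASS; `to_fp16`, `to_fp32`, and `dot_fp16_inputs` are hypothetical helpers. It shows why the accumulator type matters: summing 4096 ones stalls at 2048 with an FP16 accumulator, because 2049 is not representable in FP16.

```python
import struct
from typing import Callable, Sequence


def to_fp16(x: float) -> float:
    """Round to the nearest IEEE binary16 value (struct format 'e', ties to even).

    Raises OverflowError if x is outside the FP16 range.
    """
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_fp32(x: float) -> float:
    """Round to the nearest IEEE binary32 value (struct format 'f')."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def dot_fp16_inputs(a: Sequence[float], b: Sequence[float], accumulate_in_fp32: bool) -> float:
    """Dot product of FP16-rounded inputs using an FP32 or FP16 accumulator."""
    round_acc: Callable[[float], float] = to_fp32 if accumulate_in_fp32 else to_fp16
    acc = 0.0
    for x, y in zip(a, b):
        # The product of two FP16 values is exact in Python's double precision.
        prod = to_fp16(x) * to_fp16(y)
        # Round once to the accumulator format, roughly like a fused multiply-add.
        acc = round_acc(acc + prod)
    return acc


if __name__ == "__main__":
    ones = [1.0] * 4096
    print(dot_fp16_inputs(ones, ones, accumulate_in_fp32=True))   # 4096.0
    print(dot_fp16_inputs(ones, ones, accumulate_in_fp32=False))  # 2048.0
```

Real Tensor Core hardware may order and round partial sums differently, so treat this as an illustration of accumulator precision rather than a bit-exact model of any kernel.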
Quick Start
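To run a GEMM, supply the problem size (M, N, K), pointers to the A, B, C, and D matrices along with their layouts and leading dimensions, and the scalars alpha and beta, using either the CUTLASS Python interface or its C++ templates. Specify the element and accumulator data types and the target architecture (for example, SM80 for Ampere or SM90 for Hopper) as needed.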
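Architecture identifiers in CUTLASS follow the CUDA compute capability: C++ arch tags such as `cutlass::arch::Sm80` and the `CUTLASS_NVCC_ARCHS` CMake option (for example `-DCUTLASS_NVCC_ARCHS=80`) both use the SM number. The sketch below is a hypothetical helper (`cutlass_arch` and its partial lookup table are illustrative, not a CUTLASS API) that turns a compute capability, such as the `(major, minor)` tuple returned by `torch.cuda.get_device_capability()`, into those identifiers. It assumes you want the architecture-specific `a` targets (such as `90a`) on Hopper and newer, which CUTLASS uses for kernels that rely on features like Hopper's WGMMA and TMA instructions.

```python
from typing import Dict, Tuple

# Partial, illustrative table: CUDA compute capability -> GPU family.
FAMILIES: Dict[Tuple[int, int], str] = {
    (7, 0): "Volta",
    (7, 5): "Turing",
    (8, 0): "Ampere",
    (8, 6): "Ampere",
    (8, 9): "Ada",
    (9, 0): "Hopper",
    (10, 0): "Blackwell",
    (12, 0): "Blackwell",
}


def cutlass_arch(major: int, minor: int) -> Tuple[str, str, str]:
    """Return (family, C++ arch tag name, CUTLASS_NVCC_ARCHS value)."""
    family = FAMILIES.get((major, minor))
    if family is None:
        raise ValueError(f"compute capability {major}.{minor} is not in this table")
    sm = f"{major}{minor}"
    # Assumption: target the architecture-specific 'a' variant on Hopper and newer,
    # since kernels using features such as WGMMA/TMA are compiled for it.
    nvcc_archs = f"{sm}a" if major >= 9 else sm
    return family, f"Sm{sm}", nvcc_archs


if __name__ == "__main__":
    print(cutlass_arch(8, 0))  # ('Ampere', 'Sm80', '80')
    print(cutlass_arch(9, 0))  # ('Hopper', 'Sm90', '90a')
```

For an H100 (compute capability 9.0), the helper returns `("Hopper", "Sm90", "90a")`, which would correspond to a configure step like `cmake .. -DCUTLASS_NVCC_ARCHS=90a`. Confirm that a given tag exists in your CUTLASS version's `cutlass/arch/arch.h` before relying on it.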
Dependency Matrix
Required Modules
None required
Components
assets, references, scripts
💻 Claude Code Installation
Recommended: let Claude install it automatically. Copy the text below and paste it into Claude Code.
Please help me install this Skill:
Name: cutlass
Download link: https://github.com/jstzwj/ai-infra-plugins/archive/main.zip#cutlass
Please download this .zip file, extract it, and install it in the .claude/skills/ directory.