cutlass

Community

High-performance GPU kernels for linear algebra and AI workloads.

Author: jstzwj
Version: 1.0.0
Installs: 0

System Documentation

What problem does it solve?

This Skill provides access to NVIDIA's CUTLASS library, enabling the deployment of optimized GPU kernels for matrix multiplication and tensor operations critical in AI and scientific computing.

Core Features & Use Cases

  • Accelerated GEMM and Tensor Operations: Implements high-throughput matrix multiplications leveraging Tensor Cores across architectures like Volta, Turing, Ampere, Hopper, and Blackwell.
  • Architecture-Aware Optimization: Tailors kernel execution strategies for specific GPU architectures, maximizing performance and resource utilization.
  • Versatile Data Type Support: Includes FP64, FP32, FP16, BF16, TF32, FP8, INT8, INT4, and complex types, catering to diverse workload precision requirements.
  • Example Use Case: Speed up training of large language models by using CUTLASS kernels for mixed-precision matrix multiplications.
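
The idea behind the mixed-precision use case can be illustrated without a GPU. The sketch below is plain Python, not CUTLASS code. It rounds inputs to FP16 and compares a dot product accumulated in FP32 with one accumulated entirely in FP16. The helper names (to_fp16, to_fp32, dot_mixed, dot_fp16) are hypothetical and exist only for this illustration. Rounding is emulated with the standard struct module.

```python
import math
import struct


def to_fp16(x):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("e", struct.pack("e", x))[0]


def to_fp32(x):
    """Round a Python float to the nearest IEEE 754 single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def dot_mixed(a, b):
    """FP16 inputs, FP32 accumulator (the usual mixed-precision pattern)."""
    acc = 0.0
    for x, y in zip(a, b):
        acc = to_fp32(acc + to_fp16(x) * to_fp16(y))
    return acc


def dot_fp16(a, b):
    """FP16 inputs and FP16 accumulator, for comparison only."""
    acc = 0.0
    for x, y in zip(a, b):
        product = to_fp16(to_fp16(x) * to_fp16(y))
        acc = to_fp16(acc + product)
    return acc


def dot_exact(a, b):
    """Reference: FP16-rounded inputs, accurately summed in double precision."""
    return math.fsum(to_fp16(x) * to_fp16(y) for x, y in zip(a, b))


if __name__ == "__main__":
    a = [0.1] * 4096
    b = [0.1] * 4096
    print("reference        :", dot_exact(a, b))
    print("FP32 accumulation:", dot_mixed(a, b))
    print("FP16 accumulation:", dot_fp16(a, b))
```

With a long reduction like this one, the FP16 accumulator loses small addends once the running sum grows large, while the FP32 accumulator stays close to the reference. This is why mixed-precision GEMM kernels typically pair low-precision inputs with a wider accumulator.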

Quick Start

To run a GEMM operation, pass pointers to the input and output matrices, together with their dimensions, to CUTLASS through its Python API or C++ interface. Specify the data types and target architecture as needed.
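
To make the inputs concrete, here is a minimal CPU reference for what a GEMM computes, D = alpha * A * B + beta * C, written in plain Python. It is not the CUTLASS API. The function name gemm_reference and its parameter order are assumptions made for this sketch. Matrices are flat row-major lists with explicit leading dimensions (lda, ldb, ldc), which mirrors the pointer-plus-dimensions style of input described above.

```python
def gemm_reference(m, n, k, alpha, a, lda, b, ldb, beta, c, ldc):
    """Return D = alpha * A @ B + beta * C as a flat row-major list.

    A is m x k, B is k x n, C and D are m x n.
    lda, ldb, ldc are the row strides (leading dimensions) of A, B, C.
    This is a hypothetical reference helper, not part of CUTLASS.
    """
    d = list(c)  # D shares the layout of C, including any row padding
    for i in range(m):
        for j in range(n):
            acc = 0.0
            for p in range(k):
                acc += a[i * lda + p] * b[p * ldb + j]
            d[i * ldc + j] = alpha * acc + beta * c[i * ldc + j]
    return d


if __name__ == "__main__":
    # A is 2 x 3, B is 3 x 2, C is 2 x 2, all tightly packed.
    a = [1.0, 2.0, 3.0,
         4.0, 5.0, 6.0]
    b = [7.0, 8.0,
         9.0, 10.0,
         11.0, 12.0]
    c = [1.0, 1.0,
         1.0, 1.0]
    print(gemm_reference(2, 2, 3, 1.0, a, 3, b, 2, 1.0, c, 2))
```

A reference like this is useful mainly for checking the output of an optimized kernel on small problem sizes. It says nothing about how CUTLASS tiles or schedules the work.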

Dependency Matrix

Required Modules

None required

Components

assets, references, scripts

💻 Claude Code Installation

Recommended: Let Claude install it automatically. Copy the text below and paste it into Claude Code.

Please help me install this Skill:
Name: cutlass
Download link: https://github.com/jstzwj/ai-infra-plugins/archive/main.zip#cutlass

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
