Name: Write CUDA GEMM Kernel
Author: KrxGu

System Documentation

What problem does it solve?

This Skill guides engineers through designing and implementing a correct, performance-aware CUDA GEMM kernel (C = alpha * A * B + beta * C). It covers the tiling strategy, use of the memory hierarchy, Tensor Core eligibility, and when to defer to cuBLAS or CUTLASS instead of writing a custom kernel.
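
A custom kernel is easier to trust when it can be compared against a simple host-side oracle. The following is a minimal sketch of such a reference, not part of the Skill itself. It assumes row-major storage, no transposition, and single-precision floats; the name `gemm_reference` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Host-side reference for C = alpha * A * B + beta * C.
// Assumptions for illustration only: row-major, non-transposed, float.
// A is M x K, B is K x N, C is M x N.
void gemm_reference(std::size_t M, std::size_t N, std::size_t K,
                    float alpha, const std::vector<float>& A,
                    const std::vector<float>& B,
                    float beta, std::vector<float>& C) {
    assert(A.size() == M * K);
    assert(B.size() == K * N);
    assert(C.size() == M * N);
    for (std::size_t i = 0; i < M; ++i) {
        for (std::size_t j = 0; j < N; ++j) {
            float acc = 0.0f;
            for (std::size_t k = 0; k < K; ++k) {
                acc += A[i * K + k] * B[k * N + j];
            }
            // BLAS convention: when beta is zero, C is not read, so stale
            // NaN or Inf values in the output buffer do not propagate.
            C[i * N + j] = (beta == 0.0f) ? alpha * acc
                                          : alpha * acc + beta * C[i * N + j];
        }
    }
}
```

A GPU result would typically be compared against this output with a tolerance rather than exact equality, since a tiled kernel accumulates in a different order.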

Core Features & Use Cases

Guided design of a custom GEMM kernel including tiling, shared memory layout, and epilogue fusion decisions.
Decision framework for when to use cuBLAS or CUTLASS and how to handle batched or nonstandard layouts.
Suited to hardware-constrained deployments and research settings that need full control over memory movement and compute.
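
The tiling and shared memory decisions listed above largely reduce to arithmetic that can be checked before any kernel is written. The sketch below is one way to do that on the host, assuming each thread block computes one `bm x bn` tile of C and stages a `bm x bk` slice of A and a `bk x bn` slice of B in shared memory per pipeline stage. `TileConfig`, `LaunchPlan`, and `plan_launch` are hypothetical names, not part of the Skill.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical tile description: one thread block computes a bm x bn tile
// of C and walks the K dimension in steps of bk.
struct TileConfig {
    std::size_t bm, bn, bk;
    std::size_t stages;  // 1 = single buffer, 2 = double buffering, ...
};

struct LaunchPlan {
    std::size_t grid_x;      // tiles along N
    std::size_t grid_y;      // tiles along M
    std::size_t k_iters;     // main-loop iterations per block
    std::size_t smem_bytes;  // shared memory needed per block
};

constexpr std::size_t ceil_div(std::size_t a, std::size_t b) {
    return (a + b - 1) / b;
}

LaunchPlan plan_launch(std::size_t M, std::size_t N, std::size_t K,
                       TileConfig t, std::size_t elem_bytes) {
    assert(t.bm > 0 && t.bn > 0 && t.bk > 0 && t.stages > 0);
    LaunchPlan p{};
    p.grid_x = ceil_div(N, t.bn);   // partial tiles at the edge still need a block
    p.grid_y = ceil_div(M, t.bm);
    p.k_iters = ceil_div(K, t.bk);
    // Each stage holds one A slice (bm x bk) and one B slice (bk x bn).
    p.smem_bytes = t.stages * (t.bm * t.bk + t.bk * t.bn) * elem_bytes;
    return p;
}
```

For example, `plan_launch(1000, 512, 300, {128, 128, 8, 2}, 4)` describes a 4 x 8 grid with 38 main-loop iterations and 16384 bytes of shared memory per block. Comparing `smem_bytes` with the per-block limit of the target SM is a quick feasibility check; any padding added to avoid bank conflicts would increase the figure.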

Quick Start

Provide the problem specification (dimensions M, N, K; data types; memory layouts; transpositions; target SM architecture), then start the kernel design workflow for a CUDA GEMM.
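
Layout and transposition interact, so it can help to pin them down explicitly before starting. As a rough illustration, the sketch below derives the minimum leading dimension of an operand from its logical shape, transposition flag, and storage layout. The types `Layout` and `Operand` and the function `min_leading_dim` are assumptions made for this example and are not defined by the Skill.

```cpp
#include <cassert>
#include <cstddef>

enum class Layout { RowMajor, ColMajor };

// Hypothetical operand description. rows and cols are the logical shape
// after the transposition is applied, e.g. op(A) is M x K.
struct Operand {
    std::size_t rows;
    std::size_t cols;
    bool transposed;
    Layout layout;
};

// Smallest valid leading dimension for the matrix as stored in memory.
std::size_t min_leading_dim(const Operand& x) {
    // A transposed operand is stored with rows and cols swapped.
    const std::size_t stored_rows = x.transposed ? x.cols : x.rows;
    const std::size_t stored_cols = x.transposed ? x.rows : x.cols;
    // Row-major strides over a full row; column-major over a full column.
    return x.layout == Layout::RowMajor ? stored_cols : stored_rows;
}
```

With column-major storage this matches the usual BLAS rule: for op(A) of shape M x K, the leading dimension is at least M when A is not transposed and at least K when it is.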

Please help me install this Skill:
Name: Write CUDA GEMM Kernel
Download link: https://github.com/KrxGu/kernel-skills/archive/main.zip#write-cuda-gemm-kernel
Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
