Name: triton-cuda-matmul
Availability: InStock
Author: xchang1121

System Documentation

What problem does it solve?

This skill automates the generation and tuning of Triton matmul kernels for NVIDIA GPUs (CUDA backend). It aims to maximize throughput on large matrix products and other GEMM-like workloads.
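As a rough illustration of the computation such a kernel performs (a pure-Python sketch, not code generated by the skill), the function below computes C = A·B one output tile at a time. Each tile keeps its own accumulator and walks the shared K dimension in `block_k`-wide steps, which mirrors the loop structure of a Triton matmul program where one program instance owns one output tile. The function name `tiled_matmul` and the default block sizes are hypothetical choices for this example.

```python
from typing import List

Matrix = List[List[float]]


def tiled_matmul(a: Matrix, b: Matrix,
                 block_m: int = 2, block_n: int = 2, block_k: int = 2) -> Matrix:
    """Compute C = A @ B one (block_m x block_n) output tile at a time."""
    m, k = len(a), len(a[0])
    if len(b) != k:
        raise ValueError(f"inner dimensions differ: {k} vs {len(b)}")
    n = len(b[0])
    c = [[0.0] * n for _ in range(m)]
    for m0 in range(0, m, block_m):          # row offset of the output tile
        for n0 in range(0, n, block_n):      # column offset of the output tile
            # Clip the tile at the matrix edge (the analogue of masked loads/stores).
            rows = range(m0, min(m0 + block_m, m))
            cols = range(n0, min(n0 + block_n, n))
            acc = [[0.0] * len(cols) for _ in rows]  # per-tile accumulator
            for k0 in range(0, k, block_k):  # march along the shared K dimension
                ks = range(k0, min(k0 + block_k, k))
                for r, i in enumerate(rows):
                    for s, j in enumerate(cols):
                        acc[r][s] += sum(a[i][kk] * b[kk][j] for kk in ks)
            # Write the finished tile back to C once, after the K loop.
            for r, i in enumerate(rows):
                for s, j in enumerate(cols):
                    c[i][j] = acc[r][s]
    return c


if __name__ == "__main__":
    a = [[1, 2, 3], [4, 5, 6]]
    b = [[7, 8], [9, 10], [11, 12]]
    print(tiled_matmul(a, b))  # [[58.0, 64.0], [139.0, 154.0]]
```

On a GPU the per-tile accumulator lives in registers and the K-slices of A and B are staged through shared memory, which is where the tile sizes start to matter for performance.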

Core Features & Use Cases

Tensor Core-aware tiling and shared-memory strategies for matmul, bmm, and linear layers on NVIDIA GPUs
Autotune-enabled kernel generation with configurable block sizes and group sizes to suit different M, N, and K scales
Host-side integration patterns for passing tensor inputs, verifying results, and benchmarking the generated kernels
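The "groupings" mentioned above usually refer to grouped program ordering, as described in Triton's matrix-multiplication tutorial: instead of assigning output tiles in plain row-major order, programs are launched in groups of `group_m` tile-rows so that programs running at the same time reuse the same rows of A and columns of B from L2 cache. The sketch below ports that index arithmetic to plain Python so it can be inspected without a GPU; the function and parameter names are hypothetical, and whether the skill uses exactly this mapping is an assumption.

```python
def cdiv(x: int, y: int) -> int:
    """Ceiling division (same meaning as triton.cdiv)."""
    return (x + y - 1) // y


def grouped_tile_coords(pid: int, M: int, N: int,
                        block_m: int, block_n: int, group_m: int) -> tuple:
    """Map a 1-D program id to the (pid_m, pid_n) output tile it computes.

    Inside a group, consecutive pids walk down a column of tiles before
    moving one tile to the right, so programs that run together share
    rows of A and columns of B.
    """
    num_pid_m = cdiv(M, block_m)                 # tiles along M
    num_pid_n = cdiv(N, block_n)                 # tiles along N
    num_pid_in_group = group_m * num_pid_n       # programs per group
    group_id = pid // num_pid_in_group
    first_pid_m = group_id * group_m
    group_size_m = min(num_pid_m - first_pid_m, group_m)  # last group may be short
    pid_m = first_pid_m + ((pid % num_pid_in_group) % group_size_m)
    pid_n = (pid % num_pid_in_group) // group_size_m
    return pid_m, pid_n


if __name__ == "__main__":
    # A 4 x 4 grid of tiles with groups of 2 tile-rows.
    print([grouped_tile_coords(p, 64, 64, 16, 16, 2) for p in range(8)])
    # [(0, 0), (1, 0), (0, 1), (1, 1), (0, 2), (1, 2), (0, 3), (1, 3)]
```

In this example the first four programs touch only two tile-rows of A and two tile-columns of B, whereas row-major order would touch one row and four columns. `group_m` is a natural candidate to include alongside the block sizes in an autotune search space.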

Quick Start

Invoke the Triton-CUDA MatMul workflow with your matrix dimensions to generate and autotune a Triton kernel that multiplies an M×K matrix by a K×N matrix.
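When you supply M, N, and K, two pieces of host-side arithmetic are worth having at hand: the launch grid (one program per output tile, laid out as a 1-D grid as in Triton's matmul tutorial) and the effective throughput used to compare benchmarked configurations. The helpers below are a hypothetical sketch; the block sizes and the 1 ms timing are illustrative inputs only, not measurements of this skill's kernels.

```python
def launch_grid(M: int, N: int, block_m: int, block_n: int) -> tuple:
    """1-D launch grid with one program per (block_m x block_n) tile of C."""
    return (((M + block_m - 1) // block_m) * ((N + block_n - 1) // block_n),)


def matmul_tflops(M: int, N: int, K: int, ms: float) -> float:
    """Effective TFLOP/s of an (M x K) @ (K x N) product that took `ms` milliseconds.

    Each of the M*N outputs needs K multiply-adds, i.e. 2*M*N*K FLOPs in total.
    """
    return 2 * M * N * K / (ms * 1e-3) / 1e12


if __name__ == "__main__":
    print(launch_grid(4096, 4096, 128, 128))   # (1024,)
    print(launch_grid(1000, 1000, 128, 64))    # (128,)
    # Illustrative timing, not a measurement: 1 ms for a 4096 x 4096 x 4096 GEMM.
    print(round(matmul_tflops(4096, 4096, 4096, 1.0), 2))  # 137.44
```

Reporting TFLOP/s rather than raw milliseconds makes configurations comparable across different M, N, and K, and lets you read the result against the GPU's peak Tensor Core throughput.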

Please help me install this Skill: Name: triton-cuda-matmul Download link: https://github.com/xchang1121/AutoResearch-CC-hook/archive/main.zip#triton-cuda-matmul Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
