triton-cuda-matmul

Community

High-performance Triton-CUDA MatMul kernels.

Author: xchang1121
Version: 1.0.0
Installs: 0

System Documentation

What problem does it solve?

This skill automates the generation and optimization of Triton matmul kernels for CUDA GPUs, targeting high throughput on large matrix products and other GEMM-like workloads.

Core Features & Use Cases

  • Tensor Core-aware tiling and shared-memory strategies for matmul, bmm, and linear layers on NVIDIA GPUs
  • Autotune-enabled kernel generation with configurable BLOCK sizes and groupings to suit different M, N, K scales
  • Host-side integration patterns for tensor inputs, verification, and benchmarking of generated kernels
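
The tiling and grouping ideas above can be sketched on the CPU. The following is a minimal pure-Python illustration, not the skill's generated kernel: each loop iteration plays the role of one kernel program that owns a BLOCK_M x BLOCK_N output tile and walks K in BLOCK_K steps, and tiles are visited in a grouped order similar to the one commonly used in Triton matmul kernels. The names grouped_tile, tiled_matmul, and GROUP_M are assumptions chosen for this sketch.

```python
def cdiv(a, b):
    """Ceiling division: number of blocks of size b needed to cover a."""
    return (a + b - 1) // b


def grouped_tile(pid, M, N, BLOCK_M, BLOCK_N, GROUP_M):
    """Map a linear program id to (pid_m, pid_n) in grouped order.

    Programs within a group share a band of GROUP_M tile rows, so
    consecutive programs reuse the same columns of B (better cache reuse).
    """
    num_pid_m = cdiv(M, BLOCK_M)
    num_pid_n = cdiv(N, BLOCK_N)
    num_pid_in_group = GROUP_M * num_pid_n
    group_id = pid // num_pid_in_group
    first_pid_m = group_id * GROUP_M
    # The last group may hold fewer than GROUP_M tile rows.
    group_size_m = min(num_pid_m - first_pid_m, GROUP_M)
    pid_m = first_pid_m + (pid % num_pid_in_group) % group_size_m
    pid_n = (pid % num_pid_in_group) // group_size_m
    return pid_m, pid_n


def tiled_matmul(A, B, BLOCK_M=2, BLOCK_N=2, BLOCK_K=2, GROUP_M=2):
    """Blocked C = A @ B on nested lists; A is M x K, B is K x N."""
    M, K, N = len(A), len(A[0]), len(B[0])
    C = [[0] * N for _ in range(M)]
    num_tiles = cdiv(M, BLOCK_M) * cdiv(N, BLOCK_N)
    for pid in range(num_tiles):  # one iteration stands in for one program
        pid_m, pid_n = grouped_tile(pid, M, N, BLOCK_M, BLOCK_N, GROUP_M)
        # Clamp tile bounds so edge tiles do not run past the matrix.
        rows = range(pid_m * BLOCK_M, min((pid_m + 1) * BLOCK_M, M))
        cols = range(pid_n * BLOCK_N, min((pid_n + 1) * BLOCK_N, N))
        for k0 in range(0, K, BLOCK_K):  # accumulate over K in blocks
            k_end = min(k0 + BLOCK_K, K)
            for i in rows:
                for j in cols:
                    C[i][j] += sum(A[i][k] * B[k][j] for k in range(k0, k_end))
    return C
```

Comparing tiled_matmul against a naive triple loop on small inputs is a cheap way to check the index arithmetic before moving the same mapping into a GPU kernel.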

Quick Start

Invoke the Triton-CUDA MatMul workflow with your matrix dimensions (M, N, K) to generate and autotune a Triton kernel that multiplies an M×K matrix by a K×N matrix.
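
To make the role of the matrix dimensions concrete, here is a hedged sketch of how a set of autotune candidates might be narrowed for a given M, N, K before any kernel is launched. It is not the skill's actual search procedure. The function name candidate_configs, the block-size choices, the 2-byte element size (fp16/bf16), and the 48 KiB shared-memory budget are all assumptions for illustration; the shared-memory estimate counts one A tile and one B tile per pipeline stage.

```python
from itertools import product


def cdiv(a, b):
    """Ceiling division."""
    return (a + b - 1) // b


def candidate_configs(M, N, K, elem_bytes=2, smem_budget=48 * 1024):
    """Enumerate hypothetical tile configs that fit a shared-memory budget.

    Each stage is assumed to buffer one BLOCK_M x BLOCK_K tile of A and one
    BLOCK_K x BLOCK_N tile of B. All values here are illustrative.
    """
    configs = []
    for bm, bn, bk, stages in product((64, 128), (64, 128), (32, 64), (2, 3, 4)):
        if bk > K:
            continue  # a K-block larger than K itself only adds padding
        smem_bytes = stages * (bm * bk + bk * bn) * elem_bytes
        if smem_bytes > smem_budget:
            continue  # would not fit the assumed per-block budget
        configs.append({
            "BLOCK_M": bm,
            "BLOCK_N": bn,
            "BLOCK_K": bk,
            "num_stages": stages,
            "smem_bytes": smem_bytes,
            # Number of programs launched: one per output tile.
            "grid": cdiv(M, bm) * cdiv(N, bn),
        })
    return configs
```

For example, candidate_configs(4096, 4096, 4096) keeps a 64x64x32 tile with two stages but drops every 128x128x64 variant, because even two stages of that tile exceed the assumed budget. An autotuner would then time the surviving configurations on the target GPU.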

Dependency Matrix

Required Modules

None required

Components

Standard package

💻 Claude Code Installation

Recommended: let Claude install it automatically. Copy the text below and paste it into Claude Code.

Please help me install this Skill:
Name: triton-cuda-matmul
Download link: https://github.com/xchang1121/AutoResearch-CC-hook/archive/main.zip#triton-cuda-matmul

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
