triton-cuda-matmul
Community
High-performance Triton-CUDA MatMul kernels.
Author: xchang1121
Version: 1.0.0
Installs: 0
System Documentation
What problem does it solve?
This skill automates the generation and optimization of Triton matmul kernels for NVIDIA CUDA GPUs, targeting high throughput on large matrix products and other GEMM-like workloads.
Core Features & Use Cases
- Tensor Core aware tiling and shared memory strategies for matmul, bmm, and linear layers on NVIDIA GPUs
- Autotune-enabled kernel generation with configurable BLOCK sizes and groupings to suit different M, N, K scales
- Host-side integration patterns for tensor inputs, verification, and benchmarking of generated kernels
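The "groupings" mentioned above can be illustrated with a small pure-Python sketch of grouped tile ordering, the scheme used in the public Triton matmul tutorial to improve L2 cache reuse. This is an assumption about what a generated kernel might do, not a description of this skill's actual output; the function names (`cdiv`, `grouped_tile`, `launch_order`) and parameter names (`BLOCK_M`, `BLOCK_N`, `GROUP_M`) are hypothetical and chosen for illustration only.

```python
def cdiv(a, b):
    """Ceiling division: number of blocks of size b needed to cover a."""
    return (a + b - 1) // b


def grouped_tile(pid, M, N, BLOCK_M, BLOCK_N, GROUP_M):
    """Map a linear program id to an output tile (pid_m, pid_n).

    Tiles are visited in groups of GROUP_M tile-rows, so consecutive
    programs share the same columns of B and a small set of rows of A.
    All names here are illustrative assumptions.
    """
    num_pid_m = cdiv(M, BLOCK_M)          # tiles along M
    num_pid_n = cdiv(N, BLOCK_N)          # tiles along N
    num_pid_in_group = GROUP_M * num_pid_n
    group_id = pid // num_pid_in_group
    first_pid_m = group_id * GROUP_M
    # The last group may contain fewer than GROUP_M tile-rows.
    group_size_m = min(num_pid_m - first_pid_m, GROUP_M)
    local = pid % num_pid_in_group
    pid_m = first_pid_m + local % group_size_m
    pid_n = local // group_size_m
    return pid_m, pid_n


def launch_order(M, N, BLOCK_M, BLOCK_N, GROUP_M):
    """List the tile visited by each program id, in launch order."""
    total = cdiv(M, BLOCK_M) * cdiv(N, BLOCK_N)
    return [grouped_tile(pid, M, N, BLOCK_M, BLOCK_N, GROUP_M)
            for pid in range(total)]


if __name__ == "__main__":
    # 512x512 output with 128x128 tiles and groups of 2 tile-rows.
    for pid, tile in enumerate(launch_order(512, 512, 128, 128, 2)[:4]):
        print(pid, tile)
```

With `GROUP_M = 2`, the first four programs cover a 2×2 patch of output tiles rather than a full tile-row, which is the point of the grouping. An autotuner would typically treat `BLOCK_M`, `BLOCK_N`, and `GROUP_M` as tunable parameters and pick the fastest combination for a given M, N, K.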
Quick Start
Invoke the triton-cuda-matmul workflow with your matrix dimensions (M, N, K) to generate and autotune a Triton kernel that multiplies an M×K matrix by a K×N matrix.
Dependency Matrix
Required Modules
None required
Components
Standard package
💻 Claude Code Installation
Recommended: Let Claude install automatically. Copy and paste the text below into Claude Code.
Please help me install this Skill:
Name: triton-cuda-matmul
Download link: https://github.com/xchang1121/AutoResearch-CC-hook/archive/main.zip#triton-cuda-matmul
Please download this .zip file, extract it, and install it in the .claude/skills/ directory.