Name: Optimize Triton Block Parameters
Author: tensormux

System Documentation

What problem does it solve?

This skill guides the agent through systematically selecting and tuning Triton launch parameters (BLOCK_M, BLOCK_N, BLOCK_K, num_warps, and num_stages) to maximize the throughput of GEMM-style kernels while staying within hardware limits such as shared-memory capacity and registers per thread.
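Configurations that cannot launch can be pruned cheaply before any benchmarking. The following is a minimal sketch. It assumes the common approximation that a pipelined GEMM tile keeps `num_stages` copies of one A tile (BLOCK_M x BLOCK_K) and one B tile (BLOCK_K x BLOCK_N) in shared memory. The names `TileConfig`, `estimate_smem_bytes`, `fits_hardware`, and the dtype table are hypothetical, and Triton's actual allocation varies by version and kernel.

```python
from dataclasses import dataclass

# Bytes per element for common GEMM operand dtypes (illustrative table, extend as needed).
DTYPE_BYTES = {"fp32": 4, "fp16": 2, "bf16": 2, "fp8": 1}

WARP_SIZE = 32                 # threads per warp on NVIDIA GPUs
MAX_THREADS_PER_BLOCK = 1024   # CUDA per-block thread limit


@dataclass(frozen=True)
class TileConfig:
    BLOCK_M: int
    BLOCK_N: int
    BLOCK_K: int
    num_warps: int
    num_stages: int


def estimate_smem_bytes(cfg: TileConfig, dtype: str) -> int:
    """Rough shared-memory estimate (assumption): each pipeline stage buffers
    one A tile (BLOCK_M x BLOCK_K) and one B tile (BLOCK_K x BLOCK_N).
    Epilogue scratch space and compiler padding are ignored."""
    per_stage_elems = (cfg.BLOCK_M + cfg.BLOCK_N) * cfg.BLOCK_K
    return per_stage_elems * cfg.num_stages * DTYPE_BYTES[dtype]


def fits_hardware(cfg: TileConfig, dtype: str, smem_budget_bytes: int) -> bool:
    """True if the config passes both the shared-memory and thread-count checks."""
    threads = cfg.num_warps * WARP_SIZE
    return (estimate_smem_bytes(cfg, dtype) <= smem_budget_bytes
            and threads <= MAX_THREADS_PER_BLOCK)


if __name__ == "__main__":
    cfg = TileConfig(BLOCK_M=128, BLOCK_N=128, BLOCK_K=32, num_warps=4, num_stages=3)
    print(estimate_smem_bytes(cfg, "fp16"))                           # 49152 bytes (48 KiB)
    print(fits_hardware(cfg, "fp16", smem_budget_bytes=163 * 1024))   # True
```

For the budget, use the per-block limit that the target device reports. The CUDA Programming Guide lists opt-in maximums of about 163 KB per block for compute capability 8.0 (A100) and 227 KB for 9.0 (H100).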

Core Features & Use Cases

Systematic autotuning of block sizes and parallelism settings to improve performance on GPUs such as the A100 and H100.
Shape- and dtype-aware configuration that preserves correctness and efficiency across representative problem instances.
Reproducible benchmark-driven workflow that documents the winning configuration and expected throughput.
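These three features fit into one loop: enumerate a shape- and dtype-aware candidate set, benchmark each candidate, and record the winner with its throughput. The following is a minimal sketch under stated assumptions, not the skill's own code. The search grid, the `(BLOCK_M + BLOCK_N) * BLOCK_K * num_stages * dtype_bytes` shared-memory estimate, and the helper names `candidate_configs` and `select_best` are all illustrative. `time_fn` is any callable that returns seconds. On a real GPU it might wrap `triton.testing.do_bench`, which reports milliseconds, so those values would need dividing by 1000.

```python
import itertools
import json
from typing import Callable, Dict, List, Optional

Config = Dict[str, int]


def candidate_configs(M: int, N: int, K: int, dtype_bytes: int,
                      smem_budget: int) -> List[Config]:
    """Enumerate an assumed search grid, pruned by problem shape and dtype size."""
    configs = []
    for bm, bn, bk, warps, stages in itertools.product(
            (32, 64, 128, 256), (32, 64, 128, 256), (32, 64, 128), (4, 8), (2, 3, 4)):
        # Shape-aware pruning: skip blocks larger than the matching dimension
        # (never pruning below 32), since oversized tiles mostly compute masked padding.
        if bm > max(32, M) or bn > max(32, N) or bk > max(32, K):
            continue
        # Dtype-aware pruning: rough estimate of the A and B tiles buffered
        # across pipeline stages, in bytes.
        if (bm + bn) * bk * stages * dtype_bytes > smem_budget:
            continue
        configs.append({"BLOCK_M": bm, "BLOCK_N": bn, "BLOCK_K": bk,
                        "num_warps": warps, "num_stages": stages})
    return configs


def select_best(configs: List[Config], time_fn: Callable[[Config], float],
                M: int, N: int, K: int, repeats: int = 3) -> dict:
    """Benchmark every candidate with time_fn (seconds) and return a JSON-ready record."""
    best_cfg: Optional[Config] = None
    best_t = float("inf")
    for cfg in configs:
        t = min(time_fn(cfg) for _ in range(repeats))  # best-of-N damps timing noise
        if t < best_t:
            best_cfg, best_t = cfg, t
    if best_cfg is None:
        raise ValueError("no candidate configuration was benchmarked")
    return {
        "shape": [M, N, K],
        "config": best_cfg,
        "seconds": best_t,
        "tflops": 2.0 * M * N * K / best_t / 1e12,  # a GEMM performs 2*M*N*K flops
    }


if __name__ == "__main__":
    M = N = K = 1024
    cands = candidate_configs(M, N, K, dtype_bytes=2, smem_budget=227 * 1024)
    # Synthetic stand-in cost model for demonstration only; this is not a measurement.
    fake_time = lambda c: 1.0 / (c["BLOCK_M"] * c["BLOCK_N"]) + 1e-3 * c["num_stages"]
    record = select_best(cands, fake_time, M, N, K)
    print(json.dumps(record, sort_keys=True))
```

Serializing the record with `sort_keys=True` produces a stable JSON line that can be committed alongside the kernel. This is one way to keep the winning configuration and its measured throughput reproducible.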

Quick Start

Provide a representative Triton GEMM kernel and the target hardware, then run the autotuning workflow to find the best-performing BLOCK_M, BLOCK_N, BLOCK_K, and related parameters among the candidates tested.
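The Quick Start loop can be illustrated without a GPU. The sketch below uses a pure-Python blocked matrix multiply as a hypothetical stand-in for the Triton kernel. Each BLOCK_M x BLOCK_N output tile accumulates over K in BLOCK_K chunks, and edge tiles are clipped the way masked loads handle them. Every candidate is checked against a reference product before its time counts. Winners are stored per (M, N, K), similar in spirit to the `key=["M", "N", "K"]` argument of `triton.autotune`, which re-runs tuning when those values change. The function names are assumptions. num_warps and num_stages have no CPU analogue here, and CPU timings say nothing about GPU performance.

```python
import random
import time
from typing import Dict, List, Sequence, Tuple

Matrix = List[List[float]]
Shape = Tuple[int, int, int]
Block = Tuple[int, int, int]  # (BLOCK_M, BLOCK_N, BLOCK_K)


def blocked_matmul(A: Matrix, B: Matrix, BLOCK_M: int, BLOCK_N: int, BLOCK_K: int) -> Matrix:
    """Tiled C = A @ B: one 'program' per BLOCK_M x BLOCK_N output tile,
    looping over K in BLOCK_K chunks; edge tiles are clipped like masked loads."""
    M, K, N = len(A), len(B), len(B[0])
    C = [[0.0] * N for _ in range(M)]
    for m0 in range(0, M, BLOCK_M):
        for n0 in range(0, N, BLOCK_N):
            m1, n1 = min(m0 + BLOCK_M, M), min(n0 + BLOCK_N, N)
            for k0 in range(0, K, BLOCK_K):  # K loop inside each output tile
                k1 = min(k0 + BLOCK_K, K)
                for i in range(m0, m1):
                    row_c = C[i]
                    for k in range(k0, k1):
                        a, row_b = A[i][k], B[k]
                        for j in range(n0, n1):
                            row_c[j] += a * row_b[j]
    return C


def reference_matmul(A: Matrix, B: Matrix) -> Matrix:
    """Straightforward triple-loop product used as the correctness oracle."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def allclose(X: Matrix, Y: Matrix, rtol: float = 1e-9) -> bool:
    return all(abs(x - y) <= rtol * max(1.0, abs(y))
               for rx, ry in zip(X, Y) for x, y in zip(rx, ry))


def autotune(shapes: Sequence[Shape], blocks: Sequence[Block],
             seed: int = 0) -> Dict[Shape, Tuple[Block, float]]:
    """Return {shape: (fastest correct block, seconds)} for each representative shape."""
    rng = random.Random(seed)
    best: Dict[Shape, Tuple[Block, float]] = {}
    for M, N, K in shapes:
        A = [[rng.uniform(-1, 1) for _ in range(K)] for _ in range(M)]
        B = [[rng.uniform(-1, 1) for _ in range(N)] for _ in range(K)]
        ref = reference_matmul(A, B)
        for blk in blocks:
            start = time.perf_counter()
            out = blocked_matmul(A, B, *blk)
            elapsed = time.perf_counter() - start
            if not allclose(out, ref):
                continue  # correctness gate: a fast but wrong config is never kept
            if (M, N, K) not in best or elapsed < best[(M, N, K)][1]:
                best[(M, N, K)] = (blk, elapsed)
    return best
```

On real hardware, `blocked_matmul` would be replaced by a launch of the Triton kernel with the candidate `triton.Config`. The reference would be replaced by a trusted library GEMM, compared using a tolerance appropriate to the dtype.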

Please help me install this Skill: Name: Optimize Triton Block Parameters Download link: https://github.com/tensormux/kernel-skills/archive/main.zip#optimize-triton-block-parameters Please download this .zip file, extract it, and install it in the .claude/skills/ directory.

