cutile-autotuning

Community

Autotune CuTile kernels safely and fast

Author: yo-steven
Version: 1.0.0
Installs: 0

System Documentation

What problem does it solve?

This Skill provides a structured, architecture-aware workflow for adding autotuning to CuTile kernels, so that tuning does not become slow or produce incorrect results.

Core Features & Use Cases

  • Tune-once/cache/launch pattern: Uses exhaustive_search to find the best config once, then reuses a cached tuned kernel for fast repeated launches.
  • Search space design for CuTile: Builds a small, precise config set (≤ 30) using occupancy-only tuning for memory/bandwidth-bound kernels or full tile searches for compute-bound kernels.
  • Safety for in-place kernels: Applies the split-buffer strategy during exhaustive_search to avoid data corruption across trial runs.
  • DISABLE_AUTOTUNE fallback: Supports CI and profiling determinism by bypassing tuning when DISABLE_AUTOTUNE=1.
  • Common pitfall prevention: Includes guardrails for empty search spaces, compilation timeouts, and avoiding replace_hints on the hot path.
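The search-space rules in the list above can be sketched in plain Python. This is a minimal illustration, not the Skill's actual code: the function name `build_search_space`, the config keys (`tile_m`, `tile_n`, `occupancy`), and the default value lists are all hypothetical, and no CuTile API is called. Only the cap of 30 configs, the occupancy-only versus full-tile split, and the empty-search-space guardrail come from the description.

```python
from itertools import product

MAX_CONFIGS = 30  # cap taken from the feature list; everything else here is illustrative


def build_search_space(kind, tile_sizes=(32, 64, 128), occupancies=(1, 2, 4)):
    """Return candidate configs as plain dicts (hypothetical layout, not a CuTile type)."""
    if kind == "memory_bound":
        # Occupancy-only tuning: keep one fixed tile shape and vary occupancy.
        configs = [{"tile_m": 64, "tile_n": 64, "occupancy": o} for o in occupancies]
    elif kind == "compute_bound":
        # Full tile search: vary both tile dimensions and occupancy.
        configs = [
            {"tile_m": m, "tile_n": n, "occupancy": o}
            for m, n, o in product(tile_sizes, tile_sizes, occupancies)
        ]
    else:
        raise ValueError(f"unknown kernel kind: {kind!r}")

    # Guardrails: fail early instead of tuning over nothing or over too much.
    if not configs:
        raise ValueError("empty search space")
    if len(configs) > MAX_CONFIGS:
        raise ValueError(f"{len(configs)} configs exceeds the cap of {MAX_CONFIGS}")
    return configs
```

With these assumed defaults, a memory-bound kernel gets 3 candidates and a compute-bound kernel gets 27, both under the cap. Raising on an oversized space, rather than silently truncating it, keeps the choice of which configs to drop with the kernel author.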

Quick Start

To add autotuning to a new CuTile kernel, first classify the kernel with the decision tree, then generate the smallest relevant search space, and finally implement the tune-once/cache/launch wrapper with a DISABLE_AUTOTUNE-safe fallback.

Dependency Matrix

Required Modules

None required

Components

references, assets

💻 Claude Code Installation

Recommended: Let Claude install automatically. Simply copy and paste the text below to Claude Code.

Please help me install this Skill:
Name: cutile-autotuning
Download link: https://github.com/yo-steven/skills-exploration-20260522/archive/main.zip#cutile-autotuning

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
