cutile-autotuning
Community
Autotune CuTile kernels safely and fast
Software Engineering #kernel optimization #GPU performance #autotuning #cutile #exhaustive_search #in-place safety
Author: yo-steven
Version: 1.0.0
Installs: 0
System Documentation
What problem does it solve?
This Skill prevents slow or incorrect performance tuning by giving a structured, architecture-aware workflow for adding CuTile autotuning to kernels.
Core Features & Use Cases
- Tune-once/cache/launch pattern: Uses exhaustive_search to find the best config once, then reuses a cached tuned kernel for fast repeated launches.
- Search space design for CuTile: Builds a small, precise config set (≤ 30) using occupancy-only tuning for memory/bandwidth-bound kernels or full tile searches for compute-bound kernels.
- Safety for in-place kernels: Applies the split-buffer strategy during exhaustive_search to avoid data corruption across trial runs.
- DISABLE_AUTOTUNE fallback: Supports CI and profiling determinism by bypassing tuning when DISABLE_AUTOTUNE=1.
- Common pitfall prevention: Includes guardrails for empty search spaces, compilation timeouts, and avoiding replace_hints on the hot path.
Quick Start
Use it to add autotuning to a new CuTile kernel by first classifying it with the decision tree, then generating the smallest relevant search space, then implementing the tune-once/cache/launch wrapper with a DISABLE_AUTOTUNE-safe fallback.
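As a rough illustration of the "smallest relevant search space" step, the sketch below builds an occupancy-only space for memory-bound kernels and a tile-by-occupancy space for compute-bound ones, then enforces the ≤ 30 cap and the empty-space guardrail. The classification is taken as an input because the decision tree itself is not reproduced in this listing; the parameter names `occupancy`, `tile_m`, `tile_n`, the candidate values, and `build_search_space` are all assumptions made for the example, not CuTile API.

```python
from itertools import product

MAX_CONFIGS = 30  # cap taken from the feature list above

def build_search_space(kernel_kind, max_tile=128):
    """Return a list of config dicts for the given kernel class (hypothetical knobs)."""
    if kernel_kind == "memory_bound":
        # Occupancy-only tuning: tile shape is left at its default.
        configs = [{"occupancy": o} for o in (1, 2, 4, 8)]
    elif kernel_kind == "compute_bound":
        # Full tile search, filtered by whatever the hardware or problem allows.
        tiles = [t for t in (32, 64, 128) if t <= max_tile]
        configs = [
            {"tile_m": m, "tile_n": n, "occupancy": o}
            for m, n, o in product(tiles, tiles, (1, 2))
        ]
    else:
        raise ValueError(f"unknown kernel kind: {kernel_kind!r}")

    if not configs:
        # Guardrail: filters can silently remove every candidate.
        raise ValueError("empty search space")
    if len(configs) > MAX_CONFIGS:
        raise ValueError(f"search space too large: {len(configs)} > {MAX_CONFIGS}")
    return configs
```

Failing loudly on an empty or oversized space at build time keeps the problem out of the tuning run itself, where it would otherwise surface as a confusing error or a long compile.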
Dependency Matrix
Required Modules
None required
Components
references, assets
💻 Claude Code Installation
Recommended: Let Claude install automatically. Simply copy and paste the text below to Claude Code.
Please help me install this Skill:
Name: cutile-autotuning
Download link: https://github.com/yo-steven/skills-exploration-20260522/archive/main.zip#cutile-autotuning
Please download this .zip file, extract it, and install it in the .claude/skills/ directory.