kernel-loop
Optimize GPU kernels with evidence gates.
System Documentation
What problem does it solve?
It reduces the risk of performance regressions by turning GPU kernel optimization into a disciplined, evidence-backed iteration loop with correctness and Nsight Compute (NCU) gates between changes.
Core Features & Use Cases
- Measured one-change iterations: Each version transition changes exactly one independent performance variable, which preserves prior work and prevents several tactics from being bundled into one step.
- Guardrails and traceability: Requires correctness, NCU artifacts, KBS evidence, and a structured hypothesis before allowing the next version to be created.
- Deterministic end-to-end loop orchestration: Defines the expected artifact layout, runs the iteration gating script, and sequences the final benchmark and reporting steps, reusing the existing profiling, KBS, and benchmarking skills.
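The gating idea in the list above can be sketched as a small check that refuses to advance unless every piece of evidence is present and exactly one variable changed. This is a minimal illustration, not the skill's actual gating script: the artifact file names, the `check_gate` function, and the `GateResult` type are all hypothetical.

```python
from dataclasses import dataclass, field
from pathlib import Path

# Hypothetical artifact names; the skill's real layout may differ.
REQUIRED_ARTIFACTS = (
    "correctness.json",    # result of comparing against the reference
    "ncu_report.ncu-rep",  # Nsight Compute profile for this version
    "kbs_evidence.md",     # KBS evidence supporting the next change
    "hypothesis.json",     # structured hypothesis for the next version
)


@dataclass
class GateResult:
    passed: bool
    reasons: list = field(default_factory=list)


def check_gate(version_dir, changed_variables):
    """Return whether the next version may be created, with reasons if not."""
    reasons = []
    version_dir = Path(version_dir)

    # Every evidence artifact must exist before the loop advances.
    for name in REQUIRED_ARTIFACTS:
        if not (version_dir / name).is_file():
            reasons.append(f"missing artifact: {name}")

    # A version transition may change exactly one performance variable.
    if len(changed_variables) != 1:
        reasons.append(
            f"expected exactly one changed variable, got {len(changed_variables)}"
        )

    return GateResult(passed=not reasons, reasons=reasons)
```

Returning the reasons alongside the pass/fail flag keeps the gate traceable: a rejected transition says which evidence is missing rather than just stopping.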
Quick Start
Tell your system to run the kernel-loop with your baseline kernel as v0, and provide the required inputs: kernel, ref.py, implementation, and dims, plus an optional GPU and iteration count. The loop then validates correctness, profiles the kernel, gathers KBS evidence, and applies a one-variable hypothesis on each iteration, and finishes by producing the final benchmark and report artifacts.
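As a rough illustration of the inputs named above, the sketch below collects them in one structure and checks them before a loop would start. The field names mirror the listed inputs, but the types, the `LoopInputs` class, and the `validate` helper are assumptions made for this example, not part of the skill.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class LoopInputs:
    kernel: str                       # baseline kernel, treated as v0
    ref: str                          # path to the reference, e.g. ref.py
    implementation: str               # implementation to optimize
    dims: Tuple[int, ...]             # problem dimensions (assumed to be ints)
    gpu: Optional[str] = None         # optional target GPU
    iterations: Optional[int] = None  # optional iteration count


def validate(inputs):
    """Return a list of problems; an empty list means the inputs look usable."""
    problems = []
    for name in ("kernel", "ref", "implementation"):
        if not getattr(inputs, name):
            problems.append(f"{name} is required")
    if not inputs.dims or any(d <= 0 for d in inputs.dims):
        problems.append("dims must be a non-empty tuple of positive integers")
    if inputs.iterations is not None and inputs.iterations < 1:
        problems.append("iterations must be at least 1 when given")
    return problems
```

For example, `validate(LoopInputs("gemm.cu", "ref.py", "cuda", (1024, 1024, 1024)))` returns an empty list, while leaving out `dims` or passing a zero iteration count yields a readable problem message.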
Dependency Matrix
Required Modules
None required
💻 Claude Code Installation
Recommended: Let Claude install automatically. Simply copy and paste the text below to Claude Code.
Please help me install this Skill: Name: kernel-loop Download link: https://github.com/fmh66/kernel-opt-agent/archive/main.zip#kernel-loop Please download this .zip file, extract it, and install it in the .claude/skills/ directory.