cuda-agent-team

Community

Speed up CUDA kernel tuning with parallel agents

Author: Romaosir
Version: 1.0.0
Installs: 0

System Documentation

What problem does it solve?

Single-agent CUDA kernel optimization is slow and tends to plateau. This skill runs multiple independent optimization directions concurrently, then coordinates safe synchronization and merges so that better kernels are found faster.

Core Features & Use Cases

  • Parallel multi-agent optimization: Orchestrates two or more worker agents that each run the single-agent optimization loop in isolated working directories.
  • GPU-pinned, contention-safe execution: Assigns and pins each worker to its own GPU (via CUDA_VISIBLE_DEVICES) so that workers do not compete for the same device and speedup measurements stay comparable.
  • Round-based orchestration, sync, and merge: Tracks per-worker perf logs, decides when to adopt a winning baseline, and merges compatible wins via a defined merge protocol with correctness checks.
  • Use Case: If your kernel tuning has plateaued over several iterations, you can split the effort into orthogonal hypotheses (e.g., memory restructuring vs. compute restructuring) and explore them in parallel across GPUs.
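
The pinning and round-based adoption described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the skill's actual implementation: the names Worker, plan_workers, and pick_round_winner, the worker_<i> directory layout, and the 2% min_gain threshold are all hypothetical. Only the use of CUDA_VISIBLE_DEVICES and isolated working directories comes from the description.

```python
import os
from dataclasses import dataclass
from pathlib import Path


@dataclass
class Worker:
    direction: str   # optimization hypothesis this worker explores
    gpu_id: int      # physical GPU index the worker is pinned to
    workdir: Path    # isolated working directory for this worker
    env: dict        # environment to pass to the worker's subprocess


def plan_workers(directions, gpu_ids, root):
    """Pair each direction with its own GPU and working directory (hypothetical layout)."""
    if len(gpu_ids) < len(directions):
        raise ValueError("need one GPU per worker to avoid contention")
    workers = []
    for i, (direction, gpu_id) in enumerate(zip(directions, gpu_ids)):
        workdir = Path(root) / f"worker_{i}"
        workdir.mkdir(parents=True, exist_ok=True)
        env = dict(os.environ)
        # A process started with this env sees only this GPU, exposed as device 0.
        env["CUDA_VISIBLE_DEVICES"] = str(gpu_id)
        workers.append(Worker(direction, gpu_id, workdir, env))
    return workers


def pick_round_winner(speedups, baseline=1.0, min_gain=0.02):
    """Return the index of the worker with the best speedup, or None.

    speedups maps worker index -> list of speedups measured this round.
    A worker only wins if it beats the baseline by at least min_gain
    (an assumed threshold, chosen here for illustration).
    """
    best_idx, best = None, baseline * (1.0 + min_gain)
    for idx, values in speedups.items():
        if values and max(values) >= best:
            best_idx, best = idx, max(values)
    return best_idx
```

Each planned worker could then be started with something like subprocess.Popen(cmd, cwd=w.workdir, env=w.env), where cmd stands for whatever command runs the single-agent loop. A winner chosen this way would still need to pass the correctness checks before it is adopted as the new baseline.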

Quick Start

When at least two GPUs are available and you want two distinct optimization directions explored concurrently, ask the system to run the CUDA agent team for parallel kernel optimization.
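
Before starting, it may help to confirm that two GPUs are actually present. The following is a small sketch, assuming the NVIDIA driver's nvidia-smi tool is on the PATH; the helper names count_gpus and visible_gpu_count are hypothetical and not part of the skill.

```python
import shutil
import subprocess


def count_gpus(listing):
    """Count devices in `nvidia-smi -L` output (one 'GPU <n>: ...' line each)."""
    return sum(1 for line in listing.splitlines() if line.startswith("GPU "))


def visible_gpu_count():
    """Return the number of GPUs reported by nvidia-smi, or 0 if unavailable."""
    if shutil.which("nvidia-smi") is None:
        return 0
    result = subprocess.run(["nvidia-smi", "-L"], capture_output=True, text=True)
    return count_gpus(result.stdout) if result.returncode == 0 else 0


if __name__ == "__main__":
    n = visible_gpu_count()
    print(f"{n} GPU(s) found;", "ok for a two-worker team" if n >= 2 else "need at least 2")
```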

Dependency Matrix

Required Modules

None required

Components

references

💻 Claude Code Installation

Recommended: Let Claude install it automatically. Copy and paste the text below into Claude Code.

Please help me install this Skill:
Name: cuda-agent-team
Download link: https://github.com/Romaosir/IF_Romao_kernel_optimize/archive/main.zip#cuda-agent-team

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.