cuda-agent-team

Name: cuda-agent-team
Availability: InStock
Author: Romaosir

Community

Speed up CUDA kernel tuning with parallel agents

Software Engineering #multi-agent #performance tuning #cuda #parallel search #kernel optimization #gpu pinning #merge protocol

Version: 1.0.0
Installs: 0

System Documentation

What problem does it solve?

Single-agent CUDA kernel optimization tends to slow down and plateau. This skill runs several independent optimization directions concurrently, then coordinates synchronization and merges between them so that better kernels are found faster.

Core Features & Use Cases

Parallel multi-agent optimization: Orchestrates two or more worker agents that each run the single-agent optimization loop in isolated working directories.
GPU-pinned, contention-safe execution: Assigns and pins each worker to its own GPU (via CUDA_VISIBLE_DEVICES) to keep speedup measurements comparable.
Round-based orchestration, sync, and merge: Tracks per-worker perf logs, decides when to adopt a winning baseline, and merges compatible wins via a defined merge protocol with correctness checks.
Use Case: If your kernel tuning has plateaued over several iterations, you can split the effort into orthogonal hypotheses (e.g., memory restructuring vs. compute restructuring) and explore them in parallel across GPUs.
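
As a rough illustration of the GPU pinning and round-based adoption described above, here is a minimal Python sketch. It is not the skill's actual implementation: the helper names (`pinned_env`, `launch_worker`, `pick_winner`), the result format, and the 2% `min_gain` threshold are all assumptions made for this example. Only the use of `CUDA_VISIBLE_DEVICES`, the isolated working directories, and the correctness check before adopting a winner come from the description.

```python
import os
import subprocess
import sys
import tempfile


def pinned_env(gpu_id, base_env=None):
    """Return a copy of the environment that exposes only one GPU."""
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_VISIBLE_DEVICES"] = str(gpu_id)
    return env


def launch_worker(command, gpu_id, workdir):
    """Start one worker in its own directory, pinned to a single GPU."""
    return subprocess.Popen(
        command,
        cwd=workdir,
        env=pinned_env(gpu_id),
        stdout=subprocess.PIPE,
        text=True,
    )


def pick_winner(results, baseline_ms, min_gain=0.02):
    """Pick the fastest correct worker if it beats the baseline.

    results maps worker name -> (runtime_ms, passed_correctness).
    min_gain is a hypothetical threshold: the winner must be at least
    this fraction faster than the current baseline to be adopted.
    Returns the winning worker name, or None to keep the baseline.
    """
    correct = {name: ms for name, (ms, ok) in results.items() if ok}
    if not correct:
        return None
    best = min(correct, key=correct.get)
    if correct[best] <= baseline_ms * (1.0 - min_gain):
        return best
    return None


if __name__ == "__main__":
    # Stand-in worker command: it only reports which GPU it was pinned to.
    probe = [sys.executable, "-c",
             "import os; print(os.environ['CUDA_VISIBLE_DEVICES'])"]
    with tempfile.TemporaryDirectory() as d0, \
            tempfile.TemporaryDirectory() as d1:
        workers = [launch_worker(probe, 0, d0), launch_worker(probe, 1, d1)]
        for worker in workers:
            out, _ = worker.communicate()
            print("worker pinned to GPU", out.strip())
```

Because each worker process sees exactly one device, timings from different workers are not skewed by contention on a shared GPU, which is what keeps the per-round comparison in `pick_winner` meaningful.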

Quick Start

Ask the system to run the CUDA agent team for parallel kernel optimization when you have at least two GPUs available and want two distinct optimization directions explored concurrently.
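
The preconditions above (at least two GPUs, two distinct directions) can be checked before starting a run. The following Python sketch shows one way to do that; the function name `assign_directions` and its error messages are hypothetical and not part of the skill.

```python
def assign_directions(directions, gpu_ids):
    """Map each optimization direction to its own GPU.

    Raises ValueError when there are fewer than two distinct directions
    or not enough distinct GPUs to give every direction its own device.
    """
    # Drop blanks and duplicates while preserving the caller's order.
    cleaned = (d.strip() for d in directions)
    unique = list(dict.fromkeys(d for d in cleaned if d))
    if len(unique) < 2:
        raise ValueError("need at least two distinct optimization directions")

    gpus = list(dict.fromkeys(gpu_ids))
    if len(gpus) < len(unique):
        raise ValueError("need one GPU per optimization direction")

    return dict(zip(unique, gpus))


if __name__ == "__main__":
    plan = assign_directions(
        ["memory restructuring", "compute restructuring"], [0, 1]
    )
    for direction, gpu in plan.items():
        print(f"GPU {gpu}: {direction}")
```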
