Name: vllm-performance-tuning
Author: air-gapped

System Documentation

What problem does it solve?

This skill covers vLLM performance tuning for Mixture-of-Experts (MoE) models and specific hardware configurations. It provides a structured, repeatable optimization workflow, which shortens the time needed to bring a large LLM deployment up to its target performance.

Core Features & Use Cases

MoE kernel autotuning with benchmark_moe.py: generate tuned configurations and compare performance across token shapes.
Guidance for tensor-parallel, data-parallel, expert-parallel, and pipeline-parallel setups, plus disaggregation patterns (NIXL, Mooncake, LMCache) for scaling.
End-to-end workflow from baseline to re-benchmark, covering CUDA graphs, the compile cache, and scheduler knob tuning to meet SLOs.
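
The baseline-to-re-benchmark loop in the last item can be sketched as a comparison of latency samples against a target. This is a minimal Python sketch and not part of the skill itself: the function names `p99` and `compare_runs`, and the choice to express the SLO as a p99 latency in milliseconds, are assumptions made for illustration.

```python
def p99(samples_ms):
    """Return the 99th-percentile latency using the nearest-rank method."""
    if not samples_ms:
        raise ValueError("need at least one latency sample")
    ordered = sorted(samples_ms)
    # 1-based nearest rank, ceil(0.99 * n), computed with integers only.
    rank = -(-99 * len(ordered) // 100)
    return ordered[rank - 1]


def compare_runs(baseline_ms, tuned_ms, slo_ms):
    """Summarize a baseline run and a re-benchmark run against a p99 SLO.

    All names are hypothetical. The inputs are positive per-request
    latencies collected by whatever benchmark harness is in use.
    """
    base, tuned = p99(baseline_ms), p99(tuned_ms)
    return {
        "baseline_p99_ms": base,
        "tuned_p99_ms": tuned,
        "speedup": base / tuned,
        "meets_slo": tuned <= slo_ms,
    }
```

Keeping the comparison in one function makes it easy to rerun after each tuning step (CUDA graphs, compile cache, scheduler knobs) and see whether the change moved the tail latency toward the target.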

Quick Start

Tune MoE kernels on the target GPU and load the generated configs before serving.
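
As a rough illustration of what loading the generated configs involves at serving time, here is a minimal Python sketch. It assumes the tuned output is a JSON object keyed by token count, with kernel parameters as values, and that the runtime picks the entry whose key is nearest to the actual token count. Both the file layout and the nearest-key rule are assumptions to verify against the vLLM version in use; `load_tuned_config` and `pick_config` are hypothetical helper names.

```python
import json


def load_tuned_config(text):
    """Parse tuned-config JSON text into a dict keyed by integer token count."""
    raw = json.loads(text)
    return {int(k): v for k, v in raw.items()}


def pick_config(configs, num_tokens):
    """Return the entry whose token-count key is closest to num_tokens.

    Keys are visited in sorted order and min() keeps the first minimum,
    so a tie goes to the smaller key.
    """
    if not configs:
        raise ValueError("no tuned configs loaded")
    nearest = min(sorted(configs), key=lambda m: abs(m - num_tokens))
    return configs[nearest]
```

Under these assumptions, a config tuned on one GPU model would not carry over to another, which is why the tuning step runs on the target GPU.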

Please help me install this Skill:
Name: vllm-performance-tuning
Download link: https://github.com/air-gapped/skills/archive/main.zip#vllm-performance-tuning
Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
