vllm-benchmarking
Benchmark vLLM deployments with repeatable tests.
System Documentation
What problem does it solve?
Benchmark production vLLM deployments to measure latency, throughput, and SLO compliance under realistic load. It works in air-gapped environments and covers several benchmarking scenarios across multiple subcommands, including health checks, before/after change comparisons, and SLO validation. It supports structured JSON output, warmup controls, percentile metrics, goodput budgets, and repeatable request-rate sweeps, so results can be reproduced and compared across runs.
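The percentile metrics and goodput budgets mentioned above can be illustrated with a small, self-contained sketch. It assumes per-request timings have already been collected; the `RequestResult` record, its field names, and the helper functions are hypothetical and not part of the skill or of vLLM. Goodput is taken here as requests per second that meet every latency budget at once.

```python
import math
from dataclasses import dataclass


@dataclass
class RequestResult:
    """Per-request timings in milliseconds (hypothetical record layout)."""
    ttft_ms: float  # time to first token
    tpot_ms: float  # average time per output token after the first
    e2el_ms: float  # end-to-end request latency


def percentile(values, p):
    """p-th percentile with linear interpolation between closest ranks
    (the same method as NumPy's default)."""
    if not values:
        raise ValueError("percentile of an empty list")
    xs = sorted(values)
    k = (len(xs) - 1) * p / 100.0
    lo, hi = math.floor(k), math.ceil(k)
    return xs[lo] + (xs[hi] - xs[lo]) * (k - lo)


def goodput(results, duration_s, slo_ms):
    """Requests per second that meet every latency budget in slo_ms.

    slo_ms maps a RequestResult field name to its maximum allowed value;
    a request that exceeds any single budget does not count toward goodput.
    """
    good = sum(
        all(getattr(r, field) <= limit for field, limit in slo_ms.items())
        for r in results
    )
    return good / duration_s


if __name__ == "__main__":
    runs = [
        RequestResult(ttft_ms=120, tpot_ms=20, e2el_ms=2100),
        RequestResult(ttft_ms=340, tpot_ms=25, e2el_ms=2900),
        RequestResult(ttft_ms=900, tpot_ms=22, e2el_ms=3500),  # misses TTFT budget
        RequestResult(ttft_ms=150, tpot_ms=60, e2el_ms=6200),  # misses TPOT budget
    ]
    print(percentile([r.ttft_ms for r in runs], 50))  # 245.0
    print(goodput(runs, duration_s=10.0, slo_ms={"ttft_ms": 500, "tpot_ms": 50}))  # 0.2
```

Throughput alone would count all four requests; goodput counts only the two that stayed inside both budgets, which is why it is the more useful number for SLO validation.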
Core Features & Use Cases
- Provides guidance and tooling for benchmarking vLLM deployments across serve, sweep, startup, latency, and throughput subcommands.
- Demonstrates health-check, A/B change comparison, and SLO-constrained measurements with repeatable, auditable workflows.
- Documents air-gapped patterns (mirrors, ModelScope substitutions, offline caches) and a complete dataset catalog for production-like workloads.
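For the A/B change comparisons described above, a small script can diff two saved runs and flag regressions. A minimal sketch, assuming each run was saved as a flat JSON object; the key names (`p99_ttft_ms`, `request_throughput`, and so on), the 5% tolerance, and the helper names are assumptions, so match them to whatever your benchmark actually writes.

```python
import json

# Metric keys are assumptions; rename them to match your result files.
LOWER_IS_BETTER = {"p99_ttft_ms", "p99_tpot_ms", "p99_e2el_ms"}
HIGHER_IS_BETTER = {"request_throughput", "output_throughput"}


def load_result(path):
    """Load one benchmark run saved as a flat JSON object."""
    with open(path) as f:
        return json.load(f)


def compare_runs(baseline, candidate, tolerance=0.05):
    """Return {metric: relative_change} for every metric that got worse
    by more than `tolerance` (0.05 means 5%) from baseline to candidate.

    Metrics missing from either run, or with a zero baseline, are skipped.
    """
    regressions = {}
    for key in sorted(LOWER_IS_BETTER | HIGHER_IS_BETTER):
        if key not in baseline or key not in candidate:
            continue
        base, cand = float(baseline[key]), float(candidate[key])
        if base == 0:
            continue  # relative change is undefined
        change = (cand - base) / base
        if key in LOWER_IS_BETTER:
            worse = change > tolerance   # latency went up
        else:
            worse = change < -tolerance  # throughput went down
        if worse:
            regressions[key] = change
    return regressions


if __name__ == "__main__":
    # In practice: baseline = load_result("before.json") (hypothetical file names).
    baseline = {"p99_ttft_ms": 400.0, "request_throughput": 10.0}
    candidate = {"p99_ttft_ms": 480.0, "request_throughput": 10.2}
    print(compare_runs(baseline, candidate))  # {'p99_ttft_ms': 0.2}
```

A tolerance band matters because repeated runs of the same build never produce identical numbers; without it, ordinary run-to-run noise would be reported as a regression.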
Quick Start
Run a health-check benchmark against a running vLLM deployment to generate baseline latency and throughput numbers.
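A health-check run is only meaningful once the server has finished loading the model. A minimal sketch of a readiness gate, assuming the deployment is vLLM's OpenAI-compatible server, which answers `GET /health` with HTTP 200 when it is ready; the helper name, base URL, and timeouts are hypothetical.

```python
import time
import urllib.request


def wait_until_healthy(base_url, timeout_s=60.0, interval_s=2.0):
    """Poll f"{base_url}/health" until it answers HTTP 200.

    Returns True once the server is healthy, or False if timeout_s elapses.
    """
    url = base_url.rstrip("/") + "/health"
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            with urllib.request.urlopen(url, timeout=interval_s) as resp:
                if resp.status == 200:
                    return True
        except OSError:
            # URLError, HTTPError, refused connections and socket timeouts
            # are all OSError subclasses: treat them as "not ready yet".
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)


# Usage (hypothetical URL; point it at your own deployment):
#   if not wait_until_healthy("http://localhost:8000", timeout_s=120):
#       raise SystemExit("server did not become healthy in time")
```

Gating on readiness keeps warmup requests from being spent against a server that is still starting up; once it returns `True`, the benchmark itself (for example `vllm bench serve`) can start.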
Dependency Matrix
Required Modules
None required
Components
💻 Claude Code Installation
Recommended: let Claude install it automatically. Copy the text below and paste it into Claude Code.
Please help me install this Skill: Name: vllm-benchmarking Download link: https://github.com/air-gapped/skills/archive/main.zip#vllm-benchmarking Please download this .zip file, extract it, and install it in the .claude/skills/ directory.