vllm-benchmarking

Official

Benchmark vLLM deployments with repeatable tests.

Author: air-gapped
Version: 1.0.0
Installs: 0

System Documentation

What problem does it solve?

Benchmarks production vLLM deployments to measure latency, throughput, and SLO compliance under realistic load. It covers air-gapped environments and several benchmark scenarios (health checks, change comparisons, and SLO validation), spread across multiple subcommands. It supports structured JSON output, warmup controls, percentile metrics, goodput budgets, and repeatable rate sweeps for reproducible performance analysis.
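To make "percentile metrics" and "goodput budgets" concrete, here is a minimal Python sketch of how both could be computed from per-request measurements. It is an illustration only, not the skill's own implementation: the field names `ttft_ms` and `tpot_ms`, the budget values, and the sample data are all assumptions.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile (p in 0..100) of a non-empty list of numbers."""
    if not samples:
        raise ValueError("samples must be non-empty")
    ordered = sorted(samples)
    # Nearest-rank method: the smallest value with at least p% of samples at or below it.
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]


def goodput(requests, slo_ms, duration_s):
    """Requests per second that stayed within every budget in slo_ms."""
    good = sum(
        1
        for r in requests
        if all(r[metric] <= budget for metric, budget in slo_ms.items())
    )
    return good / duration_s


# Hypothetical per-request measurements (milliseconds); not real benchmark data.
requests = [
    {"ttft_ms": 120, "tpot_ms": 20},
    {"ttft_ms": 180, "tpot_ms": 25},
    {"ttft_ms": 450, "tpot_ms": 22},  # misses the assumed TTFT budget
    {"ttft_ms": 150, "tpot_ms": 60},  # misses the assumed TPOT budget
]
slo_ms = {"ttft_ms": 200, "tpot_ms": 30}  # assumed budgets

ttft = [r["ttft_ms"] for r in requests]
print("p50 TTFT:", percentile(ttft, 50))
print("p99 TTFT:", percentile(ttft, 99))
print("goodput (req/s):", goodput(requests, slo_ms, duration_s=2.0))
```

Goodput counts a request only when it meets all budgets at once, which is why it can fall well below raw throughput even when each individual percentile looks acceptable.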

Core Features & Use Cases

  • Provides guidance and tooling for benchmarking vLLM deployments with the serve, sweep, startup, latency, and throughput subcommands.
  • Demonstrates health-check, A/B change comparison, and SLO-constrained measurements with repeatable, auditable workflows.
  • Documents air-gapped patterns (mirrors, ModelScope substitutions, offline caches) and a complete dataset catalog for production-like workloads.
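An A/B change comparison like the one listed above boils down to diffing two sets of summary metrics. The sketch below shows one way to do that in Python, assuming each run has already been reduced to a flat dictionary of numbers; the metric names, values, and the 5% tolerance are hypothetical and not taken from the skill.

```python
def compare_runs(baseline, candidate, tolerance_pct=5.0):
    """Return percent change per shared metric and flag changes beyond tolerance."""
    report = {}
    for metric, base in baseline.items():
        # Skip metrics missing from the candidate run or with a zero baseline.
        if metric not in candidate or base == 0:
            continue
        change = (candidate[metric] - base) / base * 100
        report[metric] = {
            "change_pct": round(change, 2),
            "significant": abs(change) > tolerance_pct,
        }
    return report


# Hypothetical summaries from two benchmark runs.
baseline = {"p99_ttft_ms": 200.0, "throughput_tok_s": 1000.0}
candidate = {"p99_ttft_ms": 230.0, "throughput_tok_s": 1020.0}

for metric, result in compare_runs(baseline, candidate).items():
    print(metric, result)
```

Whether a change is an improvement depends on the metric (lower is better for latency, higher for throughput), so this sketch reports only the size of the change and leaves the interpretation to the reader.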

Quick Start

Run a health-check benchmark against a running vLLM deployment to generate baseline latency and throughput numbers.
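The general shape of a health-check measurement with a warmup phase can be sketched as follows. This is an assumption-laden illustration rather than the skill's script: `send_request` is a hypothetical callable that you would replace with a real request to your deployment, and the stand-in used here only sleeps briefly so the example runs offline.

```python
import time


def measure_latencies(send_request, num_requests, num_warmup=2):
    """Call send_request repeatedly and return per-call latency in milliseconds.

    The first num_warmup calls are issued but not timed, so one-off costs
    such as connection setup do not skew the baseline.
    """
    for _ in range(num_warmup):
        send_request()

    latencies_ms = []
    for _ in range(num_requests):
        start = time.perf_counter()
        send_request()
        latencies_ms.append((time.perf_counter() - start) * 1000)
    return latencies_ms


def fake_request():
    """Stand-in for a real call to the deployment (hypothetical)."""
    time.sleep(0.001)


latencies = measure_latencies(fake_request, num_requests=5, num_warmup=2)
print("measured requests:", len(latencies))
print("mean latency (ms):", sum(latencies) / len(latencies))
```

Passing the request function in as an argument keeps the timing loop independent of any particular client library, which also makes it easy to exercise without network access.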

Dependency Matrix

Required Modules

None required

Components

scripts, references

💻 Claude Code Installation

Recommended: let Claude install it automatically. Copy the text below and paste it into Claude Code.

Please help me install this Skill:
Name: vllm-benchmarking
Download link: https://github.com/air-gapped/skills/archive/main.zip#vllm-benchmarking

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.