llm-serving-performance-tuning
Community
Tune vLLM and gateway to hit p99 safely.
Category: Software Engineering
Tags: load testing, performance tuning, vllm, llm serving, p99 latency, fastapi gateway, database pool
Author: saintgo7
Version: 1.0.0
Installs: 0
System Documentation
What problem does it solve?
This Skill fixes LLM serving bottlenecks by ensuring you measure the real constraint across vLLM, the FastAPI gateway, and the database instead of guessing based only on GPU utilization.
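To make "measure the real constraint" concrete, here is a minimal Python sketch of the bottleneck-classification step. It assumes you have already aggregated a few signals from a baseline load test. The `Signals` fields, the `classify_bottleneck` function, and every threshold are hypothetical illustrations, not part of the skill. What it is meant to show is that GPU utilization is only one input: DB and gateway symptoms are checked first because either can starve the GPU while looking like an engine problem.

```python
from dataclasses import dataclass

# Hypothetical thresholds, chosen only for illustration; tune them to your own SLOs.
DB_WAIT_MS = 50.0   # mean wait for a DB connection that suggests pool exhaustion
ERROR_RATE = 0.01   # fraction of 429/5xx responses treated as a gateway problem
QUEUE_DEPTH = 1.0   # mean requests waiting inside vLLM that counts as a backlog
LOW_GPU = 0.5       # mean GPU utilization treated as "underutilized" under load


@dataclass
class Signals:
    """Aggregated signals from one baseline run (hypothetical field names)."""
    gpu_util: float          # mean GPU utilization, 0.0 to 1.0
    vllm_queue_depth: float  # mean number of requests waiting in the vLLM scheduler
    gateway_429_rate: float  # fraction of gateway responses with status 429
    gateway_5xx_rate: float  # fraction of gateway responses with status 5xx
    db_pool_wait_ms: float   # mean time spent waiting to acquire a DB connection


def classify_bottleneck(s: Signals) -> str:
    """Return which tuning step to try next: "db", "gateway", "vllm", or "none"."""
    if s.db_pool_wait_ms > DB_WAIT_MS:
        return "db"        # requests stall acquiring connections before doing work
    if s.gateway_429_rate + s.gateway_5xx_rate > ERROR_RATE:
        return "gateway"   # the gateway rejects or fails requests
    if s.vllm_queue_depth > QUEUE_DEPTH:
        return "vllm"      # requests back up inside the engine itself
    if s.gpu_util < LOW_GPU:
        return "gateway"   # GPU idle with no engine queue: traffic is held back upstream
    return "none"          # no obvious constraint at this load level
```

Each branch corresponds to one of the tuning steps; after changing a parameter, re-run the same load test and classify again rather than assuming the fix worked.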
Core Features & Use Cases
- 6-step, evidence-driven tuning workflow: baseline load testing, bottleneck classification, vLLM parameter tuning, gateway tuning, DB tuning, then re-measuring to confirm impact.
- Production-focused diagnostics: maps symptoms like queue backlog, 429/5xx distributions, sustained GPU underutilization, and DB pool exhaustion to the correct tuning step.
- Concurrency validation for GEM-LLM-style workloads: guides performance decisions using multi-user scenarios (e.g., 50/100/200 concurrent users) with tok/s, RPS, and p99 latency targets.
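For the concurrency validation described above, each scenario (for example 50, 100, or 200 concurrent users) reduces to a handful of numbers. The sketch below assumes your load generator records one `RequestResult` per request plus the wall-clock duration of the run. The names, the nearest-rank p99 definition, and the `meets_targets` check are illustrative assumptions rather than the skill's own tooling.

```python
from dataclasses import dataclass


@dataclass
class RequestResult:
    """One request from a load-test run (hypothetical record shape)."""
    latency_s: float    # end-to-end latency observed by the client, in seconds
    output_tokens: int  # tokens generated in the response
    ok: bool            # True for a successful (2xx) response


def p99(values: list[float]) -> float:
    """Nearest-rank 99th percentile: the smallest value >= 99% of the samples."""
    ordered = sorted(values)
    rank = (99 * len(ordered) + 99) // 100  # integer ceil(0.99 * n), avoids float error
    return ordered[rank - 1]


def summarize(results: list[RequestResult], wall_clock_s: float) -> dict[str, float]:
    """Compute RPS, tok/s, and p99 latency over successful requests only."""
    ok = [r for r in results if r.ok]
    if not ok:
        raise ValueError("no successful requests to summarize")
    return {
        "rps": len(ok) / wall_clock_s,
        "tok_per_s": sum(r.output_tokens for r in ok) / wall_clock_s,
        "p99_s": p99([r.latency_s for r in ok]),
        "error_rate": 1 - len(ok) / len(results),
    }


def meets_targets(summary: dict[str, float], p99_target_s: float, min_tok_per_s: float) -> bool:
    """Pass only if p99 latency and token throughput both meet their targets."""
    return summary["p99_s"] <= p99_target_s and summary["tok_per_s"] >= min_tok_per_s
```

Throughput counts only successful requests so that a burst of fast 429 rejections cannot inflate RPS or tok/s; `error_rate` reports those failures separately. Running `summarize` once per concurrency level lets you see where p99 starts climbing while throughput stops growing.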
Quick Start
Run the skill install entrypoint to apply the 6-step tuning workflow for your vLLM + FastAPI + DB deployment: ./install.sh llm-serving-performance-tuning
Dependency Matrix
Required Modules
None required
Components
references
💻 Claude Code Installation
Recommended: Let Claude install it automatically. Copy the text below and paste it into Claude Code.
Please help me install this Skill: Name: llm-serving-performance-tuning Download link: https://github.com/saintgo7/claude-skills/archive/main.zip#llm-serving-performance-tuning Please download this .zip file, extract it, and install it in the .claude/skills/ directory.