Name: llm-serving-performance-tuning
Availability: InStock
Author: saintgo7

System Documentation

What problem does it solve?

This Skill fixes LLM serving bottlenecks by ensuring you measure the real constraint across vLLM, the FastAPI gateway, and the database instead of guessing based only on GPU utilization.

Core Features & Use Cases

6-step, evidence-driven tuning workflow: baseline load testing, bottleneck classification, vLLM parameter tuning, gateway tuning, DB tuning, then re-measuring to confirm impact.
Production-focused diagnostics: maps symptoms like queue backlog, 429/5xx distributions, sustained GPU underutilization, and DB pool exhaustion to the correct tuning step.
Concurrency validation for GEM-LLM-style workloads: guides performance decisions using multi-user scenarios (e.g., 50/100/200 concurrent users) with tok/s, RPS, and p99 latency targets.

Quick Start

Run the skill install entrypoint to apply the 6-step tuning workflow for your vLLM + FastAPI + DB deployment: ./install.sh llm-serving-performance-tuning

Please help me install this Skill: Name: llm-serving-performance-tuning Download link: https://github.com/saintgo7/claude-skills/archive/main.zip#llm-serving-performance-tuning Please download this .zip file, extract it, and install it in the .claude/skills/ directory.

llm-serving-performance-tuning

System Documentation

What problem does it solve?

Core Features & Use Cases

Quick Start

Dependency Matrix

Required Modules

Components

💻 Claude Code Installation

Agent Skills Search Helper