vllm-docs

Name: vllm-docs
Availability: InStock
Author: wenerme

Community

Deploy and troubleshoot vLLM with confidence.

Software Engineering #troubleshooting #multimodal #quantization #vllm #inference serving #openai-compatible api #distributed deployment

Authorwenerme

Version1.0.0

Installs0

System Documentation

What problem does it solve?

vLLM users need fast, accurate guidance for configuring high-performance LLM inference and resolving tricky serving issues across models, quantization methods, and distributed setups.

Core Features & Use Cases

OpenAI-compatible serving and deployment guidance: Covers how to run vLLM as an OpenAI-compatible server and how to deploy it with Docker, Kubernetes, and reverse proxies.
Distributed and parallelism troubleshooting: Explains data/pipeline/tensor/expert/context parallel serving concepts and provides dedicated troubleshooting guidance for distributed environments.
Advanced performance topics: Documents key vLLM capabilities such as quantization (AWQ, GPTQ, FP8, GGUF, INT4/8), speculative decoding, LoRA adapters, structured outputs, multimodal inputs, and memory optimization mechanisms like PagedAttention.

Quick Start

Use the vllm-docs skill to locate the exact documentation page for the feature you need and follow it to configure your deployment for OpenAI-compatible serving.

vllm-docs

System Documentation

What problem does it solve?

Core Features & Use Cases

Quick Start

Dependency Matrix

Required Modules

Components

💻 Claude Code Installation

Agent Skills Search Helper