sglang-serving

Community

Serve structured LLMs at high throughput

Author: jayll1303
Version: 1.0.0
Installs: 0

System Documentation

What problem does it solve?

Deploying and tuning LLM inference servers that require structured outputs, shared-prefix caching, and multi-GPU scaling is complex and error-prone. This skill consolidates launch patterns, performance tuning, and diagnostics for SGLang Runtime to reliably serve constrained JSON/regex/EBNF outputs, enable RadixAttention prefix caching, and run quantized models in production.

Core Features & Use Cases

  • Server Launch & Configuration: Examples and recommended flags for launching python -m sglang.launch_server with --model-path, --port, tensor parallelism (--tp), --mem-fraction-static, and --chunked-prefill-size.
  • RadixAttention & Prefix Caching: Strategies and monitoring tips for maximizing throughput on shared-prefix workloads such as long system prompts and few-shot examples.
  • Structured Output: Native constrained decoding via json_schema, regex, and EBNF to produce schema-conforming JSON, regex-constrained responses, and grammar-enforced outputs.
  • Quantized & Multi-GPU Serving: Guidance for serving FP8, AWQ, and GPTQ models, choosing TP/DP configurations, and avoiding OOM errors and NCCL/NVLink pitfalls.
  • Diagnostics & Tuning: Procedures for VRAM estimation, cache hit monitoring, chunked prefill sizing, tensor-parallel size requirements, and resolving model load failures.
  • Use Case Example: Serve a user-profile JSON API that enforces a schema, scales across GPUs, and uses RadixAttention to accelerate many clients that share the same system prompt.
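A minimal sketch of the user-profile use case, assuming a server is already running on 127.0.0.1:30000 and that SGLang's native /generate endpoint accepts a json_schema string inside sampling_params. The schema, the system prompt, and the helper names (build_generate_payload, generate_profile) are hypothetical. Keeping the system prompt byte-identical across requests is what gives RadixAttention a shared prefix to reuse.

```python
import json
import urllib.request

# Hypothetical schema for the user-profile API.
USER_PROFILE_SCHEMA = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer", "minimum": 0},
        "email": {"type": "string"},
    },
    "required": ["name", "age", "email"],
}

# Shared system prompt: every request starts with the same prefix,
# so RadixAttention can serve it from the prefix cache after the first request.
SYSTEM_PROMPT = "You extract user profiles from free text and answer only with JSON.\n"


def build_generate_payload(user_text: str, max_new_tokens: int = 128) -> dict:
    """Build a request body for the native /generate endpoint.

    The json_schema constraint is passed as a JSON-encoded string.
    """
    return {
        "text": SYSTEM_PROMPT + user_text,
        "sampling_params": {
            "temperature": 0,
            "max_new_tokens": max_new_tokens,
            "json_schema": json.dumps(USER_PROFILE_SCHEMA),
        },
    }


def generate_profile(user_text: str, base_url: str = "http://127.0.0.1:30000") -> dict:
    """POST to a running server and parse the constrained output (needs a live server)."""
    body = json.dumps(build_generate_payload(user_text)).encode("utf-8")
    req = urllib.request.Request(
        f"{base_url}/generate",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        result = json.load(resp)
    # For a single request, the native endpoint returns the generated text under "text".
    return json.loads(result["text"])
```

Because the native endpoint takes raw text, this sketch bypasses the model's chat template; SGLang's OpenAI-compatible chat endpoint with a response_format may be preferable when templated chat input matters. Swapping the json_schema key for a regex or ebnf key in sampling_params should select the other constraint types.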

Quick Start

Launch an SGLang server for meta-llama/Llama-3.1-8B-Instruct on port 30000 and verify the /v1/models endpoint returns the model list.
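The Quick Start can be scripted roughly as follows. This is a standard-library sketch: the helper names are hypothetical, and it assumes the server binds to 127.0.0.1 (the launcher's default host) and that /v1/models returns an OpenAI-style body with a data list of {"id": ...} entries. The optional tuning flags mirror the ones listed under Server Launch & Configuration.

```python
import json
import sys
import time
import urllib.request
from typing import List, Optional

MODEL = "meta-llama/Llama-3.1-8B-Instruct"
PORT = 30000


def launch_command(
    model: str = MODEL,
    port: int = PORT,
    tp: int = 1,
    mem_fraction_static: Optional[float] = None,
    chunked_prefill_size: Optional[int] = None,
) -> List[str]:
    """Argument list for `python -m sglang.launch_server`."""
    cmd = [
        sys.executable, "-m", "sglang.launch_server",
        "--model-path", model,
        "--port", str(port),
        "--tp", str(tp),
    ]
    # Only pass tuning flags when explicitly requested; otherwise keep server defaults.
    if mem_fraction_static is not None:
        cmd += ["--mem-fraction-static", str(mem_fraction_static)]
    if chunked_prefill_size is not None:
        cmd += ["--chunked-prefill-size", str(chunked_prefill_size)]
    return cmd


def model_ids(models_response: dict) -> List[str]:
    """Extract model ids from an OpenAI-style /v1/models response."""
    return [entry["id"] for entry in models_response.get("data", [])]


def wait_for_model(port: int = PORT, timeout_s: float = 600.0) -> List[str]:
    """Poll /v1/models until the server answers or the timeout expires."""
    url = f"http://127.0.0.1:{port}/v1/models"
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                return model_ids(json.load(resp))
        except OSError:
            # Connection refused, reset, or timed out: server is still loading weights.
            time.sleep(5)
    raise TimeoutError(f"{url} did not respond within {timeout_s} s")
```

For example, subprocess.Popen(launch_command()) starts the server, and wait_for_model() then returns the served model ids once the weights have loaded; the list should contain meta-llama/Llama-3.1-8B-Instruct.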

Dependency Matrix

Required Modules

None required

Components

references

💻 Claude Code Installation

Recommended: Let Claude install automatically. Simply copy and paste the text below to Claude Code.

Please help me install this Skill:
Name: sglang-serving
Download link: https://github.com/jayll1303/AIEKit/archive/main.zip#sglang-serving

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
