vllm-gemma-4-31b
Official
Gemma-4-31B on vLLM: operating points.
Author: air-gapped
Version: 1.0.0
System Documentation
What problem does it solve?
It provides an operating-point reference for deploying Gemma-4-31B (AWQ 4-bit) on vLLM, helping platform engineers balance throughput, latency, and memory constraints across short- and long-context workloads.
Core Features & Use Cases
- Operating-point reference for Gemma-4-31B AWQ-4bit on vLLM across short- and long-context workloads, including recommended max_model_len, KV-cache sizing, and batch-size considerations.
- Decision guide for tensor-parallel configurations (LIGHT vs. PUSH) covering workload-specific latency and throughput trade-offs, plus guidance on bottlenecks such as HBM-bandwidth saturation.
- Reproducible benchmarks and caveats, with pointers to the hbm-saturation and bench-numbers references, to inform deployment and tuning.
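KV-cache sizing decides how many tokens, and therefore how many max_model_len-sized sequences, fit beside the weights. The sketch below shows the usual arithmetic under stated assumptions: the layer count, KV-head count, and head_dim are placeholders (not confirmed Gemma-4-31B values), every layer is treated as full attention, and KV heads are assumed to shard evenly across tensor-parallel ranks. If the model interleaves sliding-window layers, as earlier Gemma generations do, real KV usage will be lower than this estimate. The helper names are hypothetical; gpu_mem_util mirrors the meaning of vLLM's --gpu-memory-utilization (fraction of each GPU's memory vLLM may claim).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AttnConfig:
    # PLACEHOLDER architecture values: copy the real ones from the model's
    # config.json. These are not confirmed Gemma-4-31B numbers.
    num_layers: int
    num_kv_heads: int
    head_dim: int
    kv_dtype_bytes: int = 2  # bf16/fp16 KV cache; an fp8 KV cache would be 1


def kv_bytes_per_token(cfg: AttnConfig) -> int:
    # Factor 2: one K and one V vector per layer per KV head.
    # Assumes every layer is full attention (conservative if some are sliding-window).
    return 2 * cfg.num_layers * cfg.num_kv_heads * cfg.head_dim * cfg.kv_dtype_bytes


def kv_token_budget(cfg: AttnConfig, gpu_mem_gib: float, gpu_mem_util: float,
                    weights_gib: float, overhead_gib: float, tp: int) -> int:
    """Total KV-cache tokens that fit across `tp` GPUs.

    Assumes weights and KV heads are split evenly across ranks
    (num_kv_heads divisible by tp). overhead_gib is a per-GPU allowance
    for activations, CUDA graphs, and fragmentation.
    """
    per_gpu_free = gpu_mem_gib * gpu_mem_util - weights_gib / tp - overhead_gib
    if per_gpu_free <= 0:
        return 0
    total_free_bytes = per_gpu_free * tp * 1024**3
    return int(total_free_bytes // kv_bytes_per_token(cfg))


def max_concurrent_seqs(token_budget: int, max_model_len: int) -> int:
    # Worst case: every active sequence grows to the full max_model_len.
    return token_budget // max_model_len


if __name__ == "__main__":
    cfg = AttnConfig(num_layers=60, num_kv_heads=8, head_dim=128)  # hypothetical
    budget = kv_token_budget(cfg, gpu_mem_gib=80, gpu_mem_util=0.90,
                             weights_gib=18, overhead_gib=4, tp=2)
    print(kv_bytes_per_token(cfg), budget, max_concurrent_seqs(budget, 32768))
```

With these illustrative inputs, each token costs 240 KiB of KV cache and roughly 515k tokens fit across two 80 GiB GPUs, i.e. about 15 worst-case 32k-token sequences. Raising max_model_len trades concurrency for context length at a fixed memory budget.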
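HBM-bandwidth saturation can be reasoned about with a simple roofline: during decode, each step must stream the local weight shard plus the KV cache of every active sequence from HBM. The sketch below computes that bandwidth-only lower bound on step time and the resulting throughput ceiling. It is an illustration, not the guide's method: it ignores compute, all-reduce traffic, and kernel overhead, and the inputs used with it (about 15.5e9 weight bytes for 31B parameters at 4 bits, ignoring AWQ scales and any unquantized layers, and 3.35e12 B/s per-GPU bandwidth) are assumptions.

```python
def decode_step_floor_ms(weight_bytes: float, kv_bytes_read: float,
                         tp: int, hbm_bytes_per_s: float) -> float:
    """Lower bound on one decode step if it is purely HBM-bandwidth bound.

    Assumes weights and KV cache are sharded evenly over `tp` ranks that
    read their shards in parallel, each at `hbm_bytes_per_s`.
    """
    per_rank_bytes = (weight_bytes + kv_bytes_read) / tp
    return per_rank_bytes / hbm_bytes_per_s * 1e3


def decode_ceiling(batch: int, ctx_len: int, weight_bytes: float,
                   kv_bytes_per_token: int, tp: int,
                   hbm_bytes_per_s: float) -> tuple[float, float]:
    """Return (step_ms, aggregate tokens/s ceiling) for `batch` sequences
    that each hold `ctx_len` tokens of KV cache."""
    kv_read = batch * ctx_len * kv_bytes_per_token
    step_ms = decode_step_floor_ms(weight_bytes, kv_read, tp, hbm_bytes_per_s)
    # One new token per sequence per step.
    return step_ms, batch * 1000.0 / step_ms


if __name__ == "__main__":
    W, KV, BW = 15.5e9, 245760, 3.35e12  # assumed values, see lead-in
    for tp in (2, 4):
        for batch, ctx in ((1, 1000), (32, 1000), (8, 32768)):
            step, tput = decode_ceiling(batch, ctx, W, KV, tp, BW)
            print(f"tp={tp} batch={batch} ctx={ctx}: {step:.2f} ms/step, <= {tput:.0f} tok/s")
```

The shape of the result is the point: batching amortizes the weight read and raises aggregate throughput, while at long contexts the KV term outgrows the weight term, so per-step latency climbs and adding tensor-parallel ranks (more aggregate HBM bandwidth) becomes the main lever.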
Quick Start
Configure Gemma-4-31B AWQ-4bit for a multi-GPU vLLM deployment using the LIGHT or PUSH recipe described in this guide.
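A recipe ultimately becomes a vllm serve command line. The sketch below assembles one from a recipe object using real vLLM flags (--tensor-parallel-size, --max-model-len, --max-num-seqs, --gpu-memory-utilization). The model path and every number in the LIGHT and PUSH entries are hypothetical placeholders, not the recipes from this guide; substitute the documented values. No --quantization flag is passed because vLLM normally reads the AWQ settings from the checkpoint's config and picks a suitable kernel itself.

```python
import shlex
from dataclasses import dataclass


@dataclass(frozen=True)
class Recipe:
    name: str
    tensor_parallel_size: int
    max_model_len: int
    max_num_seqs: int
    gpu_memory_utilization: float = 0.90


# HYPOTHETICAL values for illustration only; replace with the numbers from
# the LIGHT and PUSH recipes in this guide.
RECIPES = {
    "LIGHT": Recipe("LIGHT", tensor_parallel_size=2, max_model_len=32768, max_num_seqs=64),
    "PUSH": Recipe("PUSH", tensor_parallel_size=4, max_model_len=131072, max_num_seqs=128),
}


def vllm_serve_argv(model_path: str, recipe: Recipe) -> list[str]:
    """Build an argv list for `vllm serve` from a recipe.

    Quantization is left to auto-detection from the checkpoint config.
    """
    return [
        "vllm", "serve", model_path,
        "--tensor-parallel-size", str(recipe.tensor_parallel_size),
        "--max-model-len", str(recipe.max_model_len),
        "--max-num-seqs", str(recipe.max_num_seqs),
        "--gpu-memory-utilization", f"{recipe.gpu_memory_utilization:.2f}",
    ]


if __name__ == "__main__":
    # Hypothetical local path, e.g. a mirrored checkpoint on an air-gapped host.
    argv = vllm_serve_argv("/models/gemma-4-31b-awq", RECIPES["LIGHT"])
    print(shlex.join(argv))
```

Keeping recipes as data rather than hand-edited shell lines makes it easy to diff LIGHT against PUSH and to feed the same values into the sizing estimates.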
Dependency Matrix
Required Modules
None required
Components
references
💻 Claude Code Installation
Recommended: let Claude install it automatically. Copy and paste the text below into Claude Code.
Please help me install this Skill: Name: vllm-gemma-4-31b Download link: https://github.com/air-gapped/skills/archive/main.zip#vllm-gemma-4-31b Please download this .zip file, extract it, and install it in the .claude/skills/ directory.