Name: vllm-gemma-4-31b
Author: air-gapped

System Documentation

What problem does it solve?

Provides an operating-point reference for deploying Gemma-4-31B AWQ-4bit on vLLM, so platform engineers can balance throughput, latency, and memory constraints across short- and long-context workloads.

Core Features & Use Cases

Recommended max_model_len, KV-cache sizing, and batch considerations for short- and long-context workloads.
Decision guide for tensor-parallel configurations (LIGHT vs PUSH) with workload-specific trade-offs for latency and throughput, plus guidance on bottlenecks like HBM-bandwidth saturation.
Reproducible benchmarks and caveats with references to hbm-saturation and bench-numbers to inform deployment and tuning.
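
The KV-cache sizing mentioned above comes down to simple arithmetic. The sketch below is a rough estimate under stated assumptions, not the guide's own method: it uses the standard per-token formula for full-attention layers (keys plus values, per layer, per KV head), and every architecture value passed in is a hypothetical placeholder rather than Gemma-4-31B's actual configuration. Models that mix sliding-window and global attention layers will need less than this upper bound.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim,
                   seq_len, batch_size, bytes_per_element=2):
    """Upper-bound KV-cache size in bytes, assuming full attention in every layer."""
    # Factor 2 covers keys and values; bytes_per_element=2 assumes a 16-bit cache dtype.
    per_token = 2 * num_layers * num_kv_heads * head_dim * bytes_per_element
    return per_token * seq_len * batch_size


# Placeholder architecture values for illustration only (NOT the real model config).
example = kv_cache_bytes(num_layers=48, num_kv_heads=8, head_dim=128,
                         seq_len=8192, batch_size=4)
print(f"{example / 1024**3:.2f} GiB")
```

Because the estimate scales linearly with both seq_len and batch_size, doubling max_model_len halves the number of full-length sequences that fit in the same memory budget.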

Quick Start

Configure Gemma-4-31B AWQ-4bit on a multi-GPU vLLM deployment using the LIGHT or PUSH recipes provided in this guide.
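
As a hedged illustration of what a multi-GPU launch looks like, the sketch below assembles a `vllm serve` command line from a few parameters. The helper name `build_vllm_serve_command`, the model path, and all numeric values are assumptions made for this example; they are not the LIGHT or PUSH recipes, whose actual settings live in the guide itself.

```python
import shlex


def build_vllm_serve_command(model, tensor_parallel_size, max_model_len,
                             gpu_memory_utilization=0.90):
    """Return the argv list for a `vllm serve` launch (hypothetical helper)."""
    return [
        "vllm", "serve", model,
        "--tensor-parallel-size", str(tensor_parallel_size),
        "--max-model-len", str(max_model_len),
        "--gpu-memory-utilization", str(gpu_memory_utilization),
    ]


# Placeholder values: substitute the checkpoint path and the recipe's settings.
cmd = build_vllm_serve_command("path/to/gemma-awq-checkpoint",
                               tensor_parallel_size=2,
                               max_model_len=8192)
print(shlex.join(cmd))
```

Building the argument list in code rather than by hand makes it easier to keep one launch definition per recipe and to diff the two when tuning.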

Please help me install this Skill:
Name: vllm-gemma-4-31b
Download link: https://github.com/air-gapped/skills/archive/main.zip#vllm-gemma-4-31b
Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
