vllm-gemma-4-31b

Official

Gemma-4-31B on vLLM: operating points.

Author: air-gapped
Version: 1.0.0
Installs: 0

System Documentation

What problem does it solve?

Provides an operating-point reference for deploying Gemma-4-31B AWQ-4bit on vLLM, so platform engineers can balance throughput, latency, and memory constraints across short- and long-context workloads.

Core Features & Use Cases

  • Operating-point reference for Gemma-4-31B AWQ-4bit on vLLM covering short- and long-context workloads, with recommended max_model_len, KV-cache sizing, and batch-size considerations.
  • Decision guide for choosing between the LIGHT and PUSH tensor-parallel configurations, with workload-specific latency and throughput trade-offs and guidance on bottlenecks such as HBM-bandwidth saturation.
  • Reproducible benchmarks and their caveats, with pointers to the hbm-saturation and bench-numbers references for deployment and tuning decisions.
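To make the KV-cache sizing item above concrete, here is a minimal sketch of the standard per-token arithmetic for a transformer KV cache. The `AttentionShape` class and the numbers in the example are hypothetical placeholders, not the real Gemma-4-31B geometry or values taken from this skill. The sketch also assumes every layer uses full attention and that KV heads divide evenly across tensor-parallel ranks. Models that mix in sliding-window layers would need less memory than this estimate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AttentionShape:
    """Hypothetical attention geometry (placeholder, not real model values)."""
    num_layers: int
    num_kv_heads: int
    head_dim: int
    kv_dtype_bytes: int = 2  # 2 bytes per element for an fp16/bf16 KV cache


def kv_bytes_per_token(shape: AttentionShape) -> int:
    # Each layer stores one key vector and one value vector per KV head.
    return (2 * shape.num_layers * shape.num_kv_heads
            * shape.head_dim * shape.kv_dtype_bytes)


def max_cached_tokens(shape: AttentionShape,
                      kv_budget_bytes_per_gpu: int,
                      tensor_parallel_size: int = 1) -> int:
    # Assumption: KV heads are sharded evenly across tensor-parallel ranks.
    if shape.num_kv_heads % tensor_parallel_size != 0:
        raise ValueError("num_kv_heads must be divisible by tensor_parallel_size")
    per_gpu_bytes = kv_bytes_per_token(shape) // tensor_parallel_size
    return kv_budget_bytes_per_gpu // per_gpu_bytes


# Placeholder geometry purely for illustration.
example = AttentionShape(num_layers=48, num_kv_heads=8, head_dim=128)
print(kv_bytes_per_token(example))                 # bytes of KV cache per token
print(max_cached_tokens(example, 8 * 1024**3, 2))  # tokens that fit in 8 GiB per GPU
```

The token count this returns is shared by all concurrent sequences, so it bounds the product of batch size and context length rather than either one alone.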

Quick Start

Configure Gemma-4-31B AWQ-4bit on a multi-GPU vLLM deployment using the LIGHT or PUSH recipes provided in this guide.
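As a rough illustration of what applying an operating point might look like, the sketch below assembles a `vllm serve` command line from a small settings object. The `OperatingPoint` class, the model path, and all numeric values are hypothetical placeholders; they are not the LIGHT or PUSH recipes, which live in the skill's own references. Only the CLI flags themselves (`--tensor-parallel-size`, `--max-model-len`, `--max-num-seqs`, `--gpu-memory-utilization`) are existing vLLM options.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OperatingPoint:
    """Hypothetical container for one set of serving parameters."""
    tensor_parallel_size: int
    max_model_len: int
    max_num_seqs: int
    gpu_memory_utilization: float


def build_serve_command(model: str, point: OperatingPoint) -> list[str]:
    # Returns an argv list; nothing is executed here.
    return [
        "vllm", "serve", model,
        "--tensor-parallel-size", str(point.tensor_parallel_size),
        "--max-model-len", str(point.max_model_len),
        "--max-num-seqs", str(point.max_num_seqs),
        "--gpu-memory-utilization", str(point.gpu_memory_utilization),
    ]


# Placeholder values only; substitute the numbers from the chosen recipe.
placeholder_point = OperatingPoint(
    tensor_parallel_size=2,
    max_model_len=8192,
    max_num_seqs=32,
    gpu_memory_utilization=0.9,
)
command = build_serve_command("/models/gemma-4-31b-awq", placeholder_point)
print(" ".join(command))
```

Keeping the parameters in one object makes it easier to diff two operating points side by side before launching either of them.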

Dependency Matrix

Required Modules

None required

Components

references

💻 Claude Code Installation

Recommended: let Claude install it automatically. Copy the text below and paste it into Claude Code.

Please help me install this Skill:
Name: vllm-gemma-4-31b
Download link: https://github.com/air-gapped/skills/archive/main.zip#vllm-gemma-4-31b

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
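For a manual install, the download-extract-copy steps above could be scripted roughly as follows. This is a sketch under assumptions: the helper name `extract_skill` is hypothetical, and it assumes the skill lives in a folder named after the skill somewhere inside the repository archive, which the listing does not confirm. The function works on archive bytes that have already been downloaded, so it can be tried without network access.

```python
import io
import zipfile
from pathlib import Path, PurePosixPath


def extract_skill(zip_bytes: bytes, skill_name: str, skills_dir: Path) -> Path:
    """Copy the folder named `skill_name` out of a repo archive into skills_dir."""
    target = skills_dir / skill_name
    extracted = 0
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for info in archive.infolist():
            parts = PurePosixPath(info.filename).parts
            if info.is_dir() or skill_name not in parts:
                continue
            # Keep only the path components below the skill folder.
            relative = parts[parts.index(skill_name) + 1:]
            if not relative or ".." in relative:
                continue  # skip empty or unsafe paths
            out_path = target.joinpath(*relative)
            out_path.parent.mkdir(parents=True, exist_ok=True)
            out_path.write_bytes(archive.read(info))
            extracted += 1
    if extracted == 0:
        raise FileNotFoundError(f"no files found under a '{skill_name}' folder")
    return target
```

The archive bytes could come from `urllib.request.urlopen(url).read()` using the download link above, with `skills_dir` set to `Path(".claude/skills")`.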