gke-inference-gateway
Community
Master GKE Inference Gateway multi-model routing.
Author: Riku-KANO
Version: 1.0.0
Installs: 0
System Documentation
What problem does it solve?
This skill consolidates expert knowledge on GKE Inference Gateway: the correct API groups (such as inference.networking.k8s.io), the role of Body-Based Routing (BBR), per-pool Endpoint Pickers (EPP), and safe migration paths from the deprecated InferenceModel resource to its modern replacements.
Core Features & Use Cases
- API group guidance: explains the two API groups (stable v1 and alpha v1alpha2) and when to use each.
- BBR & HTTPRoute-based routing: shows how body→pool dispatch is done by BBR and how to map models to HTTPRoute header rules.
- Operational patterns: how to deploy multi-model stacks, per-pool EPP, health checks, timeouts, and troubleshooting steps.
- Worked example: uses request sequences from a real Gemma + vLLM deployment to illustrate the end-to-end flow.
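The body-to-pool dispatch described above can be sketched as a small simulation. This is only an illustration of the idea, not the gateway's implementation: it assumes BBR copies the `model` field of the JSON request body into a header (taken here to be `X-Gateway-Model-Name`), and that HTTPRoute rules then do an exact match on that header. The model and pool names are hypothetical.

```python
import json

# Assumption: BBR surfaces the body's "model" field under this header name.
MODEL_HEADER = "X-Gateway-Model-Name"

# Hypothetical table standing in for exact-match HTTPRoute header rules.
ROUTE_RULES = {
    "model-a": "pool-a",
    "model-b": "pool-b",
}


def bbr_extract_model(body: bytes) -> dict:
    """Mimic BBR: parse the JSON body and expose the model name as a header."""
    payload = json.loads(body)
    model = payload.get("model") if isinstance(payload, dict) else None
    return {MODEL_HEADER: model} if isinstance(model, str) else {}


def match_pool(headers: dict, rules: dict = ROUTE_RULES):
    """Mimic an exact-match header rule; None means no rule matched."""
    return rules.get(headers.get(MODEL_HEADER))


def dispatch(body: bytes):
    """Full flow: body -> header (BBR) -> pool (HTTPRoute rule)."""
    return match_pool(bbr_extract_model(body))


if __name__ == "__main__":
    print(dispatch(b'{"model": "model-a", "prompt": "hello"}'))
    print(dispatch(b'{"model": "not-configured"}'))
```

A request whose body names an unconfigured model, or no model at all, matches no rule in this sketch. In a real deployment that case typically surfaces as an unrouted request, which makes it a useful first check when troubleshooting.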
Quick Start
Follow this skill to configure or debug GKE Inference Gateway using BBR, HTTPRoute header-based routing, and per-pool EPP.
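As a starting point for the header-based routing mentioned above, the sketch below builds an HTTPRoute manifest as a Python dictionary, with one rule per model that points at an InferencePool backend. The route name, gateway name, model names, pool names, and the `X-Gateway-Model-Name` header are all assumptions chosen for illustration. Check the output against the API versions installed in your cluster before applying it.

```python
import json


def build_httproute(name, gateway, model_to_pool, header="X-Gateway-Model-Name"):
    """Return an HTTPRoute manifest with one exact-match header rule per model."""
    rules = [
        {
            "matches": [
                {"headers": [{"type": "Exact", "name": header, "value": model}]}
            ],
            "backendRefs": [
                {
                    "group": "inference.networking.k8s.io",
                    "kind": "InferencePool",
                    "name": pool,
                }
            ],
        }
        for model, pool in model_to_pool.items()
    ]
    return {
        "apiVersion": "gateway.networking.k8s.io/v1",
        "kind": "HTTPRoute",
        "metadata": {"name": name},
        "spec": {"parentRefs": [{"name": gateway}], "rules": rules},
    }


# Hypothetical names throughout; replace with the ones from your deployment.
route = build_httproute(
    "llm-route",
    "inference-gateway",
    {"model-a": "pool-a", "model-b": "pool-b"},
)
print(json.dumps(route, indent=2))
```

Generating the rules from a single model-to-pool mapping keeps the header values and backend names in one place, so adding a model means adding one entry rather than hand-editing a rule.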
Dependency Matrix
Required Modules
None required
Components
references
💻 Claude Code Installation
Recommended: Let Claude install automatically. Simply copy and paste the text below to Claude Code.
Please help me install this Skill:
Name: gke-inference-gateway
Download link: https://github.com/Riku-KANO/gemma4-gke-demo/archive/main.zip#gke-inference-gateway
Please download this .zip file, extract it, and install it in the .claude/skills/ directory.