gke-inference-gateway

Community

Master GKE Inference Gateway multi-model routing.

Author: Riku-KANO
Version: 1.0.0
Installs: 0

System Documentation

What problem does it solve?

This skill consolidates expert knowledge on GKE Inference Gateway: the correct API groups (such as inference.networking.k8s.io), the role of Body-Based Routing (BBR), per-pool Endpoint Pickers (EPP), and safe migration paths from the deprecated InferenceModel resource to its current replacements.

Core Features & Use Cases

  • API group guidance: explains the two API groups and their versions (stable v1 and alpha v1alpha2) and when to use each.
  • BBR & HTTPRoute-based routing: shows how BBR turns the model named in the request body into a header, and how HTTPRoute header rules map each model to its pool.
  • Operational patterns: how to deploy multi-model stacks and per-pool EPPs, configure health checks and timeouts, and troubleshoot routing problems.
  • Worked examples: uses request sequences from a real Gemma + vLLM deployment to illustrate the end-to-end flow.
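
The body-to-pool dispatch described above can be pictured with a small simulation. This is only a sketch of the idea, not the gateway's implementation: it assumes BBR copies the `model` field of an OpenAI-style JSON body into a header (named `X-Gateway-Model-Name` here, which should be verified against your installed version), and it uses a plain dictionary as a stand-in for HTTPRoute header-match rules. The model and pool names are hypothetical.

```python
import json
from typing import Dict, Optional

# Assumption: header that BBR populates from the request body.
MODEL_HEADER = "X-Gateway-Model-Name"

# Hypothetical stand-in for HTTPRoute header-match rules:
# exact header value -> name of the backing InferencePool.
ROUTES: Dict[str, str] = {
    "gemma-small": "gemma-small-pool",
    "gemma-large": "gemma-large-pool",
}


def extract_model_header(body: bytes) -> Dict[str, str]:
    """Simulate the BBR step: lift the body's `model` field into a header."""
    try:
        payload = json.loads(body)
    except ValueError:
        # Malformed JSON or undecodable bytes: no header is added.
        return {}
    if not isinstance(payload, dict):
        return {}
    model = payload.get("model")
    if isinstance(model, str) and model:
        return {MODEL_HEADER: model}
    return {}


def select_pool(headers: Dict[str, str]) -> Optional[str]:
    """Simulate the HTTPRoute step: exact match on the model header."""
    return ROUTES.get(headers.get(MODEL_HEADER, ""))


if __name__ == "__main__":
    request_body = json.dumps({"model": "gemma-small", "prompt": "Hello"}).encode()
    headers = extract_model_header(request_body)
    print(headers, "->", select_pool(headers))
```

A request whose body names no model, or names a model without a matching rule, yields no pool in this sketch. In a real deployment that case is handled by whatever default or fallback rule the HTTPRoute defines.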

Quick Start

Follow this skill to configure or debug GKE Inference Gateway using BBR, HTTPRoute header-based routing, and per-pool EPP.

Dependency Matrix

Required Modules

None required

Components

references

💻 Claude Code Installation

Recommended: let Claude install it automatically. Copy the text below and paste it into Claude Code.

Please help me install this Skill:
Name: gke-inference-gateway
Download link: https://github.com/Riku-KANO/gemma4-gke-demo/archive/main.zip#gke-inference-gateway

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
