multi-llm-routing-pattern
Community
Route models across one gateway with fallback.
Software Engineering
#fastapi, #llm routing, #canary deployment, #api gateway, #fallback, #vllm, #weighted load balancing
Author: saintgo7
Version: 1.0.0
Installs: 0
System Documentation
What problem does it solve?
This Skill solves the operational problem of routing requests to multiple LLM backends (vLLM/OpenAI/TGI) through a single OpenAI-compatible gateway without client changes or fragile ad-hoc logic.
Core Features & Use Cases
- Five routing modes: static model-to-upstream mapping, weighted load balancing, fallback chain, user-plan-based routing, and hash-sticky A/B canary.
- Production-grade fallback behavior: retries only on 5xx and network/timeout errors while returning 4xx immediately to the client.
- Operational correctness checks: ensures consistency between upstream_map and the gateway’s served model list to prevent “/v1/models lies” UI mismatches.
- Use case examples: send free users to a smaller model, run canary traffic for a new model safely, and fail over to a warm spare backend when one instance degrades.
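The selection logic behind three of these features can be sketched in a few lines of Python. This is a minimal illustration, not the Skill's actual implementation: the `UPSTREAM_MAP` contents, the upstream URLs, and the function names `pick_weighted`, `pick_canary`, and `check_model_consistency` are all assumptions made for the example.

```python
from __future__ import annotations

import hashlib
import random

# Hypothetical model -> upstream base URL mapping (names and URLs are illustrative).
UPSTREAM_MAP = {
    "qwen2.5-coder-32b": "http://vllm-a:8000/v1",
    "qwen3-coder-30b": "http://vllm-b:8000/v1",
}


def pick_weighted(weights: dict[str, float], rng: random.Random) -> str:
    """Weighted load balancing: choose an upstream in proportion to its weight."""
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]


def canary_bucket(user_id: str, buckets: int = 100) -> int:
    """Hash a user ID into a stable bucket in [0, buckets)."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % buckets


def pick_canary(user_id: str, stable: str, canary: str, canary_percent: int) -> str:
    """Hash-sticky canary: the same user always lands on the same model."""
    return canary if canary_bucket(user_id) < canary_percent else stable


def check_model_consistency(
    upstream_map: dict[str, str], served_models: list[str]
) -> tuple[list[str], list[str]]:
    """Compare routable models with the models the gateway advertises.

    Returns (advertised_but_unrouted, routed_but_unadvertised).
    """
    mapped, served = set(upstream_map), set(served_models)
    return sorted(served - mapped), sorted(mapped - served)
```

Hashing the user ID rather than drawing a random number per request is what makes the canary sticky: a given user stays on one model for the whole rollout, so side-by-side comparisons are not blurred by users bouncing between variants. The consistency check would typically run at startup, failing fast if either returned list is non-empty.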
Quick Start
Use the multi-llm-routing-pattern skill to add static model routing for qwen2.5-coder-32b and qwen3-coder-30b behind a single OpenAI-compatible endpoint, then enable fallback for 5xx/timeout only.
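As a rough picture of what that request asks for, the sketch below combines static model routing with a fallback chain that retries only on 5xx responses and network or timeout errors. It is an assumption-laden illustration rather than the code the Skill generates: the `ROUTES` table, the upstream URLs, the `route_request` function, and the injected `send` transport are all hypothetical.

```python
from __future__ import annotations

from typing import Callable

# Hypothetical static routes: model name -> ordered upstream chain (primary first).
ROUTES = {
    "qwen2.5-coder-32b": ["http://vllm-a:8000/v1", "http://vllm-spare:8000/v1"],
    "qwen3-coder-30b": ["http://vllm-b:8000/v1", "http://vllm-spare:8000/v1"],
}


class AllUpstreamsFailed(Exception):
    """Every upstream in the chain failed with a retryable error."""


def route_request(
    model: str, send: Callable[[str], tuple[int, str]]
) -> tuple[int, str]:
    """Resolve the model's chain and try each upstream in order.

    `send(base_url)` is a caller-supplied transport that returns (status, body)
    and raises TimeoutError or ConnectionError on network failures.
    """
    chain = ROUTES.get(model)
    if chain is None:
        return 404, f"unknown model: {model}"
    failures = []
    for base_url in chain:
        try:
            status, body = send(base_url)
        except (TimeoutError, ConnectionError) as exc:
            failures.append(f"{base_url}: {type(exc).__name__}")
            continue  # network error or timeout: try the next upstream
        if status >= 500:
            failures.append(f"{base_url}: HTTP {status}")
            continue  # 5xx: try the next upstream
        return status, body  # 2xx-4xx: return to the client without retrying
    raise AllUpstreamsFailed("; ".join(failures))
```

Returning 4xx responses immediately matters because a malformed or unauthorized request will fail the same way on every backend, so retrying it only adds latency and load. In a real gateway, `send` would wrap an HTTP client call with a timeout.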
Dependency Matrix
Required Modules
None required
Components
references
💻 Claude Code Installation
Recommended: Let Claude install automatically. Simply copy and paste the text below to Claude Code.
Please help me install this Skill:
Name: multi-llm-routing-pattern
Download link: https://github.com/saintgo7/claude-skills/archive/main.zip#multi-llm-routing-pattern
Please download this .zip file, extract it, and install it in the .claude/skills/ directory.