multi-llm-routing-pattern

Community

Route requests to multiple models through one gateway, with fallback.

Author: saintgo7
Version: 1.0.0
Installs: 0

System Documentation

What problem does it solve?

This Skill routes requests to multiple LLM backends (vLLM, OpenAI, TGI) through a single OpenAI-compatible gateway, so clients need no changes and routing does not depend on fragile ad-hoc logic.

Core Features & Use Cases

  • Five routing modes: static model-to-upstream mapping, weighted load balancing, fallback chain, user-plan-based routing, and hash-sticky A/B canary.
  • Production-grade fallback behavior: retries only on 5xx and network/timeout errors while returning 4xx immediately to the client.
  • Operational correctness checks: keeps upstream_map consistent with the model list the gateway serves, preventing “/v1/models lies” mismatches where a UI shows models the gateway cannot actually route.
  • Use case examples: send free users to a smaller model, run canary traffic for a new model safely, and fail over to a warm spare backend when one instance degrades.
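The fallback rule and the hash-sticky canary split described above can be sketched in a few lines of Python. This is a minimal illustration, not the Skill's actual implementation: the function names (should_fall_back, call_with_fallback, canary_bucket), the (status, body) response shape, and the use of None to signal a network error or timeout are all assumptions made for the example.

```python
import hashlib
from typing import Callable, Optional, Sequence, Tuple

# Hypothetical response shape: (status_code, body). A status of None means
# a network error or timeout occurred before any HTTP status arrived.
Response = Tuple[Optional[int], str]


def should_fall_back(status: Optional[int]) -> bool:
    """Fall back on network/timeout errors (None) and 5xx; never on 4xx."""
    return status is None or 500 <= status <= 599


def call_with_fallback(
    upstreams: Sequence[str],
    send: Callable[[str], Response],
) -> Tuple[str, Response]:
    """Try each upstream in order; stop at the first non-retryable result."""
    if not upstreams:
        raise ValueError("fallback chain is empty")
    last: Tuple[str, Response] = (upstreams[0], (None, ""))
    for name in upstreams:
        resp = send(name)
        last = (name, resp)
        if not should_fall_back(resp[0]):
            return last  # success or 4xx: return to the client immediately
    return last  # every upstream failed; surface the final error


def canary_bucket(user_id: str, canary_percent: int) -> str:
    """Hash-sticky split: the same user always lands in the same arm."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    slot = int.from_bytes(digest[:8], "big") % 100
    return "canary" if slot < canary_percent else "stable"
```

Hashing the user ID rather than drawing a random number keeps each user pinned to one arm across requests, which is what makes an A/B comparison meaningful. Treating 4xx as final avoids replaying a request that would fail identically on every backend.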

Quick Start

Use the multi-llm-routing-pattern skill to add static model routing for qwen2.5-coder-32b and qwen3-coder-30b behind a single OpenAI-compatible endpoint, then enable fallback for 5xx/timeout only.
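As a rough picture of what that prompt asks for, the sketch below shows a static model-to-upstream map for the two models and a check that the map agrees with the model list the gateway serves. The upstream URLs, the UPSTREAM_MAP name, and both helper functions are hypothetical placeholders, not output of the Skill itself.

```python
from typing import Dict, Iterable, List

# Hypothetical static mapping; the upstream URLs are placeholders.
UPSTREAM_MAP: Dict[str, str] = {
    "qwen2.5-coder-32b": "http://vllm-a.internal:8000/v1",
    "qwen3-coder-30b": "http://vllm-b.internal:8000/v1",
}


def resolve_upstream(model: str, upstream_map: Dict[str, str]) -> str:
    """Static routing: look up the upstream for the requested model name."""
    try:
        return upstream_map[model]
    except KeyError:
        raise ValueError(f"unknown model: {model!r}") from None


def check_model_consistency(
    upstream_map: Dict[str, str],
    served_models: Iterable[str],
) -> Dict[str, List[str]]:
    """Report models that are served but unrouted, and routed but unserved."""
    routed = set(upstream_map)
    served = set(served_models)
    return {
        "served_but_unrouted": sorted(served - routed),
        "routed_but_unserved": sorted(routed - served),
    }
```

Running a check like this at startup, with served_models taken from whatever the gateway returns for /v1/models, would surface a mismatch before a client picks a model that has no route.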

Dependency Matrix

Required Modules

None required

Components

references

💻 Claude Code Installation

Recommended: Let Claude install automatically. Copy the text below and paste it into Claude Code.

Please help me install this Skill:
Name: multi-llm-routing-pattern
Download link: https://github.com/saintgo7/claude-skills/archive/main.zip#multi-llm-routing-pattern

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.