multi-llm-routing-pattern

Name: multi-llm-routing-pattern
Author: saintgo7

Community

Route requests across multiple LLM backends behind one gateway, with fallback.

Software Engineering #fastapi #llm routing #canary deployment #api gateway #fallback #vllm #weighted load balancing

Version: 1.0.0

Installs: 0

System Documentation

What problem does it solve?

This Skill solves the operational problem of routing requests to multiple LLM backends (vLLM/OpenAI/TGI) through a single OpenAI-compatible gateway without client changes or fragile ad-hoc logic.

Core Features & Use Cases

Five routing modes: static model-to-upstream mapping, weighted load balancing, fallback chain, user-plan-based routing, and hash-sticky A/B canary.
Production-grade fallback behavior: retries only on 5xx and network/timeout errors while returning 4xx immediately to the client.
Operational correctness checks: keeps upstream_map consistent with the gateway’s served model list, so /v1/models never advertises a model the gateway cannot route (the “/v1/models lies” mismatch seen in client UIs).
Use case examples: send free users to a smaller model, run canary traffic for a new model safely, and fail over to a warm spare backend when one instance degrades.
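
The fallback rule and two of the routing modes above can be sketched in a few lines of plain Python. This is a minimal illustration, not the Skill's actual code: UpstreamError, is_retryable, call_with_fallback, canary_bucket, and pick_weighted are hypothetical names, and the send callable stands in for whatever HTTP client the gateway uses.

```python
import hashlib
import random


class UpstreamError(Exception):
    """Hypothetical error type; status=None means a network failure or timeout."""

    def __init__(self, status=None):
        super().__init__(f"upstream error (status={status})")
        self.status = status


def is_retryable(status):
    """Retry only on network/timeout (None) and 5xx; 4xx goes straight back to the client."""
    return status is None or 500 <= status <= 599


def call_with_fallback(chain, send):
    """Try each upstream in order; send(upstream) returns a response or raises UpstreamError."""
    last_error = None
    for upstream in chain:
        try:
            return send(upstream)
        except UpstreamError as err:
            if not is_retryable(err.status):
                raise  # 4xx: the request itself is at fault, another backend will not help
            last_error = err
    if last_error is None:
        raise ValueError("fallback chain is empty")
    raise last_error


def canary_bucket(user_id, canary_percent):
    """Hash-sticky A/B: the same user_id always lands in the same bucket."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    slot = int.from_bytes(digest[:8], "big") % 100
    return "canary" if slot < canary_percent else "stable"


def pick_weighted(weights, rng=random):
    """Weighted load balancing: choose an upstream name in proportion to its weight."""
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]
```

Hashing the user ID rather than drawing a random number per request is what makes the canary sticky: a given user sees the same model for the whole rollout, which keeps A/B comparisons clean.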

Quick Start

Use the multi-llm-routing-pattern skill to add static model routing for qwen2.5-coder-32b and qwen3-coder-30b behind a single OpenAI-compatible endpoint, then enable fallback for 5xx/timeout only.
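
For orientation, the static routing requested in that prompt might reduce to something like the sketch below. The upstream URLs are placeholders, and UPSTREAM_MAP, SERVED_MODELS, resolve_upstream, and check_consistency are assumed names for illustration; the Skill's generated code may be organized differently.

```python
# Hypothetical static map: model name -> upstream base URL (URLs are placeholders).
UPSTREAM_MAP = {
    "qwen2.5-coder-32b": "http://vllm-a:8000/v1",
    "qwen3-coder-30b": "http://vllm-b:8000/v1",
}

# Models the gateway would advertise on /v1/models (assumed to be kept as a plain list).
SERVED_MODELS = ["qwen2.5-coder-32b", "qwen3-coder-30b"]


def resolve_upstream(model, upstream_map=UPSTREAM_MAP):
    """Static routing: look up the upstream for the requested model."""
    try:
        return upstream_map[model]
    except KeyError:
        raise LookupError(f"unknown model: {model}") from None


def check_consistency(upstream_map, served_models):
    """Return (advertised but unroutable, routable but unadvertised) model names."""
    served = set(served_models)
    routed = set(upstream_map)
    return sorted(served - routed), sorted(routed - served)
```

Running check_consistency at startup and refusing to boot when either list is non-empty is one way to catch a stale model list before clients see it.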
