sglang-hicache
Official
Scale KV caches beyond GPU memory.
Author: air-gapped
Version: 1.0.0
Installs: 0
System Documentation
What problem does it solve?
SGLang HiCache adds hierarchical KV caching: each rank's GPU KV cache (L1) is extended with host RAM (L2) and optional external storage (L3), unlocking larger models and longer context than GPU memory alone allows.
Core Features & Use Cases
- Three-tier KV cache (L1/L2/L3) with per-rank sizing, eviction policies, and configurable prefetch.
- Supports multiple L3 backends (mooncake, hf3fs, nixl, aibrix, eic, simm, file) and runtime attach/detach for swapping backends.
- Ideal for production workloads with long-context agents, multi-tenant inference, and hybrid-model deployments.
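To make the tiering idea concrete, here is a toy sketch in Python. It is not HiCache's implementation: the class name `TieredCache`, the plain LRU policy, and the choice to demote evicted entries to the next tier are all assumptions made for illustration, and real write and eviction policies may behave differently.

```python
from collections import OrderedDict


class TieredCache:
    """Toy three-tier cache (hypothetical, for illustration only).

    L1 stands in for GPU memory, L2 for host RAM, L3 for unbounded storage.
    """

    def __init__(self, l1_capacity, l2_capacity):
        self.l1 = OrderedDict()  # oldest entry first
        self.l2 = OrderedDict()
        self.l3 = {}
        self.l1_capacity = l1_capacity
        self.l2_capacity = l2_capacity

    def put(self, key, value):
        # New and promoted entries always land in L1 as most recently used.
        self.l1[key] = value
        self.l1.move_to_end(key)
        self._spill()

    def _spill(self):
        # Demote least recently used entries when a tier overflows.
        while len(self.l1) > self.l1_capacity:
            k, v = self.l1.popitem(last=False)
            self.l2[k] = v
            self.l2.move_to_end(k)
        while len(self.l2) > self.l2_capacity:
            k, v = self.l2.popitem(last=False)
            self.l3[k] = v

    def get(self, key):
        # Search the fastest tier first; promote any hit back to L1.
        for name, tier in (("L1", self.l1), ("L2", self.l2), ("L3", self.l3)):
            if key in tier:
                value = tier.pop(key)
                self.put(key, value)
                return name, value
        return None, None


cache = TieredCache(l1_capacity=1, l2_capacity=1)
for k in ("a", "b", "c"):
    cache.put(k, k.upper())
first_hit = cache.get("a")   # "a" was demoted twice, so it is found in L3
second_hit = cache.get("a")  # the first lookup promoted it back to L1
```

The point of the sketch is only the lookup order and the promotion on hit; sizing, eviction, and prefetch are the parts HiCache makes configurable.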
Quick Start
Start an SGLang server with hierarchical cache enabled, pick an L3 backend (e.g., Mooncake), set per-rank cache sizes, and choose a production-safe prefetch policy.
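A launch command along these lines might be assembled as follows. This is a sketch under assumptions: the helper `build_hicache_command` is hypothetical, the model path, the 64 GB host cache size, and the `timeout` prefetch policy are placeholder values rather than recommendations from this page, and the flag names should be checked against `python3 -m sglang.launch_server --help` for the installed SGLang version.

```python
import shlex


def build_hicache_command(model_path, backend, host_cache_gb, prefetch_policy):
    """Build an argv list for launching SGLang with hierarchical cache.

    Hypothetical helper; flag names are assumed from SGLang's server
    arguments and may differ between releases.
    """
    return [
        "python3", "-m", "sglang.launch_server",
        "--model-path", model_path,
        "--enable-hierarchical-cache",
        # Host (L2) cache size per rank, in GB (assumed unit).
        "--hicache-size", str(host_cache_gb),
        # L3 storage backend, e.g. "mooncake" or "file".
        "--hicache-storage-backend", backend,
        # How prefetch from L3 interacts with request scheduling.
        "--hicache-storage-prefetch-policy", prefetch_policy,
    ]


# Placeholder values; substitute your own model, size, and policy.
cmd = build_hicache_command("/models/example-model", "mooncake", 64, "timeout")
print(shlex.join(cmd))
```

Building the command as a list keeps quoting explicit, so the same list can be passed to `subprocess.run(cmd)` once the values have been reviewed.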
Dependency Matrix
Required Modules
curl, python3
Components
scripts, references
Claude Code Installation
Recommended: let Claude install the skill automatically. Copy the text below and paste it into Claude Code.
Please help me install this Skill:
Name: sglang-hicache
Download link: https://github.com/air-gapped/skills/archive/main.zip#sglang-hicache
Please download this .zip file, extract it, and install it in the .claude/skills/ directory.