sglang-hicache

Official

Scale KV caches beyond GPU memory.

Author: air-gapped
Version: 1.0.0
Installs: 0

System Documentation

What problem does it solve?

SGLang HiCache adds hierarchical KV caching: each rank's GPU memory (L1) is extended with host RAM (L2) and optional storage (L3), unlocking larger models and longer context.

Core Features & Use Cases

  • Three-tier KV cache (L1/L2/L3) with per-rank sizing, eviction policies, and configurable prefetch.
  • Supports multiple L3 backends (mooncake, hf3fs, nixl, aibrix, eic, simm, file) and runtime attach/detach for swapping backends.
  • Ideal for production workloads with long-context agents, multi-tenant inference, and hybrid-model deployments.
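
The three-tier idea can be sketched as a toy cache. This is a hypothetical illustration only: the class `TieredKVCache`, its LRU eviction rule, and the promote-on-hit behaviour are assumptions made for the example and do not mirror SGLang's internals.

```python
from collections import OrderedDict


class TieredKVCache:
    """Toy three-tier cache: L1 (GPU), L2 (host RAM), L3 (storage).

    Hypothetical sketch; real HiCache tiers hold KV tensors, not Python objects.
    """

    def __init__(self, l1_capacity, l2_capacity):
        self.l1 = OrderedDict()  # smallest, fastest tier, kept in LRU order
        self.l2 = OrderedDict()  # larger, slower tier, kept in LRU order
        self.l3 = {}             # stand-in for an unbounded storage backend
        self.l1_capacity = l1_capacity
        self.l2_capacity = l2_capacity

    def put(self, key, value):
        # Remove stale copies so each key lives in exactly one tier.
        self.l2.pop(key, None)
        self.l3.pop(key, None)
        self.l1[key] = value
        self.l1.move_to_end(key)
        self._evict()

    def get(self, key):
        """Return (value, tier_name); hits in L2/L3 are promoted back to L1."""
        if key in self.l1:
            self.l1.move_to_end(key)
            return self.l1[key], "L1"
        if key in self.l2:
            value = self.l2.pop(key)
            self.put(key, value)
            return value, "L2"
        if key in self.l3:
            value = self.l3.pop(key)
            self.put(key, value)
            return value, "L3"
        return None, "miss"

    def _evict(self):
        # Demote least-recently-used entries one tier down when a tier overflows.
        while len(self.l1) > self.l1_capacity:
            key, value = self.l1.popitem(last=False)
            self.l2[key] = value
        while len(self.l2) > self.l2_capacity:
            key, value = self.l2.popitem(last=False)
            self.l3[key] = value


cache = TieredKVCache(l1_capacity=1, l2_capacity=1)
for name in ("a", "b", "c"):
    cache.put(name, f"kv-{name}")
```

With capacities of one entry each, the oldest key ends up in L3 and the newest in L1; reading the oldest key pulls it back to L1 and pushes the others down.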

Quick Start

Start an SGLang server with hierarchical cache enabled and pick a backend (e.g., Mooncake) using per-rank sizing and a production-safe prefetch policy.
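
A minimal sketch of what such a launch command might look like, built as an argument list in Python. The flag names reflect my reading of the SGLang server arguments and should be checked against `python3 -m sglang.launch_server --help` for the installed version. The model path, the 64 GB host cache size, and the choice of `timeout` as the "production-safe" prefetch policy are assumptions, and `build_launch_command` is a hypothetical helper.

```python
import shlex


def build_launch_command(model_path, backend="mooncake",
                         host_cache_gb=64, prefetch_policy="timeout"):
    """Assemble an SGLang launch command with HiCache options.

    Flag names are assumptions to verify against the installed SGLang version.
    """
    return [
        "python3", "-m", "sglang.launch_server",
        "--model-path", model_path,                 # placeholder path
        "--enable-hierarchical-cache",              # turn on the L2 host tier
        "--hicache-size", str(host_cache_gb),       # host cache size in GB (assumed per rank)
        "--hicache-storage-backend", backend,       # L3 backend
        "--hicache-storage-prefetch-policy", prefetch_policy,
    ]


cmd = build_launch_command("path/to/model")
print(shlex.join(cmd))  # printable form for copy/paste; nothing is executed here
```

The sketch only prints the command. Passing `cmd` to `subprocess.run` would start the server once the flags have been verified.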

Dependency Matrix

Required Modules

curl, python3

Components

scripts, references

💻 Claude Code Installation

Recommended: Let Claude install automatically. Simply copy and paste the text below to Claude Code.

Please help me install this Skill:
Name: sglang-hicache
Download link: https://github.com/air-gapped/skills/archive/main.zip#sglang-hicache

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
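
For a manual install, the extraction step could look roughly like the sketch below. It assumes the archive has a single top-level folder containing one subfolder per skill (so the skill sits at `<archive-root>/sglang-hicache/`); that layout is a guess, not something this page confirms. `extract_skill` is a hypothetical helper name.

```python
import io
import zipfile
from pathlib import Path, PurePosixPath


def extract_skill(zip_bytes, skill_name, skills_root):
    """Copy <archive-root>/<skill_name>/... into skills_root/<skill_name>/.

    Returns the list of files written. The archive layout is an assumption.
    """
    dest = Path(skills_root) / skill_name
    written = []
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for info in archive.infolist():
            parts = PurePosixPath(info.filename).parts
            # Keep only regular files under <archive-root>/<skill_name>/.
            if info.is_dir() or len(parts) < 3 or parts[1] != skill_name:
                continue
            if ".." in parts:
                continue  # skip entries that try to escape the destination
            target = dest.joinpath(*parts[2:])
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_bytes(archive.read(info))
            written.append(target)
    return written
```

The `zip_bytes` argument would be the downloaded archive contents, and `skills_root` would be the `.claude/skills/` directory named above.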
