Name: sglang-hicache
Author: air-gapped

System Documentation

What problem does it solve?

SGLang HiCache adds hierarchical KV caching: each rank's GPU KV cache (L1) is extended with host RAM (L2) and, optionally, a storage tier (L3). The extra cache capacity leaves room for larger models and longer contexts.

Core Features & Use Cases

Three-tier KV cache (L1/L2/L3) with per-rank sizing, eviction policies, and configurable prefetch.
Supports multiple L3 backends (mooncake, hf3fs, nixl, aibrix, eic, simm, file) and runtime attach/detach for swapping backends.
Ideal for production workloads with long-context agents, multi-tenant inference, and hybrid-model deployments.
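
To make the three-tier idea concrete, here is a toy sketch of an L1/L2/L3 cache with LRU eviction and promotion on hit. It is not SGLang's implementation (HiCache manages KV pages by prefix, not plain keys); the class name TieredKVCache, its capacities, and the unbounded L3 are all assumptions made for illustration.

```python
from collections import OrderedDict


class TieredKVCache:
    """Toy three-tier cache: L1 (GPU), L2 (host RAM), L3 (storage).

    Hypothetical illustration only; capacities are entry counts.
    """

    def __init__(self, l1_capacity, l2_capacity):
        self.l1 = OrderedDict()  # most recently used entries live at the end
        self.l2 = OrderedDict()
        self.l3 = {}             # unbounded in this toy model
        self.l1_capacity = l1_capacity
        self.l2_capacity = l2_capacity

    def put(self, key, value):
        # Keep a key in exactly one tier: drop stale copies from lower tiers.
        self.l2.pop(key, None)
        self.l3.pop(key, None)
        self.l1[key] = value
        self.l1.move_to_end(key)
        self._evict()

    def _evict(self):
        # Demote least recently used entries: L1 -> L2, then L2 -> L3.
        while len(self.l1) > self.l1_capacity:
            key, value = self.l1.popitem(last=False)
            self.l2[key] = value
        while len(self.l2) > self.l2_capacity:
            key, value = self.l2.popitem(last=False)
            self.l3[key] = value

    def get(self, key):
        """Return (value, tier_name); promote lower-tier hits back to L1."""
        if key in self.l1:
            self.l1.move_to_end(key)
            return self.l1[key], "L1"
        for name, tier in (("L2", self.l2), ("L3", self.l3)):
            if key in tier:
                value = tier.pop(key)
                self.put(key, value)
                return value, name
        return None, "miss"
```

In this model a hit in L2 or L3 costs a promotion, which may in turn demote a colder entry. A real deployment uses prefetch to hide that same cost.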

Quick Start

Start an SGLang server with hierarchical cache enabled and pick a backend (e.g., Mooncake) using per-rank sizing and a production-safe prefetch policy.
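
As a rough sketch of what such a launch might look like, the helper below assembles a command line for sglang.launch_server. The flag names (--enable-hierarchical-cache, --hicache-size, --hicache-storage-backend, --hicache-storage-prefetch-policy) reflect my understanding of SGLang's server arguments and should be checked against `python -m sglang.launch_server --help` for your version. The helper name, the model path, the 64 GB size, and the choice of "timeout" as the prefetch policy are assumptions, not recommendations taken from this skill.

```python
import shlex

# Backend names as listed in this document's feature summary.
KNOWN_L3_BACKENDS = {"mooncake", "hf3fs", "nixl", "aibrix", "eic", "simm", "file"}


def build_hicache_launch_command(model_path, hicache_size_gb,
                                 storage_backend="mooncake",
                                 prefetch_policy="timeout"):
    """Build an argv list for an SGLang server with HiCache enabled.

    Hypothetical helper: flag names are assumed from SGLang's server
    arguments; verify them with --help before relying on this.
    """
    if storage_backend not in KNOWN_L3_BACKENDS:
        raise ValueError(f"unknown L3 backend: {storage_backend}")
    return [
        "python", "-m", "sglang.launch_server",
        "--model-path", model_path,
        "--enable-hierarchical-cache",
        # Assumed to be the host (L2) pool size in GB, applied per rank.
        "--hicache-size", str(hicache_size_gb),
        "--hicache-storage-backend", storage_backend,
        "--hicache-storage-prefetch-policy", prefetch_policy,
    ]


if __name__ == "__main__":
    # "your-org/your-model" is a placeholder, not a real model path.
    cmd = build_hicache_launch_command("your-org/your-model", 64)
    print(shlex.join(cmd))
```

Building the argument list in code rather than as a shell string keeps the per-rank size and backend choice easy to template per deployment, and the list can be passed to subprocess.run unchanged.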

To install with Claude Code, give it the following prompt:

Please help me install this Skill.
Name: sglang-hicache
Download link: https://github.com/air-gapped/skills/archive/main.zip#sglang-hicache
Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
