llama-slot-pinning
Community
Pin prompt slots for persistent caches
Author: crycriM
Version: 1.0.0
Installs: 0
System Documentation
What problem does it solve?
This skill provides a reproducible setup for running llama-server with slot pinning, so that prompt KV caches persist across restarts. A restored slot does not have to reprocess its cached prompt, which keeps performance stable in multi-model deployments.
Core Features & Use Cases
- Enables per-model slot isolation by allocating dedicated slots and independent save paths to prevent cross-model cache contamination.
- Supports saving, restoring, and erasing slot states to preserve prompt context across restarts and deployments.
- Provides guidance for running multiple server instances on different ports with consistent slot management.
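The save, restore, and erase operations listed above can be sketched as small HTTP requests. This is a minimal illustration, not the skill's own script: it assumes the llama.cpp server exposes POST /slots/{id}?action=save|restore|erase and that save and restore take a JSON body with a "filename" resolved against the server's slot save path. The helper names, base URLs, and file names are hypothetical.

```python
import json
import urllib.request

VALID_ACTIONS = ("save", "restore", "erase")


def slot_action_request(base_url, slot_id, action, filename=None):
    """Build (but do not send) a POST request for one slot action.

    Assumption: the server accepts POST /slots/<id>?action=<action>,
    with {"filename": ...} as the JSON body for save and restore.
    """
    if action not in VALID_ACTIONS:
        raise ValueError(f"unsupported action: {action}")
    url = f"{base_url.rstrip('/')}/slots/{slot_id}?action={action}"
    if action in ("save", "restore"):
        if not filename:
            raise ValueError("filename is required for save and restore")
        body = json.dumps({"filename": filename}).encode("utf-8")
        headers = {"Content-Type": "application/json"}
    else:
        # erase only clears the slot, so no body is sent
        body = None
        headers = {}
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def send(request, timeout=30):
    """Send a prepared request and decode the JSON reply."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With one server per model on its own port, calling send(slot_action_request("http://127.0.0.1:8080", 0, "save", "model-a-slot0.bin")) would persist slot 0 of the first instance, and the same call against a second port with a different file name keeps the two models' caches apart.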
Quick Start
Launch llama-server with a chosen parallel slot count (--parallel) and a slot save directory (--slot-save-path), then verify the available slots by querying the /slots endpoint.
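As a rough sketch of that quick start, the snippet below builds the launch command for one instance and reads back the slot list. It assumes the llama.cpp flags -m, --port, --parallel, and --slot-save-path and a GET /slots endpoint that returns JSON; depending on the build, /slots may need to be enabled explicitly. The model path, port, and directory shown are hypothetical placeholders.

```python
import json
import urllib.request


def build_server_command(model_path, port, parallel, slot_save_path,
                         binary="llama-server"):
    """Return the argv list for one llama-server instance.

    Each model gets its own port and its own slot_save_path so that
    saved slot files from different models never mix.
    """
    return [
        binary,
        "-m", model_path,
        "--port", str(port),
        "--parallel", str(parallel),
        "--slot-save-path", slot_save_path,
    ]


def list_slots(base_url, timeout=10):
    """Fetch the current slot list from a running server (GET /slots)."""
    url = f"{base_url.rstrip('/')}/slots"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Hypothetical example values; pass the list to subprocess.Popen to start it.
command = build_server_command(
    "models/model-a.gguf", 8080, 2, "/var/cache/llama/model-a/"
)
print(" ".join(command))
```

Once the server is up, list_slots("http://127.0.0.1:8080") should return one entry per slot, so its length can be compared against the --parallel value used at launch.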
Dependency Matrix
Required Modules
None required
Components
Standard package
Claude Code Installation
Recommended: Let Claude install automatically. Simply copy and paste the text below to Claude Code.
Please help me install this Skill: Name: llama-slot-pinning Download link: https://github.com/crycriM/hermes-skills/archive/main.zip#llama-slot-pinning Please download this .zip file, extract it, and install it in the .claude/skills/ directory.