llama-slot-pinning

Community

Pin prompt slots for persistent caches

Author: crycriM
Version: 1.0.0
Installs: 0

System Documentation

What problem does it solve?

This skill provides a reproducible setup for configuring llama-server (the llama.cpp HTTP server) with slot pinning. Prompt KV caches are persisted across restarts, so long shared prompts do not have to be re-processed from scratch, which keeps performance stable in multi-model deployments.

Core Features & Use Cases

  • Enables per-model slot isolation by allocating dedicated slots and independent save paths to prevent cross-model cache contamination.
  • Supports saving, restoring, and erasing slot states to preserve prompt context across restarts and deployments.
  • Provides guidance for running multiple server instances on different ports with consistent slot management.
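The slot operations listed above can be sketched in Python with only the standard library. This is a minimal sketch, not the skill's own code. It assumes a llama-server instance at a base URL such as http://127.0.0.1:8080 started with --slot-save-path. It uses the server's POST /slots/{id}?action=save|restore|erase endpoint and the id_slot and cache_prompt fields of /completion, as described in the llama.cpp server README. Check these against your build, since the server API changes between releases. The helper names are hypothetical. The builder functions only construct requests; send() is the only call that touches the network.

```python
import json
import urllib.request
from typing import Optional

VALID_ACTIONS = {"save", "restore", "erase"}


def slot_action_request(base_url: str, slot_id: int, action: str,
                        filename: Optional[str] = None) -> urllib.request.Request:
    """Build a POST /slots/{id}?action=... request (hypothetical helper)."""
    if action not in VALID_ACTIONS:
        raise ValueError(f"unknown slot action: {action!r}")
    if action in ("save", "restore") and not filename:
        # save/restore take a bare filename; the server resolves it
        # inside its --slot-save-path directory.
        raise ValueError(f"action {action!r} requires a filename")
    # erase does not need a body; an empty JSON object is sent for uniformity.
    body = {"filename": filename} if action != "erase" else {}
    url = f"{base_url.rstrip('/')}/slots/{slot_id}?action={action}"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def pinned_completion_request(base_url: str, slot_id: int, prompt: str,
                              n_predict: int = 64) -> urllib.request.Request:
    """Build a /completion request that is always routed to the same slot."""
    body = {
        "prompt": prompt,
        "n_predict": n_predict,
        "id_slot": slot_id,    # pin this request to one slot
        "cache_prompt": True,  # reuse that slot's KV cache for the shared prefix
    }
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/completion",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(req: urllib.request.Request, timeout: float = 60.0) -> dict:
    """Send a request and decode the JSON reply (requires a running server)."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read() or b"{}")
```

For example, before a planned restart you might call send(slot_action_request(url, 0, "save", "model-a-slot0.bin")). Once the server is back up, call the same helper with "restore". Because filenames are resolved inside each server's --slot-save-path, giving every model's instance its own directory keeps saved caches from mixing.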

Quick Start

Launch llama-server with a chosen number of parallel slots (--parallel) and a --slot-save-path directory, then verify the available slots via the /slots endpoint.
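The quick start, extended to several models on separate ports, might look like the sketch below. It assumes the llama-server binary is on PATH and uses its --model, --port, --parallel and --slot-save-path options. The Instance class, the /var/cache/llama-slots root and the one-directory-per-model layout are hypothetical choices for illustration. The verification step assumes GET /slots returns a JSON array of slot objects that each carry an id field. On some builds the /slots endpoint must be enabled explicitly, so consult llama-server --help.

```python
import json
import subprocess
import urllib.request
from dataclasses import dataclass
from pathlib import Path, PurePosixPath
from typing import List


@dataclass
class Instance:
    """One llama-server process serving one model (hypothetical layout)."""
    name: str
    model_path: str
    port: int
    parallel: int = 4                            # number of slots (--parallel)
    cache_root: str = "/var/cache/llama-slots"   # assumed root for saved slots

    @property
    def slot_save_path(self) -> str:
        # One directory per model, so saved slot files never mix across models.
        return str(PurePosixPath(self.cache_root) / self.name)

    @property
    def slots_url(self) -> str:
        return f"http://127.0.0.1:{self.port}/slots"

    def argv(self) -> List[str]:
        return [
            "llama-server",
            "--model", self.model_path,
            "--port", str(self.port),
            "--parallel", str(self.parallel),
            "--slot-save-path", self.slot_save_path,
        ]


def launch(inst: Instance) -> subprocess.Popen:
    """Create the save directory (in case the server does not) and start the server."""
    Path(inst.slot_save_path).mkdir(parents=True, exist_ok=True)
    return subprocess.Popen(inst.argv())


def fetch_slots(inst: Instance, timeout: float = 10.0) -> str:
    """Return the raw JSON body of GET /slots (requires a running server)."""
    with urllib.request.urlopen(inst.slots_url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def slot_ids(slots_json: str) -> List[int]:
    """Extract the sorted slot ids from a /slots response body."""
    return sorted(slot["id"] for slot in json.loads(slots_json))


def check_slots(inst: Instance, slots_json: str) -> None:
    """Raise if the server does not report exactly slots 0..parallel-1."""
    ids = slot_ids(slots_json)
    expected = list(range(inst.parallel))
    if ids != expected:
        raise RuntimeError(f"{inst.name}: expected slots {expected}, got {ids}")
```

For example, Instance("model-a", "/models/a.gguf", 8080) and Instance("model-b", "/models/b.gguf", 8081) describe two servers on separate ports whose saved slot files land in separate directories. Call launch() for each and wait for the model to load. Then pass fetch_slots(inst) to check_slots(inst, ...).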

Dependency Matrix

Required Modules

None required

Components

Standard package

💻 Claude Code Installation

Recommended: let Claude install it automatically. Copy the text below and paste it into Claude Code.

Please help me install this Skill:
Name: llama-slot-pinning
Download link: https://github.com/crycriM/hermes-skills/archive/main.zip#llama-slot-pinning

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.