llama-cpp-inference

Community

Run GGUF models locally with llama.cpp

Authorjayll1303
Version1.0.0
Installs0

System Documentation

What problem does it solve?

This Skill removes the complexity of running GGUF-format large language models locally by providing clear patterns for launching servers, using CLIs, embedding Python bindings, configuring GPU offload, and tuning multi-user batching so models run reliably and efficiently on desktop and server hardware.

Core Features & Use Cases

  • OpenAI-compatible serving: Launch llama-server with health checks, embeddings, chat/completion endpoints and slot monitoring for multi-user deployments.
  • CLI and programmatic inference: Use llama-cli for interactive/batch runs and llama-cpp-python bindings to embed GGUF models inside Python applications.
  • GPU and backend configuration: Build and run with CUDA, Metal, or Vulkan backends, set n_gpu_layers, flash-attn, and tune context and parallel slots for throughput or low-memory setups.
  • Performance tuning & troubleshooting: Guidance for context sizing, continuous batching, speculative decoding, OOM recovery, CUDA detection, and model format validation.
  • Use Case: Deploy a local OpenAI-compatible GGUF server for development, perform low-latency interactive chat on a laptop with Metal/CUDA, or run multi-user inference with continuous batching in a small team.

Quick Start

Launch a local GGUF model server using llama-server with the model path, appropriate context size, and -ngl set to control GPU offload.

Dependency Matrix

Required Modules

None required

Components

references

💻 Claude Code Installation

Recommended: Let Claude install automatically. Simply copy and paste the text below to Claude Code.

Please help me install this Skill:
Name: llama-cpp-inference
Download link: https://github.com/jayll1303/AIEKit/archive/main.zip#llama-cpp-inference

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
View Source Repository

Agent Skills Search Helper

Install a tiny helper to your Agent, search and equip skill from 471,000+ vetted skills library on demand.